**Accelerating quantum transport simulations on massively parallel computing architectures;** Mauro Calderara, Sacha Brück, Pierre Ferry, Mathieu Luisier (ETH Zurich)

The continuous reduction of transistor size and the increase in transistor count per integrated circuit have both contributed to a significant improvement of portable electronic devices such as laptops, cell phones, and digital cameras. As a consequence of this miniaturization process, the dimensions of currently manufactured transistors do not exceed a couple of nanometers, their active region is composed of a countable number of atoms, and quantum mechanical effects have started to strongly influence their behavior. To accurately predict the characteristics of not-yet-fabricated nano-transistors, there is a strong need for simulation approaches that simultaneously capture their material and transport properties at a quantum mechanical level.

Density-functional theory (DFT) represents the most accurate method to satisfy this demand, but its computational burden usually restricts its application to small atomic systems. Empirical models such as tight-binding (TB) allow for the simulation of larger structures and the inclusion of complex scattering mechanisms, but they lack predictive power when surface effects become important or when new material combinations are to be investigated. Although ab-initio and empirical quantum transport (QT) approaches have different domains of application, they both require large computational resources to run efficiently and can therefore benefit from massively parallel computing architectures.

To simulate electron transport through a nanostructure, the Schrödinger equation must be solved with open boundary conditions (OBCs) for different electron energies, which are independent in the ballistic case but tightly coupled in the presence of, for example, electron-phonon scattering. In tight-binding, computing the OBCs is facilitated by the small bandwidth of the Hamiltonian, which shifts the main computational challenge to the solution of the Schrödinger equation, either in the so-called Wave Function (WF) formalism[1] or in the well-known Non-equilibrium Green's Function (NEGF) framework. In DFT transport calculations, the OBCs induce the highest computational cost, as the non-Hermitian generalized eigenvalue problems associated with them are non-trivial to parallelize and generally outweigh the solution of the Schrödinger equation. The highly parallel FEAST algorithm[2] is very well suited to solving these eigenvalue problems.
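As an illustration of the ballistic picture described above, the following minimal Python sketch computes the transmission through a one-dimensional, single-orbital tight-binding chain in the NEGF framework. It is not the tool presented in this abstract: the chain model, the hopping `t`, the on-site energy list `onsite`, and the function names `lead_self_energy` and `transmission` are assumptions chosen purely for illustration. The contacts are taken to be ideal semi-infinite chains, for which the OBCs reduce to an analytic self-energy.

```python
import cmath

def lead_self_energy(energy, t=1.0):
    """Retarded self-energy of an ideal semi-infinite chain (on-site 0, hopping -t).

    This plays the role of the open boundary condition in the toy model.
    """
    z = complex(energy, 1e-12)           # tiny imaginary part selects the retarded solution
    root = cmath.sqrt(z * z - 4.0 * t * t)
    sigma = 0.5 * (z - root)
    if sigma.imag > 0:                   # keep the branch with Im(sigma) <= 0
        sigma = 0.5 * (z + root)
    return sigma

def transmission(energy, onsite, t=1.0):
    """Ballistic transmission through a chain with the given on-site energies."""
    sigma = lead_self_energy(energy, t)
    gamma = -2.0 * sigma.imag            # contact broadening, identical for both leads
    left = sigma                         # self-energy of everything to the left of site i
    prop = 1.0 + 0.0j                    # accumulated propagator from site 1 to site N
    for eps in onsite[:-1]:
        g = 1.0 / (energy - eps - left)  # left-connected Green's function of site i
        prop *= -t * g
        left = t * t * g
    g_nn = 1.0 / (energy - onsite[-1] - left - sigma)
    g_n1 = g_nn * prop                   # Green's function element connecting both contacts
    return gamma * gamma * abs(g_n1) ** 2

# In the ballistic case every energy point is independent, so this loop
# could be distributed over processes or devices without communication.
energies = [-1.9 + 0.1 * k for k in range(39)]
barrier = [0.0, 0.5, 0.5, 0.0]           # hypothetical potential barrier
spectrum = [transmission(e, barrier) for e in energies]
```

In this toy model a uniform chain transmits perfectly inside the band, while the barrier reduces the transmission. With electron-phonon scattering the energy points would no longer decouple, and a simple independent loop of this kind would not apply.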

As key results, we will present DFT- and TB-based quantum transport simulations of realistic nano-devices as well as algorithmic innovations leveraging graphics processing units (GPUs). A single computer-aided design tool will be used for that purpose. Combined with the CP2K[3] package, it can perform ab-initio transport simulations of nanowire transistors. In standalone configuration, it can treat a wide range of nanostructures at the tight-binding level.

The Cray XC30 Piz Daint at CSCS has been chosen as the benchmark platform. We will show that on this system, the ballistic simulation of large nanowire and ultra-thin-body transistors expressed in a TB basis can be accelerated by a factor of 2.5 when the 8 cores per node of Piz Daint are assisted by one GPU. With electron-phonon scattering, speedups larger than 40 can be reached when comparing one CPU with one GPU in the NEGF formalism. In the DFT case, a supercomputer like Piz Daint enables quantum transport simulations of nanowires with more than 10,000 atoms. Furthermore, due to the extremely good scalability of the FEAST algorithm, up to 128 threads or 2 GPUs can be assigned to the same energy point. Since a typical device simulation includes more than 1000 such points, a single DFT-based QT run has the potential to scale up to the full dimensions of Piz Daint.
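The scaling argument can be made explicit with a short calculation. The sketch below only multiplies the figures quoted above; the variable names are illustrative, and 1000 energy points is used as the stated lower bound, so the results are lower bounds as well.

```python
# Figures quoted in the text (energy_points is a lower bound).
energy_points = 1000
threads_per_point = 128
gpus_per_point = 2

# Two levels of parallelism: across energy points and within each point.
total_threads = energy_points * threads_per_point
total_gpus = energy_points * gpus_per_point

print(f"at least {total_threads} threads or {total_gpus} GPUs")
```

Under these assumptions, one run could keep at least 128,000 threads or 2000 GPUs busy.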

[1] M. Luisier et al., Phys. Rev. B 74, 205323 (2006). [2] E. Polizzi, Phys. Rev. B 79, 115112 (2009). [3] J. VandeVondele et al., Comput. Phys. Commun. 167, 103 (2005).