Iterative Solution of Linear Systems
Researching and developing iterative linear solvers, including multigrid and multilevel methods.
Work in FASTMath in the area of iterative solution of linear systems focuses on commonly used methods such as algebraic multigrid methods, Krylov subspace methods, and lower level kernels that can be used in many different packages. Efforts have improved both the solver algorithms themselves and the implementation of those algorithms.
New Matrix Capabilities. We designed and implemented a new structured-grid rectangular matrix capability in hypre to support the development of semi-structured linear solvers that can improve performance for applications that have structure to be exploited. We have developed a composite matrix class for AMR (multilevel) problems in Chombo, and the construction of linearization of Chombo's block structured AMR operators, for use in PETSc's algebraic solvers, such as algebraic multigrid and direct solvers.
Improvement of Key Matrix Kernels. We have developed parallel graph coloring algorithms, within KokkosKernels, for multicore and manycore architectures. These thread-scalable colorings do not sacrifice the quality of the coloring. The colorings are used for multicolored Gauss-Seidel smoothers and preconditioners for Intel Xeon Phi and GPU architectures. This is used as the smoother in the MueLu algebraic multigrid package. This is also the preconditioner for the Nalu application (wind energy). A new thread-scalable, hierarchical algorithm was also developed for the sparse matrix-matrix multiplication kernel. This is key to the performance of multigrid setup for several applications such as FELIX/Albany, Nalu, and Drekar (MHD).
Performance Improvements in Algebraic Multigrid. We developed and implemented novel communication-reducing techniques within hypre for algebraic multigrid solvers that can speed up linear solve times in parallel by more than 2x. These include: a non-Galerkin approach that sparsifies coarse-grid matrices without sacrificing convergence; and a multi-additive AMG algorithm that exploits a theoretical result to inherit the parallelization benefits of additive MG and convergence of multiplicative MG. We provided a parallel implementation of an AMG algorithm for elasticity problems that interpolates rigid body modes leading to much increased scalability. We made several threading improvements to our AMG solvers that significantly speed up performance with OpenMP. These improvements have benefited the Lattice QCD application in SciDAC.
Robust Interpolation and Restriction Operators in AMG. We are developing efficient and robust algebraic multigrid (AMG) algorithms for solving linear systems that are often found challenging for standard AMG methods. Examples include systems derived from linear elasticity problems and nonsymmetric systems derived from convection-dominated convection-diffusion problems. In our new AMG algorithms, we focus on constructing interpolation operators or, alternatively, restriction operators that can be more robust compared with those obtained from classical Ruge-Stüben AMG. Our new interpolation algorithm is akin to AMGe and element-free AMGe methods, where the interpolation weights are computed by solving local algebraic equations that incorporate predefined near null space vectors. Our restriction algorithm, called local approximate ideal restriction (AIR), is generated by locally approximating the "ideal'' restriction operator in combination with simple interpolation matrices of low complexities. The work on AIR is performed in collaboration with the University of Colorado at Boulder. Numerical results for the aforementioned linear systems demonstrate the superiority of the new solvers in terms of both convergence rates and parallel efficiency.
Non-Galerkin AMG. The work  introduced a new group of non-Galerkin methods that systematically remove entries from coarse-grid matrices after the hierarchy is formed, instead of during the hierarchy setup phase, as the first non-Galerkin algorithm did. We explored this option because if the heuristic identifying unimportant entries from the first non-Galerkin algorithm is used too aggressively, then AMG convergence can suffer. Thus, this new group of non-Galerkin methods counteracts this, by retaining the original Galerkin hierarchy and allowing entries to be reintroduced into the hierarchy if convergence is too slow. This enables a balance between communication cost and convergence, as necessary. The most promising of these new non-Galerkin methods (called "hybrid Galerkin" in ) allows solve phase speedups of around 25%-50% for the considered test problems.
 A. Bienz, R.D. Falgout, W. Gropp, L.N. Olson and J.B. Schroder, "Reducing Parallel Communication in Algebraic Multigrid Through Sparsification," SIAM J. Sci. Comput., pp. 332-357. 38 (2016).
Sandia National Labs, P.O. Box 969, Mailstop 9159, Livermore, CA 94551-0969
Division of Mathematics and Computer Science, Argonne National Laboratory, 9700 South Cass Avenue, Building 240, Argonne, IL 60439