Linear Algebra: Packages

One popular application of supercomputers is solving large systems of linear equations. To that end, several program libraries have been developed that allow computers to perform common linear algebra operations. Below are some popular linear algebra software packages in use.


BLAS

BLAS (Basic Linear Algebra Subprograms) is an API standard for libraries that perform basic linear algebra operations such as vector and matrix multiplication. Its routines provide standard building blocks for basic vector and matrix operations, and each routine has a name that specifies the operation, the type of matrices involved, and their precision.
Numerical linear algebra, in particular the solution of linear systems of equations, linear least-squares problems, eigenvalue problems, and singular value problems, is fundamental to most calculations in scientific computing. Programmers have frequently chosen to implement certain low-level operations (such as the dot product or the matrix-vector product) as separate subprograms. This approach encourages structured programming and improves the self-documenting quality of software (by forcing it to name its basic building blocks). BLAS is a library of these basic routines; it can be called directly from user code for performance, and it serves as a core of common routines around which larger software is developed. BLAS consists of three levels:

Level 1:

Level 1 BLAS performs scalar, vector and vector-vector operations.

Level 2:

Level 2 BLAS performs matrix-vector operations.

Level 3:

Level 3 BLAS performs matrix-matrix operations. (This level contains the GEMM routine, on which much high-quality software relies.) A short cblas sketch of all three levels appears below.
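
Below is a minimal sketch of the three levels using the C interface (cblas). It is an illustration only: the header name and link flags vary by implementation (for example <cblas.h> for OpenBLAS or ATLAS, <gsl/gsl_cblas.h> for the GSL reference implementation), and <cblas.h> is assumed here.

/* Sketch: one routine from each BLAS level, via the cblas interface. */
#include <stdio.h>
#include <cblas.h>              /* header name is implementation-dependent */

int main(void)
{
    double x[3] = {1.0, 2.0, 3.0};
    double y[3] = {4.0, 5.0, 6.0};
    double A[9] = {1, 0, 0,     /* 3x3 matrices stored row-major */
                   0, 2, 0,
                   0, 0, 3};
    double B[9] = {1, 1, 1,
                   1, 1, 1,
                   1, 1, 1};
    double C[9] = {0};

    /* Level 1 (vector-vector): dot product of x and y */
    double dot = cblas_ddot(3, x, 1, y, 1);

    /* Level 2 (matrix-vector): y := 1.0*A*x + 0.0*y */
    cblas_dgemv(CblasRowMajor, CblasNoTrans, 3, 3, 1.0, A, 3, x, 1, 0.0, y, 1);

    /* Level 3 (matrix-matrix, GEMM): C := 1.0*A*B + 0.0*C */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                3, 3, 3, 1.0, A, 3, B, 3, 0.0, C, 3);

    printf("dot = %g, y[0] = %g, C[0][0] = %g\n", dot, y[0], C[0]);
    return 0;
}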

Because the BLAS are efficient, portable, and widely available, they are commonly used in the development of high-quality linear algebra software (for example, LAPACK). The GNU Scientific Library (GSL), for instance, provides a low-level layer that corresponds directly to the C-language BLAS standard, referred to here as “cblas”, and a higher-level interface for operations on GSL vectors and matrices.
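
The following is a minimal sketch of GSL's higher-level interface, assuming the gsl_matrix type and the gsl_blas_dgemm routine as documented in the GSL manual; it multiplies two small matrices and is typically linked with -lgsl -lgslcblas.

/* Sketch: C := A * B using GSL's higher-level BLAS interface. */
#include <stdio.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_blas.h>

int main(void)
{
    gsl_matrix *A = gsl_matrix_alloc(2, 2);
    gsl_matrix *B = gsl_matrix_alloc(2, 2);
    gsl_matrix *C = gsl_matrix_calloc(2, 2);   /* zero-initialised result */

    gsl_matrix_set_identity(A);                /* A = I */
    gsl_matrix_set_all(B, 3.0);                /* B = all threes */

    /* C := 1.0 * A * B + 0.0 * C */
    gsl_blas_dgemm(CblasNoTrans, CblasNoTrans, 1.0, A, B, 0.0, C);

    printf("C(0,0) = %g\n", gsl_matrix_get(C, 0, 0));

    gsl_matrix_free(A);
    gsl_matrix_free(B);
    gsl_matrix_free(C);
    return 0;
}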

Information for this section was taken from the following two links:
The BLAS Technical Forum, including an introduction, libraries, references, and interfaces for BLAS, can be found here.
An example of a GNU implementation of BLAS in C can be found here.


LINPACK

LINPACK, originally written in Fortran, is a library of linear algebra functions and subroutines. It was created in the 1970s by Jim Bunch, Jack Dongarra, Cleve Moler, and Gilbert Stewart. Intended for use on supercomputers, LINPACK solves systems of linear equations using matrices and numerical methods. Included subroutines perform Gaussian elimination, compute the determinant or inverse of a square matrix, and carry out transformations to diagonal form. These algorithms use column-based operations to take advantage of locality of reference. LINPACK relies heavily on the Level 1 vector routines of the BLAS package.
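
The column orientation matters because Fortran stores matrices column by column. The sketch below is illustrative C (not LINPACK source) showing why walking down a column of a column-major matrix touches consecutive memory, while walking across a row does not.

#include <stddef.h>

/* Scale column j of an m-by-n column-major matrix with leading dimension lda:
   stride-1 access, cache-friendly. */
void scale_column(double *a, size_t lda, size_t m, size_t j, double s)
{
    for (size_t i = 0; i < m; ++i)
        a[i + j * lda] *= s;        /* consecutive addresses */
}

/* Scale row i of the same matrix: each access jumps lda doubles,
   so locality of reference is poor. */
void scale_row(double *a, size_t lda, size_t n, size_t i, double s)
{
    for (size_t j = 0; j < n; ++j)
        a[i + j * lda] *= s;        /* stride of lda doubles */
}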

LINPACK for FORTRAN can be found here.

LINPACK Benchmarks

The software package has been used in tests to determine the speed of a supercomputer. The benchmarks measure the rate of floating-point operations (multiplications and additions) performed by the computer, in operations per second (FLOPS). Early versions of the benchmark include LINPACK 100 and LINPACK 1000, in which the computer solves a system of n linear equations for a total of 2n³/3 + 2n² operations, where n = 100 and 1000, respectively. LINPACK 1000 allows changes to the algorithm used, as long as the accuracy of the calculation remains the same.
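
The reported figure is simply the nominal operation count divided by the wall-clock time. The small sketch below assumes the 2n³/3 + 2n² count quoted above; the function name is purely illustrative.

/* Sketch: convert a LINPACK-style run (problem size n, elapsed seconds)
   into a FLOPS figure using the nominal 2n^3/3 + 2n^2 operation count. */
double linpack_rate(double n, double seconds)
{
    double ops = (2.0 / 3.0) * n * n * n + 2.0 * n * n;
    return ops / seconds;
}

/* Example: n = 1000 gives roughly 6.69e8 operations, so a run that takes
   0.1 s corresponds to about 6.7 GFLOPS. */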

For parallel computers, another benchmark is available, called the High Performance Computing Linpack Benchmark, or HPLinpack. Although there are some restrictions on the algorithms that can be used, different implementations of the benchmark are allowed, and the size of the problem can be scaled up as needed to achieve the best result. After the test is complete, the reported results include the size of the problem used and the maximum performance achieved, in FLOPS.

Top500 uses this benchmark in its list of the 500 fastest supercomputers in the world. Although its authors admit that no single statistic can measure the overall performance of a supercomputer, the LINPACK benchmark allows them to examine the "performance of a dedicated system for solving a dense system of linear equations." They allow the use of different types of algorithms, excluding those that compute answers to less precision than the IEEE 64-bit floating-point standard. For more information on how Top500 uses LINPACK, visit their page.

A popular implementation of the benchmark, called HPL (High-Performance Linpack), has been made available by Dongarra et al. To use it, a computer must have an implementation of MPI as well as either BLAS or VSIPL, the Vector Signal Image Processing Library. For more details about HPL, see their implementation page.


LAPACK

LAPACK (Linear Algebra PACKage) is a Fortran library of linear algebra functions. It was originally written in 1992 to replace LINPACK and EISPACK on the then-modern vector computers with shared memory. It reduces data movement by using block matrix operations such as matrix-matrix multiplication. Like LINPACK, it was written to solve simultaneous linear equations, linear least-squares problems, eigenvalue problems, and singular value problems. It has since been extended for distributed-memory systems in ScaLAPACK and PLAPACK.
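
As an illustration of the kind of problem LAPACK solves, the sketch below calls the driver routine dgesv (which factors a general matrix and solves A x = b) through the LAPACKE C interface; header names and link flags vary by vendor, and -llapacke -llapack -lblas is assumed here.

/* Sketch: solve a 3x3 system A x = b with LAPACK's dgesv via LAPACKE. */
#include <stdio.h>
#include <lapacke.h>

int main(void)
{
    double A[9] = { 2, 1, 1,        /* coefficient matrix, row-major */
                    1, 3, 2,
                    1, 0, 0 };
    double b[3] = { 4, 5, 6 };      /* right-hand side, overwritten with x */
    lapack_int ipiv[3];             /* pivot indices from the LU factorization */

    lapack_int info = LAPACKE_dgesv(LAPACK_ROW_MAJOR, 3, 1, A, 3, ipiv, b, 1);
    if (info != 0) {
        fprintf(stderr, "dgesv failed, info = %d\n", (int)info);
        return 1;
    }
    printf("x = (%g, %g, %g)\n", b[0], b[1], b[2]);
    return 0;
}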

LAPACK vendors' implementations can be found here.

LAPACK compared to LINPACK and EISPACK

LAPACK was created to improve on the performance of LINPACK and EISPACK on shared-memory vector computers and parallel processors, which LINPACK was not designed around. LINPACK is inefficient on modern architectures because its memory access patterns disregard the multi-layered memory hierarchy of the machine and therefore spend too much time moving data. LAPACK addresses this by changing the algorithms to use block matrix multiplication in the time-consuming inner loops. These block matrix operations are then optimized for each individual architecture. Although this makes the code less portable, it becomes much more efficient.
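
The blocking idea can be shown with a generic cache-blocking sketch (this is not LAPACK's actual kernel): the classic triple loop is restructured so that BS-by-BS blocks are reused while they fit in cache.

/* Generic illustration of cache blocking for C += A * B, with n x n
   row-major matrices; C must be zeroed by the caller and n is assumed
   to be a multiple of the block size BS. */
enum { BS = 64 };

void blocked_matmul(int n, const double *A, const double *B, double *C)
{
    for (int ii = 0; ii < n; ii += BS)
        for (int kk = 0; kk < n; kk += BS)
            for (int jj = 0; jj < n; jj += BS)
                /* multiply one BS x BS block of A by one block of B,
                   accumulating into the corresponding block of C */
                for (int i = ii; i < ii + BS; ++i)
                    for (int k = kk; k < kk + BS; ++k) {
                        double aik = A[i * n + k];
                        for (int j = jj; j < jj + BS; ++j)
                            C[i * n + j] += aik * B[k * n + j];
                    }
}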

LAPACK routines also rely on more BLAS calls than their LINPACK counterparts. The added calls use Level 3 BLAS routines, which are Fortran subprograms designed to perform matrix-matrix multiplication. The Level 1 BLAS calls used in LINPACK are left unmodified. The reason is that BLAS is very well coded: most high-performance computers have BLAS implementations tuned specifically for their hardware.