The Performance in Millions of Floating-Point Operations per Second per Processor of Four Parallel Dense-Matrix Subroutines in PESSL.
Table II shows that the performance of our factorization algorithm on the SP2 is excellent compared with that of other distributed dense-matrix computations in PESSL, whose performance is shown in Table III.
Both algorithms were linked with the same libraries, namely the PESSL implementation of the BLACS, the ESSL implementation of nodal level-3 BLAS, and the LAPACK implementation of nodal factorizations.
Presently, only our wide-band algorithm is implemented in PESSL.
We thank John Lemek of the IBM Power Parallel Division for his efforts to incorporate the solver into PESSL.