(1) Solution of [A.sub.22][X.sub.21] = [Alpha] [B.sub.21] and [B.sub.21] is overwritten by [X.sub.21] (TRSM)
(2) Solution of [A.sub.22][X.sub.22] = [Alpha] [B.sub.22] and [B.sub.22] is overwritten by [X.sub.22] (TRSM)
(5) Solution of [A.sub.11][X.sub.11] = [B.sub.11] and [B.sub.11] is overwritten by [X.sub.11] (TRSM)
(6) Solution of [A.sub.11][X.sub.12] = [B.sub.12] and [B.sub.12] is overwritten by [X.sub.12] (TRSM)
Therefore, TRSM can be computed as a sequence of triangular solutions (TRSM) and matrix-matrix multiplications (GEMM).
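The decomposition above can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that are not stated in this passage: A is upper triangular with a nonzero diagonal, the system is "Left" (A X = [Alpha] B), the 2 x 2 partition is applied recursively by halving until a block has at most nb rows, and steps (3) and (4), which are not listed here, are taken to be the GEMM updates [B.sub.11] = [Alpha] [B.sub.11] - [A.sub.12][X.sub.21] and [B.sub.12] = [Alpha] [B.sub.12] - [A.sub.12][X.sub.22]. The function names (trsm_unblocked, gemm_update, trsm_blocked) are hypothetical and are not BLAS routines.

```python
def sub(M, r0, r1, c0, c1):
    """Return a copy of the block M[r0:r1, c0:c1] of a list-of-lists matrix."""
    return [row[c0:c1] for row in M[r0:r1]]


def trsm_unblocked(A, B, alpha):
    """Solve A X = alpha * B by back substitution (A upper triangular)."""
    m, n = len(A), len(B[0])
    X = [[alpha * B[i][j] for j in range(n)] for i in range(m)]
    for i in range(m - 1, -1, -1):          # last row first
        for j in range(n):
            s = X[i][j] - sum(A[i][k] * X[k][j] for k in range(i + 1, m))
            X[i][j] = s / A[i][i]
    return X


def gemm_update(alpha, B, A, X):
    """Return alpha * B - A @ X, the GEMM-type update of a block of B."""
    inner = len(X)
    return [[alpha * B[i][j] - sum(A[i][k] * X[k][j] for k in range(inner))
             for j in range(len(B[0]))]
            for i in range(len(B))]


def trsm_blocked(A, B, alpha=1.0, nb=2):
    """Solve A X = alpha * B using the 2 x 2 partition, applied recursively."""
    if nb < 1:
        raise ValueError("nb must be at least 1")
    m, n = len(A), len(B[0])
    if m <= nb:                              # small enough: solve directly
        return trsm_unblocked(A, B, alpha)
    r, c = m // 2, n // 2                    # row and column split points
    A11, A12, A22 = sub(A, 0, r, 0, r), sub(A, 0, r, r, m), sub(A, r, m, r, m)
    B11, B12 = sub(B, 0, r, 0, c), sub(B, 0, r, c, n)
    B21, B22 = sub(B, r, m, 0, c), sub(B, r, m, c, n)

    X21 = trsm_blocked(A22, B21, alpha, nb)  # step (1): TRSM
    X22 = trsm_blocked(A22, B22, alpha, nb)  # step (2): TRSM
    B11 = gemm_update(alpha, B11, A12, X21)  # step (3): GEMM (assumed form)
    B12 = gemm_update(alpha, B12, A12, X22)  # step (4): GEMM (assumed form)
    X11 = trsm_blocked(A11, B11, 1.0, nb)    # step (5): TRSM, alpha already applied
    X12 = trsm_blocked(A11, B12, 1.0, nb)    # step (6): TRSM, alpha already applied

    top = [left + right for left, right in zip(X11, X12)]
    bottom = [left + right for left, right in zip(X21, X22)]
    return top + bottom
```

The sketch returns X instead of overwriting B in place, which keeps the block bookkeeping easy to follow; a practical implementation would operate on views of B and call optimized TRSM and GEMM kernels.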
When the matrices are large compared with the block size, GEMM accounts for most of the floating-point operations required by the blocked version of TRSM. For example, when the triangular system is "Left," assuming that NB divides m and n exactly (so that the number of blocks is m/NB x n/NB), the fraction of operations spent in GEMM is equal to 1 - NB/m.
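The ratio 1 - NB/m can be checked by counting operations block by block. The sketch below assumes a common flop model that is not spelled out in the text: a triangular solve with an NB x NB block and n right-hand sides costs NB * NB * n operations, and a GEMM update of an a x n block with inner dimension NB costs 2 * a * NB * n operations. The function name blocked_trsm_flops is hypothetical.

```python
def blocked_trsm_flops(m, n, nb):
    """Count flops spent in TRSM and in GEMM for a blocked "Left" TRSM.

    Assumes nb divides m exactly and uses the flop model described above.
    Returns the pair (trsm_flops, gemm_flops).
    """
    if nb <= 0 or m % nb != 0:
        raise ValueError("nb must be positive and divide m exactly")
    k = m // nb                      # number of diagonal blocks
    trsm = 0
    gemm = 0
    for step in range(k):
        # Solve with one nb x nb diagonal block for all n right-hand sides.
        trsm += nb * nb * n
        # Update the block rows that have not been solved yet.
        remaining_rows = (k - 1 - step) * nb
        gemm += 2 * remaining_rows * nb * n
    return trsm, gemm


def gemm_fraction(m, n, nb):
    """Fraction of all flops that are spent in GEMM."""
    trsm, gemm = blocked_trsm_flops(m, n, nb)
    return gemm / (trsm + gemm)
```

Under this model the total is m * m * n operations and the diagonal solves contribute m * NB * n of them, so gemm_fraction(m, n, nb) agrees with 1 - nb/m for any n; with the illustrative values m = 4096 and nb = 64 the fraction is 0.984375.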