b) Create a single array St such that each MDGA will use a subarray of it.
c) Each MDGA computes (using an add-scan) the starting point within St for its array, and stores this in ArrDst.
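The offset computation in steps b) and c) can be sketched as follows. This is a minimal sequential stand-in for the parallel add-scan, not the authors' implementation. It assumes each MDGA under consideration already knows the size of its new array. The names `new_sizes`, `exclusive_add_scan`, `arr_dst`, and `st` are hypothetical and chosen only to mirror ArrDst and St in the text.

```python
def exclusive_add_scan(sizes):
    """Exclusive prefix sum: element i is the sum of sizes[0..i-1].

    A sequential stand-in for the parallel add-scan named in the text.
    Returns the list of starting offsets and the total length.
    """
    starts = []
    total = 0
    for size in sizes:
        starts.append(total)  # offset where this MDGA's subarray begins
        total += size
    return starts, total


# Hypothetical new array sizes for three MDGAs that need larger arrays.
new_sizes = [4, 8, 2]

# arr_dst plays the role of ArrDst: the start of each MDGA's subarray in St.
arr_dst, st_len = exclusive_add_scan(new_sizes)

# st plays the role of St: one shared array holding every subarray.
st = [None] * st_len
```

With these sizes, the three MDGAs would occupy `st[0:4]`, `st[4:12]`, and `st[12:14]`.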
Thus the sources are stored in an array ESrcArr of MDGA numbers (here, 3) and an array ESrcPos of indices within the corresponding MDGA's states (here, 0).
The destinations are stored in an array EDst of offsets into St marking where the new MDGA arrays start and in array ESrcPos.
For each MDGA i, the time for copying each of its |[St'.
3) Move the new states into the MDGA arrays such that the load is evenly distributed among the processors, as Figure 27 illustrates.
The destinations are stored in an array EDstArr of MDGA numbers and an array EDstPos of indices within the corresponding MDGA's states.
So, we use a queue (implemented with an MDGA, but with queue operations) of environment bindings, decrementing k × q reference counts each step, for some constant k.
This queue of threads could be implemented by an MDGA, although Grit and Page used a less efficient binary tree data structure for this.
a) Determine which MDGAs need larger arrays, and consider only these for the remainder of this step.