The prompt-specific bundles were identified by manually checking for overlap between bundle components and the content words unique to the CELPIP writing prompts.
The corpora of the higher proficiency levels (CELPIP Levels 7 and 10) yielded more lexical bundles, in both types and tokens, than the corpus of the lower proficiency level (CELPIP Level 4), whereas the difference between the two higher proficiency levels was less salient (see the second column of Table 2).
Excluding the lexical bundles that appeared in two or three of the lists, we identified 14 unique bundles each at CELPIP Levels 4 and 10, and 6 at CELPIP Level 7.
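The exclusion step described above can be illustrated with a set difference. This is only a minimal sketch: the function name `unique_bundles` and the toy lists are assumptions made for illustration (three of the bundles are quoted in this report, the shared ones are invented), and they do not reproduce the study's actual bundle lists.

```python
def unique_bundles(bundle_lists):
    """Return, for each level, the bundles found in that level's list only."""
    unique = {}
    for level, bundles in bundle_lists.items():
        # Collect every bundle that appears in any other level's list.
        others = set()
        for other_level, other_bundles in bundle_lists.items():
            if other_level != level:
                others |= set(other_bundles)
        unique[level] = set(bundles) - others
    return unique


# Toy lists for illustration only (not the study's data).
example_lists = {
    "CELPIP 4": {"how are you", "have a nice day", "i would like to"},
    "CELPIP 7": {"i would like to", "thank you for your"},
    "CELPIP 10": {"do not hesitate to", "i would like to", "thank you for your"},
}
result = unique_bundles(example_lists)
```

In this toy case, the bundle shared by all three lists and the bundle shared by two lists are both excluded, leaving only the level-specific items.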
A closer look at the unique bundles used at CELPIP Levels 4 and 10 suggests some differences in the writing of the two groups. The unique bundles at CELPIP Level 10 seem to be more polite and formal, as shown in "do not hesitate to", "at your earliest convenience", and "I would greatly appreciate", whereas those at CELPIP Level 4 appear to be more casual, as in "how are you", "have a nice day", "if you don't", and "because I want to".
The stance bundles showed almost the same percentages across the three proficiency levels (CELPIP Level 4: 32%, CELPIP Level 7: 31%, and CELPIP Level 10: 31%).
Figure 1: Distribution of lexical bundle types across proficiency levels (percentage of tokens)

                      CELPIP 4   CELPIP 7   CELPIP 10
Stance                32%        31%        31%
Referential           6%         8%         8%
Discourse organizer   26%        26%        28%
Other                 36%        35%        33%

Note: Table made from bar graph.
Out of the available data, two corpora of spoken and written test-taker-generated texts were created through random quota sampling to represent a balanced range of CELPIP levels of performance.
The overall rater judgements of CELPIP levels of performance were obtained for each speaking or writing sample.
In general, meaningful relationships between the eight LFP measures and rater judgements of CELPIP levels of performance emerged from the data.
The first set of correlational analyses examined the relationship between the tokens (total number of words) and types (total number of different words) in the speaking samples and rater judgements of CELPIP levels of performance.
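To make the two counts concrete, the following minimal sketch computes tokens and types for a short text. The tokenization rule (lowercased runs of letters and apostrophes) and the function name `tokens_and_types` are assumptions for illustration; the report does not state how the samples were tokenized.

```python
import re


def tokens_and_types(text):
    """Return (tokens, types): total words and distinct words in a text."""
    # Assumed tokenization: lowercase, keep runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return len(words), len(set(words))


counts = tokens_and_types("The cat saw the other cat")
```

Here the sample has six running words but only four different ones, because "the" and "cat" each occur twice.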
The second set of analyses examined the relationship between the percentage coverage of the speaking sample texts by vocabulary frequency estimates of HFV, MFV, and LFV and rater judgements of CELPIP levels of performance.
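Percentage coverage of this kind could be computed roughly as sketched below. Several things here are assumptions rather than details from the study: the word-to-band lookup `band_of` is hypothetical, the band numbers given to the example words are invented, and the cut-offs (Bands 1 to 3 as HFV, 4 to 9 as MFV, 10 and above as LFV) are one common convention exposed as parameters, not necessarily the boundaries used here.

```python
def coverage_by_group(tokens, band_of, hfv_max=3, mfv_max=9):
    """Return the percentage of tokens falling in each vocabulary group."""
    counts = {"HFV": 0, "MFV": 0, "LFV": 0, "off-list": 0}
    for tok in tokens:
        band = band_of.get(tok)
        if band is None:
            counts["off-list"] += 1  # word not found in the frequency lists
        elif band <= hfv_max:
            counts["HFV"] += 1
        elif band <= mfv_max:
            counts["MFV"] += 1
        else:
            counts["LFV"] += 1
    total = len(tokens)
    if total == 0:
        return {group: 0.0 for group in counts}
    return {group: 100.0 * n / total for group, n in counts.items()}


# Invented band assignments, for illustration only.
band_of = {"the": 1, "house": 1, "negotiate": 5, "ubiquitous": 12}
sample = ["the", "house", "the", "negotiate", "ubiquitous",
          "the", "house", "the", "zzz", "the"]
coverage = coverage_by_group(sample, band_of)
```

Each sample then contributes one percentage per group, and those percentages are the values that would be related to the rater judgements.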
The next set of correlational analyses examined how rater judgements of CELPIP levels of performance relate to the lexical stretch, that is, the lowest frequency band accessed to reach 98% coverage of a speaking sample, and to the lowest frequency band accessed overall. Lower frequency bands sit at the upper end of the 25-band scale: Band 1 represents the highest frequency word families and Band 25 the lowest frequency word families.
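The two band values described above can be sketched as follows. This is an illustrative reading, not the study's procedure: it assumes every token has already been assigned a band from 1 to 25, and the function name `lexical_stretch` and the example sample are hypothetical.

```python
def lexical_stretch(token_bands, target_percent=98, n_bands=25):
    """Return (band reached at target coverage, lowest frequency band overall).

    token_bands holds one band number (1..n_bands) per token; a higher
    number means a lower frequency band.
    """
    total = len(token_bands)
    if total == 0:
        return None, None
    counts = [0] * (n_bands + 1)
    for band in token_bands:
        counts[band] += 1
    cumulative = 0
    stretch = None
    for band in range(1, n_bands + 1):
        cumulative += counts[band]
        # Integer comparison avoids floating-point rounding at the threshold.
        if cumulative * 100 >= target_percent * total:
            stretch = band
            break
    return stretch, max(token_bands)


# Hypothetical sample of 50 tokens: 45 in Band 1, 3 in Band 2,
# 1 in Band 4, and 1 in Band 11.
example_bands = [1] * 45 + [2] * 3 + [4] + [11]
stretch, lowest = lexical_stretch(example_bands)
```

In this example, Bands 1 and 2 cover 48 of 50 tokens (96%), so the 98% threshold is first met at Band 4, while the single Band 11 token sets the lowest frequency band accessed overall.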