How it works

The test estimates a respondent’s receptive vocabulary — the number of words that can be recognized in reading and listening. Measuring this precisely would require checking a person’s knowledge of tens of thousands of words one by one, which is unrealistic. Instead, we use Item Response Theory (IRT), a modern framework for designing and scoring tests.

In IRT, a respondent’s vocabulary size is treated as a latent trait that can be represented by a number. The test presents words of various difficulties and asks whether the respondent knows them. For example, “cat” is very easy, while “recusant” is very difficult. Word difficulty strongly correlates with how often a word is encountered. IRT provides the mathematical foundation for estimating a respondent’s ability from their responses. Once we know a respondent’s ability and the difficulty level of every word in our database, we can estimate the probability that the respondent knows each word. By summing these probabilities, we obtain an estimate of the respondent’s total vocabulary size.
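To make this concrete, here is a minimal Python sketch of the idea, assuming a one-parameter logistic (Rasch) model; the text above does not specify which IRT model is used, and the ability value and word difficulties below are made-up numbers on the same latent scale.

```python
import math

def p_known(ability: float, difficulty: float) -> float:
    """Rasch (1PL) probability that a respondent with the given
    ability knows a word of the given difficulty."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical difficulties on the latent scale:
# negative = easy (like "cat"), positive = hard (like "recusant").
word_difficulties = [-3.0, -1.5, 0.0, 1.2, 2.5, 4.0]

ability = 1.0  # ability estimated from the respondent's answers

# Total vocabulary estimate: sum the knowing-probabilities
# over every word in the database.
estimated_vocab = sum(p_known(ability, b) for b in word_difficulties)
print(f"Expected number of known words: {estimated_vocab:.2f}")
```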

To make the test both quick and precise, we use Computerized Adaptive Testing (CAT). After each response, the system updates the respondent’s estimated vocabulary size and selects the next word so that its difficulty is close to the respondent’s current ability level. This ensures that each test item provides the maximum possible information. The estimate becomes more accurate with every step, and the test finishes automatically once the required level of precision is reached.
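The sketch below shows the shape of such an adaptive loop, not the production algorithm: it assumes a Rasch model, a tiny hypothetical item bank, a simple grid-based (EAP) ability update, and an arbitrary precision target for stopping.

```python
import math
import random

def p_known(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    # Fisher information of a Rasch item peaks when b is close to theta,
    # which is why the next word is chosen near the current ability.
    p = p_known(theta, b)
    return p * (1.0 - p)

def eap_update(responses):
    # Posterior mean and SD of ability over a coarse grid,
    # with a standard-normal prior (a simple EAP estimate).
    grid = [g / 10.0 for g in range(-40, 41)]
    post = []
    for theta in grid:
        like = math.exp(-theta * theta / 2.0)  # prior
        for b, known in responses:
            p = p_known(theta, b)
            like *= p if known else (1.0 - p)
        post.append(like)
    total = sum(post)
    mean = sum(t * w for t, w in zip(grid, post)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, post)) / total
    return mean, math.sqrt(var)

# Hypothetical item bank: word -> difficulty on the latent scale.
bank = {"cat": -3.5, "window": -2.0, "reluctant": 0.0,
        "verbose": 1.0, "abstruse": 2.5, "recusant": 4.0}

theta, se = 0.0, float("inf")
responses, used = [], set()

# Stop once the standard error of the estimate is small enough.
while se > 0.5 and len(used) < len(bank):
    # Pick the unused word that is most informative at the current estimate.
    word = max((w for w in bank if w not in used),
               key=lambda w: item_information(theta, bank[w]))
    used.add(word)
    known = random.random() < p_known(1.2, bank[word])  # simulated respondent
    responses.append((bank[word], known))
    theta, se = eap_update(responses)
    print(f"{word:>10}: known={known}, ability={theta:+.2f}, SE={se:.2f}")
```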

Word difficulties

Our database contains more than 600 calibrated test words whose difficulties were estimated directly from test-taker data. The remaining words have difficulty values predicted using machine-learning models. These predictions draw on multiple reliable linguistic resources, each capturing a different aspect of word usage.
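The specific resources and models are not detailed here; the sketch below only illustrates the general idea under a deliberately simple assumption: a single feature (corpus frequency) and an ordinary least-squares fit against calibrated words, with invented numbers.

```python
import math

# Hypothetical calibrated words: (frequency per million words, IRT difficulty).
calibrated = [(5000.0, -3.2), (800.0, -1.8), (120.0, -0.4),
              (15.0, 1.1), (2.0, 2.6), (0.3, 4.0)]

# Feature: log frequency (rarer words tend to be harder).
xs = [math.log(freq) for freq, _ in calibrated]
ys = [diff for _, diff in calibrated]

# Ordinary least squares for a one-feature model: difficulty = a * log(freq) + c.
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
c = mean_y - a * mean_x

def predict_difficulty(freq_per_million: float) -> float:
    """Predict the difficulty of an uncalibrated word from its frequency."""
    return a * math.log(freq_per_million) + c

print(predict_difficulty(50.0))   # a moderately common word
print(predict_difficulty(0.5))    # a rare word
```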

Unit of measurement

The test reports vocabulary in word families. A word family includes a base word, its regular inflections, and its derived forms, following the criteria described in Bauer & Nation (1993). For example, limit, limitation, limitations, limited, limiting, limitless, limitlessly, limits, and unlimited all belong to the same family. Our database contains 25,000 word families.
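A tiny sketch of what counting in word families means in practice, using a hypothetical form-to-family mapping built from the example above:

```python
# Hypothetical mapping from word forms to the headword of their family.
family_of = {
    "limit": "limit", "limitation": "limit", "limitations": "limit",
    "limited": "limit", "limiting": "limit", "limitless": "limit",
    "limitlessly": "limit", "limits": "limit", "unlimited": "limit",
    "cat": "cat", "cats": "cat",
}

known_forms = ["limits", "unlimited", "cat", "cats"]

known_families = {family_of[w] for w in known_forms}
print(len(known_forms))     # 4 known word forms
print(len(known_families))  # but only 2 word families: limit, cat
```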

CEFR thresholds

To estimate CEFR levels from vocabulary size, we combined graded word lists from three reputable sources:

  1. GSE Teacher Toolkit
  2. English Vocabulary Profile
  3. Oxford 3000 and 5000 word lists

These sources allowed us to estimate how many word families a learner is expected to know at levels A1–C1. For the C2 threshold, we used the vocabulary size corresponding to the 25th percentile of adult native speakers, based on data from the myVocab vocabulary test.
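As a small illustration of the C2 rule, the sketch below computes a 25th-percentile threshold from a hypothetical sample of native-speaker vocabulary sizes; the real figures come from myVocab data and are not reproduced here.

```python
import statistics

# Hypothetical vocabulary sizes (in word families) for adult native speakers.
native_sizes = [14800, 16200, 17500, 18100, 19000,
                19600, 20400, 21100, 22000, 23500]

# C2 threshold = 25th percentile (first quartile) of native-speaker sizes.
c2_threshold = statistics.quantiles(native_sizes, n=4)[0]
print(f"C2 threshold: about {c2_threshold:.0f} word families")
```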

Reliability

To make sure each result is trustworthy, we run several checks:

  1. Non-word traps. Every test includes a few made-up words. If a respondent marks too many of them as known, the result is flagged as unreliable.
  2. Multiple-choice follow-ups. When a respondent says they know a word, they may be asked to choose its correct meaning from four options. Too many mistakes make the result unreliable.
  3. Answer pattern check (in progress).
  4. Convergence and consistency check (in progress).

These checks do not change the vocabulary estimate itself; they simply indicate whether the final result can be trusted.
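To make the idea concrete, here is a minimal sketch of how the first two checks might flag a result; the threshold values are illustrative placeholders, not the actual criteria used by the test.

```python
def is_reliable(nonword_hits: int, nonwords_shown: int,
                mc_errors: int, mc_questions: int,
                max_nonword_rate: float = 0.2,
                max_mc_error_rate: float = 0.25) -> bool:
    """Flag a result as unreliable if too many non-words were marked
    as known, or too many multiple-choice follow-ups were answered
    incorrectly. Thresholds here are made-up examples."""
    if nonwords_shown and nonword_hits / nonwords_shown > max_nonword_rate:
        return False
    if mc_questions and mc_errors / mc_questions > max_mc_error_rate:
        return False
    return True

print(is_reliable(nonword_hits=0, nonwords_shown=5, mc_errors=1, mc_questions=8))  # True
print(is_reliable(nonword_hits=3, nonwords_shown=5, mc_errors=1, mc_questions=8))  # False
```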