Vocabulary test: how it works

The test estimates receptive vocabulary — a number of words you recognize in reading and listening. The only way to do it precisely is to take a thick vocabulary (with many hundreds of thousands of words) and check whether you know every word one by one. Well, nobody wants to do that. However, there is a better way thanks to Item Response Theory (IRT) — a modern paradigm for the design, analysis, and scoring of tests. According to this paradigm, we assume that your vocabulary is a latent trait, or ability, which can be expressed as a number and measured. The measurement consists of a series of test words of various difficulties, which can be marked as known or unknown. For example, the word “cat” has low difficulty, while “recusant” — on the opposite, has very high difficulty. The difficulty scale is closely related to how frequently we see, hear, or use these words. IRT gives a mathematical prescription on how to calculate one’s ability based on responses to a set of test items of various difficulties — and that is exactly how we do it.

To make the test quick and precise, we use Computerized Adaptive Testing (CAT) technique — another standard in the world of modern testing. We calculate your vocabulary after each response to a test word. Then, we choose the next test word, so it is not too easy or too hard — in this way we maximize the amount of information each test item contributes to the test. The precision of the vocabulary calculation gets better with each step; the test stops when it reaches a certain threshold.

Frequency data

To calculate test word difficulties, we used word family lists created by Paul Nation based on frequency data from two language corpora – BNC (British National Corpus) and COCA (Corpus of Contemporary American English).

What is a unit of measurement?

The test estimates vocabulary in word families. Each word family includes a base word, all its regularly inflected forms, and all words which can be made from the base word using common affixes. These criteria are based on L. Bauer, P. Nation, Word Families, Int. J. Lexicogr. 6 (1993). For example, words limit, limitation, limitations, limited, limiting, limitless, limitlessly, limits, unlimited belong to the same word family.

How easy is it to fool the test?

There are two types of checks. First, there are some non-words among the test items. Second, if you mark a test word as known, you might be asked to clarify its meaning by choosing between 4 definitions. At the end, we calculate attention index with a simple formula (x+y)/(ax+ay), where x is a number of non-words marked as unknown, ax is a total number of presented non-words, y is a number of multiple-choice questions answered correctly, and ay is a total number of presented multiple-choice questions. The final vocabulary estimation is not affected by the attention index. The index is only used to decide whether the response data are valid and can be used in our research.

In a nutshell

The test is designed using Item Response Theory (one-parameter model) and Computerized Adaptive Testing. Bayesian Expected A Posteriori (EAP) estimator is used on each step of the test. Joint Maximum Likelihood is used to calculate difficulties of the test items. The website’s backend is written with Python and Flask, the frontend is written in Vanilla JS and uses Bulma.

How it works

Frequency data

What is a unit of measurement?

How easy is it to fool the test?

In a nutshell