BRAVE for researchers

BRAVE — the Brief Receptive Adaptive Vocabulary Evaluation — is a two-minute adaptive test of receptive English vocabulary designed for researchers who need a short, psychometrically rigorous measure across a wide range of English users.

BRAVE works for learners, native speakers, bilinguals, heritage speakers, and mixed L1/L2 groups, and places respondents on a common receptive vocabulary scale. To make results interpretable, it reports estimated vocabulary size in word families rather than raw scores, and includes built-in reliability checks for unsupervised online administration.

At a glance

Construct: Receptive English vocabulary breadth
Target users: English learners, native speakers, bilinguals, heritage speakers
Ability range: Beginning learners to highly proficient native speakers, on a common L1/L2 scale
Administration: Online, unsupervised
Typical time: About 2 minutes; approximately 35–40 items
Method: Item Response Theory + Computerized Adaptive Testing
Output: Estimated vocabulary size in word families + reliability flag
Reference lexicon: 28,276 English word families
Validation sample: 40,000+ respondents; A1–C2 learners and native speakers aged 10–77
External validity: Spearman’s ρ ≈ .76 with self-assessed CEFR; ρ ≈ .57–.70 with IELTS, TOEFL, and Cambridge exams for learners

What BRAVE measures

BRAVE measures breadth of receptive English vocabulary: the number of English word families a respondent is likely to understand. It focuses on written receptive form–meaning knowledge — respondents see a written English word and indicate whether they know its meaning.

Results are reported in word families, following Bauer and Nation (1993). A word family groups a base word with its closely related inflected and derived forms, such as limit, limits, limited, limiting, and limitation. Vocabulary-size estimates are based on a reference lexicon of 28,276 English word families.

Methodology

BRAVE combines Item Response Theory (IRT) with Computerized Adaptive Testing (CAT). After each response, the respondent's ability estimate is updated, and the next item is selected near their current estimated level, where it is expected to provide the most information. The test continues until the target precision is reached or the maximum number of items has been administered. This design keeps administration time short (about 2 minutes, ~35–40 items) while maintaining approximately constant precision across a broad ability range. The latent ability estimate is then converted into an estimated vocabulary size in word-family units.
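The adaptive loop described above can be sketched in a few lines of Python. This is an illustrative sketch only, not BRAVE's implementation: it assumes a one-parameter (Rasch) model, a weak normal prior, expected a posteriori (EAP) scoring on a grid, and a simulated respondent. The item bank, the prior, and all function names are hypothetical; only the 0.4-logit precision target and the cap of roughly 40 items are taken from this page.

```python
import math
import random

def p_known(theta, b):
    """Rasch probability that a respondent at ability theta knows an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Ability grid from -10 to 10 logits in steps of 0.1
GRID = [g / 10.0 for g in range(-100, 101)]

def eap(responses, prior_sd=4.0):
    """Posterior mean and SD of ability, given (difficulty, known) pairs.

    The normal prior with SD 4 is an assumption made for this sketch.
    """
    log_w = []
    for t in GRID:
        lw = -0.5 * (t / prior_sd) ** 2
        for b, known in responses:
            p = p_known(t, b)
            lw += math.log(p if known else 1.0 - p)
        log_w.append(lw)
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    mean = sum(t * wi for t, wi in zip(GRID, w)) / total
    var = sum((t - mean) ** 2 * wi for t, wi in zip(GRID, w)) / total
    return mean, math.sqrt(var)

def run_cat(answer, bank, target_se=0.4, max_items=40):
    """Administer items until the SE target or the item cap is reached."""
    remaining = list(bank)
    responses = []
    theta, se = 0.0, float("inf")
    while remaining and len(responses) < max_items and se > target_se:
        # Under the Rasch model, the most informative item is the one
        # whose difficulty is closest to the current ability estimate.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        responses.append((b, answer(b)))
        theta, se = eap(responses)
    return theta, se, len(responses)

# Simulated respondent with a known "true" ability (hypothetical values)
random.seed(7)
TRUE_THETA = 1.5
BANK = [-8.0 + 16.0 * i / 399 for i in range(400)]

def simulated_answer(b):
    return random.random() < p_known(TRUE_THETA, b)

theta_hat, se_hat, n_items = run_cat(simulated_answer, BANK)
print(f"estimate={theta_hat:.2f} logits, SE={se_hat:.2f}, items={n_items}")
```

In this simplified setting each well-targeted item contributes at most 0.25 units of information, so reaching a standard error of 0.4 logits takes at least about 25 items. Item counts in an operational test will differ, since real sessions also include follow-up and pseudoword items.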

BRAVE uses three item types:

  • Yes/no items — the primary item type. Respondents indicate whether they know the meaning of a written English word.
  • Multiple-choice follow-ups — used for a subset of words that respondents claim to know, to check whether they can recognize the correct meaning.
  • Pseudowords — plausible-looking nonwords used to help flag overclaiming or inattentive responding.

Main output: estimated vocabulary size in word families and a session reliability flag.

Detailed output: 95% confidence range for the vocabulary-size estimate, response times, and number of reliability checks triggered.

Psychometric evidence

Validation

The validation study was based on more than 40,000 L1 and L2 respondents, including English learners spanning the A1–C2 CEFR range and native speakers aged 10–77.

For learners, BRAVE scores aligned strongly with self-assessed CEFR level (Spearman’s ρ ≈ .76) and with major English proficiency exams, including IELTS, TOEFL, and Cambridge exams (ρ ≈ .57–.70). For native speakers, vocabulary estimates increased systematically with age and education level, as expected.
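Researchers who want to run a comparable check in their own sample can compute Spearman's ρ between vocabulary estimates and an ordinal proficiency measure. The sketch below is a dependency-free illustration with made-up data; the scores and the 1–6 coding of CEFR levels are hypothetical and are not taken from the validation study.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: vocabulary estimates and self-assessed CEFR (A1=1 ... C2=6)
vocab = [2100, 3400, 5200, 4800, 7900, 9100, 12500, 11000]
cefr = [1, 2, 3, 3, 4, 4, 6, 5]
print(f"rho = {spearman_rho(vocab, cefr):.2f}")
```

Average ranks are used for ties because CEFR levels take only six values, so tied observations are the norm rather than the exception.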

A validation paper describing BRAVE’s design and validation is currently in preparation.

Precision

BRAVE targets approximately constant precision on the latent ability scale: a standard error of about 0.4 logits across the target ability range of approximately −10 to 10 logits.

In word-family units, uncertainty varies across the scale, because the conversion from logits to word families is not linear. For most of the ability range (approximately CEFR B2 and higher), the standard error of measurement (SEM) is 600–700 word families.
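One way to see why a constant standard error in logits becomes a varying one in word families is the delta method. The sketch below is an assumption-laden illustration, not BRAVE's actual conversion: it supposes a Rasch model, takes vocabulary size to be the expected number of known families in the lexicon, and uses an invented normal distribution of item difficulties. Only the lexicon size (28,276) and the 0.4-logit standard error come from this page.

```python
import math
from statistics import NormalDist

N_FAMILIES = 28_276  # size of the reference lexicon stated on this page

# Hypothetical difficulties: evenly spaced quantiles of a normal distribution.
# The mean (2.0) and SD (3.0) are invented for illustration.
_difficulty_dist = NormalDist(mu=2.0, sigma=3.0)
DIFFICULTIES = [
    _difficulty_dist.inv_cdf((i + 0.5) / N_FAMILIES) for i in range(N_FAMILIES)
]

def p_known(theta, b):
    """Rasch probability of knowing a word family of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def vocab_size(theta):
    """Expected number of known word families at ability theta."""
    return sum(p_known(theta, b) for b in DIFFICULTIES)

def vocab_sem(theta, se_theta=0.4):
    """Delta-method SEM in word families: se_theta times the slope dV/dtheta.

    For the Rasch model, the slope is the sum of p * (1 - p) over the lexicon.
    """
    slope = sum(p * (1.0 - p) for p in (p_known(theta, b) for b in DIFFICULTIES))
    return se_theta * slope

def confidence_range(theta, se_theta=0.4, z=1.96):
    """Approximate 95% range: map the logit interval through vocab_size."""
    return vocab_size(theta - z * se_theta), vocab_size(theta + z * se_theta)

for theta in (-6.0, -2.0, 2.0, 6.0):
    low, high = confidence_range(theta)
    print(
        f"theta={theta:+.1f}  size={vocab_size(theta):8.0f}  "
        f"SEM={vocab_sem(theta):7.0f}  range=({low:.0f}, {high:.0f})"
    )
```

Where many word families have difficulties close to the respondent's ability, a small shift in logits moves many families across the "known" threshold, so the SEM in word families is larger there. Mapping the interval endpoints through the conversion, rather than adding a symmetric margin, keeps the range inside 0 to 28,276.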

Reliability

BRAVE treats reliability primarily as precision for each individual session, rather than as a single overall reliability coefficient. The adaptive algorithm continues until a predefined precision target is reached, so respondents across the ability range are measured with similar uncertainty on the latent scale. In the validation study, 99.3% of sessions reached the target precision before reaching the maximum test length. Short-term repeat testing also supported the stability of BRAVE scores.

Because BRAVE is intended for unsupervised use, it includes session-level reliability checks. Pseudowords and multiple-choice follow-ups help flag possible overclaiming, partial knowledge, or inattentive responding, identifying sessions that should be interpreted cautiously or excluded from analysis.
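A session-level check of this kind could look roughly like the sketch below. It is a hypothetical illustration: the field names, the two thresholds, and the "ok"/"caution" labels are assumptions made for the example, and BRAVE's actual flagging rules are not described on this page.

```python
from dataclasses import dataclass

@dataclass
class SessionChecks:
    """Counts collected during one session (hypothetical field names)."""
    pseudowords_shown: int
    pseudowords_claimed: int   # pseudowords the respondent said they knew
    followups_shown: int
    followups_failed: int      # multiple-choice follow-ups answered wrongly

def reliability_flag(session, max_false_alarm_rate=0.2, max_followup_fail_rate=0.3):
    """Return ("ok" or "caution", list of reasons). Thresholds are illustrative."""
    reasons = []
    if session.pseudowords_shown:
        rate = session.pseudowords_claimed / session.pseudowords_shown
        if rate > max_false_alarm_rate:
            reasons.append("overclaiming on pseudowords")
    if session.followups_shown:
        rate = session.followups_failed / session.followups_shown
        if rate > max_followup_fail_rate:
            reasons.append("failed multiple-choice follow-ups")
    return ("caution" if reasons else "ok"), reasons

careful = SessionChecks(pseudowords_shown=5, pseudowords_claimed=0,
                        followups_shown=6, followups_failed=1)
careless = SessionChecks(pseudowords_shown=5, pseudowords_claimed=3,
                         followups_shown=6, followups_failed=4)
print(reliability_flag(careful))
print(reliability_flag(careless))
```

Returning the reasons alongside the flag lets an analyst decide case by case whether to exclude a session or keep it with a sensitivity analysis.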

How to interpret the score

BRAVE reports vocabulary-size estimates in English word families. The score is an estimate, not an exact count of the word families a respondent knows.

Its main advantage is that the number can be compared with meaningful reference points. For English learners, BRAVE scores can be compared with vocabulary-size distributions across CEFR levels, giving an approximate sense of where a respondent falls relative to other learners. The score can also be related to common vocabulary benchmarks, such as graded word-family lists and lexical coverage estimates. A detailed interpretation guide is in preparation.

For native speakers, BRAVE scores can be compared with age-based distributions to describe a respondent’s approximate standing relative to other native speakers of a similar age. These comparisons are useful, but they should not be treated as formal population norms.

Reference distributions for both learners and native speakers are available on the BRAVE results page.

Fit for your study

Use BRAVE when you need to

  • include receptive vocabulary as a covariate in a larger test battery
  • obtain a proxy for verbal ability in native speakers or language proficiency in learners
  • measure vocabulary in settings where administration time is limited
  • compare English learners and native speakers on a common vocabulary scale
  • work with heterogeneous samples that include learners, native speakers, bilinguals, or heritage speakers
  • screen participants by vocabulary level
  • describe the language background of participants more precisely than with self-report alone
  • study vocabulary in relation to reading, education, cognition, age, language background, or academic outcomes
  • collect vocabulary data in online, remote, classroom-independent, or large-sample settings
  • use an interpretable vocabulary-size estimate rather than a raw score or percent-correct score

Do not use BRAVE when you need to

  • make high-stakes decisions, such as certification, admission, hiring, promotion, or formal placement
  • use results in settings where participants have strong incentives to inflate their score
  • replace a comprehensive English proficiency exam such as IELTS, TOEFL, Cambridge, or an institutional placement test
  • assess productive vocabulary, speaking, writing, listening, grammar, or reading comprehension directly
  • diagnose specific language difficulties or make clinical, psychological, or educational diagnoses
  • measure knowledge of specialized vocabulary, such as academic, medical, legal, technical, or domain-specific terminology
  • obtain a detailed vocabulary profile by frequency band, topic, register, or word type
  • evaluate whether a learner has mastered a specific curriculum, textbook, course, or word list

Using BRAVE in research projects

BRAVE was built as a practical assessment tool and can be easily included in online studies, classroom-based studies, larger test batteries, educational projects, and collaborative research.

When BRAVE is used as part of a study, each participant can be given an individual test link with an embedded participant ID. After data collection, researchers can receive detailed results with those IDs attached, making it possible to analyze each respondent’s BRAVE performance alongside the rest of the study data.
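Once results arrive with participant IDs attached, joining them to the rest of the study data is a simple keyed merge. The sketch below uses only the Python standard library; the CSV layout, column names, and values are hypothetical, since the actual export format is not specified here.

```python
import csv
import io

# Hypothetical BRAVE export and study dataset, inlined for the example
BRAVE_CSV = """participant_id,vocab_size,reliability_flag
P001,9200,ok
P002,15400,ok
P003,6100,caution
"""

STUDY_CSV = """participant_id,reading_score
P001,41
P002,55
P004,38
"""

def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def merge_on_id(study_rows, brave_rows, key="participant_id"):
    """Attach BRAVE columns to each study row; report IDs with no BRAVE result."""
    brave_by_id = {row[key]: row for row in brave_rows}
    merged, unmatched = [], []
    for row in study_rows:
        match = brave_by_id.get(row[key])
        if match is None:
            unmatched.append(row[key])
        else:
            merged.append({**row, **match})
    return merged, unmatched

merged, unmatched = merge_on_id(read_rows(STUDY_CSV), read_rows(BRAVE_CSV))
for row in merged:
    print(row)
print("no BRAVE result for:", unmatched)
```

Listing unmatched IDs explicitly makes it easy to spot participants who skipped the test or whose link was mistyped, instead of silently dropping them from the analysis.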

Researchers interested in using BRAVE are encouraged to get in touch. Support is available for study setup, data collection, data analysis, and methods reporting.

Data handling and privacy

BRAVE is hosted on Google Cloud infrastructure. The test does not ask respondents for names, email addresses, or other direct identifiers. Test records include optional self-report information and IP addresses, which are used for country-level geolocation and detection of duplicate sessions.

For research projects, BRAVE can use study-specific participant IDs embedded in individual test links. These IDs allow researchers to match BRAVE results with their own study data without collecting names or contact information through BRAVE.

How to cite

Golovin, G. (in preparation). BRAVE: Design and validation of a brief broad-range adaptive test of receptive English vocabulary.

Contact

For questions, suggestions, research ideas, or collaboration proposals, please contact:

Grigory Golovin — gregorygolovin@gmail.com