Vocabulary size of Hebrew speakers


We present preliminary results from a Hebrew vocabulary test, designed for both native speakers and learners. Detailed information about the test methodology is available on the test description page.

Participants

To date, 1064 respondents have completed the test, including 386 native speakers and 678 Hebrew learners. Let's look at their age distributions:

Histogram of age of Hebrew vocabulary test participants, native speakers only

Histogram of age of Hebrew vocabulary test participants, learners only

While native speakers are dominated by younger participants, with a surprising peak in the 15-20 age group, learners are predominantly older, with a slight excess around 40-50. We will see signatures of both groups in the data.

Vocabulary size

Let's now look at vocabulary sizes of native speakers and learners.

Histogram of vocabulary size of Hebrew vocabulary test participants, native speakers only

The vocabulary size of native speakers spans the whole range from 0 to 70000 words (following the Academy of the Hebrew Language, we assumed that the total number of words in Hebrew is 80000). The histogram has three components. First, there is a broad Gaussian-like distribution in the 30000-70000 range. As we will see later, this range corresponds to adult native speakers. Second, there is a peak around 20000 words, which comes from the excess of 15-20-year-old participants that we saw in the age histogram earlier. Finally, there is a tail of results below 10000 words. Most likely, it comes from participants who marked themselves as native speakers by mistake. As we collect more data, the last two components should fade away.

Now, let’s look at the specific numbers for native speakers:

These statistics combine results from native speakers of all ages. Later, we’ll explore how vocabulary size varies with age.

Let's move on to learners.

Histogram of vocabulary size of Hebrew vocabulary test participants, learners only

It looks like two groups of learners came to the test's website. The first group is much larger and dominates the results below approximately 13000 words. The second group is smaller, but it shows much higher results, centered around 25000 words and spanning all the way to 70000 words, which is similar to native speakers. Most likely, this is the group of highly proficient 40-50-year-old participants that we saw as an excess in the age histogram.

Here are the specific numbers for learners:

These numbers reflect the vocabulary sizes of learners across all age groups combined.

Here is a comparison of native speakers and learners on the same plot:

Histogram of vocabulary size of Hebrew vocabulary test participants, overlapped native speakers and learners

Vocabulary size and age

Let's now dig a little deeper and look at how vocabulary size depends on age. In the next plot, each point represents an individual participant:

Vocabulary size vs age for Hebrew vocabulary test participants, native speakers and learners

The data look a bit overwhelming. Let's group participants by age and compute some statistics so that we can see trends. Let's start with native speakers.

Statistics on vocabulary size vs age for Hebrew vocabulary test participants, native speakers only

This is a box plot, where each box corresponds to a group of participants of a certain age (10-14, 15-19, 20-24, etc.). The middle line of each box shows the median for the group, the bottom line is the 25th percentile, and the top line is the 75th percentile. The whiskers extend up to 1.5 times the interquartile range beyond the box. Data outside the whiskers are often considered outliers. Individual observations are shown as points.
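The box-plot quantities described above can be sketched in a few lines of Python. This is only an illustration, not the code used to produce the plots: the function names `age_group` and `box_stats` are hypothetical, the sample values are made up, and the quartile interpolation and the rule that whiskers stop at the most extreme observation within 1.5 times the interquartile range are assumptions based on the common box-plot convention.

```python
from statistics import quantiles


def age_group(age, width=5, start=10):
    """Return the label of the 5-year age bin, e.g. 17 -> '15-19'."""
    low = start + ((age - start) // width) * width
    return f"{low}-{low + width - 1}"


def box_stats(values):
    """Quartiles, whisker ends and outliers for one group of vocabulary sizes."""
    # "inclusive" uses linear interpolation between data points (an assumption;
    # plotting libraries may use a slightly different quartile definition).
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations that are still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Made-up vocabulary sizes for a single age group (not real test data).
sample = [18000, 20000, 21000, 22000, 24000, 45000]
print(age_group(17), box_stats(sample))
```

In this made-up sample, the value 45000 lies above the upper fence, so it would be drawn as an outlier point rather than being covered by the whisker.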

Plotting the data this way immediately reveals a trend. For native speakers, vocabulary size grows with age. It grows rapidly up to approximately 25 years (the period of formal education), then rises slowly and reaches saturation around 55 years. There is not much data beyond 55 years, so we can't say what happens after that. Here are the numbers.

Let's transition to learners.

Statistics on vocabulary size vs age for Hebrew vocabulary test participants, learners only

The vocabulary size of a learner should not depend on his or her age. It depends on how long the person has been studying, how much effort is dedicated to the new language, and whether the learner is surrounded by native speakers or uses the new language for work. What we see on the box plot is a peculiarity of our set of respondents. Namely, we can see two groups of participants. The first group, the majority, spans the whole age range and has a lower vocabulary size. These participants dominate for ages below 40 and on average have a vocabulary of 2400 words. The second group starts to dominate for ages above 40, and it has a much larger vocabulary. That is why the vocabulary size on the plot changes dramatically at 40. We already saw an excess of participants in the age range above 40 on the age histogram. It was a signature of the same group of participants.

Frequently asked questions

Can I compare these results with results of other vocabulary tests in Hebrew?

Unfortunately, you can't. It is impossible to compare the results of any two vocabulary tests. First, all tests use different methodologies, so they measure slightly different aspects of one's vocabulary. Second, all tests define what counts as a word differently. For example, some count derivative words, and some do not. Third, every test uses a different definition of what it means to "know" a word. Finally, not all tests on the internet are created equal. Only a small fraction of online vocabulary tests are based on rigorous scientific methods.

How does the average vocabulary of native speakers and learners of Hebrew compare to other languages?

Comparing vocabulary sizes across languages is nearly impossible.

How many words are in Hebrew?

According to the Academy of the Hebrew Language, there are currently 80000 words in Hebrew, and the number keeps growing.



Last updated: January 25th, 2025