Vocabulary size of Polish speakers
We present preliminary results from a Polish vocabulary test, designed for both native speakers and learners. Detailed information about the test methodology is available on the test description page.
Participants
To date, 2045 respondents have completed the test, including 1121 native speakers and 924 Polish learners. Let's look at their age distributions:
Both groups are dominated by younger participants in the high school to college age range, since we promoted the test mostly among them.
Vocabulary size
Let's now look at vocabulary sizes of native speakers and learners.
For native speakers:
- The median vocabulary size is 79,700 words. This means that half of the native speakers tested know fewer words, and the other half know more.
- The 25th percentile is 55,900 words. In other words, 25% of native speakers scored below this level, while the remaining 75% scored higher.
- The 75th percentile is 99,900 words. This indicates that only 25% of native speakers have a vocabulary larger than that.
- At the 90th percentile, only 10% of native speakers know more than 112,200 words. This represents the upper range of vocabulary sizes in the dataset.
For learners:
- The median vocabulary size is 8,200 words. This means that half of the learners tested know fewer words, while the other half know more.
- The 25th percentile is 2,800 words. In other words, 25% of learners scored below this level, while 75% of them scored higher.
- The 75th percentile is 18,000 words. This indicates that only 25% of learners have a vocabulary larger than that.
- At the 90th percentile, only 10% of learners know more than 31,900 words. This represents the upper range of vocabulary sizes among the learners who took the test.
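To make the percentile figures above concrete, here is a minimal sketch of how such a summary can be computed from a list of individual scores. The function name `percentile_summary` and the scores in `toy_scores` are hypothetical and used only for illustration; they are not data from the test, and the interpolation method used for the published numbers may differ.

```python
import statistics

def percentile_summary(sizes):
    """Return the 25th, 50th, 75th and 90th percentiles of a list of vocabulary sizes."""
    # quantiles(n=100) returns 99 cut points; cut point number k is the k-th percentile.
    cuts = statistics.quantiles(sizes, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in (25, 50, 75, 90)}

# Hypothetical scores for illustration only (NOT data from the test).
toy_scores = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 11000]

summary = percentile_summary(toy_scores)
for p, value in summary.items():
    print(f"{p}th percentile: {value:,.0f} words")

# By definition, about half of the scores lie below the median (50th percentile).
below_median = sum(1 for s in toy_scores if s < summary[50])
print(f"{below_median} of {len(toy_scores)} scores are below the median")
```

The same reading applies to every percentile: the share of participants scoring below the p-th percentile is roughly p%.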
Here is a comparison of native speakers and learners on the same plot:
Vocabulary size and age
Let's now dig a little deeper and look into how vocabulary depends on age. On the next plot each point represents an individual participant:
The data look a bit overwhelming. Let's group participants by age and compute some statistics so we can see trends. Let's start with native speakers.
This is a box plot, where each box corresponds to a group of participants of a certain age (5-9, 10-14, 15-19, etc.). The middle line of each box shows the median for the group, the bottom line shows the 25th percentile, and the top line shows the 75th percentile; the box therefore covers the middle half of all participants within the group. The whiskers extend up to 1.5 times the interquartile range beyond the box. Data outside the whiskers are often considered outliers. Individual observations are shown as points.
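The grouping and the box plot statistics described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `age_group` and `box_stats` and the `participants` list are hypothetical, the data are made up, and the whiskers follow the common convention of ending at the most extreme observation within 1.5 times the interquartile range of the box, which may not match the plotting tool used for this page exactly.

```python
import statistics
from collections import defaultdict

def age_group(age, width=5):
    """Label the 5-year bin an age falls into, e.g. 17 -> '15-19'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def box_stats(values):
    """Compute the numbers a box plot draws for one group (needs at least 2 values)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                      # interquartile range: height of the box
    low_fence = q1 - 1.5 * iqr         # observations beyond the fences are outliers
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # whiskers end at the last non-outlier
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Hypothetical (age, vocabulary size) pairs for illustration only.
participants = [
    (15, 50000), (16, 52000), (17, 54000), (18, 56000), (19, 58000),
    (17, 120000), (22, 66000), (23, 70000),
]

groups = defaultdict(list)
for age, size in participants:
    groups[age_group(age)].append(size)

stats = {label: box_stats(sizes) for label, sizes in groups.items()}
for label in sorted(stats):
    print(label, stats[label])
```

In this toy data the 15-19 group has one score far above the rest, so it falls outside the upper fence and would be drawn as an outlier point rather than stretching the whisker.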
Plotting data this way immediately allows us to see a trend. For native speakers, vocabulary size grows with age. It grows rapidly up to approximately 25 years of age (the period of formal education), then keeps going up slowly for the rest of life. Similar studies often report a vocabulary decrease after around 55 years of age; we do not see this effect. Here are the numbers:
- A 12-year-old knows, on average, 40,000 words.
- A 17-year-old knows, on average, 56,000 words.
- A 22-year-old knows, on average, 66,000 words.
- During the period of active vocabulary growth, students learn around 2,600 words every year.
- Most adults know, on average, 91,000 words.
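As a quick sanity check, the yearly growth figure is consistent with the age averages listed above: going from the 12-year-old average to the 22-year-old average spans 10 years. The sketch below shows that arithmetic; the page does not state how the figure was actually derived, so treat this as one plausible reading, and the name `words_per_year` is hypothetical.

```python
# Averages reported above for native speakers: age -> average vocabulary size.
reported = {12: 40_000, 17: 56_000, 22: 66_000}

def words_per_year(age_from, age_to, averages=reported):
    """Average yearly vocabulary growth between two ages."""
    return (averages[age_to] - averages[age_from]) / (age_to - age_from)

overall = words_per_year(12, 22)  # (66,000 - 40,000) / 10 years
early = words_per_year(12, 17)    # (56,000 - 40,000) / 5 years
late = words_per_year(17, 22)     # (66,000 - 56,000) / 5 years

print(f"12 to 22: {overall:,.0f} words per year")
print(f"12 to 17: {early:,.0f} words per year")
print(f"17 to 22: {late:,.0f} words per year")
```

Split into two halves, the same averages suggest faster growth in the early teens than in the late teens and early twenties, which fits the slowing trend described above.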
Let's transition to learners.
For learners:
- The median vocabulary size of learners in the high school and college age range (15-24 years) is 6,800 words.
- The median vocabulary size of adult learners (>25 years) is 10,500 words.
Frequently asked questions
Can I compare these results with results of other vocabulary tests in Polish?
Unfortunately, you can't. It is impossible to directly compare the results of any two vocabulary tests. First, all tests use different methodologies, so they measure slightly different aspects of one's vocabulary. Second, all tests define what is counted as a word differently. For example, some count derivative words, and some do not. Third, every test uses a different definition of what it means to "know" a word. Finally, not all tests on the internet are created equal. Only a small fraction of online vocabulary tests are based on rigorous scientific methods.
How does the average vocabulary of native speakers and learners in Polish compare to other languages?
Comparing vocabulary sizes across languages is nearly impossible.
- Many languages have highly inflected forms, where a single root word can produce numerous variations. In Latin, for example, the verb "amare" (to love) generates forms like "amo" (I love), "amavi" (I loved), and "amatus" (loved). Including all these inflected forms can significantly inflate a language's total word count.
- In some languages, such as German, compound words are common. For instance, "Donaudampfschifffahrtsgesellschaftskapitän" is considered a single word, meaning "Danube steamship company captain." Deciding whether to count compound words as individual entries or as combinations of existing words can be challenging.
- Certain languages have extensive derivational morphology, which adds significantly to their word count. For example, in English, the word "happy" leads to related forms like "happiness," "unhappiness," "happily," and "unhappily." Should these words be counted as one or as separate entries?
- Many languages also borrow extensively from others, expanding their vocabulary. English, for instance, has adopted words like "déjà vu" from French, "tsunami" from Japanese, and "emoji," also from Japanese. Determining when a borrowed word becomes a full-fledged part of a language adds further complexity to the task.
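A toy calculation shows how much the counting convention alone changes the result. The sketch below counts the same eight word forms from the examples above under different rules. The mappings `INFLECTION` and `DERIVATION` and the function `count_words` are hypothetical and hand-written for this illustration; they do not describe how this test or any dictionary counts words.

```python
# Word forms taken from the examples above.
known_forms = [
    "happy", "happiness", "unhappiness", "happily", "unhappily",
    "amo", "amavi", "amatus",
]

# Hypothetical hand-made mappings from a form to the entry it is counted under.
INFLECTION = {"amo": "amare", "amavi": "amare", "amatus": "amare"}
DERIVATION = {
    "happiness": "happy", "unhappiness": "happy",
    "happily": "happy", "unhappily": "happy",
}

def count_words(forms, merge_inflections=False, merge_derivations=False):
    """Count distinct entries under a chosen counting convention."""
    entries = set()
    for form in forms:
        if merge_inflections:
            form = INFLECTION.get(form, form)
        if merge_derivations:
            form = DERIVATION.get(form, form)
        entries.add(form)
    return len(entries)

every_form = count_words(known_forms)
no_inflections = count_words(known_forms, merge_inflections=True)
roots_only = count_words(known_forms, merge_inflections=True, merge_derivations=True)

print("Every form counted separately:", every_form)
print("Inflected forms merged:", no_inflections)
print("Inflected and derived forms merged:", roots_only)
```

The same eight forms count as 8, 6, or 2 words depending on the rule, which is why totals from different tests and different languages are not directly comparable.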
How many words are in the Polish language?
It is very difficult to determine the exact number of words in any language, because the estimates depend strongly on what is considered an independent word and what is considered a derivative. We use the PWN Dictionary of the Polish Language, which contains 140,000 words.
Last updated: February 2nd, 2025