About the Vocabulary Calculator

[download the original list of 132000 English words here]

What it actually does

The process used by the Vocabulary Calculator isn't rocket science; it's just extrapolation, pure and simple.

We start off with a 'population' of ~132000 words, and you look at a random sample of 100 or more.* It's assumed that the proportion of words known to you (or 'familiar' etc.) in the complete list is the same as the proportion you recognise in the sample. Multiplying that proportion by the size of the complete list gives the estimated total number of words you know.
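
The arithmetic can be sketched in a few lines of Python. This is only an illustration of the extrapolation described above, not the Calculator's actual code: the function name and its parameters are made up for the example, and the population of 132000 is the approximate figure quoted for the original list.

```python
def estimate_known_words(known_in_sample: int, sample_size: int,
                         population: int = 132_000) -> int:
    """Scale the fraction of sample words you knew up to the full list.

    `population` defaults to the approximate size of the original list;
    the function name and signature are hypothetical.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= known_in_sample <= sample_size:
        raise ValueError("known_in_sample must lie between 0 and sample_size")
    # Multiply before dividing so the common cases stay exact.
    return round(known_in_sample * population / sample_size)


# Example: recognising 53 words out of a sample of 100.
print(estimate_known_words(53, 100))
```

So somebody who recognises 53 words in a sample of 100 would be credited with roughly 53% of the whole list.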

Provided the sample really is random, this process will generally produce a reasonable estimate of the number of words from the original list that you know, and the estimate gets tighter as the sample gets larger. The big question is to what extent this number actually equates to your 'vocabulary'.
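
To get a feel for how good the estimate is, here is a rough sketch of the margin of error for a given sample size. It uses the standard normal approximation for a sampled proportion, which is an assumption brought in for illustration rather than anything the Calculator itself reports; the function name, the 95% level (z = 1.96) and the population of 132000 are likewise just choices made for the example.

```python
import math


def estimate_with_margin(known_in_sample: int, sample_size: int,
                         population: int = 132_000, z: float = 1.96):
    """Return (estimate, margin) for the number of known words.

    Uses the normal approximation to the binomial; z = 1.96 corresponds
    to an approximate 95% interval. The finite-population correction is
    ignored, which is harmless when the sample is tiny compared with
    the list. All names here are hypothetical.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = known_in_sample / sample_size
    # Standard error of the sampled proportion.
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    return p * population, z * standard_error * population


estimate, margin = estimate_with_margin(53, 100)
print(f"about {estimate:.0f} words, give or take {margin:.0f}")
```

With a sample of only 100 words the margin comes out at around 13000 words either way, so it is worth ploughing through a larger sample if you want a tighter figure: quadrupling the sample size halves the margin.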

 

But how do we define 'vocabulary'?

First of all, we have to decide exactly how we are counting. You can't use the Calculator and then just say "I know approximately 70105 words!" unless you qualify the statement carefully. Are we literally making a tally of words in the sense of permutations of letters, or are we counting concepts/definitions? Do we include declensions, plurals and so forth separately?

For a particular concept, the original word list generally held one or more nouns, an adjective, a verb and an adverb. For instance, we might have

- abound

- abundant

- abundance

- abundantly

all of which are based on the same idea; but they're counted separately, which obviously boosts your score a bit.**

On the other hand, when a word like 'quick' comes up, there is no way of discerning whether the user is just thinking of speed, or whether he also knows the word in its old-fashioned sense of 'alive'/'life'. Naturally, you can get away with identifying only one definition.

So if your result is 70105, it would be safest to say:

"I can identify 70105 different permutations of letters as words, regardless of how many different meanings those words might have. Plurals and different verb forms aren't counted, but I am including adverbs and so forth."

That's the definition of vocabulary size that we have to use. Somehow the statement has lost its clout and impact, and would not be a good conversation starter, but at least it's accurate.

Is the original database adequate?

Actually, the statement is accurate on the basis of one assumption: that all the words you know (according to the criteria above) were in the original word list in the first place. That's why it's best to have as big a list as possible to start off with, one which extends to the most obscure and unknown words. You could estimate your vocabulary size by using a pocket dictionary, but it might not itself be exhaustive enough to tell you how many words you really know: that's the value of a computerized test with a large source database.

So don't be frustrated or annoyed if most of the words in the test are completely unknown: this should in fact reassure you that all the words you do know were in the database, and hence that the results are valid. If on the other hand you knew most of the words that came up, you would be inclined to suspect that the original list wasn't really big enough as a fair basis for the test.

Proper Nouns

Originally the word list held a great many proper nouns: places, people, names of species and so forth. So the results supposedly also included the proper nouns that you knew.

This would have been fine, but for one thing: a person's vocabulary contains myriad proper nouns. Local place names and people, particular interests, specialist professions and hobbies will all contribute. The original word list couldn't possibly have held all these, and so it wasn't valid after all to say that the results included proper nouns.

The only thing to do was to exclude proper nouns altogether. The results do not include them - which is reasonable, as that sort of general knowledge is not really relevant to how articulate you are.

We can, on the other hand, be pretty sure that the new list contains all nouns, adverbs, verbs and adjectives.

 

British vs American

The original 132000-word list was compiled from various sources, some British, some American. Some words will consequently have been included twice, once with the British spelling and once with the American. This is unfortunate although not of massive significance: it probably just means you should round your results down rather than up.

 

Phrases

The list also included various phrases: common short strings of words, perhaps implying something subtly different to what they literally mean. So you have to decide whether you understand the phrases in the intended sense: 'tooth and nail' does not denote what it literally says.

The concern here, however, is the same as that with the proper nouns: can we be certain that all notable short phrases were included in the list? If not, it isn't valid to say that the results give the sum total of words and phrases known to the person in question.

 

Conclusion

If you want to think of your results as absolute values, then here is what they describe:

The number (for each category) of letter-permutations that form singular nouns**, infinitive verbs**, adverbs, adjectives and short phrases.

Wahey. But it seems really that interpreting the numbers in any absolute way is fraught with difficulties. So frankly it remains best to see the Vocabulary Calculator as a comparative tool: it will demonstrate whether your vocabulary is bigger and better than your friend's, but it doesn't tell you anything objective.


________________________________

 

*The intermediate sample of 16000 words loaded to your computer isn't statistically relevant. The Calculator could just contain a 100-word sample and test the same words each time, with equal validity. But it's more fun if you can keep coming across more and more new terms: in fact, the large size of the sample loaded with the page is one of the chief things that make the Vocabulary Calculator interesting and worthwhile.

**Plurals and different verb tenses are not included separately unless they're irregular (e.g. 'children', 'crept') or otherwise notable...

 

 

 
    plenilune@hotmail.co.uk