As I wrote before, one response to my explanation about idioms was,
"None of the correspondents have suggested that they have any difficulty recognising or understanding 'hit the jackpot', yet the low level of occurrence of the expression in corpora suggests that it should be so unfamiliar as to cause difficulty even to native speakers."
I'm afraid this doesn't show up a problem with the corpora themselves, but it might go some way to explaining why language teachers seem to be so loath to use corpus data: they don't understand what it tells them.
It wouldn't be unusual for a native speaker of English to encounter language that occurs with the frequency of "hit the jackpot" a number of times per month. That's because native speakers of English tend to encounter millions of words each month. The recent Mehl paper in Science suggests that we speak on average something like 16,000 words per day. Presumably, we're doing much of that in conversation with others, often more than one person, so let's put our conversational word count at 40,000 per day spoken and heard.
Then there's TV. I don't have average numbers, but after looking at a few transcripts, it looks like 7,000 words per hour might be a reasonable estimate. According to Nielsen, the average American spends 4.5 hours per day watching TV, so we can add another 30,000 words or so to our count, which now totals 70,000.
I have no data on how much people write, but I suspect it's very little. In terms of reading, I can find no adult data, but 5th-grade children read about 5,300 words per day, bringing our total daily word exposure to roughly 75,300 or 2,290,000 words per month. There are likely other sources of input that I have omitted, but this should be sufficient to make the point.
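For anyone who wants to check the tally, here is a rough sketch of the arithmetic in Python. The three daily figures are the estimates given above; the dictionary keys, the function names, and the choice of an average month of 365/12 days are my own assumptions for illustration.

```python
# Daily word-exposure estimates from the post (words per day).
DAILY_EXPOSURE = {
    "conversation": 40_000,  # spoken and heard
    "tv": 30_000,            # roughly 7,000 words/hour x 4.5 hours, rounded
    "reading": 5_300,        # 5th-grade figure, standing in for adults
}

# Assumption: an average month is 365 / 12 days (about 30.4).
DAYS_PER_MONTH = 365 / 12


def daily_total(exposure):
    """Sum the per-source daily word counts."""
    return sum(exposure.values())


def monthly_total(exposure, days_per_month=DAYS_PER_MONTH):
    """Scale the daily total up to an average month."""
    return daily_total(exposure) * days_per_month


print(daily_total(DAILY_EXPOSURE))           # 75300
print(round(monthly_total(DAILY_EXPOSURE)))  # 2290375
```

Rounded, that is the 2.29 million words per month quoted above. Swapping in different estimates for any source only means changing one number in the dictionary.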
At the previously established rate of 0.18 to 2.0 occurrences pmw, we could expect to see "hit the jackpot" about one to four times a month. If you're about my age, you've probably heard it about 900 times in your life. So, contrary to the above writer's conclusion, it's not at all surprising that we know it. But would you be surprised to hear that my six-year-old son doesn't? (I just asked him [update: May 25, 2009. He's almost eight and he still says he doesn't know. update 2: Oct 9, 2011: 10 and still unfamiliar.]).
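The conversion from a corpus rate to monthly encounters is simple enough to sketch in Python. The rates (0.18 and 2.0 pmw) and the monthly exposure (about 2.29 million words) come from this post; the function name and constant are hypothetical, chosen only for this illustration.

```python
# Monthly word exposure estimated in this post.
WORDS_PER_MONTH = 2_290_000


def expected_encounters(rate_pmw, words):
    """Expected occurrences of an item whose corpus frequency is
    rate_pmw (occurrences per million words) in `words` words of input."""
    return rate_pmw * words / 1_000_000


low = expected_encounters(0.18, WORDS_PER_MONTH)
high = expected_encounters(2.0, WORDS_PER_MONTH)
print(f"{low:.2f} to {high:.2f} per month")  # 0.41 to 4.58 per month
```

Taken literally, the low end works out to a little under once a month and the high end to between four and five times, which is the same ballpark as the round figures above. The same function works for any amount of input, so a much smaller monthly word count gives a proportionally smaller number of encounters.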
In contrast to native speakers, our learners don't get anything like 2.3 million words of input a month. And what input they do get is degraded by the fact that they don't understand much of it. Thus, what seems very common to us is quite rare for learners. Somehow, though, it's hard to get many language teachers to accept this. They refuse to believe that idioms are not common, but as we saw recently, anything below about 30 occurrences pmw should be considered low frequency.
There are many factors that can skew our perception of a word's commonality. Psychologists have taken this issue much more seriously than have language teachers/applied linguists and have evolved a number of measures. These include:
- number of letters/phonemes/syllables
- written/spoken frequency
- subjective familiarity rating
- concreteness rating
- imageability rating
- average age of acquisition
- word category (noun, verb, adj, etc.)
- status (colloquial/dialect/alien, etc.)
- semantic grouping
Earlier posts in this series: Idioms, Differences between the corpora, & Where's the cutoff