Tuesday, April 10, 2012

Mark Davies' new academic word lists

Mark Davies, over at Brigham Young University, has developed some fantastic corpus-based resources. His most recent project, along with Dee Gardner, is a set of academic word lists (notice the plural).

In the comparison they provide with Coxhead's 2000 AWL, they claim,
our word lists provide better coverage of academic English. The 570 "word families" in the AWL cover 7.2% of the words in the COCA academic texts, but the top 570 word families in our list cover 14.0% -- nearly twice as much. In a "neutral" corpus -- the 32 million words of academic and semi-academic texts in the British National Corpus -- the AWL covers 7.1% and our list covers 14.0% -- again nearly twice as much.
I haven't had an opportunity to look at these carefully, but at first glance, this seems like a very unfair comparison. Part of how they have achieved such a high coverage rate, it seems, is by including some very frequent words, such as between, low, need, difference, and use.
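
To make the coverage point concrete, here is a minimal sketch of how a coverage figure is computed: the share of running words in a text that appear on a list. The sentence and both lists are made up for illustration, and the sketch matches surface forms rather than word families, which is a simplification.

```python
def coverage(tokens, word_list):
    """Fraction of running words (tokens) that appear on the word list."""
    listed = set(word_list)
    hits = sum(1 for token in tokens if token in listed)
    return hits / len(tokens)


# Toy sentence, invented for illustration only.
tokens = "we need to analyze the difference between the low and high estimates".split()

# Hypothetical list that leaves out very frequent words.
narrow_list = ["analyze", "estimates"]

# The same list with a few very frequent words added.
broad_list = narrow_list + ["need", "difference", "between", "low"]

narrow_cov = coverage(tokens, narrow_list)
broad_cov = coverage(tokens, broad_list)
print(f"narrow: {narrow_cov:.1%}, broad: {broad_cov:.1%}")
```

Adding a handful of high-frequency words lifts coverage sharply in this toy case, which is why comparing lists built with and without a frequency ceiling can mislead.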

Coxhead's list is built on top of West's General Service List, which is to say that it only includes words not already listed in the roughly 2,000 words of the GSL. As a result, most very frequent words are excluded (although others, such as area, which are very common but were left out of the GSL because they had significant semantic overlap with another word, give the AWL an undeserved coverage boost). The approach taken by Davies and Gardner doesn't have any frequency ceiling at all. Rather, they have chosen to consider as academic any word that occurs at least 1.5 times more frequently in the academic subcorpus of the COCA than in the other sections of the corpus. This captures words that have something of an academic proclivity, but a number of those words are decidedly everyday vocabulary.
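
As a rough sketch of that ratio criterion, assuming it means comparing a word's relative frequency in the academic subcorpus with its relative frequency everywhere else: the function name, the counts, and the corpus sizes below are all invented for illustration, and this is not Davies and Gardner's actual procedure, which may involve further steps.

```python
def academic_candidates(acad_counts, other_counts, acad_total, other_total,
                        min_ratio=1.5):
    """Words whose relative frequency in the academic subcorpus is at
    least min_ratio times their relative frequency elsewhere.

    Assumption: "1.5 times more frequently" is read as a ratio >= 1.5.
    """
    selected = []
    for word, acad_n in acad_counts.items():
        acad_rate = acad_n / acad_total
        other_rate = other_counts.get(word, 0) / other_total
        # A word never seen outside the academic texts passes trivially.
        if other_rate == 0 or acad_rate / other_rate >= min_ratio:
            selected.append(word)
    return sorted(selected)


# Made-up counts: 100,000 academic tokens vs. 400,000 tokens elsewhere.
acad_counts = {"between": 300, "the": 6000, "movement": 90}
other_counts = {"between": 500, "the": 24000, "movement": 100}

candidates = academic_candidates(acad_counts, other_counts, 100_000, 400_000)
print(candidates)
```

With these invented numbers, a very common word like between clears the threshold alongside movement, because nothing in the criterion limits how frequent a word may be overall.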

On the other hand, the new lists distinguish between lexical categories. That is to say, the noun use of a word may be considered academic while the verb use is not. Similarly, they bring word families together while still distinguishing between different members. Thus, under the headword move, neither the noun nor the verb move is academic, but movement is.

This is quite a different approach, and it will take some time to evaluate. I'm looking forward to trying, though.

PS, the lists have been published minus every fifth word as a sort of embargo until a paper describing the lists can be published.
