Thursday, July 01, 2010

Newer AWL

In the most recent edition of Reading in a Foreign Language, Tom Cobb observes that the Academic Word List isn't really so academic. It was built on top of the General Service List, which was never designed to be simply a high-frequency word list. (The GSL has other problems, such as being built on a small corpus of magazines.) West, who constructed the GSL, removed high-frequency items that were largely synonymous with other words in the list and replaced them with lower-frequency words that provided broader semantic coverage. In effect, then, the AWL largely fills in the holes West left behind. Words like 'area' are really just broadly frequent words and not academic at all.

Tom and I have been playing with the idea of updating the AWL by basing it not on the GSL but on Paul Nation's British National Corpus word lists. As Nation himself has pointed out, "the BNC is not a great corpus for making a wordlist of the high-frequency words because it is largely adult, formal, and exclusively British," but we'll take it as good enough for now. We then took the most common words in the academic sub-corpus of the Corpus of Contemporary American English, removing those that were not common in all subsections (e.g., law, medicine, history) and those that didn't appear in at least half of the publications in the corpus. This corpus is also not ideal, since it contains only journal articles and only American English, but again, it's freely available, and it will do for now.
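To make the filtering steps above concrete, here is a minimal Python sketch. It assumes hypothetical inputs: per-subsection word counts from the academic sub-corpus, the set of publications each word appears in, and a general high-frequency list standing in for the BNC bands. The function name and the `min_per_subsection` cutoff are illustrative, not the threshold actually used; the only figure taken from the post is the "at least half of the publications" range criterion. It also works on single word forms, whereas real lists like these are usually built on word families.

```python
from typing import Dict, List, Set


def build_academic_list(
    subsection_freqs: Dict[str, Dict[str, int]],
    word_publications: Dict[str, Set[str]],
    total_publications: int,
    general_list: Set[str],
    min_per_subsection: int,
    min_publication_share: float = 0.5,
) -> List[str]:
    """Hypothetical helper: return candidate academic words, most frequent first.

    subsection_freqs maps a subsection name (e.g. "law") to {word: count}.
    word_publications maps a word to the IDs of publications containing it.
    """
    subsections = list(subsection_freqs)
    vocab = set().union(*(freqs.keys() for freqs in subsection_freqs.values()))
    kept = []
    for word in vocab:
        # Skip words already covered by the general high-frequency base list.
        if word in general_list:
            continue
        # Require the word to be common in every subsection, not just a few.
        counts = [subsection_freqs[s].get(word, 0) for s in subsections]
        if min(counts) < min_per_subsection:
            continue
        # Require the word to occur in at least the given share of publications.
        pubs = len(word_publications.get(word, set()))
        if pubs < min_publication_share * total_publications:
            continue
        kept.append((sum(counts), word))
    # Sort by total frequency (descending), breaking ties alphabetically.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in kept]
```

In this design each criterion is a separate early `continue`, so it is easy to see (and tune) which filter removes a given word.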

If you'd like to see how much coverage our interim list provides, you can test it on Tom's Lextutor site using the cut-and-paste method. The results for our list are at the very bottom of the results page.
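"Coverage" here means the percentage of running words (tokens) in a text that belong to a given list. The following is a minimal sketch of that calculation, assuming simple lowercase word-form matching and a hypothetical `list_coverage` helper; Lextutor's profiler counts word families and handles tokenization its own way, so its figures will differ from this.

```python
import re
from typing import Iterable


def list_coverage(text: str, word_list: Iterable[str]) -> float:
    """Hypothetical helper: percentage of word tokens in `text` found in `word_list`."""
    vocab = {w.lower() for w in word_list}
    # Crude tokenizer: runs of letters, optionally with one internal apostrophe.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token in vocab)
    return 100.0 * covered / len(tokens)
```

For example, `list_coverage("The analysis of the data shows a clear trend.", ["analysis", "data", "trend"])` counts 3 of 9 tokens, or about 33.3%.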

1 comment:

Steve Neufeld said...

Brett - you might not have seen the 2008 article in ESP by Hancioglu et al., which discusses the issues you raise about the AWL in considerable detail. See "Through the looking glass...." Funny how people latch on to a list like the AWL and treat it like the Bible. It's surprising that Tom still sticks with the K1/K2 split of the GSL plus the add-on AWL, for the reasons you mention (which also come up in Hancioglu's article). Coxhead, in her day, did really remarkable research, but nothing that an undergraduate couldn't better now over a free weekend or two. Some practitioners are finally catching on as well:

Beware the BNC lists by Nation, as they are based on the 20 million word subcorpus of the BNC (note that the words 'CHAPPIE' and 'MATEY' are in the top 1,000 words). Also, he includes proper nouns, which can skew profiling results at times. Again, a monumental effort by Nation, but once you get beyond the first 3K-5K bands in the BNC, you really are looking at varying shades of pale, whether a word is in 7K or 17K.

In our BNL2709 list, we came to the conclusion that there is a base lexicon that all learners really need to be well versed in (I did a study of 90 Spanish-speaking students in Mexico which showed that these words define the threshold of proficiency, using the CEFR as the benchmark). Beyond that, it really looks to our Lexitronics group as if each individual learner develops their own i-lexicon and, by virtue of mechanisms like 'fast mapping' or Hoey's 'lexical priming', learners don't benefit much at all from having an 'AWL' to focus on. On the contrary, as we've seen with the AWL, teachers become obsessed with 'rarefied' academia and forsake the words that really matter for teaching and learning. The end result: students really can't communicate fluently.

At Lexitronics, we're looking into a different area of research we call the 'i-corpus', and we are conducting some studies to see whether students can compile their own 'i-corpus' and analyze it using tools like AntConc. In this scenario, what is useful is to provide students with reference corpora, so they can compare their own writing to a target corpus that is relevant to them. Then, with some training, they can become self-directed learners and build their own framework without interference from teachers or researchers. In essence, they become language researchers in their own right. A lofty goal, perhaps, but with technology moving the way it is, not as impossible as it may seem. :)
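One common way to compare your own writing with a reference corpus is a keyword list: words ranked by how much more frequent they are in your texts than in the reference, scored with a keyness statistic. The sketch below uses the log-likelihood (G²) measure often used for keyword analysis. The function names and plain token-list inputs are hypothetical, and this is an illustration of the general technique, not a description of any Lexitronics tool or of AntConc's internals.

```python
import math
from collections import Counter
from typing import List, Tuple


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for a word seen a times in a corpus of c tokens and b times in one of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    # Terms with an observed count of 0 contribute nothing (0 * ln 0 is taken as 0).
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def keywords(study_tokens: List[str], reference_tokens: List[str],
             top_n: int = 10) -> List[Tuple[str, float]]:
    """Hypothetical helper: positive keywords of the study corpus, highest G2 first."""
    study = Counter(study_tokens)
    ref = Counter(reference_tokens)
    c, d = sum(study.values()), sum(ref.values())
    if c == 0 or d == 0:
        raise ValueError("both corpora must contain at least one token")
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep only words relatively more frequent in the study corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]
```

A student could pass the tokens of their own essays as `study_tokens` and a relevant target corpus as `reference_tokens`; words at the top of the list are the ones their writing overuses relative to that target.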