Tuesday, February 19, 2008

New 360-million-word American corpus

Mark Davies and the folks at BYU have just released their long-awaited BYU Corpus of American English (360+ million words, 1990-2007). This is the first large-scale balanced corpus of American English that is freely available. It is similar in design to the British National Corpus but over three times bigger. There was a time when the American National Corpus was going to fill this role, but in the past 7 years, it has only managed to cobble together a meager 22 million words.

There are a number of changes I'd like made to the interface, such as adding the ability to search by word family on top of the currently implemented lemma and word searches, but overall, it's a lovely gift to the world. Thanks, Mark!
