Wednesday, June 16, 2010

New Corpus from Mark Davies

Brigham Young's Mark Davies has just made available, in alpha, a new Corpus of Historical American English. If you don't know about Mark's free corpora, you should check them all out.

Corpus of Contemporary American English (COCA), 400 million words, 1990 - 2009
Corpus of Historical American English (COHA) (NEW), 400 million words, 1810s - 2000s
BYU-BNC: British National Corpus, 100 million words, 1980s - 1993
TIME Corpus of American English, 100 million words, 1920s - 2000s
Corpus del Español, 100 million words, 1200s - 1900s
Corpus do Português, 45 million words, 1300s - 1900s

He's (almost?) single-handedly put together the largest collection of freely usable corpora. Of the above, only the BNC was not compiled at Brigham Young.

Also, when I pointed out to him that it would be lovely if we could query the range of publications and documents in which a text appears, he agreed, and it should be possible in a few months.
