Thursday, May 12, 2011

Google Books Corpus with BYU interface

Mark Davies has now provided a new way to search the Google Books corpus. That would be the 155 BILLION-word Google Books Corpus. His interface lets you search not just for simple strings, the way the Ngram Viewer does, but also by part of speech (e.g., you can search for hit the + NOUN). You can also get all the inflected forms of a word. For example, instead of searching separately for hit the fan, hitting the fan, and hits the fan, you can just search for [hit] the fan. You can also discover collocates. Want to know which words typically come within two words after vicious? Bob's your uncle.
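For the curious, here is a toy Python sketch of what the collocate and [hit] searches boil down to, run over a tiny made-up text. This is not how the BYU interface works internally. The function names, the crude tokenizer, and the hand-written LEMMAS table are all hypothetical stand-ins; a real corpus tool would use a part-of-speech tagger and a proper lemmatizer.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def collocates_after(tokens, node, window=2):
    """Count words appearing within `window` tokens after each occurrence of `node`.
    Punctuation is discarded, so the window can cross sentence boundaries."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[i + 1 : i + 1 + window])
    return counts

# Hypothetical hand-written lemma table, standing in for real morphological analysis.
LEMMAS = {"hit": {"hit", "hits", "hitting"}}

def match_lemma_phrase(tokens, lemma, rest):
    """Find phrases like [hit] the fan: any form of `lemma` followed by the words in `rest`."""
    forms = LEMMAS.get(lemma, {lemma})
    n = len(rest)
    found = []
    for i, tok in enumerate(tokens):
        if tok in forms and tokens[i + 1 : i + 1 + n] == rest:
            found.append(" ".join(tokens[i : i + 1 + n]))
    return found

tokens = tokenize("A vicious circle. Another vicious circle of debt. The vicious dog.")
print(collocates_after(tokens, "vicious").most_common(1))  # [('circle', 2)]

fan = tokenize("It hits the fan, and it was hitting the fan before it hit the fan.")
print(match_lemma_phrase(fan, "hit", ["the", "fan"]))
# ['hits the fan', 'hitting the fan', 'hit the fan']
```

On 155 billion words, of course, you would want an index rather than a linear scan, which is exactly the drudgery the interface spares you.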

There's lots more fun to be had here if you explore. I think my afternoon just got all booked up.

