Tuesday, June 30, 2009

Choosing useful collocations

I'm not very keen on making collocations a focus of my teaching, but I do like authentic example sentences, and slipping in common collocates strikes me as a useful thing to do.

I've been collecting useful examples for years over on the Simple English Wiktionary, and you are free to go there and take what you will. Unfortunately, there are also quite a few rather unnatural examples mixed in, so if you don't find what you're looking for, hie yourself over to the COCA and do a search like this one for help.

An MI (mutual information) score of 3.0 or more is generally considered to show a genuine relationship between two words. The thing is, this score changes depending on how you search. For example, a search for words occurring within 4 words either side of help in the COCA returns an MI of 6.91 for defray, but if you search for 3 words either side, defray's MI is only 6.75. And if you only search for one word to the right, that goes up to 8.75. And then there's the question of whether you search for the form help, the verb help, the noun help, the lemma help, etc.
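
To see why the window matters, here is a minimal sketch of one commonly cited way of computing MI with a span adjustment. I'm assuming it is close to what the COCA does, but I haven't checked it against the site's numbers. The function name, parameter names, and counts are all made up for illustration.

```python
import math

def mi_score(pair_freq, node_freq, collocate_freq, corpus_size, span):
    """Mutual information with a span adjustment (assumed formula).

    pair_freq      -- times the collocate appears inside the window
    node_freq      -- total frequency of the node word (e.g. help)
    collocate_freq -- total frequency of the collocate (e.g. defray)
    corpus_size    -- number of words in the corpus
    span           -- number of word positions searched around the node
    """
    # How often the pair would turn up in the window by chance alone.
    expected = node_freq * collocate_freq * span / corpus_size
    return math.log2(pair_freq / expected)

# Invented counts, not real COCA figures.
narrow = mi_score(10, 1000, 100, 1_000_000, span=1)
wide = mi_score(10, 1000, 100, 1_000_000, span=8)
print(round(narrow, 2), round(wide, 2))
```

With the observed count held fixed, a wider span raises the expected count and so lowers the score. In a real search the observed count changes with the window as well, which is why the score can move in either direction.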
Next, you need to consider not just the strength of the connection, but also its frequency. Back to our help example, contextualize has an MI of 3.29, but the collocation occurs only 10 times in that position, whereas the collocation with cope (MI=3.25) occurs over 340 times.
Along with frequency, you need to consider the range, in other words, the number of documents in which the collocation is found. One document might use a particular collocation so often that it skews the results. Similarly, a number of different documents might all share the same stock text. Still with our help example, context has a very high MI score (MI=4.3) and a high frequency (over 1600), and occurs in a wide range of documents, but most of these documents have been taken from various web sites, all of which include the sentence, "Find Documents with Similar Topics Help Below are concepts discussed in this document."
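
If you copy the search results into a script, the three checks so far (strength, frequency, range) can be applied in one pass. What follows is only a rough sketch: the Collocate class, the shortlist function, and the frequency and range cut-offs are my own inventions, the document counts are placeholders, and only the MI scores and approximate frequencies come from the searches above.

```python
from dataclasses import dataclass

@dataclass
class Collocate:
    word: str
    mi: float    # MI score reported by the search
    freq: int    # times the collocation occurs
    n_docs: int  # range: number of documents it occurs in

def shortlist(candidates, min_mi=3.0, min_freq=50, min_docs=20):
    """Keep collocates that clear all three thresholds, most frequent first.

    The min_freq and min_docs defaults are arbitrary illustrations.
    """
    keep = [
        c for c in candidates
        if c.mi >= min_mi and c.freq >= min_freq and c.n_docs >= min_docs
    ]
    return sorted(keep, key=lambda c: c.freq, reverse=True)

candidates = [
    # n_docs values are placeholders, not taken from the COCA.
    Collocate("contextualize", mi=3.29, freq=10, n_docs=8),
    Collocate("cope", mi=3.25, freq=340, n_docs=200),
]

for c in shortlist(candidates):
    print(c.word, c.mi, c.freq)
```

Note that a filter like this would happily pass context, which clears every threshold. Spotting repeated boilerplate still means reading the actual concordance lines.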
Finally, transparency should be considered. For sit, chair has an MI of 3.52, but it should be obvious to anyone who lives in the real world that sit is used with chair, so the pairing may not be worthy of mention, whereas the relationship between help and cope is more opaque and may bear having attention drawn to it.

