I do understand the importance of data-oriented studies. In fact, I was one of the contributors to the first corpus-based English-Japanese dictionary (the Wisdom EJ from Sanseido).
When I wrote the entry for 'prejudice', for instance, I included the phrase 'pride and prejudice' because there was a significant number of occurrences of the phrase in the corpus we used. Frankly, this famous title should be known to anyone studying English, whether they have read the novel by Jane Austen or not. Yeah, the story's crap, I know, but it is very well known.
This phrase was later deleted from the entry (by some editor higher up). What little literary sensibility I have was somewhat angered by this, but I kept my cool. Yes, there were always space limitations. Yes, after all, it is a novel, and its title may not always reflect the more general, objective and scientific facts about English. And yes, there was one editor who always objected to literary references, be it Jane Austen or Dr Johnson or whoever. Maybe such things did not seem 'objective' to him, or perhaps he simply found them objectionable. I don't know. Ah well, I was only paid to write entries to be edited later.
Don't go just yet. I'll show you another example, this time from the Longman Dictionary of Contemporary English. Here is 'sketch' from the 2nd edition (1987):
Notice the parrot sketch reference (sense 3)? Great, isn't it?
Sadly, and I really mean sadly, this disappeared in the 3rd edition (1995):
A move towards a more 'objective' description? Maybe. But is this truly better for learners? I happen to think not.
I don't know about you, but I think anyone studying English should at least be told about English fairy tales and nursery rhymes and, yes, Monty Python. Anyone studying Japanese should at least be told about Japanese fairy tales and tongue twisters and what traditional Japanese meals are like. Regardless of corpus occurrences, you see. Next time I write for a dictionary, I shall defend to the death things like 'pride and prejudice'. That will be my pride, and prejudice.