Monday, October 06, 2014

On meeting 'otiose' twice again

I asked Mark Liberman to have a look at what I wrote yesterday since I was struggling to get my head around the probabilities. He was kind enough to write the following guest post:

Maybe a better way of thinking about it is this:

Say the probability that word w_i will be selected at random from a collection of text is P(w_i). Then assuming independence, the probability that the next word will NOT be w_i is (1-P(w_i)), and the probability of failing to find w_i in N successive draws is

(1-P(w_i))^N

If P(w_i) is 1/10^7 (one in ten million), and N is 1000, then we get

(1-(1/10^7))^1000

which is 0.9999. So if we take notice of a rare-ish (P = 1/10000000) word, and draw 1,000 other words at random looking to see it again, then 9,999 times out of 10,000, we'll fail to find the moderately rare word we were waiting for. And if we draw 10,000 additional words instead of 1,000, the probability of failure is still

(1-(1/10^7))^10000 = 0.999

so we're still gonna fail 999 times out of a thousand.
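
The figures above are easy to check numerically. What follows is a minimal Python sketch, assuming independent draws exactly as in the text; the function name prob_no_recurrence is only an illustrative label and does not come from the original post.

```python
def prob_no_recurrence(p: float, n: int) -> float:
    """Probability that a word with per-draw probability p
    fails to appear in n independent draws: (1 - p) ** n."""
    return (1.0 - p) ** n


p = 1 / 10**7  # one in ten million, as in the text

# Compare the two window sizes discussed above.
for n in (1_000, 10_000):
    print(f"N = {n:>6}: P(no recurrence) = {prob_no_recurrence(p, n):.4f}")
```

Because P(w_i) is so small, (1-P(w_i))^N stays very close to 1 - N*P(w_i) for both values of N, which is why a tenfold longer window barely moves the result.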

But the thing is, Rare Words Are Common. That is, a large proportion of word tokens belong to relatively rare types. So suppose that there are 10,000 other words of approximately equal rareness, and every time we see one of them, we set a subconscious process to watch for recurrences of that word within the next thousand instances.

If we do this a thousand times, then the chances of failure (for a thousand instances of noting a rare word and looking for it to occur again) become 

((1-(1/10^7))^1000)^1000 = about 0.9
or
((1-(1/10^7))^10000)^1000 = about 0.368

So if you do enough reading for these conditions to be satisfied once a day, you should expect to have this experience several times a week.
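
The effect of repeating the watch a thousand times can be checked the same way. This is a rough Python sketch under the same independence assumption. The name prob_all_watches_fail, and the reading of "once a day" as seven independent daily rounds of a thousand watches each, are illustrative assumptions rather than part of the original argument.

```python
def prob_all_watches_fail(p: float, window: int, watches: int) -> float:
    """Probability that none of `watches` independent watches, each lasting
    `window` draws for a word of per-draw probability p, sees a recurrence."""
    return ((1.0 - p) ** window) ** watches


p = 1 / 10**7      # per-draw probability of each rare word, as in the text
watches = 1_000    # times a rare word is noted and then watched for

for window in (1_000, 10_000):
    fail = prob_all_watches_fail(p, window, watches)
    hit = 1.0 - fail  # at least one recurrence in the whole round
    # Assumption: one such round per day, with days independent of each other.
    print(f"window = {window:>6}: P(all fail) = {fail:.3f}, "
          f"expected days per week with a recurrence = {7 * hit:.1f}")
```

Under these assumptions, the 10,000-word window gives a recurrence on well over half of all days, which is in line with "several times a week"; the 1,000-word window gives one less than once a week.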

Now, none of this reasoning really applies, because you aren't picking words at random from a well-mixed urn, you're reading them in order in coherent text. And words in coherent text are far from independent Bernoulli trials -- when a rare word appears, the probability that it will appear again before long in the same text is massively increased by topic effects (and to a lesser extent style and priming effects).  But this just means that the experience should be more common rather than less common -- unless you insist that the texts be separate and on different topics, and so forth, in which case it gets complicated.

But still, I think that the real puzzle is not why you had this apparently odd experience, but why we only occasionally notice the kinds of coincidences that are in fact rather common.

This is not an unimportant question, since it has a lot to do with the genesis of superstition (and probably science, for that matter...)

The above is a guest post by Mark Liberman.

Sunday, October 05, 2014

On meeting 'otiose' twice in a day

Well, not in the same day, but certainly within a 24-hour period. As I was lying in bed last night, reading Charles Mann's 1493, I came across the phrase the otiose Percy on p. 78.

As of this morning, I've read to p. 90, so that's about 4,500 words later. I also read a few NY Times articles, adding perhaps another 1,200 words. And then I set about to edit an article for Contact, the TESL Ontario magazine for which I'm the editor. Almost immediately, I came across a quote from David Crystal in which he wonders,
whether the presence of a global language will eliminate the demand for world translation services, or whether the economics of automatic translation will so undercut the cost of global language learning that the latter will become otiose.

Friday, September 19, 2014

Climbing the grammar tree

I've started a new blog called "Climbing the grammar tree". The idea is that I will respond to readings I'm doing for my doctoral studies, so check it out.

Tuesday, September 02, 2014

A title misparsed

This morning, I was reading this article at New Statesman, when I came across the following:
Yet surely, when night after night atrocities are served up to us as entertainment, it's worth some anxiety. We become clockwork oranges if we accept all this pop culture without asking what's in it.
The plural clockwork oranges suddenly threw into sharp relief the title of Burgess's book A clockwork orange. For some reason that I am unable to articulate now, if I ever was aware of it, I had always parsed that title like this:
That is to say, I took orange to be a postpositive modifier of clockwork (like proof positive, governor general, the city proper, etc.) instead of clockwork as an attributive modifier of orange, like this:


This was, I must admit, an odd and, even to me, puzzling title, but then it's an odd and puzzling book, so I just rolled with it. As I say, it was the plural oranges that made me see the light: adjectives don't do plurals.

I somehow overlooked the frequency of clockwork as a modifier, which should have tipped me off: in COCA, almost 40% of all instances of clockwork are attributive modifiers. Another thing that I was aware of, but which just seemed like more of the weirdness, is that clockwork is rarely--but sometimes--countable, so a clockwork is kinda weird, but not totally beyond the pale.

Perhaps one thing that pushed me to the first analysis was the stress pattern. Usually, a noun used as a modifier gets the main stress in the NP. It's a
  • FAculty office, not faculty OFfice
  • SOCcer ball, not soccer BALL, and  
  • poLICE officers, not police OFficers. 
My impression is that people tend to say a clockwork ORANGE, rather than a CLOCKwork orange. This is the same pattern you get with postpositive modifiers like proof POsitive.

Whatever the reason, what really impressed me is how decades of misapprehension can be overcome by a single choice example.

Tuesday, August 19, 2014

Antedating "determinative"

The OED gives:

b. Gram. determinative adjective, determinative pronoun, etc. (see quots.); determinative compound = tatpurusha n.

1921   E. Sapir Lang. vi. 135   The words of the typical suffixing languages (Turkish, Eskimo, Nootka) are ‘determinative’ formations, each added element determining the form of the whole anew.
1924   H. E. Palmer Gram. Spoken Eng. ii. 24   To group with the pronouns all determinative adjectives..shortening the term to determinatives.
1933   L. Bloomfield Language xiv. 235   One can..distinguish..determinative (attributive or subordinative) compounds (Sanskrit tatpurusha).
1961   R. B. Long Sentence & its Parts 486   The, a, and every are exceptional among the determinative pronouns in requiring stated heads.
Today, I was reading Kellner's Historical outlines of English syntax from 1892 and came across the following on pp. 113–114 (emphasis added):

In Old English the possessive pronoun, or, as the French say, "pronominal adjective," expresses only the conception of belonging and possession ; it is a real adjective, and does not convey, as at present, the idea of determination. If, therefore, Old English authors want to make such nouns determinative, they add the definite article : 
"hæleð min se leofa" (my dear warrior). —Elene, 511.
"ðu eart dohtor min seo dyreste" (thou art my dearest daughter). —Juliana, 193.
§179. In Middle English the possessive pronoun apparently has a determinative meaning (as in Modern English, Modern German, and Modern French) ; therefore its connection with the definite article is made superfluous, while the indefinite article is quite impossible. Hence arises a certain embarrassment with regard to one case which the language cannot do without.
Suppose we want to say "she is in a castle belonging to her," where it is of no importance whatever, either to the speaker or hearer, to know whether "she" has got more than one castle how could the English of the Middle period put it? The French of the same age said still "un sien castel," but that was no longer possible in English.

§180. We should expect the genitive of the personal pronoun ("of me," &c., as in Modern German)—and there may have been a time when this use prevailed—but, so far as I know, the language decided in favour of the more complicated construction "of mine, of thine," &c.

This was, in all probability, brought about by the analogy of the very numerous cases in which the indeterminative noun connected with mine, &c., had a really partitive sense (cf. the examples below), and, further, by the remembrance of the old construction with the possessive pronoun.
And later:

Later on, the possessive pronoun apparently implies a determinative meaning (as in Modern German and Modern French) ; therefore its connection with the definite article is made superfluous, while the indefinite article is quite impossible. Instead of the old construction we find henceforth what may be termed the genitive pseudo-partitive. See above, 178–180.

Monday, July 07, 2014

Proscribing, narrowly

Over at the NYT, Alexander Nazaryan has a rather strident article about "The fallacy of balanced literacy." Therein, he writes, "balanced literacy is an especially irresponsible approach, given that New York State has adopted the federal Common Core standards, which skew toward a narrowly proscribed list of texts, many of them nonfiction." [Now changed to narrowly prescribed.]

These texts are prescribed. That is, they're imposed, not declared unacceptable or invalid. Nevertheless, the Google Books corpus suggests narrowly proscribed is a new and growing phrase.
So, I'm curious: was this simply a typo, or did he have in mind some metaphor of narrowing down by proscription? Or was it something else?

Monday, June 23, 2014

Thinking like a freak

I listen to the Freakonomics Radio podcast from time to time, and back in May they aired an episode called "the three hardest words...," which, purportedly, were I don't know. The premise was that people hate to admit ignorance and so they hardly ever say, "I don't know."

Except that in most corpus studies, the head-and-shoulders most common, number one, top-of-the-heap three-word string in English is I don't know. (It's a three-word string, not four, since -n't is an inflectional suffix, not just a contraction as is taught in elementary schools, but that's another issue.) For instance, in the 3-grams list from the Corpus of Contemporary American English, I don't know is by far the most frequent 3-gram, with 199,110 instances (second is one of the at 167,785). In business meetings, we find the same results. Consider table 3.10 on p. 59 of this book, or table 5.8 on p. 183 of this paper.

Now, these are not mostly "I don't know (period)." Far more commonly, they're "I don't know if..." "I don't know what..." etc., which can often be used as a signal of disagreement rather than as an admission of ignorance. Nevertheless, the data stands in rather stark contradiction to the freaky claim. It looks pretty silly to be saying people should fess up to their ignorance, while basing the argument on a point on which you're so ignorant that you assert the most common phrase is the least (or at least the hardest).

(If you're interested in other freaky foolishness, see Joseph Heath's recent post on their simplistic view of the UK medical system.)