I asked Mark Liberman to have a look at what I wrote yesterday, since I was struggling to get my head around the probabilities. He was kind enough to write the following guest post:

Maybe a better way of thinking about it is this:

Say the probability that word w_i will be selected at random from a
collection of text is P(w_i). Then assuming independence, the
probability that the next word will NOT be w_i is (1-P(w_i)), and the
probability of failing to find w_i in N successive draws is

(1-P(w_i))^N

If P(w_i) is 1/10^7 (one in ten million), and N is 1000, then we get

(1-(1/10^7))^1000

which is 0.9999. So if we take notice of a rare-ish (P = 1/10,000,000) word,
and draw 1,000 other words at random looking to see it again, then 9,999
times out of 10,000, we'll fail to find the moderately rare word we
were waiting for. And if we draw 10,000 additional words instead of
1,000, the probability of failure is still

(1-(1/10^7))^10000 = 0.999

so we're still gonna fail 999 times out of a thousand.
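For anyone who wants to check these figures, here is a minimal Python sketch of the same calculation. The function name `prob_no_recurrence` is just an illustrative choice, and the code makes the same independence assumption as the text.

```python
import math


def prob_no_recurrence(p: float, n: int) -> float:
    """Probability that a word with per-draw probability p fails to
    appear in n independent draws, i.e. (1 - p) ** n."""
    # log1p keeps precision when p is tiny; this equals (1 - p) ** n.
    return math.exp(n * math.log1p(-p))


p = 1e-7  # one in ten million, as in the text
for n in (1_000, 10_000):
    print(f"N = {n:>6}: P(no recurrence) = {prob_no_recurrence(p, n):.6f}")
```

Because p is so small, (1-p)^N is very close to 1 - N*p here, which is why the failure probability drops by only about one part in a thousand even after 10,000 draws.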

But the thing is, Rare Words Are Common. That is, a large proportion of
word tokens belong to relatively rare types. So suppose that there are
10,000 other words of approximately equal rareness, and every time we
see one of them, we set a subconscious process to watch for recurrences
of that word within the next thousand instances.

If we do this a thousand times, then the chances of failure (for a
thousand instances of noting a rare word and looking for it to occur
again) become

((1-(1/10^7))^1000)^1000 = about 0.9

or

((1-(1/10^7))^10000)^1000 = about 0.368
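The two compound figures can be checked the same way. This is a sketch under the same independence assumption; `prob_all_watches_fail` is a hypothetical helper name, and the complement printed alongside is simply one minus the failure probability.

```python
import math


def prob_all_watches_fail(p: float, window: int, watches: int) -> float:
    """Probability that every one of `watches` independent watches fails,
    where each watch scans `window` draws for a word of probability p.
    Equivalent to ((1 - p) ** window) ** watches."""
    return math.exp(window * watches * math.log1p(-p))


p = 1e-7
for window in (1_000, 10_000):
    fail = prob_all_watches_fail(p, window, watches=1_000)
    print(f"window = {window:>6}: all fail = {fail:.3f}, "
          f"at least one recurrence = {1 - fail:.3f}")
```

With the 10,000-word window the total exponent is 10^7, so the result is essentially e^-1, which is where 0.368 comes from.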

So if you do enough reading for these conditions to be satisfied once a
day, you should expect to have this experience several times a week.
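One way to turn "several times a week" into a number is sketched below. It rests on assumptions that go beyond the text: one batch of a thousand watches per day, the 10,000-word window, and independence between days. The constant names are illustrative only.

```python
import math

P_WORD = 1e-7            # probability of the watched word, from the text
WINDOW = 10_000          # words scanned per watch (the larger case in the text)
WATCHES_PER_DAY = 1_000  # assumption: one batch of a thousand watches each day

# Chance that a single watch succeeds: 1 - (1 - p) ** WINDOW.
p_watch = -math.expm1(WINDOW * math.log1p(-P_WORD))

# Chance that at least one of the day's watches succeeds.
p_day = -math.expm1(WATCHES_PER_DAY * WINDOW * math.log1p(-P_WORD))

# Expected number of days per week with at least one recurrence.
expected_days_per_week = 7 * p_day

# Expected total number of recurrences noticed per week.
expected_hits_per_week = 7 * WATCHES_PER_DAY * p_watch

print(f"days per week with a recurrence: {expected_days_per_week:.2f}")
print(f"recurrences per week:            {expected_hits_per_week:.2f}")
```

Under these assumptions both counts land in the "several times a week" range, whether you count days on which it happens or individual recurrences.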

Now, none of this reasoning really applies, because you aren't picking words
at random from a well-mixed urn, you're reading them in order in
coherent text. And words in coherent text are far from independent
Bernoulli trials -- when a rare word appears, the probability that it
will appear again before long in the same text is massively increased by
topic effects (and to a lesser extent style and priming effects). But
this just means that the experience should be more common rather than
less common -- unless you insist that the texts be separate and on
different topics, and so forth, in which case it gets complicated.

But still, I think that the real puzzle is not why you had this apparently
odd experience, but why we only occasionally notice the kinds of
coincidences that are in fact rather common.

This is not an unimportant question, since it has a lot to do with the
genesis of superstition (and probably science, for that matter...).

The above is a guest post by Mark Liberman.