Thursday, August 31, 2006

What's OK?

Exactly a month ago, J. Wilder asked the following on the ETJ mailing list:
"Looking for a succinct, all-encompassing answer to the question what's the difference between 'That's okay' and 'It's okay'?"
In most of the instances that I could find, they seem interchangeable. There are just a few where it doesn't seem to work. For one thing, if I have done something that you think is wrong or something has happened to me, and I'm trying to reassure you that nothing is wrong, "that's OK" doesn't work. On the other hand, "it's OK" doesn't work in the following situations:
  1. Could you zip through my letter and see if it's OK?
  2. It's just one of those things. It's OK.
(Note, replacing that with this doesn't fix the problems.) My first thought was that it would be a difference between endophora and exophora, but I don't think this is true since both seem to work when there is an antecedent and both when the referent is external to the words of the conversation.

In fact, I don't think I could consistently explain why it or that is preferred in a variety of situations.

Anybody able to help out?

the long and short

A particularly lengthy letter to the editor in today's paper reminded me of the letters in the Bangkok Post. In the early 90s, when I was in Thailand, it was my paper of choice (the only English-language alternative being the Hairy Trib, which was far too expensive for my backpacking budget). Commonly, the letters to the editor would run longer than feature articles. I determined to send them a pithy post on the subject, but was stuck between two options: "Shorter letters please." and "Shorter letters, please." While the first was the shorter of the two, there were simply no short letters to please anyone, so I opted for the second, which the Post printed the following week and then reprised at the end of the year.

Tuesday, August 29, 2006

Abhorrent big words

Doug Harkness, writing his "On Politics" column in the local paper, produced the following:
"When I began helping out with a local hockey team for athletes with developmental challenges a few years back I must admit that I had no prior experience in dealing with people with autism. Rather than fearing the unknown I made it my business to find out all that I could. I will admit that while I am still a long way from being an expert on autism I know enough to have been abhorred when I read a story in the cross town paper about some residents who didn't really want a group home for autistic adults in their neighbourhood."
Indeed, if you know enough stuff, you're quite likely to be abhorred by someone. But apparently, Doug doesn't know the meaning of abhor. Then again, neither does Ken Epp, and he's a Member of Parliament. The Hansard for 2002 has Ken saying,
"Mr. Speaker, we in Canada have a rich and wealthy heritage in our youth. It is incumbent on governments at all levels, but particularly at the federal level, of ensuring that education is available to students of all financial ability. I am abhorred by the fact that some students have a lot of mental ability and the motivation but lack the money and are deprived of a necessary education."
Ignoring for the moment "incumbent... of ensuring", here again we have abhorred used to mean dismayed, upset or angered. But wait; there's more! The obligatory Google search turns up over 5,000 hits for "abhorred by the fact", which indicates that Doug and Ken are far from alone in being objects of loathing for facts, ideas, and all sorts of inanimate nouns.

Wednesday, August 23, 2006

CAJLE conference

If you're in Toronto this weekend, and the first two things on your agenda don't pan out, and if you speak Japanese, you might stop by the conference of the Canadian Association for Japanese Language Education at the Japan Foundation.

I'll be presenting rather briefly at 11:45 on Saturday, along with Teruko Harada and Mihoko Yamagata, about the Japanese Graded Readers Project, a project to develop leveled book-length literature in Japanese for learners of Japanese.

(I appear to be the only presenter in the program whose name had to be in katakana, and my formal Japanese is a bit rusty, so I'm sweating a little.)

The Solid Form of Language

Back from the cottage again. My brother gave me a copy of The Solid Form of Language, a very handsome book by Robert Bringhurst, “poet, linguist, and typographer. His manual The Elements of Typographic Style has become one of the most influential contemporary texts on typographic design.” It’s a lovely Smyth-sewn paperback with a handsome cover of handmade paper.

The book is short (which makes it good for reading when I’m in charge of the two kids) and touches very briefly on a host of scripts and the variety of languages that employ them, occasionally going into some depth, but more often just exemplifying. Those parts that I found most interesting and that related to English were a discussion of the use of italics in Latin scripts and a taxonomy of script capabilities.
"Only a few systems of writing – Latin, Greek, Cyrillic, Armenian – have developed bicameral form, but every script that is heavily used develops multiple styles… This has happened with the brittle and fluid forms of the Latin lower case, which are known as roman and italic. Latin script is unusual, however, in the intimate way it has come to exploit the differences between the two."
Apparently this began in the 16th century with mathematical notation and then expanded to indicate the names of ships, books, and foreign languages.

(Japanese is one of the few scripts that does something similar, except that it does it the other way round, with the cursive ひらがな (hiragana) script being the standard and the angular カタカナ (katakana) script being used for foreign words, among other things.)

Bringhurst introduces his 4-way taxonomy of script capabilities late in the book. Where DeFrancis and others tend to classify scripts as logographic, syllabic or alphabetic, Bringhurst has semographic, syllabic, alphabetic, and prosodic. Standard staff-music notation would be an example of prosodic script, as would most English punctuation. Bringhurst wisely allows for a given language to employ a script or scripts that fall into more than one category. For example, English recruits the semographic Arabic numerals, prosodic punctuation, and alphabetic Latin scripts. In fact, he points out that Latin script can be syllabic, as in FBI, or semographic, as in MMVI (which, BTW, doesn’t seem nearly as grand as MCMLXXXIV or other late 20th century dates).

Sunday, August 20, 2006

there's two

Eric Bakovic is looking at semantic and syntactic number with minority and majority. One of the example sentences he brings up begins with there's. The problem with this example is that it's throwing an extra twist into the mix. There's (only the contracted form) agrees with plural nouns more commonly than there are does, at least in spoken British English (I don't have North American data).

Verb agreement is one of the first things that ESL students are usually taught and teachers are regularly exasperated when they don't get it right. In fact, most teachers (myself included) don't really understand the whole system themselves.

Saturday, August 19, 2006

Frequency and collocations

I'd love to say that my post about word frequency generated a great deal of discussion, but with only about 9 people viewing the blog each day so far--that's OK, it's still in its infancy--I'm happy when I do get the rare comment. At any rate, last summer, there was quite a bit of debate on the TESL-L mailing list about collocations and frequency.

According to Wikipedia, "collocation is defined as a sequence of words or terms which co-occur more often than would be expected by chance." For example, heavy smoker is a collocation (notice, we could say *strong smoker or *great smoker, but we generally don't).

Back in 1993, Michael Lewis published The Lexical Approach, a book that has been rather influential in TESL circles, in which he pushes a view that has teachers designing and using activities to bring students' attention to specific collocations. It has students recording, working with and studying these collocations. In the TESL-L discussion, I argued that this view was both unworkable and unprofitable, mainly because of the low frequency of collocations.

While language learners can build knowledge of collocations through extensive reading and listening, this is not something that we can do effectively by design in the classroom.

A few years ago, I looked at vocabulary in a reading textbook series that our program uses, Interactions. I ignored the most common 1,000 word families (from the GSL) because most of our students know these when they enter the program, but looked at the second thousand most common words and the 560 words of Averil Coxhead's Academic Word List (AWL) (using Tom Cobb's Compleat Lexical Tutor site). I found that, in Interactions 1 Reading and Interactions 2 Reading combined, 60% of these word families appeared fewer than four times, with singletons being the largest group. Only 24% of word families were repeated more than 7 times. Given this low repetition for individual word families in textbooks, it's clear that there are VERY few collocations that will turn up more than once. And that's over two successive 16-week courses. If they don't recur, students are very unlikely to pick them up.

So what if you deal with them out of context? The problem is that there are simply too many. If you teach strong wind as one poster to TESL-L suggested, then shouldn't you also teach wind's more common collocates: rain, blow, cold, speed, gone, and through (according to Collins COBUILD Corpus Concordance Sampler)? That's seven. So, if we're looking at the top 2,000 words in English, and we estimate that there are an average of 3 collocates each that are as strong as strong wind, that's 6,000 collocates (minus whatever mutual collocates there are). There's simply no way you could spend class time on more than a fraction.

Even if you did focus entirely on collocation, the payback would be minimal. If we can take the British National Corpus (BNC) as being representative of English as a whole, then the strong wind collocation occurs a mere 3.06 times per million words (strong within 4 words either side of wind(s)). In contrast, a "difficult" word like compromise (which is not even in the top 2,000 words of English) occurs singly 10 times more often. So, is it more worthwhile to enrich students' understanding of wind by looking at collocates, or to have students study a basic meaning for compromise? Wouldn't "big wind" or "heavy wind" get them by just fine?
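For anyone curious how a figure like "3.06 times per million words" comes about, here is a minimal sketch of a window-based count: find each occurrence of wind(s) and check whether strong turns up within 4 words on either side. The function name, the crude tokenizer, and the sample text are all my own stand-ins for illustration; this is not how the BNC tools actually work, and a real corpus would need much more careful tokenization.

```python
import re

def collocation_rate(tokens, node_forms, collocate, window=4):
    """Count node words with `collocate` within `window` tokens on either side.

    Returns (hits, hits per million tokens).
    """
    hits = 0
    for i, tok in enumerate(tokens):
        if tok in node_forms:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            if collocate in left + right:
                hits += 1
    per_million = hits / len(tokens) * 1_000_000 if tokens else 0.0
    return hits, per_million

# Made-up sample text, not corpus data.
text = ("A strong wind blew all day. By evening the wind had dropped "
        "completely, but the winds grew strong again overnight.")
tokens = re.findall(r"[a-z']+", text.lower())  # crude tokenizer (assumption)
hits, rate = collocation_rate(tokens, {"wind", "winds"}, "strong")
print(hits, len(tokens), rate)
```

On a toy text the per-million figure is meaningless, of course; it only becomes informative when the token count runs into the millions.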

Here are a few things for teachers to keep in mind when they consider teaching collocations:
  • many collocations are obvious and require no teaching (e.g., look out the window; read/write/publish a book).
  • many collocations are too specialised to bother teaching to most students (e.g., insolvency act)
  • most collocations are too infrequent to bother teaching (e.g., rancid butter; a glimmer of hope)
  • individual words are often far more frequent than even a strong collocation, so keep things in perspective
For the times, however, when you do want to know about specific collocations (and there are good reasons for teachers to pay attention to these, even if they don't "teach" them in class), the most user-friendly way to find collocations that I know of is Just The Word.

Thursday, August 17, 2006

What's the frequency, Jack?

Frequency information can be a great boon to language learners. If I were going to learn Spanish, for example, I'd want to know the most common, say, 500 words, at least for starters. But this information isn't always easy for language learners to access. In fact, it's often pretty difficult for linguists to compile.

To begin with, there's the question of what constitutes a word. Are jump, jumps, and jumped one word or three? Is jump (the action of propelling yourself up into the air) the same word as the barrier that you hurl yourself over or the ramp that you ski off? What about jumper (a person who jumps) or jumper (a sweater or a pinafore, depending on if you're GBish or USian)? How many words do we have now?

Next is the problem of deciding what a representative corpus of the language would look like. Do we look at only spoken language, only written or both (if both, in what ratio)? Do we include only language produced after a certain time: perhaps, 1980? Do we include only certain flavours of English (Australian or Canadian)? What genres do we include? If it's spoken, can it be scripted or must it be spontaneous? And how big must our corpus be? Clearly a corpus of a mere hundred words would be of almost no help at all, but is a million words enough? What about 500 million?

Another large problem is simply going about actually counting all this stuff. What we need is a machine like the one in Dr. Seuss's Sleep Book with balls that drop and a chap who counts them. Unfortunately, what we have is humans and computers. Humans are smart and, given the right instructions, can usually do a fairly reliable job of dealing with the question of what a word is. Unfortunately, we have short attention spans and are much slower than computers.

Some early frequency counts in English were done by humans, for example Michael West's A General Service List of English Words (Longman, Harlow, Essex, 1953). But most recent counts are done by computers, they being great at this kind of tedious task. Unfortunately, they have serious problems deciding what a word is (see jump, above). They also don't deal with spelling mistakes or spoken language very well (though they're getting better at that).
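To make the "what is a word" problem concrete, here is a small sketch of what a naive computer count does with the jump examples, next to a count that first groups forms into families. The FAMILY table is a tiny hand-made one I invented for illustration (real word-family lists are far larger), and note that nothing here can tell the jumper who jumps from the jumper you wear.

```python
import re
from collections import Counter

# Hypothetical, hand-made family table for illustration only.
FAMILY = {"jumps": "jump", "jumped": "jump", "jumping": "jump"}

def count_forms(text):
    """Naive count: every distinct spelling is its own 'word'."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def count_families(text, family=FAMILY):
    """Count after mapping each form to its family headword, if listed."""
    counts = Counter()
    for form, n in count_forms(text).items():
        counts[family.get(form, form)] += n
    return counts

sample = "She jumps. He jumped the jump, then lent me a jumper."
forms = count_forms(sample)
families = count_families(sample)
print(forms["jump"], families["jump"], families["jumper"])
```

The naive count sees jump once; the family count sees it three times. Which answer is "right" depends entirely on the decisions a human made when building the table.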

This becomes even more difficult where you're dealing with a language, such as Japanese, which doesn't have spaces between the words, and for which it's legal (though often non-standard) to write words in a variety of ways. For example, the following are all the same word (hikkoshi = move house): 引っ越し, 引越し, 引越, 引っ越, ひっこし, and each of them has at least 300,000 google hits. This was a huge problem for me when I was compiling a corpus for a series of Japanese graded readers, about which I'm presenting later next week at the CAJLE conference in Toronto, but that's another posting for another time.
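As a rough illustration of the variant-spelling problem, the sketch below folds the five spellings of hikkoshi listed above into one headword before counting. It sidesteps segmentation entirely by searching for the known spellings, longest first, so that 引越し is not miscounted as 引越 plus a stray し. The sample sentences are made up, the choice of 引っ越し as the headword is arbitrary, and this is only a toy, not the method I used for the graded-reader corpus.

```python
import re
from collections import Counter

# The five spellings from the post, all mapped to one (arbitrarily chosen) headword.
VARIANTS = ["引っ越し", "引越し", "引越", "引っ越", "ひっこし"]
HEADWORD = "引っ越し"

# Longest alternatives first, so longer spellings win over their prefixes.
pattern = re.compile(
    "|".join(re.escape(v) for v in sorted(VARIANTS, key=len, reverse=True))
)

def count_headword(text):
    """Return the matched spellings and a count folded onto the headword."""
    found = pattern.findall(text)
    return found, Counter({HEADWORD: len(found)})

# Made-up sample text, not corpus data.
text = "来月引っ越しをする。引越の準備は大変だ。ひっこしの日は雨だった。"
found, counts = count_headword(text)
print(found, counts[HEADWORD])
```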

Many people have attacked these problems from different angles. For English, some of the most useful results are:
  1. Word Frequencies in Written and Spoken English by Geoffrey Leech, Paul Rayson, Andrew Wilson
  2. The Academic Word List by Averil Coxhead
  3. The BNC word family lists by Paul Nation
I should also mention Tom Cobb's wonderful Compleat Lexical Tutor, which provides tools based on frequency counts for English and French.

For other languages, Routledge has recently begun releasing frequency dictionaries. Those currently available include a Spanish one by Mark Davies (Series editor, along with Paul Rayson, and designer of VIEW).

Saturday, August 12, 2006

No future tense? Nonsense!

That's the typical reaction I get when I try to explain that English has no future tense. Perhaps the reasoning is similar to that which led Ann-Marie MacDonald to write that "the present tense will reign"; people tend to conflate tense and time.

But it's not just this misunderstanding. A future tense seems to be some kind of mark of pride. Being told that your language doesn't have one often brings out chauvinistic zeal in everyone from English teachers to students from Japan, Korea, Turkey, Finland, or Arabic-speaking countries. "Of course we have a future tense," they say. In fact, the only group of students I've come across who have no problem with the idea seems to be Chinese students, who actually tend to be rather proud that Chinese has no tenses at all. (Of course, many languages, such as Spanish, do have a future tense.)

But getting back to English, ESL teachers and our materials are almost unanimous in their agreement that will and (be) going to are (is?) the future tense, despite decades of linguistic analysis telling them otherwise. Yet, it makes far more sense to teach will as one of the nine modals rather than teaching modals and then treating will separately as a tense. Similarly, there's nothing special about the going to idiom, which acts almost exactly the same as planning to, hoping to, intending to, etc. From there, it's a short hop, skip, and a jump to the idea that the present tense is often used to talk about future events, and that the past tense has meanings other than past time.

Perhaps someday there will be a pedagogic ESL grammar series with no future-tense nonsense.

Tuesday, August 08, 2006

Getting all tense about tense

We just came back from four days up at the cottage. Between putting worms on hooks, preparing meals, swimming, and drying off children, I managed to do some reading. I had just started The Way the Crow Flies when I came across the following passage.
"...everything about an air force station is new. And it will stay that way for its entire operational life...The families in the PMQs will always seem like the first families to move in, they will always have young children of about the same age. Only the trees will change, grow. Like reruns on television, an air force station never grows old. It remains in the present. Until the last flypast. Then it is demobilized, decommissioned, deconsecrated. It is sold off and all the aging, the buildup of time that was never apparent, will suddenly be upon it. It will fade like the face of an old child. Weeds, peeling paint, decaying big-eyed bungalows...
But until that happens, the present tense will reign."
Leaving aside the issues of "coordinating conjunctions" and "comma splices", neither of which bothered me, the use of the word tense in the last sentence stuck in my craw. Why tense?

Indeed, tense is related to time, but tense is purely a grammatical issue and I don't think Ann-Marie MacDonald intended to say anything about grammar, did she? Is this an instance of linguification? Is MacDonald claiming that the majority of the finite verbs used on the station will be in the present tense[1], but that that will change once it's decommissioned?

I expect, having used "It remains in the present" a few lines earlier, MacDonald was merely looking for some stylistic variation, but I wish she hadn't gone for tense.

But perhaps I'm being overly pedantic. It seems that, for most English speakers, tense just means something like, "verb form". When The New York Times thinks there is a conditional tense and The Economist thinks there's a passive one, perhaps that's what tense actually means. When ESL teachers sit around in a meeting discussing how many tenses to cover in a semester (English only has two or three), perhaps we need to accept that using tense to mean a grammatical form that is primarily used to locate a verb in time is technical jargon. And if MacDonald wants to use it to mean time in a novel, why not?

[1] Likely true; the Longman Grammar of Spoken and Written English tells us that "conversation and academic prose alike show a strong preference for present tense forms."

Tuesday, August 01, 2006

More myths and confusion

Over at Language Log, Mark Liberman, commenting on my FANBOYS posting, writes,
Brett Reynolds' post on FANBOYS observes that lists and hierarchies of this kind are "myths" that "[give] the faithful a comfortingly simple handhold in a confusing world". I'm sure that this is true -- but such myths can be confusing and even disturbing, not comforting, for those who think about them too seriously.
Indeed, I found myself very confused when I first started college teaching, and it's not just lists and hierarchies. Many staples of the curriculum simply made no sense to me: "extended paragraphs", mandatory topic sentences, concluding sentences for every paragraph, a grading system that deducted points for every error (as if you could count them), etc. It took me over two years, a lot of conversations, and a good deal of reading to feel confident enough to assert that the curriculum is the problem, rather than my understanding of English, writing, and students' needs.