Saturday, November 07, 2009

There aren't much data

When asked by the New York Times why some people get a sore arm at the site of a flu vaccine injection while others do not, Dr. E. Yoko Furuya, assistant director of epidemiology at NewYork-Presbyterian Hospital/Columbia University Medical Center, replied, “The short answer is that we don’t know for sure... There aren’t much data out there on this topic.”

I don't know whether Furuya is a native speaker of English or Japanese (the leading E. suggests she was not born in Japan), but her use of data is interesting. Historically, datum was the singular and data the plural, but datum is rare, occurring less than once per million words. When I search the COCA for ". the data [be]", the breakdown is as follows:
  • were  141
  • are    87
  • is     28
  • was    20
  • 's      8
This suggests that for most people, data is still plural. Let's have a look at determinatives:

   Rank  Phrase        Count
    1    THESE DATA     1625
    2    THIS DATA       537
    3    MORE DATA       362
    4    SUCH DATA       310
    5    ALL DATA        255
    6    THAT DATA       218
    7    SOME DATA       174
    8    WHICH DATA      169
    9    SAME DATA       131
   10    ENOUGH DATA     129
   11    ANY DATA        127
(note: #9 same isn't a determinative)

Again, 1 and 2 suggest that plural data still beats out singular data while the others are all ambiguous, being possible with both singular uncountable nouns and plural countables.
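
For anyone who wants the proportions rather than the raw counts, here is a minimal sketch that tallies the figures quoted above. The numbers are copied from the two searches in this post; grouping were/are as plural and is/was/'s as singular (that is, reading 's as contracted is) is my assumption for illustration, as is the helper name plural_share.

```python
# Counts copied from the post's COCA results. Treating "'s" as a
# contracted "is" (singular) is an assumption made for illustration.
verb_counts = {"were": 141, "are": 87, "is": 28, "was": 20, "'s": 8}
PLURAL_VERBS = {"were", "are"}


def plural_share(plural, singular):
    """Return the fraction of tokens that show plural agreement."""
    return plural / (plural + singular)


# Split the verb counts into plural and singular agreement.
verb_plural = sum(n for v, n in verb_counts.items() if v in PLURAL_VERBS)
verb_singular = sum(n for v, n in verb_counts.items() if v not in PLURAL_VERBS)

# Only the two unambiguous determinatives (ranks 1 and 2) are compared.
det_counts = {"these": 1625, "this": 537}

verb_share = plural_share(verb_plural, verb_singular)
det_share = plural_share(det_counts["these"], det_counts["this"])

print(f"verb agreement: {verb_share:.1%} plural")
print(f"these vs. this: {det_share:.1%} plural")
```

On these counts, plural agreement comes out at roughly 80% for the verb and roughly 75% for these versus this, so the two measures point the same way.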

Regardless of which is most common, clearly both interpretations are widely used. The interesting thing with Furuya's sentence is that she's got plural verb agreement (data are) with singular determinative agreement (much data, cf. *much pencils), a grammatical ambulocetus of sorts. (See here for more confusion with latinate plurals.)

4 comments:

Anonymous said...

Data is plural, but is it still the plural of the singular datum, or is the definition changing? Datum = 1. a. A thing given or granted; something known or assumed as fact, and made the basis of reasoning or calculation; an assumption or premiss from which inferences are drawn.
I'm not sure we mean plural of a. when we use data, or database - a singular collection of plural. Add to the confusion that the OED indicates datum is plural when "d. pl. The quantities, characters, or symbols on which operations are performed by computers and other automatic equipment, and which may be stored or transmitted in the form of electrical signals, records on magnetic tape or punched cards, etc."

Chris Slaby said...

I think the useful question is: who uses the word data (and in what context)? I can't fully answer that question, but at least one answer is scientists. Another answer is academics. These two groups are usually pretty well educated. They also write for a living. And if we're talking about scientists and academics in the U.S., they most likely write in English, and were most likely trained to write in English. Thus, there are people who 1) have actually spent some time considering the English language and 2) have been privy to past uses (i.e., previous writings by earlier scientists and scholars) of the English language. It seems to me that you see (or hear) scientists and academics using data as both a singular and a plural.

Here's my theory. The average person, I would say, would think (would instinctively use/conjugate) data is singular. I think we can acknowledge that as the acceptable use. A good way to check this is to see what we teach in schools. I don't know the answer, but if someone is friends with an elementary or middle school English (language, not lit., preferably) teacher, please ask what they teach (and how they know that's what they're supposed to teach).

Scientists and academics have either encountered Latin formally (they studied the language), and thus they learned that data was in fact a plural form in Latin, or have come across it via their education more generally. Scientists and academics do, I would say, encounter Latin in their daily lives more than the average person (perhaps not more than lawyers, or Latin teachers, hopefully). So at some point, many of these people were exposed to the idea that data is or can be a plural. Some were indifferent to this fact and keep using it as a singular. Some may have been immediately converted and started using data as a plural form. It seems to me to be a question of prescriptive taste.

I usually use the word "who" when talking about people. "The man who was standing over there." I do not think the sentence using the word "that" instead of "who" is impossible (people use the word "that" when talking about people all the time), nor do I necessarily think it's incorrect. It's a murky area. Book is clearly a singular form. Therefore, a verb must be conjugated in the singular form. "The book is interesting." "A book contains much information." This is clearly the right form because we accept book to be singular. Some people accept data to be singular, and others, perhaps aware of a Latin precedent and/or more archaic use/function of English, will say it's plural. A prescriptivist might say one is right, or perhaps more right, than the other. I don't think I'd agree.

We just have to acknowledge where people are coming from, and in what context they learned to use this word. I think many scientists and academics have encountered data as a plural. So next time you hear a talk about a new experiment or research, listen to the scientist or academic talking and see whether they use the singular or plural. And then after the talk, perhaps privately, ask why they said it the way they did.

Q Higuchi said...

I agree with Chris Slaby that, simply put, 'data' is basically a non-count noun in everyday English, while the 'datum/data' distinction tends to survive in scientific writing.

I think Dr Furuya's slip (if I may call it that) came about because she was caught between the two kinds of language usage (or 'registers' if you like). When she started saying 'The short answer is ...', she probably had in her mind the limited number of conflicting/confusing reports on the subject that she was aware of. There aren't many (data) - which isn't much (data). Oops.

By the way, on the many, many occasions where I agonised over the datum/data issue while translating scientific papers, I found that the non-count use of 'data' is becoming more common. Yes, 'these/those data' is still used, but at least the usage is heading that way.

Adrian said...

I think Chris Slaby is wrong on two counts. First, although many people (me included) use "data" in the singular, many people use it in the plural, and both are acceptable.

Secondly, this has little to do with Latin or knowledge of Latin or encounters with Latin. It is more a question of people's knowledge of *English*, datum and data both being /de facto/ English words. If people have been taught, or have read, that data is a plural, they will tend to do what they have been told.

To explain further: I use criteria as the plural of criterion, not for any etymological reason, but because I have learnt that *in English* criteria is the plural of criterion.