Some people will correct the phrase “the data is” to “the data are”, claiming that the word data is plural. The basis of the claim is that in Latin the word data is the plural of the unit datum, and that our English word is derived from the Latin. You may guess that I don’t buy this, or I wouldn’t be writing this post.
It pained me so much to write the phrase “the data are” that I did a little bit of research. And by research I mean that I googled it, found some people who agreed with me, and quit looking. Despite my biased approach to research, I do believe there is a logical argument for “the data is”.
First, English is not Latin, and it’s not good enough to accept a grammar rule on the basis of Latin alone. Since the question is about the plurality of the word, let’s note the different kinds of nouns. There are two different kinds of nouns in English that are non-singular: count nouns and mass nouns. Count nouns are things like pencils and books, for which the singular is a single unit of the object. Mass nouns are things like water that don’t have a natural unit and require one in order to be counted (liters of water can be counted).
Let’s highlight some more differences between count nouns and mass nouns. Since count nouns are things that can be counted, we can ask questions like “how many pencils do you have?” and expect a reasonable answer. We can’t ask “how many” for mass nouns: “how many water do you have?” doesn’t make sense, since there is no unit to count. Instead we can ask “how much water do you have?” Another difference is one of Brian’s peeves: you can have fewer pencils and less water, but you can’t have less pencils or fewer water.
So at this point, if we want data to be a plural noun, we have to be prepared to answer the title question of this post: “How many data do you have?” Note that this is not “how many data points” or “how many bytes of data”, since both of those include an additional unit, but simply “how many data”. We also have to be prepared to say things like “Mark has fewer data than Susan” rather than “less data”.
If you’re still concerned that we’re breaking from Latin, let’s consider the word stamina. My understanding is that the Latin word was a plural. But in English, stamina is not a plural. We adapt words from other languages for English, but we’re not bound by their grammar. I suggest getting ahead of the game and using data as a mass noun rather than a plural.
P.S. Apologies for my recent absence.