Thursday, 27 February 2014


Christopher Ehret, a linguist and specialist in African prehistory, believes that the Afroasiatic language family dates back to pre-agricultural times in northeastern Africa.  Afroasiatic is the earliest attested language family in the world; only the isolate Sumerian is attested as early.  Ehret claims that the expansion of the family began between 16,000 and 11,000 BCE, making Afroasiatic not only the earliest attested family on the planet, but also the oldest reconstructable one.

I don't think this is true, as it happens, and I find it much more plausible that Afroasiatic began in the Levant or in Anatolia about 10,000 years ago (c. 8000 BCE), expanding out with the development of the Neolithic package of wheat, domesticated cattle, and so on.  But 10,000 years is still an incredibly long time, far older than any other known family; Indo-European and Austronesian, for example, go back something on the order of 5,000 years.

The reasoning behind claims of such extreme time depths is that Egyptian, the oldest attested language (and branch) of Afroasiatic, is clearly very different from other branches that appear within a few hundred years of Egyptian (Akkadian, a Semitic language, being the first).  Afroasiatic - a sprawling and diverse family, sometimes referred to as a 'phylum' or 'macrofamily' due to its internal diversity - could not have appeared on the scene in such different guises if it hadn't differentiated considerably earlier, and it makes sense for it to have expanded out with the migrations of the world's first farmers.

Afroasiatic is reconstructable because we have early surviving texts from different branches.  I'm not an Afroasiatic specialist, but I daresay that if we didn't have them, Afroasiatic would remain an undemonstrated hypothesis.  We might see tantalising similarities between modern Hebrew and Oromo, but perhaps we wouldn't be able to say that they are related.  Languages change over time, and the more time has gone by, the greater the number of changes.  Words change in meaning, new words are borrowed from other languages, and phonemes change.  This can render cognates impossible to identify and genetic relationships between languages impossible to demonstrate.

Linguists know that Hebrew and Oromo are related, of course, and if you proposed an Afroasiatic family in the absence of early epigraphic evidence, you would happen to be correct.  But the point is that, if the early epigraphic evidence were lacking, the relationship would not be demonstrable.  Afroasiatic would remain a tentative hypothesis at best.  We might be able to see some patterns in material culture or DNA in northeastern Africa and the Near East that could potentially be explained by invoking the undemonstrated hypothesis of Afroasiatic, but that genetic and archaeological evidence would not be able to demonstrate the existence of Afroasiatic itself.  It would still be an undemonstrated language family.  More importantly, the language family, being undemonstrated, would not be an independent strand of evidence to add to the genetic and material cultural evidence.  It would just be one great unfalsifiable mess of evidence.


When mainstream historical linguists reject claims of macrofamilies dating back to the Pleistocene, they aren't doing it out of dogmatism.  There are several such claims; perhaps the most famous is 'Amerind', proposed by Joseph Greenberg as the ancestor of nearly every language in the pre-Columbian Americas (besides the Na-Dene and Eskimo-Aleut languages).  Greenberg's claim is that the initial migration across Beringia into North America in the Pleistocene was undertaken by a single group speaking a single language, and that this language, 'Amerind', diversified over time into all of the diverse and seemingly unrelated language families of the Americas.

This may well have been the case.  There are biological reasons for believing that the number of initial Beringian migrants was low and that they were reasonably homogeneous.  Amerind may indeed have existed; perhaps there was only one language family in the Americas at the end of the Pleistocene, and if we went back there in a time machine, we'd know what it was.

Unfortunately, a time machine is the only way to do it.  Greenberg's work in Language in the Americas, the book in which he outlined his claim, was sloppy in the extreme; besides using unorthodox methods (methods that couldn't eliminate the possibility of borrowing or coincidence and that relied on passing similarities between words instead of rigorous checking of sound correspondences), he also worked far too quickly and repeated classifications that had been known to be mistaken for decades before he started writing.

Even if the work hadn't been sloppy, though, there's almost certainly no way to verify Amerind.  The Pleistocene migration of humans across the Bering Strait happened far too long ago, and changes to all of the words and grammars of indigenous American languages have built up to such an extent that genuine scientific comparison won't show anything substantial about Pleistocene ancestry.  If we had written evidence from the Americas from several thousand years ago, then progress could be made, but that evidence doesn't exist.  All we've got are fragments that date to the last two millennia - and the last millennium for the vast majority of them.

Linguists work from the bottom up, comparing words in a few languages that show some prima facie similarity with one another.  They look through long lists of vocabulary items and compare them, side-by-side, for regular correspondences in sounds (an initial /f/ in one language corresponding to an initial /p/ in another, for example).  They look at sentence structure and other grammatical features, and try to work out if or how two languages are related by looking at how they function at a fundamental level.  They only think of a word as securely reconstructed when the sound changes linking it to its attested descendants are known or highly plausible.  This isn't how these extreme time-depth macrofamily claims work, if they even work at all.
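To make the idea of a "regular correspondence" a little more concrete, here is a minimal Python sketch of the tallying step only, not of the comparative method as a whole. It assumes a short, hand-picked list of Latin/English cognate pairs illustrating the well-known Latin p- : English f- and t- : th- correspondences (Grimm's law), uses spelling as a rough stand-in for sound, and counts how often each pairing of initial sounds recurs. The names `PAIRS`, `initial_segment`, `regular_correspondences` and the `min_count` threshold are hypothetical, chosen for illustration. A real comparison would also need conditioning environments, a baseline for chance resemblance, and some way of ruling out loanwords, none of which this sketch attempts.

```python
from collections import Counter

# Hypothetical illustrative word list: Latin/English cognates.
# Spelling is used as a rough proxy for pronunciation.
PAIRS = [
    ("pater", "father"),
    ("pes", "foot"),
    ("piscis", "fish"),
    ("tres", "three"),
    ("tu", "thou"),
]

# Spelling digraphs treated as a single initial sound.
DIGRAPHS = ("th", "ch", "sh")


def initial_segment(word: str) -> str:
    """Return the word's first sound, treating common digraphs as one unit."""
    for digraph in DIGRAPHS:
        if word.startswith(digraph):
            return digraph
    return word[0]


def tally_correspondences(pairs):
    """Count how often each pairing of initial sounds occurs in the word list."""
    return Counter((initial_segment(a), initial_segment(b)) for a, b in pairs)


def regular_correspondences(pairs, min_count=2):
    """Keep only pairings that recur at least `min_count` times.

    A one-off resemblance could be chance; a recurring pairing is a
    candidate regular correspondence (though still not proof of one).
    """
    counts = tally_correspondences(pairs)
    return {corr: n for corr, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    for (latin, english), n in sorted(regular_correspondences(PAIRS).items()):
        print(f"Latin {latin}- : English {english}-  ({n} pairs)")
```

Run as a script, it reports the p- : f- pairing three times and the t- : th- pairing twice. The threshold is the crude part: recurrence across many items is what separates a candidate correspondence from a passing similarity, but deciding what counts as "enough" recurrence is exactly where careful linguistic judgement comes in.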

If you see an archaeologist, geneticist, or anyone else claiming to be able to correlate haplogroups or lithic technologies with alleged macrofamilies like Amerind or Nostratic, then you should approach their claims with caution and bear in mind that the language family they're using probably isn't well-established.  These macrofamilies are elaborate hypotheses that explain few facts and attempt to establish themselves on shaky and unscientific foundations.  It would be nice if we could correlate language, genes, and material culture at vast timescales, and it would be nice if West Eurasian DNA correlated neatly with a single super-family of languages (for example).  Unfortunately, there's very little reason to believe that we'll ever be able to achieve such a perfect science of prehistory.


  1. I just read this. I think it is a timely corrective to a lot of the naivete that you see in population genetics concerning what historical linguistics can accomplish. Most linguists aren't even interested in reconstruction, since they know that the comparative method is not going to answer the most fundamental questions about grammatical universals or the innate design features of language. As for long-range reconstruction, the most promising approach I know of is a project that attempts to use abstract grammatical features of language, rather than the traditional lexical sound correspondences. We know from the work of Johanna Nichols that these abstract parameters change more slowly than vocabularies do, so if we were going to demonstrate the reality of Amerind or Nostratic, this would be the best way to do it.

  2. I daresay that if we didn't have [early epigraphic evidence], Afroasiatic would remain an undemonstrated hypothesis. We might see tantalising similarities between modern Hebrew and Oromo, but perhaps we wouldn't be able to say that they are related.

    It's possible, but I get the impression that most people talking about this overstate the importance of early written records, and have never actually checked exactly how much later developments obscure the key evidence.

    Semitic and Indo-European are not the typical case. Most language families in the world have been established, and have even had their protolanguages reconstructed, essentially solely from modern evidence. Not just by direct comparison of modern languages, no, but by reconstructing incrementally older intermediate proto-languages, to sieve out which features of the modern languages are truly old and inherited. This is true for about two thirds of Afrasian as well! The Berber, Chadic, Cushitic and Omotic groups have no ancient literary languages. So your argument here seems to defeat itself by reductio ad absurdum: we don't have early epigraphic evidence for Oromo, or any of its reasonably close relatives, so does it follow that we cannot claim it has been demonstrated to be related to Hebrew?

