

Despite decades of technological advancement and entire countries' worth of money being thrown at the problem, machine translation has still not replaced the human brain. The neural networks that power Google Translate, one of the best (yes, I said best) machine translators out there, can do a pretty good job if you give them something in a language that's commonly translated to/from English—Spanish, Portuguese, or French, for example. But even then the machine doesn't "know" the languages it's using; it's only mapping one set of data points to another set of data points. And it's genuinely incapable of understanding context-dependent words: if you tell the machine to give you the French word for "read," it has no way of knowing if you want "lire" (infinitive), "avoir lu" (perfect infinitive), "lisais" (imperfect indicative), "lu" (masculine singular past participle), "lus" (masculine plural past participle), "lue" (feminine singular past participle), "lues" (feminine plural past participle), "lisez" (second-person plural or formal imperative), "lis" (second-person singular imperative), or something else entirely, because there's no contextual information clarifying your request. Machines don't know what semantics are, and they sure as hell don't know anything about nuance.
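If you want to see just how hopeless a context-free lookup is, here's a minimal sketch. The table and function are mine, invented purely for illustration—no real translation system's API looks like this—but the forms are the ones listed above:

```python
# A context-free, word-level English -> French "translator."
# Both the table and the function are hypothetical illustrations.
FRENCH_FORMS_OF_READ = {
    "infinitive": "lire",
    "perfect infinitive": "avoir lu",
    "imperfect indicative": "lisais",
    "past participle (masc. sing.)": "lu",
    "past participle (masc. pl.)": "lus",
    "past participle (fem. sing.)": "lue",
    "past participle (fem. pl.)": "lues",
    "imperative (2nd person pl. / formal)": "lisez",
    "imperative (2nd person sing.)": "lis",
}

def translate(word: str) -> list[str]:
    """Without context, the best a word-level system can do is
    return every candidate form and shrug."""
    if word.lower() == "read":
        return list(FRENCH_FORMS_OF_READ.values())
    raise KeyError(f"no entry for {word!r}")

print(translate("read"))  # nine candidates, zero basis for choosing one
```

Real systems are fancier than a lookup table, obviously, but the ambiguity itself doesn't go away: something in the input still has to carry the context, or the machine is just picking from that list.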

Neural networks are often described as machines capable of learning, but they're not, not really—they're capable of creating new things, but only out of things they've already been given (i.e., a neural network fed the English dictionary will not be able to understand Chinese). A neural network told to identify pictures of sheep will not understand the idea of sheep as a concept, and will instead map the word "sheep" to a certain cluster of pixels that can commonly be found in still images of sheep. This will, predictably, result in neural networks that misidentify clouds as sheep. Often, neural networks will "learn" that those clusters of pixels labelled "sheep" tend to be found in pictures of green fields, and thus you'll wind up with neural networks being given a picture of an empty pasture and confidently informing you that there's a 90% probability that there's a sheep in that picture. There's not—there never is—but the neural network can't actually distinguish between "sheep" and "location where sheep-pixels are commonly found," so you'll end up with a lot of unhelpful information.
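The empty-pasture failure is just a spurious correlation, and you can reproduce it with toy numbers in a few lines. Everything below is made up for illustration—each "image" is reduced to two invented features, and the "training" is a crude stand-in for what a real network does—but the mechanism is the same: when sheep only ever appear in green fields, greenness separates the labels exactly as well as woolliness does, so the model weights both.

```python
# Each "image" is a made-up feature pair: (greenness, woolliness).
# In training, sheep only ever appear against green fields, so the
# two features are perfectly confounded.
TRAIN = [
    ((0.9, 0.8), 1),  # sheep in a green field
    ((0.8, 0.9), 1),
    ((0.9, 0.7), 1),
    ((0.1, 0.0), 0),  # city street, no sheep
    ((0.2, 0.1), 0),
    ((0.1, 0.0), 0),
]

def mean(values):
    return sum(values) / len(values)

sheep = [x for x, label in TRAIN if label == 1]
other = [x for x, label in TRAIN if label == 0]

# "Train": weight each feature by how far apart its class means are.
weights = [mean([s[i] for s in sheep]) - mean([o[i] for o in other])
           for i in range(2)]

# Decide "sheep" when the weighted score clears the halfway point
# between the two class means.
midpoint = [(mean([s[i] for s in sheep]) + mean([o[i] for o in other])) / 2
            for i in range(2)]
threshold = sum(w * m for w, m in zip(weights, midpoint))

def classify(features):
    return 1 if sum(w * f for w, f in zip(weights, features)) > threshold else 0

# An empty green pasture: maximal greenness, zero wool anywhere.
print(classify((1.0, 0.0)))  # 1 -- "sheep," confidently and wrongly
```

The classifier never saw a green field without a sheep in it, so it has no way to learn that the grass was incidental.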

Ever wonder why small children can so easily identify dogs, yet tend to call every unknown quadruped a dog as well? It has to do with the way our brains categorise schemata. Dogs are an incredibly diverse species—a teacup Chihuahua and a St. Bernard have almost nothing in common from a morphological standpoint, but thankfully phenotyping doesn't usurp genotyping when it comes to ranked taxonomy—but human brains are hardwired to recognise "dog" from a sea of potentially unfriendly similar-shaped animals. The way this works is through a process of elimination: if an unknown creature looks more like a dog than it looks like anything else, it's probably a dog. And, most of the time, it is. But machines don't have this ability, even when they can fake it pretty convincingly: they're not capable of truly, genuinely "guessing" what something is. And so a machine can always be predicted, controlled, overridden. Because the human brain, that 1.2–1.4 kg lump of meat and electricity, is capable of making shit up that's not logical. Language can't be parsed mathematically, no matter how many people try; it's organic, illogical.
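That process of elimination is, mechanically, nearest-prototype matching, and the faked version is short enough to sketch. The prototypes and measurements below are invented, and this is exactly the hollow imitation the paragraph describes—the code can only compare whatever numbers it's handed, whereas the toddler's "dog" is a concept, not a feature vector:

```python
import math

# Invented prototypes: (shoulder height in cm, mass in kg).
PROTOTYPES = {
    "dog": (50.0, 20.0),
    "cat": (25.0, 4.0),
    "horse": (160.0, 500.0),
}

def most_like(features):
    """The 'if it looks more like a dog than anything else, it's
    probably a dog' rule: label an unknown creature with whichever
    prototype it sits closest to."""
    return min(PROTOTYPES,
               key=lambda name: math.dist(features, PROTOTYPES[name]))

# A coyote-sized unknown quadruped gets called a dog,
# the same way a toddler would call it one.
print(most_like((55.0, 15.0)))  # "dog"
```

The machine version is predictable in exactly the sense above: given the prototypes and the distance function, you can compute its every answer in advance.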

And that's why linguistics is cooler than maths. No, this wasn't to prove a point to anyone in particular.