Language (r)evolution

April 3, 2023

As the ancient Greek philosopher Heraclitus once said, the only constant in life is change. And it applies to language just as accurately. The spectacle of language change can be observed from two perspectives: synchronic (comparing the different varieties of a language that coexist at the same time) and diachronic (how a language evolves across decades, centuries or millennia). And, taking into account how rapidly technology is developing nowadays, we shouldn’t be surprised that a similar phenomenon can also be observed in programming…

Dialects

At some point in our lives, we realise that even within the borders of the same country, people living in different regions tend to speak a bit differently. For instance, in Munich, where I live, you can sometimes hear the locals speak the Bavarian dialect – maybe not so often in the city itself, with all the newcomers from other German cities, but once you travel further away, it’s actually difficult to hear the “standard” language used on television. And Bavarian is no exception, of course. People living close to Dresden or Leipzig will also speak differently from the ones up north, close to Hamburg. But, even though each of these dialects differs to some extent from the others, they are all still local varieties of the same language, with slightly different grammar, sounds, and vocabulary. Similarly, a dialect in programming is a version of a language that may slightly differ from its “parent” and add some additional features, but generally stays the same.

For instance, CoffeeScript and TypeScript are two programming languages often considered to be dialects of JavaScript. They differ from their parent through the use of syntactic sugar: alternative syntax that provides the same functionality for features that are already there. It doesn’t aim to reinvent the wheel with completely new concepts, but rather to make the code more concise and easier to read or write. Now, you may ponder what the relationship between CoffeeScript, TypeScript and JavaScript is. The truth is that both compile to JavaScript, which means a special program is used behind the scenes that translates their code into JavaScript. The advocates of CoffeeScript point out that it uses less code and therefore offers better readability. It doesn’t need semicolons and curly braces, relying on whitespace instead. There is also no need to declare variables.

const myFunction = (counter) => {
   if (counter === 10) {
      console.log("The code will be executed");
   } else {
      console.log("The code won't be executed!");
   }
}

myFunction = (counter) ->
   if counter is 10
      console.log "The code will be executed"
   else
      console.log "The code won't be executed!"

The example of a simple function above shows the simplicity of CoffeeScript (the second snippet). It re-introduces the meaningful whitespace known from Python where JavaScript (the first snippet) requires parentheses and curly brackets, and it doesn’t require us to declare the function with the const keyword.
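
Since TypeScript was mentioned as a dialect too, here is a minimal sketch of its main addition: static type annotations. (The function itself is the same illustrative example as above.) The annotations are checked by the TypeScript compiler and then erased, so the resulting JavaScript is essentially the first snippet again.

// TypeScript: the ": number" and ": void" annotations are verified
// at compile time and erased in the emitted JavaScript
const myFunction = (counter: number): void => {
   if (counter === 10) {
      console.log("The code will be executed");
   } else {
      console.log("The code won't be executed!");
   }
}

In other words, while CoffeeScript’s sugar removes punctuation, TypeScript adds information for the compiler; in both cases, the extras disappear once the code is translated into plain JavaScript.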

While I’m certain that there are people who think that some dialects are easier to learn than others, we are not going down that road – in linguistics, all dialects and languages are considered equal, because their difficulty is highly subjective, depending on your mother tongue or other foreign languages (or dialects) that you may speak. However, we could venture an analogous metaphor: each dialect, although different on the surface, compiles to its parent language behind the scenes. As another fun fact, it’s also worth mentioning that the syntactic sugar of CoffeeScript was inspired by several other programming languages, among them Python and Ruby. The same can be observed in natural languages. In the Eifel (a region in western Germany), for example, one can hear a few words coming from French, spoken just across the border in neighbouring Belgium. Examples include Klüer for “colour” (fr. couleur) or Visaasche for “face” (fr. visage). This shows that natural dialects, too, can borrow elements from other languages!

On language change

Do you sometimes get the feeling that you don’t understand today’s teenagers at all, because they have developed a whole system of new vocabulary that doesn’t resemble at all what you remember from the time when you were their age? Me too, even though I was a student myself not so long ago. But, actually, language changes far more than we might imagine.

In fact, English from before the Shakespearean era was so different from the one we speak nowadays that it is hardly understandable even for today’s native speakers. A substantial part of this is attributed to a phenomenon that happened a very long time ago. Around 1400 AD, well before Europeans reached the Americas, English vowels sounded quite different than they do now. For instance, today’s “bite” was pronounced rather like “beet”, and the then “meat” was very close to our “met”. However, owing to a change still shrouded in mystery (known as the Great Vowel Shift), people started speaking differently, gradually modifying almost all of their vowels. The consequence is that a person living in 1400 would probably not be understood by speakers living now. That is also in large part why English pronunciation is so inconsistent with its spelling. More importantly, the described change happened over approximately 300 years. And believe me, 300 years for such a fundamental change is not long at all! For this reason, we could call it a little linguistic revolution.

Similarly, our programming languages are also constantly changing. A good way to illustrate this is the case of JavaScript. Since its creation in 1995, new versions of the language have been released every few years (specified by ECMAScript, the standard that JavaScript implements). Examples of the changes introduced by ECMAScript 6 include constant bindings (const), template literals enabling embedded expressions within strings, and arrow functions. Sometimes, the differences between two versions of one language are much more striking. This was the case with Python 3, an updated version of Python 2 aimed largely at fixing the design flaws of its predecessor. However, the changes were so big that v3 was no longer backward-compatible with v2. Something similar happened in English a long time ago.
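
To make the ES6 additions above more tangible, here is a small sketch combining all three features (the function names are, of course, only illustrative):

// ES6 (2015): a constant binding, an arrow function and a template literal
const greet = (name) => `Hello, ${name}!`;
console.log(greet("world")); // prints: Hello, world!

// The pre-ES6 equivalent: a mutable variable, a function expression
// and plain string concatenation
var greetOld = function (name) {
   return "Hello, " + name + "!";
};
console.log(greetOld("world")); // prints: Hello, world!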

But what exactly is that mysterious force constantly driving our languages to evolve? Although the full mechanics of how and why this happens are still debated, there are many interesting concepts that deserve to be mentioned. One of them is the so-called principle of least effort, which says that most languages tend to reduce their complexity with time because, to cut a long story short, their speakers want the act of speaking to be as painless and smooth as possible. That explains why we have observed the transition from eight grammatical cases in Proto-Indo-European (the common ancestor of all the Indo-European languages, spoken 4500–6500 years ago), through seven of them in Latin, to zero in today’s Spanish, French or Italian, or why we often use abbreviated forms like “gonna” and “wanna” instead of “going to” and “want to” in everyday English. Two other drivers of language change are language contact (when words in one language are borrowed from another) and the emergence of new needs among its speakers, which, in turn, propels the demand for new vocabulary. Surely, we didn’t need a word for “metaverse” or “deepfake” half a century ago.

And why do our programming languages change? Why did we move from directly feeding our computers with binary code to the state of things we have achieved today? I assume that it’s for two main reasons: firstly, because our needs evolve, and secondly, because our technology evolves, too.

It may be hard to believe, but the history of programming started way earlier than you might imagine. We need to go back all the way to the 1840s, when the English mathematician Ada Lovelace composed an exhaustive set of annotations about the Analytical Engine, Charles Babbage’s proposed general-purpose computer design. Although the machine was never actually built, her work was groundbreaking, and this is why she is nowadays considered the very first programmer. However, it was not until the 1940s that programming began to take shape as a discipline in the modern sense. In those days, the only language that existed was that of zeros and ones, so the instructions that the computer was fed went directly to the CPU. Even though such code was easy and fast for the computer to process, it was counterproductive from the human point of view. Over the years, as technology kept developing and more and more depended on being able to program efficiently, higher-level programming languages were invented to make programming more developer-friendly. In the past, the term “very high-level language” referred to languages with a very high level of abstraction, including Python, Perl, PHP, or Ruby. However, the term went out of use as early as the 1990s. This is because, as the years go by, the general tendency is to use languages that are ever more abstract and resemble natural languages more closely. In effect, what used to be considered very high-level 30 years ago is considered just… regular high-level today. Just as some natural languages moved from eight grammatical cases to zero because speakers wanted to speak more simply, we have also made the act of programming simpler from the programmer’s point of view. Both changes are driven by efficiency and lower effort.
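
The same drift towards abstraction can be seen even within JavaScript itself. As a sketch, consider summing an array the old, index-by-index way versus the declarative one-liner most developers would write today:

// Lower-level style: explicit counter and index bookkeeping,
// conceptually closer to how the machine iterates over memory
var numbers = [1, 2, 3, 4, 5];
var total = 0;
for (var i = 0; i < numbers.length; i++) {
   total += numbers[i];
}

// Higher-level style: we state what we want (a sum),
// not how to loop over the elements
const sum = numbers.reduce((acc, n) => acc + n, 0);

Both produce 15; the difference lies only in how much of the “how” the programmer has to spell out.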

What the future brings

In this and the previous article, we have been taking a look at programming and natural languages by comparing them with each other. But, taking into account the general tendency of programming to become more and more high-level and to resemble the way in which we actually speak, does it mean that such an analysis soon won’t be needed anymore, because in a few years we will be able to program using natural language alone?

Actually, saying “yes” wouldn’t be such a long shot. In the era of search engines, autocomplete, automatic translation and the recently so popular ChatGPT, we can be sure that natural language processing has a very bright future ahead of it. One area where such work is already taking place is Project Wisdom, a Red Hat initiative in collaboration with IBM Research that aims to make Ansible usage faster and simpler. More specifically, “Project Wisdom will be able to read plain English entered by a user, and then generate automation content written in the Ansible syntax […]”. Although Project Wisdom focuses solely on the automation of automation (repetition intended), it is possible that the future of programming in general will look similar.

One possible hindrance that I see in this respect, however, is the fact that natural language is not the best tool for saying something precisely. All in all, our everyday communication is flawed – full of misunderstandings and, in effect, repetitions or disclaimers aimed at resolving any emerging doubts. For this exact reason, although programming in natural language may seem like a dream and the ultimate stage there is to achieve as far as coding is concerned, it could actually turn out to be far more tedious and counterproductive than the traditional approach of using programming languages. One solution to that problem that pops into my head is that the computer could ask us additional questions along the way, to clear up any uncertainties as to what code to write for us.

Nevertheless, with syntax in general tending to become more and more abstract, I have no doubt that the still considerable abyss between natural and programming languages could soon be, if not annihilated, then at least drastically reduced, with our lines of code written in plain English being changed into zeros and ones by a sophisticated compiler making use of a powerful AI model.