As a person with years of academic experience as a linguist and a deep love for learning languages, I’m quite used to discovering the intricacies of different language systems and comparing them with each other. However, even when I developed an enthusiasm for programming, I still treated natural and programming languages as two totally separate concepts. But one day, I decided to give it more thought and realised that there are way more similarities between the two than I had initially thought…
When I first started learning Java, it didn’t take long until it reminded me of the language which has always had a special place in my heart: German. The strictness and convolution of Java’s syntax are an almost perfect analogy to German. To name a few examples, Java doesn’t easily forgive mistakes, and building a meaningful expression in it often takes a whole lot of words. Even to print a simple line, you need the whole System.out.println() method, and everything that you write must be wrapped in a class, introducing many lines of seemingly unnecessary syntax. Similarly, with its long words and never-ending sentences, German often used to leave me with a feeling that it’s quite a big deal to say anything at all in that language. And do you remember those innumerable times when you forgot to put a semicolon at the end of a line and your text editor threw a red, screaming error at you? I think that every German learner can relate, but in order to explain it, we need to talk a little bit more about grammar.
Throughout history, German has developed the so-called separable verbs. To name a few examples: um|schreiben (“rewrite”), auf|machen (“open”) or ab|bauen (“dismantle”). At first glance, they look like typical verbs, but when you try to build a sentence with them, real craziness gets uncovered: the German speakers have a weird habit of cutting those verbs in half, leaving us with a prefix and the rest of the verb separated from each other! For this reason, very simple sentences using the aforementioned verbs would look the following way in German:
Ich schreibe meinen Artikel um. (I’m rewriting my article.)
Er macht das Fenster auf. (He’s opening the window.)
Wir bauen die Marktstände ab. (We’re dismantling the market stands.)
At first glance, they resemble English phrasal verbs a lot (“turn down”, “speak up”, “get off”). However, in a German main clause, the little prefix (um, auf, ab) always has to go to the very end, no matter how long the clause is. As a result, sentences like the following are theoretically possible in German:
Ich schreibe jeden Freitag um 3 Uhr nachmittags zusammen mit meinen Freunden, die auch in der Nachbarschaft wohnen, einige meiner Artikel, die ich eines Tages in der Zeitung veröffentlichen möchte, um.
Which means:
“Each Friday at 3 pm, together with my friends, who also live in the neighbourhood, I rewrite some of my articles that I’d like to publish in the newspaper one day”.
Almost every beginner (or even advanced) German learner has a tendency to forget about those tiny prefixes by the time they have finished their sentence. As a result, what they just said either doesn’t make sense at all or means something else entirely (for instance, schreiben simply means “write”, machen means “make” or “do”, and bauen means “build”). So even though you’re covered in sweat after the unbelievable feat of finishing a German sentence, your interlocutors are left confused and you see a big “ERROR” written in their big, staring eyes. Just like Java throwing lines of red text at you just because you forgot about that semicolon at the end of your line…
Another rule of Java that may seem quirky in the very beginning (especially if you’ve only had experience with languages like Python or JavaScript) is that you always have to specify the type of your variable when declaring it. To name a few, we need to get used to remembering types such as char for single characters, int for integers or boolean for true/false. This reminds me of the pain that German learners must go through when memorising lists of nouns together with their articles: der, die or das, referring to masculine, feminine or neuter words, respectively. For a person who only speaks English as a foreign language and has to face the challenge of mastering German, this is usually a never-ending nightmare. “Why do they need all three articles?”, you can hear many times, “while Spanish, French or Italian can do with only two?”. Most of the Scandinavian languages, maybe driven by a love for simplicity, also decided at some point to merge the masculine and feminine grammatical genders into one. And although Dutch has officially retained the masculine, feminine and neuter genders, the distinction between the first two has almost disappeared in real life, too. But German, apparently just out of spite, still sticks to all three…
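Going back to the programming side of the comparison for a moment, the contrast between declared and inferred types can be sketched in Python. This is only an illustration: the variable names are made up, and the Java declarations appear purely as comments for comparison.

```python
# Java, for comparison (comments only, not executed here):
#   char initial = 'J';
#   int year = 1995;
#   boolean compiled = true;

# Python infers the type from the assigned value; no declaration is needed.
initial = "J"      # Python has no separate char type; this is a one-character str
year = 1995
compiled = True

# Optional type hints document the intent but are not enforced at runtime.
release_year: int = 1995

# A name can even be rebound to a value of a different type.
answer = 42
type_before = type(answer).__name__
answer = "forty-two"
type_after = type(answer).__name__

print(type(initial).__name__, type(year).__name__, type(compiled).__name__)
print(type_before, "->", type_after)
```

In Java, the last reassignment would be rejected by the compiler, much like a German noun paired with the wrong article.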
However, despite all the jokes and complaints about their grammars, both German and Java gain a lot on closer acquaintance. What seems like redundant complexity and malice in the very beginning turns out, with time, to be elegance, chic and a love for order. Whether you like it or not, Java is still one of the most widely used languages for Android, web and desktop applications. Similarly, although German may seem cumbersome and not the most practical language there is, we cannot deny that it did surface as a language of science, poetry and philosophy, giving the world many prominent individuals such as Schiller, Kepler or Humboldt, to name only a few. Needless to say, Goethe is still worshipped as one of the greatest geniuses of the western world, next to Chaucer, Shakespeare, or Dante. But the similarities between programming and natural languages don’t end here; they stretch far beyond.
Language systems
Each natural language is made up of several different systems: phonetics and phonology, responsible for its sounds, morphology, dealing with words and their internal structure, as well as syntax, semantics and pragmatics. Do the last three ring a bell? Well, you might have encountered them already in your programming career, because syntax, semantics and pragmatics are important building blocks that you need to master before you can say that you know a programming language. And that’s why we are going to have a look at them now.
Syntax
A common assumption in linguistics is that a human being can create and understand an infinite number of sentences in their mother tongue. This is illustrated by the set of sentences below:
(1) The cat was not chasing the mouse.
(2) The cat was not chasing the mouse, but it was running away from the dog.
(3) The cat was not chasing the mouse, but it was running away from the dog, which had been hungry for hours.
If we had all the time in the world, we could keep adding elements to the sentence forever, as there would always be some way of modifying it. We could replace “mouse” with “rat” and so on. However, we are not completely free, as there are some constraints that we have to respect for the sentence to still make sense. For instance, we couldn’t say “A mouse was chasing the cat”, since it would change the meaning completely (well, actually, something like that could be possible for instance in Polish, but this is outside of the scope for today). Neither could we swap the words randomly to create a sentence like “Was mouse not a chasing cat the”. However, syntax rules are still heavily dependent on the language. As a result, some syntaxes are simpler and some might get pretty quirky. It’s very similar in programming, where we also have the notion of syntax, whose rules vary greatly depending on the language. For example, whereas the syntax of Ruby is considered rather intuitive, that of C++ is way more complex. But let’s see that in practice by comparing two short snippets of Python and JavaScript, respectively, using function definitions as an example:
# Python: print the difference between the larger and the smaller number
def calculate(x, y):
    if x < y:
        print(y - x)
    elif y < x:
        print(x - y)
    else:
        print('The numbers are equal!')

// JavaScript: the same logic written as an arrow function
const calculate = (x, y) => {
  if (x < y) {
    console.log(y - x);
  } else if (y < x) {
    console.log(x - y);
  } else {
    console.log('The numbers are equal!');
  }
};
There are a few notable differences between the two syntaxes:
(1) In Python, functions are defined with the def keyword. In JavaScript, they are often defined by assigning an arrow function to a const variable, with the parameters placed between the = and the arrow sign => (an alternative is the traditional function declaration, which predates ECMAScript 6).
(2) In Python, blocks of code are defined by indentation, whereas in JavaScript, they are delimited by curly brackets, and the optional indentation only serves readability.
(3) Python uses the elif keyword, which is a contraction of else if. In JavaScript, the full form must always be used.
In spite of these differences between the syntaxes of Python and JavaScript, I’m pretty sure you can roughly understand both even if you only “speak” one. Even though specific parts of the syntax may look completely different at first glance, with time you start to notice that the logic behind the two is not as dissimilar as you might have initially suspected. And that’s exactly what happens between natural languages and their speakers, too. When I was learning Czech for a short time in the past few months (I’m a Polish native speaker myself), the language of my neighbours seemed pretty intimidating at first. Shortly after, however, it became clear to me that there are in fact many more analogies between the two Slavic tongues than there are differences. Be it programming or natural languages, there will always be some variation in how they are built. At the end of the day, however, you may be surprised at how many features they have in common.
Pragmatics
Now, let us look at the next language system: pragmatics, the part of linguistics that deals with how language is affected by context, that is, the situations in which it is used. The adjective “pragmatic”, on the other hand, describes solving problems practically, with realistic considerations in mind. For example, when we are lost in Berlin and ask a pedestrian “Do you know where the Brandenburg Gate is?”, the worst thing that the person may do is say “Yes, I do” and walk away. We are not really inquiring about their knowledge; we are asking them to tell us the way if they know it. It’s for purely practical purposes that we pack both into a single question. And a good answer would be something like “Take the U5 metro line”. Similarly, just as a good interlocutor knows all this stuff about effective communication, a competent programmer can draw on a whole set of best practices, or patterns, in their code. Why? Because clean code is the key to a well-organised application that is not only easy to understand, but also easy to maintain. This can include making correct use of scope (defining a variable globally if it’s used throughout the program, or only inside a certain function if we don’t need it elsewhere).
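The point about scope can be sketched in a few lines of Python. The names GREETING, greet and message are invented for this illustration only.

```python
GREETING = "Guten Tag"  # module-level (global): visible throughout the program


def greet(name):
    # 'message' is local: it exists only while greet() is running
    message = f"{GREETING}, {name}!"
    return message


print(greet("Anna"))

try:
    print(message)  # the local name is not defined at module level
except NameError:
    print("'message' is not visible outside greet()")
```

Keeping message inside the function is the pragmatic choice here: nothing else in the program needs it, so nothing else can accidentally change it.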
Semantics
The last language system where programming and linguistics coincide is semantics, which in linguistics is defined as the study of meaning. When we hear the sentence “I saw a unicorn in my garden”, at the lowest level it’s nothing but sound waves that we pick up and process in our brains so that we can make sense of them in no time. And it doesn’t only relate to full sentences but also to words: the mere word “unicorn” creates a mental image of a mythical creature with a horn in our minds. We realise this especially when someone tells us “Do not think about a unicorn”. Even if we try not to think about it, we first need to visualise what a unicorn is. To cut a long story short, it has an effect on us. Similarly, in programming, the semantics of a language is about what effect a piece of code has in practice. How does this function make our application behave? Even in HTML, we make use of numerous semantic elements like <h2> or <p> tags that give each part of our document a meaning, so that we know that each component has a concrete role to fulfil.
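To see that meaning is something separate from form, here is a small Python sketch in which the very same syntax has a different effect depending on what it is applied to. The function name combine is made up for the example.

```python
def combine(a, b):
    # The syntax is identical in every call; the meaning of + depends on the operands.
    return a + b


print(combine(2, 3))       # numbers are added
print(combine("2", "3"))   # strings are concatenated
print(combine([2], [3]))   # lists are joined
```

It is a little like a word that is spelled the same in two contexts but has to be understood differently in each.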
Low level and high level
If we were to move along a scale from the languages most resembling human talk to the most machine-like ones, we would start with high-level languages like Java, Python or Ruby, followed by lower-level ones such as C and, closest to the hardware, Assembly (Pascal is sometimes placed in this group as well, although it is better described as a high-level language and is rarely used in professional settings nowadays). Whereas the former are easy for us human beings to understand and debug, they are not directly understood by the computer and must be compiled or interpreted behind the scenes before the processor can run them. The low-level languages, on the other hand, may be less programmer-friendly, but their code maps much more directly onto the instructions of the processor, so less translation is needed. The lowest level of all, of course, is machine code itself, which operates only on zeros and ones (having said that, we should still mention quantum computing, which works with qubits instead of classical bits and may one day change how certain computational tasks are performed).
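A glimpse of this translation “behind the scenes” is available in Python itself. The sketch below assumes the standard CPython interpreter, which compiles a function to bytecode for its own virtual machine (an intermediate step, not yet the processor’s machine code). The function name difference is made up, and the exact instruction names differ between Python versions, so no output is shown.

```python
import dis


def difference(x, y):
    return abs(x - y)


# Collect the names of the lower-level instructions the function was compiled to.
opnames = [instruction.opname for instruction in dis.get_instructions(difference)]
print(opnames)
```

One readable line of source turns into a handful of terse instructions, which is roughly the trade-off between the two ends of the scale.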
Unlike in the case of programming languages, however, we cannot say that some natural languages are more low-level and others more high-level. All in all, the complexity of a language is a subjective feeling and cannot be judged impartially. But, interestingly, linguistics distinguishes between the so-called surface structure and deep structure, which could serve as an analogy here. Whereas the former corresponds to the outward form of our sentences (i.e., the one that is heard), the more abstract deep structure corresponds to the underlying meaning. Or, in other words, the deep structure is how our brain understands the sentence, and the surface structure is the actual form of words that expresses it. The deep structure is translated into the surface structure by means of transformations. To understand it better, let’s look at the following examples:
(1) James broke the window.
(2) The window was broken by James.
(3) It was James who broke the window.
(4) What James did was break the window.
I would dare to say that the relationship between the deep and surface structure is very similar to that between our high-level languages and the machine language in programming. There are often numerous ways in which one piece of functionality can be programmed. Some are simpler and easier to read, while others are more convoluted and take more intellectual effort for our human brains to decipher. Eventually, however, all of the code is translated into the same fundamental system of zeros and ones. In the same way, the above sentences differ in many elements on the surface. Linguists also like to use a lot of quirky terminology, so each of them has a name, be it the passive (2), a cleft sentence (3) or a pseudo-cleft sentence (4). However, in the grand scheme of things, all of them express a single thing: the fact that the rascal who broke (again!) that brand-new window was James!
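The programming half of this analogy can be sketched in Python with three differently worded functions that all express the same thing, the difference between two numbers. The function names are invented for the illustration.

```python
def difference_branching(x, y):
    # Spelled out step by step, like the plain active sentence
    if x < y:
        return y - x
    return x - y


def difference_builtin(x, y):
    # Relies on the built-in abs()
    return abs(x - y)


def difference_min_max(x, y):
    # A more roundabout wording of the same idea
    return max(x, y) - min(x, y)


variants = [difference_branching, difference_builtin, difference_min_max]
results = [[f(x, y) for f in variants] for x, y in [(3, 7), (7, 3), (5, 5)]]
print(results)  # [[4, 4, 4], [4, 4, 4], [0, 0, 0]]
```

Three surface forms, one underlying meaning: whichever version you read, the answer is the same.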
Ending note
So, even though programming and linguistics seem to be two totally different worlds at first, with a little bit of imagination and an open mind, the two turn out to be more interconnected than it may seem. And if you don’t let yourself become disheartened by the seeming unfriendliness of Java and German, then once you’ve mastered their syntaxes and know all of their tricks, you can proudly pat yourself on the back. Let me say even more: even those who have a negative opinion of these two languages will probably still admit that both are in unquestionable demand. Both Java and German consistently keep a high position in the rankings of the most widely chosen languages, be it for a developer building an application or for a translator at a company that does a lot of business with renowned companies based in Germany.
More about the similarities between programming and natural languages is coming soon. Next time, we will take a closer look at language change.