Beyond Hype: How Open Source, Open Science and Open Data fuel innovation

July 31, 2023

The latest book by the Czech-Canadian scientist and policy analyst Vaclav Smil, Invention and Innovation: A Brief History of Hype and Failure (The MIT Press, 2023), inspired me to write this blog entry, partly as a review, but also as an opportunity to share some reflections about its implications for the domain of open source enterprise software and on how the virtues of openness can benefit science and technology as a whole. Having been trained as a historian of science and technology myself, I am keenly aware that inventions and innovations have been essential parts of human activities for millennia, and that they are fundamentally tied to the social forces and cultural rulesets of the places and times in which they occur. Also, inventions have never followed the linear paths of progress that 19th-century positivists would have liked us to believe, an ideology which is, unfortunately, still widespread in some form or another among many influential technologists today. As a technologist, I have witnessed the software industry with all its inventions, spectacular innovations, but also inflated expectations and disappointments during the last 3 or 4 decades. Like most of my peers, I am sensitive to the fact that failures are an integral part of our technological journey. On the other hand, if we “techies” are honest with ourselves, our enthusiasm for technical novelties and innovations also means that we are quite likely to succumb to the temptations of hype. The word “hype” appeared in the English language only 100 years ago in the context of publicity and marketing – probably as a contraction of hyperbole – in the negative sense of an exaggeration, a claim that raises very high but false expectations. Semantically it is dangerously close to propaganda.
However, since the introduction of the Gartner hype cycle 20 years ago (which, curiously, is not a cycle, but never mind), which describes how technologies undergo phases representing their maturity, adoption level, and applicability, the term has taken on a less negative connotation: it still denotes hyperbole in communication, but almost in the sense of passionate advocacy. According to this view, in the age of the internet and social media, all technological innovations, especially in IT, will necessarily go through some phase of “hype” before eventually becoming mainstream.

In this article, besides offering a brief review and summary of Smil’s book, I want to explore the question of how the issues of policy raised in his essay can be applied to the domain of open source software. More generally, I want to take the opportunity of the discussion around “hype and failure” vs “invention and innovation” to widen our consideration beyond the strict domain of software, and to see how principles inspired by, and extending, those we know from open source software are gaining traction in all domains of science and technology.

Invention and Innovation vs Hype and Failure

Smil’s essay is not about failures of design, also known as “flops” (e.g. Titanic, Challenger), nor about the undesirable side-effects of inventions (e.g. negative environmental impacts). It is rather a succinct treatment of three categories of innovation failures: unfulfilled promises, disappointments, and eventual rejections. Smil contends that throughout history inventions have always fallen into one of four categories:

  1. Tools
  2. Machines
  3. New materials
  4. New methods of production, operation and management (including information gathering and processing)

But invention is not the same as innovation! For instance, the USSR had a lot of first-rate inventions, but suffered from a lack of innovation in several key areas. Conversely, China after the 1990s experienced innovation on a massive scale, which was largely based on the appropriation of foreign inventions. Similar routes were taken decades earlier by Japan and South Korea, but these nations soon became “important inventive economies”. Smil uses the number of patents granted per decade in the last 200 years as a proxy for innovation, while by the same token admitting that it is simplistic and requires further qualification. And indeed he does mention examples of stupid patents from the recent past, and points us to the EFF “Stupid patent of the month” webpage where one can find a plethora of horrifying contemporary examples, such as Amazon’s patent on white-background photography, which drew ridicule in 2014. While we are at it, and closer to open source software, it may be relevant to recall Red Hat’s Patent Promise (explained in more detail in this 2017 blog).

Smil’s second chapter focuses on “inventions that turned from welcome to undesirable”, with the prominent examples of leaded gasoline, DDT and CFCs, followed in chapter 3 by a look at inventions that were widely predicted to become dominant, but never took off, namely airships (like those of Graf Zeppelin), nuclear fission, and supersonic flight (remember the Concorde?). Regarding the second example, with the recent release of the movie Oppenheimer where the role played by the US Atomic Energy Commission in the 1950s features in the background, the larger public is again reminded of the inflated hopes that once existed around nuclear power for civil purposes and the imagined benefits nuclear engineering would bring during the second half of the 20th century. Smil then moves on in the fourth chapter to examine those “inventions that we keep waiting for”, and gives as examples “hyperloop” travel (which was first imagined in 1799!), nitrogen-fixing cereals, and controlled nuclear fusion.

Finally, the fifth and last chapter, devoted to “techno-optimism”, begins with a pointedly-formulated reminder 

that success is only one of the outcomes of our ceaseless quest for inventions; that failure can follow initial acceptance; that the bold dreams of market dominance may remain unrealized; and that even after generations of […] efforts, we may not be any closer to the commercial applications first envisaged decades ago. [p.151]

I can only concur with Smil’s statement that observations and insights about the past are likely indicators of patterns that might be repeated in the future.

This last chapter is probably the most thought-provoking part of the book. Smil offers a critical assessment of anticipated breakthroughs that are still dreams and are in his view unlikely to materialize any time soon, namely the colonization of Mars, the widespread use of brain-computer interfaces, the advent of fully self-driving cars, as well as the use of AI for medical diagnoses. In view of the recent hype around ChatGPT and other Large Language Models, his remarks about AI (“no category of modern inventions and technical advances have been so poorly and unhelpfully covered as AI” [p. 157]) will probably cause a few readers to think he is wrong on this point, but given the inflated expectations and outlandish claims we have all heard and read in the past months, his sober reminder that “neural networks are not only brittle […] but biased […], prone to catastrophic forgetting, poor in quantifying uncertainty, lacking common sense, and, perhaps most surprising, not so good at solving math problems [p. 158]” is worth keeping in mind, even though some of those assessments are currently being challenged, at least in part. I generally agree with Smil’s conclusion that “our quest for AI is an enormously complex, multifaceted process whose progress must be measured across decades and generations” and that the larger “realm of intelligence … remains beyond the capabilities of … machines” [p. 159]. Smil then approaches the myth of “exponential innovations” (inspired by the pace of innovation in microelectronics), where he offers a powerful rebuttal of its proponents' ideas, among them Yuval Harari’s “dataism”, Azeem Azhar’s “exponential age” and Ray Kurzweil’s “singularity”. He finishes the chapter on a more positive note, by offering an enumeration of the inventions that humanity needs most (on which see also his recent essay in New Scientist: The 12 innovations we need to save humanity and the planet).

Overall, Smil’s book may not provide spectacular new theories or reveal insights that are fundamentally new to historians of technology, but his short and very readable essay offers a no-nonsense treatment of a major topic that should have the attention of every technologist today.

What does it mean for Open Source Software?

Following on Smil’s clear-sighted exposition, are there any lessons to be learned from his book that can be applied to the world of open source software? Are the principles underlying the open source movement, which is now building the foundation of our digital economy, less prone to failures and more apt to drive invention and innovation? This cannot be answered by a yes or no, of course, the reality behind those questions being quite complex. Yet there are certain attributes of the Open Source model that can help elevate the associated technological endeavors and minimize the risks of outright failures. Open Source projects are by nature collaborative and transparent. So far so good. But adoption is always a key issue. Even with the backing of big technology corporations like Google, Amazon, Microsoft, Red Hat or SAP, the popularity and wide adoption of open source projects would be unthinkable without the driving force of online communication and collaboration platforms (GitHub, GitLab, Stack Overflow, Hacker News, Reddit, Slack, Discord and countless others) and the amplifying effect of technical conferences, books and magazines. This is obviously where the danger of hype lurks. But I would argue here that hype as a phenomenon in our industry has itself been totally over-hyped! The large majority of IT professionals do indeed have the maturity and experience to navigate the waters of innovation without heeding the siren’s call of exaggerated claims and false expectations. And contrary to what the Gartner Hype Cycle implies, most technological breakthroughs do not, in the end, go through a phase of hype at all.

Of course we could fill this article with countless examples of over-hyped and failed open source projects. Some of those would make us smile. In most cases it is likely that only a few readers would remember those that have long fallen into oblivion. (To mention a single example: while researching the topic I came across the Google Wave project, which I must admit I had totally forgotten about.) Some technologies that once generated a lot of excitement for a short period of time have nonetheless become established and ubiquitous. Other technologies are hyped early on but take a very long time to mature. In general, the open source ecosystem around those technologies has closely reflected the trends, with projects surging, flattening and being abandoned along the way.

It lies in the nature of the open source model to encourage competing ideas and a diversity of projects spread along the full spectrum, from very mature and stable to highly experimental and dynamic. GitHub currently hosts 100 million repositories and serves 65 million developers. 40 million repositories were created in 2020 alone! Of course the real number of active open source projects is nowhere near that big, but still, this gives an idea of the overall order of magnitude, which is staggering. By design, not all ideas or technologies succeed in being adopted and thus many projects end up being abandoned, with the typical result of being archived and made read-only on GitHub. A fascinating aspect of open source communities is the way technical or strategic decisions are made, which is very different from how such processes traditionally happen in closed environments. Participants and members of such communities adhere to a social contract that binds them to follow clearly outlined processes and methods of collaboration. Those communities benefit from the advantages of collective wisdom: highly qualified people with diverse backgrounds come together and contribute their opinions and expertise without being too strongly influenced by the immediate pressure of a single line of management.

Open Source today is very different from what it was when I had my initial experiences as an active contributor within the Linux, TeX and Perl communities ca. 20 years ago. The way software is planned, developed, tested, distributed and consumed has changed tremendously. Even the last decade has seen significant changes. A good way to take the pulse of those evolutions is by reading the yearly Octoverse report by GitHub on the State of Open Source, as well as Red Hat’s The State of Enterprise Open Source. Yet the ideas that were at the core of open source 30 years ago are still valid and are even being adopted in other fields, like journalism, intelligence, and, to come back to our initial discussion, science!

Open Science

Last January, the White House declared the year 2023 as the Year of Open Science. Shortly thereafter a similar announcement was made by NASA in the journal Nature. Before moving further into the topic, let us have a look at how NASA defines Open Science:

Open Science is the principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity.

Unsurprisingly, this definition is very similar to and strongly inspired by the one usually used for open source in the software industry (e.g. this definition or this one) or how public administrations define it in the context of digital sovereignty. As a movement, Open Science goes hand in hand with Open Data and Open Access. Open-Source Hardware and Open-Source Labs likewise play an increasingly important role in many scientific areas. The use of open source software, moreover, has truly become an integral and fundamental part of the Open Science movement. In the course “Open Science 101” offered by NASA, there is a full chapter devoted to open source software. It is also no accident that this course is made available as a community-driven project on GitHub. It is commonplace nowadays for scientific articles, even in disciplines far removed from computer science, to be published with one or more closely associated repositories on GitHub or GitLab, where code and data are provided to make it possible for other researchers to reproduce and expand on the results presented in a paper.

One of the most fascinating manifestations of Open Science and Open Source I have recently encountered was detailed in an article on GitHub ReadME devoted to how the field of nuclear physics is embracing openness and open source software. You read that correctly: nuclear physics, the very scientific discipline that has been most strongly associated with closed walls and secrecy during the past century! But worry not: you won’t find recipes for building weapons on GitHub.

Conclusion

Open practices, whether in the field of software engineering, or more generally in science and technology, are not guarantees against hype and failure. One could even argue that these are inherently part of the open model. Innovation needs enthusiasm and passion and cannot happen without risks and experimentation. On the other hand, open source and open science, with their shared virtues of accessibility, transparency, collaboration, distributed control and decision making, and also reproducibility, bring unmatched advantages when it comes to designing, implementing and testing within different specialized domains. These processes are less likely to suffer from ill-informed decisions and noxious implementation paths. Simply put: Open Science is better science, just like open source software is better software!