Notes Information Apocalypse

The Missing Literary Traditions of Programming

There has been huge interest in the 'size matters' problem of large code bases, general issues of software bloat, and why writing expressive code is so hard. I want to look at these problems by setting aside the properties of the programming languages themselves and thinking about the context of what is actually written in these languages.

Clearly, Unix, Linux, BSD, emacs, awk, sed, bash and their progeny form an archive, a historical connection that continues to support an almost incomprehensibly vast cloud of internet infrastructure, but a reasonable argument can be made that these programs are also a literary tradition. This is by no means a unique view - a lot of people have discussed the relationship of Unix to the culture of the word, and these views have even led to some controversy. Richard P. Gabriel (author of the infamous Worse Is Better) has often talked at length about the poetry of programming and the relationship between programming and writing. Donald Knuth has written at length on the importance of what he calls literate programming.

It is possible to push the idea of code as literature too far - many of the important and significant developments in computer science and software design cannot be considered a literary tradition at all, or better still, they could be considered part of the wider literary tradition of set theory, modern mathematics, science, and natural philosophy. However, it is useful to evaluate developments in the history of programming languages by considering the influence of literature on the evolution of the English language. English has become the global information currency for science and technology, and has spawned completely unexpected new forms such as rap in hip hop and Joycean stream of consciousness prose. All of this is possible because the English language has a massive body of literature for new speakers and writers to read and listen to, draw upon, understand, and grow from.

Hands wring and bloggers, writers and programmers (myself included) continue to expectorate countless phlegmatic gobs of words debating questions like "why is writing code so hard?". Yet, simply blaming closed corporate software models for this mediocrity is considered heresy by many. But just pause and think about it for a moment...

Imagine a programmer who had never downloaded and explored the open source code that was considered an outstanding work in the particular field they were working in. Would they be as capable of producing a master work in this field? Would they be able to even judge what a master work was?

All of the greatest works of literature, in prose and poetry alike, are published and accessible on physical paper and sometimes online. They're not often free, but once you get hold of a book, you are allowed to read every single word. Despite the fact that code can be considered literature, software does not work in the same way as regular literature. Software multiplies and transforms information as it runs, but it is generally created in a static written form.

When this code (writing) is locked away, there is a certain trend of literary innovation that is never accessible to voracious young readers in search of inspiration and guidance. Instead, it remains hidden, used to create a fulcrum for extracting value that helps some companies to become vastly rich. Nothing wrong with that, is there? But it also means that there is no way aspiring authors can read it and no way that regular end users can judge whether the software they are paying for is a great work of literature or a mediocre hack job. They can only try to manipulate dodgy human factors via HR departments, in the hope that, by proxy, selecting the right team of individuals will lead them to a great piece of literature. Boom Bust! Wrong strategy. Due to bureaucratic and corporate sales pressures, mediocrity triumphs and featuritis rules the software industry. Adding new baubles and warts'n'all complexity trumps clarity, internal consistency and elegance. Given a few years and a few marketing dollars, some of the messiest, most mangled and flawed insanity that has ever been written becomes installed on millions of computers around the world, sporadically plunging the entire industry into a morass of technical debt. The worse strategy succeeds, but it also means that very little of the truly amazing work that is being done makes the leap from the lab into the living room.

It is possible to change and break out of this cycle through language evolution. My view is that we don't have to see the software industry as swallowing marketroid crap and digesting whatever shit falls out of the assholes of the early adopters into the mouths of consultants and stomachs of larger enterprise companies, before passing through the bowels of government. If enough programmers assert their craft as a literary art, they will team up with scientists, hackers, and concerned citizens to change the world and invent their own business models (whether for better or for worse is up to their personality and the views of many others). Computing is ubiquitous in our society, and programmers are collectively one of the most powerful groups of change agents imaginable.

Authors face a tricky tightrope walk between writing because they have to, writing to enlighten and influence others, writing to entertain, and writing to generate intellectual property. Sometimes market competition does lead to fantastic improvements in quality, but when the entire industry gravitates en masse towards hoarding and locking away its best code as property ownership rather than authorship, what hope is there for a much wider improvement in programming as a craft? What happened to the old school world of hackable machines where any kid could plug away and tinker with the basic architecture of computing? Since the 1980s, hardware acceleration and consumerism have taught people to expect the basic tools and products they use every day to be locked boxes - ephemeral and throwaway. Convenience over conservation.

At this point, many programmers would probably want to argue that it is the result that matters, not whether the original code is beautiful or not. We should not lose sight of this. Yet who could deny that badly formed, incoherent code leads to badly working software? The relationship between written code and the resulting software is so subtle and delicate, it seems there's no way out of this contradiction unless we accept that there must be a literary component to code, whether we can define it or not.

Programmers are driven mad by the tension between writing the simplest imperative thing that can possibly work and the organic desire to name, classify, and annotate types to create a brutalist architectural utopia of static structure. Larger, regimented software projects can easily gravitate to an extreme position on this spectrum. Statically annotated object oriented languages can utterly fail on a global ecological level in large software projects, just as giant mega-concrete architectures create horrific urban ghettos. The irony here is that Perl, the most poetic language of them all, has probably contributed as much as Java, if not more, to this excruciating disintegration of the imagination. Perl deviates in precisely the opposite way, towards slums and shantytown architectures, disgustingly organic in comparison to the symmetric brutalist beauty of Java's object towers, but equally (if not more) torturous on a larger software project. Now that these problems are out in the open and widely discussed, it is becoming clearer that the very fact we can use such architectural analogies or metaphors to describe large software projects is a strong indication that something is going badly wrong.

The deeper problem here is an abstract confusion of how granularity should exist in codebases. Many of the scaling problems of programming languages are caused by overly rigid syntax and the weakness of file, directory, and package hierarchies for organizing code to express a cohesive purpose. The cult of modularity and reusability has blinded us to these weaknesses. If it wasn't for the creative influence of authors like Ward Cunningham who are prepared to question any assumption about code style, then we might be completely lost.

Prevailing wisdom in the software industry, by and large, focuses on helping programmers become writers of adequate code rather than great code. But adequate writing is never enough to transform, improve, and change the conditions of the language itself, which is what leads to the possibility of truly creative and expressive writing. If a language can be remade and tweaked as it is written statically, as well as when it is run, then syntax will no longer constrain designers; it will be an expressive tool. This hasn't happened yet, despite the fact that the technology and theory to do this have existed for much longer than even Java and Perl have been around.

It could be that the whole reason why the software industry is still slowly struggling to reinvent Lisp and Smalltalk is a corporate legacy that has locked away and standardized what could otherwise have evolved in the organic direction of popular literary traditions. If computer code had been more fully accepted as literature from the very outset, this would have allowed the shift towards more expressive languages to occur much sooner. We know that the evolution of spoken and written languages is shaped by the way they are used through feedback cycles of patois, slangs, and dialects, as well as via creative literary expression. Static thinking and metaphors of construction, de-facto architectures of dirt and <a href="/software-is-not-made-of-bricks">software as bricks</a> have crippled the development of programming languages to such a great extent that there has never been a brilliant literary revolution in the way that Shakespeare influenced the English language (I'll leave it up to you, the reader, to consider the alternative view that we could describe either Lisp or Smalltalk as being the Shakespearean influence on the history of programming languages).

The reasons why the revolutionary software environments of Lisp and Smalltalk failed are largely cultural, but the main effect was that many lessons learned did not flow into the mainstream programming culture until much later. In the background, during the 1970s and 1980s, the whole internet infrastructure was already evolving towards openness and massive global interconnectivity. All the kids were watching E.T. and Ghostbusters and playing simple video games, having no idea of what was about to hit them. On the early computers of this time, children were exposed to BASIC and assembly language, rather than Smalltalk and expressive object oriented code. That was a very positive thing in many ways, except that it created an entire generation of programmers who still struggle to understand code as literature and treat accidental complexity and verbose syntax as inevitable. New programmers now will be far less liable to fall into this trap, so it will be interesting to see what the next few years might bring.

The Smalltalk language was never intended as a pedagogic statement of how to design better software. A huge part of its purpose was to support experiments into new computer interfaces such as the mouse, drawing and graphics painting, and overlapping draggable windows. All those things are now so taken for granted that they have become invisible, like grammar is to the speakers of a popular language. But Smalltalk (both through accidental and deliberate means) provided the prototype and template for much of the last 30 years of personal computing.

Unfortunately, one of the most important features of the Smalltalk language has yet to reach widespread acceptance, even though it is probably the single feature that matters most if computer languages are to come closer to natural languages in their expressive and creative power. A lesson learned from natural languages is that they bootstrap themselves; they use themselves to evolve. In the programming world, languages that can do this are often described as 'self-hosted' languages, which in essence means that the language can be written inside itself, rather than built up on top of an external lower level language (read about meta-circular interpreters for a better explanation of the nuances and importance of this). One wonders if this ability of languages to host themselves is nothing less than the whole point of having computer languages in the first place. Rather than shedding tears of lament for losing the 'turtles all the way down' Lisp machines or the 'objects as membranes, membranes as actors' beauty of Smalltalk, we should strive to understand what is so important about these concepts and how they can help us clean up the mess that we are currently in.
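To make the idea of a language interpreting itself a little more concrete, here is a minimal sketch in Python (my choice purely for illustration, not a language discussed above). It evaluates a tiny subset of Python expressions using code written in Python. The names `evaluate`, `run`, and `BINARY_OPS` are hypothetical, and the sketch leans on Python's own parser (the standard `ast` module) and on the host's closures, so it is a toy in the spirit of a meta-circular interpreter, a long way from a fully self-hosted system.

```python
import ast
import operator

# Hypothetical table mapping Python AST operator nodes to their implementations.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


def evaluate(node, env):
    """Evaluate a small subset of Python expressions, using Python itself."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, env)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        # Variables are looked up in a plain dictionary environment.
        return env[node.id]
    if isinstance(node, ast.BinOp):
        op = BINARY_OPS[type(node.op)]
        return op(evaluate(node.left, env), evaluate(node.right, env))
    if isinstance(node, ast.Lambda):
        params = [a.arg for a in node.args.args]
        # The interpreted lambda becomes a host closure over an extended environment.
        return lambda *values: evaluate(
            node.body, {**env, **dict(zip(params, values))}
        )
    if isinstance(node, ast.Call):
        func = evaluate(node.func, env)
        args = [evaluate(a, env) for a in node.args]
        return func(*args)
    raise NotImplementedError(type(node).__name__)


def run(source, env=None):
    """Parse a Python expression and evaluate it with the interpreter above."""
    return evaluate(ast.parse(source, mode="eval"), env or {})


print(run("(lambda x: x * x + 1)(6)"))  # 37
```

The point of the sketch is only that the interpreted language and the implementing language are the same one, so every construct the evaluator understands could in principle be used to extend the evaluator itself.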

The reason why we still read Shakespeare today is that he blew English apart with incredibly expressive metaphors and new uses of words that have since given the average English user a much richer and more expressive way of communicating. Programming languages need to be open to this potential too, but it can't happen until we establish greater literary traditions that blast away the current situation where C is still the canon and most of the greatest code in existence has only ever been read by a small handful of individuals.

In the programming and computer science community, I believe our influential leaders and educators need to place a much greater emphasis on code as literature, encouraging programmers and designers to think about their work from the same motivations as a writer of great novels, a poet, or a cutting edge journalist would. The end result, making users happy, is always the goal, but in our frameworks, libraries, and languages, we should be pushing for continuous expressive improvement and we should be making the code open to as many new programmers as possible to learn from.

I don't believe for one second that the closed source model is the only way to develop innovative software. Innovation simply requires keen investment and a carefully managed collaborative environment that enables authors to take the biggest possible risks, while remaining below a certain threshold of despair that will destroy a project.

Allowing creators to feel free enough to fail enables them to take the risks that lead to success. Innovative software may benefit from beginning behind closed doors, but once an idea is solid and starting to stabilize, it can quickly be led into the open. What is often forgotten is that code can still be open without being released for free under a general public license (as if © had had a small influence on written literature? ha!). It can be paid for and supported by a corporate organization or managed by a non-profit. If the code is good enough, there are few limits.

It is simply not true that open source programmers have their heads stuck in a 1970s intellectual framework; it is only that the world is still dominated by a literary tradition of code built on those particular works of the 1970s. To deny the value of developing new and more radical open source literature is to look forward to software and programming languages continuing to languish for another 30 years.