The Secret Weapon That Built $300 Billion in Startups


Ryan Randomski


“Design patterns are bug reports against your programming language.”
– Peter Norvig: Director of Research at Google

During a particularly quotable Q&A session with Java inventor James Gosling, someone asked him: “If you could do Java over again, what would you change?” “I’d leave out classes,” he replied. After the laughter died down, he explained that the real problem wasn’t classes per se, but rather implementation inheritance (the extends relationship). Interface inheritance (the implements relationship) is preferable. You should avoid implementation inheritance whenever possible.

What’s the big deal anyway? At first glance, implementation inheritance doesn’t seem incredibly harmful compared to interface inheritance. And this is coming from the designer of a language that shipped with null references, the feature Tony Hoare later called his billion-dollar mistake. At its core, implementation inheritance adds complexity to a system that’s entirely incidental, which is a fancy way of saying it’s your fault. This is a well-understood phenomenon popularized by the 1994 book Design Patterns by the infamous Gang of Four. Now it’s a commonly cited principle of object-oriented programming, often called composition over inheritance or, less commonly, the composite reuse principle.
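To make “incidental complexity” concrete, here is a minimal sketch of the classic fragile-base-class problem. It is loosely modeled on the InstrumentedHashSet example from Joshua Bloch’s Effective Java, rewritten in Python for brevity. `Bag`, `CountingBagInherited`, and `CountingBagComposed` are hypothetical classes invented for illustration. The subclass silently depends on an internal detail of its parent: `add_all` happens to call `add`. The composed version depends only on the parent’s public interface.

```python
class Bag:
    """A tiny collection. Its add_all happens to be built on add, an internal detail."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def add_all(self, items):
        for item in items:
            self.add(item)  # self-use: a subclass overriding add will see these calls


class CountingBagInherited(Bag):
    """Implementation inheritance (extends): count additions by overriding both methods."""

    def __init__(self):
        super().__init__()
        self.added = 0

    def add(self, item):
        self.added += 1
        super().add(item)

    def add_all(self, items):
        items = list(items)
        self.added += len(items)
        super().add_all(items)  # Bag.add_all calls self.add, so every item is counted twice


class CountingBagComposed:
    """Composition: hold a Bag and forward to it; Bag's internals cannot call back into us."""

    def __init__(self):
        self.inner = Bag()
        self.added = 0

    def add(self, item):
        self.added += 1
        self.inner.add(item)

    def add_all(self, items):
        items = list(items)
        self.added += len(items)
        self.inner.add_all(items)  # inner.add_all calls inner.add, not CountingBagComposed.add

    @property
    def items(self):
        return self.inner.items


inherited = CountingBagInherited()
inherited.add_all(["a", "b", "c"])
composed = CountingBagComposed()
composed.add_all(["a", "b", "c"])
print(inherited.added, composed.added)  # 6 3
```

Nothing in `Bag`’s public interface says that `add_all` is built on `add`. Even so, the inherited counter reports 6 additions for 3 items. The wrapper forwards calls instead of overriding them, so it reports 3 however `Bag` happens to be implemented internally.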

The book is typically called Gang of Four for short because the full title, Design Patterns: Elements of Reusable Object-Oriented Software, is maddeningly long. Its authors were such titans in the software space that they were nicknamed after the original Gang of Four. That was a faction of Chinese Communist Party officials who rose to power during the Cultural Revolution, which began in 1966, and who were later charged with a series of treasonous crimes. The original Gang of Four was so powerful that it remains unclear which decisions were carried out by Mao Zedong and which by the gang. Appropriately enough, the real Gang of Four is blamed for “the worst excesses of the societal chaos that ensued during the ten years of turmoil. Their downfall on October 6, 1976, a mere month after Mao’s death, brought about major celebrations on the streets of Beijing and marked the end of a turbulent political era in China.” It’s one of the first things you’ll read on their Wikipedia page. I really enjoy the new GOF’s book, but I have my own criticisms, shared by many of my software peers, that I’d be happy to dive into. Among the most common complaints is that, in its essence, the book is a catalog of design patterns, which is exactly what Norvig’s quote above calls bug reports against your programming language.

“Think of the GOF as helping losers lose less.”
– Richard P. Gabriel: Coined the phrase “Worse Is Better”

Google’s Director of Research, Peter Norvig, has a nice presentation from 1998 that’s still commonly shared today. You can find it on his website. Even if you don’t have time for the presentation itself, I’d bookmark the site: it’s like taking a time machine back to the 90s and 2000s, when everybody had a personal page. It’s fascinating, fun, and delightfully passive-aggressive at times. For example, when you visit the link, you’re greeted with three viewing options, and the last one is “1998-style html (not recommended).”

In the presentation, he demonstrated that in dynamic programming languages, in this case Lisp and Dylan, the problems design patterns solve are often either eliminated or made invisible. In short, if your code is highly repetitive, that’s a sign you could be reaching for a higher-order construct. There’s always another level of abstraction to hop onto. The presentation is essentially a case study of the book Design Patterns. In it, he showed that 16 of the 23 design patterns in the book had “qualitatively simpler implementations in Lisp or Dylan than in C++ for at least some uses of each pattern.” Now I know what you’re thinking: simpler than C++ is not a very high bar. But these are the same patterns we use in Java, C#, TypeScript, and the rest of the cast. Dynamic functional languages eliminate most of the common software challenges we face in those languages too. To show just how prevalent this is, he walked through nearly every design pattern you’ve ever heard of. Here are the 6 dynamic language constructs that make 16 design patterns unneeded or invisible:

  1. First-class types (6): Abstract-Factory, Flyweight, Factory-Method, State, Proxy, Chain-Of-Responsibility
  2. First-class functions (4): Command, Strategy, Template-Method, Visitor
  3. Macros (2): Interpreter, Iterator
  4. Method Combination (2): Mediator, Observer
  5. Multimethods (1): Builder
  6. Modules (1): Facade
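As a rough illustration of the “first-class functions” row, here is the Strategy pattern written twice in Python. The first version is a class hierarchy; the second is a plain function parameter. The names (`DiscountStrategy`, `PercentOff`, `percent_off`, `checkout`) are made up for this sketch and are not taken from Norvig’s slides. It only shows how a pattern can become invisible once functions are values. Prices are integer cents to keep the arithmetic exact.

```python
from typing import Callable, List


# Classic Strategy pattern: an interface plus one class per algorithm.
class DiscountStrategy:
    def apply(self, price: int) -> int:
        raise NotImplementedError


class NoDiscount(DiscountStrategy):
    def apply(self, price: int) -> int:
        return price


class PercentOff(DiscountStrategy):
    def __init__(self, percent: int):
        self.percent = percent

    def apply(self, price: int) -> int:
        return price * (100 - self.percent) // 100


def checkout_with_strategy(prices: List[int], strategy: DiscountStrategy) -> int:
    return sum(strategy.apply(p) for p in prices)


# With first-class functions, the "pattern" shrinks to a parameter of function type.
Discount = Callable[[int], int]


def percent_off(percent: int) -> Discount:
    # A closure captures `percent`, replacing the PercentOff class.
    return lambda price: price * (100 - percent) // 100


def checkout(prices: List[int], discount: Discount = lambda price: price) -> int:
    return sum(discount(p) for p in prices)


prices_in_cents = [10_000, 5_000]
print(checkout_with_strategy(prices_in_cents, PercentOff(10)))  # 13500
print(checkout(prices_in_cents, percent_off(10)))               # 13500
```

Both versions compute the same totals. The function version simply has no extra types to name, register, or inherit from.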

I read GOF about 10 years ago, so I don’t remember every pattern off the top of my head, but that list covers just about every design pattern I used in enterprise codebases. So are we all destined to program in Lisp? Is that the secret weapon? In my opinion, not necessarily. Lisp was actually dubbed the secret weapon by Eric S. Raymond, but it was popularized as a “secret weapon” by Paul Graham in his essay Beating the Averages. It’s the story of how he created Viaweb, which he sold to Yahoo, where it became Yahoo Stores. The price was about half a million shares of Yahoo stock, valued at about 50 million dollars. That sale ultimately launched his career as one of the most legendary angel investors in Silicon Valley, with his famous startup factory, Y Combinator. YC is the biggest startup engine in the world, having started over 3,000 companies. Among them are Stripe, Airbnb, Cruise, PagerDuty, DoorDash, Coinbase, Instacart, Dropbox, Twitch, and Reddit, to name a few. His team coaches the companies it invests in, and it also runs a teaching program, Startup School. The combined valuation of YC companies is $300 billion.

He has more thoughts on Lisp as well: he’s the author of ANSI Common Lisp, a book that teaches you very powerful ways to think about programs. He even ties Lisp to what he calls the hundred-year language. His reasoning is that as computers become more powerful, we are constrained more by our ability to control them than by how efficiently they run. At this point, I can get several terabytes of storage extremely economically, which wasn’t true in the 80s. Let’s never forget that the iPhone 6’s processor could guide 120 million Apollo-era spacecraft to the moon. Computing speed and throughput grow obscenely fast, faster than our innovation does. There’s an abundance of potential.

In Paul Graham’s essay The Hundred-Year Language, he points out that programming languages evolve in a tree pattern, just as species do in biological evolution. Languages such as Cobol have no intellectual descendants. He predicts the same fate for Java, as do we at Lambda Group. Java is no doubt a highly successful language, but from an evolutionary standpoint, it doesn’t have long-term intellectual potential. This raises the question: which languages do have strong evolutionary longevity? And can we use evidence of this to predict the programming languages of 100 years from now?

Much like the evolution of a species, evolutionary trees branch out and branch back in. You have descendants, and descendants of those descendants, but then someone else’s family tree can get intertwined with yours. They’re called in-laws. In programming languages, separate lineages can likewise converge into hybrid languages, just as two families merge through marriage. Interestingly enough, Fortran’s descendants are merging with Algol’s descendants. Lisp has had countless descendants nonstop since it was first conceptualized in the summer of 1956. For this reason, among others, Graham’s personal stab at the 100-year problem was Arc, a dialect of Lisp. The idea was to have an axiomatically lightweight programming language. As a proof of concept, Paul Graham built the popular news website Hacker News in Arc, and it receives about 10 million page visits a month. You don’t need a fast language to do big things: Arc makes no optimizations for speed whatsoever. It is optimized purely for implementation simplicity. It’s all about how simple the code is and nothing else.

You may start to really see the power of Lisp in the talk The Most Beautiful Program Ever Written by William Byrd. Byrd is one of the premier researchers in constraint logic programming, program synthesis, and many other fields, as well as a co-author of The Reasoned Schemer. William Byrd and Alan Kay both speak very highly of a few lines of Lisp code in the Lisp 1.5 Programmer’s Manual. Alan Kay famously, and somewhat ostentatiously, called this funny little piece of code “Maxwell’s Equations of Software.”

Why such a lofty title? Maxwell’s Equations form the foundation of classical electromagnetism, classical optics, and electric circuits. They provide a mathematical model for electric, optical, and radio technologies, such as power generation, electric motors, wireless communication, lenses, radar, etc. And this code? What is it exactly? It’s a Lisp interpreter written in Lisp itself: Lisp in Lisp. William Byrd has dedicated his career to studying this code ever since he first saw it.
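Here is a minimal sketch of the idea behind that code. It is not the Lisp 1.5 listing, and because it is written in Python rather than Lisp, it is not truly metacircular. It does follow the same eval/apply shape. `evaluate` dispatches on the form of an expression. `apply_fn` either calls a host primitive or evaluates a closure’s body in an extended environment. The supported subset (`quote`, `if`, `lambda`, application, integers, and four arithmetic primitives) and all function names are assumptions chosen for brevity.

```python
import operator


def tokenize(src):
    return src.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens):
    """Parse tokens into an s-expression: an int, a symbol (str), or a nested list."""
    tok = tokens.pop(0)
    if tok == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(read(tokens))
        tokens.pop(0)  # discard the closing ")"
        return expr
    try:
        return int(tok)
    except ValueError:
        return tok


def evaluate(expr, env):
    if isinstance(expr, int):            # numbers evaluate to themselves
        return expr
    if isinstance(expr, str):            # symbols are looked up in the environment
        return env[expr]
    head = expr[0]
    if head == "quote":                  # (quote x) returns x unevaluated
        return expr[1]
    if head == "if":                     # (if test then else), using Python truthiness
        _, test, then, alt = expr
        return evaluate(then if evaluate(test, env) else alt, env)
    if head == "lambda":                 # (lambda (params...) body) builds a closure
        _, params, body = expr
        return ("closure", params, body, env)
    fn = evaluate(head, env)             # otherwise: function application
    args = [evaluate(arg, env) for arg in expr[1:]]
    return apply_fn(fn, args)


def apply_fn(fn, args):
    if callable(fn):                     # a primitive borrowed from the host language
        return fn(*args)
    _, params, body, closure_env = fn    # a closure: extend its captured environment
    return evaluate(body, {**closure_env, **dict(zip(params, args))})


GLOBAL_ENV = {"+": operator.add, "-": operator.sub, "*": operator.mul, "<": operator.lt}


def run_lisp(src):
    return evaluate(read(tokenize(src)), GLOBAL_ENV)


print(run_lisp("((lambda (x y) (+ x y)) 2 3)"))  # 5
# No define: recursion by passing the function to itself.
print(run_lisp("((lambda (f n) (f f n)) "
               "(lambda (self n) (if (< n 2) 1 (* n (self self (- n 1))))) 5)"))  # 120
```

The two mutually recursive functions, `evaluate` and `apply_fn`, are essentially the entire semantics of this little language. That compactness is what makes interpreters like this so easy to study, modify, and stack on top of one another.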

This code makes it very easy to build an infinite tower of interpreters that interpret one another. Much as Maxwell’s Equations do for electromagnetism, this code forms a foundation of computer science. It gets at the heart of program synthesis, or programs that write programs. Paul Graham used metaprogramming to build what became the $50 million Yahoo Stores faster than anybody else could in the dot-com era, a skill he calls his secret weapon. Without the ability to program with these more powerful constructs, he would have lost the competitive advantage he says was crucial to launching the company. The sale of that company ultimately gave him the capital to start YC, which has launched over 3,000 startups.

Lisp is worth learning for the profound enlightenment experience you will have when you finally get it; that experience will make you a better programmer for the rest of your days, even if you never actually use Lisp itself a lot.
– Eric S. Raymond, quoted by Paul Graham in Beating the Averages

Graham adds that this is the same argument you hear for learning Latin. It won’t get you a job, but it will improve your mind and make you a better writer in languages you do want to use, like English.

Now let’s talk about the Blub Paradox. It doesn’t only apply to programming, but to many domains; it’s the paradox of abstraction itself. Blub is a hypothetical language that falls right in the middle of the abstraction continuum. You have machine code toward the bottom, and Java much higher. Java is more abstract than machine code, so that’s how it earns its place. Blub is more abstract than Java and machine code, but less powerful than others. The Blub programmer is content programming in Blub because they’ve built a successful career on it. They look down the abstraction continuum and see tedious work and repetitive actions. They look up and see a language that resembles Blub, but with a bunch of strange concepts that don’t appear practical or relevant. They think: what’s the point? This has nothing to do with solving the problem. Machine code and Java programmers look up at Blub the same way. They see a language like theirs, but with all of these senseless constructs that don’t seem to get to the heart of the problem. And therein lies the issue: you can see down the abstraction continuum, but not up.

Paul Graham used macros, a language feature unique to Lisp dialects and other homoiconic languages, to great success. In fact, 20-25% of the code behind his 50 million dollar company was macros. That means 20-25% of his code was something that wasn’t easy to do in other languages. He was higher on the abstraction continuum, so his competition didn’t stand a chance. A few years after the acquisition, Yahoo rewrote the code in C++ and Perl. Thanks to Paul Graham’s lavishly expensive golden handcuffs, he was fortunate enough to be at the heart of the operation and can tell us why.
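Python has no Lisp-style macros, so the following is only a sketch of the underlying idea. Lisp code is itself a data structure; here, nested Python lists stand in for s-expressions. That means a macro is just a function that receives unevaluated code and returns new code before anything runs. Both macros are illustrative assumptions, not real library APIs. One is a hypothetical `unless` with an else branch. The other is loosely modeled on Clojure’s `->` threading macro.

```python
from typing import Any, Callable, Dict, List

Expr = Any  # an s-expression: an int, a symbol (str), or a list of Exprs


def unless_macro(args: List[Expr]) -> Expr:
    # Hypothetical (unless test then else)  ==>  (if test else then)
    test, then, alt = args
    return ["if", test, alt, then]


def thread_first_macro(args: List[Expr]) -> Expr:
    # (-> x (f a) (g b))  ==>  (g (f x a) b): each step receives the previous form as its first argument
    expr, *steps = args
    for step in steps:
        step = step if isinstance(step, list) else [step]
        expr = [step[0], expr, *step[1:]]
    return expr


MACROS: Dict[str, Callable[[List[Expr]], Expr]] = {
    "unless": unless_macro,
    "->": thread_first_macro,
}


def macroexpand(expr: Expr) -> Expr:
    """Rewrite code before it is evaluated by expanding every macro call, nested ones included."""
    if not isinstance(expr, list) or not expr:
        return expr
    head = expr[0]
    if isinstance(head, str) and head in MACROS:
        # The expansion may itself contain macro calls, so expand it again.
        return macroexpand(MACROS[head](expr[1:]))
    return [macroexpand(sub) for sub in expr]


print(macroexpand(["unless", ["<", "x", 0], "x", ["-", 0, "x"]]))
# ['if', ['<', 'x', 0], ['-', 0, 'x'], 'x']
print(macroexpand(["->", 5, ["+", 1], ["*", 2]]))
# ['*', ['+', 5, 1], 2]
```

The expander rewrites `unless` into `if`, and `->` into nested calls, before evaluation. The language gains new syntax without any change to its evaluator, which is the kind of leverage Graham describes, in miniature.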

I believe the pointy-headed bosses were the driving force in the port. When I worked at Yahoo, management were nervous about the software being written in Lisp because they thought it would be hard to find programmers who knew it. Not so much that they couldn’t find any, I think, but that because it was a comparatively rare skill, management would have less leverage over the hackers. When skills are not a commodity, employees aren’t hot-swappable.
– Paul Graham, after selling Viaweb to Yahoo

The choice was made for leverage over the labor market and, thus, leverage over the programmers. Management was more afraid of relying on a rare skill set than of writing a makeshift Lisp interpreter in C++ and shipping it to customers on a major platform. Yes, they really did write a Lisp interpreter in C++ to pull this move off; that’s how the port was accomplished on the technical end. So, is Lisp the secret weapon? Not really. The secret weapon is abstraction itself. As long as you remain higher on the continuum, you will have tremendous leverage over the programmers coding in Blub. This is the difference between Java and machine code: you may have similar abilities to your competition, but you have significantly more leverage. You’re working with power tools while they’re left in the stone age.

Here’s another step up the abstraction ladder from Lisp itself. Actually, it’s more like a leap. Remember how William Byrd dedicated his career to studying the ostentatiously named “Maxwell’s Equations of Software”? His primary work today is on a system called miniKanren. The name kanren comes from a Japanese word (関連) meaning “relation.” The partners of Lambda Group carefully study William Byrd’s Ph.D. dissertation, Relational Programming in miniKanren: Techniques, Applications, and Implementations. It’s a roughly 300-page description of how to implement, in a very simple and practical way, a playground for nearly any kind of relational programming you want. In short, miniKanren is a small language, embedded in a host language, for writing logic programs. Why build this? In his case, you could make a compelling argument that it’s the awe and wonder of it all. Much like one of Lambda’s partners, William Byrd learned to program at age 9, when his father bought him a CoCo.

The CoCo, Radio Shack’s TRS-80 Color Computer, was released in 1980 and was at one point among the most popular home computers around. Radio Shack even taught classes on how to use these things. If you want to play around on one, you don’t need to buy one: thanks to the magic of emulators, you can run one of these bad boys right in your browser. So you too can write William Byrd’s first-ever computer program from 40 years ago.

I guess you could say he bricked it: error right off the bat! It might not look impressive at first glance. He opened up the machine and immediately asked it for something quite optimistic. Perhaps the code isn’t as impressive as the optimism, and keeping that optimism up for 40 years is the kind of passion for computing that truly drives innovation. This is one of the most historically innovative programs ever written, not because of what it does, but because of where it leads.

When someone says “I want a programming language in which I need only say what I wish done,” give him a lollipop.
– Alan Perlis

The epigram above is a jest intended to mock this sort of thinking. Even more humorously, Byrd embraces exactly this sort of thinking in a programming philosophy he calls “Lollipop Driven Development.” This philosophy is our compass for the next 100 years of software development: search for lollipops and we’ll find innovation.

Perlis, who by the way received the very first Turing Award, mocks this optimism. Equally cynical is Fred Brooks’s immensely famous paper No Silver Bullet. Its thesis has gone unrefuted for decades, and it essentially says we’re doomed. Brooks opens it right up with, “There is no single development, in either technology or management technique, which by itself promises even one order-of-magnitude improvement within a decade in productivity, in reliability, in simplicity.” Which, admittedly, is a huge buzzkill. The paper itself is only 36 years old, but he held this thesis much earlier in his career. It looks over every hyped-up technology that has ever promised a 10x gain in software productivity and systematically shoots them all down in a very solemn and melancholy tone. Just read this: “Einstein repeatedly argued that there must be simplified explanations of nature, because God is not capricious or arbitrary. No such faith comforts the software engineer.” This guy can’t be fun at parties, right? On the contrary, he’s really quite interesting to talk to. There’s a YouTube video on Papers We Love where some speakers took him out for pizza and soda. He’s quite the charming conversationalist despite the bleak paper.

Interestingly enough, another white paper titan is just about the antithesis of No Silver Bullet. Out of the Tar Pit, a 2006 paper of about 66 pages, walks us through the “software crisis.” It’s a remarkable read that outlines the mechanisms that turn all of our codebases into big balls of mud, hairballs, tar pits, spaghetti, or big hairy monsters, to name a few.
Here’s the analogy. Much like your legacy systems, tar pits caused many a woolly mammoth to perish. While a mammoth is stuck, some saber-toothed tigers think it looks tasty and jump into the fray. Now they both meet their demise. Next thing you know, some vultures want a free lunch; they swoop in, and it turns out the same way for them. The tar pit is an analogy for complexity, and the animals are your software. A legacy system grows stronger and devours your organization the way a black hole eats planets, and in the same way, complexity can quickly gobble up software systems if left unchecked. It’s a game architects and principal engineers play on a daily basis. Software engineering is much like Icarus trying not to fly too close to the sun, and we juggle this every day. Brooks would have us believe this will always be the case, but at the end of Out of the Tar Pit is the lollipop. What’s the light at the end of the tunnel? Relational programming, or in the paper’s own terms, Functional Relational Programming. That’s why we will be playing the 100-year language game with Paul Graham and the Wayne Gretzky game with Xerox PARC. miniKanren is much closer to the 100-year language than Arc is.

So what’s the big deal with relations? Relations don’t specify algorithms the way other sorts of programs do. It’s not much like computer programming at all: there’s no algorithm, only facts and assertions. It’s entirely non-deterministic. A variable X could be 5, could have no value at all, could be every even number, or even, as William Byrd once demonstrated, every program in the universe.
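To make “only facts and assertions” concrete, here is a minimal sketch of a microKanren-style core in Python. It consists of unification over a substitution, plus goals combined with `conj` (and) and `disj` (or). It is an illustration under simplifying assumptions, not Byrd’s implementation. It searches depth-first instead of using miniKanren’s interleaving search, and it skips the occurs check. Its names (`eq`, `conj`, `disj`, `run`) are made up and only loosely mirror miniKanren’s `==`, `conde`, `fresh`, and `run*`. The `appendo` relation is the classic miniKanren example: a single definition of list concatenation that can be run forward or backward.

```python
class Var:
    """A fresh logic variable; two Vars are the same only if they are the same object."""

def walk(term, s):
    # Follow bindings in substitution s until reaching a value or an unbound Var.
    while isinstance(term, Var) and term in s:
        term = s[term]
    return term

def unify(u, v, s):
    """Extend s so that u and v are equal, or return None (no occurs check)."""
    u, v = walk(u, s), walk(v, s)
    if isinstance(u, Var) and u is v:
        return s
    if isinstance(u, Var):
        return {**s, u: v}
    if isinstance(v, Var):
        return {**s, v: u}
    if isinstance(u, tuple) and isinstance(v, tuple) and len(u) == len(v):
        for a, b in zip(u, v):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return s if u == v else None

# A goal maps a substitution to an iterator of zero or more extended substitutions.
def eq(u, v):
    def goal(s):
        s2 = unify(u, v, s)
        if s2 is not None:
            yield s2
    return goal

def disj(*goals):  # "or": answers from every branch, depth-first
    def goal(s):
        for g in goals:
            yield from g(s)
    return goal

def conj(*goals):  # "and": thread each answer through the remaining goals
    def goal(s, i=0):
        if i == len(goals):
            yield s
        else:
            for s2 in goals[i](s):
                yield from goal(s2, i + 1)
    return goal

def appendo(front, back, whole):
    """front ++ back == whole, on cons lists made of (head, tail) pairs ending in ()."""
    def goal(s):
        head, tail, rest = Var(), Var(), Var()  # fresh variables for this call
        return disj(
            conj(eq(front, ()), eq(back, whole)),
            conj(eq(front, (head, tail)), eq(whole, (head, rest)), appendo(tail, back, rest)),
        )(s)
    return goal

def reify(term, s):
    term = walk(term, s)
    return tuple(reify(t, s) for t in term) if isinstance(term, tuple) else term

def run(query, goal):
    return [reify(query, s) for s in goal({})]

def to_cons(items):
    out = ()
    for item in reversed(items):
        out = (item, out)
    return out

def from_cons(pair):
    items = []
    while pair != ():
        head, pair = pair
        items.append(head)
    return items

x, y, z = Var(), Var(), Var()
# Backward: which two lists append to [1, 2, 3]?
splits = [(from_cons(a), from_cons(b)) for a, b in run((x, y), appendo(x, y, to_cons([1, 2, 3])))]
print(splits)  # [([], [1, 2, 3]), ([1], [2, 3]), ([1, 2], [3]), ([1, 2, 3], [])]
# Forward: what is [1, 2] appended to [3]?
print([from_cons(r) for r in run(z, appendo(to_cons([1, 2]), to_cons([3]), z))])  # [[1, 2, 3]]
```

The same `appendo` definition answers two questions. One is “what is [1, 2] appended to [3]?” The other is “which pairs of lists append to [1, 2, 3]?” Nothing in the definition says which arguments are inputs. With plain depth-first search, a query with infinitely many answers (such as leaving all three arguments fresh) never finishes when collected into a list. That is one reason real miniKanren interleaves its search.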

Toto, I’ve a feeling we’re not in Kansas anymore.
– Dorothy, The Wizard of Oz

Every program in the universe? Try that as a LeetCode problem. How difficult is it to pull off a feat like that? Well, not hard at all. He did it while live coding, effortlessly playing around in the miniKanren playground. In fact, he was jamming out; it’s not entirely clear that was even his goal. Say you get a really difficult problem in a job interview: write a program that solves a Rubik’s cube, and do it in 20 minutes. With our present tooling, this is just cruel and unusual punishment. With relational programming, you simply state the definition of a Rubik’s cube. There are 3×3 colors on each side, you can rotate any row or column, and there’s a state in which each side is a single color distinct from all other sides. That took about 20 seconds to write, and a solution emerges. You’ve stated the problem; then you ask for the solution. Not only that, you can now ask for every possible solution to every possible Rubik’s cube problem. You may also ask for the optimal solution set to any given Rubik’s cube problem. That’s what we mean by lollipops. Ask and you shall receive. Now we’re cooking with gas.

How long of a journey is it from point A to point Z, and how do we get there? Nobody ever said it would be easy. This is the spot we should be aiming for with software if we have any hope at all. Finding the courage, and the childlike awe and wonder, to get to a place like that is more a story of love and passion for the machine than a story of grit and intellect. Always be looking for lollipops. If you’re trying to innovate, never forget this CoCo prompt:



