Per Wikipedia:
Sturgeon’s law (or Sturgeon’s revelation) is an adage stating “ninety percent of everything is crap”.
(or, as I colorfully heard it put today in another context: the spam-to-ham ratio is high!)
And so it goes with LLMs as well: what they produce is 90 percent crap. I work with these tools daily and have been doing so since 2022, first with Copilot and ChatGPT, then with Claude and Cursor, and now back on Copilot, Claude, and ChatGPT. But because what they produce is (90%) crap, I only use them in specific, well-defined areas where I can minimize their crappiness or where, if I slip, their crappiness will not impact the final product:
- Generating documentation and comments based on code, comments, and often transcripts of my videos explaining what I am trying to do. They are vastly better than I am at this, as I will never, in natural language, achieve the level of consistency that I think documentation and comments merit. I have to edit what they generate, but in general they save me a ton of work there. An example is the work I did here a couple of weeks ago. All the code was written by me, but the comments were generated with a prompt to Copilot.
- Generating unit tests de novo for new or old modules. In the context of directly commercialized applications (rather than deeper components), I give absolute preference to end-to-end tests over unit tests. Adding an end-to-end test runner is actually the first thing I build on any important new project. Still, before performing major changes, I think it’s good practice to have unit tests for specifically isolated modules (a minimal sketch of what such a test looks like follows this list). Some of it is practical: it’s much faster to run unit tests than end-to-end tests. However, before the advent of LLMs, the juice was rarely worth the squeeze (even at 100% coverage). Yet now, by leveraging LLMs, adding and expanding unit tests is cheap, and that’s a great thing.
- Front-end work: thanks to their training data, LLMs are especially well adapted to popular front-end frameworks like React, in which I have only instrumental rather than intellectual interests. I am skillful enough to describe what I want to see; LLMs shine here, and my role rarely goes beyond nibbling at the edges.
- Scripts and infrastructure-as-code: because it is not in the critical path of code execution but rather plays a supporting role in product development, I rely heavily on LLMs here. The luxury of not having to read the Terraform documentation (too often) is just too great.
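As promised above, here is a minimal sketch of the kind of unit test for an isolated module that LLMs now churn out cheaply. It uses Python with pytest; the `slugify` function and the test cases are hypothetical stand-ins, not code from any of my projects.

```python
import pytest

def slugify(title: str) -> str:
    """Turn a title into a URL-friendly slug (hypothetical module under test)."""
    return "-".join(title.lower().split())

# Table-driven cases: exactly the shape of test an LLM produces quickly
# and a human can verify at a glance.
@pytest.mark.parametrize("title, expected", [
    ("Hello World", "hello-world"),
    ("  Leading and trailing  ", "leading-and-trailing"),
    ("single", "single"),
])
def test_slugify(title, expected):
    assert slugify(title) == expected
```

The value is in the volume: once a module is isolated, generating and reviewing a dozen such cases takes minutes, which is what finally makes the juice worth the squeeze.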
In all of these cases, I have rarely had to throw away LLM output, and I have rarely if ever struggled to get an LLM to make it all work (only in two specific front-end cases IIRC). Notice the pattern: it’s all at the leaf level, never at the branch level or deeper: there is just no building actually decent software design with LLMs as they are today. And in my case none of it is product code (front-end work is strictly for fun). Even building slightly more complex modules is just an exercise in frustration, as they produce 90% crap (more in my case, but let’s keep it simple), which is not surprising because they were trained on everything and “ninety percent of everything is crap”. And no, especially when it comes to programming, reinforcement learning from human feedback cannot help, as it does not scale: we would have to employ the best programmers to be the humans in the reinforcement loop, and how long could one do that before going nuts from seeing all the crap? And there is no known algorithm that can actually distinguish great code from average code (and I can’t imagine one short of AGI) and thus improve the training corpus.
Which leads me to the following: if I can recognize that 90% of what LLMs produce is crap, does the same happen to other experts in their fields? And do we then happily ignore that, to our detriment, as in the Gell-Mann amnesia effect:
The Gell-Mann amnesia effect is a cognitive bias describing the tendency of individuals to critically assess media reports in a domain they are knowledgeable about, yet continue to trust reporting in other areas despite recognizing similar potential inaccuracies.
Using LLMs to generate answers about algorithms greatly expands my repertoire of possible moves (e.g. I got to the Alias method through an LLM), giving me access to a far greater body of knowledge than I personally have. And unlike search, it serves me in a form with which I can interact. But actually writing working production code with LLMs is not on the menu for me, and not for lack of trying, at least if the software is challenging enough.
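For the curious, here is a minimal sketch of that Alias method (Vose’s variant) in Python: constant-time sampling from a discrete distribution after linear-time preprocessing. The function names and the example distribution are my own illustration, not LLM output.

```python
import random

def build_alias_table(probs):
    """Preprocess a discrete distribution (probs sum to 1) into an alias table."""
    n = len(probs)
    prob, alias = [0.0] * n, [0] * n
    scaled = [p * n for p in probs]
    small = [i for i, q in enumerate(scaled) if q < 1.0]
    large = [i for i, q in enumerate(scaled) if q >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]              # donate mass from the large bucket
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                       # leftovers are exactly 1 up to rounding
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias):
    """Draw one index in O(1): pick a column, then flip a biased coin."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

# Example: sample from an uneven four-outcome distribution.
prob, alias = build_alias_table([0.5, 0.3, 0.15, 0.05])
counts = [0, 0, 0, 0]
for _ in range(100_000):
    counts[alias_sample(prob, alias)] += 1
print(counts)  # roughly proportional to [0.5, 0.3, 0.15, 0.05]
```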
Last modified on 2025-08-31