The Average Machine

When you ask one of the frontier AI models to write something for you, the result is usually shitty. The writing is technically competent. The structure is fine. And yet when you read it back you think, this could have been written by anyone. Which is roughly true. It has in fact been written by **everyone**. It's writing by the biggest committee ever created.

A large language model (LLM) like Claude, ChatGPT, or Gemini, stripped of marketing, is a math function that takes some input and predicts what should come next, one small chunk of text at a time, based on statistical patterns in its training data. That's the whole mechanism. It doesn't understand your question. It doesn't understand your code. It has no concept of bad. It has no concepts at all. Everything you experience on top of it, including the chat interface, the apparent personality, the seeming reasoning, is built on that prediction loop running billions of times.

Imagine a room containing every human who ever wrote anything. Ask all the poets how they'd finish a sentence, then tally the votes and take the most common answer. Ask all the lawyers and do the same. The model takes whatever frame you give it and produces what the average voice within that frame would say. That's why when you tell a model "you are a poet" the distribution shifts toward poet-flavored continuations. Tell it "you are a contract lawyer," and it shifts toward contract-flavored language. Give it a long, specific prompt and it shifts toward even more specific corners of the consensus.

Who'd have thunk one of the most consequential pieces of technology built in the last decade was essentially an autocomplete chatbot trained on everything from the collected ramblings of Reddit, Wikipedia articles, and SEO sludge right through to books, academic papers, and peer-reviewed publications? The writing is mediocre not because the data is junk, but because averaging anything, even excellent things, produces grey mush.
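
To make the room metaphor concrete, here's a toy sketch in Python. Everything in it is made up for illustration: the `VOTES` table, the two frames, and the function names `consensus` and `distinctive` are assumptions, not anything a real model exposes. A real LLM samples from a probability distribution over small chunks of text and doesn't literally count votes, so treat this as the metaphor in code, nothing more.

```python
from collections import Counter

# Hypothetical toy data: for each "frame", the continuations that writers in
# that frame offered for the same opening words. Invented for illustration.
VOTES = {
    "poet": ["a velvet hush", "dark and still", "dark and still", "a held breath"],
    "lawyer": ["hereinafter defined", "not in dispute", "not in dispute", "not in dispute"],
}


def consensus(frame: str) -> str:
    """Return the most common continuation among writers in the given frame."""
    tally = Counter(VOTES[frame])
    answer, _count = tally.most_common(1)[0]
    return answer


def distinctive(frame: str) -> list[str]:
    """Return the continuations only one writer offered (the discarded long tail)."""
    tally = Counter(VOTES[frame])
    return sorted(text for text, count in tally.items() if count == 1)


if __name__ == "__main__":
    for frame in VOTES:
        print(frame, "->", consensus(frame), "| discarded:", distinctive(frame))
```

Calling `consensus("poet")` returns the phrase most poets in the toy table agreed on, while `distinctive("poet")` returns the one-off phrases the tally throws away. Changing the frame changes which crowd gets polled, but the answer is always that crowd's most common voice.
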
While this may sound like a criticism, for most of what you'll use these tools for, that's exactly what you want.

# When Average Is the Whole Point

When I want an LLM to write a contract, I don't want it to be creative. I want it to use language that follows standard conventions so that it will be understood instantly. I don't want it to introduce ambiguity by trying to say something familiar in a clever new way. A contract that surprises is a contract that gets disputed. The whole point of legal language is that it has been tested to mean what it means, and the test is decades of other people having used it without confusion.

Same with a business report. I don't want an executive summary for board members written in a unique and whimsical way. I want it to follow the conventions they already know and understand so they can get the information without having to think about how it's being delivered.

This applies to almost everything that gets called "knowledge work." Contracts, project updates, meeting summaries, technical documentation, job descriptions, boilerplate code, customer service replies. The vast majority of writing produced inside companies isn't supposed to be original. It's supposed to be clear, conventional, and forgettable. The reader's job is to act on the information, not to admire the prose.

Regression to the mean is, for these uses, the entire feature. You're trying to produce something predictable, in the literal sense: something the reader can predict the shape of and therefore process quickly. Conventional consensus is precisely the right tool for that, and it's the reason these models are used by hundreds of millions of people to make their work faster, cheaper and in most cases better.

The mistake is generalizing from "this thing produces excellent conventional output" to "this thing can do everything a human can do."

# The Double-Edged Sword

The averaging works in both directions.
For people whose work was below the mean, the model pulls the work *up* toward competence. The junior analyst writing their first report. The non-native English speaker drafting a customer email. The technician writing documentation outside their natural strengths. That's an enormous gift.

For people whose work was above the mean, the model pulls their work *down* from unique capability toward the same competence the novice has been pulled up to. The domain expert. The experienced operator. The writer with a distinctive voice. The mechanism is now a penalty, not a gift. Expertise lives in the long tail and the model can't reach it.

You want a tax return near the mean. Far below the mean and you're missing standard deductions, claiming nothing and unnecessarily paying too much tax. Far above the mean and you're claiming things nobody claims, structuring things in unprecedented ways and earning yourself an audit. The narrow band of "competent and conventional" is exactly the right place to sit for that document. An LLM trained on thousands of pages of tax code will land you in that band reliably, and that's why it's a great tool for the job.

The fluent, well-structured, predictable output cuts the other way where averageness is fatal. Thought leadership. Creative writing. Strategic positioning. Voice-driven personal brand content. Anything where the value is in saying something the existing consensus is wrong about, or in saying something familiar in a way nobody has said before, or in having a take that's recognizably *yours* and not interchangeable with anyone else's. The danger is that the average now sounds *good* and good is the enemy of *great*. You've outsourced the one part of your work that distinguished you.

When it comes to a positioning statement for your business, you don't want the mean. The mean is the wallpaper of LinkedIn: "We help [target] achieve [outcome] through [method]." Every competitor sits at the mean. The mean is where you become invisible.
What you want is the long tail: the specific, the unexpected, the formulation nobody else has reached that makes your audience feel something. An LLM trained on existing positioning statements will pull you back toward the mean every time, because that's what it's built to do.

The tax return wants the center. The positioning statement wants the edge. Both are the same person's work in the same week. The mistake is treating them the same way.

Expertise is a long-tail phenomenon. The reason an experienced lawyer is worth more than a junior is that the experienced lawyer has seen the cases that don't fit the standard pattern, and knows how to recognize one. The value of expertise is precisely the part that's *not* in the mean. It's the rare and the unusual, and those cases require the practitioner to depart from convention because convention will get this particular one wrong. That's what the human expert is for, and that's what the human expert is going to keep being paid for.

# The Wisdom of Crowds

In 1906 Francis Galton noticed at a country fair that the average of 787 guesses about the weight of an ox was almost exactly correct. It was closer than any individual guess, including those of cattle experts. Prediction markets work on the same principle. Wikipedia works on the same principle. The aggregation of many independent human judgments, under the right conditions, reliably produces something more accurate than most individual humans can produce alone.

This phenomenon has a substantial literature behind it. What we didn't have, until recently, was a way to access this mechanism on demand for any question we could pose. That, more or less, is what an LLM approximates: a compressed statistical memory of many crowds. You ask a question; the model returns something like the average of millions of human judgments relevant to that question. It's a new interface to our collective intelligence. This is why LLM output is so often surprisingly competent.
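
Here's a minimal simulation of the ox-guessing setup, sketched in Python. The true weight, the spread of the guesses, and the function name `simulate_crowd` are assumptions chosen for illustration, not Galton's data; only the crowd size of 787 comes from the story above. It also assumes each guess is independent and unbiased, which is the "right conditions" part.

```python
import random
import statistics

TRUE_WEIGHT = 1200.0  # assumed illustrative value in pounds, not Galton's figure


def simulate_crowd(n_guessers: int = 787, noise_sd: float = 75.0, seed: int = 0):
    """Simulate independent, unbiased guesses and compare the crowd to individuals.

    Returns (crowd_error, share_beaten): the absolute error of the averaged
    guess, and the fraction of individuals whose own error was larger.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    # Each guesser is off by their own random amount, centered on the truth.
    guesses = [rng.gauss(TRUE_WEIGHT, noise_sd) for _ in range(n_guessers)]
    crowd_error = abs(statistics.mean(guesses) - TRUE_WEIGHT)
    # Count the individuals who did worse than the averaged guess.
    beaten = sum(abs(g - TRUE_WEIGHT) > crowd_error for g in guesses)
    return crowd_error, beaten / n_guessers


if __name__ == "__main__":
    error, share = simulate_crowd()
    print(f"crowd error: {error:.1f} lb; individuals beaten: {share:.0%}")
```

Running `simulate_crowd()` returns the crowd's error and the share of individual guessers it beats. Under these assumptions the averaged guess should land far closer to the truth than the typical individual does, because independent errors largely cancel.
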
You're not getting one person's opinion; you're getting an aggregation that reliably outperforms most individuals.

The magic behind the wisdom of crowds is that it averages. Individual errors are canceled out and the crowd lands close to the truth even when no individual does. The mechanism works on problems that have a single correct answer and fails on problems that require deep expertise the crowd doesn't have, novel reasoning, or stepping outside the consensus frame. It is perfect for questions like the weight of an ox, the number of jellybeans in a jar or "what's a typical CAC for B2B SaaS?"

For questions that step outside the frame that everybody shares, the right answer isn't a midpoint between existing views but rather somewhere none of them are. Averaging here doesn't cancel the error, because everyone in the crowd shares it; it just hands the shared mistake back with more confidence. The same conditions that make a crowd wise about an ox's weight made it stupid, in 1903, about whether heavier-than-air machines can fly.

Further, most of what we call expertise is recomposition rather than novelty. The senior accountant isn't inventing new accounting but rather applying known principles to this year's return. The strategist isn't deriving frameworks from first principles but recombining frameworks internalized over a career. Genuine novelty is rare even within expert work. Most expert hours are spent on recomposition and sophisticated pattern-matching against a deeper library than the average practitioner has. And LLMs are geniuses at pattern matching and recomposition.

We've been using the phenomenon of wisdom of crowds for a long time. Every functioning market is a wisdom-of-crowds machine, every well-run democracy is one, every working scientific consensus is one. LLMs are the wisdom of crowds at scale. What's new is that we can now access this wisdom and get an answer in seconds.

There's a simple test for whether the task you're undertaking sits in your unique genius zone or whether the wisdom of crowds will suffice.
Ask yourself: would I be upset if my audience received exactly this content from one of my competitors? If the answer is no, then averageness is fine, and the model is the right tool. If the answer is yes, because of what makes your version distinctively yours, then you can't use the model to do the work, only to support it.

# Artificial Stupidity

Despite what we've been led to believe, there's no such thing as artificial **intelligence**. LLMs don't "think." They produce output based on statistical patterns in their training data. It's a magic trick, like a parrot saying 'I love you.' The bird has been trained to produce certain sounds but it has no idea what love is, what 'I' refers to, or that it's even communicating. It's just reproducing patterns for the benefit of its owner.

It's easy to be faked out by the sophistication of an LLM's mimicry. The artificial nature of this becomes apparent in domains where you have deep expertise. However, in unfamiliar territory, the illusion can be much harder to spot.

The Gell-Mann Amnesia effect is a term coined by author Michael Crichton, and it goes like this. You read a newspaper article about something you understand well: maybe it's in your field, or about your hometown, or an event you witnessed. You notice it's full of errors, misunderstandings, and confident wrongness. You get annoyed. Then you turn the page and read the next article, about something you don't know, and for some inexplicable reason you trust it. The journalist got the thing you can check completely wrong, so why would they get the next thing right? But we systematically forget this and re-extend trust each time we encounter content outside our expertise.

This is exactly what happens with LLMs. The subtle errors. The confident-sounding nonsense. The plausible-but-wrong citations. The missing nuance that an experienced practitioner would catch instantly.
You spot the bullshit in your domain, shake your head, and then turn the page and trust it completely on something else. The parrot sounds fluent in every language you don't speak.

Ask the model to write code, and if you don't program you'll think it's brilliant because it runs. An experienced software engineer looking at the same output will see 1000 lines of bloated, poorly-structured code where 100 lines would do the job more efficiently. Or ask the model to write something for you, and if your written English is below average you might think it's great writing because by your standards it is.

# Jevons Paradox

James Watt's improved steam engine (patented 1769) was marketed as a coal-saving device. His engine used roughly 75% less coal than earlier models to do the same work. The expectation was simple: more efficient engines = less coal burned. However, in 1865 William Stanley Jevons noticed that more efficient steam engines didn't reduce coal consumption but rather increased it. It turns out when you make something cheaper to use, people use a lot more of it, and total consumption goes up rather than down. This observation in economics is called Jevons Paradox.

The same is becoming true of knowledge work. The cost of producing competent text, code, images, and analysis is collapsing. The naive prediction is that the people who produce these things will lose their jobs or go out of business. The Jevons prediction is that we'll just produce vastly more of all of them.

This has implications for what scarcity looks like in the AI era. When average is abundant, average becomes worthless as a competitive position. The remaining value lives in the things average can't do: original ideas, taste, conviction, doing things nobody asked for, deciding which of a thousand options is the one that matters. These human qualities are going to be priced like beachfront property.

# The Einstein Test

Here's a thought experiment.
Imagine you trained a state-of-the-art LLM on everything humans had written up to the year 1904. Every book, every paper, every letter. Anything written after that, you carefully kept out. Then you asked it the questions that were puzzling physicists at the time. Strange experimental results nobody could explain. Equations that didn't quite fit together.

What would it produce? It would produce the answers the smartest physicists of 1904 were already producing. Clever patches. Adjustments to existing theories. Reasons the strange results might be measurement errors. It wouldn't produce Einstein's special theory of relativity, published the following year, which threw out assumptions everyone had taken for granted. It can't step outside the room.

Human beings can step outside frames. Einstein, Darwin, Galileo. None of them produced the consensus of their training data. They produced what the consensus said was wrong, and they turned out to be right.

That gap between producing excellent average answers and producing the answer that overturns the average isn't one that more computing power will close. It's a gap between two completely different kinds of thinking. One we know how to scale. The other we don't know how to do at all, in any system we've built. This is not a temporary limitation that a bigger model will fix. Train a system to reproduce patterns in human writing and it will get better at reproducing those patterns. It will not, by getting better at that, become something different.

# How to Actually Use This

You do two types of knowledge work. The type where conventional, predictable output is exactly what you want, which, if you're honest, is probably most of your work. And the type that requires taste, conviction, judgment, and the willingness to say something the consensus would reject. The first type is most of your hours. The second type is most of your value.

The way to apply this to your own work is to ask, honestly, where you sit for each thing you do. The honest answer is rarely uniform.
Most people are above average at some things and below average at others, often within the same job.

I am not a qualified tax lawyer. For anything tax-related I'm well below the average accountant, so the model will pull me up toward competent. I should use it aggressively there. The same is true of legal contracts, Python code, and languages I don't speak. Below the mean, the model is a gift.

I am, by reasonable measure, an above-average writer. For writing, particularly the writing that has my name on it and represents my judgment, I'm above the mean, and the model will pull me down toward competent. I should use it carefully there. Not never; carefully. Use it for research, for structural critique, for finding the weak paragraph, for generating counter-arguments. But it's not likely to produce the big perspective-shifting idea or the unique voice.

Bad AI use asks the model to supply the taste. Good AI use supplies the taste and asks the model to do the labor.