AI Broke the Zero Marginal Cost Myth of the Internet

A reflection on why AI applications feel less like traditional internet products and more like on-demand production, where every useful answer, image, and agent run has a real cost.

I have been feeling this more strongly lately: AI applications are not quite like traditional internet products. They feel more like the internet with a factory attached to the back.

That sounds strange at first. AI still arrives through web pages, apps, APIs, subscriptions, memberships, SaaS dashboards, and all the old software vocabulary. A user opens a page, types a sentence, gets an answer. From the outside, it does not look fundamentally different from search, messaging, or any online tool.

But the bill tells a different story.

Much of the old internet’s magic came from copying and distribution. Building a search engine is expensive. Building an e-commerce platform is expensive. Building a social network is even more expensive. But once the system is running, the extra cost of serving one more user is often much smaller.

That extra cost is called marginal cost.


[Image: An AI factory behind the internet]

Low marginal cost is what made many internet habits feel natural: free products, subsidies, growth before monetization, “get users first and figure out the business later.” Once users arrive, there is always some story about ads, memberships, commissions, games, finance, cloud services, or something else that can pay the bill later.

Hear that story often enough, and it creates an illusion: software is basically free to copy.

AI breaks that illusion.

The Old Internet Was Closer to Printing

Traditional internet products were never actually free to run.

Servers cost money. Bandwidth costs money. Storage costs money. Engineers cost much more. At large scale, the infrastructure bill of an internet company is not some rounding error.

But the basic character of the internet was still copying and distribution.

An article can be written once and read by ten thousand people. A product detail page can be built once and opened by ten thousand shoppers. A social post can enter many feeds. A search index, once built, can serve countless queries. There are still caches, databases, recommendation systems, ad systems, and moderation systems behind it all, but the broad pattern is the same: take something that already exists and deliver it more efficiently.

In that sense, the internet was closer to printing.

The first copy of a book is hard. Editing, layout, plates, machines, logistics, all of it costs money. But once the machine is running, printing more copies brings the unit cost down. The internet pushed this logic so far that people almost forgot the paper and ink existed.

That is why early internet companies could burn money with some internal logic. More users meant more data, stronger network effects, and costs spread across a larger base. Growth looked like a road toward victory. Many companies died on that road, of course, but the logic itself was coherent.

AI is different. Much of what AI produces is not a book printed in advance. It is more like firing up the furnace after the user arrives.

AI Is Closer to Piecework Production

A user asks a question, and the model runs inference. A user asks for a long document summary, and the model reads context and runs inference again. A user asks for an image, and a more expensive image model may run for longer. A user starts an agent task that searches, reads files, writes code, and runs tests, and the cost becomes a chain of production steps.

[Image: AI as on-demand production]

The software shell is still there, but the factory starts showing through.

Tokens are raw materials. GPU time is machine time. VRAM is workshop capacity. Model quality is equipment precision. Context length is process complexity. API calls are outsourced manufacturing. Self-hosting a model is buying machines and building your own line.

This is not a perfect economic model, but it is a useful one for developers, because it forces a plain question:

When one more user arrives, are you making money or losing money?

In the past, a small online tool mostly worried about whether the server could handle traffic, whether the database was slow, or whether bandwidth would spike. AI applications add a sharper question: the server may survive, but will the wallet survive?

I wrote about this before in The Dilemma of AI Application Developers. One example was an AI image generation mini program I built. Even with elastic deployment, starting the GPU only when there was a user request and charging by the second, one generated image still cost about 0.1 to 0.2 RMB.

That sounds cheap for one image.

But if the feature is free, the meaning changes completely. A user generates one image, and the developer pays a little. A user generates ten images, and the developer pays more. The user thinks, “This is fun.” The developer looks at the bill and thinks, “This is not going well.”

That is the difference between many AI tools and ordinary internet tools. It is not just a few more page views or clicks. Every real use can become a real cost.
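The arithmetic behind this is simple enough to sketch. The per-image cost below comes from the figure above; the usage, conversion rate, and subscription price are hypothetical assumptions purely for illustration:

```python
# Back-of-the-envelope unit economics for a free AI image feature.
# COST_PER_IMAGE comes from the article; every other number here is
# a hypothetical assumption for illustration.

COST_PER_IMAGE = 0.15      # RMB, midpoint of the 0.1 to 0.2 range above
IMAGES_PER_FREE_USER = 10  # assumed average free usage per month
PAID_CONVERSION = 0.02     # assumed: 2% of signups subscribe
SUBSCRIPTION_PRICE = 15.0  # RMB per month, assumed

def monthly_margin_per_user() -> float:
    """Expected revenue minus expected model cost, per signup."""
    revenue = PAID_CONVERSION * SUBSCRIPTION_PRICE
    cost = IMAGES_PER_FREE_USER * COST_PER_IMAGE
    return revenue - cost

print(f"Margin per signup: {monthly_margin_per_user():.2f} RMB")
# 0.02 * 15 = 0.30 RMB expected revenue vs 10 * 0.15 = 1.50 RMB cost
```

With these made-up numbers, every signup loses about 1.20 RMB, and growth makes the hole deeper, not shallower. The point is not the specific values but that the calculation exists at all: traditional free tools rarely had a per-use loss you could compute on one line.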

Free Starts to Feel Heavy

Internet products love being free.

Free email, free cloud storage, free social networks, free content, free utilities. None of them were truly free, of course. Someone paid through ads, memberships, data, ecosystem lock-in, or some delayed business model. Users just did not feel the cost directly, at least not at the beginning.

Free AI applications are more awkward.

A user is not merely taking up an account, a few database rows, or some storage. Once the user actually uses the model, real cost starts to burn. Long context, multi-turn conversations, image generation, speech generation, video generation, web search, code execution: the stronger these features become, the less they resemble air.

So questions that used to be postponed now have to be answered early.

Can anonymous users use it? How much free quota should there be? Should expensive models be restricted? Do failed retries count against quota? What happens if the API is abused? If a user only plays with it once and leaves, who pays for that? Can the revenue from paid users cover the model bill?

These look like business questions. In code, they are engineering questions.

You need login. You need quota. You need rate limits. You need caching. You need queues. You need model tiers. You need cost monitoring. You need protection against people treating your API like a public tap. In the old days, a small web utility could sometimes launch in a fairly naked state and survive. Launching a naked AI utility is more like putting a running machine on the street with a note that says: free to use.
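A minimal sketch of what "you need quota" means in code, with an in-memory counter and invented tier limits purely for illustration (a real service would back this with a database or Redis, and would also handle clock boundaries and concurrency):

```python
# Minimal per-user daily quota gate for an AI endpoint.
# The tier names and limits are assumptions, not a real product's values.
from collections import defaultdict
from datetime import date

DAILY_QUOTA = {"anonymous": 3, "free": 20, "paid": 500}  # requests/day, assumed

_usage: dict = defaultdict(int)  # (user_id, day) -> requests served

def try_consume(user_id: str, tier: str = "free") -> bool:
    """Return True and count the request if the user has quota left."""
    key = (user_id, date.today())
    if _usage[key] >= DAILY_QUOTA[tier]:
        return False  # caller should return HTTP 429 or show an upsell
    _usage[key] += 1
    return True

def refund(user_id: str) -> None:
    """Give a request back, e.g. when the model call itself failed."""
    key = (user_id, date.today())
    if _usage[key] > 0:
        _usage[key] -= 1
```

Note that `refund` exists because "do failed retries count against quota?" is a question someone has to answer deliberately; without it, the answer is yes by accident.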

People will use it.

The question is who pays for the electricity.

Growth Can Become Dangerous

Internet people are usually afraid that nobody will use their product.

AI products are afraid of that too. But they are also afraid of something else: too many people using the product without paying.

[Image: Growth and the cost meter]

This is especially harsh for independent developers. Large companies can treat AI as a strategic investment. They can spread the cost across cloud businesses, ads, ecosystems, financing, and long-term positioning. Independent developers do not have that much room to maneuver. A bill is a bill. The credit card charge does not shrink because “AI is the future.”

So the quality of growth matters more.

A user willing to pay for workflow efficiency and a user who generates a few images and disappears mean very different things to a product. The first may be a business. The second may only be cost. In the old internet, user growth at least sounded cheerful. In AI, low-quality growth can feel like receiving a pile of orders with no payment attached. The workshop is busy; the owner is poorer.

That is why AI products enter gross margin thinking earlier.

It is not enough for a feature to be cool. It is not enough for users to want it. You also have to ask whether more usage makes the product lose more money. Many AI demos are impressive in a presentation and much less comforting once they are live. The better the effect, the more people use it. The more people use it, the more beautiful the bill becomes.

Beautiful in the wrong direction.

Costs Will Fall, But They Will Not Vanish

One common reply is that compute will get cheaper and models will get cheaper.

I believe that too. Chips will improve. Inference frameworks will get faster. Models will be quantized, distilled, routed, and specialized. Smaller models will handle more simple tasks. Large model providers will keep fighting on price.

But cheaper is not the same as free.

When bandwidth became cheaper, the internet did not stay with text pages. It moved to images, video, livestreaming, and cloud gaming. When storage became cheaper, people did not store less. They took more photos, uploaded more videos, and kept more backups.

When compute becomes cheaper, AI will probably not stay at today’s level of usage. Context windows will grow. Agents will become more complex. Automated tasks will run more often. Something that is called a few times a day may become something that runs continuously in the background. Falling cost expands the boundary of use, but it also creates new ways to consume resources.

So the real answer is not to wait for cost to become zero. The real answer is to learn how to account for it.

Use cheap models for simple tasks and stronger models only when needed. Cache what can be cached. Run what can be asynchronous outside the realtime path. Ask for confirmation instead of regenerating blindly. Do local preprocessing when possible instead of sending everything to a large model. Model routing, cost monitoring, quota design, retry policy: these sound like engineering details, but they are business fundamentals for AI products.

An AI application that does not watch cost is like a factory that does not watch the power meter. Loud machines do not necessarily mean a healthy business.

The Old Internet Formula Is Not Enough

AI is still software.

It can iterate quickly. It can be distributed online. It can be sold by subscription. A small team can build things that would have been hard to imagine before. These are real software advantages.

But AI is not only software.

Traditional software was powerful because copying was cheap. The internet was powerful because distribution was cheap. AI applications add a difficult layer: high-quality output has production cost, and that cost rises with usage.

So the old phrase “grow first, monetize later” needs to be weighed again.

Who pays for each service?

If users pay directly, the product must be worth paying for. If enterprises pay, the product must enter real workflows. If ads cover the cost, traffic value must be high enough. If a platform subsidizes the cost, the product must accept that the platform can change its mind. If the developer pays personally, it is better to know from the beginning whether this is a learning project, an experiment, or a long-term business.

Not every AI project needs to make money. Learning projects, demos, portfolios, and technical experiments can have their own value. But if something is treated as a product, it cannot live only on vision while ignoring the ledger.

The old internet taught people that scale solves many problems.

AI reminds people that scale also amplifies many problems.

The Plain Rule Still Applies

So “AI is like manufacturing” is not just a colorful metaphor.

It is a reminder that AI brings production back into each request. Traditional internet products were closer to copying and distribution. AI is closer to on-demand production. It still has the speed of software, but it also has the cost discipline of a factory. It can open the entrance to the whole world, and it can spend real resources on every output.

This is not pessimistic.

In fact, because the cost is real, the value can become more real too. If an AI application can make users willing to pay for each act of production, or pay continuously for the efficiency it creates, then it is not just a toy. It may be a tool, a service, or a new way to organize work.

But this era no longer lets developers hide completely inside the old illusion of free internet products.

Software made copying cheap. The internet made distribution cheap. AI makes computation itself part of the product.

And production has always had a simple rule:

Machines run. Materials are consumed. The meter moves. Someone has to pay, or the business cannot continue.