Why Cheaper AI Makes Product Thinking More Important, Not Less

A startup called Taalas just demonstrated a chip running Llama 3.1 8B at 17,000 tokens per second. That's roughly 10x the current state of the art. No expensive HBM. No liquid cooling. Radically simpler hardware.

This isn't an isolated event. Inference costs have been dropping steadily, and hardware like this will accelerate that trend dramatically. We're heading toward a world where running an LLM costs nearly nothing.

That sounds like great news. And it is — if you're building the right thing.

The "AI-Powered" Trap

Over the past two years, thousands of startups launched with variations of the same pitch: "We use AI to do X." Many of them are thin wrappers around an API call. A prompt, some formatting, maybe a nice UI on top.

When inference was expensive and access was limited, that was enough to differentiate. You had AI, your competitor didn't. Game over.

But that window is closing fast. When anyone can add a competent LLM to their product for pennies per request, "we use AI" stops being a differentiator. It becomes table stakes — like having a database or a search function.

What Actually Matters

If AI becomes free tomorrow, does your product still matter?

That's the question I keep coming back to. And the answer depends entirely on how deeply you understand the problem you're solving.

Here's what I mean. I'm building BidScribe, a tool for writing RFP responses. The AI part — generating draft answers from company knowledge — is maybe 20% of the value. The other 80% is everything around it:

  • Understanding the workflow. RFP teams don't work sequentially. They split questions across subject matter experts, chase approvals, and race deadlines. The tool needs to fit that reality, not force a new one.
  • Context management. A generic LLM doesn't know your company's pricing model, your compliance certifications, or that you lost a deal last quarter because of a weak security answer. Building the right context pipeline is harder than the generation itself.
  • Trust calibration. Nobody submits a million-dollar proposal based on raw AI output. The tool needs to show confidence levels, source references, and make editing frictionless. The UX around the AI matters more than the AI.

None of this gets commoditized when inference gets cheaper. If anything, it becomes more important — because the AI layer itself is no longer the bottleneck.
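To make the context-management point concrete, here's a minimal sketch of what "building the right context pipeline" can mean in its simplest form: selecting the most relevant company snippets to send along with each question. Everything here — the `Snippet` type, the keyword-overlap scoring, the sample knowledge base — is a hypothetical illustration, not BidScribe's actual implementation (which would use embeddings, metadata, and access controls).

```python
import re
from dataclasses import dataclass

@dataclass
class Snippet:
    source: str  # where the knowledge came from, e.g. "security.md"
    text: str

def score(question: str, snippet: Snippet) -> float:
    """Crude relevance: fraction of question words that appear in the snippet."""
    q_words = set(re.findall(r"\w+", question.lower()))
    s_words = set(re.findall(r"\w+", snippet.text.lower()))
    return len(q_words & s_words) / max(len(q_words), 1)

def select_context(question: str, kb: list[Snippet], k: int = 3) -> list[Snippet]:
    """Pick the k most relevant snippets to feed the LLM alongside the question."""
    return sorted(kb, key=lambda s: score(question, s), reverse=True)[:k]

# Toy knowledge base with made-up content.
kb = [
    Snippet("pricing.md", "Our pricing model is per-seat with annual billing."),
    Snippet("security.md", "Security: we hold SOC 2 Type II and ISO 27001 certifications."),
    Snippet("about.md", "Founded in 2020, headquartered in Berlin."),
]

top = select_context("What security certifications do you hold?", kb, k=1)
print(top[0].source)  # security.md
```

Even this toy version shows why the pipeline is the hard part: the quality of the answer is bounded by what you retrieve, long before the model sees a single token.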

The Moat Shifts

Think about it this way. In the early days of cloud computing, simply being "cloud-based" was a selling point. Eventually, everything moved to the cloud, and the differentiator shifted back to the actual product.

AI is following the same pattern, just faster. The companies that will win aren't the ones with the most sophisticated model. They're the ones who built the best workflow, the most thoughtful UX, and the deepest understanding of their users' actual problems.

This is especially true in B2B. Enterprise buyers don't care what model you use. They care whether you solve their problem reliably, securely, and without creating more work for their team.

What This Means for Builders

If you're building an AI product right now, here's what I'd think about:

1. Separate the AI from the value. Can you describe what your product does without mentioning AI? If not, you might be building a feature, not a product.

2. Invest in workflow, not just generation. The prompt-to-output pipeline is the easy part. The hard part is everything before and after: data ingestion, context selection, human review, integration with existing tools.

3. Own the problem, not the technology. Become the expert on your users' workflow. Talk to them. Watch them work. The insights you get from that will outlast any model advantage.

4. Plan for free inference. Design your business model assuming AI costs approach zero. If your margins depend on inference being expensive, you're building on a shrinking foundation.
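A quick back-of-the-envelope calculation makes point 4 tangible. The prices below are illustrative assumptions, not any vendor's actual rates:

```python
def cost_per_response(prompt_tokens: int, output_tokens: int,
                      price_in_per_m: float, price_out_per_m: float) -> float:
    """Inference cost in dollars for one request, given per-million-token prices."""
    return (prompt_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000

# Assumed example: a 4,000-token RFP context plus a 600-token draft answer
# at $0.10 input / $0.40 output per million tokens.
c = cost_per_response(4_000, 600, 0.10, 0.40)
print(f"${c:.4f} per answer")  # → $0.0006 per answer
```

At a fraction of a cent per draft, inference cost is already a rounding error next to the hours an RFP team spends per response — which is exactly why a margin built on "AI is expensive" won't hold.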

The Opportunity

Here's the optimistic take: cheaper AI is a massive tailwind for builders who think in terms of problems, not technology. It means you can experiment more, serve more users, and build more sophisticated pipelines without worrying about cost.

The barrier to entry for "using AI" drops to zero. But the barrier to entry for "understanding the problem deeply enough to solve it well" — that stays high. That's where the real value lives.

I'm betting on that with BidScribe. The AI will get cheaper and better on its own. My job is to make sure the product solves the actual problem, regardless of what model is running under the hood.


I'm building BidScribe in public — an AI tool for RFP responses. Follow along on Twitter or check out my other posts on gerloff.dev.