Building AI Features With TypeScript Without Turning Your App Into A Demo

It usually starts as a script. You paste an OpenAI key into a .env.local, call generateText, and watch a coherent paragraph appear in your terminal. You wrap it in a Next.js route handler, hook it up to a textarea, and demo it on the team Slack. People react with the fire emoji. You ship a Loom video.

Then you put it in front of real users and the seams show within a day. The model returns JSON with a stray comment block, and JSON.parse throws. Someone pastes a five-megabyte transcript and you get a 400 from OpenAI about token limits. The request takes 12 seconds, the user clicks "Generate" four times, and your monthly bill has a new shape. None of this is the model being broken. It's the model being a non-deterministic remote service that you wrapped like a normal API call.

This is the gap between "AI demo" and "AI feature." Most of it is closed with the same TypeScript habits you'd apply to any external integration — boundary validation, structured failure modes, retry semantics, idempotency. The model just happens to be where the chaos lives.

LLMs Are Not APIs, They Are Untrusted Subprocesses

A normal API has a contract. A GET /users/:id returns either a user shape or an error you can branch on. You can write a type for it once and call it a day.

An LLM has a tendency. You ask for JSON; you usually get JSON. Sometimes you get JSON wrapped in triple backticks. Sometimes you get an apology before the JSON. Sometimes you get the JSON with an extra field you never asked for, because the model thought it would be helpful.

The right mental model is: an LLM call is a subprocess running untrusted code that returns a string. Every byte of that string crosses a trust boundary on the way back into your app. Your job, in TypeScript, is to put a real boundary there — not a comment that says "// TODO: validate".

Validate At The Boundary With Zod 4 And `generateObject`

If you need structured output, do not ask the model for JSON in a prompt and then JSON.parse it. The Vercel AI SDK exposes generateObject (and streamObject) for exactly this: you give it a Zod schema, and it handles the model-side instructions, parsing, and schema validation for you, with an opt-in hook for repairing malformed output.

TypeScript

import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const Profile = z.object({
  name: z.string().min(1),
  skills: z.array(z.string()).max(20),
  seniority: z.enum(["junior", "mid", "senior"]),
  contactEmail: z.email().optional(),
});

export type Profile = z.infer<typeof Profile>;

export async function extractProfile(bio: string): Promise<Profile> {
  const { object } = await generateObject({
    model: openai("gpt-4o-mini"),
    schema: Profile,
    prompt: `Extract a structured profile from this bio:\n\n${bio}`,
  });
  return object;
}

A few things worth pointing out. gpt-4o-mini is the right default for extraction work: it's fast, cheap, and good enough that paying an order of magnitude more for gpt-4o is rarely justified. z.email() is the Zod 4 spelling (the old z.string().email() still works but is deprecated in favor of the top-level string formats). And generateObject throws a typed error (NoObjectGeneratedError) when the model's output can't be parsed or doesn't match your schema, so you don't have to chase trailing commas yourself.

When You Do Need To Parse Free Text, Use `safeParse`

Sometimes you're dealing with a model response that's mostly natural language with one extracted field, or you're calling a provider that doesn't support structured outputs cleanly. In that case, parse defensively:

TypeScript

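import { z } from "zod";

const Result = z.object({ summary: z.string(), tags: z.array(z.string()) });

export function parseModelOutput(rawText: string): z.infer<typeof Result> {
  let json: unknown = null; // stays null if rawText is not valid JSON
  try {
    json = JSON.parse(rawText); // JSON.parse itself throws on malformed input
  } catch { /* fall through: null fails the schema check below */ }
  const parsed = Result.safeParse(json);
  if (!parsed.success) {
    // structured error you can log or surface to retry logic
    throw new Error("model output failed schema: " + JSON.stringify(z.flattenError(parsed.error)));
  }
  return parsed.data;
}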

safeParse returns a discriminated union instead of throwing, which composes cleanly with whatever error envelope your route handler uses. z.flattenError (Zod 4) gives you a serializable view that's safe to log.
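The failure modes from earlier (JSON wrapped in a code fence, an apology before the JSON) can be cleaned up before the schema check runs. Here is a minimal sketch, assuming a hypothetical helper called extractJson that is not part of Zod or the AI SDK. It only finds and parses a candidate object; the value it returns is still untrusted and should go through safeParse.

```typescript
// Hypothetical helper, not part of any library: pull the first JSON object
// out of a model response that may carry a code fence or extra prose.
type JsonResult =
  | { ok: true; value: unknown }
  | { ok: false; reason: string };

// Built with repeat() so this example does not contain a literal fence.
const FENCE = "`".repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE, "i");

export function extractJson(raw: string): JsonResult {
  // Prefer the body of a fenced block when one is present.
  const fenced = raw.match(FENCED_BLOCK);
  const candidate = fenced?.[1] ?? raw;

  // Keep the span from the first "{" to the last "}" and drop surrounding prose.
  const start = candidate.indexOf("{");
  const end = candidate.lastIndexOf("}");
  if (start === -1 || end <= start) {
    return { ok: false, reason: "no JSON object found" };
  }

  try {
    return { ok: true, value: JSON.parse(candidate.slice(start, end + 1)) };
  } catch (err) {
    return { ok: false, reason: err instanceof Error ? err.message : String(err) };
  }
}
```

Returning a union instead of throwing mirrors safeParse, so both steps can feed the same error envelope.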

Figure: The shape of an AI feature that survives: boundary in, boundary out, observable in the middle. (Three-lane diagram: a user request crosses a validation boundary into a runtime layer that calls an LLM, then returns through a normalization step before the UI renders, with retry, idempotency-key, and structured-logging branches.)

Streaming Is The Real Latency Fix

The honest answer to "the model takes ten seconds" is not "make it faster." It's "show the user something within 200ms." Streaming is non-negotiable for chat-style features. streamText from the Vercel AI SDK returns a result whose toUIMessageStreamResponse() method builds a Response you can return directly from a Next.js route handler:

TypeScript

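import { convertToModelMessages, streamText, type UIMessage } from "ai";
import { openai } from "@ai-sdk/openai";

export async function POST(req: Request) {
  // useChat posts UI messages; convert them to model messages before the call.
  const { messages }: { messages: UIMessage[] } = await req.json();
  const result = streamText({
    model: openai("gpt-4o"),
    system: "You are a concise product help assistant.",
    // awaited in case your SDK version returns a promise here; harmless if it doesn't
    messages: await convertToModelMessages(messages),
  });
  return result.toUIMessageStreamResponse();
}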

On the client side, useChat from @ai-sdk/react consumes that stream, manages the message array, and gives you a sendMessage function and a status value you can wire to a form. The user sees tokens within a few hundred milliseconds. The total time to finish doesn't change, but the perceived latency drops by an order of magnitude.
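If you're outside React, or the route sends plain text rather than the UI message protocol, you can read the stream by hand. The sketch below assumes the handler returns a plain text stream (for example via result.toTextStreamResponse()) and uses only standard web APIs. The function name readTextStream is made up for this example.

```typescript
// Sketch: read a streamed text body chunk by chunk and hand each piece to the UI.
export async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries.
    const text = decoder.decode(value, { stream: true });
    if (text) {
      full += text;
      onChunk(text);
    }
  }

  // Flush any bytes the decoder was still holding.
  const tail = decoder.decode();
  if (tail) {
    full += tail;
    onChunk(tail);
  }
  return full;
}
```

A caller would fetch the route, check that res.body is present, and pass it in with a callback that appends each chunk to the visible message.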

Background Work For Anything Over 30 Seconds

For long-running generation — analyzing a PDF, regenerating an entire document, multi-step agent runs — streaming buys you nothing because the user shouldn't be staring at a tab anyway. Accept the request, return a 202 Accepted with a job ID, run the work in a queue (Inngest, BullMQ, Cloudflare Queues, whatever your stack uses), and notify the UI over SSE or a poll. This also lets your retry policy live somewhere sane instead of inside a request handler that times out at 60 seconds.
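Here is a rough sketch of the accept-then-poll shape. The in-memory Map stands in for a real queue and job table, and the names submitJob and getJob are hypothetical. A production version would hand the work to Inngest, BullMQ, or similar rather than running it in the same process.

```typescript
import { randomUUID } from "node:crypto";

type Job =
  | { status: "queued" | "running" }
  | { status: "done"; result: string }
  | { status: "failed"; error: string };

// In-memory stand-in for a real queue and job table (assumption for this sketch).
const jobs = new Map<string, Job>();

// Accept the request: record the job, start the work without awaiting it,
// and hand back a 202 plus the ID the client will poll with.
export function submitJob(work: () => Promise<string>): { status: number; jobId: string } {
  const jobId = randomUUID();
  jobs.set(jobId, { status: "queued" });
  void runJob(jobId, work);
  return { status: 202, jobId };
}

async function runJob(jobId: string, work: () => Promise<string>): Promise<void> {
  jobs.set(jobId, { status: "running" });
  try {
    jobs.set(jobId, { status: "done", result: await work() });
  } catch (err) {
    jobs.set(jobId, { status: "failed", error: err instanceof Error ? err.message : String(err) });
  }
}

// What a GET /jobs/:id poll would look up.
export function getJob(jobId: string): Job | undefined {
  return jobs.get(jobId);
}
```

The route handler returns the 202 immediately, and the UI polls getJob (or listens on SSE) until the status is done or failed.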

Retries Need Backoff, Jitter, And A Budget

OpenAI 429s exist. Anthropic 529s exist. Network blips exist. A naive retry loop turns one bad minute into a stampede. The pattern that actually behaves:

TypeScript

async function withRetry<T>(op: () => Promise<T>, opts = { max: 3, baseMs: 500 }) {
  let lastErr: unknown;
  for (let attempt = 0; attempt < opts.max; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastErr = err;
      if (!isRetryable(err) || attempt === opts.max - 1) throw err;
      const backoff = opts.baseMs * 2 ** attempt;
      const jitter = Math.random() * backoff;
      await new Promise((r) => setTimeout(r, backoff + jitter));
    }
  }
  throw lastErr;
}

The jitter matters more than the backoff. Without it, every client that hit the same 429 retries at exactly the same moment, and the rate limiter sees the same wave again. The max option is the budget: after three attempts the error propagates instead of looping forever. isRetryable is a small predicate: 429, 5xx, and network errors are retryable; 400, 401, and 422 are not.
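One possible shape for that predicate is sketched below. It assumes the thrown error exposes an HTTP status on status or statusCode, or a Node-style network code on code. Those field names vary by client library, so check what yours actually throws before relying on them.

```typescript
// Assumption: these are the transient network codes you care about; adjust to taste.
const RETRYABLE_NETWORK_CODES = new Set(["ECONNRESET", "ETIMEDOUT", "EAI_AGAIN"]);

export function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { status?: unknown; statusCode?: unknown; code?: unknown };

  // Different clients name the HTTP status differently; accept either field.
  const status =
    typeof e.status === "number" ? e.status
    : typeof e.statusCode === "number" ? e.statusCode
    : undefined;

  if (status !== undefined) {
    // 429 (rate limited) and any 5xx (including 529) are worth retrying;
    // other 4xx responses will fail the same way every time.
    return status === 429 || (status >= 500 && status <= 599);
  }

  // No HTTP status: treat known transient network failures as retryable.
  return typeof e.code === "string" && RETRYABLE_NETWORK_CODES.has(e.code);
}
```

Anything the predicate doesn't recognize is treated as non-retryable, which keeps an unexpected error from being retried three times.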

Idempotency Stops The Quadruple-Click Bug

The user clicks "Generate" four times because the spinner doesn't feel like progress. Without idempotency, you just paid for four completions and stored four results. Add a client-supplied key:

TypeScript

const key = crypto.randomUUID(); // generated once when the form mounts
await fetch("/api/generate", {
  method: "POST",
  headers: { "Idempotency-Key": key, "content-type": "application/json" },
  body: JSON.stringify({ input }),
});

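On the server, before you call the model, claim the key. If a result is already stored for it, return the cached result. If not, mark the key as in flight, run the model, and store the result against the key with a sensible TTL (an hour is usually enough). Redis SET with NX and EX is the boring, correct primitive for the claim step: only the first of four simultaneous requests wins it, so the other three never reach the model. Generate a fresh key on the client once a request completes or the input changes, or the next legitimate generation will get the old cached result back.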
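A sketch of the server side follows. The IdempotencyStore interface and the MemoryStore class are assumptions made up for this example; with Redis, claim would map onto SET with NX and EX. Answering 409 while the first request is still running is one reasonable choice, not the only one.

```typescript
type Entry = { state: "pending" } | { state: "done"; result: string };

// Hypothetical store interface. With Redis, claim() would be SET key value NX EX ttl.
interface IdempotencyStore {
  claim(key: string, ttlSeconds: number): Promise<boolean>; // true only for the first caller
  get(key: string): Promise<Entry | undefined>;
  complete(key: string, result: string, ttlSeconds: number): Promise<void>;
  release(key: string): Promise<void>;
}

// In-memory stand-in for the sketch: ignores TTLs and is not shared across processes.
export class MemoryStore implements IdempotencyStore {
  private entries = new Map<string, Entry>();
  async claim(key: string): Promise<boolean> {
    if (this.entries.has(key)) return false;
    this.entries.set(key, { state: "pending" });
    return true;
  }
  async get(key: string): Promise<Entry | undefined> {
    return this.entries.get(key);
  }
  async complete(key: string, result: string): Promise<void> {
    this.entries.set(key, { state: "done", result });
  }
  async release(key: string): Promise<void> {
    this.entries.delete(key);
  }
}

type Outcome = { status: 200; result: string } | { status: 409; message: string };

const TTL_SECONDS = 3600; // "an hour is usually enough"

export async function generateOnce(
  store: IdempotencyStore,
  key: string,
  run: () => Promise<string>,
): Promise<Outcome> {
  const claimed = await store.claim(key, TTL_SECONDS);
  if (!claimed) {
    const existing = await store.get(key);
    if (existing?.state === "done") return { status: 200, result: existing.result };
    // Someone else holds the key and has not finished: do not call the model again.
    return { status: 409, message: "a request with this key is still in progress" };
  }
  try {
    const result = await run();
    await store.complete(key, result, TTL_SECONDS);
    return { status: 200, result };
  } catch (err) {
    await store.release(key); // free the key so the client can retry with it
    throw err;
  }
}
```

Four clicks with the same key produce one model call. The duplicates get either the cached result or a 409 they can ignore.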

Cost And Token Limits Are Product Constraints, Not Footnotes

A free-tier user shouldn't be able to burn $40 of completions in an afternoon. Cap input tokens before you call the model, not after. The provider will tell you the input was too large via a 400, but you've already spent the round-trip and your error logs are now full of preventable errors. Use a tokenizer (tiktoken for OpenAI, Anthropic's messages.countTokens endpoint via @anthropic-ai/sdk for Claude) to measure and truncate the prompt yourself, and cap the output length on the call (maxOutputTokens in AI SDK 5, maxTokens in earlier versions) so a runaway response doesn't keep going.
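The measure-and-truncate step might look like the sketch below. The Tokenizer interface is an assumption: it is a thin adapter you would write around whichever real tokenizer you use, since the tiktoken ports differ in what encode and decode return. The function name truncateToBudget is also hypothetical.

```typescript
// Hypothetical adapter interface around a real tokenizer (for example a tiktoken encoding).
interface Tokenizer {
  encode(text: string): number[];
  decode(tokens: number[]): string;
}

interface BudgetedPrompt {
  text: string;
  tokens: number;
  truncated: boolean;
}

// Measure the prompt and cut it to the budget before any request is sent.
export function truncateToBudget(
  text: string,
  maxInputTokens: number,
  tokenizer: Tokenizer,
): BudgetedPrompt {
  const tokens = tokenizer.encode(text);
  if (tokens.length <= maxInputTokens) {
    return { text, tokens: tokens.length, truncated: false };
  }
  // Keep the first maxInputTokens tokens and decode them back to text.
  const kept = tokens.slice(0, maxInputTokens);
  return { text: tokenizer.decode(kept), tokens: kept.length, truncated: true };
}
```

The truncated flag lets the UI tell the user their input was cut rather than silently dropping the end of it. Keeping the head of the text is the simplest policy; summarizing or chunking may suit a transcript better.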

A One-Sentence Mental Model

A production AI feature is just a normal feature where the slowest, least predictable dependency happens to live behind the most expensive HTTPS call you make — treat it accordingly, validate at the boundary, stream what you can, and never let the model decide your retry policy.
