Structured Outputs: Making LLM Responses Useful In TypeScript

The first AI feature I shipped that survived contact with users was a "categorise this support ticket" endpoint. It was supposed to read a message and return one of seven categories. The first version asked GPT-3.5 in plain English: "Reply with only the category name, nothing else." It worked perfectly in development. In production it returned "Billing", then "billing", then "**Billing**", then "Sure! The category is Billing.", then a paragraph explaining its reasoning followed by the answer.

I added "do not include any other text" to the prompt. Then "do not use Markdown formatting". Then "respond with a single word in lowercase". Each addition fixed the last failure mode and broke a new one. The endpoint had a switch statement that grew weekly. Eventually a customer's message contained the word "billing" in lowercase and the model decided to quote it back with quotes around it, and "billing" is a different string than billing.

That is what writing AI features without structured outputs feels like. You are negotiating with a slot machine. The fix is not better prompting. The fix is to stop asking the model to format its own output and start telling the API the exact shape you want.

What Changed In The Last Two Years

Native structured outputs landed across every major provider between 2024 and 2025. OpenAI calls it Structured Outputs and ships it on gpt-4o, gpt-4o-mini, and the o-series reasoning models. Anthropic exposes the same idea through tool-use schemas on claude-sonnet-4-5 and claude-opus-4-5. Google has it on Gemini 2.0+. The implementation is roughly the same everywhere: you submit a JSON Schema along with the request, and the provider constrains the model's decoder to only emit tokens consistent with that schema.

The result is not "better at JSON". It is "literally cannot produce invalid JSON". The model can still refuse, hallucinate values, or pad strings with nonsense, but as long as the response is not cut off by a token limit, the parse succeeds and every key is one you defined.
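For a sense of what goes over the wire, here is a hand-written sketch of a JSON Schema for a ticket classifier. The field names and category values are illustrative assumptions, not taken from a real system. The `required` list and `additionalProperties: false` reflect what some providers' strict modes expect, so check your provider's documentation before relying on them.

```typescript
// Illustrative category list; a real system would have its own.
const CATEGORIES = ['billing', 'bug', 'feature', 'other'] as const;
type TicketCategory = (typeof CATEGORIES)[number];

// A hand-written JSON Schema of the kind a library would normally generate.
const ticketJsonSchema = {
  type: 'object',
  properties: {
    category: { type: 'string', enum: CATEGORIES },
    summary: { type: 'string', description: 'One-sentence summary of the message' },
  },
  required: ['category', 'summary'],
  additionalProperties: false,
} as const;

// With constrained decoding, only these exact strings can appear in `category`.
function isAllowedCategory(value: string): value is TicketCategory {
  return (CATEGORIES as readonly string[]).includes(value);
}
```

The string-drift failures from the opening story ("Billing", "**Billing**") are simply not members of the enum, so a decoder constrained by this schema has no way to emit them.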

In TypeScript you almost never write the JSON Schema by hand. You write a Zod schema and let a library translate it. The Vercel AI SDK does this for you with generateObject and streamObject:

TypeScript

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const Ticket = z.object({
  category: z.enum(['billing', 'bug', 'feature', 'security', 'other']),
  urgency: z.enum(['low', 'medium', 'high']),
  customerEmotion: z.enum(['neutral', 'frustrated', 'angry', 'pleased']),
  summary: z.string().max(280),
});

export async function classify(message: string) {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    schema: Ticket,
    prompt: `Classify the following support message:\n\n${message}`,
  });
  return object; // typed as z.infer<typeof Ticket>
}

object.category is 'billing' | 'bug' | 'feature' | 'security' | 'other'. Not a string. The TypeScript compiler knows. At runtime, the shape is enforced by the provider. The switch statement can now be an exhaustive switch: with a never check in the default branch, any new category that is not handled becomes a compile error.
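The compile-time guarantee depends on a small idiom in the default branch. A minimal sketch follows; the queue names are hypothetical and the Category type is written out by hand here, where in practice it would come from z.infer.

```typescript
type Category = 'billing' | 'bug' | 'feature' | 'security' | 'other';

// Hypothetical routing targets, for illustration only.
function routeTicket(category: Category): string {
  switch (category) {
    case 'billing':
      return 'finance-queue';
    case 'bug':
    case 'feature':
      return 'engineering-queue';
    case 'security':
      return 'security-queue';
    case 'other':
      return 'triage-queue';
    default: {
      // If a member is added to Category and not handled above, `category`
      // is no longer `never` here and this assignment fails to compile.
      const unhandled: never = category;
      throw new Error(`Unhandled category: ${String(unhandled)}`);
    }
  }
}
```

Adding a value to the enum without touching this function turns into a type error at the `never` assignment, which is the point at which you want to find out.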

Descriptions Are Instructions

Once you stop carrying formatting instructions in the prompt, your prompt gets short. The job of describing what each field means moves into the schema, where it belongs. Zod's .describe() adds a description to the JSON Schema, and the model reads it.

TypeScript

const Invoice = z.object({
  invoiceNumber: z.string().describe('The visible invoice id, e.g. INV-2026-0142'),
  issuedAt: z
    .string()
    .describe('Issue date in ISO 8601 format, e.g. 2026-03-04')
    .refine((s) => !Number.isNaN(Date.parse(s)), 'must be a parseable date'),
  total: z.number().describe('Total in the invoice currency, as a decimal number'),
  currency: z.enum(['USD', 'EUR', 'GBP', 'PLN', 'UAH']).describe('ISO 4217 code'),
  lineItems: z
    .array(
      z.object({
        description: z.string(),
        quantity: z.number().int().min(1),
        unitPrice: z.number().nonnegative(),
      }),
    )
    .min(1, 'an invoice must have at least one line item'),
});

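Notice what's happening. total is a number, not a string, so the model can't return "thirty". currency is an enum, so Dollars is impossible. lineItems is non-empty. quantity is an integer ≥ 1. None of this is in the prompt. All of it is in the type, where review tools can find it and where TypeScript can use it. One caveat: a constraint that JSON Schema cannot express, such as the .refine() on issuedAt, is never seen by the model. Zod checks it when the response is parsed.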

Streaming Objects, Not Just Text

For long responses, streaming the object as it builds keeps your UI responsive. streamObject is the streaming sibling of generateObject — same schema, partial progress on every chunk:

TypeScript

import { streamObject } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = streamObject({
  model: openai('gpt-4o-mini'),
  schema: Invoice,
  prompt: `Extract invoice data:\n\n${ocrText}`,
});

for await (const partial of result.partialObjectStream) {
  // partial is a deep partial of the invoice type that grows as the model emits tokens
  send(partial);
}
const final = await result.object; // fully validated Invoice

The partial type is DeepPartial<Invoice> — every field is optional until it's complete. In a UI that's exactly what you want; you can show line items appearing one by one without waiting for the totals row.
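Rendering a deep partial mostly comes down to supplying a placeholder for every field that has not arrived yet. The sketch below uses hand-rolled stand-in types, including its own DeepPartial, so it runs without the SDK. The SDK's own partial type may differ in detail, and renderPartialInvoice is a hypothetical helper.

```typescript
// Local stand-ins; in a real project the invoice type would be inferred from the Zod schema.
type LineItem = { description: string; quantity: number; unitPrice: number };
type InvoiceData = { invoiceNumber: string; total: number; currency: string; lineItems: LineItem[] };

// Hand-rolled helper type (an assumption, not the SDK's definition).
type DeepPartial<T> = T extends (infer U)[]
  ? DeepPartial<U>[]
  : T extends object
    ? { [K in keyof T]?: DeepPartial<T[K]> }
    : T;

// Render whatever has arrived so far, with placeholders for the rest.
function renderPartialInvoice(partial: DeepPartial<InvoiceData>): string[] {
  const lines: string[] = [`Invoice ${partial.invoiceNumber ?? '...'}`];
  for (const item of partial.lineItems ?? []) {
    const qty = item.quantity ?? '?';
    const price = item.unitPrice ?? '?';
    lines.push(`- ${item.description ?? '...'} (${qty} x ${price})`);
  }
  // Only show the totals row once the model has emitted it.
  if (partial.total !== undefined) {
    lines.push(`Total: ${partial.total} ${partial.currency ?? ''}`.trim());
  }
  return lines;
}
```

Called on every chunk of the partial stream, a function like this redraws the invoice as it fills in, without ever touching an undefined field.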

Figure: From paragraph to typed object: the Zod schema is both the validation rule and the prompt instruction. (Diagram: a messy paragraph of free text on the left, a Zod schema as a funnel in the middle, and a clean, fully typed JSON object on the right. Three callouts mark where the schema enforces shape: an enum prevents string drift, a refine prevents invalid dates, and a min prevents empty arrays.)

Schema First, Prompt Second

The change in workflow is bigger than the code change. With structured outputs, the schema becomes the artefact you review. The prompt becomes a thin task description. Most of my AI feature reviews now consist of asking "is this Zod schema what we actually want from the model" and almost never "is this prompt good".

Concretely, the order I work in:

1. Write the Zod schema. Be aggressive about enums and .min/.max constraints.
2. Sketch a prompt of two or three sentences: what task, in what voice, with what assumptions.
3. Try the call on five real examples and a few intentionally weird ones.
4. If outputs are off, change the schema descriptions before changing the prompt.

Step 4 is the one that catches people. When the model misbehaves, the instinct is to add bullet points to the prompt. Try editing .describe() on the relevant field first. The model is more reliable when the constraint sits next to the field it's constraining.

When To Reach For Tool Calls Instead

generateObject is for one output object. If your feature needs the model to choose between multiple structured actions — call function A with this shape, or function B with that shape, or just answer in text — you want tools, not a single schema. The AI SDK's tool({ inputSchema: …, execute: … }) API gives you the same Zod-driven typing for arguments, and the model picks which tool to call:

TypeScript

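import { tool, generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// createSchedule and closeTicket below are your own application functions.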

const tools = {
  scheduleReply: tool({
    description: 'Schedule a follow-up email for a customer',
    inputSchema: z.object({
      customerId: z.uuid(),
      sendAt: z.iso.datetime(),
      template: z.enum(['nudge', 'apology', 'survey']),
    }),
    execute: async (args) => createSchedule(args),
  }),
  closeTicket: tool({
    description: 'Mark the ticket resolved',
    inputSchema: z.object({ ticketId: z.uuid(), reason: z.string().max(200) }),
    execute: async (args) => closeTicket(args),
  }),
};

await generateText({ model: openai('gpt-4o'), tools, prompt: '…' });

The contract is the same as structured outputs: each tool has a schema, the model can only fill it with valid values, and your application stays type-safe end to end.

Local Models Are Catching Up

If you're running open-weights models locally (Llama, Qwen, the latest Mistral), provider-native structured outputs aren't available the same way, but the gap is closing. llama.cpp supports JSON-schema-constrained decoding through its grammar support. vLLM ships with guided decoding. Ollama accepts a JSON schema in its format parameter. Where none of that is available, libraries like instructor give you a similar developer experience by putting the schema in the prompt, parsing the reply, and retrying on validation failure.

The retry-on-failure strategy is fine for prototypes, expensive at scale. If structured output is on the critical path of a feature, prefer a model and provider that constrains the decoder natively. Cheaper per call, lower p99 latency, fewer ways for things to drift.
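To make the cost argument concrete, here is roughly what a retry-on-validation-failure loop looks like. This is a minimal sketch, not how any particular library implements it: generateWithRetry is a hypothetical helper, and the model call and the validator are injected as plain functions so the loop does not assume a specific client or schema library.

```typescript
// Result of validating a raw model reply against a schema.
type Validation<T> = { ok: true; value: T } | { ok: false; error: string };

// Hypothetical helper: `generate` wraps the model call, `validate` wraps the schema check.
async function generateWithRetry<T>(
  generate: (feedback?: string) => Promise<string>,
  validate: (raw: string) => Validation<T>,
  maxAttempts = 3,
): Promise<{ value: T; attempts: number }> {
  let feedback: string | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate(feedback); // one full model call per attempt
    const result = validate(raw);
    if (result.ok) {
      return { value: result.value, attempts: attempt };
    }
    // Pass the validation error back so the next attempt can correct itself.
    feedback = result.error;
  }
  throw new Error(`No valid output after ${maxAttempts} attempts: ${feedback}`);
}
```

Every failed attempt is a full extra round trip, paid in both tokens and latency, which is why this approach shows up in the tail of your latency distribution rather than the median.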

Validation On The Way Out, Always

One last habit. Even with provider-native structured outputs, validate with Zod before you do anything destructive with the result. The provider promises the parse succeeds; your business rules go further than the parse. A .refine() that says "this date can't be in the future" or "this price has to match an existing SKU" lives in your code, not the model's.

TypeScript

const safe = Invoice.safeParse(object);
if (!safe.success) {
  log.warn({ issues: safe.error.issues }, 'invalid invoice from model');
  throw new BadAIOutputError(safe.error);
}
const invoice = safe.data;

safeParse is the right call here. It gives you a typed branch for the failure, so you can log the issues and raise an error of your own choosing, instead of letting a raw Zod exception escape from the hot path of a streaming response.
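Business rules of this kind are ordinary code. The sketch below checks two of them on an already-parsed invoice: the "not in the future" rule mentioned above, and a hypothetical rule that the total equals the sum of the line items. The second rule is an assumption for illustration; real invoices may include tax or discounts. The types are local stand-ins, and the same checks could equally live in a Zod .refine().

```typescript
type LineItem = { description: string; quantity: number; unitPrice: number };
type InvoiceData = { issuedAt: string; total: number; lineItems: LineItem[] };

// Returns human-readable issues; an empty list means the invoice passed.
function checkInvoiceRules(invoice: InvoiceData, now: Date = new Date()): string[] {
  const issues: string[] = [];

  if (Date.parse(invoice.issuedAt) > now.getTime()) {
    issues.push('issuedAt is in the future');
  }

  // Compare in integer cents to avoid floating-point noise.
  const sumCents = invoice.lineItems.reduce(
    (acc, item) => acc + Math.round(item.quantity * item.unitPrice * 100),
    0,
  );
  if (Math.round(invoice.total * 100) !== sumCents) {
    issues.push(`total ${invoice.total} does not match line items (${sumCents / 100})`);
  }

  return issues;
}
```

Taking `now` as a parameter keeps the date rule deterministic in tests.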

A One-Sentence Mental Model

Structured outputs turn the LLM from a text generator you negotiate with into a typed function call you describe with Zod, and once you're working that way, your AI features start looking like the rest of your TypeScript code instead of an island of strings to be regex'd.