So you've shipped a Node service. Maybe it serves an API, maybe it handles webhooks, maybe it's the backend for an app that's actually getting traction. And one day a request comes in that needs to resize a 4000×3000 product photo, or hash a 200MB upload, or run a tokenizer over a multi-megabyte prompt before sending it to an LLM. The request takes 800ms. That alone wouldn't be a problem. The problem is that during those 800ms, every other request your process is handling sits frozen. Health checks time out. Faster endpoints suddenly take 600ms. The load balancer thinks the instance is dying and pulls it out of rotation. You scale up. The next big upload arrives. Same thing happens to a different pod.
You've just collided with the one thing about Node that hasn't changed since v0.10: the event loop is a single thread, and CPU-bound work blocks it. All of it. Everything you've been told about Node being "great at concurrency" was true - for I/O. The minute you put something genuinely computational on the same thread that's also accepting connections, parsing JSON, and resolving promises, you don't have concurrency anymore. You have a queue.
Worker threads are Node's answer to this specific problem. Not the only answer - cluster, child_process, and "just call out to a Go binary" all exist - but the one that's built into the runtime, sits in your process, shares memory when you want it to, and is cheap enough to use for one-off operations instead of needing a separate service. This piece walks through what worker threads actually are, when they're the right call, how to use them for the three workloads that drive 90% of real-world adoption (image processing, AI preprocessing, generic CPU-heavy jobs), the concurrency patterns that turn one worker into a manageable pool, and the failure modes that bite people the second they ship.
The Event Loop Problem In One Paragraph
Node runs your JavaScript on a single thread. That thread does everything: HTTP parsing, your route handlers, JSON serialisation, callback dispatch, promise resolution, the works. The I/O it offloads - disk reads, network calls, DNS lookups - goes to a thread pool managed by libuv, but the JavaScript itself stays on one thread. This is fine, even great, when your handlers are mostly waiting for something else (database, upstream API, file system). It is catastrophic when one of those handlers decides to do real work.
A crypto.pbkdf2Sync(password, salt, 100000, 64, 'sha512') call is around 50ms of CPU, all of it on the main thread. Synchronously parsing a 10MB JSON file is 200ms. Running a JS tokenizer over a 100KB prompt for an LLM call is 30-80ms. None of these are slow in isolation. But each one is a window during which your process accepts no new connections, dispatches no callbacks, and answers no health checks. Stack them under load and you get the cascade above. Image resizing is a partial exception. sharp does the 50-150ms of pixel work in await sharp(buffer).resize(800, 600).toBuffer() on libuv's thread pool rather than the JavaScript thread. That pool is small and shared, though, so a burst of resizes still backs up everything else that depends on it.
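You can see that window on a live process with the event-loop delay histogram in perf_hooks. The sketch below is illustrative only. maxLoopDelayDuring and busyWait are hypothetical helper names, and the busy-wait stands in for any synchronous call such as pbkdf2Sync.

```typescript
import { monitorEventLoopDelay } from 'node:perf_hooks';
import { setTimeout as sleep } from 'node:timers/promises';

// Hypothetical helper: burns CPU on the current thread for about `ms`
// milliseconds, standing in for pbkdf2Sync, a big JSON.parse, a tokenizer...
export function busyWait(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else on this thread can run meanwhile
  }
}

// Hypothetical helper: the worst event-loop stall (in ms) observed while `fn` ran.
export async function maxLoopDelayDuring(fn: () => void): Promise<number> {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  await sleep(50); // collect a few normal samples first
  fn(); // synchronous work blocks the loop, so the sampling timer fires late
  await sleep(50); // give the late timer a chance to record the stall
  histogram.disable();
  return histogram.max / 1e6; // histogram values are nanoseconds
}
```

In production you would leave a histogram enabled and export something like histogram.percentile(99) as a metric. A p99 that jumps whenever big uploads arrive is the cascade above, caught early.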
This is what worker threads fix. They give you a second JavaScript thread (or a fifth, or a sixteenth) that runs in the same Node process, with its own event loop, its own V8 isolate, and its own little universe of memory. You hand work to it through a message channel, and your main thread stays free to do what it's good at: shovelling I/O.
What Worker Threads Are, And What They Are Not
A Worker is not a thread in the sense a Java or C++ developer would expect. It's a new V8 isolate in the same OS process, with its own event loop running on its own OS thread. Each worker has its own copy of the loaded modules, its own globals, its own require cache, and - by default - its own memory. You can't share a regular JavaScript object between the main thread and a worker. What you can share is binary data, through SharedArrayBuffer, and you can transfer ownership of an ArrayBuffer so it ends up belonging to the worker without copying. We'll get to both.
It's also not the same thing as cluster. cluster forks an entire Node process per CPU core, each with its own port handle behind a shared socket. Workers stay in one process. That matters because:
- They start in 30-100ms (a cluster fork is more like 200-500ms because it boots a fresh Node).
- They share the parent's file descriptors and don't fight over a single listening socket.
- They can pass data through transferable buffers in microseconds, not through IPC pipes.
- They share the process's memory budget. A worker that balloons counts against the same container limit as everything else, and if the OOM killer steps in, it takes the whole process down.
And it's not child_process.fork() either, which is the same "spawn a whole new Node" model as cluster, with the same startup cost and the same IPC-only communication. Use child processes when you want isolation (a bug in the child can't crash the parent), workers when you want speed and shared memory.
The decision tree is short:
- Need to use all CPU cores for incoming HTTP requests? → cluster (or just run multiple pods).
- Need to run a totally independent binary or untrusted code? → child_process (or a real sandbox).
- Need to do CPU work in response to a request, in the same service, with the lowest startup overhead? → worker_threads.
The Smallest Useful Example
Before getting into pools and patterns, let's see a worker do one thing. We'll derive a key from a password with crypto.scryptSync, a deliberately expensive function, off the main thread.
```typescript
// hash-worker.ts (compiled to hash-worker.js)
import { parentPort, workerData } from 'node:worker_threads';
import { scryptSync } from 'node:crypto';

const { password, salt } = workerData as { password: string; salt: string };

// This is the blocking call we don't want on the main thread.
const derived = scryptSync(password, salt, 64);

parentPort!.postMessage(derived);
```
```typescript
// hash.ts
import { Worker } from 'node:worker_threads';
import { fileURLToPath } from 'node:url';

const WORKER_URL = new URL('./hash-worker.js', import.meta.url);

export function hash(password: string, salt: string): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(fileURLToPath(WORKER_URL), {
      workerData: { password, salt },
    });
    // A Buffer arrives on this side as a plain Uint8Array; re-wrap it without copying.
    worker.once('message', (out: Uint8Array) =>
      resolve(Buffer.from(out.buffer, out.byteOffset, out.byteLength)),
    );
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}
```
A route handler can now await hash(...) and the event loop is free for the entire duration of the scrypt call. From the handler's perspective it looks exactly like an async I/O call. From the worker's perspective, it doesn't know or care that anything else is happening in the parent.
There are two things wrong with this code that we'll fix as we go. First, we're spawning a new worker per call, which costs 30-100ms of startup that we just gave away. Second, there's no timeout, no error categorisation, no cancellation. For a one-off script this is fine. For production, it isn't.
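The missing timeout is the easiest of those to add. Here is a sketch. It assumes you are willing to terminate() a worker that overruns, which is cheap for a one-off worker and expensive for a pooled one. runWithTimeout and WorkerTimeoutError are hypothetical names, and the inline eval: true worker only keeps the example in one file.

```typescript
import { Worker } from 'node:worker_threads';

// Hypothetical error type so callers can tell "too slow" from "broken".
export class WorkerTimeoutError extends Error {
  constructor(ms: number) {
    super(`Worker did not reply within ${ms}ms`);
    this.name = 'WorkerTimeoutError';
  }
}

// Hypothetical helper: runs `source` (CommonJS worker code) once with `data`
// and resolves with the first message it posts, or rejects after `ms`.
export function runWithTimeout<T>(source: string, data: unknown, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const worker = new Worker(source, { eval: true, workerData: data });
    let settled = false;
    const finish = (settle: () => void) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      settle();
    };
    const timer = setTimeout(() => {
      // Interrupts the worker's JavaScript, even in the middle of a loop.
      void worker.terminate();
      finish(() => reject(new WorkerTimeoutError(ms)));
    }, ms);
    worker.once('message', (out: T) => finish(() => resolve(out)));
    worker.once('error', (err) => finish(() => reject(err)));
    worker.once('exit', (code) =>
      finish(() => reject(new Error(`Worker exited with code ${code} before replying`))),
    );
  });
}
```

Separating WorkerTimeoutError from other failures is also a first step towards error categorisation. A timeout usually maps to a 503, and anything else is a bug.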
CPU-Heavy Workloads: When To Reach For A Worker
The rule of thumb that's served me well: if a function takes longer than 10ms of CPU and runs on a hot path, it belongs in a worker. Below 10ms, the cost of marshalling data across the message channel starts to eat your gains. Above 50ms, you're flat-out wrong to keep it on the main thread.
Functions that almost always cross the 10ms line:
- crypto.pbkdf2Sync, scryptSync, argon2.hash - password hashing is designed to be slow.
- zlib.gzipSync / brotliCompressSync over anything more than ~100KB.
- JSON.parse of payloads over a megabyte (yes, V8's parser is fast - it's still on the main thread).
- Anything that calls into a native add-on that does its work synchronously (node-canvas, node-libxml).
- Image, PDF, audio, or video processing of any kind.
- Tokenizers, embedding pre-processing, BPE encoding for LLM inputs.
- Heavy regex over large inputs (V8 has a fast path; this is not it).
- Tree traversals over data structures with millions of nodes.
Functions that look CPU-heavy but aren't:
- Array.sort on collections under ~10,000 items (typically a millisecond or two at most).
- JSON.stringify of typical API responses - V8 is shockingly good at this.
- Lodash operations on normal-sized data - also fast enough.
- Anything that's actually I/O-bound dressed up as compute (a "slow function" that's really waiting for a network call).
For the ones that genuinely belong off-thread, the pattern is always the same: a small dedicated worker file, a typed message contract, and a pool in front of it. Let's build that pool.
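One option for the typed message contract is a result envelope that separates bad input from internal failure. The main thread can then categorise errors without string-matching messages. This is a convention of our own, not something worker_threads provides, and TaskResult, TaskInputError and runTask are hypothetical names.

```typescript
// Hypothetical contract: what every worker posts back, success or failure.
export type TaskResult<O> =
  | { ok: true; value: O }
  | { ok: false; kind: 'bad_input' | 'internal'; message: string };

// Hypothetical error type: thrown by task code when the input is the problem.
export class TaskInputError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'TaskInputError';
  }
}

// Wraps a task so it never throws across the thread boundary; failures are
// turned into a categorised envelope the main thread can branch on.
export async function runTask<I, O>(
  fn: (input: I) => O | Promise<O>,
  input: I,
): Promise<TaskResult<O>> {
  try {
    return { ok: true, value: await fn(input) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    const kind = err instanceof TaskInputError ? 'bad_input' : 'internal';
    return { ok: false, kind, message };
  }
}
```

In a worker file this becomes something like parentPort.on('message', async (msg) => parentPort.postMessage(await runTask(handler, msg))), where handler is your (hypothetical) task function. The route then maps bad_input to a 400 and internal to a 500. Because the worker never throws across the boundary, a bad request also doesn't cost you a worker restart.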
A Worker Pool That Doesn't Suck
You almost never want one-worker-per-call. The startup overhead defeats the point, and unbounded spawning is how you turn a CPU problem into an OOM. The shape you want is a fixed pool: a small set of long-lived workers that share a queue, idle when there's no work, and recycle themselves if they crash.
There are two ways to get one. The first is piscina (npmjs.com/piscina), which is the de-facto standard pool library - written by James M Snell, a Node TSC member, and stable for years. The second is to write your own, which is a useful exercise once. Let's do both.
The hand-rolled version, with the rough edges left in so you can see the moving parts:
```typescript
// pool.ts
import { Worker, type TransferListItem } from 'node:worker_threads';
import { fileURLToPath } from 'node:url';
import { EventEmitter } from 'node:events';

type Task<I, O> = {
  input: I;
  transferList?: TransferListItem[];
  resolve: (out: O) => void;
  reject: (err: Error) => void;
};

export class WorkerPool<I, O> extends EventEmitter {
  private workers: Worker[] = [];
  private idle: Worker[] = [];
  private queue: Task<I, O>[] = [];
  private busy = new WeakMap<Worker, Task<I, O>>();
  private destroyed = false;

  constructor(private workerUrl: URL, private size: number) {
    super();
    for (let i = 0; i < size; i++) this.spawn();
  }

  private spawn() {
    const worker = new Worker(fileURLToPath(this.workerUrl));
    let lastError: Error | undefined;

    worker.on('message', (out: O) => {
      const task = this.busy.get(worker);
      if (!task) return;
      this.busy.delete(worker);
      task.resolve(out);
      this.release(worker);
    });

    // 'error' fires for an uncaught exception; 'exit' always follows it.
    worker.on('error', (err) => {
      lastError = err;
    });

    // 'exit' also covers process.exit() inside the worker and terminate().
    worker.on('exit', (code) => {
      const task = this.busy.get(worker);
      if (task) {
        this.busy.delete(worker);
        task.reject(lastError ?? new Error(`Worker exited with code ${code}`));
      }
      this.workers = this.workers.filter((w) => w !== worker);
      this.idle = this.idle.filter((w) => w !== worker);
      // Replace the dead worker so the pool size stays constant.
      if (!this.destroyed) this.spawn();
    });

    this.workers.push(worker);
    this.release(worker); // a replacement picks up queued work immediately
  }

  private release(worker: Worker) {
    const next = this.queue.shift();
    if (next) this.assign(worker, next);
    else this.idle.push(worker);
  }

  private assign(worker: Worker, task: Task<I, O>) {
    this.busy.set(worker, task);
    worker.postMessage(task.input, task.transferList ?? []);
  }

  run(input: I, transferList?: TransferListItem[]): Promise<O> {
    return new Promise<O>((resolve, reject) => {
      const task: Task<I, O> = { input, transferList, resolve, reject };
      const worker = this.idle.shift();
      if (worker) this.assign(worker, task);
      else this.queue.push(task);
    });
  }

  async destroy() {
    this.destroyed = true;
    // Queued tasks will never run now; fail them instead of leaving them hanging.
    for (const task of this.queue.splice(0)) task.reject(new Error('Pool destroyed'));
    await Promise.all(this.workers.map((w) => w.terminate()));
  }
}
```
Used:
```typescript
// server.ts (app is an Express app; body parsing into an ArrayBuffer is assumed)
import { WorkerPool } from './pool.js';

const pool = new WorkerPool<{ buffer: ArrayBuffer; width: number }, ArrayBuffer>(
  new URL('./resize-worker.js', import.meta.url),
  4, // one worker per physical core, roughly
);

app.post('/resize', async (req, res) => {
  const ab: ArrayBuffer = req.body.buffer; // assume you already got it as an ArrayBuffer
  const out = await pool.run({ buffer: ab, width: 800 }, [ab]); // transfer, not copy
  res.set('content-type', 'image/jpeg').send(Buffer.from(out));
});
```
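Hardcoding 4 is fine for a demo, but in a container you would rather derive it. The sketch below assumes Linux with cgroup v2, where the CPU quota lives in /sys/fs/cgroup/cpu.max as "<quota> <period>" (or "max <period>" when unlimited). parseCpuMax and defaultPoolSize are hypothetical helpers. Reserving one core for the main thread is a judgment call, not a rule.

```typescript
import { availableParallelism } from 'node:os';
import { readFileSync } from 'node:fs';

// Hypothetical helper: parses cgroup v2 cpu.max ("200000 100000" means 2 CPUs).
// Returns undefined when there is no quota ("max ...") or the contents are unrecognised.
export function parseCpuMax(contents: string): number | undefined {
  const [quota, period] = contents.trim().split(/\s+/);
  if (quota === undefined || quota === 'max') return undefined;
  const q = Number(quota);
  const p = Number(period);
  if (!Number.isFinite(q) || !Number.isFinite(p) || p <= 0) return undefined;
  return Math.max(1, Math.floor(q / p));
}

// Hypothetical helper: min(cores Node can schedule on, container CPU quota),
// minus one core left for the main thread's I/O, never below 1.
export function defaultPoolSize(cpuMaxPath = '/sys/fs/cgroup/cpu.max'): number {
  let quota: number | undefined;
  try {
    quota = parseCpuMax(readFileSync(cpuMaxPath, 'utf8'));
  } catch {
    quota = undefined; // not Linux, cgroup v1, or no access: ignore the quota
  }
  const cores = Math.min(availableParallelism(), quota ?? Infinity);
  return Math.max(1, cores - 1);
}
```

With that in place, the constructor call becomes new WorkerPool(url, defaultPoolSize()).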
That [ab] second argument is the transferList. It says: "move this buffer to the worker - I won't touch it again." V8 doesn't copy; it changes ownership. After this call, ab.byteLength on the main thread is 0. The buffer now lives in the worker's heap. This is the single biggest performance lever you have when shipping binary data across threads, and it's the difference between a pool that helps and a pool that doesn't.
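You can watch the ownership change without spawning a worker, because a MessageChannel uses the same postMessage transfer semantics. Here is a small sketch. sendTransferred is a hypothetical name.

```typescript
import { MessageChannel } from 'node:worker_threads';

// Hypothetical helper: posts `ab` with itself in the transfer list and reports
// the sender-side length before and after, plus the length the receiver sees.
export function sendTransferred(
  ab: ArrayBuffer,
): { before: number; after: number; received: Promise<number> } {
  const { port1, port2 } = new MessageChannel();
  const received = new Promise<number>((resolve) => {
    port2.on('message', (msg: ArrayBuffer) => {
      resolve(msg.byteLength);
      port1.close(); // closes both ends so the process can exit
    });
  });
  const before = ab.byteLength;
  port1.postMessage(ab, [ab]); // transfer: no copy, the sender's buffer is detached
  return { before, after: ab.byteLength, received };
}
```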
Now the piscina version, which does all of the above and more:
```typescript
import Piscina from 'piscina';
import { fileURLToPath } from 'node:url';

export const resizePool = new Piscina({
  // Unlike the hand-rolled pool, a piscina worker file exports the task
  // function (e.g. `export default async function ({ buffer, width }) {...}`)
  // instead of listening on parentPort itself.
  filename: fileURLToPath(new URL('./resize-worker.js', import.meta.url)),
  minThreads: 2,
  maxThreads: 8,
  idleTimeout: 30_000,
});

app.post('/resize', async (req, res) => {
  const ab: ArrayBuffer = req.body.buffer;
  const out = await resizePool.run({ buffer: ab, width: 800 }, { transferList: [ab] });
  res.set('content-type', 'image/jpeg').send(Buffer.from(out));
});
```
The differences that matter: piscina has a dynamic pool that grows and shrinks within min/max, an idleTimeout that lets workers retire so a quiet service doesn't hold a thread per core forever, and an AbortSignal you can pass to run() to actually cancel work. It also tracks queue depth, completed task counts, and average run time - you can drop those straight into a Prometheus exporter and have a graph of "worker pool saturation" with five lines of code.
Image Processing: The Most Common Use Case
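If your service handles user uploads, there's a 90% chance you're running sharp somewhere, probably straight from a request handler. sharp is built on libvips and is genuinely fast, and it already runs its pixel work on libuv's thread pool rather than on the JavaScript thread. But "fast" still means 30-150ms of CPU for a typical product photo, and most apps need to process several images per upload (original + 3-5 sizes for responsive srcset + a WebP/AVIF copy of each). That thread pool has only four threads by default (UV_THREADPOOL_SIZE) and is shared with fs, dns.lookup and async crypto. A few concurrent uploads can tie it up for half a second, with every file read and DNS lookup in the process queued behind them, and the buffer handling and orchestration around each image still runs on the main thread. A fixed-size worker pool gives that work one bounded home and caps how many uploads are processed at once.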
The pattern that scales is: one pool, one worker file, transfer the source buffer in, transfer the resized buffer out, do all variants inside the worker so you only cross the boundary twice.
```typescript
// resize-worker.ts
import { parentPort } from 'node:worker_threads';
import sharp from 'sharp';

type Input = {
  buffer: ArrayBuffer;
  variants: Array<{ width: number; format: 'webp' | 'jpeg' | 'avif' }>;
};

type Output = {
  variants: Array<{ width: number; format: string; data: ArrayBuffer }>;
};

parentPort!.on('message', async (msg: Input) => {
  // Anything thrown in here (e.g. a malformed upload) becomes an unhandled
  // rejection, which kills this worker; the pool rejects the task and respawns.
  const buf = Buffer.from(msg.buffer);
  const src = sharp(buf, { failOn: 'error' });

  // Read metadata once so we don't upscale beyond the original.
  const meta = await src.metadata();
  const sourceWidth = meta.width ?? 0;

  const out = await Promise.all(
    msg.variants.map(async (v) => {
      const targetWidth = Math.min(v.width, sourceWidth || v.width);
      const pipeline = src.clone().resize({ width: targetWidth, withoutEnlargement: true });
      const data =
        v.format === 'webp' ? await pipeline.webp({ quality: 80 }).toBuffer()
        : v.format === 'avif' ? await pipeline.avif({ quality: 60 }).toBuffer()
        : await pipeline.jpeg({ quality: 85, mozjpeg: true }).toBuffer();
      // Re-wrap as an ArrayBuffer we own so we can transfer it back.
      const ab = data.buffer.slice(data.byteOffset, data.byteOffset + data.byteLength) as ArrayBuffer;
      return { width: targetWidth, format: v.format, data: ab };
    }),
  );

  const transfer = out.map((v) => v.data);
  parentPort!.postMessage({ variants: out } satisfies Output, transfer);
});
```
A few specifics that matter in production:
src.clone() before each variant gives every output its own pipeline. A sharp instance carries its resize and output settings as state, so configuring one instance several different ways inside the same Promise.all would have the variants trample each other. clone() is sharp's documented way to run several processing pipelines from a single input without building a new instance from the buffer each time.
withoutEnlargement: true prevents the worker from upscaling a 400px source into an 800px variant, which would just produce a blurry larger file. Pair it with the Math.min(v.width, sourceWidth) so the recorded width in your DB also reflects what the worker actually produced.
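The data.buffer.slice(...) dance copies exactly this Buffer's bytes into a fresh ArrayBuffer that we own and can safely transfer. A Node Buffer can be a view into a larger ArrayBuffer, including Node's shared buffer pool, which holds other Buffers' bytes too. Node marks that pool as untransferable. Depending on the Node version, putting such a data.buffer straight into the transfer list either silently isn't a transfer, so the whole backing store gets cloned with unrelated bytes included, or throws. The copy sidesteps both.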
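failOn: 'error' makes sharp throw on a malformed upload instead of silently producing a partial output. For reference, sharp's levels from least to most sensitive are 'none', 'truncated', 'error' and 'warning'. The default, 'warning', is stricter still, and 'none' and 'truncated' are the settings that let damaged files through. Catch the error in your route, return 400, log it. The alternative is debugging "why does this one PNG produce a one-pixel WebP" at 11pm.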
What you don't want to do: have the worker read the upload from disk with sharp(filename). Workers can do disk I/O. But if the upload is already in memory on the main thread, writing it out so the worker can read it back is a wasted round trip, and a path-based API doesn't fit uploads that arrive as streams or buffers. Pass the buffer, transfer it, you're done.
For a service doing 50 uploads/minute with 4 variants each, the difference between calling sharp directly from request handlers and running it in a 4-worker pool on a 4-vCPU pod is the difference between a p99 latency of 1200ms and a p99 of 180ms. The CPU work is the same. The distribution is different.
AI Preprocessing: The 2026 Use Case
A workload that didn't exist five years ago and now dominates a real fraction of Node services: getting input ready for an LLM call. Tokenization, embedding model inference (for retrieval), chunking large documents, computing prompt token counts for budgeting, applying message templates. Most of this is pure compute, none of it is I/O, and almost all of it is happening on people's main threads right now because the libraries make it look like fast JavaScript.
Concrete examples that block:
- tiktoken (the Node port of OpenAI's BPE tokenizer) on a 50KB prompt - 30-80ms.
- gpt-tokenizer doing the same - similar range.
- transformers.js running an embedding model in WASM - anywhere from 100ms to several seconds per chunk depending on model size.
- langchain's text splitters on a 500KB document - 100-500ms.
- Anything that calls into onnxruntime-node for local inference - model-dependent, almost always > 50ms per call.
The pattern is identical to images: one worker per logical task, pool with min/max, transfer strings as ArrayBuffer if they're huge (rare; usually just send the string). The wrinkle for AI work is that loading the model is also slow - transformers.js cold-loading a 100MB embedding model takes 2-5 seconds, and you do not want that happening per request.
The fix is module-level loading inside the worker, so the model loads once when the worker spawns and stays in memory for every subsequent task:
```typescript
// embed-worker.ts (must load as an ES module, since it uses top-level await)
import { parentPort } from 'node:worker_threads';
import { pipeline } from '@xenova/transformers';

// This runs once per worker, at startup. The pool keeps the worker alive,
// so subsequent .run() calls hit a warm model.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

type Input = { text: string };
type Output = { vector: number[] };

parentPort!.on('message', async (msg: Input) => {
  const result = await extractor(msg.text, { pooling: 'mean', normalize: true });
  // result.data is a Float32Array; Array.from copies it into a plain array.
  // To skip the copy, post the Float32Array and list its .buffer in the
  // transfer list, provided it owns that whole buffer and nothing in this
  // worker reads it afterwards.
  parentPort!.postMessage({ vector: Array.from(result.data as Float32Array) } satisfies Output);
});
```
This worker, once warm, will handle embedding calls at roughly the model's native speed - for all-MiniLM-L6-v2 on a modern CPU, somewhere around 50-100ms per chunk. With a pool of 4, that's 40+ embeddings/second from a single Node pod, with zero impact on your main thread's ability to serve unrelated requests.
The same pattern works for tokenization with tiktoken:
```typescript
// tokenize-worker.ts
import { parentPort } from 'node:worker_threads';
import { encoding_for_model } from 'tiktoken';

// Load the BPE table once per worker; it's megabytes of data, not free.
const enc = encoding_for_model('gpt-4o');

parentPort!.on('message', (msg: { text: string }) => {
  const tokens = enc.encode(msg.text);
  parentPort!.postMessage({ count: tokens.length });
});
```
Concurrency Patterns Beyond The Simple Pool
The fixed pool covers maybe 80% of real cases. The other 20% need one of these.
Streaming Through A Worker
For workloads where the input or output is genuinely large (think: gigabyte CSV ingestion, multi-megabyte JSON transforms, audio resampling), you don't want to load the whole thing into a buffer and transfer it. You want a stream that hands chunks to a worker as they arrive.
The trick is to set up a MessageChannel, give one end to the worker via workerData, and treat the channel as your stream. That port also has to go in the Worker's transferList, because a MessagePort is transferred, not cloned. Each postMessage on the main side becomes a chunk, and the worker processes it and posts a result back. Backpressure is manual: you watch for the worker falling behind and pause your readable. There is no built-in pipeline() integration with workers; you build it yourself.
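Here is a sketch of that manual backpressure. It uses a simple credit scheme of our own, not a Node API. The main thread keeps at most maxInFlight chunks unacknowledged, and the worker acks each chunk once it has processed it. The source can be any async iterable, so a Readable such as fs.createReadStream(path) should plug straight in. streamThroughWorker and the inline byte-counting worker are hypothetical stand-ins for your real transform.

```typescript
import { Worker, MessageChannel } from 'node:worker_threads';

// Hypothetical worker: counts bytes per chunk (stand-in for real work), acks
// each chunk, and reports the total when told the stream has ended.
const workerSource = `
const { workerData } = require('node:worker_threads');
const port = workerData.port;
let total = 0;
port.on('message', (msg) => {
  if (msg.type === 'chunk') {
    total += msg.data.byteLength;
    port.postMessage({ type: 'ack' });
  } else if (msg.type === 'end') {
    port.postMessage({ type: 'result', total });
    port.close();
  }
});
`;

type FromWorker = { type: 'ack' } | { type: 'result'; total: number };

export async function streamThroughWorker(
  chunks: AsyncIterable<Uint8Array>,
  maxInFlight = 4,
): Promise<number> {
  const { port1, port2 } = new MessageChannel();
  // A MessagePort must be transferred, not cloned, so it goes in transferList.
  const worker = new Worker(workerSource, {
    eval: true,
    workerData: { port: port2 },
    transferList: [port2],
  });

  let inFlight = 0;
  let failed = false;
  let wake: (() => void) | undefined;
  const wakeProducer = () => {
    const w = wake;
    wake = undefined;
    w?.();
  };

  const done = new Promise<number>((resolve, reject) => {
    port1.on('message', (msg: FromWorker) => {
      if (msg.type === 'ack') {
        inFlight--; // one credit back: the producer may send another chunk
        wakeProducer();
      } else {
        resolve(msg.total);
      }
    });
    worker.once('error', (err) => {
      failed = true;
      wakeProducer();
      reject(err);
    });
  });
  done.catch(() => {}); // awaited below; this only prevents an early unhandled rejection

  for await (const chunk of chunks) {
    // Backpressure: stop pulling from the source while the worker is behind.
    while (inFlight >= maxInFlight && !failed) {
      await new Promise<void>((resolve) => (wake = resolve));
    }
    if (failed) break;
    inFlight++;
    // Copy into an ArrayBuffer we own so it can be transferred (chunk may be a pooled view).
    const ab = chunk.buffer.slice(chunk.byteOffset, chunk.byteOffset + chunk.byteLength) as ArrayBuffer;
    port1.postMessage({ type: 'chunk', data: ab }, [ab]);
  }
  if (!failed) port1.postMessage({ type: 'end' });
  try {
    return await done; // rejects with the worker's error if it failed
  } finally {
    port1.close();
  }
}
```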
This is the area where worker_threads feels least polished. You can make it work, but the ergonomics are nowhere near as nice as the pool case. If you find yourself wanting this, consider whether you actually need a worker at all - for streaming transforms over big files, a child_process running a dedicated binary (or a separate microservice) is often a better fit because you're going to have IPC overhead either way.
Shared Memory With SharedArrayBuffer
SharedArrayBuffer does what it says: a single chunk of memory that's visible from multiple threads at once, with no copy and no transfer. Combined with Atomics, you can build lock-free data structures, ring buffers, and producer/consumer queues that don't pay the message-channel tax on every operation.
The cases where you actually need this:
- A piece of reference data (a model, a big lookup table, an in-memory index) that several workers all need to read constantly. Loading it once into a SharedArrayBuffer and giving every worker a view saves N copies of the data.
- A high-throughput producer/consumer pattern where postMessage overhead is the bottleneck - typically only at message rates above 10,000/sec.
- Coordinating between workers without going through the main thread (e.g., one worker hands off to another via a shared queue).
The cases where you don't:
- Your data fits in a few MB and gets sent occasionally. Just transfer the ArrayBuffer. The performance difference is in the microseconds and the complexity difference is enormous.
- You're reaching for Atomics.store() and Atomics.load() to "share state between threads" generally. Sharing mutable state across threads in any language is hard, and JavaScript doesn't have the tooling to make it easy. It has no volatile, no locks or higher-level concurrency primitives, and a spec memory model that is precise but hard to reason about. Use message passing.
If you do go this route, the MDN page on Atomics is the reference, and you'll need to understand Atomics.wait() / Atomics.notify() to do anything useful - those are how you build "block this thread until something happens" without busy-waiting.
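To get a feel for wait/notify, here is a minimal one-slot handoff. It assumes a single Int32Array element where 0 means "not ready". The worker sleeps in Atomics.wait until the main thread stores a value and calls Atomics.notify. handOff is a hypothetical name. Node does allow Atomics.wait on the main thread too, but blocking the main thread is exactly what this article is trying to avoid.

```typescript
import { Worker } from 'node:worker_threads';

// Hypothetical worker: sleeps (no busy-waiting) until slot 0 becomes non-zero.
const workerSource = `
const { parentPort, workerData } = require('node:worker_threads');
const state = new Int32Array(workerData);
// The loop guards against waking before the value has actually changed.
while (Atomics.load(state, 0) === 0) {
  Atomics.wait(state, 0, 0); // returns immediately if the slot is already non-zero
}
parentPort.postMessage(Atomics.load(state, 0));
`;

// Hypothetical helper: publishes `value` (must be non-zero) to the worker after `delayMs`.
export function handOff(value: number, delayMs: number): Promise<number> {
  if (value === 0) throw new RangeError('value must be non-zero: 0 means "not ready"');
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const state = new Int32Array(shared);
  // A SharedArrayBuffer in workerData is shared with the worker, not copied.
  const worker = new Worker(workerSource, { eval: true, workerData: shared });
  const reply = new Promise<number>((resolve, reject) => {
    worker.once('message', resolve);
    worker.once('error', reject);
  });
  setTimeout(() => {
    Atomics.store(state, 0, value); // publish the value...
    Atomics.notify(state, 0); // ...then wake any thread waiting on slot 0
  }, delayMs);
  return reply;
}
```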
Cancellation With AbortSignal
The default Worker API has no cancellation. You can worker.terminate(), but that kills the worker outright - which means you lose the warm model, the loaded library, and you pay the full restart cost on the next call. For a pool, that's catastrophic.
The clean answer is cooperative cancellation: the worker checks a cancellation flag at sensible points, bails out early, and stays alive for the next task. Piscina accepts run(task, { signal }), but it's worth knowing what that does. A task that's still queued is dropped. A task that's already running is cancelled by stopping the worker thread it runs on, which guarantees a stop but costs you that worker's warm state. If you want the cooperative version, with piscina or a hand-rolled pool, you send the abort to the worker yourself and have the worker honour it.
For tasks that call into native add-ons (sharp, sqlite, etc.), cancellation is almost always not possible mid-call - the native code holds the C stack, and JavaScript can't interrupt it. The best you can do is check the signal between operations. For a multi-variant image resize, that's between variants. For a single 200ms sharp call, you're committed.
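There is a catch with sending the abort as a message. A worker stuck in a synchronous loop never yields to its event loop, so it won't see that message until the loop ends. A shared flag avoids the problem, because the worker can read it between steps without yielding. The sketch below assumes one Int32Array slot per task. runCancellable and the busy-wait steps are hypothetical stand-ins for your real work, and the per-call worker only keeps the example short.

```typescript
import { Worker } from 'node:worker_threads';

type Outcome = { cancelled: boolean; stepsDone: number };

// Hypothetical worker: runs `steps` CPU-bound steps, checking the shared
// abort flag between them (never in the middle of one).
const workerSource = `
const { parentPort, workerData } = require('node:worker_threads');
const flag = new Int32Array(workerData.abortBuffer);
function run() {
  for (let i = 0; i < workerData.steps; i++) {
    if (Atomics.load(flag, 0) === 1) return { cancelled: true, stepsDone: i };
    const end = Date.now() + workerData.msPerStep;
    while (Date.now() < end) {} // stand-in for one uninterruptible unit of work
  }
  return { cancelled: false, stepsDone: workerData.steps };
}
parentPort.postMessage(run());
`;

export function runCancellable(steps: number, msPerStep: number, signal: AbortSignal): Promise<Outcome> {
  const abortBuffer = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const flag = new Int32Array(abortBuffer);
  // The store is visible to the worker at its next check, even mid-loop.
  const onAbort = () => Atomics.store(flag, 0, 1);
  if (signal.aborted) onAbort();
  else signal.addEventListener('abort', onAbort, { once: true });

  const worker = new Worker(workerSource, {
    eval: true,
    workerData: { abortBuffer, steps, msPerStep },
  });
  return new Promise<Outcome>((resolve, reject) => {
    worker.once('message', (out: Outcome) => {
      signal.removeEventListener('abort', onAbort);
      resolve(out);
    });
    worker.once('error', (err) => {
      signal.removeEventListener('abort', onAbort);
      reject(err);
    });
  });
}
```

In a pool, each task would carry its own flag buffer, and the worker stays up for the next task after bailing out.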
The Failure Modes You'll Actually Hit
These are the things that aren't in the official docs but will burn you within the first week of running workers in production.
Memory leaks compound across workers. A leak in your route handler kills the process eventually. A leak in your worker kills the worker eventually, the pool replaces it, and you keep going. But the process's resident memory can stay elevated long afterwards, because freed memory isn't always handed back to the OS promptly. Monitor process.memoryUsage() per worker if you can. Don't be afraid to set a maxOldGenerationSizeMb per worker (via resourceLimits in the Worker constructor) so a runaway worker terminates instead of consuming the whole pod.
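Here is what hitting that cap looks like, using a deliberately leaky inline worker. Node stops just that worker, and the parent sees an 'error' whose code is ERR_WORKER_OUT_OF_MEMORY while the main thread carries on. runLeakyWorker is a hypothetical name, and 32MB is an arbitrary demo limit.

```typescript
import { Worker } from 'node:worker_threads';

// Hypothetical worker that leaks on purpose: every allocation stays reachable.
const leakySource = `
const hoard = [];
for (;;) hoard.push(new Array(10000).fill(0));
`;

// Resolves with the error code the parent observes (or undefined on a clean exit).
export function runLeakyWorker(limitMb: number): Promise<string | undefined> {
  const worker = new Worker(leakySource, {
    eval: true,
    resourceLimits: { maxOldGenerationSizeMb: limitMb },
  });
  return new Promise((resolve) => {
    // A pool's 'error'/'exit' handlers would see this and respawn the worker.
    worker.once('error', (err: NodeJS.ErrnoException) => resolve(err.code));
    worker.once('exit', () => resolve(undefined)); // 'error' fires first if there was one
  });
}
```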
Unhandled rejections in workers are easy to miss. Suppose your worker code calls someAsyncThing() without an await or a .catch(), and it rejects. You'll get an unhandledRejection event on the worker. In Node 15+ the default is to treat that as an uncaught exception: the worker dies and the parent's Worker object emits 'error'. If nothing in the parent listens for that event, it takes the whole process down. Wire up the listener in the worker:
```typescript
// In the worker file:
process.on('unhandledRejection', (err) => {
  console.error('Worker unhandled rejection:', err);
  process.exit(1); // Let the pool restart us.
});
```
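The process.exit(1) is intentional: a worker that lost a promise is a worker in an undefined state, and a restart is the cheaper option. Inside a worker, process.exit() ends only that thread. The parent sees an 'exit' event with code 1 rather than an 'error' event, so your pool has to watch 'exit' to notice and replace the worker.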
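The whole chain fits in a few lines. The sketch below uses an inline worker that installs the handler above and then loses a promise. observeLostPromise is a hypothetical name. The parent should see exit code 1 and no 'error' event.

```typescript
import { Worker } from 'node:worker_threads';

// Hypothetical worker: installs the exit-on-rejection handler, then loses a promise.
const workerSource = `
process.on('unhandledRejection', (err) => {
  console.error('Worker unhandled rejection:', err);
  process.exit(1); // inside a worker, this ends only the worker thread
});
Promise.reject(new Error('nobody awaited me'));
`;

export function observeLostPromise(): Promise<{ code: number; sawError: boolean }> {
  const worker = new Worker(workerSource, { eval: true });
  let sawError = false;
  worker.on('error', () => {
    sawError = true;
  });
  return new Promise((resolve) => {
    worker.once('exit', (code) => resolve({ code, sawError }));
  });
}
```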
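Native add-ons don't always play nice. Most do, including sharp, bcrypt, argon2, sqlite3 and better-sqlite3. An add-on has to be written with Node-API or declared context-aware to load in a worker at all, and Node refuses to load one that isn't. The nastier case is an add-on that loads fine but keeps process-global state that breaks when it's used from multiple threads. Test before you ship. The symptom is usually a segfault on the second worker that loads the module, which is unpleasant to debug because there's no JavaScript stack trace.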
The cost of postMessage is structured cloning. When you don't transfer, you copy. For small messages, this is fine. For a 50MB object, structured cloning can be slower than the JSON round-trip you might assume it replaces, because it walks the entire object graph and handles types JSON doesn't. If you're sending big data without transfer, you pay that cost twice: once for the input on the way in, once for the result on the way out. Always check whether the payload could be an ArrayBuffer (it usually can) and transfer.
require.cache is per-worker. Each worker has its own copy of every loaded module. If your library has a side-effecting top-level (opens a DB connection on load, registers something with a singleton), it runs N times. The DB driver opening a pool per worker is usually what you want. A library registering a global middleware N times is usually not. Read your worker file's transitive imports the first time you set this up.
Termination is not graceful. worker.terminate() is the V8 equivalent of kill -9. Open file handles aren't closed. In-flight promises don't reject. The parent still gets its 'exit' event, but the worker code has zero opportunity to finish what it was doing. For graceful shutdown of a pool, stop accepting new tasks, wait for in-flight tasks to drain (with a timeout), and only terminate() what's still running after the timeout. Piscina's close() does the waiting for you, and destroy() is its hard stop. If you're hand-rolling, you'll need to wire it up.
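Here is that ordering as a sketch. Closable and DrainablePool are hypothetical interfaces. The hand-rolled pool above has no drain() method, so you would add one that resolves when nothing is queued or busy. Promise.race supplies the timeout.

```typescript
// Hypothetical shapes; http.Server satisfies Closable.
interface Closable {
  close(): unknown;
}
interface DrainablePool {
  drain(): Promise<void>; // resolves once nothing is queued or in flight
  terminate(): Promise<void>; // hard-stops whatever is still running
}

// Hypothetical helper: stop intake, drain with a deadline, then terminate.
export async function shutdown(
  server: Closable,
  pool: DrainablePool,
  timeoutMs: number,
): Promise<'drained' | 'timed_out'> {
  // 1. Stop accepting new requests first, so nothing new reaches the pool.
  server.close();

  // 2. Give in-flight tasks a bounded amount of time to finish.
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<'timed_out'>((resolve) => {
    timer = setTimeout(() => resolve('timed_out'), timeoutMs);
  });
  const outcome = await Promise.race([
    pool.drain().then(() => 'drained' as const),
    deadline,
  ]);
  clearTimeout(timer);

  // 3. Only now terminate whatever is still running.
  await pool.terminate();
  return outcome;
}
```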
When Worker Threads Are The Wrong Answer
A short list, because most articles on this topic skip it.
If your bottleneck is throughput across requests rather than blocking on a single request, you want more processes (cluster, more pods, a load balancer), not more threads in the same process. Workers help you keep one process responsive while doing work. They don't give you more total compute than running one Node process per core would, because the CPU is the CPU.
If you're doing the same expensive computation over and over with the same inputs, what you want is a cache, not a worker. An lru-cache in front of your hashing function will outperform any worker pool, because not doing the work at all beats doing the work fast.
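lru-cache is the usual package here. To show the idea without a dependency, here is a tiny LRU built on Map's insertion order. It is a sketch, not lru-cache's API. Caching the promise rather than the resolved value also deduplicates concurrent calls for the same key. memoizeAsync is a hypothetical name.

```typescript
// Hypothetical helper: memoizes an async function with a bounded LRU.
// Map iterates in insertion order, so delete + re-insert marks a key as recent.
export function memoizeAsync<K, V>(
  fn: (key: K) => Promise<V>,
  maxEntries: number,
): (key: K) => Promise<V> {
  const cache = new Map<K, Promise<V>>();
  return (key: K) => {
    const hit = cache.get(key);
    if (hit !== undefined) {
      cache.delete(key);
      cache.set(key, hit);
      return hit;
    }
    const pending = fn(key);
    cache.set(key, pending);
    // Don't cache failures, but only evict if this promise is still the cached one.
    pending.catch(() => {
      if (cache.get(key) === pending) cache.delete(key);
    });
    if (cache.size > maxEntries) {
      const oldest = cache.keys().next().value as K; // least recently used
      cache.delete(oldest);
    }
    return pending;
  };
}
```

Wrapping a worker-backed call, e.g. memoizeAsync((text: string) => pool.run({ text }), 10_000), means repeated inputs never reach the pool at all.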
If the work is genuinely massive - minutes of compute, gigabytes of intermediate state - workers will technically run it, but you've outgrown the "in-process worker" model. Push it to a queue (BullMQ, SQS, Pub/Sub), run a dedicated worker service for it, and let your web tier go back to being a thin HTTP layer. The boundary is roughly "tasks that finish within an HTTP request timeout stay in the worker; tasks that don't, get queued."
If you're trying to share lots of complex state between the main thread and the worker, you're going to have a bad time. Worker threads are designed around message passing and binary transfer. The moment you start trying to keep a mutable graph of objects in sync across threads, you've reinvented Java circa 2005 and you don't have any of the tooling to make it work safely. Either keep state on one side, or pick a different model entirely.
A Realistic Checklist For Shipping Workers
When you're about to deploy a feature that depends on worker_threads, run through this. Most production incidents I've seen with workers come from skipping one of these.
The pool is fixed in size, bounded above, and small enough that all workers fit in memory at once with the rest of the process plus a buffer for GC. The pool size respects availableParallelism() and cgroup limits, not raw cpus().length.
The worker file is its own compiled artifact, loaded by URL, with module-level setup that's deliberately expensive (load the model, decode the BPE table) so you only pay that cost N times at startup, not per call.
Every payload that's binary uses transferList. Every payload that's small enough to clone cheaply doesn't bother. You've measured at least once which case you're in.
The worker has an unhandledRejection handler that exits the worker so the pool can restart it. The pool listens for both 'error' and 'exit' and replaces dead workers without affecting in-flight tasks on healthy workers.
Cancellation is wired through AbortSignal if any of your tasks can take long enough to matter. Native add-on calls have a documented "we can't cancel mid-call" caveat in the code.
Metrics export queue depth, in-flight count, average task duration, and worker restart count. The graph of "queue depth" is the one that tells you whether you're sized right; if it stays at zero, you're over-provisioned; if it grows unbounded, you're under-provisioned.
Graceful shutdown drains in-flight tasks with a timeout before terminating workers. The HTTP server stops accepting new requests before the pool starts draining, so you don't have a window where requests arrive but workers refuse them.
Most of that is two or three lines of code each. The reason this list exists is that each one is the thing that breaks the day you ship.
What To Take Away
The event loop being single-threaded is not a flaw. It's the constraint that makes Node fast for I/O and easy to reason about - until you put real compute on it, at which point it becomes the thing that's keeping your service from scaling. Worker threads are the in-process escape hatch for that exact case: cheaper than cluster, faster than child_process, more shared-memory-friendly than either, and built into the runtime so you don't pay an integration tax.
Use them when a function on your hot path costs more than 10ms of CPU and you can't cache the result. Pool them, because per-call spawning kills the win. Transfer binary data instead of cloning it. Load expensive setup once per worker, not once per task. Wire up the failure modes - unhandled rejections, dead workers, cancellation, graceful shutdown - before you ship, not after the first incident.
And know when to walk away. Workers solve "this request shouldn't freeze my event loop." They don't solve "I need ten times the throughput" - that's still a process problem, and the answer is still more pods. The two scale orthogonally, and the trick to using Node well at this point is knowing which lever to pull when.





