So, you push a deploy. Kubernetes rolls the pods. Within thirty seconds your dashboard lights up with a thin column of 502s, a spike in failed background jobs, and one customer who somehow got charged twice. You read the postmortem the next morning and the explanation is one line: "the pod was terminated while requests were still in flight."

This is what an ungraceful shutdown looks like, and the thing that stings is that Go makes it easy to fix. Almost all of the machinery is right there in the standard library: signals, contexts, http.Server.Shutdown, sync.WaitGroup. The one common extra, errgroup, lives in golang.org/x/sync, which is maintained by the Go team. There is no magical third-party library you need. There is a small set of patterns, and once you wire them up correctly, your service stops getting paged at deploy time.

That's the whole article. We'll go through signals first, then HTTP server drain, then background workers, then everything else (databases, message consumers, gRPC), and we'll finish by assembling the full lifecycle into one program you could lift straight into production. Code is plain Go using the standard library plus a couple of widely used external packages where they earn their place (golang.org/x/sync/errgroup, github.com/jackc/pgx/v5/pgxpool).

Because this is a Go-specific topic, we'll stay in Go the whole way. No Python, no Node. Switching languages on a topic like this would only blur the details.

What "graceful" actually means

Before any code, let's pin the word down. A graceful shutdown is one where the process stops accepting new work, finishes the work it already started (or hands it off safely), and exits cleanly within a deadline. There are four load-bearing pieces in that sentence, and most outages happen because one of them got dropped:

Text
stops accepting new work    -> readiness probe flips, listener stops accepting
finishes in-flight work     -> HTTP requests drain, jobs complete or re-enqueue
hands off safely            -> queue commits, transactions rollback, locks release
within a deadline           -> if drain takes too long, force-exit instead of hanging

The deadline matters as much as the drain. A "graceful" shutdown that hangs forever isn't graceful. It just turns a 502 into a SIGKILL from the orchestrator thirty seconds later. The whole point is to do the cleanup within the budget your platform gives you, and to fail fast and loud if you can't.

On Kubernetes that budget is terminationGracePeriodSeconds (default 30s). On systemd it's TimeoutStopSec. On ECS it's stopTimeout. Whatever the platform, the contract is the same: you get a signal, you get a few seconds, then something less polite happens. Your shutdown code needs to know that number and respect it.
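
As a rough sketch of "know that number and respect it", the drain deadline can be computed from the platform's grace period rather than hard-coded. The names drainBudget and drainBudgetSeconds, and the 5-second wait and margin, are assumptions made for illustration; none of them come from a platform API.

```go
package main

import (
    "errors"
    "fmt"
    "time"
)

// drainBudget returns how long the drain phase may run: the platform's grace
// period minus any wait that happens before the drain starts, minus a safety
// margin kept free for final cleanup.
func drainBudget(grace, preDrainWait, margin time.Duration) (time.Duration, error) {
    budget := grace - preDrainWait - margin
    if budget <= 0 {
        return 0, errors.New("grace period too short for the configured wait and margin")
    }
    return budget, nil
}

// drainBudgetSeconds is a convenience wrapper for whole-second inputs.
// It returns 0 when there is no usable budget.
func drainBudgetSeconds(grace, preDrainWait, margin int) int {
    d, err := drainBudget(
        time.Duration(grace)*time.Second,
        time.Duration(preDrainWait)*time.Second,
        time.Duration(margin)*time.Second,
    )
    if err != nil {
        return 0
    }
    return int(d / time.Second)
}

func main() {
    // Assumed numbers: 30s grace period, 5s pre-drain wait, 5s margin.
    fmt.Println(drainBudgetSeconds(30, 5, 5)) // 20
}
```

Failing at startup when the budget comes out at zero or below is one way to catch a grace period that was lowered without anyone touching the code.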

Figure: Graceful shutdown lifecycle timeline. Four horizontal swim-lane bars show the four shutdown phases from T=0 (SIGTERM) to T=30s (SIGKILL), with HTTP drain and worker cancellation running in parallel from T=2s.

The picture above is the whole arc. The rest of the article is just filling in each lane with real Go code.

Step 1: catch the signal correctly

Every shutdown sequence starts with a signal. On Linux that's almost always SIGTERM from the orchestrator, sometimes SIGINT (Ctrl-C in your terminal), occasionally SIGHUP (config reload; more on that later). SIGKILL is the one you can't catch; that's the platform losing patience.

The naive version most Go tutorials still show looks like this:

Go signal_old.go
ch := make(chan os.Signal, 1)
signal.Notify(ch, syscall.SIGINT, syscall.SIGTERM)
<-ch
log.Println("shutting down...")

It works. But it's clunky, and it doesn't compose with the rest of your code, which is increasingly written around context.Context. Since Go 1.16 there's a better primitive: signal.NotifyContext. It gives you a context that cancels the moment one of the listed signals arrives.

Go signal.go
package main

import (
    "context"
    "log/slog"
    "os/signal"
    "syscall"
)

func main() {
    ctx, stop := signal.NotifyContext(
        context.Background(),
        syscall.SIGINT,
        syscall.SIGTERM,
    )
    defer stop()

    if err := run(ctx); err != nil {
        slog.Error("server exited with error", "err", err)
    }
}

Now ctx is the shutdown context for your whole program. Anywhere you accept a context (HTTP handlers, database queries, worker loops) you pass this one down. When SIGTERM arrives, ctx.Done() fires, and every part of your program learns at the same time that it's time to stop.

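The stop() function is worth a second look. Calling it unregisters the signal handler and restores the default behavior, which means a Ctrl-C that arrives after stop() has run will just kill the process immediately. That's usually what you want: one signal = graceful, two signals = "I mean it, go now". The catch is that a deferred stop() only runs when main returns, and canceling the context does not unregister anything by itself. To get the two-signal behavior during the drain, call stop() yourself as soon as ctx.Done() fires. Calling it more than once is harmless, so the defer can stay.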
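
Here is a small sketch of where that stop() call could sit. It assumes a Unix-like system (it uses syscall.Kill to send the signal to itself) and uses SIGUSR1 purely so the demo doesn't depend on a terminal; in a real service the list would be SIGINT and SIGTERM. The function name firstSignalDemo is made up for the example.

```go
package main

import (
    "context"
    "fmt"
    "os"
    "os/signal"
    "syscall"
    "time"
)

// firstSignalDemo registers SIGUSR1, sends it to the current process, and
// reports whether the context was canceled by it. It calls stop() as soon as
// the context is done instead of leaving that to the defer.
func firstSignalDemo() bool {
    ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGUSR1)
    defer stop() // second call is harmless; stop can be called more than once

    // Stand in for the orchestrator: deliver the signal to ourselves.
    if err := syscall.Kill(os.Getpid(), syscall.SIGUSR1); err != nil {
        return false
    }

    select {
    case <-ctx.Done():
        stop()
        // The handler is now unregistered. A second SIGUSR1 would get the
        // default behavior (terminating the process), so we don't send one.
        return ctx.Err() == context.Canceled
    case <-time.After(2 * time.Second):
        return false
    }
}

func main() {
    fmt.Println("canceled by first signal:", firstSignalDemo())
}
```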

A few signal nuances worth knowing:

  • SIGTERM is the polite stop. Catch this.
  • SIGINT is interactive (Ctrl-C). Catch this too; useful in development.
  • SIGHUP traditionally means "reload config"; only catch it if you actually implement reload.
  • SIGKILL and SIGSTOP cannot be caught. Don't bother trying. If you're getting killed by these, your graceful path is too slow.

Step 2: drain the HTTP server

Now the signal is caught and the shutdown context is canceled. The biggest source of dropped requests in a Go service is the HTTP server, so that's where we start.

http.Server has had a Shutdown method since Go 1.8, and most people use it wrong in exactly one way: they pass the shutdown context as-is, with no deadline. That context was just canceled. srv.Shutdown(ctx) closes the listener, sees that the context is already done, and returns almost immediately with the context's error. Your run function returns, the process exits, the in-flight requests die with it, and you're back to 502s. The fix is to give shutdown its own deadline-bounded context.

Go server.go
func run(ctx context.Context) error {
    mux := http.NewServeMux()
    mux.HandleFunc("/", handle)

    srv := &http.Server{
        Addr:              ":8080",
        Handler:           mux,
        ReadHeaderTimeout: 5 * time.Second,
        IdleTimeout:       60 * time.Second,
    }

    serverErr := make(chan error, 1)
    go func() {
        slog.Info("http server listening", "addr", srv.Addr)
        if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
            serverErr <- err
        }
        close(serverErr)
    }()

    select {
    case <-ctx.Done():
        slog.Info("shutdown signal received, draining http server")
    case err := <-serverErr:
        return fmt.Errorf("http server crashed: %w", err)
    }

    shutdownCtx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
    defer cancel()

    if err := srv.Shutdown(shutdownCtx); err != nil {
        slog.Error("http server shutdown returned error", "err", err)
        // fall back to force close so we don't hang
        _ = srv.Close()
        return err
    }

    slog.Info("http server drained cleanly")
    return nil
}

Two things are doing the real work here. The first is context.WithTimeout(context.Background(), 25*time.Second). We explicitly start a fresh context for the shutdown phase, because the program context is already canceled. We pick 25 seconds because the platform gives us 30 and we want a margin to clean up everything else after the HTTP server is done.

The second is the select. We wait for either the shutdown signal or a server crash. If the server crashes on its own, we don't want to also try to gracefully drain it.
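
If you'd like to see the difference between the two contexts rather than take it on faith, the following sketch runs a server with one slow request in flight and calls Shutdown twice. The handler path, the 300ms delay, and the name shutdownDemo are all invented for the demo. Note that the request survives the first call here only because the demo process keeps running; in a real main, run would return and the process would exit.

```go
package main

import (
    "context"
    "errors"
    "fmt"
    "net"
    "net/http"
    "time"
)

// shutdownDemo reports whether Shutdown with an already-canceled context
// failed with context.Canceled, whether Shutdown with a fresh deadline-bound
// context drained cleanly, and the status code the client finally received.
func shutdownDemo() (firstCanceled, secondClean bool, status int) {
    started := make(chan struct{})
    mux := http.NewServeMux()
    mux.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
        close(started) // the request is now in flight
        time.Sleep(300 * time.Millisecond)
        w.WriteHeader(http.StatusOK)
    })

    ln, err := net.Listen("tcp", "127.0.0.1:0")
    if err != nil {
        return false, false, 0
    }
    srv := &http.Server{Handler: mux}
    go srv.Serve(ln) // returns http.ErrServerClosed once Shutdown is called

    statusCh := make(chan int, 1)
    go func() {
        resp, err := http.Get("http://" + ln.Addr().String() + "/slow")
        if err != nil {
            statusCh <- 0
            return
        }
        resp.Body.Close()
        statusCh <- resp.StatusCode
    }()
    select {
    case <-started:
    case <-time.After(2 * time.Second):
        return false, false, 0
    }

    // The "program" context after SIGTERM: already canceled.
    canceled, cancel := context.WithCancel(context.Background())
    cancel()
    firstCanceled = errors.Is(srv.Shutdown(canceled), context.Canceled)

    // A fresh context with its own deadline gives the request time to finish.
    fresh, cancelFresh := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancelFresh()
    secondClean = srv.Shutdown(fresh) == nil

    return firstCanceled, secondClean, <-statusCh
}

func main() {
    fmt.Println(shutdownDemo())
}
```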

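What does Shutdown actually do? It closes the listeners so no new connections are accepted, then closes any idle keep-alive connections, then waits for the active connections to return to idle and closes those too, then returns. It does not wait for hijacked connections: WebSockets, or anything else you've taken over with Hijack. Server-sent events are the opposite case. An SSE stream is an ordinary in-flight response, so Shutdown does wait for it, and because Shutdown doesn't cancel request contexts, the stream holds the drain open until the deadline unless the handler is told to stop. Either way, if you have long-lived connections, you have to coordinate their shutdown yourself.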

Go websocket_shutdown.go
srv.RegisterOnShutdown(func() {
    // close your websocket hub, broadcast a "going away" frame, etc.
    wsHub.CloseAll(websocket.CloseGoingAway, "server shutting down")
})

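RegisterOnShutdown is the right hook for these. Shutdown starts each registered function in its own goroutine as soon as it is called and does not wait for them to return, so keep the hook quick. The hook closes the long-lived connections, the clients reconnect to a healthy pod, and the drain can finish.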
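
The same hook also works for long-lived handlers that were never hijacked. A minimal sketch, assuming a streaming endpoint at /events and a draining channel that the hook closes (both invented for this example): the handler watches the channel, says goodbye, and returns, which lets the drain complete.

```go
package main

import (
    "context"
    "fmt"
    "io"
    "net"
    "net/http"
    "strings"
    "time"
)

// streamDemo serves one long-lived streaming response and shuts the server
// down while the stream is open. It reports whether Shutdown returned without
// error and whether the client received the final "bye" event.
func streamDemo() (shutdownClean, sawBye bool) {
    draining := make(chan struct{}) // closed once Shutdown begins
    started := make(chan struct{})

    mux := http.NewServeMux()
    mux.HandleFunc("/events", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "text/event-stream")
        fmt.Fprint(w, "data: hello\n\n")
        if f, ok := w.(http.Flusher); ok {
            f.Flush()
        }
        close(started)
        select {
        case <-draining: // server is shutting down: say goodbye and return
            fmt.Fprint(w, "data: bye\n\n")
        case <-r.Context().Done(): // client went away
        }
    })

    srv := &http.Server{Handler: mux}
    // Shutdown starts this function in its own goroutine.
    srv.RegisterOnShutdown(func() { close(draining) })

    ln, err := net.Listen("tcp", "127.0.0.1:0")
    if err != nil {
        return false, false
    }
    go srv.Serve(ln)

    bodyCh := make(chan string, 1)
    go func() {
        resp, err := http.Get("http://" + ln.Addr().String() + "/events")
        if err != nil {
            bodyCh <- ""
            return
        }
        defer resp.Body.Close()
        b, _ := io.ReadAll(resp.Body)
        bodyCh <- string(b)
    }()
    select {
    case <-started:
    case <-time.After(2 * time.Second):
        return false, false
    }

    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()
    shutdownClean = srv.Shutdown(ctx) == nil
    return shutdownClean, strings.Contains(<-bodyCh, "data: bye")
}

func main() {
    fmt.Println(streamDemo())
}
```

A real stream would have a ticker or an event source in that select as well; the two cases shown are just the ones that matter for shutdown.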

Flip readiness before you drain

Here's the subtle part. If you call srv.Shutdown the instant SIGTERM arrives, you'll still get a few seconds of new traffic, because Kubernetes hasn't yet propagated the pod's removal from the Service endpoints. The kube-proxy on every node has its own cache, and the convergence window is anywhere from 100ms to several seconds depending on cluster size.

The fix is to flip readiness to false first, wait a couple of seconds for the propagation, then start draining. The pattern looks like this:

Go readiness.go
var ready atomic.Bool

func readyHandler(w http.ResponseWriter, r *http.Request) {
    if !ready.Load() {
        http.Error(w, "shutting down", http.StatusServiceUnavailable)
        return
    }
    w.WriteHeader(http.StatusOK)
}

// ... at startup:
ready.Store(true)

// ... on shutdown signal, BEFORE srv.Shutdown:
ready.Store(false)
time.Sleep(5 * time.Second) // let kube-proxy notice
// now drain

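That time.Sleep looks ugly but it's load-bearing. The right value depends on how quickly your cluster propagates endpoint changes (kube-proxy's --iptables-sync-period is one input) and how aggressive your readiness probes are. Five seconds is a sane default for most setups; tune it once and forget about it. Some people prefer to delegate this to a Kubernetes preStop hook that runs sleep 5; the kubelet sends SIGTERM to the container only after the hook has finished. Same effect, slightly different ownership of the timing. Either way, those seconds are spent out of terminationGracePeriodSeconds, so subtract them from your drain deadline.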

Step 3: stop background workers

If your service is pure HTTP, the previous section is most of the work. But the moment you have goroutines doing background work (consuming from a queue, processing a stream, running scheduled jobs) you need a second shutdown lane in parallel with HTTP drain.

The pattern Go gives you is the same one you already know from context: derive child contexts, cancel them, wait for the goroutines to return. The bookkeeping happens in sync.WaitGroup or, more ergonomically, errgroup.Group.

Go worker.go
import "golang.org/x/sync/errgroup"

func runWorkers(ctx context.Context, q Queue) error {
    g, gctx := errgroup.WithContext(ctx)

    for i := 0; i < 8; i++ {
        id := i
        g.Go(func() error {
            return workerLoop(gctx, id, q)
        })
    }

    return g.Wait()
}

func workerLoop(ctx context.Context, id int, q Queue) error {
    for {
        select {
        case <-ctx.Done():
            slog.Info("worker exiting", "id", id, "reason", ctx.Err())
            return nil
        default:
        }

        job, err := q.Fetch(ctx, 5*time.Second)
        if err != nil {
            if errors.Is(err, context.Canceled) {
                return nil
            }
            slog.Error("fetch failed", "id", id, "err", err)
            continue
        }
        if job == nil {
            continue // poll timed out, loop back
        }

        if err := process(ctx, job); err != nil {
            slog.Error("job failed", "id", id, "job_id", job.ID, "err", err)
            q.Nack(job)
            continue
        }
        q.Ack(job)
    }
}

The shape that matters here is the select at the top of every iteration. Before each Fetch, the worker checks if its context is done. The moment shutdown is triggered, the next iteration of the loop exits. But a job already mid-process keeps running with the canceled context, and process is responsible for noticing.
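
For comparison, here is roughly what the same cancel-then-wait bookkeeping looks like with plain sync.WaitGroup and no external packages. It's a sketch under simplifying assumptions: jobs are integers on a channel, "processing" is doubling them, and the names worker and poolDemo are invented.

```go
package main

import (
    "context"
    "fmt"
    "sync"
    "time"
)

// worker pulls jobs until its context is canceled.
func worker(ctx context.Context, jobs <-chan int, results chan<- int) {
    for {
        select {
        case <-ctx.Done():
            return
        case j := <-jobs:
            results <- j * 2 // stand-in for real processing
        }
    }
}

// poolDemo starts three workers, lets them process five jobs, then cancels
// the context and waits (with a deadline) for every worker to return.
func poolDemo() (processed int, allExited bool) {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    jobs := make(chan int, 5)
    results := make(chan int, 5)

    var wg sync.WaitGroup
    for i := 0; i < 3; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            worker(ctx, jobs, results)
        }()
    }

    for j := 1; j <= 5; j++ {
        jobs <- j
    }
    for i := 0; i < 5; i++ {
        <-results
        processed++
    }

    cancel() // the shutdown signal

    // wg.Wait has no timeout of its own, so race it against a timer.
    done := make(chan struct{})
    go func() {
        wg.Wait()
        close(done)
    }()
    select {
    case <-done:
        allExited = true
    case <-time.After(2 * time.Second):
    }
    return processed, allExited
}

func main() {
    fmt.Println(poolDemo()) // 5 true
}
```

What errgroup adds on top of this is error propagation and the automatic cancellation of siblings when one worker fails.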

That last point is where most people get tripped up. A worker that calls a 30-second HTTP request inside process(ctx, job) will only honor cancellation if every layer of that call accepts ctx. If you have a database/sql call that uses Query or Exec instead of QueryContext or ExecContext, or a third-party SDK that doesn't take a context, your "graceful" shutdown is going to hang on that one in-flight job until the orchestrator kills you.

A useful audit: search your codebase for context.Background() and context.TODO() inside non-test code. Each one is a place where a parent's cancellation can't reach. Most of them should be plumbed through.
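
To make "every layer accepts ctx" concrete, here is a minimal sketch of an outbound HTTP call that is tied to the caller's context. The slow upstream is a local test server, and callUpstream and cancelDemo are hypothetical names; the point is only that http.NewRequestWithContext is what lets cancellation reach the network call.

```go
package main

import (
    "context"
    "errors"
    "fmt"
    "net/http"
    "net/http/httptest"
    "time"
)

// callUpstream performs a GET that is tied to ctx, so canceling ctx aborts
// the request instead of waiting for the upstream to answer.
func callUpstream(ctx context.Context, url string) error {
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
    if err != nil {
        return err
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    return resp.Body.Close()
}

// cancelDemo calls a deliberately slow upstream and cancels the context after
// 50ms. It reports whether the call failed with context.Canceled and whether
// it returned well before the upstream's 5s delay.
func cancelDemo() (canceled, fast bool) {
    slow := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        select {
        case <-time.After(5 * time.Second):
        case <-r.Context().Done(): // the client hung up
        }
    }))
    defer slow.Close()

    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    time.AfterFunc(50*time.Millisecond, cancel) // stand-in for the shutdown signal

    start := time.Now()
    err := callUpstream(ctx, slow.URL)
    return errors.Is(err, context.Canceled), time.Since(start) < 2*time.Second
}

func main() {
    fmt.Println(cancelDemo())
}
```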

Drain vs cancel

There's a real design choice in worker shutdown: do you let in-flight jobs finish naturally, or do you cancel them and re-enqueue?

Both are valid. The right answer depends on whether your jobs are idempotent and how long they take.

  • Idempotent + short (<1s): let them finish. The drain is fast and you avoid retries.
  • Idempotent + long (>30s): cancel mid-job and let the retry path re-pick it on the next pod. Don't fight the deadline.
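  • Non-idempotent: a canceled job will be redelivered, so it can run twice. Either let these jobs finish inside the deadline, or make the side effect safe to repeat (for example, an idempotency key checked before the write). On the queue side, design for "at-least-once" (visibility timeouts on SQS, message acknowledgements on RabbitMQ, manual offset commits on Kafka), and let the broker redeliver if you cancel before ack.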
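
Because at-least-once delivery can hand the same job to a second pod, one common way to keep a non-idempotent side effect from happening twice is to guard it with an idempotency key. The sketch below is an illustration only: dedupStore, applyOnce, and the "job-42" key are invented, and the in-memory map stands in for what would need to be a durable store in a real system.

```go
package main

import (
    "fmt"
    "sync"
)

// dedupStore remembers which idempotency keys have already been applied.
// In production this would be a durable store, ideally written in the same
// transaction as the side effect; the map here is for illustration only.
type dedupStore struct {
    mu   sync.Mutex
    seen map[string]bool
}

func newDedupStore() *dedupStore {
    return &dedupStore{seen: make(map[string]bool)}
}

// applyOnce runs effect only the first time key is seen and reports whether
// it ran. A failed effect does not consume the key, so a retry can still run.
// The lock is held while effect runs, which keeps the sketch simple.
func (s *dedupStore) applyOnce(key string, effect func() error) (bool, error) {
    s.mu.Lock()
    defer s.mu.Unlock()
    if s.seen[key] {
        return false, nil // redelivery of a job that already completed
    }
    if err := effect(); err != nil {
        return false, err
    }
    s.seen[key] = true
    return true, nil
}

// chargeTwice simulates the same job being delivered twice and returns how
// many times the customer was actually charged.
func chargeTwice() int {
    store := newDedupStore()
    charges := 0
    charge := func() error { charges++; return nil }
    for delivery := 0; delivery < 2; delivery++ {
        _, _ = store.applyOnce("job-42", charge)
    }
    return charges
}

func main() {
    fmt.Println("charges applied:", chargeTwice()) // charges applied: 1
}
```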

The pattern in the code above leans on the queue's ack/nack semantics. If process returns successfully, we Ack. If it fails (including because of cancellation), we Nack and the queue redelivers. As long as you never Ack work you didn't actually finish, you're safe.

Go ack_semantics.go
// Wrong: ack before completion
q.Ack(job)
if err := writeToDB(ctx, job); err != nil {
    // the job is already ack'd — work is lost
}

// Right: ack only after success
if err := writeToDB(ctx, job); err != nil {
    q.Nack(job)
    return err
}
q.Ack(job)

That ordering is the single most important rule in worker shutdown. Get it right and a SIGKILL mid-job costs you a retry, not a lost message. Get it wrong and your "graceful" shutdown story is irrelevant because you have a worse data-loss problem hiding underneath it.
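
To watch that rule play out, here is a toy run against an in-memory queue. memQueue is a hypothetical stand-in for a broker with ack/nack semantics, not a real client; the first attempt is interrupted by a canceled context and the second succeeds.

```go
package main

import (
    "context"
    "fmt"
)

// memQueue is a hypothetical in-memory stand-in for a broker with ack/nack
// semantics. It is not safe for concurrent use.
type memQueue struct {
    pending   []string // jobs waiting for delivery
    delivered int      // total deliveries, including redeliveries
    acked     int
}

func (q *memQueue) fetch() (string, bool) {
    if len(q.pending) == 0 {
        return "", false
    }
    job := q.pending[0]
    q.pending = q.pending[1:]
    q.delivered++
    return job, true
}

func (q *memQueue) ack(job string)  { q.acked++ }
func (q *memQueue) nack(job string) { q.pending = append(q.pending, job) }

// handle processes one job and acks only after the work has succeeded.
func handle(ctx context.Context, q *memQueue, work func(context.Context, string) error) {
    job, ok := q.fetch()
    if !ok {
        return
    }
    if err := work(ctx, job); err != nil {
        q.nack(job) // not finished: hand it back for redelivery
        return
    }
    q.ack(job)
}

// redeliveryDemo returns the number of deliveries, acks, and jobs left over.
func redeliveryDemo() (delivered, acked, left int) {
    q := &memQueue{pending: []string{"job-1"}}
    // The "work" succeeds unless its context has been canceled.
    work := func(ctx context.Context, job string) error { return ctx.Err() }

    canceled, cancel := context.WithCancel(context.Background())
    cancel()
    handle(canceled, q, work)             // shutdown hits mid-job: nack
    handle(context.Background(), q, work) // the next pod picks it up: ack

    return q.delivered, q.acked, len(q.pending)
}

func main() {
    fmt.Println(redeliveryDemo()) // 2 1 0
}
```

Two deliveries, one ack, nothing left behind: the interrupted attempt cost a retry and nothing else.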

Step 4: close everything else

After HTTP is drained and workers are stopped, you still have dependencies open: database pools, message consumers, gRPC servers, file handles, distributed locks. Closing them in the wrong order produces fun bugs.

The rule is close in reverse order of opening, and close consumers before pools, pools before connections.

Text
open order  : config -> logger -> db pool -> queue consumer -> http server
close order : http server -> queue consumer -> db pool -> logger -> config
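
If you'd rather not rely on hand-ordered calls, the reverse-order rule can be captured in a tiny helper. This is a sketch, not a library: the closers type and its methods are invented here, and it behaves like a stack of defers that you can run explicitly.

```go
package main

import (
    "fmt"
    "strings"
)

// closers collects cleanup functions in the order resources are opened and
// runs them in reverse, the same way stacked defers would.
type closers struct {
    names []string
    fns   []func() error
}

func (c *closers) add(name string, fn func() error) {
    c.names = append(c.names, name)
    c.fns = append(c.fns, fn)
}

// closeAll runs every cleanup function, last-opened first, and returns the
// names in the order they were closed. An error is reported but does not
// stop the remaining closers from running.
func (c *closers) closeAll() []string {
    var order []string
    for i := len(c.fns) - 1; i >= 0; i-- {
        if err := c.fns[i](); err != nil {
            fmt.Println("close failed:", c.names[i], err)
        }
        order = append(order, c.names[i])
    }
    return order
}

// closeOrderDemo registers three pretend resources in open order and returns
// the order in which they were closed.
func closeOrderDemo() string {
    var c closers
    noop := func() error { return nil }
    for _, name := range []string{"db pool", "queue consumer", "http server"} {
        c.add(name, noop)
    }
    return strings.Join(c.closeAll(), " -> ")
}

func main() {
    fmt.Println(closeOrderDemo()) // http server -> queue consumer -> db pool
}
```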

A concrete example with pgxpool and a message queue:

Go shutdown_deps.go
func shutdown(ctx context.Context, srv *http.Server, pool *pgxpool.Pool, q *queue.Consumer) {
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
    defer cancel()

    // 1. stop accepting HTTP
    if err := srv.Shutdown(shutdownCtx); err != nil {
        slog.Error("http shutdown error", "err", err)
        _ = srv.Close()
    }

    // 2. stop pulling messages, drain in-flight
    if err := q.Stop(shutdownCtx); err != nil {
        slog.Error("queue shutdown error", "err", err)
    }

    // 3. close the database pool last — workers and handlers might still be flushing
    pool.Close()
}

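pgxpool.Pool.Close() blocks until every acquired connection has been returned to the pool, and it rejects new Acquire calls from the moment it's called, so order matters. If you close the pool first while a worker is still busy, Close stalls behind that worker's query with no deadline of its own, and the worker's next query fails with a closed-pool error. You've turned a graceful shutdown into an angry one.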

gRPC

*grpc.Server has its own GracefulStop, with the same shape as http.Server.Shutdown but missing the context parameter. To put a deadline on it, race it against a timer:

Go grpc_shutdown.go
done := make(chan struct{})
go func() {
    grpcServer.GracefulStop()
    close(done)
}()

select {
case <-done:
    slog.Info("grpc drained cleanly")
case <-time.After(20 * time.Second):
    slog.Warn("grpc drain exceeded deadline, force-stopping")
    grpcServer.Stop()
}

GracefulStop waits for all in-flight RPCs. Stop is the force-quit. Always wrap the first in a deadline, always fall back to the second. The same pattern works for any "close" function that doesn't take a context: wrap it in a goroutine, race the result against a timer.
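
Generalized, that "goroutine plus timer" pattern might look like the helper below. The name stopWithDeadline and its signature are assumptions for this sketch; graceful and force would be whatever pair your dependency offers.

```go
package main

import (
    "fmt"
    "time"
)

// stopWithDeadline runs graceful in a goroutine and waits up to d for it to
// return. If the deadline passes first, it calls force and reports false.
// The graceful goroutine is not killed; force is expected to make it return.
func stopWithDeadline(graceful, force func(), d time.Duration) bool {
    done := make(chan struct{})
    go func() {
        graceful()
        close(done)
    }()

    timer := time.NewTimer(d)
    defer timer.Stop()
    select {
    case <-done:
        return true
    case <-timer.C:
        force()
        return false
    }
}

// deadlineDemo exercises both paths: a graceful stop that returns at once,
// and one that hangs until the force function releases it.
func deadlineDemo() (fastClean, slowClean, forced bool) {
    fastClean = stopWithDeadline(func() {}, func() {}, time.Second)

    release := make(chan struct{})
    hang := func() { <-release }
    kill := func() {
        forced = true
        close(release) // unblocks the hanging graceful stop
    }
    slowClean = stopWithDeadline(hang, kill, 50*time.Millisecond)
    return fastClean, slowClean, forced
}

func main() {
    fmt.Println(deadlineDemo()) // true false true
}
```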

Kafka, NATS, RabbitMQ

The right close sequence for a message consumer is almost always:

  1. Stop the fetch loop (no more new messages).
  2. Wait for in-flight handlers to return.
  3. Commit offsets / acknowledge final messages.
  4. Close the underlying connection.

Most of the Go client libraries follow this shape if you call their Close() after you've stopped pulling. With Kafka via franz-go, that means cancelling the context you passed to PollFetches and then calling client.Close(). With NATS, drain the subscription before closing the connection (sub.Drain() then nc.Close()). With RabbitMQ, cancel the consumer tag first, then let in-flight deliveries ack, then close the channel and connection.

The point isn't to memorize each one. The point is: every queue client has a "stop fetching, finish what's in flight, then close" sequence. Find it in your client's docs and use it. The default "just call Close()" is usually the rude version.

Putting it all together

Here's a skeleton you can adapt: one binary, HTTP server, worker pool, database pool, and a graceful shutdown that respects all of the above.

Go main.go
package main

import (
    "context"
    "errors"
    "fmt"
    "log/slog"
    "net/http"
    "os"
    "os/signal"
    "sync/atomic"
    "syscall"
    "time"

    "github.com/jackc/pgx/v5/pgxpool"
    "golang.org/x/sync/errgroup"
)

func main() {
    if err := run(); err != nil {
        slog.Error("fatal", "err", err)
        os.Exit(1)
    }
}

func run() error {
    ctx, stop := signal.NotifyContext(context.Background(),
        syscall.SIGINT, syscall.SIGTERM)
    defer stop()

    pool, err := pgxpool.New(ctx, os.Getenv("DATABASE_URL"))
    if err != nil {
        return fmt.Errorf("connect db: %w", err)
    }
    defer pool.Close()

    q, err := newQueueConsumer(ctx)
    if err != nil {
        return fmt.Errorf("connect queue: %w", err)
    }

    var ready atomic.Bool
    ready.Store(true)

    mux := http.NewServeMux()
    mux.HandleFunc("/", appHandler(pool))
    mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
    })
    mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
        if !ready.Load() {
            http.Error(w, "draining", http.StatusServiceUnavailable)
            return
        }
        w.WriteHeader(http.StatusOK)
    })

    srv := &http.Server{
        Addr:              ":8080",
        Handler:           mux,
        ReadHeaderTimeout: 5 * time.Second,
        IdleTimeout:       60 * time.Second,
    }

    g, gctx := errgroup.WithContext(ctx)

    // HTTP server
    g.Go(func() error {
        slog.Info("http listening", "addr", srv.Addr)
        if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
            return fmt.Errorf("http server: %w", err)
        }
        return nil
    })

    // Worker pool
    g.Go(func() error {
        return runWorkers(gctx, q, pool, 8)
    })

    // Shutdown sequencer
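    g.Go(func() error {
        <-gctx.Done()
        slog.Info("shutdown started", "reason", gctx.Err())
        stop() // restore default signal handling: a second Ctrl-C now exits immediately

        // 1. flip readiness
        ready.Store(false)
        time.Sleep(5 * time.Second)

        // 2. fresh deadline-bound context for cleanup; the 5s sleep plus a 20s
        //    drain leaves 5s of margin inside the default 30s grace period
        shutdownCtx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
        defer cancel()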

        // 3. drain HTTP
        if err := srv.Shutdown(shutdownCtx); err != nil {
            slog.Error("http shutdown", "err", err)
            _ = srv.Close()
        }

        // 4. stop the queue consumer (workers will see ctx.Done and exit)
        if err := q.Stop(shutdownCtx); err != nil {
            slog.Error("queue stop", "err", err)
        }

        return nil
    })

    if err := g.Wait(); err != nil {
        return err
    }

    slog.Info("shutdown complete")
    return nil
}

A few things worth pointing out in this skeleton:

The shutdown sequencer is just another goroutine in the errgroup. It waits on gctx.Done(), which fires when either a signal arrives or any of the other goroutines returns an error. That second case matters: if the HTTP server crashes, the worker pool should also stop. errgroup.WithContext gives you that fan-out cancellation for free.

The time.Sleep(5 * time.Second) happens before srv.Shutdown. That's the readiness-propagation window from earlier. You can replace it with a Kubernetes preStop hook if you'd rather externalize it.

The pool.Close() is deferred at the top of run(), so it runs only when run() returns, which is after g.Wait() has seen every goroutine finish. Defers run in reverse order of registration, so registering them in open order handles the "close the DB pool last" rule automatically.

And os.Exit(1) only runs on a fatal error. On a normal graceful shutdown, the binary returns from main and exits 0. Your container runtime should see exit 0 and be happy.

Pitfalls that bite people anyway

A short list of the failure modes I've seen most often in real shutdown code. None of these are exotic. They're the ones that survive code review and only show up under load.

The "I'll just defer Shutdown" trap. A defer srv.Shutdown(ctx) at the top of run() looks neat but it almost never does what you want. By the time defer runs, the context is canceled, Shutdown returns instantly, and connections get severed. Always create a fresh deadline-bound context for shutdown.

The hidden infinite loop in a worker. A worker that does for { fetch(); process(); } with no select on ctx.Done() will not stop on a signal. It'll only stop when fetch itself returns an error because the underlying connection is broken, which is exactly when you didn't want to find out. Every long-running goroutine needs a context check on every iteration.

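Sharing the program context with shutdown. srv.Shutdown(ctx) where ctx is the same context you canceled to initiate shutdown is a self-cancelling call. A child of that context doesn't help either, because children inherit the cancellation. The fix is the pattern shown above: a fresh context built from context.Background() with its own deadline.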

Closing the DB pool before workers finish. You see this with linear defer statements in the wrong order or with explicit Close calls in the wrong sequence. The symptom is a flood of connection closed errors in the last second of shutdown logs. The fix is "close in reverse of open", or equivalently, "drain the consumers of a resource before closing the resource."

Ignoring the deadline. A graceful shutdown that hangs is not graceful. If your drain budget is 30 seconds and your DB pool's Close() can take 60 because it is waiting on a long query, you need to either bound the work that holds connections (context deadlines on queries, statement timeouts), give yourself a wider window, or accept that the platform's SIGKILL is your real backstop. Don't pretend the deadline doesn't exist.
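
One way to make the deadline impossible to ignore is a watchdog that force-exits if cleanup overruns. The sketch below is hedged accordingly: armWatchdog is an invented helper, and the exit function is passed in so the behavior can be exercised without killing the process. In a real service you would pass os.Exit and a duration just under your grace period.

```go
package main

import (
    "fmt"
    "time"
)

// armWatchdog schedules exit(1) to run after d unless the returned disarm
// function is called first. disarm reports whether it stopped the timer
// before it fired.
func armWatchdog(d time.Duration, exit func(code int)) (disarm func() bool) {
    t := time.AfterFunc(d, func() { exit(1) })
    return t.Stop
}

// watchdogDemo returns the exit code recorded when cleanup overruns the
// deadline (-1 if none was recorded) and whether a cleanup that finished in
// time managed to disarm the watchdog.
func watchdogDemo() (overrunCode int, disarmed bool) {
    codes := make(chan int, 1)
    record := func(code int) { codes <- code } // stand-in for os.Exit

    // Cleanup that overruns: nobody disarms, so the watchdog fires.
    armWatchdog(20*time.Millisecond, record)
    select {
    case overrunCode = <-codes:
    case <-time.After(2 * time.Second):
        overrunCode = -1
    }

    // Cleanup that finishes in time: disarm before the deadline.
    disarm := armWatchdog(time.Second, record)
    disarmed = disarm()
    return overrunCode, disarmed
}

func main() {
    fmt.Println(watchdogDemo()) // 1 true
}
```

Arm it as the first thing after the signal arrives and disarm it as the last thing before returning from run; a non-zero exit code then tells you the drain blew its budget.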

Logging from a defer, then calling os.Exit. os.Exit skips deferred functions. So does log.Fatal, which calls os.Exit(1) after logging. If your shutdown logging lives in a defer slog.Info("shutdown complete"), an os.Exit(1) in the same function will skip it, and your logs will end without the line you were counting on. Return an error from a wrapped run() function instead, and call os.Exit only in main, after run() and its defers have finished.

A word on observability during shutdown

The shutdown window is where outages get diagnosed, so log it like you mean it. Two log lines that are almost always worth adding:

  • One when the signal is received, with the signal name and the current count of in-flight requests / pending jobs.
  • One per phase as it completes, with timing.

Go logging.go
start := time.Now()
slog.Info("shutdown phase started", "phase", "http_drain")
if err := srv.Shutdown(shutdownCtx); err != nil {
    slog.Error("shutdown phase failed", "phase", "http_drain", "err", err, "elapsed", time.Since(start))
} else {
    slog.Info("shutdown phase completed", "phase", "http_drain", "elapsed", time.Since(start))
}

When someone files a postmortem six weeks from now, those timings are how you'll prove that the HTTP drain finished in 1.2 seconds but the queue consumer took 19. That's the difference between "we have a graceful shutdown bug" and "the queue client's drain timeout needs lowering".
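
The in-flight request count for that first log line has to come from somewhere. A minimal sketch, assuming a package-level counter and a middleware named trackInFlight (both invented for the example):

```go
package main

import (
    "fmt"
    "net/http"
    "net/http/httptest"
    "sync/atomic"
)

// inFlight counts requests that have entered a handler and not yet returned.
var inFlight atomic.Int64

// trackInFlight wraps next so the counter rises on entry and falls on exit,
// even if the handler panics.
func trackInFlight(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        inFlight.Add(1)
        defer inFlight.Add(-1)
        next.ServeHTTP(w, r)
    })
}

// counterDemo sends one request through the middleware and returns the count
// observed inside the handler and the count after it returned.
func counterDemo() (during, after int64) {
    h := trackInFlight(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        during = inFlight.Load()
        w.WriteHeader(http.StatusNoContent)
    }))
    h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
    return during, inFlight.Load()
}

func main() {
    fmt.Println(counterDemo()) // 1 0
}
```

With the mux wrapped this way, the signal-received log line can include something like "in_flight", inFlight.Load().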

If you're emitting metrics, a shutdown_phase_duration_seconds histogram with a phase label is the one I reach for. Once it's there you start noticing when a deploy starts taking longer to drain, which is usually a real signal about something that was added without thinking about shutdown.

The whole thing in one sentence

Graceful shutdown in Go is: catch one signal into a context, create a fresh deadline-bound shutdown context, flip readiness, drain HTTP, stop workers, close dependencies in reverse order, and force-exit if it all takes too long.

Everything in this article is variations on that sentence. Once you've wired it up once in your service template, you stop thinking about it. Deploys become boring. Pods restart cleanly. The 502 column on your dashboard stays flat. That's the whole win.