Serverless Cold Starts: Taming the Latency Dragon

Serverless infrastructure promises a dream: pay only for what you use, infinite scale, no servers to patch. Then reality arrives in the form of a cold start: the painful multi-second delay while a function instance spins up from nothing. For many backend systems, cold starts are not an edge case; they are the default experience for the first user of every function revision. In this article, we dissect the anatomy of cold starts across different compute runtimes, drawing on engineering notes collected over two years of production workloads. We measure exactly what happens between the trigger event and the first line of your handler code. The results surprised even us, and they challenge some common assumptions about serverless architecture.

Our notes begin with a controlled experiment. We deployed three identical functions: one in a lightweight interpreted language, one in a compiled language running on a virtual machine, and one in a language that compiles to native code. Each function did nothing except return "hello." We then invoked each one after a period of inactivity, measuring every millisecond. The runtime's architecture matters enormously. Interpreted runtimes cold-started in under 300 ms but consumed more CPU during execution. VM-based runtimes with a JIT took nearly two seconds to cold start because they had to load the VM, initialize the garbage collector, and warm up the JIT compiler. Native-code runtimes sat in the middle. These measurements became the baseline for every optimization we tried later.
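To make the measurement concrete, here is a minimal sketch of the kind of instrumentation we used, assuming an AWS Lambda-style Python handler; the response field names are illustrative, not a platform API:

```python
import time

# Module scope runs once per cold start, when the instance initializes.
_INIT_TIME = time.monotonic()
_COLD = True

def handler(event, context):
    global _COLD
    # Time from instance init to first handler entry approximates the
    # runtime's share of the cold start; it excludes platform-side work
    # such as pulling the code package and starting the sandbox. The
    # value is only meaningful on the cold invocation itself.
    init_to_handler_ms = (time.monotonic() - _INIT_TIME) * 1000
    was_cold = _COLD
    _COLD = False  # later invocations on this instance are warm
    return {
        "message": "hello",
        "cold_start": was_cold,
        "init_to_handler_ms": round(init_to_handler_ms, 2),
    }
```

Because module scope executes exactly once per instance, the delta between module load and the first handler entry isolates the runtime's initialization cost from your own code's cost.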

One of our most effective techniques came from rethinking tooling. Most serverless platforms offer provisioned concurrency, which keeps a pool of warm instances ready, but it erases the cost benefits of serverless for many use cases. Instead, we developed a pattern we call "Proactive Keep-Alive Pooling," which uses scheduled invocations of synthetic traffic to keep functions warm only when needed. Our notes show that sending a lightweight health check every five minutes reduced cold starts by 94% for functions with predictable traffic patterns. The architectural trick is to separate warmers from workers: a dedicated warmer function runs on a schedule and invokes the real function with a special "ping" header that triggers an immediate fast-path return.
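A minimal sketch of the worker side of this pattern, assuming an HTTP-triggered Python function and a hypothetical x-warmer-ping header (the header name is our convention, not a platform feature):

```python
import json

def handler(event, context):
    # Fast path: the warmer invokes us with a synthetic ping header so
    # the instance stays warm without running real business logic.
    headers = event.get("headers") or {}
    if headers.get("x-warmer-ping") == "true":
        return {"statusCode": 204, "body": ""}

    # Slow path: genuine requests do the real work.
    return {"statusCode": 200, "body": json.dumps({"message": "hello"})}
```

The warmer itself is nothing more than a scheduled function that sends one ping per worker it is responsible for keeping warm, so the fast path must return before touching databases or other slow dependencies.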

Backend systems with large dependency trees suffer disproportionately from cold starts. Our notes track a case where a function imported a machine learning library just to normalize a string: the cold start took 8 seconds, while the actual work took 12 milliseconds. The fix was architectural. We moved the heavy dependencies into a separate, always-warm sidecar service and kept the serverless function lean. This required infrastructure changes, specifically a NAT gateway and VPC configuration, which added only minimal latency. The tooling we built to analyze function bundle sizes became so popular internally that we released it as a set of open-source prototypes, documented across several releases.
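A sketch of the lean function after the refactor, assuming a hypothetical always-warm normalizer sidecar whose address arrives via a NORMALIZER_URL environment variable; only the standard library is imported, so module load stays cheap:

```python
import json
import os
import urllib.request

# Hypothetical sidecar endpoint: an always-warm internal service
# reachable from the function's VPC.
SIDECAR_URL = os.environ.get(
    "NORMALIZER_URL", "http://normalizer.internal/v1/normalize"
)

def handler(event, context):
    # Instead of importing a heavy ML library at module scope (which
    # dominated the 8-second cold start), delegate to the sidecar.
    payload = json.dumps({"text": event.get("text", "")}).encode()
    req = urllib.request.Request(
        SIDECAR_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return json.loads(resp.read())
```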

We also explored runtime-specific optimizations that border on wizardry. For example, some backend systems can cache database connections and other reusable objects in global variables. The catch: globals persist across invocations only while the same instance stays warm. Our pattern, "Connection Immortality," uses a background keep-alive ping so that any instance surviving its first cold start keeps its database pool open. Our notes include a cautionary tale: one team cached an HTTP client that held TLS sessions open, and after a certificate rotation their function spent minutes failing before reconnecting. Always validate external state before reuse. These details rarely appear in official documentation, but they are the difference between serverless infrastructure that works and serverless infrastructure that frustrates everyone.
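Here is a sketch of the validate-before-reuse half of the pattern, assuming a PostgreSQL backend via psycopg2 and a DATABASE_URL environment variable (both are assumptions for illustration):

```python
import os
import psycopg2  # assumption: PostgreSQL accessed through psycopg2

# Global scope survives across warm invocations on the same instance.
_conn = None

def _get_connection():
    global _conn
    # Validate cached state before reuse: a cached connection can be
    # silently broken by server restarts, certificate rotation, or
    # idle timeouts, exactly the failure mode in the cautionary tale.
    if _conn is not None:
        try:
            with _conn.cursor() as cur:
                cur.execute("SELECT 1")
            return _conn
        except psycopg2.Error:
            _conn = None  # stale; fall through and reconnect

    _conn = psycopg2.connect(os.environ["DATABASE_URL"])
    return _conn

def handler(event, context):
    conn = _get_connection()
    with conn.cursor() as cur:
        cur.execute("SELECT now()")
        (server_time,) = cur.fetchone()
    return {"server_time": server_time.isoformat()}
```

The cheap SELECT 1 probe adds a round trip on every invocation; whether that is worth it depends on how often your instances outlive their backing connections.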

The final lesson from these notes is that cold starts are not a bug in serverless infrastructure; they are a feature of the billing model, and you must design for them. The best architecture for latency-sensitive backend systems is neither "go all-in on serverless" nor "avoid it completely." A hybrid approach, keeping warm the parts that need low latency while letting cold-tolerant background jobs scale to zero, gives you the best of both worlds. We will continue to track how runtimes evolve and publish fresh notes as new tooling emerges. For now, remember: measure first, optimize second, and never assume that your serverless infrastructure will behave the same way in production as it does in a local test.
