Note: This article is limited to the internals of nginx. Other servers might have different solutions to the problems stated in this article and I plan to revise with new info as I learn and explore along the way.

Well, “load balancing” might be a poor choice of term here, and this is not about the high-level system design of putting a load balancer in front of web servers. I am talking about something much sneakier that goes on deeper down the stack, under the hood in nginx. I was always curious about what happens at the socket level in web servers. How are so many requests handled? How are requests buffered? When do requests start getting dropped?

To answer all these questions, we first need to understand a little bit about the internals of the web server. I will be using nginx as a reference. Everything from here on is in relation to nginx; other technologies (Envoy, or Pingora, Cloudflare’s internal tool that replaced nginx there) might have different solutions to the problems stated above.


Nginx is a popular open-source web server. At its core, nginx uses an asynchronous, event-driven, non-blocking architecture to handle requests. It uses I/O multiplexing and event notifications so that one worker process can juggle many connections at once, rather than dedicating one thread or process per connection (the model Apache initially used). This allows it to handle many concurrent connections with a low memory footprint.

A little history: nginx was created to tackle the C10K problem of handling 10,000 simultaneous connections on a single web server. Its event-driven design is what gives it nonlinear scalability in the number of simultaneous connections, a root requirement of a scalable web server.



Components of nginx:

Here’s a brief lifecycle of a request:



The core of nginx is responsible for maintaining a tight run-loop and executing appropriate sections of modules' code at each stage of request processing. Modules in nginx take care of most of the application layer work. They read from and write to the network and storage, transform content, do outbound filtering, apply server-side include actions, and pass the requests to the upstream servers when proxying is activated. nginx modules come in slightly different incarnations, namely core modules, event modules, phase handlers, protocols, variable handlers, filters, upstreams, and load balancers.

To read more about the internals of nginx:

How connections are handled



  1. The worker waits for events on the listen and connection sockets.
  2. Events occur on the sockets and the worker handles them:
    • An event on the listen socket means that a client has initiated a new connection. The worker accepts it and creates a new connection socket.
    • An event on a connection socket means that the client has sent new data. The worker reads it and responds promptly.

Diving Deeper

Let’s dive a bit deeper into the code and understand the flow of some critical pieces here.

The ngx_spawn_process function, called by the master process, launches the worker processes.

// src/os/unix/ngx_process.c

ngx_pid_t
ngx_spawn_process(ngx_cycle_t *cycle, ngx_spawn_proc_pt proc, void *data,
    char *name, ngx_int_t respawn)
{
    ...

    pid = fork();

    switch (pid) {

    case -1:
        ngx_log_error(NGX_LOG_ALERT, cycle->log, ngx_errno,
                      "fork() failed while spawning \"%s\"", name);
        ngx_close_channel(ngx_processes[s].channel, cycle->log);
        return NGX_INVALID_PID;

    case 0:
        /* child process: record the pids, then run the worker's cycle */
        ngx_parent = ngx_pid;
        ngx_pid = ngx_getpid();
        proc(cycle, data);
        break;

    ...

The proc function passed above is ngx_worker_process_cycle, which runs the worker’s event loop by repeatedly calling ngx_process_events_and_timers. The core logic of the loop is:

// src/os/unix/ngx_process_cycle.c 

for ( ;; ) {
    ngx_log_debug0(NGX_LOG_DEBUG_EVENT, cycle->log, 0, "worker cycle");

    ...

    if (ngx_terminate || ngx_quit) {
        ngx_log_error(NGX_LOG_NOTICE, cycle->log, 0, "exiting");
        ...
    }

    if (idle) {
        ngx_log_debug0(NGX_LOG_DEBUG_EVENT, cycle->log, 0, "idle worker");
        ...
    }
    ...
}

NGINX supports several connection-processing methods, with the default selected in the ngx_event_core_init_conf method in ngx_event.c: epoll on Linux, kqueue on FreeBSD and macOS, eventport and /dev/poll on Solaris, and select and poll as portable fallbacks.

These are used for I/O multiplexing and event notification, which is at the core of nginx. For more about I/O multiplexing, read:

A few more observations:







But hang on, there’s more…

Socket Sharding

man 7 socket


The sockets API offers an option called SO_REUSEPORT (set with setsockopt()) which enables multiple sockets to be bound to the same address and port, effectively letting several processes listen on the same port. Note that all processes binding to the port must have the same effective UID (a safeguard against port hijacking). This allows “load distribution” by giving each worker thread or process its own listener socket.

This article goes deeper into discussing SO_REUSEPORT. As mentioned there, the two traditional approaches to distributing incoming connections are:

  1. Have a dedicated thread call accept() and hand connections off to the other workers. Here the accepting thread becomes the bottleneck.
  2. Have all threads wait on accept() on a shared socket. Under high load, incoming connections may be distributed unevenly across the threads, and this pattern is also associated with the thundering-herd problem.

By contrast, SO_REUSEPORT allows an even distribution of load across all the listening threads/processes; the kernel takes care of this distribution.

Check this article for a deeper discussion, with numbers on how the load balancing plays out, and some of the issues with this approach. Unfortunately, in high-load situations the latency distribution can degrade even with SO_REUSEPORT. The best approach seems to be to use epoll with FIFO wakeup behavior and the EPOLLEXCLUSIVE flag, as noted in that article. Cloudflare seems to address these issues in its nginx replacement, Pingora.

Closing Notes