Understanding Load Balancing in Express.js

This blog explains what load balancing is and how it fits into Express.js.

January 15, 2026

Why Do We Need Load Balancing?

Before diving into what load balancing is, let’s first understand why it is needed.

Suppose we have a single Express backend server running on a machine with 2 CPU cores and 4 GB of RAM. Initially, this setup works fine. But as the number of users grows, the server starts receiving far more concurrent requests than it was sized for.

Because of its limited resources, the server may:

  • Respond slowly
  • Drop requests
  • Crash under heavy load

So the question becomes: How do we scale our backend to handle more traffic?

There are two common approaches to solve this problem:

  1. Vertical Scaling
  2. Horizontal Scaling

  1. Vertical Scaling

In Vertical Scaling, we increase the resources of the same machine. For example:

  • From: 2 CPU cores, 4 GB RAM
  • To: 4 CPU cores, 8 GB RAM

In simple terms, we are adding more power to the existing server.

Problems with Vertical Scaling

  • There is a hard upper limit to how much we can upgrade a machine
  • It is usually expensive
  • The system still has a single point of failure: if this server goes down, the entire application goes down

Vertical scaling works only up to a certain point and is not ideal for large-scale systems.

  2. Horizontal Scaling

In Horizontal Scaling, instead of upgrading one server, we spin up multiple instances of the same Express application.

So instead of one Express server, we now have multiple Express servers running the same app. Each server can handle requests independently, which greatly improves availability and fault tolerance.
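To make "multiple instances of the same app" concrete, here is a minimal sketch. To keep it self-contained, a plain Node.js `http` server stands in for the Express app, and the helper names (`createInstance`, `planInstances`, `startInstances`), the instance names, and the ports are all assumptions made for illustration.

```javascript
const http = require("node:http");

// Hypothetical helper: builds one instance of "the same app".
// A plain http server stands in for an Express app here.
function createInstance(name) {
  return http.createServer((req, res) => {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end(`Handled by ${name}\n`);
  });
}

// Hypothetical helper: describes `count` instances on consecutive ports.
function planInstances(count, basePort) {
  return Array.from({ length: count }, (_, i) => ({
    name: `instance-${i + 1}`,
    port: basePort + i,
  }));
}

// Starts every planned instance and returns the running servers
// so the caller can close them later.
function startInstances(plan) {
  return plan.map(({ name, port }) => createInstance(name).listen(port));
}

// Three copies of the same app; ports 3001-3003 are assumed to be free.
const plan = planInstances(3, 3001);
console.log(plan);
// const servers = startInstances(plan); // uncomment to actually listen
```

In a real deployment each instance would more likely run as its own process or on its own machine, so that one crash does not take the others down.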

The New Problem

Now a new question arises:

  • How do we make sure that no single server gets overwhelmed while others stay idle?

If all requests go to just one server, horizontal scaling loses its purpose.

This is where Load Balancing comes in.

What is Load Balancing?

Load Balancing is the practice of distributing incoming traffic across multiple servers in a controlled and efficient manner.

Suppose we have three instances of the same server application:

  • Server S1
  • Server S2
  • Server S3

All three servers are capable of handling client requests. Instead of sending requests directly to these servers, we place a Load Balancer in front of them.

How it Works

  • All client requests first reach the Load Balancer
  • The Load Balancer decides which server should handle the request
  • The request is then forwarded to one of the servers (S1, S2, or S3)
  • The selected server processes the request and sends the response back
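The flow above can be sketched as a tiny forwarding server built only on Node's `http` module. This is an illustration under assumptions, not a production load balancer: the backend addresses, the `pickBackend` helper, and port 8080 are hypothetical, and the "decision" is a simple rotation used as a placeholder rule.

```javascript
const http = require("node:http");

// Assumed addresses for S1, S2 and S3 (hypothetical local ports).
const backends = [
  { name: "S1", host: "127.0.0.1", port: 3001 },
  { name: "S2", host: "127.0.0.1", port: 3002 },
  { name: "S3", host: "127.0.0.1", port: 3003 },
];

let next = 0;

// Decide which server handles the request (simple rotation as a placeholder rule).
function pickBackend() {
  const backend = backends[next];
  next = (next + 1) % backends.length;
  return backend;
}

// Accept the client request, forward it to the chosen server, relay the response.
const balancer = http.createServer((clientReq, clientRes) => {
  const target = pickBackend();
  const proxyReq = http.request(
    {
      host: target.host,
      port: target.port,
      path: clientReq.url,
      method: clientReq.method,
      headers: clientReq.headers,
    },
    (proxyRes) => {
      clientRes.writeHead(proxyRes.statusCode, proxyRes.headers);
      proxyRes.pipe(clientRes);
    }
  );
  // If the chosen server is unreachable, answer with 502 instead of hanging.
  proxyReq.on("error", () => {
    clientRes.writeHead(502);
    clientRes.end("Bad gateway\n");
  });
  clientReq.pipe(proxyReq);
});

// balancer.listen(8080); // uncomment to run; port 8080 is an assumption
```

Clients only ever talk to the balancer's address; which of S1, S2, or S3 actually served a request is invisible to them.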

The Load Balancer makes this decision using predefined strategies to ensure:

  • Traffic is evenly distributed
  • No single server gets overwhelmed
  • Other servers are not left idle
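As one example of such a strategy, the sketch below picks whichever server is currently handling the fewest requests (often called "least connections"). The post itself does not name a specific strategy, so treat this as an assumed illustration; the in-memory `active` map and the helper names are hypothetical.

```javascript
// Active request counts per server (hypothetical in-memory bookkeeping).
const active = new Map([
  ["S1", 0],
  ["S2", 0],
  ["S3", 0],
]);

// Pick the server currently handling the fewest requests.
// Ties go to the server listed first.
function pickLeastBusy() {
  let best = null;
  for (const [name, count] of active) {
    if (best === null || count < active.get(best)) best = name;
  }
  return best;
}

// Record that a request was handed to the least busy server.
function startRequest() {
  const name = pickLeastBusy();
  active.set(name, active.get(name) + 1);
  return name;
}

// Record that a server finished one of its requests.
function finishRequest(name) {
  active.set(name, Math.max(0, active.get(name) - 1));
}

// Six requests that all stay open end up spread two per server.
for (let i = 0; i < 6; i++) startRequest();
console.log(Object.fromEntries(active)); // { S1: 2, S2: 2, S3: 2 }
```

Once a server finishes a request (`finishRequest("S2")`), it becomes the least busy one and receives the next request, so no server is overwhelmed while another sits idle.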
Written with ❤️ by Akarsh Jha