Understanding Load Balancing in Express.js

Before diving into what load balancing is, let’s first understand why it is needed.

Suppose we have a single Express backend server running on a machine with 2 CPU cores and 4GB RAM. Initially, this setup works fine. But as the number of users grows, the server starts receiving a huge number of concurrent requests.

Because of its limited resources, the server may:

Respond slowly
Drop requests
Crash under heavy load

So the question becomes: How do we scale our backend to handle more traffic?

There are two common approaches to solving this problem:
1. Vertical Scaling
2. Horizontal Scaling

Vertical Scaling

In Vertical Scaling, we increase the resources of the same machine, for example by upgrading it from 2 CPU cores and 4GB RAM to 4 CPU cores and 8GB RAM. In simple terms, we are adding more power to the existing server.

Problems with Vertical Scaling

i. There is a hard upper limit to how much we can upgrade a single machine
ii. Upgrades are usually expensive
iii. The system still has a single point of failure: if this server goes down, the entire application goes down

Vertical scaling works only up to a certain point and is not ideal for large-scale systems.

Horizontal Scaling

In Horizontal Scaling, instead of upgrading one server, we spin up multiple instances of the same Express application. So instead of a single Express server, we now have multiple Express servers running the same app. Each server can handle requests independently, which greatly improves availability and fault tolerance: if one instance goes down, the others can keep serving traffic.
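As a rough illustration, the sketch below starts several copies of the same app on one machine, each listening on its own port. It uses only Node's built-in `http` module so it runs without installing anything. `createApp` is a hypothetical stand-in for your Express app (an Express `app` is itself a `(req, res)` handler, so `http.createServer(app)` works the same way), and `startInstances` and the `S1`, `S2`, ... ids are illustrative names, not part of Express.

```javascript
const http = require("node:http");

// Hypothetical stand-in for the Express app. An Express `app` is itself a
// (req, res) handler, so http.createServer(app) would work the same way.
// Each copy reports its id so we can see which instance answered.
function createApp(instanceId) {
  return (req, res) => {
    res.setHeader("Content-Type", "application/json");
    res.end(JSON.stringify({ servedBy: instanceId, path: req.url }));
  };
}

// Start `count` independent copies of the same app, each on its own port.
// Port 0 lets the OS pick a free port; in practice you would use fixed
// ports (e.g. 3001, 3002, ...) or run each copy on a separate machine.
async function startInstances(count) {
  const instances = [];
  for (let i = 1; i <= count; i++) {
    const id = `S${i}`;
    const server = http.createServer(createApp(id));
    await new Promise((resolve) => server.listen(0, "127.0.0.1", resolve));
    instances.push({ id, server, port: server.address().port });
  }
  return instances;
}
```

Calling `startInstances(3)` yields three independent servers, S1, S2 and S3, each answering on its own port. Because they share no state, stopping one leaves the other two running.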

The New Problem

Now a new question arises:

How do we make sure that no single server gets overwhelmed while others stay idle?

If all requests go to just one server, horizontal scaling loses its purpose.

This is where Load Balancing comes in.

Load Balancing refers to the ability to distribute incoming traffic across multiple servers in a controlled and efficient manner.

Suppose we have three instances of the same server application:

Server S1
Server S2
Server S3

All three servers are capable of handling client requests. Instead of sending requests directly to these servers, we place a Load Balancer in front of them.

How it Works

All client requests first reach the Load Balancer
The Load Balancer decides which server should handle the request
The request is then forwarded to one of the servers (S1, S2, or S3)
The selected server processes the request and sends the response back
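The steps above can be sketched as a tiny reverse proxy. This is a minimal illustration, assuming plain-HTTP backends at known host/port pairs and using round robin to choose the server; it leaves out health checks, retries, TLS and hop-by-hop header handling. In production this job is usually done by a dedicated reverse proxy or a cloud load balancer rather than hand-written code. `createLoadBalancer` and the `{ host, port }` shape of each backend are hypothetical names chosen for this example.

```javascript
const http = require("node:http");

// Minimal round-robin load balancer sketch (hypothetical helper).
// `backends` is assumed to be a non-empty array of { host, port } objects
// pointing at identical copies of the app (S1, S2, S3, ...).
function createLoadBalancer(backends) {
  let next = 0; // index of the backend that receives the next request

  return http.createServer((clientReq, clientRes) => {
    // 1. Decide which server handles this request (round robin).
    const target = backends[next];
    next = (next + 1) % backends.length;

    // 2. Forward the request with the same method, path and headers.
    const proxyReq = http.request(
      {
        host: target.host,
        port: target.port,
        method: clientReq.method,
        path: clientReq.url,
        headers: clientReq.headers,
      },
      (proxyRes) => {
        // 3. Relay the chosen server's status, headers and body to the client.
        clientRes.writeHead(proxyRes.statusCode, proxyRes.headers);
        proxyRes.pipe(clientRes);
      }
    );

    // If the chosen server is unreachable, reply 502 Bad Gateway
    // (or drop the connection if a response was already started).
    proxyReq.on("error", () => {
      if (clientRes.headersSent) {
        clientRes.destroy();
      } else {
        clientRes.writeHead(502, { "Content-Type": "text/plain" });
        clientRes.end("Bad Gateway");
      }
    });

    // Stream the client's request body (if any) to the chosen server.
    clientReq.pipe(proxyReq);
  });
}
```

For example, `createLoadBalancer([{ host: "127.0.0.1", port: 3001 }, { host: "127.0.0.1", port: 3002 }, { host: "127.0.0.1", port: 3003 }]).listen(8080)` would send successive requests arriving on port 8080 to S1, S2, S3, S1, and so on (the ports here are arbitrary). Clients only ever talk to the load balancer, which is what lets servers be added or removed behind it.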

The Load Balancer makes this decision using a predefined strategy (for example, round robin or least connections) to ensure that:

Traffic is evenly distributed
No single server gets overwhelmed
No server is left idle
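Two widely used strategies are round robin and least connections. The sketch below expresses each one as a small object with a `pick()` method that returns the server for the next request. The function names and the `pick`/`release` interface are hypothetical, chosen only for illustration.

```javascript
// Two common load-balancing strategies written as small "pickers".
// Names and interface are illustrative, not from any library.

// Round robin: hand requests to servers in a fixed rotation
// (S1, S2, S3, S1, ...). Simple, and fair when requests cost about the same.
function roundRobin(servers) {
  let i = 0;
  return {
    pick() {
      const server = servers[i];
      i = (i + 1) % servers.length;
      return server;
    },
  };
}

// Least connections: send the request to the server with the fewest
// in-flight requests (ties go to the earliest server in the list).
// Useful when request costs vary, because a server busy with slow
// requests naturally receives fewer new ones.
function leastConnections(servers) {
  const active = new Map(servers.map((s) => [s, 0]));
  return {
    pick() {
      let best = servers[0];
      for (const s of servers) {
        if (active.get(s) < active.get(best)) best = s;
      }
      active.set(best, active.get(best) + 1); // a request starts on `best`
      return best;
    },
    release(server) {
      active.set(server, active.get(server) - 1); // that request finished
    },
  };
}
```

With three servers, round robin always cycles S1, S2, S3, S1, and so on. Least connections starts the same way, but if S2 finishes its request first (`release("S2")`) while S1 and S3 are still busy, the next request goes to S2, because it now has the fewest active requests.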