DNS round-robin: how it works and what to watch out for
DNS round-robin, simply put, means configuring multiple IP addresses for the same domain name on a DNS server. When users send resolution requests, the DNS server hands out those IP addresses in rotating order, so that access requests are spread across multiple servers. Imagine a cafeteria with three windows. The cook takes turns calling out, "Window number one, window number two, window number three," and you go to whichever window is called. That is the most basic form of DNS round-robin. When the first user accesses your website, the DNS server returns 192.168.1.10; for the second user it returns 192.168.1.11; for the third, 192.168.1.12; and when the fourth user arrives, the cycle starts over and the server returns 192.168.1.10 again. Over a large number of requests, traffic is therefore spread roughly evenly across the three servers.
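The rotation described above can be sketched in a few lines of Python. This is a simplified model rather than the behavior of any particular DNS server: it assumes the server returns the whole record set rotated by one position per query and that each client connects to the first address in the response. The function names are hypothetical.

```python
from collections import Counter, deque

# Record set for one hostname, using the addresses from the example above.
ADDRESSES = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]


def rotated_responses(addresses, query_count):
    """Return one response per query; each response is the full record set,
    rotated one position further than the previous response."""
    records = deque(addresses)
    responses = []
    for _ in range(query_count):
        responses.append(list(records))
        records.rotate(-1)  # move the first address to the end
    return responses


def first_choice(responses):
    """Assume every client simply connects to the first address it receives."""
    return [response[0] for response in responses]


# The fourth query wraps around to the first address again.
print(first_choice(rotated_responses(ADDRESSES, 4)))

# Over many queries each address is chosen equally often.
print(Counter(first_choice(rotated_responses(ADDRESSES, 300))))
```

Real traffic is less tidy than this model, because caching resolvers and clients sit between the authoritative server and the user.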
Achieving this effect is actually quite simple. You don't need to install any additional software on the server, nor do you need to buy any expensive hardware. Simply add multiple A records for the same hostname in your domain's DNS management backend. For example, suppose you already have one A record pointing www.yourcompany.com to 203.0.113.10, and you have just bought two new servers with the IPs 203.0.113.11 and 203.0.113.12. In your DNS management backend, add two more A records for the same hostname www.yourcompany.com, so that there are three in total, one for each IP address. Once the changes are saved and have propagated (resolvers that cached the old record set keep using it until its TTL expires), DNS load balancing is in effect. This is the most attractive aspect of DNS round-robin: simple configuration, extremely low cost, and quick results.
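For readers who manage DNS through zone files rather than a web console, the record set above can be written as three A records. The following sketch renders them in the standard zone-file text format (name, TTL, class, type, address). The helper name and the TTL of 300 seconds are assumptions chosen for illustration; a managed DNS console would expose the same fields through a form.

```python
import ipaddress


def a_records(hostname, addresses, ttl=300):
    """Render one zone-file A record line per address for the same hostname.

    `ttl` is an illustrative default, not a recommendation from the article.
    """
    fqdn = hostname if hostname.endswith(".") else hostname + "."
    lines = []
    for address in addresses:
        # Raises ValueError if the string is not a valid IPv4 address.
        ipaddress.IPv4Address(address)
        lines.append(f"{fqdn} {ttl} IN A {address}")
    return lines


for line in a_records(
    "www.yourcompany.com",
    ["203.0.113.10", "203.0.113.11", "203.0.113.12"],
):
    print(line)
```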
Of course, plain round-robin is just the beginning. In a real production environment, your servers may not be identical; some machines are powerful, while others are older. In that case you need weighted round-robin, which lets you assign a weight to each server so that higher-performing machines are selected more often. For example, if you have two new machines and one old machine, and you give each new machine a weight of 2 and the old machine a weight of 1, the DNS service will return their IP addresses in a 2:2:1 ratio, so the high-performance machines handle more of the load. Weights are not part of a plain A record; they are a feature of the DNS service, and most cloud providers let you set them directly in their DNS consoles. Another variant is location-based resolution, which returns the IP address of the server closest to the user's location. This is very useful for CDNs and globally deployed services, because it can significantly reduce network latency and improve user experience.
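The 2:2:1 split can be illustrated with a weighted random pick. This is only a sketch of the idea: the mapping of weights to addresses is hypothetical, and real DNS providers may implement weighting differently (for example with a deterministic schedule rather than random selection).

```python
import random
from collections import Counter

# Hypothetical weights: two newer machines at weight 2, one older machine at weight 1.
WEIGHTS = {
    "203.0.113.11": 2,
    "203.0.113.12": 2,
    "203.0.113.10": 1,
}


def pick_weighted(weights, rng):
    """Pick one address with probability proportional to its weight."""
    addresses = list(weights)
    return rng.choices(addresses, weights=[weights[a] for a in addresses], k=1)[0]


def simulate(weights, query_count, seed=0):
    """Count how often each address is returned over `query_count` queries."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return Counter(pick_weighted(weights, rng) for _ in range(query_count))


counts = simulate(WEIGHTS, 50_000)
for address, count in sorted(counts.items()):
    print(address, f"{count / 50_000:.1%}")
```

With weights of 2, 2, and 1, the expected shares are 40%, 40%, and 20%; the simulated shares approach those values as the number of queries grows.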
However, while DNS round-robin is simple and easy to use, it's not a panacea. It has its own unavoidable limitations, and if you're unaware of these pitfalls, it's easy to encounter problems at critical moments.
The most serious problem is the lack of health checks. A plain DNS server has no way to detect the health of the backend servers. It does not know whether a server you configured is down, unresponsive, or restarting. It will keep handing out the address of a "dead" server, leaving some users unable to reach the website until you manually remove the faulty IP from the DNS records. In other words, if you have three servers and one of them goes down, DNS round-robin will still return the failed IP to about one-third of new users, who will then see an inaccessible page. Fortunately, most mainstream commercial DNS services now offer health checks that periodically probe a specific port or URL path on each backend server. If a failure is detected, the faulty IP is automatically and temporarily removed from the response list, and it is added back once the server recovers. Therefore, if you rely on DNS round-robin, be sure to choose a service provider that offers health checks.
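The health-check behavior described above amounts to filtering the record set before answering. The sketch below is a minimal illustration, not any provider's implementation: `tcp_probe` and `healthy_addresses` are hypothetical names, the probe is a plain TCP connect to port 80, and returning the full list when every probe fails ("fail open") is an assumed design choice.

```python
import socket


def tcp_probe(address, port=80, timeout=2.0):
    """Return True if a TCP connection to address:port succeeds within the timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, unreachable hosts
        return False


def healthy_addresses(addresses, probe=tcp_probe):
    """Keep only addresses whose probe succeeds.

    Assumption: if every probe fails, return the full list rather than an
    empty answer, since the prober itself may be the component at fault.
    """
    alive = [address for address in addresses if probe(address)]
    return alive or list(addresses)


# Demonstration with a stand-in probe so the example runs without network access.
ALL = ["203.0.113.10", "203.0.113.11", "203.0.113.12"]
DOWN = {"203.0.113.11"}
print(healthy_addresses(ALL, probe=lambda address: address not in DOWN))
```

A real service would run the probes on a schedule and typically require several consecutive failures before removing an address, to avoid flapping on a single lost packet.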
Another headache is DNS caching. To speed up resolution, DNS query results are cached at multiple levels: in your browser, in your operating system, in your local router, and in your internet service provider's DNS server. Each cached record has a TTL (Time To Live) value, which determines how long the record is considered valid. If the TTL is too long, such as several hours, then even after you urgently remove a faulty IP from the DNS, users who have cached that IP will keep trying to connect to the faulty server until the TTL expires. Conversely, a very short TTL, such as a few seconds, speeds up failover but significantly increases the query load on the DNS servers and adds a little latency for users, who must resolve the name more often. This is a trade-off between low resolution overhead and fast reaction to changes. A common practice is to keep the TTL at around one hour in normal operation, lower it to 300 seconds or even 60 seconds ahead of planned server migrations or changes, and restore it to the normal value once the change is complete. The lower TTL has to be published at least one old TTL period before the change, so that cached copies carrying the long TTL have expired by then.
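The effect of TTL on failover can be seen in a toy caching resolver. This is a simplified sketch with hypothetical names: time is passed in explicitly as seconds, there is a single cache layer, and the "authoritative" data is just a dictionary.

```python
class CachingResolver:
    """A toy resolver that caches each answer for `ttl_seconds`."""

    def __init__(self, lookup, ttl_seconds):
        self._lookup = lookup        # function: hostname -> address
        self._ttl = ttl_seconds
        self._cache = {}             # hostname -> (address, expires_at)
        self.upstream_queries = 0    # how often the authoritative data was consulted

    def resolve(self, hostname, now):
        cached = self._cache.get(hostname)
        if cached is not None and now < cached[1]:
            return cached[0]         # still valid, answer from cache
        self.upstream_queries += 1
        address = self._lookup(hostname)
        self._cache[hostname] = (address, now + self._ttl)
        return address


# Authoritative data, modeled as a plain dict.
zone = {"www.yourcompany.com": "203.0.113.10"}
resolver = CachingResolver(lambda name: zone[name], ttl_seconds=3600)

first = resolver.resolve("www.yourcompany.com", now=0)
zone["www.yourcompany.com"] = "203.0.113.11"   # operator swaps out the faulty IP
stale = resolver.resolve("www.yourcompany.com", now=600)    # cache still valid
fresh = resolver.resolve("www.yourcompany.com", now=3600)   # TTL has expired
print(first, stale, fresh, resolver.upstream_queries)
```

With a one-hour TTL the client keeps receiving the old address for the rest of the hour after the change. Constructing the resolver with `ttl_seconds=60` would shorten that window to a minute, at the cost of many more upstream queries.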
Another often overlooked issue is session persistence. Session persistence is crucial if your website requires user login, as with e-commerce sites, social media, or internal enterprise systems. If a user's first request lands on server A, where a login session is created, but a later request lands on server B because of DNS round-robin, and server B has no record of that session, the user will be logged out, or the items in their shopping cart may suddenly disappear. The problem is made worse by differences in DNS server behavior. Some DNS servers shuffle the IP list randomly, while others rotate it in order, so you cannot control which server a given client actually connects to. There are two common solutions: one is to store session data in shared central storage, such as a Redis cluster or a database, so that any server can read the user's session; the other is to implement session stickiness at the load balancer level, ensuring that requests from the same user are always routed to the same backend server.
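The difference between per-server sessions and shared session storage can be shown with two toy application servers. This is a sketch only: `AppServer` is a hypothetical class, and a plain Python dict stands in for the shared store, where a real deployment would use something like Redis or a database.

```python
import secrets


class AppServer:
    """A toy application server that keeps sessions in the store it is given."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store  # dict-like: token -> session data

    def login(self, user):
        token = secrets.token_hex(16)
        self.sessions[token] = {"user": user, "cart": []}
        return token

    def current_user(self, token):
        session = self.sessions.get(token)
        return session["user"] if session else None


# Case 1: each server has its own private session store.
server_a = AppServer("A", {})
server_b = AppServer("B", {})
token = server_a.login("alice")
print(server_b.current_user(token))   # None: server B has never seen this session

# Case 2: both servers use one shared store (a dict standing in for Redis).
shared_store = {}
shared_a = AppServer("A", shared_store)
shared_b = AppServer("B", shared_store)
shared_token = shared_a.login("alice")
print(shared_b.current_user(shared_token))   # alice: the session is visible everywhere
```

With the shared store it no longer matters which address DNS hands to the client, which is why this approach pairs well with round-robin.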
A related and easily overlooked problem is that DNS servers do not all behave the same way. Because round-robin is implemented at the DNS server level, different service providers may implement it differently. Some rotate the list in simple sequential order, some shuffle it randomly, and some adjust it dynamically according to load. As a result, you cannot control the order in which a client receives the IP addresses, and the actual traffic distribution may deviate significantly from what you expect.
In practice, DNS round-robin is best suited to workloads that are cost-sensitive and do not need precise load balancing. Examples include CDN distribution of static resources, file download services, news websites that do not require login, and internal systems of small and medium-sized enterprises. In these scenarios, DNS round-robin provides basic traffic distribution and redundancy at a very low cost. However, if your business requires strict session persistence, as with user login, payment transactions, or real-time communication, or if you need precise load balancing, then DNS round-robin is suitable only as a coarse-grained distribution layer at the entry point. Behind it, you will need a finer-grained load balancer to do the actual routing.
Interestingly, although DNS round-robin is an old technique, it has not been retired; it still plays an important role in modern internet architecture. In large systems, DNS round-robin typically serves as the first-layer entry point for global traffic distribution, spreading traffic across load balancers in different data centers or regions. Those load balancers then distribute traffic more precisely inside each site. This layered architecture keeps the advantages of DNS round-robin, such as simplicity, low cost, and cross-regional distribution, while compensating for its shortcomings, such as limited accuracy and lack of session persistence. In short, DNS round-robin solves the "coarse-grained distribution" problem in load balancing, while the internal load balancers solve the "fine-grained distribution" problem. The two complement each other and together form a complete multi-layered load balancing system.