Application Load Balancing in Google Cloud is incredibly powerful right out of the box. But as your architecture scales, basic traffic routing often isn’t enough. You will inevitably face challenges like routing complex multi-service paths, managing cross-regional latency, and optimizing how individual VMs receive requests.
To solve these problems, GCP provides specific control points to precisely manipulate how traffic flows—from the user’s browser down to a single container or VM.
Here is a look under the hood at how Application Load Balancing works, focusing on three critical (but often misunderstood) configuration elements:
- URL Map
- serviceLBPolicy
- localityLBPolicy
1. URL Map: The main router
Sometimes a single backend service is not enough. When multiple microservices power your application, you need a way to inspect incoming HTTP(S) requests (hostnames, paths, headers) and route them to the correct downstream service.
The URL Map is the core routing brain of the Application Load Balancer. It evaluates each incoming request and applies rules to map it to a specific Backend Service, or to split it across a weighted set of backend services.
Configuration Example:
resource "google_compute_url_map" "default" {
  name            = "my-app-url-map"
  default_service = google_compute_backend_service.default.id

  host_rule {
    hosts        = ["api.example.com"]
    path_matcher = "api-paths"
  }

  path_matcher {
    name            = "api-paths"
    default_service = google_compute_backend_service.api_backend.id

    # Route specific paths to specific microservices
    path_rule {
      paths   = ["/v1/users/*"]
      service = google_compute_backend_service.users_backend.id
    }
  }
}
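Beyond simple path matching, a URL Map can also split traffic by weight. The sketch below shows one way to send a fraction of requests to a canary backend; the resource names (`stable`, `canary`) are illustrative assumptions, so check the current `google_compute_url_map` schema before relying on it.

```hcl
# Hypothetical sketch: a 90/10 canary split between two backend
# services. The "stable" and "canary" backends are assumed to
# exist elsewhere in the configuration.
resource "google_compute_url_map" "canary" {
  name            = "canary-url-map"
  default_service = google_compute_backend_service.stable.id

  host_rule {
    hosts        = ["api.example.com"]
    path_matcher = "canary-split"
  }

  path_matcher {
    name            = "canary-split"
    default_service = google_compute_backend_service.stable.id

    route_rules {
      priority = 1
      match_rules {
        prefix_match = "/"
      }
      route_action {
        # Weights are relative; 90 + 10 sends ~10% of traffic
        # to the canary backend.
        weighted_backend_services {
          backend_service = google_compute_backend_service.stable.id
          weight          = 90
        }
        weighted_backend_services {
          backend_service = google_compute_backend_service.canary.id
          weight          = 10
        }
      }
    }
  }
}
```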
2. serviceLBPolicy: The global traffic distributor
When your application spans multiple regions or zones, relying purely on default traffic distribution isn’t always optimal. You might want to heavily prioritize keeping traffic in the same zone to minimize latency, or you might want to evenly spray traffic globally across all available capacity to avoid sudden regional spikes.
Attached to the Backend Service, the serviceLBPolicy governs the global load balancing algorithms used to distribute traffic across your Instance Groups (IGs) or Network Endpoint Groups (NEGs).
The primary algorithms available are:
- WATERFALL_BY_REGION (default): Directs traffic to the nearest region with capacity, evenly loading all backend groups in that region before spilling over to the next closest region.
- WATERFALL_BY_ZONE: Tries to keep traffic strictly within the single zone closest to the client. It only spills over to other zones if the primary zone runs out of capacity.
- SPRAY_TO_REGION: Spreads a client’s traffic across all backend groups within the closest region, rather than just hitting a single localized group.
Note: For even finer control over your global routing, serviceLBPolicy also allows you to configure Auto-Capacity Draining (to quickly route traffic away from degrading backends) and custom Failover Thresholds (setting the exact percentage at which traffic should spill over).
Configuration Example:
resource "google_network_services_service_lb_policy" "default" {
  name                     = "my-global-policy"
  location                 = "global"
  load_balancing_algorithm = "WATERFALL_BY_ZONE"

  # Optional control functionality
  auto_capacity_drain {
    enable = true
  }

  failover_config {
    failover_health_threshold = 70
  }
}
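A serviceLBPolicy does nothing until it is referenced from a Backend Service. The sketch below shows one way this attachment can look; the `service_lb_policy` attribute and the exact reference format may depend on your Google provider version, so verify against the provider documentation.

```hcl
# Hedged sketch: wiring the policy to a backend service.
# Resource names here are illustrative assumptions.
resource "google_compute_backend_service" "global_backend" {
  name                  = "my-global-backend"
  load_balancing_scheme = "EXTERNAL_MANAGED"

  # Reference to the google_network_services_service_lb_policy
  # defined above; format requirements may vary by provider version.
  service_lb_policy = google_network_services_service_lb_policy.default.id

  backend {
    group = google_compute_instance_group_manager.example.instance_group
  }
}
```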
3. localityLBPolicy: The local distributor
Once the URL Map and serviceLBPolicy have routed traffic to a specific local backend (like a NEG or Instance Group), how do you decide which exact VM or container gets the request? A simple round-robin isn’t always ideal, especially for caching tiers or workloads with highly variable processing times.
localityLBPolicy defines the load balancing algorithm used within the scope of the local instance group or NEG.
- ROUND_ROBIN: Sequential distribution (the default).
- LEAST_REQUEST: Sends traffic to the endpoint with the fewest active requests. Excellent for variable-length tasks to prevent bottlenecks on a single VM.
- RING_HASH / MAGLEV: Provides consistent hashing based on request characteristics (like user ID or IP). This is crucial for caching backends so the same user reliably hits the same cache node.
- WEIGHTED_ROUND_ROBIN: Distributes traffic based on custom utilization metrics reported by the instances themselves.
Configuration Example:
resource "google_compute_backend_service" "default" {
  name                  = "my-local-backend"
  load_balancing_scheme = "EXTERNAL_MANAGED"

  # Optimize for active connections rather than strict sequential routing
  locality_lb_policy = "LEAST_REQUEST"

  backend {
    group = google_compute_network_endpoint_group.my_neg.id
  }
}
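For the caching use case mentioned above, a consistent-hashing policy needs a hash key to pin clients to backends. The sketch below hashes on an HTTP cookie; the resource names and cookie name are illustrative assumptions, and the valid combinations of `locality_lb_policy`, `session_affinity`, and `consistent_hash` should be confirmed in the provider documentation.

```hcl
# Hypothetical sketch: consistent hashing for a caching tier, so
# requests carrying the same cookie land on the same cache node.
resource "google_compute_backend_service" "cache_backend" {
  name                  = "my-cache-backend"
  load_balancing_scheme = "EXTERNAL_MANAGED"
  locality_lb_policy    = "RING_HASH"

  # Hash on an HTTP cookie; the affinity setting must be
  # compatible with the chosen hashing policy.
  session_affinity = "HTTP_COOKIE"
  consistent_hash {
    http_cookie {
      name = "session-id"
      ttl {
        seconds = 3600
      }
    }
  }

  backend {
    group = google_compute_network_endpoint_group.cache_neg.id
  }
}
```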
Summary
By mastering these three control points, you transition from basic point-to-point routing to highly resilient, globally optimized traffic management. The URL Map dictates what service handles the request, serviceLBPolicy governs where in the world that traffic should be distributed, and localityLBPolicy ensures your individual instances are utilized as efficiently as possible.
Specific implementation and results of these policies may vary significantly based on your unique use cases and traffic patterns; therefore, it is always best to consult the official documentation for the latest guidance and technical limits.

