Hello! I recently followed the steps at Expanding Apigee to multiple regions | Google Cloud to add another Apigee instance in a new region. Our original Apigee instance is in us-central1 and the new one we’ve spun up is in asia-northeast1. We’re using the PSC routing setup.
All of our services are running in a GKE cluster in us-central1 and are exposed via nginx ingresses, and the target servers in our Apigee environments point to their corresponding host IP addresses.
We’ve discovered that the new Apigee instance in asia-northeast1 is unable to connect to the target services in us-central1. Any incoming requests that get routed through the asia-northeast1 instance end up returning a 504 after timing out at the "Target Request Flow Started" step.
I’m not really sure how to proceed here – how can we configure our Apigee or our services such that Apigee can access services running in a cluster in a different region?
Hello @NickHumeAI ,
A couple of points on the above:
- What does your VPC setup look like? Is there a specific route (e.g., via VPC peering) to the service located in us-central1? As a reminder, VPC peering is non-transitive, which means that an explicit route would need to be defined through your networking topology.
- All Kubernetes resources created are non-regional, correct?
The other option would be to verify your network connectivity by installing a virtual machine in the affected region (asia-northeast1) and testing the target service via ping/telnet/etc. (and reviewing the networking trace/results).
Please let me know if you have any questions and/or concerns - thanks!
Hey! Thanks for the response.
I did already try the VM approach – and indeed discovered that we could reach services within the same region but not across regions.
And after doing more research/testing, I realized that our Kubernetes services are managed with the gce-internal class of ingress, which, as far as I understand it, only creates regional internal load balancers, meaning they cannot serve cross-region traffic. I believe this is the crux of our issue, as otherwise all services (and Apigee NEGs) are running within the same VPC.
My conclusion is that we should likely take this as an opportunity to migrate to the GKE version of the Gateway API so that we can enable global load balancers for our Kubernetes services. If there is a much simpler way to resolve this, though, I would definitely be very open to suggestions!
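For what it's worth, here's a rough sketch of what I'm imagining on the Gateway side. All names are placeholders, and the GatewayClass in particular is illustrative – my understanding is that the class chosen determines the load balancer's scope (regional vs. cross-region, internal vs. external), so we'd need to pick whichever one actually gives us cross-region reachability:

```yaml
# Rough sketch only – resource names, hostname, and the GatewayClass
# are placeholders; the class determines the load balancer's scope.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: gke-l7-rilb  # illustrative; swap for a class with the scope we need
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: Same
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-route
spec:
  parentRefs:
  - name: example-gateway
  hostnames:
  - "example.internal"
  rules:
  - backendRefs:
    - name: example-service
      port: 80
```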
I will also say the other wrinkle is that we were unable to use the gce-internal ingresses for one of our services due to an issue with websocket support, so that one is nginx-based. At least – that’s what I’ve been told, since the person who originally set it up left before I joined the company. I’m hopeful that the Gateway API switch will also allow us to move that service over without any websocket issues.
I’d also be interested to know if you have any recommendations on minimizing (or even eliminating) downtime in making this migration (assuming you do think that a Gateway API migration is the lowest friction option). Right now my thought was to spin up a Gateway API ingress side-by-side with the existing ingresses with new IPs, having both point to our respective services, and then just do a cut-over in the Apigee environment configuration to point to the new IP addresses. Would that be relatively smooth? Any other thoughts would be great.
I see there is also a “global” option for “dynamic routing” in the VPC settings. Would this solve our problem more simply? What would the other implications be? I’m having a surprisingly difficult time finding helpful information on this setting.
Hey @NickHumeAI, you are correct that the gce-internal class of ingress is, out of the box, a regional-only resource, and I would absolutely recommend migrating to the newer Gateway API for a wide variety of reasons (WebSocket support, more robust traffic routing patterns, etc.).
As for the rollout, I would recommend most of what you mentioned (i.e., some flavor of a blue/green deployment, where you have multiple ingress points talking to the same service). From an Apigee POV, you can use the Weighted load balancing type to orchestrate traffic across the two endpoints, slowly increasing traffic to the new one (and decreasing it should something unexpected occur): Load balancing across backend servers | Apigee | Google Cloud
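If you also want a gradual shift on the Kubernetes side (rather than only at the Apigee layer), the Gateway API's HTTPRoute supports weighted backendRefs, which gives you a similar pattern. A sketch with placeholder names:

```yaml
# Illustrative only – service names and weights are placeholders.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: cutover-route
spec:
  parentRefs:
  - name: example-gateway
  rules:
  - backendRefs:
    - name: old-service   # existing backend keeps most traffic
      port: 80
      weight: 90
    - name: new-service   # gradually increase this weight
      port: 80
      weight: 10
```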
Let me know your thoughts!
Thank you so much @hartmann ! Do you have any thoughts on what the global dynamic routing setting would accomplish, if anything? I was wondering if that might be a useful immediate stopgap, while I work on migrating to the Gateway API.
Hey @NickHumeAI, this could be a viable stopgap assuming all resources are running in the same VPC (which I believe you verified to be true). I don’t know if there are any other regional-only resources within your environment/networking constructs, but at a minimum, setting the dynamic routing mode to global would advertise your ingress IP address across all regions in your VPC. I would still recommend migrating to the Gateway API regardless – thanks!
Thanks, @hartmann , this has been extremely helpful. I’m in the midst of migrating to the Gateway API, and there’s one more wrench that I’ve encountered – we were using an nginx ingress for our websocket services, and there are some configurations we’ve been using that are critical for our setup. I’m wondering if you know whether there are analogous configurations in the Gateway API world?
As I understand it, some of these configurations should be replicable, but it’s not clear to me that all of them are still doable in the Gateway universe. Our current ingress looks like:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ .Values.name }}-internal
  labels:
    app.kubernetes.io/managed-by: {{ .Release.Service }}
  annotations:
    kubernetes.io/ingress.regional-static-ip-name: {{ .Values.name }}-internal
    nginx.org/websocket-services: "{{ .Values.name }}-service"
    nginx.org/max-conns: "{{ .Values.max_conns }}"
    nginx.org/proxy-read-timeout: "3600s"
    nginx.org/proxy-send-timeout: "3600s"
    nginx.org/client-max-body-size: "30m"
spec:
  ingressClassName: nginx
  rules:
  - host: {{ .Values.host_name }}
    http:
      paths:
      - path: /
        pathType: ImplementationSpecific
        backend:
          service:
            name: {{ .Values.name }}-service
            port:
              number: 80
```
And I’m trying to figure out how we can configure the same setup in Gateway. So far, as I understand it:
- `kubernetes.io/ingress.regional-static-ip-name` is easily replaced by `addresses` in the Gateway.
- `nginx.org/websocket-services` is not necessary, since Gateway supports WebSockets out of the box.
- `nginx.org/max-conns` has no direct analog, but we can approximate it using `maxRatePerEndpoint` on a `GCPBackendPolicy`, sort of. Is that correct? It’s definitely not ideal, so it would be great if there were something for a maximum number of concurrent connections.
- `nginx.org/proxy-read-timeout` and `nginx.org/proxy-send-timeout` can be replaced collectively by `timeoutSec` in the `GCPBackendPolicy`.
- `nginx.org/client-max-body-size` – I could not find any way to replace this value. Any suggestions?
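To make the mapping concrete, here's roughly what I have in mind for the timeout/connection pieces. The Service name is a placeholder, and I'm not certain `maxRatePerEndpoint` is available on `GCPBackendPolicy` in all GKE versions (it's also a rate limit, not a true concurrency cap), so treat this as a sketch:

```yaml
# Sketch only – target Service name is a placeholder, and field
# availability (e.g. maxRatePerEndpoint) may vary by GKE version.
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: example-backend-policy
spec:
  default:
    timeoutSec: 3600        # would replace proxy-read/send-timeout
    maxRatePerEndpoint: 100 # rough stand-in for max-conns (a rate, not concurrency)
  targetRef:
    group: ""
    kind: Service
    name: example-service
```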
Any thoughts you have on the above would be super appreciated. Let me know!
@hartmann Just want to follow up quickly on the previous to see if you have any suggestions! Would be nice to have some more visibility into analogous configuration options when migrating to Gateway API in general.
@hartmann one more time to see if you have any suggestions for me here. I’m blocked on migrating to Gateway API if I can’t have confidence that it won’t break our current ingress configurations.