Hi.
I have a Cloud Run service that mounts two volumes from a VM running an NFS server. It’s very simple and straightforward, no fancy config.
The VM and the CR service are on the same subnet, and there’s a firewall rule allowing TCP+UDP/2049 from CR to the VM. The volumes seem to be mounted correctly: ss -tuna | grep :2049 on the server shows open connections from several IPs.
Also, when I start a new revision with an invalid mount point, it fails to start, so I assume the configuration and the connection are correct.
I also ran a test from a VM in the same subnet as the NFS server and the Cloud Run instance: the NFS shares can be mounted, and files can be listed, created and deleted.
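Since the failure is intermittent, a single manual mount test may not catch it. A minimal sketch of a repeatable TCP check against the NFS port could look like the following. It only verifies that a TCP connection to port 2049 can be opened and how long that takes; it does not perform an NFS mount, and the function name is made up for illustration.

```python
import socket
import time


def probe_nfs_port(host, port=2049, timeout=3.0):
    """Attempt one TCP connection to host:port.

    Returns (ok, elapsed_seconds). This checks reachability only;
    it is not an NFS mount and says nothing about exports or permissions.
    """
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts and
        # connection refused) when the port cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        return False, time.monotonic() - start
```

Calling probe_nfs_port("10.0.0.5") in a loop (the address is a placeholder for the NFS VM's internal IP) and logging the results with timestamps would show whether connection failures or slow connects line up with the errors in the Cloud Run logs.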
Unfortunately, when the CR application (PHP Symfony app) attempts to access a mounted path it crashes and I get the following error in the logs:
textPayload: “terminated: Application failed to start: container 1: failed to mount volume (type: nfs, name: datastore-staging): The NFS server may not be reachable. Check your VPC connectivity and firewall settings.”
I’m struggling to debug the issue, and I’ve run out of ideas about what could be wrong in the setup. Any hint on what could be wrong or how to debug this is welcome.
The NFS shares are actually mounted, and I can access them from within the app. But occasionally I get the NFS mount error in the logs, and I cannot figure out where it comes from. As I said, the NFS works and it’s correctly mounted.
My guess is that the NFS server cannot handle the client load, but the application has very low traffic and shouldn’t really overload the NFS server. The server has a load of around 0.10 and about 30% RAM usage. I also increased the number of NFS threads from the default 8 to 24, but nothing changed.
I did some debugging on the NFS side: I increased the grace time, reduced the lease time and increased the number of threads, but the issue persists.
I’ve set the maximum number of CR instances to 1, to rule out overload or concurrency issues that could somehow impact the NFS server, but nothing changed.
Roughly every 3 to 4 minutes I get the error in the Cloud Run logs. It seems that the error is thrown when CR spins up a new container or recycles existing ones.
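To check whether the errors really recur at a regular interval, the timestamps of the failing log entries can be compared. This is only a sketch: it assumes the timestamps have been exported as ISO 8601 strings with at most microsecond precision, and the function name and sample values are hypothetical.

```python
from datetime import datetime


def error_intervals(timestamps):
    """Return the gaps, in seconds, between consecutive error timestamps.

    Assumes ISO 8601 strings; a trailing "Z" is rewritten to "+00:00"
    so that datetime.fromisoformat also accepts it on older Python versions.
    """
    parsed = sorted(
        datetime.fromisoformat(ts.replace("Z", "+00:00")) for ts in timestamps
    )
    return [(later - earlier).total_seconds()
            for earlier, later in zip(parsed, parsed[1:])]


# Hypothetical timestamps, not taken from the real logs.
sample = [
    "2024-01-01T10:00:00Z",
    "2024-01-01T10:03:30Z",
    "2024-01-01T10:07:00Z",
]
print(error_intervals(sample))  # [210.0, 210.0]
```

If the gaps turn out to be nearly constant, that would point to something periodic (for example instance startup or recycling) rather than load on the NFS server.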
As I said, running containers have no problem mounting and accessing the NFS paths.