Nodes mounted volumes as read-only after upgrade

Hi all,

After an automatic upgrade from version 1.30.8 to 1.30.9, one or more nodes in the node pool mounted some volumes in read-only mode, causing our application to crash. It happened again when we upgraded from 1.30.10-gke.1070000 to 1.31.6-gke.1064001. Strangely, the first node of the pool usually does not seem to be affected.

We see the problem with both persistent volumes (NFS on Filestore with accessModes: ReadWriteMany) and emptyDir volumes (medium: Memory).
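To confirm which volumes have flipped to read-only before restarting anything, one option is to inspect the mount options in /proc/mounts, either on the node or inside an affected pod. This is a hypothetical helper, not part of GKE or Kubernetes; it assumes Filestore volumes show up with filesystem type nfs or nfs4, and that emptyDir volumes with medium: Memory show up as tmpfs.

```python
"""Hypothetical sketch: list read-only NFS/tmpfs mounts from /proc/mounts text."""

import os
import re

# Assumption: Filestore PVs appear as "nfs"/"nfs4" and emptyDir with
# medium: Memory appears as "tmpfs". Adjust for your environment.
WATCHED_FS_TYPES = {"nfs", "nfs4", "tmpfs"}


def _unescape(field: str) -> str:
    """Decode the octal escapes /proc/mounts uses (e.g. \\040 for a space)."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def find_readonly_mounts(mounts_text: str,
                         fs_types=WATCHED_FS_TYPES) -> list[tuple[str, str]]:
    """Return (mount point, fs type) pairs whose options include "ro"."""
    readonly = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        # Match the exact "ro" option, not substrings like "errors=remount-ro".
        if fs_type in fs_types and "ro" in options.split(","):
            readonly.append((_unescape(mount_point), fs_type))
    return readonly


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        for mount_point, fs_type in find_readonly_mounts(f.read()):
            print(f"read-only {fs_type} mount: {mount_point}")
```

Running it periodically (or as a readiness check) would surface the problem before the application crashes, though that is only a detection aid, not a fix.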

Our current workaround is to:

  • disable automatic upgrades for the node pool
  • either restart each affected node in the node pool, or create a new node pool with the upgraded version
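The restart path of this workaround can be scripted. Below is a hypothetical Python sketch that only builds and prints the commands so they can be reviewed before running anything. It assumes the gcloud and kubectl CLIs are already configured for the cluster, and all cluster, location, pool, node and zone names are placeholders. Draining moves pods off the node first (and cordons it); note that --delete-emptydir-data discards emptyDir contents, and gcloud compute instances reset is a hard reset of the VM rather than a graceful reboot.

```python
"""Hypothetical sketch: build (but do not run) the workaround commands."""

import shlex


def workaround_commands(cluster: str, location: str, node_pool: str,
                        nodes: dict[str, str]) -> list[list[str]]:
    """Return argv lists that disable auto-upgrade, then restart each node.

    `nodes` maps each affected node name to its Compute Engine zone
    (GKE node names match the underlying VM instance names).
    """
    commands = [[
        "gcloud", "container", "node-pools", "update", node_pool,
        f"--cluster={cluster}", f"--location={location}",
        "--no-enable-autoupgrade",
    ]]
    for node, zone in nodes.items():
        commands += [
            # Cordon the node and evict its pods; emptyDir data is deleted.
            ["kubectl", "drain", node, "--ignore-daemonsets",
             "--delete-emptydir-data"],
            # Hard reset of the VM; in this thread a reboot remounted the volumes.
            ["gcloud", "compute", "instances", "reset", node, f"--zone={zone}"],
            # Run only after the node reports Ready again.
            ["kubectl", "uncordon", node],
        ]
    return commands


if __name__ == "__main__":
    # Placeholder names; replace with your own cluster, pool and nodes.
    plan = workaround_commands(
        "my-cluster", "europe-west1", "default-pool",
        {"gke-my-cluster-default-pool-node-1": "europe-west1-b"},
    )
    for argv in plan:
        print(shlex.join(argv))
```

Restarting one node at a time keeps the rest of the pool serving traffic while each node is recycled.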

Has anybody experienced the same problem? What can we do to fix it and re-enable automatic upgrades?

Thank you in advance

Looks like https://issuetracker.google.com/issues/398887931 is the public tracker for this issue.

Hi @firecloud,

Welcome to Google Cloud Community!

Version 1.31.6-gke.1064001 is no longer available in the Rapid channel, and incompatibilities or limitations in that specific version may be causing the issues you're experiencing with your persistent volumes. We recommend upgrading your GKE versions, since newer versions are now available both for creating new clusters and for opt-in control plane and node upgrades on existing clusters.

For details on versioning and upgrade processes, refer to the GKE versioning and support documentation, and regularly check the GKE Release Notes to stay informed about known issues with specific versions.

Additionally, consider reaching out to Google Cloud Support for further guidance and assistance.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

We don't use the Rapid channel, so I'm not sure that could be the problem: 1.31.6-gke.1064001 is the recommended version for the Stable channel. We also had the same problem with an emptyDir volume, not just with a persistent volume.
Since rebooting the node, which remounted the volumes, solved the problem, this suggests an issue with the upgrade procedure or, even worse, with the creation of the node itself. It does appear to be at least similar to the issue reported in the public tracker.