Sudden "cannot execute binary file" error in Google Cloud Batch (Since March 1st)

Hi,

I’m reaching out to see if anyone else has been experiencing issues with Google Cloud Batch starting around March 1st.

I have a stable job that has been running without any issues, but suddenly it started failing with the following error:
textPayload: “/bin/bash: /bin/bash: cannot execute binary file”

This is happening even though I haven’t changed my job configuration or my Docker images. Here is my setup:

  • Machine Type: n1-standard-8

  • GPU: NVIDIA Tesla T4 (1 count)

  • Region: us-central1

  • Image: Custom image based on standard Ubuntu

What I have tried so far:

  1. Architecture Check: Verified that both the instance (n1) and the Docker image are using amd64. I confirmed this by running docker inspect on the image.

  2. Entrypoint Variations: I tried changing the entrypoint to /bin/bash, /bin/sh, and directly to python3.11, but all of them result in the same “cannot execute binary file” error.

  3. Base Image Change: I tried switching from a vanilla Ubuntu image to other types of images, but the problem persists.

  4. Clean Build: I rebuilt the image from scratch using --no-cache and pushed it to the registry again, but it didn’t help.
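For anyone repeating the architecture check in step 1 against the binary itself rather than the image metadata, here is a minimal sketch that reads the ELF header of a file and reports its target machine. The helper names (`elf_machine`, `binary_arch`) and the short machine table are my own illustration, not part of any Batch or Docker tooling. The offsets follow the ELF specification: the `e_machine` field is two bytes at offset 18.

```python
import struct

# Small subset of ELF e_machine values from the ELF specification.
ELF_MACHINES = {
    0x03: "x86",
    0x28: "arm",
    0x3E: "x86-64 (amd64)",
    0xB7: "aarch64 (arm64)",
}


def elf_machine(header: bytes) -> str:
    """Return the target architecture named in the first 20 bytes of a file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return "not an ELF file"
    # EI_DATA (byte 5) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown (0x{machine:x})")


def binary_arch(path: str) -> str:
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

Inside the container, `print(binary_arch("/bin/bash"))` should report the same architecture as the host. If it does, as it did in my case, an architecture mismatch is unlikely to be the cause.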

It seems like the system is suddenly failing to recognize standard binaries as executable within the container runtime. Since this started on March 1st without any changes on my side, I suspect there was a platform-level update that affected how entrypoints are handled.
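One observation that may help narrow this down: as far as I know, the doubled path in the message is the form bash prints when it is handed a binary as a script to interpret (that is, `bash /bin/bash`), rather than when the kernel rejects a binary for the wrong architecture. The sketch below reproduces that message locally. It assumes `bash` is on the PATH, and the helper name `run_binary_as_script` is hypothetical. It only shows where this message form can come from. It does not confirm what Batch was doing.

```python
import os
import shutil
import subprocess


def run_binary_as_script(binary: str) -> subprocess.CompletedProcess:
    """Run `bash <binary>`, which asks bash to read a binary as a shell script."""
    bash = shutil.which("bash")
    if bash is None:
        raise FileNotFoundError("bash not found on PATH")
    # LC_ALL=C keeps bash's error message untranslated.
    env = dict(os.environ, LC_ALL="C")
    return subprocess.run(
        [bash, binary], capture_output=True, text=True, env=env
    )
```

Calling `run_binary_as_script("/bin/bash")` and printing `result.stderr` and `result.returncode` should show the same "cannot execute binary file" text. That would be consistent with the entrypoint being passed to a shell as an argument instead of being executed directly.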

Is anyone else seeing this behavior? Any suggestions on how to resolve this would be greatly appreciated.

Thanks!

Update: the issue has since resolved on its own. I made no changes to the execution environment.

Hi Komei_Soda,

I apologize for the delayed response. I also want to provide some clarity on the ‘cannot execute binary file’ error you encountered in early March.

Your suspicion was correct: it was indeed a platform-level update. We introduced a change to improve GPU driver discovery that unintentionally interfered with how standard binaries were executed within the container runtime.

We rolled that specific change back on March 3rd, which aligns with when you saw the issue resolve itself. Since then, we have deployed a refined, permanent fix (as of March 24th) that is designed to provide those GPU improvements without affecting your standard execution environment.

Thank you for reporting this and for your patience while we stabilized the service. Please let us know if you encounter any further unexpected behavior.