Cloud Batch job passing unexpected arguments to runnables

I have a Cloud Batch job with two runnables:

  1. a bash script that installs pinned GPU drivers (the install_gpu_drivers = True option does not work for me because it installs CUDA 13 drivers)
  2. a Python script that uses dask to run a data science process on the GPUs

Since 24th February, which is coincidentally when batch-cos-stable-official-20260218-01-p00 was released, the job leaks arguments from the first (bash script) runnable into the second, which then fails with unrecognised arguments errors.

I can’t be the only one with this problem, yet there is no fix?

Actually, it isn’t leaking variables; the batch-agent is possibly passing additional variables to the second task. This is new behaviour as of 24th February. What’s going on?
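
For anyone else hitting the same unrecognised arguments failure, here is a minimal sketch of how the Python runnable could tolerate extra arguments instead of exiting. It assumes the script uses argparse; the flag names (--input, --workers) are hypothetical and not taken from the job above. It only masks the symptom and does not explain why the agent passes the extra arguments.

```python
import argparse
import sys


def parse_args(argv):
    """Parse the flags we know about and collect anything unexpected."""
    # allow_abbrev=False stops argparse from matching unknown flags by prefix.
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--input", required=True)          # hypothetical flag
    parser.add_argument("--workers", type=int, default=1)  # hypothetical flag

    # parse_known_args returns leftovers instead of raising
    # "unrecognized arguments" and exiting.
    known, unknown = parser.parse_known_args(argv)
    if unknown:
        # Log to stderr so the unexpected arguments show up in the task logs.
        print(f"ignoring unexpected arguments: {unknown}", file=sys.stderr)
    return known, unknown
```

Calling parse_args(sys.argv[1:]) at the start of the script would then log whatever extra arguments arrive, which may also help show exactly what is being passed to the second runnable.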

And today my jobs failed because the ENTRYPOINT in the Docker container isn’t being respected any more.

I’ve had to manually specify it in the batch job params.
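
Roughly, the workaround looks like the sketch below: a small Python helper that builds the job config with the entrypoint set explicitly on the container runnable. The field names (taskGroups, taskSpec, runnables, script.text, container.imageUri, container.entrypoint, container.commands) follow the Batch REST API Job resource as I understand it, so check them against the current reference. The image name, entrypoint path, script path and arguments are placeholders, not my real values.

```python
import json


def build_job(image_uri, entrypoint, commands, driver_script):
    """Return a Batch job config dict with an explicit container entrypoint."""
    return {
        "taskGroups": [
            {
                "taskSpec": {
                    "runnables": [
                        # Runnable 1: bash script that installs the pinned drivers.
                        {"script": {"text": driver_script}},
                        # Runnable 2: the container, with the entrypoint spelled
                        # out rather than relying on the image's ENTRYPOINT.
                        {
                            "container": {
                                "imageUri": image_uri,
                                "entrypoint": entrypoint,
                                "commands": commands,
                            }
                        },
                    ]
                }
            }
        ]
    }


# Placeholder values for illustration only.
job = build_job(
    image_uri="example-registry/example-image:latest",
    entrypoint="/app/entrypoint.sh",
    commands=["--input", "data.parquet"],
    driver_script="bash /path/to/install_pinned_drivers.sh",
)
print(json.dumps(job, indent=2))
```

The printed JSON is only the runnables part of a job; a real config also needs the allocation policy, GPU settings and so on.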

What is going on with this service? Is it stable?

Hi output4616,

First, please accept our apologies for the late response and the frustration these issues caused while you were waiting for a resolution.

We have been investigating the specific behaviors you reported, and I can now provide some clarity on what happened:

  • Argument “Leaking” (Feb 24th): You were correct—an internal update intended to improve GPU driver discovery on Cloud Batch (specifically for COS images) unintentionally altered how container commands and arguments were wrapped.

  • This problematic change was rolled back, and we have since deployed a permanent fix that resolves the GPU discovery issue without affecting standard container execution.

  • ENTRYPOINT Issues (March 19th): It seems that the new logic introduced a side effect causing the service to stop respecting container ENTRYPOINTs.

    Could you let us know if you are still seeing this issue today? If your jobs are still failing to respect the ENTRYPOINT or requiring manual overrides, we want to know so we can refine the logic and provide a proper fix that doesn’t require workarounds.

Thank you for your patience and for surfacing this.