Authentication Script Error in Google Batch When Using Multiple Container Registries

I use Google Batch as the backend for Cromwell, a widely used bioinformatics workflow engine maintained by the Broad Institute.

Since last week, my jobs have been failing at startup in Google Batch. Whenever a job's Runnables reference images from different registries, the generated authentication script fails because the registry list contains a newline character where a comma should be.

I’ve attached screenshots with the logs and the script (which I believe comes from the VM OS image), highlighting the exact point where the error occurs.

These are some of the log entries from the failed job:

version:"cloud-batch-agent_20250924.00_p00"
os_release:{key:"BUILD_ID" value:"18613.263.4"}
os_release:{key:"ID" value:"cos"}
os_release:{key:"NAME" value:"Container-Optimized OS"}
os_release:{key:"VERSION" value:"117"}
os_release:{key:"VERSION_ID" value:"117"}
image_version:"batch-cos-stable-20250929-00-p00"

Can anyone help me figure this out?


I was able to reproduce the issue previously reported with Google Cloud Batch, even without using Cromwell, by running the following command:

gcloud beta batch jobs submit job-mgqwh0ci1 --location us-central1 --config - <<EOD
{
  "name": "projects/project_id/locations/us-central1/jobs/job-mgqwh0ci1",
  "taskGroups": [
    {
      "taskCount": "1",
      "parallelism": "1",
      "taskSpec": {
        "computeResource": {"cpuMilli": "1000", "memoryMib": "512"},
        "runnables": [
          {"container": {"imageUri": "quay.io/aptible/hello-world:latest", "entrypoint": "", "volumes": []}},
          {"container": {"imageUri": "public.ecr.aws/amazonlinux/amazonlinux:latest", "entrypoint": "", "volumes": []}},
          {"container": {"imageUri": "gcr.io/google-containers/busybox", "entrypoint": "", "volumes": []}},
          {"container": {"imageUri": "us-docker.pkg.dev/cloudrun/container/hello-job", "entrypoint": "", "volumes": []}}
        ],
        "volumes": []
      }
    }
  ],
  "allocationPolicy": {
    "instances": [
      {"policy": {"provisioningModel": "STANDARD", "machineType": "e2-medium"}}
    ]
  },
  "logsPolicy": {"destination": "CLOUD_LOGGING"}
}
EOD

The internally generated setup script fails to configure Google Artifact Registry properly because it separates the entries of the Docker registry list with a newline instead of a comma. The REGISTRIES assignment therefore ends after the first entry, and the second entry lands on its own at line 37 of the automatically generated script, where it causes an error.

Excerpt from the faulty script:

  REGISTRIES=gcr.io
  us-docker.pkg.dev

It should instead be:

  REGISTRIES=gcr.io,us-docker.pkg.dev

This syntax error prevents the script from correctly registering the Artifact Registry (us-docker.pkg.dev) as a valid Docker credential helper target. As a result, pulling images from Artifact Registry fails with an authentication or configuration error.
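
To see why the line break matters, the following minimal sketch runs the same two-line assignment in a child shell. It is a stand-alone illustration and not part of the Batch script; the variable names broken_snippet and output are made up for this example, and the exact wording of the error depends on which shell provides sh.

```shell
#!/bin/sh
# Hypothetical reproduction of the broken assignment, outside of Batch.
# The snippet copies the two lines from the generated script and then
# prints what REGISTRIES actually contains.
broken_snippet='REGISTRIES=gcr.io
us-docker.pkg.dev
echo "REGISTRIES=$REGISTRIES"'

# Run the snippet in a child shell and capture stdout and stderr together.
output=$(sh -c "$broken_snippet" 2>&1)
printf '%s\n' "$output"
```

In this sketch REGISTRIES ends up holding only gcr.io, and the shell reports us-docker.pkg.dev as a command it cannot find.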

Reproduction steps:

  1. Run the provided gcloud beta batch jobs submit command.

  2. Observe that the Batch job fails during container initialization.

  3. Inspect the job logs and note the incorrect line break in the Docker credential helper configuration.

Expected behavior:
The REGISTRIES variable should include all registries separated by commas, e.g.:

REGISTRIES=gcr.io,us-docker.pkg.dev

Impact:
This bug prevents Google Cloud Batch from pulling images hosted in Google Artifact Registry when more than one registry is configured.

Environment:

  • Service: Google Cloud Batch (Beta)

  • Region: us-central1

  • Example job ID: job-mgqwh0ci1

  • Reproduced without Cromwell

Suggested fix:
Ensure that the script generating the REGISTRIES variable joins multiple registries using commas rather than newlines.
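
As a rough sketch of the joining logic I have in mind: the code that actually generates the script is internal to Batch and I have not seen it, so join_registries below is a hypothetical helper written in shell purely for illustration.

```shell
#!/bin/sh
# join_registries: print all arguments on one line, separated by commas.
# Hypothetical helper for illustration; not part of the Batch agent.
join_registries() {
  joined=""
  for registry in "$@"; do
    if [ -z "$joined" ]; then
      joined="$registry"
    else
      joined="$joined,$registry"
    fi
  done
  printf '%s\n' "$joined"
}

# The two Google registries from the reproduction job.
REGISTRIES=$(join_registries gcr.io us-docker.pkg.dev)
echo "REGISTRIES=$REGISTRIES"
```

With the value on a single line, the existing --registries=$REGISTRIES call in the script would receive both hosts.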

The whole script:

# This script helps to install docker credential helper for jobs using GCR or
# GAR images.
# It has been tested successfully with the following operating systems:
# - Debian GNU/Linux 10 (buster, amd64 built on 20220317, supports Shielded VM features)
# - Debian GNU/Linux 11 (bullseye, amd64 built on 20220822, supports Shielded VM features)
# - Debian GNU/Linux 11 (bullseye, arm64 built on 20230912, supports Shielded VM features)
# - CentOS 7 (x86_64 built on 20220406, supports Shielded VM features)
# - Rocky 8 (x86_64 built on 20240111, supports Shielded VM features)
# It may or may not work correctly with other operating systems and versions.

docker_credential_gcr_installed() {
  echo \"[BATCH Docker Credential GCR Installation]: Checking for existing docker-credential-gcr installation.\"
  docker_credential_gcr_version=$(docker-credential-gcr version 2> /dev/null || echo \"\")
  if [ ! -z \"$docker_credential_gcr_version\" ]; then
    echo \"[BATCH Docker Credential GCR Installation]: Found docker-credential-gcr with version $docker_credential_gcr_version.\"
    return 0
  fi
  echo \"[BATCH Docker Credential GCR Installation]: No docker-credential-gcr found, will install.\"
  return 1
}

checkDockerImagePullError() {
  result=$?
  if [ $result != 0 ]; then
    echo \"[Batch Action] Docker image pull error.\"
    return $result;
  fi
}

configure_docker_credential_helper() {
  OSID=\"$(. /etc/os-release && echo \"$ID\")\"
  OSVERSION=\"$(. /etc/os-release && echo \"$VERSION_ID\")\"
  if [ ! \"$OSID\" = \"debian\" ] && [ ! -f /etc/centos-release ] && [ ! -f /etc/rocky-release ] && ! grep -qi cos /etc/os-release; then
    echo '[Batch Docker Credential Helper Installation] Warning: this is not a tested operating system, the installation may fail.'
  fi
  REGISTRIES=gcr.io
  us-docker.pkg.dev
  if [ -f /etc/centos-release ]; then
    if [ ! -x /usr/bin/python3 ]; then
      # CentOS 7 does not have pre-installed python3.
      yum install -y python3
    fi
  fi

  # docker-credential-gcr is pre-installed in Batch custom images.
  if ! grep -qi cos /etc/os-release; then
    if ! docker_credential_gcr_installed; then
      MACHINE=\"$(uname -m)\"
      if [ \"$OSID\" = \"rocky\" ] && [ \"${OSVERSION%%.*}\" = \"8\" ]; then
        # Use python3.9 as Rocky Linux 8 pre-installed python3.6 which does not work with CLOUD SDK.
        echo '[Batch Docker Credential Helper Installation]: Install python3.9 for Rocky Linux 8.'
        dnf install -y python39
        CLOUDSDK_PYTHON=/usr/bin/python3.9 gsutil cp gs://batch-agent-prod-us/docker-credential-gcr-tool/docker-credential-gcr-$MACHINE.tar.gz docker-credential-gcr.tar.gz
      else
        CLOUDSDK_PYTHON=/usr/bin/python3 gsutil cp gs://batch-agent-prod-us/docker-credential-gcr-tool/docker-credential-gcr-$MACHINE.tar.gz docker-credential-gcr.tar.gz
      fi
      tar -xzf docker-credential-gcr.tar.gz
      chmod +x docker-credential-gcr
      cp docker-credential-gcr /usr/local/bin/
      if [ -x /usr/local/bin/docker-credential-gcr ]; then
        echo \"[BATCH Docker Credential Helper]: Docker Credential Helper installed.\"
      else
        echo \"[BATCH Docker Credential Helper]: Docker Credential Helper installation failed.\"
        return 1  # Indicate failure
      fi
    fi

    if [ ! -z $REGISTRIES ]; then
      PATH=$PATH:/usr/local/bin docker-credential-gcr configure-docker --registries=$REGISTRIES
    fi
    PATH=$PATH:/usr/local/bin docker pull gcr.io/google-containers/busybox
    if ! checkDockerImagePullError; then return 1; fi
    PATH=$PATH:/usr/local/bin docker pull public.ecr.aws/amazonlinux/amazonlinux:latest
    if ! checkDockerImagePullError; then return 1; fi
    PATH=$PATH:/usr/local/bin docker pull quay.io/aptible/hello-world:latest
    if ! checkDockerImagePullError; then return 1; fi
    PATH=$PATH:/usr/local/bin docker pull us-docker.pkg.dev/cloudrun/container/hello-job
    if ! checkDockerImagePullError; then return 1; fi
  else
    if [ ! -z $REGISTRIES ]; then
      sudo HOME=/var/lib/google docker-credential-gcr configure-docker --registries=$REGISTRIES
    fi
    sudo HOME=/var/lib/google docker pull gcr.io/google-containers/busybox
    if ! checkDockerImagePullError; then return 1; fi
    sudo HOME=/var/lib/google docker pull public.ecr.aws/amazonlinux/amazonlinux:latest
    if ! checkDockerImagePullError; then return 1; fi
    sudo HOME=/var/lib/google docker pull quay.io/aptible/hello-world:latest
    if ! checkDockerImagePullError; then return 1; fi
    sudo HOME=/var/lib/google docker pull us-docker.pkg.dev/cloudrun/container/hello-job
    if ! checkDockerImagePullError; then return 1; fi
  fi
}

# Try a given function with n times including retries with exponential back off.
# function signature: retry retryTimes description functionName
retry() {
  local tries=$1
  local count=0
  local description=$2
  local wait=1
  until \"$3\"; do
    exit=$?
    # If failed, wait 3 ** count seconds until next retry.
    wait=$(($wait*3))
    count=$(($count + 1))
    if [ $count -lt $tries ]; then
      echo \"[Batch Action] $description exited $exit (tried $count/$tries), retrying in $wait seconds.\"
      sleep $wait
    else
      echo \"[Batch Action] $description exited $exit (tried $count/$tries), no more retries left.\"
      return $exit
    fi
  done
  echo \"[Batch Action] $description succeeded.\"
  return 0
}

retry 4 \"Docker credential helper\" configure_docker_credential_helper
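
For anyone reading the logs of a failed job: the retry call at the end of the script runs the configuration function up to four times. The sketch below only mirrors the arithmetic of retry() to print the resulting wait schedule when every attempt fails; backoff_schedule is a name I made up, and nothing is actually executed or slept on.

```shell
#!/bin/sh
# Print the sleep schedule that the retry() loop above would follow if the
# wrapped function failed on every attempt. Illustration only.
backoff_schedule() {
  tries=$1
  count=0
  wait=1
  while :; do
    # Same update order as retry(): triple the wait, then count the attempt.
    wait=$((wait * 3))
    count=$((count + 1))
    if [ "$count" -lt "$tries" ]; then
      echo "attempt $count failed, sleep ${wait}s"
    else
      echo "attempt $count failed, giving up"
      return 0
    fi
  done
}

backoff_schedule 4
```

With four tries, the waits between attempts are 3, 9 and 27 seconds before the script gives up.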

