Authentication Script Error in Google Batch When Using Multiple Container Registries

I use Google Batch as the backend for a widely used bioinformatics workflow engine — Cromwell, maintained by the Broad Institute.

Starting last week, I began experiencing issues with job startup in Google Batch. Whenever there are Runnables that reference images from different registries, the authentication script fails because it inserts a newline character where a comma should be.

I’ve attached screenshots with the logs and the script — which I believe comes from the VM OS image — highlighting the exact point where the error occurs.

These are some of the log entries from the failed job:

version:"cloud-batch-agent_20250924.00_p00"
os_release:{key:"BUILD_ID" value:"18613.263.4"}
os_release:{key:"ID" value:"cos"}
os_release:{key:"NAME" value:"Container-Optimized OS"}
os_release:{key:"VERSION" value:"117"}
os_release:{key:"VERSION_ID" value:"117"}
image_version:"batch-cos-stable-20250929-00-p00"

Can anyone help me figure this out?

2 Likes

I was able to reproduce the issue previously reported with Google Cloud Batch, even without using Cromwell, by running the following command:

gcloud beta batch jobs submit job-mgqwh0ci1 --location us-central1 --config - <<EOD
{"name":"projects/project_id/locations/us-central1/jobs/job-mgqwh0ci1","taskGroups":[{"taskCount":"1","parallelism":"1","taskSpec":{"computeResource":{"cpuMilli":"1000","memoryMib":"512"},"runnables":[{"container":{"imageUri":"quay.io/aptible/hello-world:latest","entrypoint":"","volumes":[]}}, {"container":{"imageUri":"public.ecr.aws/amazonlinux/amazonlinux:latest","entrypoint":"","volumes":[]}}, {"container":{"imageUri":"gcr.io/google-containers/busybox","entrypoint":"","volumes":[]}}, {"container":{"imageUri":"us-docker.pkg.dev/cloudrun/container/hello-job","entrypoint":"","volumes":[]}}],"volumes":[]}}],"allocationPolicy":{"instances":[{"policy":{"provisioningModel":"STANDARD","machineType":"e2-medium"}}]},"logsPolicy":{"destination":"CLOUD_LOGGING"}}
EOD

The generated internal setup script fails to configure the Google Artifact Registry properly because it inserts a newline instead of a comma in the Docker registry list section. This causes a syntax error at line 37 of the automatically generated script.

Excerpt from the faulty script:

  REGISTRIES=gcr.io
  us-docker.pkg.dev

It should instead be:

  REGISTRIES=gcr.io,us-docker.pkg.dev

This syntax error prevents the script from correctly registering the Artifact Registry (us-docker.pkg.dev) as a valid Docker credential helper target. As a result, pulling images from Artifact Registry fails with an authentication or configuration error.

Reproduction steps:

  1. Run the provided gcloud beta batch jobs submit command.

  2. Observe that the Batch job fails during container initialization.

  3. Inspect the job logs — note the incorrect line break in the Docker credential helper configuration.

Expected behavior:
The REGISTRIES variable should include all registries separated by commas, e.g.:

REGISTRIES=gcr.io,us-docker.pkg.dev

Impact:
This bug prevents Google Cloud Batch from pulling images hosted in Google Artifact Registry when more than one registry is configured.

Environment:

  • Service: Google Cloud Batch (Beta)

  • Region: us-central1

  • Example job ID: job-mgqwh0ci1

  • Reproduced without Cromwell

Suggested fix:
Ensure that the script generating the REGISTRIES variable joins multiple registries using commas rather than newlines.

The whole script:

# This script helps to install docker credential helper for jobs using GCR or
# GAR images.
# It has been tested successfully with the following operating systems:
# - Debian GNU/Linux 10 (buster, amd64 built on 20220317, supports Shielded VM features)
# - Debian GNU/Linux 11 (bullseye, amd64 built on 20220822, supports Shielded VM features)
# - Debian GNU/Linux 11 (bullseye, arm64 built on 20230912, supports Shielded VM features)
# - CentOS 7 (x86_64 built on 20220406, supports Shielded VM features)
# - Rocky 8 (x86_64 built on 20240111, supports Shielded VM features)
# It may or may not work correctly with other operating systems and versions.

docker_credential_gcr_installed() {
  echo \"[BATCH Docker Credential GCR Installation]: Checking for existing docker-credential-gcr installation.\"
  docker_credential_gcr_version=$(docker-credential-gcr version 2> /dev/null || echo \"\")
  if [ ! -z \"$docker_credential_gcr_version\" ]; then
    echo \"[BATCH Docker Credential GCR Installation]: Found docker-credential-gcr with version $docker_credential_gcr_version.\"
    return 0
  fi
  echo \"[BATCH Docker Credential GCR Installation]: No docker-credential-gcr found, will install.\"
  return 1
}

checkDockerImagePullError() {
  result=$?
  if [ $result != 0 ]; then
    echo \"[Batch Action] Docker image pull error.\"
    return $result;
  fi
}

configure_docker_credential_helper() {
  OSID=\"$(. /etc/os-release && echo \"$ID\")\"
  OSVERSION=\"$(. /etc/os-release && echo \"$VERSION_ID\")\"
  if [ ! \"$OSID\" = \"debian\" ] && [ ! -f /etc/centos-release ] && [ ! -f /etc/rocky-release ] && ! grep -qi cos /etc/os-release; then
    echo '[Batch Docker Credential Helper Installation] Warning: this is not a tested operating system, the installation may fail.'
  fi
  REGISTRIES=gcr.io
  us-docker.pkg.dev
  if [ -f /etc/centos-release ]; then
    if [ ! -x /usr/bin/python3 ]; then
      # CentOS 7 does not have pre-installed python3.
      yum install -y python3
    fi
  fi

  # docker-credential-gcr is pre-installed in Batch custom images.
  if ! grep -qi cos /etc/os-release; then
    if ! docker_credential_gcr_installed; then
      MACHINE=\"$(uname -m)\"
      if [ \"$OSID\" = \"rocky\" ] && [ \"${OSVERSION%%.*}\" = \"8\" ]; then
        # Use python3.9 as Rocky Linux 8 pre-installed python3.6 which does not work with CLOUD SDK.
        echo '[Batch Docker Credential Helper Installation]: Install python3.9 for Rocky Linux 8.'
        dnf install -y python39
        CLOUDSDK_PYTHON=/usr/bin/python3.9 gsutil cp gs://batch-agent-prod-us/docker-credential-gcr-tool/docker-credential-gcr-$MACHINE.tar.gz docker-credential-gcr.tar.gz
      else
        CLOUDSDK_PYTHON=/usr/bin/python3 gsutil cp gs://batch-agent-prod-us/docker-credential-gcr-tool/docker-credential-gcr-$MACHINE.tar.gz docker-credential-gcr.tar.gz
      fi
      tar -xzf docker-credential-gcr.tar.gz
      chmod +x docker-credential-gcr
      cp docker-credential-gcr /usr/local/bin/
      if [ -x /usr/local/bin/docker-credential-gcr ]; then
        echo \"[BATCH Docker Credential Helper]: Docker Credential Helper installed.\"
      else
        echo \"[BATCH Docker Credential Helper]: Docker Credential Helper installation failed.\"
        return 1  # Indicate failure
      fi
    fi

    if [ ! -z $REGISTRIES ]; then
      PATH=$PATH:/usr/local/bin docker-credential-gcr configure-docker --registries=$REGISTRIES
    fi
    PATH=$PATH:/usr/local/bin docker pull gcr.io/google-containers/busybox
    if ! checkDockerImagePullError; then return 1; fi
    PATH=$PATH:/usr/local/bin docker pull public.ecr.aws/amazonlinux/amazonlinux:latest
    if ! checkDockerImagePullError; then return 1; fi
    PATH=$PATH:/usr/local/bin docker pull quay.io/aptible/hello-world:latest
    if ! checkDockerImagePullError; then return 1; fi
    PATH=$PATH:/usr/local/bin docker pull us-docker.pkg.dev/cloudrun/container/hello-job
    if ! checkDockerImagePullError; then return 1; fi
  else
    if [ ! -z $REGISTRIES ]; then
      sudo HOME=/var/lib/google docker-credential-gcr configure-docker --registries=$REGISTRIES
    fi
    sudo HOME=/var/lib/google docker pull gcr.io/google-containers/busybox
    if ! checkDockerImagePullError; then return 1; fi
    sudo HOME=/var/lib/google docker pull public.ecr.aws/amazonlinux/amazonlinux:latest
    if ! checkDockerImagePullError; then return 1; fi
    sudo HOME=/var/lib/google docker pull quay.io/aptible/hello-world:latest
    if ! checkDockerImagePullError; then return 1; fi
    sudo HOME=/var/lib/google docker pull us-docker.pkg.dev/cloudrun/container/hello-job
    if ! checkDockerImagePullError; then return 1; fi
  fi
}

# Try a given function with n times including retries with exponential back off.
# function signature: retry retryTimes description functionName
retry() {
  local tries=$1
  local count=0
  local description=$2
  local wait=1
  until \"$3\"; do
    exit=$?
    # If failed, wait 3 ** count seconds until next retry.
    wait=$(($wait*3))
    count=$(($count + 1))
    if [ $count -lt $tries ]; then
      echo \"[Batch Action] $description exited $exit (tried $count/$tries), retrying in $wait seconds.\"
      sleep $wait
    else
      echo \"[Batch Action] $description exited $exit (tried $count/$tries), no more retries left.\"
      return $exit
    fi
  done
  echo \"[Batch Action] $description succeeded.\"
  return 0
}

retry 4 \"Docker credential helper\" configure_docker_credential_helper


1 Like

Hi. Thank you @Lucas_Taniguti and @Dino_Aguilar for reporting this and sorry for the very late reply. I can confirm that we had this issue, but it should already be fixed for some time. Feel free to give it a try and let us know if this is still happening for you.

1 Like

Thank you for your response! I’m glad to know the issue has already been fixed.

I’d also like to ask what the best way is to report new problems or track potential instabilities when they happen. During the last occurrence, I checked status.cloud.google.com but nothing was reported there. Is there another channel or recommended procedure for these situations?

Thanks again for your support!

1 Like

Hi @Lucas_Taniguti. We typically only use status.cloud.google.com to report broad scale outages with a particular service. To report issues, you can submit a ticket to Google Cloud support, who will steer the issue to the proper team that can provide updates.

1 Like