ERROR 2024-05-11T17:18:58.802714628Z Unable to find image 'us-east1-docker.pkg.dev/PROJECT/DOCKER_NAME:latest' locally
ERROR 2024-05-11T17:18:58.840505637Z docker: Error response from daemon: Head "https://us-east1-docker.pkg.dev/v2/PROJECT/DOCKER_NAME/manifests/latest": denied: Unauthenticated request. Unauthenticated requests do not have permission "artifactregistry.repositories.downloadArtifacts" on resource "projects/PROJECT/locations/us-east1/repositories/ARTIFACTR_REGISTRY_NAME" (or it may not exist).
ERROR 2024-05-11T17:18:58.840538631Z See 'docker run --help'.
Hi @gradientopt, from the log you provided, the Docker image URL is us-east1-docker.pkg.dev/PROJECT/DOCKER_NAME:latest, which does not seem to contain the correct PROJECT and DOCKER_NAME values. Would you mind double-checking the image URL you passed in the Batch job request? Alternatively, could you share one of your failed job request JSON files or a job UID so I can help check? Thanks!
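To double-check the URL shape, here is a minimal Python sketch. It assumes the documented Artifact Registry layout LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE[:TAG]. The function parse_ar_image and the example values (my-proj, my-repo, my-image) are hypothetical. It only checks the string format, not whether the repository exists or whether the caller has permission to pull from it.

```python
import re

# Assumed Artifact Registry Docker image format:
#   LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE[:TAG]
# IMAGE may contain "/" (nested names). Digest references (@sha256:...)
# are not handled in this sketch.
_AR_IMAGE = re.compile(
    r"^(?P<location>[a-z0-9-]+)-docker\.pkg\.dev/"
    r"(?P<project>[^/]+)/"
    r"(?P<repository>[^/]+)/"
    r"(?P<image>[^:@]+?)"
    r"(?::(?P<tag>\w[\w.-]{0,127}))?$"
)


def parse_ar_image(uri: str):
    """Return the URI's components as a dict, or None if the shape does not match."""
    match = _AR_IMAGE.match(uri)
    return match.groupdict() if match else None


print(parse_ar_image("us-east1-docker.pkg.dev/my-proj/my-repo/my-image:latest"))
print(parse_ar_image("us-east1-docker.pkg.dev/PROJECT/DOCKER_NAME:latest"))
```

The URL from the log returns None here because it has only two path segments (PROJECT and DOCKER_NAME) after the host, so there is no separate image name after the repository.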
If your tasks are partially failing in both cases, on the network with external IP addresses and on the one without, I suggest also adding a retry on exit code 125 to unblock yourself for now. We are investigating the flakiness further.
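`docker run` exits with code 125 when the error comes from the Docker daemon itself, which matches the "Error response from daemon" line in the log. Below is a minimal sketch of adding the retry to a Batch job JSON, assuming the taskSpec fields maxRetryCount and lifecyclePolicies (a RETRY_TASK action conditioned on exitCodes). The helper with_retry_on_exit_code, the retry count of 3, and the image URI are hypothetical placeholders.

```python
import copy
import json

# docker run exits with 125 when the failure is in the Docker daemon
# (e.g. the "Error response from daemon" image pull error in the log).
DOCKER_DAEMON_ERROR = 125


def with_retry_on_exit_code(task_spec: dict, exit_code: int = DOCKER_DAEMON_ERROR,
                            max_retries: int = 3) -> dict:
    """Return a copy of a Batch taskSpec that retries tasks exiting with exit_code."""
    if not 0 <= max_retries <= 10:
        raise ValueError("Batch accepts maxRetryCount values from 0 to 10")
    spec = copy.deepcopy(task_spec)
    spec["maxRetryCount"] = max_retries
    # Keep any existing policies and add one that retries on the given exit code.
    spec.setdefault("lifecyclePolicies", []).append(
        {"action": "RETRY_TASK", "actionCondition": {"exitCodes": [exit_code]}}
    )
    return spec


# Hypothetical job: the image URI is a placeholder, not the reporter's image.
job = {
    "taskGroups": [
        {
            "taskSpec": with_retry_on_exit_code(
                {"runnables": [{"container": {
                    "imageUri": "us-east1-docker.pkg.dev/my-proj/my-repo/my-image:latest"
                }}]}
            )
        }
    ]
}

print(json.dumps(job, indent=2))
```

The printed JSON could be saved as job.json and submitted with, for example, `gcloud batch jobs submit JOB_NAME --location=us-east1 --config=job.json`.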
If every job/task fails on the network without external IP addresses, please check your private network settings. VMs without external IP addresses can still reach Artifact Registry as long as Private Google Access is set up properly on their subnet.
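A minimal Python sketch for this case, assuming the Batch allocationPolicy.network block (networkInterfaces with noExternalIpAddress) and the Compute Engine subnetwork field privateIpGoogleAccess. The helper names no_external_ip_network and private_google_access_enabled are hypothetical.

```python
import json


def no_external_ip_network(network: str, subnetwork: str) -> dict:
    """Build a Batch allocationPolicy.network block for VMs without external IPs."""
    return {
        "networkInterfaces": [
            {"network": network, "subnetwork": subnetwork, "noExternalIpAddress": True}
        ]
    }


def private_google_access_enabled(subnet_json: str) -> bool:
    """Check a subnetwork description for Private Google Access.

    Expects the JSON printed by:
      gcloud compute networks subnets describe SUBNET --region=REGION --format=json
    The setting is the boolean field privateIpGoogleAccess; a missing field
    is treated as disabled.
    """
    subnet = json.loads(subnet_json)
    return bool(subnet.get("privateIpGoogleAccess", False))
```

If the check returns False for the subnet your Batch VMs use, `gcloud compute networks subnets update SUBNET --region=REGION --enable-private-ip-google-access` turns the setting on.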
Also, could you try reducing the number of VMs to see whether that reduces the flakiness? As I understand it, each of your jobs uses 5000/4 = 1250 VMs, so 10 jobs running in parallel means 12,500 VMs during the same period. Batch pulls the Docker image on each VM, which puts a heavy request load on Artifact Registry at that time.
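The VM estimate in the previous paragraph, as a small Python sketch. It assumes 5000 tasks per job packed 4 tasks per VM (one reading of "5000/4") and one image pull per VM. concurrent_vms is a hypothetical helper.

```python
def concurrent_vms(tasks_per_job: int, tasks_per_vm: int, parallel_jobs: int) -> int:
    """Estimate VMs (and therefore image pulls) running at the same time."""
    vms_per_job = -(-tasks_per_job // tasks_per_vm)  # ceiling division
    return vms_per_job * parallel_jobs


print(concurrent_vms(5000, 4, 10))  # 12500 VMs, each pulling the image
print(concurrent_vms(5000, 4, 2))   # 2500 if only 2 jobs run at once
```

Running fewer jobs at once lowers the peak pull rate in proportion, which is one way to test whether load on Artifact Registry is related to the flakiness.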
Before I give a more accurate answer to your follow-up questions, would you mind sharing the Batch JSON logs from one of your failed jobs (or tasks), so that I can better understand what the specific root cause might be in your case?
Feel free to send me more info through private message (or chat with me there) if you are more comfortable with that, thanks!