I have around 6-7 Cloud Run jobs, all configured with the exact same settings/resources, in the us-east1 region.
Everything had been working just fine until today; now some of the jobs seem to be stuck in the Pending status and never produce any logs for the execution. This is the most information I have been able to get, and it comes from the execution's YAML:
status:
  observedGeneration: 1
  conditions:
  - type: Completed
    status: 'False'
    message: Resource readiness deadline exceeded.
    lastTransitionTime: '2024-04-19T22:28:30.330446Z'
  - type: Started
    status: Unknown
    message: Deadline exceeded. Container image import still in progress.
    lastTransitionTime: '2024-04-19T22:28:30.330446Z'
  - type: Retry
    status: 'True'
    reason: WaitingForOperation
    message: System will retry after 1:00:00 from lastTransitionTime for polling interval
    lastTransitionTime: '2024-04-19T22:33:31.183686Z'
    severity: Info
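For anyone who wants to check the same thing, this is roughly how I pulled that status with the gcloud CLI; the job and execution names below are placeholders for my own:

# List recent executions for the stuck job
gcloud run jobs executions list --job my-job --region us-east1

# Dump the full status block (the conditions above) for one execution as YAML
gcloud run jobs executions describe my-job-abc12 --region us-east1 --format yaml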
I have already tried the following:
Made sure the service account is correct and has the required roles for the service.
Deleted the Docker image from Artifact Registry and uploaded a new one.
Deleted the “old” job and created a new one with the new image.
I haven’t had any success yet with the jobs that are stuck in Pending; however, some of the other jobs (with the exact same configuration) are working just fine.
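For reference, these are roughly the commands I used for the last two steps; project, job, and image names are placeholders for my own:

# Delete the stuck job and recreate it from the freshly pushed image
gcloud run jobs delete my-job --region us-east1 --quiet
gcloud run jobs create my-job \
  --image us-east1-docker.pkg.dev/my-project/my-repo/my-image:latest \
  --region us-east1 \
  --service-account my-job-sa@my-project.iam.gserviceaccount.com

# Start a new execution and watch whether it ever leaves Pending
gcloud run jobs execute my-job --region us-east1 --wait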
Any idea why this problem is happening or how to fix it?
I am having a similar problem with Batch. A job submission script I’ve used in the past is not working anymore for some reason. It just says “VM provisioning model: Pending,” fails to spin up any VMs, and keeps the status “Queued.”
Same here. My workflow had been executing a Cloud Run job fine since January. It suddenly stopped working yesterday.
After about 10 minutes of execution, my workflow returns with the following messages:
“Resource readiness deadline exceeded.”
“Deadline exceeded. Container image import still in progress.”
“System will retry after 0:01:00 from lastTransitionTime for polling interval”
No modifications whatsoever have been made. The Cloud Run job is not even being executed. Although nothing has changed since last January, I double-checked the image used by Cloud Run, but it is still the same. I tried adding more memory to the job’s execution environment (you never know…), but nothing…
That is frustrating, as I don’t believe there is anything else I can do at this stage.
It doesn’t seem like there’s any server issue on GCP either.
My Cloud Run jobs were created to run in the us-central1 region. I duplicated the jobs with the exact same settings, except for the region, which I set to asia-northeast1. After I made the change, I updated my workflow to launch the new job in the new region (otherwise the workflow won’t find the job, of course), and voilà! My workflow and Cloud Run job are working like they used to. The execution still hasn’t completed (it is a roughly one-hour execution with several runs), but as soon as the workflow executes the job, the status is Running instead of Pending.
Not sure if it is a Cloud Run regional issue, but if that is the case it isn’t mentioned anywhere… When I have more time I’ll investigate this further. I don’t want to face a similar issue without knowing the exact reason why I suddenly need to change the region…
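In case it helps someone, this is more or less how I duplicated the job into the new region; names are placeholders, and the only other change was updating the region the workflow passes when it launches the job:

# Recreate the job with the same settings in the new region
gcloud run jobs create my-job \
  --image us-central1-docker.pkg.dev/my-project/my-repo/my-image:latest \
  --region asia-northeast1

# Run it once manually to confirm it goes to Running instead of staying Pending
gcloud run jobs execute my-job --region asia-northeast1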
I discovered that my image is stored in Artifact Registry in asia-east1, while my Cloud Run job is located in us-central1. When I moved the image to us-central1, everything worked well.
But here is the question: if I use the previous image stored in asia-east1 (which was pushed on 2024-03-31), Cloud Run does not show any error. Only newly pushed images seem to require the same region as Cloud Run.
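In case anyone wants to try the same workaround, something like the following should do it; the project, repository, and image names are placeholders, and I'm assuming Docker is already authenticated against both Artifact Registry hosts:

# Create a repository in the same region as the Cloud Run job (skip if it already exists)
gcloud artifacts repositories create my-repo --repository-format=docker --location=us-central1

# Pull the image from the asia-east1 repository, retag it for us-central1, and push it
docker pull asia-east1-docker.pkg.dev/my-project/my-repo/my-image:latest
docker tag asia-east1-docker.pkg.dev/my-project/my-repo/my-image:latest us-central1-docker.pkg.dev/my-project/my-repo/my-image:latest
docker push us-central1-docker.pkg.dev/my-project/my-repo/my-image:latest

# Point the job at the copy in its own region
gcloud run jobs update my-job \
  --image us-central1-docker.pkg.dev/my-project/my-repo/my-image:latest \
  --region us-central1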
Interesting. I checked on my side as well, and my Artifact Registry image is also in the asia-northeast1 region. It had been working fine so far with the image and Cloud Run job being in different regions, but I guess not anymore.
BTW, my image was last uploaded back in January, and it has not been working since Monday. To be more specific, the overall process runs only every Monday and Wednesday; if I had tried last Thursday, maybe it would not have worked either.
Anyway, happy to hear this workaround (which may not really be one?) is working for everyone. Hopefully we’ll get some clarification from Google on this.