Vertex AI Custom Jobs stuck in "Pending" and failing with Internal Error

Hi all,

We’ve been experiencing an issue with Vertex AI Custom Jobs (under the Training section). Jobs stay in the Pending state for an extended period and then fail with an Internal error.

This started happening yesterday and has continued today. We haven’t made any changes to our configuration, and previously the same setup worked fine.

Has anyone else run into this problem recently? Is there an ongoing incident with Vertex AI Training or a known workaround?

Thanks!