Hello,
I tried to run a function to tune a PaLM model, but the pipeline failed with the following error:
RuntimeError: Job failed with:
code: 9
message: "The DAG failed because some tasks failed. The failed tasks are: [large-language-model-tuner].; Job (project_id = ai-assisted-vma-v2, job_id = 4202643503638904832) is failed due to the above error.; Failed to handle the job: {project_number = 117401557332, job_id = 4202643503638904832}"
The node info for the failed task under Pipeline Run Analysis is:
com.google.cloud.ai.platform.common.errors.AiPlatformException: code=RESOURCE_EXHAUSTED, message=The following quota metrics exceed quota limits: aiplatform.googleapis.com/restricted_image_training_tpu_v3_pod, cause=null; Failed to create custom job.Project number: 117401557332, Job id: 4202643503638904832, Task id: 8599389292487245824, Task name: large-language-model-tuner, Task state: DRIVER_SUCCEEDED, Execution name: projects/117401557332/locations/europe-west4/metadataStores/default/executions/16871158654916253004; Failed to create external task or refresh its state. Task:Project number: 117401557332, Job id: 4202643503638904832, Task id: 8599389292487245824, Task name: large-language-model-tuner, Task state: DRIVER_SUCCEEDED, Execution name: projects/117401557332/locations/europe-west4/metadataStores/default/executions/16871158654916253004; Failed to handle the pipeline task. Task: Project number: 117401557332, Job id: 4202643503638904832, Task id: 8599389292487245824, Task name: large-language-model-tuner, Task state: DRIVER_SUCCEEDED, Execution name: projects/117401557332/locations/europe-west4/metadataStores/default/executions/16871158654916253004
I can't tell what the underlying issue is. How can I resolve it?
Thanks in advance!