I created a basic pipeline and ran it from both managed notebooks and instances in Workbench, but even this basic pipeline fails with the following error:
The DAG failed because some tasks failed. The failed tasks are: [concat].; Job (project_id = practice-training, job_id = 7480518563480993792) is failed due to the above error.; Failed to handle the job: {project_number = 385236764312, job_id = 7480518563480993792}.
The error on the node says -
com.google.cloud.ai.platform.common.errors.AiPlatformException: code=RESOURCE_EXHAUSTED, message=The following quota metrics exceed quota limits: aiplatform.googleapis.com/custom_model_training_cpus, cause=null; Failed to create custom job for the task. Task: Project number: 385236764312, Job id: 7480518563480993792, Task id: 6516721854944641024, Task name: concat, Task state: DRIVER_SUCCEEDED, Execution name: projects/385236764312/locations/asia-south1/metadataStores/default/executions/11616092682586157127; Failed to create external task or refresh its state. Task:Project number: 385236764312, Job id: 7480518563480993792, Task id: 6516721854944641024, Task name: concat, Task state: DRIVER_SUCCEEDED, Execution name: projects/385236764312/locations/asia-south1/metadataStores/default/executions/11616092682586157127; Failed to handle the pipeline task. Task: Project number: 385236764312, Job id: 7480518563480993792, Task id: 6516721854944641024, Task name: concat, Task state: DRIVER_SUCCEEDED, Execution name: projects/385236764312/locations/asia-south1/metadataStores/default/executions/11616092682586157127
Yet it is just a two-line component performing a simple string concatenation.
Please help. I am not working in any organisation through which I could reach Google Support, nor can I afford it myself.
My code:
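For anyone skimming the long stack trace, the part that matters is the quota metric named after "quota limits:". A minimal sketch (plain Python, no Google Cloud calls; the variable names are illustrative) that pulls it out of the message text:

```python
import re

# A trimmed copy of the error message from the failed node.
error_message = (
    "code=RESOURCE_EXHAUSTED, message=The following quota metrics exceed "
    "quota limits: aiplatform.googleapis.com/custom_model_training_cpus, "
    "cause=null"
)

# The quota metric is the token after "quota limits:" and before the comma.
match = re.search(r"quota limits:\s*([\w./]+)", error_message)
metric = match.group(1) if match else None
print(metric)  # aiplatform.googleapis.com/custom_model_training_cpus
```

The extracted metric is the quota to look up under IAM & Admin > Quotas for the region the pipeline runs in (asia-south1 here).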
!pip install google-cloud-aiplatform==1.37.0 --upgrade
!pip install google-cloud-pipeline-components==2.6.0 --upgrade
!pip install kfp==2.4.0 --upgrade
import kfp
from typing import NamedTuple
from kfp.dsl import pipeline
from kfp.dsl import component
from kfp import compiler
from google.cloud import aiplatform
PROJECT_ID = "practice-training"
PIPELINE_ROOT = "gs://vertexai-test-bucket-1234"
aiplatform.init(project=PROJECT_ID, location="asia-south1")
# Create components
@component(base_image="python:3.12")
def concat(a: str, b: str) -> str:
    # logging.info(f"Concatenating '{a}' and '{b}' resulted in: '{a+b}'")
    return a + b

compiler.Compiler().compile(concat, "concat.yaml")
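Since the component body is plain Python, its logic can be sanity-checked locally before Vertex AI is involved at all. A hedged sketch, using a local stand-in function (the name `concat_local` is illustrative, not part of the KFP API):

```python
# Local stand-in mirroring the body of the concat component.
def concat_local(a: str, b: str) -> str:
    return a + b

print(concat_local("stres", "sed"))  # stressed
```

If this works locally, the failure is in the execution environment (here, the quota), not in the component logic.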
@component(base_image="python:3.12")
# An earlier version returned a dict:
# def reverse(a: str) -> dict:
#     return {"before": a, "after": a[::-1]}
def reverse(a: str) -> NamedTuple("outputs", [("before", str), ("after", str)]):
    return a, a[::-1]
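The NamedTuple return annotation means the component exposes two named string outputs, `before` and `after`. A stdlib-only sketch of the same semantics (the names `Outputs` and `reverse_local` are illustrative):

```python
from typing import NamedTuple

# Mirrors the component's return annotation.
Outputs = NamedTuple("Outputs", [("before", str), ("after", str)])

def reverse_local(a: str) -> Outputs:
    return Outputs(before=a, after=a[::-1])

result = reverse_local("stressed")
print(result.before, result.after)  # stressed desserts
```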
# Create pipeline
@pipeline(
    name="basic-pipeline-2",
    pipeline_root=PIPELINE_ROOT,
    description="My First Pipeline",
)
def basic_pipeline(x: str = "stres", y: str = "sed"):  # two pipeline parameters
    concat_task = concat(a=x, b=y)  # pipeline parameters are the inputs of the first component
    reverse_task = reverse(a=concat_task.output)  # output of the first component is the input of the second
compiler.Compiler().compile(
    pipeline_func=basic_pipeline, package_path="basic_pipeline-2.json")
The pipeline specification is created as a JSON file. Next, build a pipeline job to run the pipeline, either through the API or by uploading the pipeline JSON file in the Vertex AI UI.
from google.cloud.aiplatform import pipeline_jobs

job = aiplatform.PipelineJob(
    display_name="basic-pipeline-2",
    template_path="basic_pipeline-2.json",
    parameter_values={"x": "stres", "y": "sed"},
    enable_caching=False,
)
job.run(sync=False)
Please help!