Hello,
I am facing challenges with resource allocation for a Kubeflow Pipelines component deployed on Google Cloud Platform's Vertex AI. Specifically, the CPU and memory requests specified in the pipeline definition are not being honored at runtime.
Pipeline Details
- SDK Version: kfp-2.4.0
In the pipeline definition, I request 12 CPUs for a component. Here is my pipeline definition file:
from kfp import compiler, dsl


@dsl.component
def test():
    print("test")


@dsl.pipeline(
    name="test",
    description="",
)
def pipeline() -> None:
    train_op = test().set_display_name("test").set_cpu_request("12000m")


compiler.Compiler().compile(
    pipeline_func=pipeline,
    package_path="./pipeline.yaml",
)
Compiled YAML file:
# PIPELINE DEFINITION
# Name: test
components:
  comp-test:
    executorLabel: exec-test
deploymentSpec:
  executors:
    exec-test:
      container:
        args:
        - --executor_input
        - '{{$}}'
        - --function_to_execute
        - test
        command:
        - sh
        - -c
        - "\nif ! [ -x \"$(command -v pip)\" ]; then\n python3 -m ensurepip ||\
          \ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
          \ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.4.0'\
          \ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"\
          $0\" \"$@\"\n"
        - sh
        - -ec
        - 'program_path=$(mktemp -d)

          printf "%s" "$0" > "$program_path/ephemeral_component.py"

          _KFP_RUNTIME=true python3 -m kfp.dsl.executor_main --component_module_path
          "$program_path/ephemeral_component.py" "$@"

          '
        - "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\
          \ *\n\ndef test():\n print(\"test\")\n\n"
        image: python:3.7
        resources:
          cpuRequest: 12.0
pipelineInfo:
  name: test
root:
  dag:
    tasks:
      test:
        cachingOptions:
          enableCache: true
        componentRef:
          name: comp-test
        taskInfo:
          name: test
schemaVersion: 2.1.0
sdkVersion: kfp-2.4.0
Despite this configuration, when the component is executed as a custom job on Vertex AI, it runs on an e2-highmem-2 instance (2 vCPUs, 16 GB of memory). This machine type clearly does not satisfy the requested 12 CPUs, which could be affecting performance.
I am trying to understand why the component is not allocated an instance that matches both the CPU and memory specifications. Could this be a result of how Vertex AI interprets resource requests in the pipeline definition, or might there be a misconfiguration on my end?
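To make my expectation concrete: I assumed Vertex AI would place the task on the smallest machine that satisfies the requests, along the lines of the toy sketch below. The e2 shapes are the published Compute Engine specs; the selection rule itself is my assumption, not Vertex AI's documented algorithm.

```python
# Toy sketch of "smallest machine satisfying the request" selection.
# Machine shapes (vCPUs, memory GB) are the published e2 specs; the
# selection rule is my assumption about what Vertex AI would do.
E2_MACHINES = {
    "e2-highmem-2": (2, 16),
    "e2-standard-4": (4, 16),
    "e2-highmem-4": (4, 32),
    "e2-highcpu-8": (8, 8),
    "e2-standard-8": (8, 32),
    "e2-highmem-8": (8, 64),
    "e2-highcpu-16": (16, 16),
    "e2-standard-16": (16, 64),
    "e2-highmem-16": (16, 128),
}


def pick_machine(cpu_request: float, memory_gb_request: float) -> str:
    """Return the smallest e2 machine meeting both requests."""
    candidates = [
        (cpus, mem, name)
        for name, (cpus, mem) in E2_MACHINES.items()
        if cpus >= cpu_request and mem >= memory_gb_request
    ]
    if not candidates:
        raise ValueError("no e2 machine satisfies the request")
    # Smallest by vCPU count, then by memory.
    return min(candidates)[2]


# A 12-CPU request should land on a 16-vCPU machine, not e2-highmem-2.
print(pick_machine(12, 0))  # e2-highcpu-16
```

Under that assumption, a 12-CPU request could never end up on e2-highmem-2, which is why the observed behavior puzzles me.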
Thank you for your support.
Best regards, Ryo Ueda