How to set up, initialize, and execute a Dataform pipeline using the Python client library

Hi all,

I’m trying to set up a Dataform pipeline through the Python client library (dataform_v1beta1).

Can you help me with the initialization of the workspace and the execution of the pipeline?

I’m able to create a repository and a workspace, but I’m not able to initialize the workspace through the Python library (I initialized it through the UI for now); I need to do it with the Python client library.
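For context, here is roughly how I’m creating the repository and workspace (project, location, and IDs are placeholders):

from google.cloud import dataform_v1beta1

client = dataform_v1beta1.DataformClient()

parent = "projects/your-project-id/locations/your-location"

# Create the repository.
repository = client.create_repository(
    request=dataform_v1beta1.CreateRepositoryRequest(
        parent=parent,
        repository_id="your-repository-id",
        repository=dataform_v1beta1.Repository(),
    )
)

# Create a workspace inside the repository.
workspace = client.create_workspace(
    request=dataform_v1beta1.CreateWorkspaceRequest(
        parent=repository.name,
        workspace_id="your-workspace-id",
        workspace=dataform_v1beta1.Workspace(),
    )
)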

I am able to write .sqlx files using the write_file method.
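For example, this is roughly how I write a .sqlx file (the path and contents are just examples):

from google.cloud import dataform_v1beta1

client = dataform_v1beta1.DataformClient()

workspace_name = (
    "projects/your-project-id/locations/your-location"
    "/repositories/your-repository-id/workspaces/your-workspace-id"
)

# Write a .sqlx file into the workspace (contents must be bytes).
client.write_file(
    request=dataform_v1beta1.WriteFileRequest(
        workspace=workspace_name,
        path="definitions/example.sqlx",
        contents=b"config { type: 'table' }\nSELECT 1 AS x",
    )
)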

But the problem is how to execute the pipeline using the Python client library.

https://cloud.google.com/python/docs/reference/dataform/latest

https://github.com/googleapis/python-dataform

Hi @umeshchandra ,

Welcome back to Google Cloud Community.

The dataform_v1beta1.WorkspacesClient.initialize_workspace method can be used to initialize a workspace using the Python client library for Dataform. Here is a sample of code that demonstrates initializing a workspace:

from google.cloud import dataform_v1beta1
from google.auth.credentials import AnonymousCredentials

# Create a Dataform client using AnonymousCredentials
client = dataform_v1beta1.DataformClient(credentials=AnonymousCredentials())

# Initialize a workspace
response = client.workspaces.initialize_workspace(
    parent="projects/{project_id}".format(project_id="your-project-id"),
    workspace_id="your-workspace-id",
    version="latest",
)

Replace "your-project-id" and "your-workspace-id" in the code above with your actual project ID and workspace ID, respectively.
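If initialize_workspace is not available in the version of the library you have installed, one possible workaround is to approximate what the UI's "Initialize workspace" action does: write the starter configuration files with write_file and then install the npm packages. This is only a rough sketch under that assumption; the file contents and the @dataform/core version are illustrative:

from google.cloud import dataform_v1beta1

client = dataform_v1beta1.DataformClient()

workspace = (
    "projects/your-project-id/locations/your-location"
    "/repositories/your-repository-id/workspaces/your-workspace-id"
)

# Write a minimal dataform.json, as the UI does on initialization.
client.write_file(
    request=dataform_v1beta1.WriteFileRequest(
        workspace=workspace,
        path="dataform.json",
        contents=b'{"defaultDatabase": "your-project-id", "defaultLocation": "your-location"}',
    )
)

# Write a package.json declaring @dataform/core (version is illustrative).
client.write_file(
    request=dataform_v1beta1.WriteFileRequest(
        workspace=workspace,
        path="package.json",
        contents=b'{"dependencies": {"@dataform/core": "2.3.2"}}',
    )
)

# Install the declared npm packages into the workspace.
client.install_npm_packages(
    request=dataform_v1beta1.InstallNpmPackagesRequest(workspace=workspace)
)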

You can use the dataform_v1beta1.JobsClient.create_job method to run a pipeline using the Python client library for Dataform. Here is a sample of code that illustrates how to run a pipeline:

from google.cloud import dataform_v1beta1
from google.auth.credentials import AnonymousCredentials

# Create a Dataform client using AnonymousCredentials
client = dataform_v1beta1.DataformClient(credentials=AnonymousCredentials())

# Create a job to execute the pipeline
response = client.jobs.create_job(
    parent="projects/{project_id}/locations/{location}/workspaces/{workspace_id}".format(
        project_id="your-project-id",
        location="your-location",
        workspace_id="your-workspace-id",
    ),
    job={
        "type": "PIPELINE",
        "request": {
            "pipeline_execution": {
                "pipeline_name": "your-pipeline-name",
            },
        },
    },
)

Replace "your-project-id", "your-location", "your-workspace-id", and "your-pipeline-name" in the code above with your actual project ID, location, workspace ID, and pipeline name.
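If create_job is not available in the version of the library you have installed, an alternative in the same client is to compile the workspace into a compilation result and then start a workflow invocation from it. A rough sketch under that assumption (resource names are placeholders):

from google.cloud import dataform_v1beta1

client = dataform_v1beta1.DataformClient()

repository = (
    "projects/your-project-id/locations/your-location"
    "/repositories/your-repository-id"
)

# Compile the current contents of the workspace.
compilation_result = client.create_compilation_result(
    request=dataform_v1beta1.CreateCompilationResultRequest(
        parent=repository,
        compilation_result=dataform_v1beta1.CompilationResult(
            workspace=repository + "/workspaces/your-workspace-id",
        ),
    )
)

# Execute the compiled pipeline as a workflow invocation.
invocation = client.create_workflow_invocation(
    request=dataform_v1beta1.CreateWorkflowInvocationRequest(
        parent=repository,
        workflow_invocation=dataform_v1beta1.WorkflowInvocation(
            compilation_result=compilation_result.name,
        ),
    )
)
print(invocation.name)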

Here is some documentation that might help you:
https://cloud.google.com/dataflow/docs/quickstarts/create-pipeline-python
https://cloud.google.com/python/docs/reference/lifesciences/latest/google.cloud.lifesciences_v2beta.types.Pipeline
https://cloud.google.com/dotnet/docs/reference/Google.Cloud.Dataform.V1Beta1/latest/Google.Cloud.Dataform.V1Beta1.Dataform

Hi @Aris_O ,

Thanks for the reply.

I went through your code, but the problem is that I’m using the Dataform library, not Dataflow.

The Dataform library does not have a client.jobs.create_job() method.

Can you try again with the Dataform library?

Hi @umeshchandra ,

You may be unable to use the client.jobs.create_job() method, as you noted, if you’re using Dataform rather than Dataflow. However, the @google-cloud/bigquery module for Node.js, which offers methods for launching BigQuery jobs programmatically, can help you arrive at a similar outcome.
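Since this thread is about Python, here is the same idea with the Python BigQuery client, google-cloud-bigquery; this is only a minimal sketch, and the project ID and query are placeholders:

from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")

# Submit a query job and wait for it to finish.
job = client.query("SELECT 1 AS x")
for row in job.result():
    print(row.x)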

Hope this helps!