I am migrating my Dataform pipelines from the web-based Dataform to BigQuery Dataform. In the web-based Dataform, the documentation for executing a pipeline through a Dataform API call was clear (see the attached screenshot). However, I cannot find the corresponding API service for BigQuery Dataform.
I have been looking at this documentation, but it is not as clear as the one for the web-based Dataform about how to make the API call. Can anyone please help?
While the Dataform API on Google Cloud provides methods to manage and invoke Dataform workflows, if you're looking to execute a Dataform pipeline directly in BigQuery, you might consider the following approach:
Extract the SQL generated by your Dataform pipeline.
Use the BigQuery API to execute this SQL.
To execute the SQL in BigQuery using the BigQuery API:
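As a rough sketch of that second step, assuming you already have the compiled SQL as a string and an OAuth access token (for example from gcloud auth print-access-token), the request to the BigQuery REST jobs.query method could be built like this. The helper names, project ID, and SQL are placeholders of mine, not something from the documentation.

```python
import json
import urllib.request

BIGQUERY_API = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project_id, sql, access_token):
    """Build (but do not send) a jobs.query request for one SQL statement."""
    url = f"{BIGQUERY_API}/projects/{project_id}/queries"
    # useLegacySql=False selects GoogleSQL, which is what Dataform generates.
    body = {"query": sql, "useLegacySql": False}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(project_id, sql, access_token):
    """Send the request and return the parsed JSON response."""
    request = build_query_request(project_id, sql, access_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling run_query("my-project", "SELECT 1", token) would send the statement; build_query_request alone lets you inspect the URL and body first.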
Hi, thanks @ms4446 for the response. I am sorry if my question was not clear. I would like to use the Dataform API on Google Cloud to invoke my Dataform workflows.
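To invoke a Dataform workflow in the new Dataform on Google Cloud, use the workflowInvocations resource in the Dataform API.
Specifically, use the create() method to create a new workflow invocation. Workflow invocations are created under the repository, so the endpoint URL is:
POST https://dataform.googleapis.com/v1beta1/projects/{project}/locations/{location}/repositories/{repository}/workflowInvocations
Replace {project}, {location}, and {repository} with your actual values. The workflow configuration is not part of the URL path; name it in the request body's workflowConfig field as projects/{project}/locations/{location}/repositories/{repository}/workflowConfigs/{workflowConfig} (or supply a compilationResult there instead).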
The request should include an authorization header with a valid access token:
Authorization: Bearer YOUR_ACCESS_TOKEN
Refer to the official documentation for the exact structure of the request body and additional information.
After sending the request to create the workflow invocation, Dataform will start executing the workflow. Monitor the status of the workflow invocation using the get() method on the workflowInvocations resource.
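A minimal Python sketch of the create-then-monitor flow, using only the standard library. It assumes, as far as I can tell from the v1beta1 REST reference, that create() is called on the repository and that the workflow configuration is named in the request body rather than in the URL path. The function names and the polling interval are my own placeholders.

```python
import json
import time
import urllib.request

DATAFORM_API = "https://dataform.googleapis.com/v1beta1"
# Assumption: these are the states after which an invocation no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def repository_path(project, location, repository):
    """Resource name of the repository, used as the parent of the invocation."""
    return f"projects/{project}/locations/{location}/repositories/{repository}"


def build_create_request(project, location, repository, workflow_config, token):
    """Build the POST that creates a workflow invocation from a workflow config."""
    parent = repository_path(project, location, repository)
    body = {"workflowConfig": f"{parent}/workflowConfigs/{workflow_config}"}
    return urllib.request.Request(
        f"{DATAFORM_API}/{parent}/workflowInvocations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send a prepared request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_invocation(invocation_name, token, poll_seconds=10):
    """Poll get() on the invocation until it reaches a terminal state."""
    url = f"{DATAFORM_API}/{invocation_name}"
    while True:
        request = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {token}"}
        )
        invocation = send(request)
        if invocation.get("state") in TERMINAL_STATES:
            return invocation
        time.sleep(poll_seconds)
```

Typical use would be invocation = send(build_create_request(...)) followed by wait_for_invocation(invocation["name"], token), since the create response carries the full resource name of the new invocation.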
{location} refers to the Google Cloud region where your Dataform repository is located. For example, europe-west2.
{repository} refers to the ID of your repository in Google Cloud, not the GitLab URL. You can find this in the Source Repositories page in the Google Cloud Console.
{workflowConfig} refers to the ID of your Dataform workflow configuration in Google Cloud. You can find or create workflow configurations in the Dataform section of the Google Cloud Console.
Also, does {project} denote the GCP project ID or the GCP project number?
Unfortunately I cannot find the {repository_id}. The Source Repositories page doesn't list my Dataform repository. I can see my Dataform repo in the Dataform page as below (I have deleted my project name, GitLab URL, and repo name), but the repo ID does not appear. Is the {repository_id} mandatory?
{project} indeed denotes the GCP project ID, not the GCP project number.
The repository ID is mandatory for invoking a Dataform workflow through the API. If you cannot find the repository ID in the Dataform section of the Google Cloud Console, it's crucial to reach out to Google Cloud Support for assistance. The exact steps and URL structures may vary, and the support team can provide the most accurate and up-to-date information.
Note: Ensure the Dataform API is enabled in your GCP project to view your Dataform repository in the relevant sections.
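If the console does not show the ID, one way to look it up is to list the repositories through the API itself. The sketch below assumes the list response carries a repositories array whose entries have a full resource name, and treats the last path segment of that name as the repository ID. The helper names are mine.

```python
import json
import urllib.request

DATAFORM_API = "https://dataform.googleapis.com/v1beta1"


def build_list_repositories_request(project, location, token):
    """Build a GET that lists the Dataform repositories in one location."""
    url = f"{DATAFORM_API}/projects/{project}/locations/{location}/repositories"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def repository_ids(list_response):
    """Extract repository IDs from a parsed list response.

    Each name looks like projects/P/locations/L/repositories/ID,
    so the ID is the last path segment.
    """
    return [
        repo["name"].rsplit("/", 1)[-1]
        for repo in list_response.get("repositories", [])
    ]


def list_repository_ids(project, location, token):
    """Send the list request and return the repository IDs it reports."""
    request = build_list_repositories_request(project, location, token)
    with urllib.request.urlopen(request) as response:
        return repository_ids(json.load(response))
```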
The 404 error with the reason "does not exist" indicates that the Dataform API is unable to find the resource specified in the request URL. This could be due to a number of reasons. To troubleshoot the issue, please consider the following steps and checks:
Verify Repository ID:
Ensure that the repository ID is correct. You can find the repository ID in the Dataform section of the Google Cloud Console.
Check the repository URL in the Dataform Console to confirm the repository ID.
Check Repository Status:
Confirm that the repository has not been deleted and is accessible to the user making the request.
Check the repository permissions to ensure the user has the necessary access to invoke workflows.
User Access:
Verify that the user making the request has the necessary access permissions to the repository.
Check the IAM & Admin section of the Google Cloud Console to verify user permissions.
API Token:
Ensure that the api_token is valid by trying to authenticate to the Dataform API using the token.
Generate a new API token if the existing one is not working.
Request Format:
Confirm that the run_create_request is formatted correctly, referring to the Dataform API documentation for the correct request format.
Use the Dataform API Playground to generate and test requests.
Endpoint URL:
Ensure that the dataform_project_url is correctly spelled and follows the exact structure expected by the Dataform API.
Verify the URL structure matches the example provided in the documentation.
Project and Location in URL:
Double-check that the project and location in the URL are correct and correspond to the actual project ID and location where the Dataform repository is hosted.
Select the correct project and location in the Dataform Console to generate the correct URL.
API Version:
Verify that the API version in the URL (v1beta1) is the version you intend to use, as different versions might have different URL structures and parameters.
Use the latest API version in the documentation to ensure compatibility.
Google Cloud Console Verification:
Directly verify the existence and accessibility of the repository in the Google Cloud Console to ensure it hasn't been inadvertently deleted or moved.
Select the repository in the Dataform Console to confirm it exists.
Refer to API Documentation:
Continuously refer back to the official API documentation to ensure all parameters and the URL structure are correct.
Use the documentation to verify the request body, URL parameters, and headers.
Retry the Request:
Attempt making the request again at a later time in case of a temporary outage or issue with the Dataform API.
Try making the request from a different network or location.
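Several of the checks above can be narrowed down by probing each level of the resource path with a plain GET and seeing which one first returns 404. The sketch below is one way to do that, assuming the location, repository, and workflow configuration can each be fetched by their resource name. The function names are hypothetical.

```python
import urllib.error
import urllib.request

DATAFORM_API = "https://dataform.googleapis.com/v1beta1"


def probe_urls(project, location, repository, workflow_config):
    """Return GET URLs from the outermost resource to the innermost one."""
    location_url = f"{DATAFORM_API}/projects/{project}/locations/{location}"
    repository_url = f"{location_url}/repositories/{repository}"
    config_url = f"{repository_url}/workflowConfigs/{workflow_config}"
    return [location_url, repository_url, config_url]


def first_missing(urls, token):
    """Return the first URL that answers 404, or None if all of them resolve.

    Any other HTTP error (for example 401 or 403) is re-raised, because it
    points at authentication or permissions rather than a wrong path.
    """
    for url in urls:
        request = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {token}"}
        )
        try:
            urllib.request.urlopen(request).close()
        except urllib.error.HTTPError as error:
            if error.code == 404:
                return url
            raise
    return None
```

If first_missing returns the repository URL, the project, location, or repository ID is off; if it returns the workflow configuration URL, the repository is fine and the configuration name is the problem.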
My repository ID is correct and I have checked it against the repository URL in the Dataform Console. The repository is not deleted. The project ID and location are indeed correct.
Can you advise what permissions I need to invoke workflows? I am not sure about this.
I am not sure what you mean by this, sorry. I am using this version because it is the one mentioned in the documentation, as we discussed earlier in the thread.
I am not familiar with the Dataform API Playground. Can you please advise how to go about testing the request? That would be very useful. Thanks again.
Ensure you have the Dataform Editor role on the repository. Refer to the Dataform or Google Cloud documentation to verify this role and its permissions for invoking workflows.
Confirm you have the BigQuery Job User role on the project where the Dataform repository is hosted. Refer to the official documentation to verify this role and its permissions.
To verify your permissions:
Go to the IAM & Admin section of the Google Cloud Console.
Click the Roles tab.
Ensure your user account is listed under the Members section for each role.
Testing request in Dataform API Playground:
Note: Verify the URL for the Dataform API Playground from the official sources as the provided URL is a placeholder.
{project} with the ID of your Google Cloud project
{location} with the region where your Dataform repository is located
{repository} with the ID of your Dataform repository
{workflowConfig} with the ID of your Dataform workflow configuration
{workflow_invocation_name} with the name of your workflow invocation
{workflow_name} with the name of your Dataform workflow
{compilation_result_id} with the ID of your Dataform compilation result
If successful, a 200 OK status code will appear. Otherwise, analyze the error message in the response.
Additional notes:
Ensure the API token in the Authorization header has the necessary permissions.
For continuous issues, refer to the error message or contact Dataform or Google Cloud support.
Tips:
Find values like {project}, {location}, etc., in the Dataform Console.
Generate an API token in the Google Cloud Console.
Use tools like Postman for testing requests outside the Dataform API Playground.
Conclusion:
Verify each step, URL, and placeholder against the official documentation to avoid any discrepancies. This comprehensive guide should assist in effectively invoking workflows in Dataform. For further issues, do not hesitate to reach out to official support channels.
Sorry for the confusion. Dataform API Playground is not yet publicly available.
For testing API requests, you can use other tools like Postman or cURL commands in the terminal. These tools allow you to send HTTP requests to the API endpoints and view the responses.
Here's how you can use Postman to send a request:
Download and Install Postman:
If you don't have Postman installed, you can download it from the official Postman website.
Create a New Request:
Open Postman and create a new request by clicking the "New" button and selecting "Request."
Enter Request Details:
Enter the request details, such as the HTTP method (POST), the API URL, headers, and the request body.
Send the Request:
Click the "Send" button to send the request. Postman will display the API response below.
Analyze the Response:
Analyze the response to check if the request was successful or if there were any errors.
Replace the placeholders with the appropriate values.
In the Authorization header, add your API token.
Click the Send button.
The response will contain a list of all compilation results for the repository. The compilation_result_id is the last path segment of the name field in each compilation result.
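The same request can be sent from a script instead of Postman. This is a sketch only: it assumes the list endpoint is the compilationResults collection under the repository and that each entry is identified by a full resource name rather than a bare ID. The helper names are mine.

```python
import json
import urllib.request

DATAFORM_API = "https://dataform.googleapis.com/v1beta1"


def build_list_compilation_results_request(project, location, repository, token):
    """Build a GET that lists the compilation results of one repository."""
    url = (
        f"{DATAFORM_API}/projects/{project}/locations/{location}"
        f"/repositories/{repository}/compilationResults"
    )
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def compilation_result_ids(list_response):
    """Extract the trailing ID from each compilation result's resource name."""
    return [
        result["name"].rsplit("/", 1)[-1]
        for result in list_response.get("compilationResults", [])
    ]


def list_compilation_result_ids(project, location, repository, token):
    """Send the list request and return the compilation result IDs."""
    request = build_list_compilation_results_request(
        project, location, repository, token
    )
    with urllib.request.urlopen(request) as response:
        return compilation_result_ids(json.load(response))
```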
Feedback and Suggestions:
API Endpoint: Verify the API endpoint URL with the official Dataform documentation or Google Cloud documentation to ensure it is correct and accessible with the appropriate permissions and API token.
API Version: Ensure that the API version in the URL (v1beta1) is the version you intend to use, as different versions might have different URL structures and parameters.
API Token: Make sure the API token used in the Authorization header has the necessary permissions to access the compilation results.
Error Handling: If you encounter any errors or issues, carefully review the error messages as they often provide clues about the problem. Check the API endpoint, headers, and other request details to ensure they are correct.
Official Documentation and Support: For the most accurate and reliable information, always refer to the official documentation and consider reaching out to Google Cloud Support for assistance.
Additional Considerations:
Endpoint Availability: Ensure that the API endpoint is available for your subscription or plan. Some endpoints may be restricted to certain subscription levels.
Placeholder Replacement: Ensure that all placeholders in the URL ({project}, {location}, {repository}) are replaced with your actual project, location, and repository details.
Support Channels: Don't hesitate to use other support channels like community forums or your dedicated support contact for more personalized assistance.
Security: Handle your API tokens and other sensitive credentials with utmost security. Ensure they are stored and transmitted securely, and are not exposed to unauthorized parties.
Proceed with these additional checks and considerations to ensure a more seamless and secure experience while working with the Dataform API and other related tools. Your continuous effort for improvement and attention to detail is crucial in providing effective and reliable assistance.
I'm trying to make a request to the Dataform API to execute workflows, but it's returning a 404 error. I checked the points mentioned above and they seem to be correct.
I'm using a URL:
https://dataform.googleapis.com/v1beta1/projects/{project}/locations/{location}/repositories/{repository}/workflowConfigs/{workflowConfig}/workflowInvocations
Can you help me?
Please provide the complete endpoint URL you are using for the request. Ensure that it is correctly formatted and includes the necessary parameters like project ID, location, repository ID, and workflow configuration ID.
What HTTP method are you using (e.g., POST, GET)?
Include the exact error message you are receiving. A 404 error typically indicates a "Not Found" response, but the exact wording can sometimes provide more clues.
Confirm that the API token you are using is valid and has the necessary permissions. (Do not share the token itself.)
What environment are you using to make the API request (e.g., a specific IDE, Postman, a script in a certain programming language)?
Have there been any recent changes in your Dataform project or Google Cloud setup that might affect this?
Are you following a specific part of the Dataform API documentation? If so, please specify which part.
Here are some suggestions to further troubleshoot:
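The URL you provided places workflowInvocations under workflowConfigs/{workflowConfig}. Check that against the API reference: workflow invocations are created under the repository (.../repositories/{repository}/workflowInvocations), with the workflow configuration named in the request body, so the extra path segment is a likely cause of the 404. Also double-check for any typos or missing characters.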
The method you're using to generate the API token (gcloud auth print-access-token) is generally correct. Ensure that the account associated with this token has the necessary permissions to access the Dataform API and the specific resources (repository, workflowConfig) in your project.
Since you're using a POST method, ensure that your request headers are set correctly. Typically, you would need to include Content-Type: application/json and Authorization: Bearer [YOUR_ACCESS_TOKEN].
Check if the POST request requires a specific body. The body structure should match the requirements as per the Dataform API documentation.
In VSCode using Python, ensure that your environment is correctly set up to make HTTP requests. If you're using the requests library, ensure it's correctly installed and imported in your script.
Verify the existence and accessibility of the repository and workflow configuration in the Google Cloud Console. Ensure that the repository Novo_ERP and the workflow configuration exec_xrt_dm exist and are correctly named.
Add additional logging to your script to print out the full request and response. This can sometimes provide more insights into what might be going wrong.
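For the logging suggestion, a small wrapper like the one below prints the method, URL, headers, and body before sending, and prints the response body even when the server answers with an error status, since that body usually contains the detailed reason. It is a sketch using the standard library rather than the requests package, and the function names are mine. The Authorization header is redacted so the token does not end up in logs.

```python
import urllib.error
import urllib.request


def redact(headers):
    """Return a copy of the headers with the bearer token hidden."""
    return {
        key: ("Bearer <redacted>" if key.lower() == "authorization" else value)
        for key, value in headers.items()
    }


def send_with_logging(request):
    """Print the outgoing request and the response, then return the response text."""
    print(request.get_method(), request.full_url)
    print("headers:", redact(dict(request.header_items())))
    if request.data:
        print("body:", request.data.decode("utf-8"))
    try:
        with urllib.request.urlopen(request) as response:
            status = response.status
            text = response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses still carry a JSON body describing the failure.
        status = error.code
        text = error.read().decode("utf-8")
    print("status:", status)
    print("response:", text)
    return text
```

Passing the prepared urllib.request.Request for the workflow invocation to send_with_logging shows exactly which URL and body reached the API, which makes a 404 much easier to pin down.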
I've been looking at this documentation, but it's not clear... could you help me? I believe I am making the right request... but I still keep getting a 404... as if the URL doesn't exist... Here's my initial code: