Hello, I'm trying to run a simple Java application which reads data from Cloud Storage and writes to BigQuery. Right now I have built the Docker image, pushed it to Artifact Registry, created a Cloud Run job, and executed it. But I want to know:
- Is a Cloud Run job the best solution for this? The end goal is to integrate GitLab and run the whole thing as a CI/CD pipeline.
- If a Cloud Run job is the best solution, how can I deploy one using cloudbuild.yaml? I could only find deployment documentation for Cloud Run services, not for jobs.
- Is it possible to make a Cloud Run job event-based? For example, if I need to make the above solution event-based (trigger the job when a file arrives in the bucket), what is the best way?
Thanks
- If reading from GCS is a hard requirement, then Cloud Run seems fine. Otherwise there are tools like Datastream[1] that offer a native way to move data into BigQuery, and there is also the GCS Text to BigQuery template for Dataflow[2]; both are serverless options.
- Regarding how to specify the job with YAML, here is the documentation[3].
- And yes, you can make the pipeline event-driven by using Cloud Functions: when a new file arrives in the bucket, the function is invoked (GCP wires up this trigger for you), and inside it you launch your job using the API/SDK. You can also call your job directly via a webhook, which could work if you want to launch it from a service like GitLab.
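To make the second bullet concrete, here is a minimal cloudbuild.yaml sketch that builds the image, pushes it to Artifact Registry, and then deploys a Cloud Run job with `gcloud run jobs deploy`. The repository name `my-repo`, job name `gcs-to-bq-job`, and region `us-central1` are placeholders; adjust them to your setup.

```yaml
steps:
  # Build and push the container image (image path is a placeholder)
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/my-repo/gcs-to-bq:$SHORT_SHA', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'us-central1-docker.pkg.dev/$PROJECT_ID/my-repo/gcs-to-bq:$SHORT_SHA']
  # Create or update the Cloud Run job with the freshly built image
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args:
      - 'run'
      - 'jobs'
      - 'deploy'
      - 'gcs-to-bq-job'
      - '--image=us-central1-docker.pkg.dev/$PROJECT_ID/my-repo/gcs-to-bq:$SHORT_SHA'
      - '--region=us-central1'
images:
  - 'us-central1-docker.pkg.dev/$PROJECT_ID/my-repo/gcs-to-bq:$SHORT_SHA'
```

A GitLab pipeline can then run this with `gcloud builds submit --config cloudbuild.yaml`, so the same file works locally and in CI/CD.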
1 - Change Data Capture | Datastream | Google Cloud
2 - Cloud Storage Text to BigQuery template | Cloud Dataflow | Google Cloud
3 - Create jobs | Cloud Run Documentation | Google Cloud
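The event-driven option in the last bullet can be sketched in Java: a small GCS-triggered function (or any service receiving the event) that calls the Cloud Run Admin API v2 `jobs:run` method over HTTP. The class name, the project/region/job values, and the plain-HTTP approach (instead of the google-cloud-run client library) are illustrative assumptions; in a real deployment the access token would come from `GoogleCredentials` or the metadata server.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of triggering a Cloud Run job from an event handler.
// Project, region, and job name are placeholders.
public class JobTrigger {

    // Build the Cloud Run Admin API v2 ":run" endpoint for a job.
    static String runJobUrl(String project, String region, String job) {
        return String.format(
            "https://run.googleapis.com/v2/projects/%s/locations/%s/jobs/%s:run",
            project, region, job);
    }

    // Fire the job; accessToken must be a valid OAuth2 token
    // (obtaining it is not shown here). Returns the HTTP status code.
    static int triggerJob(String url, String accessToken) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(url))
            .header("Authorization", "Bearer " + accessToken)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString("{}"))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        // In a GCS-triggered function, the event payload carries the
        // bucket and file name; here we only show the URL construction.
        System.out.println(runJobUrl("my-project", "us-central1", "gcs-to-bq-job"));
    }
}
```

The same URL can also be hit from a webhook (e.g. a GitLab pipeline step with `curl`), which covers the CI/CD-triggered case as well.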