How to disable concurrent execution of Batch jobs

I’m currently using a GCP Workflow for batch job processing, scheduled to run every 30 minutes. However, I’m facing an issue where a new job is sometimes created while an old one is still executing. How can I disable concurrency so that only one job runs at a time? Any guidance on how to achieve this within the GCP Workflows setup would be greatly appreciated. The relevant workflow step is below:

  - createAndRunBatchJob:
      call: http.post
      args:
        url: $${batchApiUrl}
        query:
          job_id: $${jobId}
        headers:
          Content-Type: application/json
        auth:
          type: OAuth2
        body:
          taskGroups:
            - taskSpec:
                runnables:
                  - container:
                      imageUri: $${imageUri}
                      commands:
                        - "--script-location"
                        - "/mnt/disks/batch-scripts-${project_sha}/google_tv_aep/batch/code"
                    environment:
                      variables:
                        job_id: $${jobId}
                        secret: projects/${project}/secrets/${secret_nm}/versions/latest
                        local_path_1: /mnt/disks/${local_path_1}
                        local_path_2: /mnt/disks/${local_path_2}
                        query_file: "tv_mkt_campaign_events_load_query.sql"
                        output_dataset_name: "tv_mkt_campaign_events_incremental_data.json"
                        project: "${project}"
                        aep_sandbox: "${aep_sandbox}"
                        aep_dataset_name: "TV mkt campaign Events"
                        load_mode: "incremental"
                        aep_connection_id: "${aep_connection_id}"
                        checkpoint_file: "mkt_campaign_checkpoint.param"
                        aep_flow_id: "${aep_flow_id}"
                        checkpoint_field: timestamp
                volumes:
                  - mountPath: /mnt/disks/batch-scripts-${project_sha}
                    gcs:
                      remotePath: batch-scripts-${project_sha}
                  - mountPath: /mnt/disks/${bucket_nm}
                    gcs:
                      remotePath: ${bucket_nm}
                computeResource:
                  cpuMilli: 2000
                  memoryMib: 16384
              taskCount: 1
              parallelism: 2
          allocationPolicy:
            network:
              networkInterfaces:
                - network: projects/${project}/global/networks/spark-network
                  subnetwork: projects/${project}/regions/${location}/subnetworks/spark-subnet-pr
                  noExternalIpAddress: true
            serviceAccount:
              email: ${builder}
          logsPolicy:
            destination: CLOUD_LOGGING
      result: createAndRunBatchJobResponse

Hi @sugesh,

If you want the tasks of a job to execute in sequential order, you can set the schedulingPolicy field to IN_ORDER in the API's SchedulingPolicy:

https://cloud.google.com/batch/docs/reference/rest/v1/projects.locations.jobs#schedulingpolicy.
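
Applied to the workflow above, that is one extra field on the task group in the request body. A minimal sketch (note this serializes the tasks inside a single job; it does not stop separate jobs from overlapping):

  body:
    taskGroups:
      - taskSpec:
          # ... runnables, volumes, computeResource as in the original step ...
        taskCount: 1
        schedulingPolicy: IN_ORDER  # run this group's tasks one after another, in index order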

If you want whole jobs not to be executed concurrently, Batch is now working on supporting that on the Batch API side. We'll let you know once it's ready.

Until then, you can add a condition check on your side in the Workflow source: check the status of the existing jobs and decide whether to submit a new job based on the status of the previously submitted ones.

Thanks!
Wenyan


@wenyhu Thanks for the reply. While I wait for the feature support in the Batch API, I'll take your suggestion and set up a conditional check in the workflow; would you have any sample I can refer to?


Hi @sugesh,

Below is one example of submitting a new job only when all the listed existing jobs have completed.
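
A minimal sketch of that pattern, written as standalone Workflows source (no template escaping): it calls the Batch ListJobs endpoint with a filter on status.state and only falls through to the submission step when nothing is still active. The filter string, the hard-coded location, and the step names are assumptions for illustration; verify the filter syntax against the ListJobs reference.

  main:
    steps:
      - init:
          assign:
            - project: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
            - location: us-central1  # adjust to your region
            - batchApiUrl: ${"https://batch.googleapis.com/v1/projects/" + project + "/locations/" + location + "/jobs"}
      - listActiveJobs:
          # List jobs that are still in flight; adjust the states you care about.
          call: http.get
          args:
            url: ${batchApiUrl}
            query:
              filter: 'status.state="RUNNING" OR status.state="QUEUED" OR status.state="SCHEDULED"'
            auth:
              type: OAuth2
          result: listResult
      - decide:
          switch:
            # The response body has no "jobs" key when nothing matched the filter.
            - condition: ${not("jobs" in listResult.body)}
              next: createAndRunBatchJob
          next: skipThisRun
      - skipThisRun:
          return: "A previous Batch job is still active; skipping this cycle."
      - createAndRunBatchJob:
          # The http.post submission step from the question goes here, unchanged.
          call: http.post
          args:
            url: ${batchApiUrl}
          result: createAndRunBatchJobResponse

Returning early like this keeps the 30-minute schedule simple: a skipped cycle just picks up again on the next trigger.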

Ref:

Hope this helps,

Wenyan


@wenyhu Thanks much!
