Hello everyone,
I am working on a workflow in Google Workflows where each retry should trigger a specific action:
- Retry 1: Send an email to Service A.
- Retry 2: Send an email to Service B.
- Retry 3: Create an incident via another action.
To achieve this, I tried using the native try…except syntax along with a retry block. The goal is to handle each retry individually and execute a specific step based on the retry number.
Problem Encountered
Despite my efforts, the following issues occur:
- Specific actions for retries do not execute as expected:
  - The alert-handling steps are sometimes skipped.
  - The logic to differentiate each retry does not seem to work properly.
- Use of execution.retry_attempt:
  - I initially thought execution.retry_attempt was a native variable to track the retry count, but it seems this variable does not exist, causing deployment errors.
- Manual incrementation with a variable (retry_count):
  - I implemented a variable to track retries manually, but even with this approach, the specific alert steps are not triggered correctly.
What I Tried
- Using the following structure to handle retries and alerts:
  - try to execute the main action (simulate_error).
  - except to catch errors and execute the appropriate alert step.
  - A retry block to manage retries.
- Manually incrementing a variable (retry_count) after each failure.
- Using a switch in the alert-handling step to determine the action based on retry_count.
Here is a simplified example of my workflow:
main:
  params: [input]
  steps:
    - init_retries:
        assign:
          - retry_count: 0  # Initialize retry counter
    - simulate_error_and_alert:
        try:
          steps:
            - simulate_error:
                call: http.get
                args:
                  url: "https://httpstat.us/500"  # Simulate an HTTP error
        except:
          steps:
            - handle_alert_action:
                switch:
                  - condition: ${retry_count == 0}  # Retry 1
                    next: send_email_service_1
                  - condition: ${retry_count == 1}  # Retry 2
                    next: send_email_service_2
                  - condition: ${retry_count == 2}  # Retry 3
                    next: create_incident
            - increment_retry_count:
                assign:
                  - retry_count: ${retry_count + 1}  # Increment retry counter
        retry:
          predicate: ${http.default_retry_predicate}
          max_retries: 3
          backoff:
            initial_delay: 5  # 5 seconds
            max_delay: 10  # 10 seconds
            multiplier: 2
    - send_email_service_1:
        call: sys.log
        args:
          text: "Retry 1: Sending email to Service 1."
          severity: "INFO"
    - send_email_service_2:
        call: sys.log
        args:
          text: "Retry 2: Sending email to Service 2."
          severity: "INFO"
    - create_incident:
        call: sys.log
        args:
          text: "Retry 3: Creating an incident."
          severity: "CRITICAL"
    - log_final_failure:
        call: sys.log
        args:
          text: "Retries exhausted. Workflow failed."
          severity: "ERROR"
Current Results
- The simulate_error step is retried up to 3 times, so the retry block itself behaves as expected.
- However, the alert-handling steps (send_email_service_1, send_email_service_2, create_incident) are not executed at the correct time or are skipped entirely.
Questions
- How can I execute a specific action immediately after each retry in Google Workflows?
- Is my logic for using a manual retry_count variable correct? Is there a better approach?
- Is there a native way to run steps on retry?
- Is there a native way in Google Workflows to track the number of retries performed without manually managing a variable like retry_count?
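For reference, one possible workaround for the first two questions is to drop the retry block and loop manually. As far as the Workflows documentation describes, when a try step has both retry and except, the except steps run only after the retry policy has given up, which would explain why a counter incremented in except never changes between retries. The following is a minimal sketch under that assumption, not a verified solution. The step names (call_service, pick_action, email_service_a, and so on), the attempt and last_error variables, and the fixed 5-second sys.sleep delay are all hypothetical, and sys.log again stands in for the real email and incident calls.

```yaml
main:
  params: [input]
  steps:
    - init:
        assign:
          - attempt: 0        # number of failures seen so far
          - last_error: null  # declared here so it is visible outside except
    - call_service:
        try:
          call: http.get
          args:
            url: "https://httpstat.us/500"  # Simulate an HTTP error
          result: response
        except:
          as: e
          steps:
            - record_failure:
                assign:
                  - attempt: ${attempt + 1}
                  - last_error: ${e}
                next: pick_action  # skip the success path below
    - return_success:
        return: ${response.body}
    - pick_action:
        switch:
          - condition: ${attempt == 1}
            next: email_service_a
          - condition: ${attempt == 2}
            next: email_service_b
        next: create_incident  # third failure: stop retrying
    - email_service_a:
        call: sys.log
        args:
          text: "Failure 1: sending email to Service A."
          severity: "INFO"
        next: wait_before_retry
    - email_service_b:
        call: sys.log
        args:
          text: "Failure 2: sending email to Service B."
          severity: "INFO"
        next: wait_before_retry
    - wait_before_retry:
        call: sys.sleep
        args:
          seconds: 5  # fixed delay; an assumption, not exponential backoff
        next: call_service
    - create_incident:
        call: sys.log
        args:
          text: "Failure 3: creating an incident."
          severity: "CRITICAL"
    - fail_workflow:
        raise: ${last_error}
```

In this sketch the call is attempted three times in total, with one action after each failure, and the original error is re-raised after the incident step. Unlike http.default_retry_predicate, it retries on every error, so a check on the error (for example its code) would need to be added to keep that filtering.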
Additional Context
- I am testing this workflow in a standard GCP project.
- I am using sys.log to simulate the real calls (emails/incidents) during testing.
- My ultimate goal is to call specific Cloud Functions for each retry.
Thank you in advance for your help and feedback!