I want to create an alert that fires when the success rate of my Cloud Function falls below a certain threshold. I created an alert policy that checks the ratio of successful requests (status=ok) to all requests. The MQL query looks like this:
fetch cloud_function
| metric 'cloudfunctions.googleapis.com/function/execution_count'
| filter resource.function_name == 'my-function'
| { t_0:
      filter metric.status == 'ok'
      | align delta()
      | group_by [resource.function_name],
          [value_execution_count_aggregate: aggregate(value.execution_count)]
  ; t_1:
      ident
      | align delta()
      | group_by [resource.function_name],
          [value_execution_count_aggregate: aggregate(value.execution_count)] }
| ratio
| window 60m
| condition ratio < 0.99 '1'
It works fine for the most part, but there’s a problem. My users can send invalid requests to the function, e.g. an HTTP GET instead of a POST, and the function then fails with status=error. If these errors happen often enough, they trigger my alert. That is undesirable, since invalid client requests are not something I can fix or control.
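As a rough illustration of the effect, here is a minimal Python sketch with made-up counts for a single 60-minute window. The numbers, the split into "client" and "server" errors, and the helper name `success_rate` are all hypothetical; the execution_count metric itself only exposes the status label, so this shows the arithmetic I am after, not something the metric provides today.

```python
def success_rate(ok: int, total: int) -> float:
    """Ratio of successful executions to all executions (1.0 when there is no traffic)."""
    return ok / total if total else 1.0


# Hypothetical counts for one 60-minute window (not real data).
ok = 9_800
client_errors = 150  # assumed: invalid requests, e.g. GET instead of POST
server_errors = 50   # assumed: failures I can actually fix
total = ok + client_errors + server_errors

THRESHOLD = 0.99  # same threshold as in the alert condition

# What the current query computes: every status=error counts against me.
raw = success_rate(ok, total)

# What I would like to alert on: client-caused errors left out of the denominator.
adjusted = success_rate(ok, total - client_errors)

print(f"raw ratio:      {raw:.4f}  alert fires: {raw < THRESHOLD}")
print(f"adjusted ratio: {adjusted:.4f}  alert fires: {adjusted < THRESHOLD}")
```

With these assumed counts the raw ratio drops below 0.99 and the alert fires, while the adjusted ratio stays above the threshold.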
Is there a way to exclude specific errors from Cloud Function metrics? Or is there a better way to monitor the success rate of a Cloud Function?