Monitoring alerts for Batch job failures

Is it possible to create an alert for Batch job failures? If so, which option in Google Cloud Monitoring should I use? I looked through the VM options, but as far as I can tell, there is no such option.

Case: I started a Batch job to process a pipeline and got an error that failed all processing. How do we catch these errors and send alerts?

Hi @nechtobolshee ,

Batch supports JobNotification, which lets you subscribe to job state changes such as job failures: https://cloud.google.com/batch/docs/reference/rest/v1/projects.locations.jobs#jobnotification.
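As a rough illustration, here is a minimal Python sketch that builds the "notifications" fragment of a job definition so that a message is published when the job enters the FAILED state. The field names (pubsubTopic, message.type, message.newJobState) follow the JobNotification reference linked above; the project and topic names are hypothetical placeholders, and the rest of the job definition is omitted.

```python
import json


def failure_notification(topic: str) -> dict:
    """Build one JobNotification entry that fires when the job enters FAILED."""
    return {
        "pubsubTopic": topic,
        "message": {
            "type": "JOB_STATE_CHANGED",
            "newJobState": "FAILED",
        },
    }


# Hypothetical project and topic names; replace them with your own.
topic = "projects/my-project/topics/batch-job-failures"

# Only the notifications part of the job definition is shown here.
job_fragment = {"notifications": [failure_notification(topic)]}
print(json.dumps(job_fragment, indent=2))
```

The printed JSON can be merged into the job definition you submit. Adding more entries to the list, for example one per state you care about, should work the same way.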

Does this help in your case?

Thanks,

Wenyan

Hello! This is not exactly what I had in mind, but is it possible to use this structure to send messages not to logs, but to notification channels?

Hi @nechtobolshee ,

With the Pub/Sub support in JobNotification (https://cloud.google.com/batch/docs/reference/rest/v1/projects.locations.jobs#jobnotification), you will get a notification whenever the job state changes. You can integrate Pub/Sub into your pipeline and handle these messages however you prefer. For more details, see https://cloud.google.com/batch/docs/monitor-jobs-using-notifications and https://cloud.google.com/pubsub/docs/publish-receive-messages-client-library.

Batch doesn't yet provide an option in Cloud Monitoring. We consider Pub/Sub a good starting point for subscribing to job state change notifications. If you think Cloud Monitoring support would suit your case better, could you describe your desired workflow in more detail?
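For example, a small subscriber could filter the incoming messages and forward only failures to whatever alerting channel you use. The sketch below is an assumption-heavy illustration, not an official sample: the attribute keys "Type", "NewJobState", and "JobName" are assumed names for the Pub/Sub message attributes and should be checked against the notifications guide above. The listen helper uses the google-cloud-pubsub client and is only a starting point; the print call stands in for your own alerting code.

```python
def should_alert(attributes: dict) -> bool:
    """Return True for a job-level notification that reports a failure.

    Assumption: the notification carries "Type" and "NewJobState" attributes.
    """
    return (
        attributes.get("Type") == "JOB_STATE_CHANGED"
        and attributes.get("NewJobState") == "FAILED"
    )


def format_alert(attributes: dict) -> str:
    """Build a short alert text; "JobName" is an assumed attribute key."""
    job = attributes.get("JobName", "<unknown job>")
    return f"Batch job {job} entered state FAILED"


def listen(subscription_path: str) -> None:
    """Block and process notifications from an existing Pub/Sub subscription."""
    # Imported here so the helpers above work without the library installed.
    from google.cloud import pubsub_v1

    def callback(message):
        attributes = dict(message.attributes)
        if should_alert(attributes):
            # Placeholder: send to email, chat, or a pager here instead.
            print(format_alert(attributes))
        message.ack()

    subscriber = pubsub_v1.SubscriberClient()
    future = subscriber.subscribe(subscription_path, callback=callback)
    future.result()  # Runs until cancelled or an error occurs.
```

Calling listen("projects/my-project/subscriptions/batch-alerts") (a hypothetical subscription path) would start the loop. Keeping the filter in its own function makes it easy to test without a live subscription.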

Thanks!

Is it possible to add custom messages while using these notifications? For example:

"notifications": [
  {
    "pubsubTopic": "",
    "message": {
      "type": "JOB_STATE_CHANGED",
      "newJobState": "SUCCEEDED",
      "data": {
        "engine": "my_engine"
      }
    }
  }
]