I have implemented alert in my Compute Engine instance, and suddenly my application on that instance is down, I even could not SSH into it to check the issue at that moment.
Here’s the condition:
Logs suddenly stopped after minutes.
CPU and Memory metrics stopped sent data. Before it stopped, there are does not have spike metric.
Hello @bruleey ,Welcome on Google Cloud Community.
What kind of alert? Which type of metrics did you used for that? It’s important because there is no possibility to broke VM after creating the alert, especially if you are not using agent metrics, as all metrics not related with agent installed on VM, are taken from hypervisor.
So based on that I’m assuming that you are using agent-based metric. See screenshot. If yes, there is a possibility (as always with any kind of agents and tools) that agent will cause VM freeze. I saw similar behavior on Azure and once on Google Cloud.
It might not be your case, as this is rarely happening, but happening. Maybe, which more fits to this situation imho, any other app/agent/service or OS itself caused VM freeze.
BTW, which OS version are you using? Linux or Windows?