Thank you for sharing the details about the outage. We really appreciate your transparency.
This outage was the most severe I have seen. The downtime in Japan exceeded 3 hours, compared to a previous maximum of around 2 hours, and the impact was unusually widespread, affecting users from Japan all the way to the EU, rather than being limited to the Asia region as in past incidents.
With that in mind, we would like to respectfully share two requests for the team:
1. Proactive quota management:
Since the resolution this time was a manual quota increase, we hope the team will review the quota thresholds for AppSheet and set them with enough headroom to handle unexpected traffic spikes before they cause an outage.
2. Improved monitoring and response time:
The incident began at approximately 00:30 PDT, but the first status update was not posted until around 07:20 PDT — nearly 7 hours later. We would appreciate a review of the monitoring and alerting processes to enable faster detection and communication.
We understand that managing a service at Google’s scale is incredibly complex, and we trust the team is working hard to improve. We look forward to your follow-up report, and hope these points can be considered as part of the post-incident review.
I will make sure to share your input with the product management team. I understand there will be an official communication just like the one @takuya_miyai has shared.
This incident didn’t affect me directly because our app is still in development, although close to production launch.
I am concerned to read suggestions that it was 7 hours from incident onset to first comms from Google. Further, it took 3 days from incident resolution to confirmation that RCA comms would be provided. It’s been another 3 days since then, and I cannot locate RCA info in this thread or on the Workspace status page incident report.
Any chance you could follow up on the RCA please?
Also, could you please pass on the feedback that the community really needs (and expects) far more timely and frequent comms when an outage is affecting production systems? This one was major, with at least a dozen countries affected in this thread alone, but really, I think comms should be prompt for any outage, regardless of scale.
On the one hand I’m surprised, because I would expect more from Google by way of timely communications. On the other, I’m not, based solely on this thread.
The number and duration of outages (link below) and the lack of appropriate communications during outages have been flagged as a risk for relying on this platform.
It’s day 15 since an 8+ hour outage took down tenants in 21 countries (in this thread alone). These are paying customers whose businesses were hamstrung without any formal acknowledgement from Google for 7+ hours, and with no post-mortem review more than two weeks after the incident.