AppSheet High Sync Latencies Incident July 9 - July 10, 2023

Hello everyone, AppSheet is experiencing a high-latency service interruption, which began July 9, 2023, at approximately 8:00 PM PT. There was some initial discussion in the thread below, which was originally started for a similar problem on July 4.
https://www.googlecloudcommunity.com/gc/AppSheet-Q-A/Appsheet-is-loading-slowly-today/m-p/610578#M217309

I will post further updates to this thread.

UPDATE 11:46 PM PT - U.S. Pacific Time

Hello everyone. AppSheet continues to experience high sync and homepage load latencies. I recognize this is a significant impact to your applications, and that it is peak business hours in many affected areas. I apologize for the impact that this has had on your business operations.

We have been investigating this issue for several hours and are actively deploying mitigations to our server infrastructure. These take time to take effect, and our graphs and metrics indicate that the system has not yet recovered.

Multiple other infrastructure engineers and I are actively working on this issue. I am working to get the GCP Cloud Status Dashboard updated to correctly reflect the service interruption.

Although the symptom appears the same as the July 4 incident (which also saw high sync latencies), our initial triage indicates that the underlying root cause is different. In that case, it was a matter of scaling up additional capacity; in this case, the capacity is there but the server processes themselves are unhealthy. In other words, they are encountering conditions that prevent them from fulfilling client requests within reasonable latencies.

I understand and validate the frustration felt from a service outage in this regard. I’ve heard the request for a transparent post mortem, and we will work to provide that in the coming days, once we understand the root cause. This includes remediations to prevent a similar occurrence in the future.

I am actively working on the problem here with team members. I will post a follow-up in an hour, at 12:55 AM PT (US Pacific Time).

Mike Procopio

Thank you for your feedback. Please help us fix this problem ASAP.

We need an update and a fix on this ASAP. Thank you!

Waiting for the update.

UPDATE 10-July 12:31 AM PT - U.S. Pacific Time

We are continuing to diagnose and mitigate the issue with engineering leads. An internal incident has been declared in our systems for tracking and post-mortem action items.

We are actively troubleshooting on a videoconference call with engineering now.

Please help fix this. All my apps are opening slowly and timing out. Please, please, please.

Thank you !

Are there any updates on whether this issue can be resolved within the day? The app behavior does not seem to be improving.

Any update? Can it be fixed within a few hours?

UPDATE 10-July 1:28 AM PT - U.S. Pacific Time

The team has put some initial mitigations in place, and traffic has been rerouted to healthy server instances. Our graphs indicate a recovery in the sync times, approaching nominal levels from before the incident.

Can you confirm on your side that latencies have improved for your apps?

Works fine on my end now. Thanks!

Thank you

Thank you, Mike

According to my assessment, the performance is currently around 70% compared to my usual usage.

It’s working fine now, thank you.

Thank you everyone. We have concluded our emergency mitigation measures on this issue.

Again, I apologize for the service interruption and I recognize it had a significant impact on your applications.

In the coming days, we will continue to monitor our systems to ensure a full recovery, and work to finalize and test our hypotheses on root cause.

We will report back findings to the community, as well as action items on our side to prevent this in the future.

Mike

The app seems to be working and has returned to its normal state. Thanks @Mike_Procopio.

Hi @Mike_Procopio

I’m sure this is due to the Service Health specification, but the fault start and end times are clearly wrong.

https://status.cloud.google.com/incidents/bCk9RUNbaekMsKDQF9ZT

To begin with, how can the incident start at 2023-07-10 00:25 when it is reported in the summary as starting at 2023-07-09 20:00?

I appreciate that you have uploaded the Incident summary, but please examine the content carefully.
This is increasingly eroding your credibility with customers.

Thank you for your support.

It seems that AppSheet is working fine right now.

Hi. I think the server is still slow.
It takes 2 to 3 times longer to load AppSheet pages and sync apps.

Also, Google Maps is showing errors as below.

I think it still has lots of problems here in South Korea.

@Mike_Procopio

I hope you keep your promise.

AppSheet is a business application running all around the globe. This is not one of the cases where ALL users globally were affected (unable to use the apps). The problem is that this is not a single case; we see it quite often. It goes without saying that the apps are used in business daily, or rather hourly and by the minute. Clients (app users) incur losses while the apps are down. I would not say this if it were a single occasion, but this happens quite often, and we suspect it may happen again even tomorrow. If users have this doubt, they will naturally leave the platform.

I’m not sure how seriously the AppSheet / Google management team takes this. Whenever an issue comes up, we have to look for a way to wake up personnel in Seattle to fix it.

Based on my experience over the past years, the AppSheet dev team releases new code at midnight Seattle time. Users like us living in Asia Pacific (NZ, AZ, Japan, Korea, and others in this region) are then the first to be affected. At that time, you guys are in bed.

Please review the process you follow when you push new code to the platform. Why are we, who live in Asia Pacific, always the victims of your new code?

We speak to our local Google reps; however, they have no idea, which is understandable.

All in all, we AppSheet customers will have no choice but to cease our business. As I implied, this is a breach of the SLA. I have raised this claim a number of times, but there has been no reaction from Google management.

This is a risk for Google/AppSheet as a platform. Once the platform loses the support of its users, that will be the end.
