Hello,
Over the last few weeks we have started seeing the following error in Apigee.
The Router returns errors to the client because it has no upstreams (Message Processors) available:
2024/11/08 12:31:46 [error] 6382#6382: *359174066 upstream timed out (110: Connection timed out) while connecting to upstream, client: xxx.xxx.xx.xx, server: api.xxxxxxxx.xx.xxxx, request: "GET /customers/api/customers/00000000/documents HTTP/1.1", upstream: "http://xx.xx.xx.xx:8998/customers-api/api/customers/00000000/documents", host: "api.xxxxxxxx.xx.xxxx"
At the same time, the Message Processors log timeouts against the Cassandra cluster:
2024-11-08 12:31:46,508 org:XXXX env:production api:xxxxx rev:2 messageid:iacvm0000-4447-347605043-673 policy:OAuth-v20-Store-External-Token Apigee-Main-37 ERROR DATASTORE.CASSANDRA - AstyanaxCassandraClient.insert() : Exception while insert rowkey [eyJhbGcixx, oauth_20_access_tokens, kms] to the columnFamily : {} in the keyspace {}
com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: PoolTimeoutException: [host=None(0.0.0.0):0, latency=2001(2001), attempts=2]Timed out waiting for connection
at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:231)
....
2024-11-08 12:31:46,510 org:xxxx env:production api:oauthv2 rev:2 messageid:iacvm0000-4447-347605043-673 policy:OAuth-v20-Store-External-Token Apigee-Main-37 ERROR DATASTORE.CASSANDRA - AstyanaxCassandraClient.logHostPoolInCaseOfErrors() : Cassandra Host Pool under use - All Hosts: xx.xx.xx.xx(xx.xx.xx.xx):9160,yy.yy.yy.yy(yy.yy.yy.yy):9160,zz.zz.zz.zz(zz.zz.zz.zz):9160. Active Hosts: xx.xx.xx.xx(xx.xx.xx.xx):9160,zz.zz.zz.zz(zz.zz.zz.zz):9160
2024-11-08 12:31:46,510 org:xxxx env:production api:oauthv2 rev:2 messageid:iacvm0000-4447-347605043-673 policy:OAuth-v20-Store-External-Token Apigee-Main-37 ERROR KERNEL - ErrorMessages.formatMessage() : Unable to locate a resource bundle for error code invalid_request
2024-11-08 12:31:46,510 org:xxxx env:production api:oauthv2 rev:2 messageid:iacvm0000-4447-347605043-673 Apigee-Main-37 ERROR MESSAGING.FLOW - AbstractAsyncExecutionStrategy$AsyncExecutionTask.logException() : Exception caught. Message is invalid_request
2024-11-08 12:31:46,510 org:xxxx env:production api:oauthv2 rev:2 messageid:iacvm0000-10276-194337648-574 policy:OAuth-v20-Store-External-Token Apigee-Main-35 ERROR DATASTORE.CASSANDRA - AstyanaxCassandraClient.insert() : Exception while insert rowkey [eyJhbGcxxxxx, oauth_20_access_tokens, kms] to the columnFamily : {} in the keyspace {}
com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: PoolTimeoutException: [host=None(0.0.0.0):0, latency=2000(2000), attempts=2]Timed out waiting for connection
at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:231)
....
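One detail in the "Cassandra Host Pool under use" line above: three hosts are listed under All Hosts but only two under Active Hosts. Below is a minimal sketch, in Python, of how that difference could be extracted from such a line when scanning many log entries. The function name `inactive_hosts` and the sample addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical, and the regular expression only assumes the line layout shown in the log excerpt above.

```python
import re

# Matches the host lists in a "Cassandra Host Pool under use" log line.
# Assumption: the layout is "All Hosts: a,b,c. Active Hosts: a,c" as in the excerpt.
POOL_RE = re.compile(r"All Hosts:\s*(.*?)\.\s+Active Hosts:\s*(.*)$")


def split_hosts(segment):
    """Split a comma-separated host listing into a set of entries."""
    return {entry.strip() for entry in segment.split(",") if entry.strip()}


def inactive_hosts(log_line):
    """Return the hosts listed under All Hosts but missing from Active Hosts."""
    match = POOL_RE.search(log_line)
    if match is None:
        return []  # not a host pool line
    all_hosts = split_hosts(match.group(1))
    active_hosts = split_hosts(match.group(2))
    return sorted(all_hosts - active_hosts)


# Hypothetical sample line; the addresses are placeholders, not real hosts.
sample = (
    "ERROR DATASTORE.CASSANDRA - AstyanaxCassandraClient.logHostPoolInCaseOfErrors() : "
    "Cassandra Host Pool under use - All Hosts: "
    "192.0.2.1(192.0.2.1):9160,192.0.2.2(192.0.2.2):9160,192.0.2.3(192.0.2.3):9160. "
    "Active Hosts: 192.0.2.1(192.0.2.1):9160,192.0.2.3(192.0.2.3):9160"
)

if __name__ == "__main__":
    print(inactive_hosts(sample))
```

Running this over all Message Processor log lines would show whether the same Cassandra node is missing from the active pool each time the timeouts occur.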
In Cassandra we did not see any errors or “reconnection” messages. The logs only show garbage collector entries, but these appear all the time (sometimes with a GC duration of 20 sec), not only during the errors.
Also, we expose JMX metrics in Grafana, and during the issue we see a blink in the Message Processor metrics.
Do you know what causes that blink?
Do you know why the Message Processors get those timeouts against Cassandra?
Is there any command we can run to determine the root cause?
Thanks,
Mateo.