1 request per pod

I implemented a pipeline in GKE with several pods communicating with each other. One of these pods (call it “A”) can send multiple requests to another pod (“B”).

The service implemented in B is quite slow. This is why I would like B to scale horizontally so that each replica of B is guaranteed to handle exactly one request at a time. At any moment, there should be as many active pods as there are active requests.

I tried to configure an HPA based on memory or CPU utilization. However, if the requests to B arrive too close to each other in time, they are directed to the same pod.
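
The behavior described above can be illustrated with a toy model. The Python sketch below is not the actual GKE setup: the round-robin dispatch, the 30-second reaction delay, and the names `dispatch` and `slow_autoscaler` are all assumptions made for illustration. It only shows that when requests arrive faster than the autoscaler adds replicas, every request in the burst lands on the single pod that exists at that moment.

```python
from collections import Counter


def dispatch(arrival_times, replicas_at):
    """Assign each request to a pod, round-robin over the pods ready on arrival.

    arrival_times: request arrival times in seconds.
    replicas_at: function mapping a time to the number of ready replicas of B.
    Returns a Counter mapping pod index -> number of requests received.
    Requests are assumed to be long-running, so none finishes during the run.
    """
    load = Counter()
    for i, t in enumerate(arrival_times):
        ready = replicas_at(t)   # pods that exist when the request arrives
        load[i % ready] += 1     # hypothetical round-robin choice
    return load


def slow_autoscaler(t, reaction_delay=30.0):
    # Hypothetical autoscaler: one replica until it reacts, three afterwards.
    return 1 if t < reaction_delay else 3


burst = [0.0, 0.5, 1.0]       # three requests within one second
spread = [0.0, 40.0, 80.0]    # three requests spaced beyond the delay

print(dict(dispatch(burst, slow_autoscaler)))   # {0: 3}
print(dict(dispatch(spread, slow_autoscaler)))  # {0: 1, 1: 1, 2: 1}
```

In the burst case all three requests share pod 0, which matches the symptom above; only when requests are spaced further apart than the scaling delay does each one get its own pod.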

How do I make sure that every pod serves one and only one request?
