The data it transmits consists of three strings. When I run it locally, it correctly streams each string as soon as it is processed. But when I deploy it, the response seems to be buffered and is sent only once it has been completely processed.
I can’t understand why this happens. I’m available for further clarification.
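For reference, here is a minimal sketch of the kind of endpoint described above. It uses Python’s standard-library http.server instead of FastAPI so it runs without dependencies; it is a stand-in, not the poster’s code. The names STRINGS, DELAY_SECONDS, encode_chunk, StreamingHandler and run are hypothetical. Each string is sent as its own HTTP/1.1 chunk and flushed immediately, which is what "streaming the strings as soon as they are processed" looks like on the wire.

```python
import os
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional

# Hypothetical payload: three strings, produced one at a time.
STRINGS = ["first\n", "second\n", "third\n"]
# Hypothetical processing time per string (the thread mentions about 5 s).
DELAY_SECONDS = 5.0


def encode_chunk(data: bytes) -> bytes:
    """Frame one piece of data with HTTP/1.1 chunked transfer encoding:
    length in hex, CRLF, the data, CRLF."""
    return b"%x\r\n%s\r\n" % (len(data), data)


# A zero-length chunk terminates a chunked body.
LAST_CHUNK = b"0\r\n\r\n"


class StreamingHandler(BaseHTTPRequestHandler):
    # Chunked transfer encoding is an HTTP/1.1 feature.
    protocol_version = "HTTP/1.1"

    def do_GET(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()  # headers go out before any string is ready
        for text in STRINGS:
            time.sleep(DELAY_SECONDS)  # simulate processing one string
            self.wfile.write(encode_chunk(text.encode("utf-8")))
            self.wfile.flush()  # send this chunk now, not at the end
        self.wfile.write(LAST_CHUNK)
        self.wfile.flush()


def run(port: Optional[int] = None) -> None:
    """Serve forever; defaults to $PORT (set by Cloud Run) or 8080."""
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    server = ThreadingHTTPServer(("0.0.0.0", port), StreamingHandler)
    server.serve_forever()
```

After calling run(), `curl -N http://localhost:8080/` should print one line roughly every DELAY_SECONDS rather than all three at the end.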
This is a common issue on Cloud Run, which uses HTTP/2 by default: the response can end up buffered before it is sent, instead of being forwarded chunk by chunk.
Here are some workarounds based on the Google Cloud documentation:
HTTP/2 and Response Buffering: Cloud Run uses HTTP/2 by default, and if any layer between your application and the client buffers the response, your data is held until the entire stream is complete and then delivered at once. HTTP/2 itself does not require the whole response to be prepared before sending (it carries the body in DATA frames as they are written), so the buffering comes from a component in the path rather than from the protocol.
Increase Cloud Run Timeout: If your function takes longer than the request timeout, increase the timeout for your Cloud Run service. The default timeout is 5 minutes (300 seconds), and you can extend it up to 60 minutes.
FastAPI Configuration: Make sure you’re using a recent version of FastAPI so that async streaming is handled properly. (Please use third-party resources on this with caution, since they are not maintained by Google and could be inaccurate or outdated.)
Configure Cloud Run Resources: Check your Cloud Run resource configuration (memory and CPU allocation) to ensure that your service has enough resources to handle the streaming request effectively.
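On the async-handling point, here is a minimal sketch of a producer suited to FastAPI’s StreamingResponse, which accepts an async iterator: an async generator that awaits between strings so the event loop stays free. The generator produce_strings, the /stream route and the helper arrival_times are hypothetical; the FastAPI wiring is shown only as a comment so the block runs with the standard library alone.

```python
import asyncio
from typing import AsyncIterator, List, Tuple


async def produce_strings(delay: float = 5.0) -> AsyncIterator[str]:
    """Yield each (hypothetical) string as soon as it has been produced."""
    for text in ("first\n", "second\n", "third\n"):
        # Await a non-blocking sleep; a blocking call such as time.sleep()
        # here would stall the event loop and every request it is serving.
        await asyncio.sleep(delay)
        yield text


# With FastAPI, the generator would be handed to StreamingResponse, e.g.:
#
#     from fastapi import FastAPI
#     from fastapi.responses import StreamingResponse
#
#     app = FastAPI()
#
#     @app.get("/stream")
#     async def stream():
#         return StreamingResponse(produce_strings(), media_type="text/plain")


async def arrival_times(delay: float) -> List[Tuple[float, str]]:
    """Consume the generator and record when each string became available."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    return [(loop.time() - start, text) async for text in produce_strings(delay)]
```

Running asyncio.run(arrival_times(0.5)) locally should show the strings spaced about half a second apart, which is the behaviour the deployed service is expected to preserve.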
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
HTTP/2 and Response Buffering: So the problem is the buffering. Can I disable it?
Increase Cloud Run Timeout: That’s strange; the function takes around 15 seconds in total, sending a chunk every 5 seconds. The problem is that I have to add more chunks, so waiting for the whole response is not acceptable. I need streaming, and it is supported on AWS.
FastAPI Configuration: I’m using the latest FastAPI version and it works properly locally, so I assume it is configured correctly.
Configure Cloud Run Resources: It is not memory-heavy. By the way, I have 2 GB of RAM and 1 CPU, which is more than enough for one instance, but it still does not work.
Is there any workaround, or should I switch to another cloud service?
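To tell whether the deployed service streams or buffers, one option is to time when each line arrives on the client. A minimal sketch, assuming each string ends with a newline and the service returns plain text: with streaming, the three lines should arrive roughly 5, 10 and 15 seconds after the headers; if they all arrive at about the same moment, something on the path is buffering. The helper names time_lines, fetch_with_timings and looks_buffered, and the 1-second spread threshold, are hypothetical.

```python
import time
import urllib.request
from typing import Iterable, List, Tuple


def time_lines(lines: Iterable[bytes]) -> List[Tuple[float, bytes]]:
    """Record how many seconds after the start each line was received."""
    start = time.monotonic()
    return [(time.monotonic() - start, line) for line in lines]


def fetch_with_timings(url: str, timeout: float = 60.0) -> List[Tuple[float, bytes]]:
    """Read an HTTP response line by line, timing each line.

    Iterating over the response yields each newline-terminated line as
    soon as it arrives, instead of waiting for the whole body.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return time_lines(resp)


def looks_buffered(timings: List[Tuple[float, bytes]], spread: float = 1.0) -> bool:
    """Heuristic: if every line arrived within `spread` seconds of the
    first one, the chunks were most likely held back and sent together."""
    if len(timings) < 2:
        return False
    arrivals = [t for t, _ in timings]
    return max(arrivals) - min(arrivals) < spread
```

For example, looks_buffered(fetch_with_timings("https://<your-service-url>/stream")) would return True for the behaviour described in the question. urllib speaks HTTP/1.1, so comparing its timings with what a browser (which may negotiate HTTP/2) shows can help narrow down which layer is holding the response.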