I am in the process of migrating a legacy Python 2 App Engine application to Python 3, sticking to the standard environment for now. For several services I have been using an entrypoint setting in my app.yaml like:
entrypoint: gunicorn -b :$PORT -w 1 main:app
with one worker process.
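For reference, the relevant part of each service's app.yaml looks roughly like this (the runtime version and service name are just examples from my setup):

runtime: python39
service: my-service    # example service name
instance_class: F1
entrypoint: gunicorn -b :$PORT -w 1 main:app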
I forgot to add this on a couple of services, though, and noticed that they were immediately being killed with:
While handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application or may be using an instance with insufficient memory. Consider setting a larger instance class in app.yaml.
Further investigation into the logs showed that it was actually starting up 8 worker processes for a single instance, with the default entrypoint referencing /config/gunicorn.py.
What is in the default /config/gunicorn.py? None of the rest of the documentation for specifying an entrypoint says anything about this, so hopefully it's nothing important. I was just curious to see this…
I checked one of our apps, and for an F1 instance with no entrypoint specified, Google runs gunicorn with 4 workers. Based on that, 8 would make sense for an F2 instance. But you're right that this seems to go against the documentation you've referenced.
It's possible that the documentation wasn't updated (they increased the default memory for the different instance classes because Python 3 required more memory and a larger footprint to run). You can click the "send feedback" button at the bottom of the page and tell Google about it.
Unless Google is using a custom gunicorn.py, gunicorn's default for workers is 1, and the comment in its example gunicorn.py recommends a value of 2-4 x $(NUM_CORES). You'll find the same recommendation in the gunicorn documentation.
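For illustration, a gunicorn.py along these lines would explain 4 workers on a 1-core instance and 8 on a 2-core one; the formula is purely my guess at what Google's /config/gunicorn.py might contain, not something I've verified:

import multiprocessing

# Upper end of the 2-4 x $(NUM_CORES) range from gunicorn's example config.
# Guess only: not confirmed to be what Google's /config/gunicorn.py does.
workers = multiprocessing.cpu_count() * 4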
Thanks for the reply. Unless someone from Google wants to chime in, I think what you write seems logical; you're probably right that the documentation is just out of sync.
In my case 8 workers, even for an F2 instance, is far too many; but it's also a large application. Obviously these are just guidelines and will depend entirely on the memory footprint of the app.
I highly recommend sticking with one worker per instance and using threads (2 or 4 x $(NUM_CORES)). Let App Engine instances do the rest of the scaling.
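Concretely, that would be an entrypoint like this (the thread count of 8 is just an example for a single-core instance class):

entrypoint: gunicorn -b :$PORT -w 1 --threads 8 main:app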
Hey, thanks for the reply, and I will check out your article. For what it's worth, we ended up doing 1 worker per instance as you wrote, and are using the gevent worker. Getting gevent working with the legacy App Engine runtime is pretty non-trivial, but we've done it successfully; I am still planning to write an article on how to achieve this.
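In the meantime, the short version: we point gunicorn at its gevent worker class from the entrypoint, roughly like this (the connection count is illustrative, not what we tuned it to):

entrypoint: gunicorn -b :$PORT -w 1 -k gevent --worker-connections 80 main:app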
Looking forward to the write-up. I am sure that gevent would provide performance improvements; I saw a 20-30% improvement with our end-to-end tests. I think I mentioned that we hit runtime issues with grpc and did not investigate further.