Hi everyone, and hopefully Google Product/Engineering teams,
I’m raising an issue regarding what appears to be a systemic compute-allocation bottleneck in the advanced models, one that specifically impacts paid subscribers and professional users.
The Issue:
During peak global usage hours, the model’s reasoning depth, logical consistency, and output token limits degrade severely. I frequently encounter aggressive rate limiting and truncated reasoning chains that simply do not occur during off-peak times.
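For anyone who wants to quantify this rather than rely on anecdotes, here is a minimal sketch that buckets logged request samples by hour and compares average output length and latency between peak and off-peak windows. The `Sample` structure and the 14:00–22:00 UTC peak window are my own illustrative assumptions, not anything Google has published; the actual request/response logging is assumed to happen elsewhere in your tooling.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Sample:
    hour_utc: int       # hour the request was sent (0-23, UTC)
    output_tokens: int  # length of the model's reply
    latency_s: float    # wall-clock time to full response

def summarize(samples, peak_hours=range(14, 22)):
    """Split logged samples into peak vs. off-peak and compare averages.

    `peak_hours` is an assumed window (14:00-22:00 UTC); adjust for
    whatever "peak global usage" means in your region.
    """
    peak = [s for s in samples if s.hour_utc in peak_hours]
    off = [s for s in samples if s.hour_utc not in peak_hours]

    def stats(group):
        if not group:
            return {"n": 0, "avg_tokens": None, "avg_latency_s": None}
        return {
            "n": len(group),
            "avg_tokens": mean(s.output_tokens for s in group),
            "avg_latency_s": mean(s.latency_s for s in group),
        }

    return {"peak": stats(peak), "off_peak": stats(off)}
```

If several of us log a few days of identical prompts this way, a consistent gap between the two buckets would be much harder for support to dismiss than individual screenshots.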
The Professional Impact:
As a researcher in functional neurosurgery and Brain-Computer Interface (BCI) applications, I rely on uncompromised model reasoning to structure complex logic for multi-center clinical trials and for neuro-modeling data synthesis. This dynamic resource scaling, which appears intended to absorb the massive influx of free-tier users and promotional/educational account abuse (bots and black-market resale), directly penalizes high-value professional users. It effectively turns compute allocation into a zero-sum game in which paid-tier SLAs are compromised.
Questions for the Community & Google PMs:
- Are other developers/researchers seeing similar, systematic reasoning degradation during peak hours?
- For Google: Is there a roadmap for implementing stricter compute isolation and prioritized, uncompromised inference pipelines for paid/advanced tiers?
I have already escalated this via Google One Support (Case ID: 9-8982000040242) but believe this strategic issue requires direct visibility from the engineering and compute allocation teams.
Attached are screenshots of my support interaction detailing the problem. I look forward to hearing if others are facing this SLA constraint.
