Hi Google Cloud SQL Team & Community,
We are using Cloud SQL for PostgreSQL and rely heavily on its capabilities. We have a use case where we need to efficiently calculate approximate quantiles (like median, percentiles, and potentially averages based on unique underlying entities) directly within our database queries.
Our Scenario:
We aggregate user feedback data (e.g., ratings) which is linked to interaction spans and organized into hierarchical clusters. A key requirement is to aggregate metrics up the hierarchy, ensuring each unique interaction span contributes only once. While the supported postgresql-hll extension works perfectly for calculating unique counts (like the volume of unique spans) efficiently using HLL sketches, we lack a comparable solution for value-based aggregations like average ratings or percentile distributions that respect this uniqueness rule at scale.
The Need:
Performing exact calculations (percentile_cont, etc.) or joins that retain unique IDs across the hierarchy are often too slow or resource-intensive due to the scale of the data. We need an efficient, in-database method for approximate quantile/distribution analysis, similar to how HLL handles counts.
Feature Request:
We request that Google Cloud consider adding support for one or both of the following well-established PostgreSQL extensions to Cloud SQL for PostgreSQL:
- Apache DataSketches (datasketches-postgresql): Specifically, the KLL sketch functionality within this library provides mergeable, approximate quantile sketches. This would align well with the existing HLL support.
 - t-digest (tvondra/tdigest): Another popular extension offering mergeable sketches for approximate quantiles and rank-based statistics.
 
Why in Cloud SQL?
Having these capabilities directly within Cloud SQL would allow us to perform these complex aggregations efficiently without resorting to:
- Moving large datasets out of Cloud SQL for external processing.
 - Implementing significantly slower, less accurate approximations within standard SQL.
 - Migrating to self-managed PostgreSQL on Compute Engine solely to install these extensions (which adds considerable operational overhead).
 
Adding support for robust approximate quantile sketching extensions like DataSketches KLL or t-digest would significantly enhance Cloud SQL’s capabilities for analytical workloads dealing with large datasets and complex aggregation needs.
Could the Cloud SQL team investigate the feasibility of supporting these extensions?
Thank you!