Authors:
Anna Whelan, AI Strategic Cloud Engineer, Google Cloud
Kamal Kishore, Software Engineer, Google
The world of real-time analytics demands speed and efficiency, especially when dealing with massive datasets. To empower users with cutting-edge solutions, it’s crucial to understand the performance capabilities of core services like Google Cloud Feature Store under various conditions.
This post shares the results of a comprehensive benchmarking effort focused on key metrics like scalability, cost efficiency, and, most importantly, latency. We will explore how different networking endpoints and architectural configurations impact the Feature Store’s performance. This involves a detailed analysis of:
- Optimized Feature Store with a Public Endpoint: Evaluating performance with standard public network access.
- Optimized Feature Store with Private Service Connect (PSC): Assessing the benefits of secure, private connectivity.
- Direct Read from Bigtable: Measuring the new capability that reads feature data directly from Bigtable for enhanced performance.
Under the hood: Engineering the benchmarking solution
To accurately measure performance, we built a scalable and configurable benchmarking framework using Google Cloud’s native services. The solution is designed to simulate a real-world, high-throughput scenario.
The benchmarking architecture
The core of our solution is an Apache Beam pipeline that runs on Dataflow. This pipeline is responsible for generating a massive load of read requests against the Feature Store. Here’s how the components work together:
- Source Data: A text file in Google Cloud Storage (GCS) contains millions of entity IDs, which act as the keys for our feature lookups.
- Dataflow Pipeline: The pipeline reads these IDs, divides them across many workers, and makes parallel calls to the Feature Store’s online serving endpoint.
- Feature Store: This is the service we are benchmarking. We test instances with different endpoint configurations (Public, Private Service Connect).
- BigQuery: For every feature lookup, the Dataflow pipeline logs key metrics—most importantly, the end-to-end latency—into a BigQuery table for later analysis.
This architecture allows us to generate significant load and capture precise performance data for every single request.
Configuring the Dataflow pipeline
With the architecture in place, the first step is to define the pipeline and its configuration options. Using a custom PipelineOptions
interface in Apache Beam allows us to easily parameterize the job, making it simple to switch between different test configurations (like public vs. private endpoints) without changing the code.
Here are the custom options we defined for our benchmarking pipeline:
public interface BenchmarkFS extends PipelineOptions {

  @Description("The GCP project ID")
  @Default.String("gcp-project-id")
  String getProject();
  void setProject(String project);

  @Description("Feature Store ID")
  @Default.String("feature-store-id")
  String getFeaturestoreId();
  void setFeaturestoreId(String fsId);

  @Description("A file containing entity IDs")
  @Default.String("gs://bucket/path/to/entity_ids.txt")
  String getEntityIdFile();
  void setEntityIdFile(String file);

  @Description("Is this a private endpoint?")
  @Default.Boolean(false)
  Boolean getIsPrivateEndpoint();
  void setIsPrivateEndpoint(Boolean isPrivate);

  @Description("The private endpoint address")
  @Default.String("10.11.12.13:443")
  String getEndpointAddr();
  void setEndpointAddr(String addr);

  @Description("BQ Table to write results")
  @Default.String("project.dataset.table_name")
  String getBqResultsTable();
  void setBqResultsTable(String table);
}
The main pipeline code then uses these options to build and run the job. The process starts by reading the entity IDs line-by-line from the source text file in Google Cloud Storage using Beam’s TextIO.read()
transform. Each entity ID is then passed downstream to be processed by the core logic of our pipeline.
public static void main(String[] args) {

  BenchmarkFS options = PipelineOptionsFactory.fromArgs(args)
      .withValidation()
      .as(BenchmarkFS.class);

  Pipeline p = Pipeline.create(options);

  // Read entity IDs from a GCS file and pass them to the main processing DoFn
  p.apply("ReadEntityIds", TextIO.read().from(options.getEntityIdFile()))
      .apply("FetchFeatures", ParDo.of(new FetchFeatures(options)));

  p.run();
}
The core logic: Fetching features and measuring latency
The heart of the benchmark is a custom DoFn (a Beam processing function) called FetchFeatures. Each worker in the Dataflow job runs this function. For every entity ID it receives, this DoFn is responsible for connecting to the Feature Store, sending a read request, and logging the performance.
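The snippets below reference a few fields of FetchFeatures (the client, the endpoint settings, and the feature view to read) without showing the class declaration itself. Here is a minimal sketch of what that skeleton could look like; the hard-coded region, the featureViewName field, and the "benchmark_view" feature view ID are illustrative assumptions rather than values from the original pipeline:
// A minimal sketch of the FetchFeatures skeleton that the following snippets assume.
// Plain values are copied out of the options at graph-construction time so the DoFn
// stays serializable; "benchmark_view" and the hard-coded region are placeholders.
public class FetchFeatures extends DoFn<String, TableRow> {

  private final Boolean isPrivateEndpoint;
  private final String endpointAddr;
  private final String region = "us-central1"; // keep in sync with the --region flag
  private final String featureViewName;        // full FeatureView resource name
  private transient FeatureOnlineStoreServiceClient client;

  public FetchFeatures(BenchmarkFS options) {
    this.isPrivateEndpoint = options.getIsPrivateEndpoint();
    this.endpointAddr = options.getEndpointAddr();
    this.featureViewName = String.format(
        "projects/%s/locations/%s/featureOnlineStores/%s/featureViews/%s",
        options.getProject(), region, options.getFeaturestoreId(), "benchmark_view");
  }

  // The @Setup, @ProcessElement, and @Teardown methods are shown below.
}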
First, in the @Setup
method, which runs once per DoFn instance, we initialize the FeatureOnlineStoreServiceClient. This is crucial for efficiency, as it prevents creating a new client for every single request. This is also where we implement the logic to connect to either a public endpoint or a private one using Private Service Connect (PSC).
@Setup
public void setup() throws IOException {
  // This logic creates the correct client based on the pipeline options
  if (isPrivateEndpoint) {
    // Connect to the private IP and port
    ManagedChannel channel =
        ManagedChannelBuilder.forTarget(endpointAddr).usePlaintext().build();
    FeatureOnlineStoreServiceSettings settings =
        FeatureOnlineStoreServiceSettings.newBuilder()
            .setTransportChannelProvider(
                FixedTransportChannelProvider.create(GrpcTransportChannel.create(channel)))
            .setCredentialsProvider(NoCredentialsProvider.create()) // PSC handles auth
            .build();
    this.client = FeatureOnlineStoreServiceClient.create(settings);
  } else {
    // Connect to the regional public endpoint
    String apiEndpoint = region + "-aiplatform.googleapis.com:443";
    FeatureOnlineStoreServiceSettings settings =
        FeatureOnlineStoreServiceSettings.newBuilder().setEndpoint(apiEndpoint).build();
    this.client = FeatureOnlineStoreServiceClient.create(settings);
  }
}
Next, the @ProcessElement
method executes for each entity ID. The logic is straightforward:
- Record the start time.
- Build a FetchFeatureValuesRequest for the given entity ID.
- Execute the request inside a try-catch block.
- Calculate the end-to-end latency in milliseconds.
- Create a BigQuery TableRow object containing the latency, the entity ID, and a timestamp.
- Output the TableRow for the final stage of the pipeline.
@ProcessElement
public void processElement(ProcessContext c) {
  String entityId = c.element();
  long startTime = System.currentTimeMillis();

  // Build the request object for the Feature Store online serving API.
  // featureViewName is the full FeatureView resource name, assembled from the pipeline options.
  FetchFeatureValuesRequest request =
      FetchFeatureValuesRequest.newBuilder()
          .setFeatureView(featureViewName)
          .setDataKey(FeatureViewDataKey.newBuilder().setKey(entityId).build())
          .build();

  try {
    // Make the actual API call to fetch features
    client.fetchFeatureValues(request);
  } catch (Exception e) {
    // Log or count failed lookups as needed
  } finally {
    // Calculate latency and create a BQ row
    long latencyInMillis = System.currentTimeMillis() - startTime;
    TableRow row = new TableRow()
        .set("entity_id", entityId)
        .set("latency", latencyInMillis)
        .set("timestamp", Instant.now().toString());
    c.output(row);
  }
}
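Although not shown in the original pipeline, it is also good practice to release the client when Dataflow retires a DoFn instance. A small sketch using Beam's @Teardown hook could look like this:
@Teardown
public void teardown() {
  // Release the underlying gRPC resources when this DoFn instance is discarded.
  if (client != null) {
    client.close();
  }
}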
Finally, the main pipeline takes the PCollection<TableRow> produced by this DoFn and streams the results directly into our BigQuery results table using BigQueryIO.
// (Continuing from the main method...)
p.apply("ReadEntityIds", TextIO.read().from(options.getEntityIdFile()))
.apply("FetchFeatures", ParDo.of(new FetchFeatures(options)))
.apply("WriteToBQ", BigQueryIO.writeTableRows().to(options.getBqResultsTable())
.withCreateDisposition(CreateDisposition.CREATE_NEVER)
.withWriteDisposition(WriteDisposition.WRITE_APPEND));
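One note on the write step: because the sample uses CreateDisposition.CREATE_NEVER, the results table must already exist in BigQuery with a matching schema. If you would rather let the pipeline create it on first run, a minimal alternative sketch (not part of the original pipeline) is to define the schema with the TableSchema and TableFieldSchema model classes from com.google.api.services.bigquery.model and switch the create disposition:
// Alternative sketch: let the pipeline create the results table on first run.
// The field names and types mirror the TableRow built in processElement.
TableSchema resultsSchema = new TableSchema().setFields(Arrays.asList(
    new TableFieldSchema().setName("entity_id").setType("STRING"),
    new TableFieldSchema().setName("latency").setType("INTEGER"),    // milliseconds
    new TableFieldSchema().setName("timestamp").setType("TIMESTAMP")));

p.apply("ReadEntityIds", TextIO.read().from(options.getEntityIdFile()))
    .apply("FetchFeatures", ParDo.of(new FetchFeatures(options)))
    .apply("WriteToBQ", BigQueryIO.writeTableRows().to(options.getBqResultsTable())
        .withSchema(resultsSchema)
        .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(WriteDisposition.WRITE_APPEND));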
From execution to analysis
This section covers the final steps: launching the Dataflow job and querying the results in BigQuery to understand the performance characteristics.
Running the benchmark job
You can execute the pipeline from your local machine using a Maven command. The command passes all the configuration parameters we defined in the BenchmarkFS
options interface, such as the project ID, GCS file path, and target BigQuery table.
To run the job, use the following command structure, filling in your specific project details:
mvn compile exec:java \
-Dexec.mainClass=com.example.BenchmarkFS \
-Dexec.args=" \
--runner=DataflowRunner \
--project=your-gcp-project \
--region=us-central1 \
--featurestoreId=your_fs_id \
--entityIdFile=gs://your-bucket/path/to/entity_ids.txt \
--bqResultsTable=your-gcp-project:your_dataset.results_table \
--isPrivateEndpoint=false \
--tempLocation=gs://your-bucket/temp/ \
--workerMachineType=n2-standard-4 \
--maxNumWorkers=50"
To test with a private endpoint, you would simply change the flags to --isPrivateEndpoint=true and provide the PSC address with --endpointAddr=10.11.12.13:443.
Analyzing the results in BigQuery
Once the Dataflow job completes, the results_table
in BigQuery will contain millions of rows, each logging the latency for a single feature lookup. Now, you can run SQL queries to aggregate this data and calculate meaningful performance statistics.
While an average latency is useful, percentile latencies (p50, p90, p95, p99) provide a much deeper understanding of the user experience. The following query uses BigQuery’s APPROX_QUANTILES
function to efficiently calculate these values:
SELECT
-- Calculate latency percentiles. p99 is the latency that 99% of requests fall under.
APPROX_QUANTILES(latency, 100)[OFFSET(50)] AS p50_latency_ms,
APPROX_QUANTILES(latency, 100)[OFFSET(90)] AS p90_latency_ms,
APPROX_QUANTILES(latency, 100)[OFFSET(95)] AS p95_latency_ms,
APPROX_QUANTILES(latency, 100)[OFFSET(99)] AS p99_latency_ms
FROM
`your-gcp-project.your_dataset.results_table`
Running this query for each benchmarked configuration (Public, PSC, etc.) will give you the precise data needed to compare their performance and make an informed architectural decision.
Conclusion: Key takeaways
This guide provides a complete, reusable framework for benchmarking the Google Cloud Feature Store at scale. By leveraging Dataflow, you can simulate high-throughput workloads and use BigQuery to capture and analyze granular performance data, allowing you to make informed, data-driven decisions for your MLOps platform.
The key takeaway from our own internal benchmarking is that the direct-read Feature Store (backed by Bigtable) offers a transformative leap in performance, consistently delivering the lowest latencies required for demanding, real-time applications.
You now have the tools to validate these findings for yourself. By using the provided pipeline code and analysis queries, you can test different endpoint configurations and machine types to find the optimal balance of performance and cost for your unique use case. In the world of real-time AI, this ability to precisely measure and optimize feature serving is not just an advantage—it’s a necessity.