Optimise AI training cost and reduce storage QPS via GCSFuse async prefetch
For AI/ML engineers and data scientists, the speed at which models can ingest data is a primary determinant of training efficiency. GCSFuse serves as a critical bridge, allowing users to work with massive datasets in Google Cloud Storage (GCS) as if they were on a local filesystem. By providing high-performance, cost-effective access to these buckets, GCSFuse is foundational for running training, serving, and checkpointing workloads at scale.
With the async metadata prefetch feature of GCSFuse, customers can make AI training:
- Faster: Save up to 20ms on metadata calls and improve execution times by up to 48%.
- Cost efficient: Reduce metadata read cost by up to 108x by issuing up to 200x fewer queries per second (QPS) to GCS.
The bottleneck: Metadata lookup latency
In data-heavy AI/ML pipelines, the performance of the filesystem is often limited not by throughput, but by metadata lookup latency. Every time a workload accesses a file, a “stat” operation is required to confirm file type and metadata. Historically, these individual, blocking network calls to GCS have been costly and slow, with latencies ranging from 20–100ms.
When these workloads are dealing with billions of files (as seen in TensorFlow datasets or massive Hugging Face datasets), these micro-delays compound into significant “read stalls.” Furthermore, traditional caching mechanisms often suffer from “cold starts” or frequent evictions, forcing the system back to the network for every operation.
Previously, GCSFuse relied on loading metadata for the entire storage bucket at mount time. While this helps with metadata latency, it has two fundamental problems:
- Signal-to-noise ratio: Customers often keep additional data (noise) alongside the core training data (signal) in the same bucket. Loading metadata for that additional data bloats the metadata cache.
- Inconsistency under data drift: Because metadata was loaded at mount time, when data changes over time systems must either lower the time to live (TTL) or fetch metadata for the entire bucket again. There is no way to trade off metadata cache size against TTL so that only active datasets are refreshed with an aggressive TTL.
Optimising metadata lookup
Async metadata prefetch introduces a background mechanism that proactively batch-retrieves directory metadata, triggered the first time a directory is accessed. It scales better than the earlier bucket-wide approach because it targets only the active directories within a bucket and can refresh cache entries dynamically while the workload runs.
For each file operation (for example Read, Write, or Stat), GCSFuse needs a metadata lookup, handled by FUSE’s LookUpInode call. Serving that lookup takes either two StatObject calls or one StatObject plus one ListObjects/GetFolder network call to GCS. The results can then be cached to serve later requests. At the directory level this can be optimised: a single ListObjects call for the directory replaces the two calls otherwise made for each object in it.
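The difference in call volume can be sketched with a small counting model. This is an illustration rather than GCSFuse's actual implementation: the function names are hypothetical, and the page size of 5000 entries per list call is an assumption taken from the parameter notes later in this post.

```python
import math

# Assumed number of entries returned per list call (see the
# metadata-prefetch-entries-limit notes; treated here as an assumption).
LIST_PAGE_SIZE = 5000


def calls_without_prefetch(num_objects: int, calls_per_lookup: int = 2) -> int:
    """Each object lookup issues its own metadata calls (two, per the text)."""
    return num_objects * calls_per_lookup


def calls_with_prefetch(num_objects: int, page_size: int = LIST_PAGE_SIZE) -> int:
    """One list call per page of directory entries, and at least one call."""
    return max(1, math.ceil(num_objects / page_size))


if __name__ == "__main__":
    for n in (100, 5000, 12000):
        print(n, calls_without_prefetch(n), calls_with_prefetch(n))
```

Under these assumptions, a directory of 100 objects needs 200 per-object calls but only a single list call.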
By moving to directory-based prefetch during the first access, async prefetch provides the following benefits:
1. Performance
By proactively batch-retrieving metadata for active directories only, async prefetch removes the need for individual, blocking network calls. Testing shows substantial execution-time reductions that translate directly into faster training iterations. In FIO-based benchmarks reading 10K 1 MiB files across bucket types, read performance improved by up to 48%, which corresponds to roughly a third less wall-clock read time (for example, 9m 15s down to 6m 13s).
| Bucket Structure | Read Time (W/O Prefetch) | Read Time (W/ Prefetch) | Improvement (%) |
|---|---|---|---|
| Flat Bucket | 13m 53s | 9m 28s | 46% |
| Flat (Implicit Dirs) | 9m 15s | 6m 13s | 48% |
| HNS (Hierarchical) | 8m 49s | 7m 16s | 21% |
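The Improvement column appears to be the ratio of the two read times (time without prefetch divided by time with prefetch, minus one), not the share of time removed. The sketch below recomputes both quantities from the table. The helper names are illustrative, and the reading of how the column was derived is an assumption based on the arithmetic.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a string such as '13m 53s' into seconds."""
    match = re.fullmatch(r"(\d+)m\s+(\d+)s", duration.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


def speedup_percent(without: str, with_prefetch: str) -> float:
    """Gain in read speed: (old time / new time - 1) * 100."""
    return (to_seconds(without) / to_seconds(with_prefetch) - 1) * 100


def time_saved_percent(without: str, with_prefetch: str) -> float:
    """Share of the original wall-clock time that is removed, in percent."""
    return (1 - to_seconds(with_prefetch) / to_seconds(without)) * 100


ROWS = {
    "Flat Bucket": ("13m 53s", "9m 28s"),
    "Flat (Implicit Dirs)": ("9m 15s", "6m 13s"),
    "HNS (Hierarchical)": ("8m 49s", "7m 16s"),
}

if __name__ == "__main__":
    for name, (before, after) in ROWS.items():
        print(
            f"{name}: speedup {speedup_percent(before, after):.1f}%, "
            f"time saved {time_saved_percent(before, after):.1f}%"
        )
```

Truncated to whole percentages, the speedup values match the table (46, 48, and 21), while the wall-clock time saved is lower in each case.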
2. Cost efficiency
Metadata operations incur API costs. A single ListObjects call, when used for batching, is significantly more efficient than issuing thousands of individual GetObjectMetadata requests. For high-volume customers, this shift can lead to massive cost reductions.
Cost savings analysis (assuming 100 files in an active directory), using the following prices:
- Flat bucket: $0.005 per 1,000 Class A operations and $0.0004 per 1,000 Class B operations.
- HNS bucket: $0.0065 per 1,000 Class A operations and $0.0005 per 1,000 Class B operations.
| Scenario (100 files per directory) | API Pattern | Estimated Cost | Notes |
|---|---|---|---|
| Prefetch Enabled | 1 List call (Class A) | $0.005 x 10⁻³ | - |
| Prefetch Disabled (HNS) | 100 x 2 GetMetadata (Class B) | $0.1 x 10⁻³ | 20x more expensive |
| Prefetch Disabled (Implicit Dirs on flat namespace) | 100 GetMetadata (Class B) + 100 ListObjects (Class A) | $0.54 x 10⁻³ | 108x more expensive |
Enabling async metadata prefetch can therefore cut metadata cost by up to 108x and metadata QPS by up to 200x (one list call in place of 200 per-object calls).
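The figures in the table can be reproduced with a few lines of arithmetic. This sketch uses only the prices and call patterns listed above. Like the table, it prices the single prefetch list call at the flat-bucket Class A rate, which is an assumption carried over from the table, not a separate claim.

```python
# Prices in USD per 1,000 operations, as listed above.
FLAT_CLASS_A = 0.005
FLAT_CLASS_B = 0.0004
HNS_CLASS_B = 0.0005

FILES_PER_DIRECTORY = 100


def cost(operations: int, price_per_1000: float) -> float:
    """Cost in USD of a number of operations at a per-1,000 price."""
    return operations * price_per_1000 / 1000


# Prefetch enabled: one list call (Class A) covers the whole directory.
prefetch = cost(1, FLAT_CLASS_A)

# Prefetch disabled on HNS: two Class B metadata calls per file.
hns_disabled = cost(2 * FILES_PER_DIRECTORY, HNS_CLASS_B)

# Prefetch disabled with implicit dirs: one Class B and one Class A call per file.
implicit_disabled = cost(FILES_PER_DIRECTORY, FLAT_CLASS_B) + cost(
    FILES_PER_DIRECTORY, FLAT_CLASS_A
)

if __name__ == "__main__":
    print(f"HNS ratio: {hns_disabled / prefetch:.0f}x")
    print(f"Implicit dirs ratio: {implicit_disabled / prefetch:.0f}x")
    print(f"Call ratio: {2 * FILES_PER_DIRECTORY}x")
```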
3. Scalability
As training clusters grow to hundreds of nodes, the system must handle massive concurrency without succumbing to “list storms” or Out-of-Memory (OOM) risks. Modern metadata optimisation includes a “Targeted Mode” and global concurrency throttling, so that the system remains stable even under extreme metadata density.
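Global concurrency throttling can be illustrated with a bounded worker pool. The sketch below is not GCSFuse's implementation: `fake_list_directory` is a hypothetical stand-in for a list call, and the pool size of 10 simply mirrors the default of the metadata-prefetch-max-workers parameter described later in this post.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 10  # mirrors the documented default; an assumption for this sketch


class ConcurrencyProbe:
    """Records the highest number of simultaneous list calls observed."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self) -> "ConcurrencyProbe":
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc) -> bool:
        with self._lock:
            self.active -= 1
        return False


def fake_list_directory(directory: str, probe: ConcurrencyProbe) -> list:
    """Hypothetical stand-in for one list call; returns made-up entries."""
    with probe:
        time.sleep(0.01)  # simulate network latency
        return [f"{directory}/file-{i}" for i in range(3)]


def prefetch_directories(directories, max_workers: int = MAX_WORKERS):
    """Prefetch every directory, never running more than max_workers at once."""
    probe = ConcurrencyProbe()
    cache = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            d: pool.submit(fake_list_directory, d, probe) for d in directories
        }
        for directory, future in futures.items():
            cache[directory] = future.result()
    return cache, probe.peak


if __name__ == "__main__":
    cache, peak = prefetch_directories([f"dir-{i}" for i in range(50)])
    print(len(cache), "directories cached, peak concurrency", peak)
```

Because the pool is shared across all directories, a burst of first accesses queues up behind the worker limit instead of issuing every list call at once.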
Control the behaviour of async metadata prefetch
Customers can control the behaviour of async metadata prefetch by adjusting the following parameters when mounting storage buckets with GCSFuse:
| Parameter | Notes | Possible values | Default value |
|---|---|---|---|
| enable-metadata-prefetch | Enables async prefetch of metadata. | Boolean value: true, false. | false |
| metadata-prefetch-entries-limit | Specifies the maximum metadata entries to prefetch per directory. Values > 5000 result in multiple sequential Cloud Storage list calls. | Integer between -1 and 2147483647. Set to -1 to prefetch all entries. | 5000 |
| metadata-prefetch-max-workers | The maximum number of concurrent background workers allowed to perform metadata prefetching across all directories. | Integer between -1 and 2147483647. Set to -1 for unlimited workers. | 10 |
Sample mount configuration
Async metadata prefetch is available in GCSFuse versions above 3.8.0 and starting with GKE version 1.35.2-gke.1852000. To get started, refer to the documentation on how to mount a GCS bucket using GCSFuse.
gcsfuse Command
gcsfuse --enable-metadata-prefetch=true my-bucket /path/to/mount
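To set the tuning parameters as well, a small helper can validate the values against the documented ranges and assemble the command line. This is a sketch under one assumption: that metadata-prefetch-entries-limit and metadata-prefetch-max-workers are passed in the same `--name=value` form as `--enable-metadata-prefetch`. The helper name is hypothetical, and it only builds the command without running it.

```python
import shlex

INT32_MAX = 2147483647  # upper bound given in the parameter table


def build_gcsfuse_command(
    bucket: str,
    mount_point: str,
    entries_limit: int = 5000,
    max_workers: int = 10,
) -> list:
    """Return the gcsfuse argument list with prefetch enabled.

    The flag spelling for the two tuning parameters is assumed, not confirmed.
    """
    checks = (
        ("metadata-prefetch-entries-limit", entries_limit),
        ("metadata-prefetch-max-workers", max_workers),
    )
    for name, value in checks:
        if not -1 <= value <= INT32_MAX:
            raise ValueError(f"{name} must be between -1 and {INT32_MAX}")
    return [
        "gcsfuse",
        "--enable-metadata-prefetch=true",
        f"--metadata-prefetch-entries-limit={entries_limit}",
        f"--metadata-prefetch-max-workers={max_workers}",
        bucket,
        mount_point,
    ]


if __name__ == "__main__":
    command = build_gcsfuse_command(
        "my-bucket", "/path/to/mount", entries_limit=10000, max_workers=20
    )
    print(shlex.join(command))
```

Returning a list rather than a single string keeps the arguments safe to pass to a process launcher without shell quoting issues.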
GKE Fuse CSI driver
mountOptions: "metadata-cache:enable-metadata-prefetch:true"
