In today’s data-driven world, Google Cloud Storage (GCS) buckets are more than just storage; they are repositories of immense potential. They’re filled with unstructured data—documents, images, audio, and videos. But for many organizations, this data is “dark.” Its semantic meaning, the rich context and information within the files, is locked away and inaccessible. We know the data is valuable, but we can’t efficiently answer “What’s in this image?” or “What’s the sentiment of this call?” at scale. How can you efficiently process, categorize, and act on this data at scale without building complex and costly infrastructure?
The answer lies in transforming your storage from a passive repository into an active, intelligent platform. Today, we’re showing how you can combine the power of Gemini models with a powerful GCS feature, Object Contexts, to make your unstructured data smart.
The “what”: Two key ingredients for smart storage
1. Gemini: The “understanding” engine
Gemini is Google’s multimodal model capable of understanding and processing information from virtually any input type. It can summarize a 50-page PDF, transcribe an audio interview, identify objects and brands in an image, or pull structured entities from a complex document. This is the “brain” that will look inside your objects and tell you what matters.
2. GCS Object Contexts: The “actionable” tags
Object Contexts are user-defined key-value pairs that you can attach to your GCS objects. Think of them as a mutable “sticky note” for your data.
Why is this a game-changer? Unlike standard object metadata, which is immutable and set at creation, Object Contexts can be added, updated, or deleted after an object is written, using a simple objects.patch API call.
This crucial difference means you can dynamically enrich an object as it moves through a pipeline, recording its processing state, analysis results, or new attributes without rewriting the entire object.
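To make that mutability concrete, here is a minimal in-memory sketch, plain Java with no cloud calls, of how successive patch-style updates evolve an object's contexts. The `ContextPatchSketch` class and its methods are purely illustrative stand-ins; a real pipeline would read and patch contexts through the GCS API.

```java
import java.util.TreeMap;

public class ContextPatchSketch {

    // In-memory stand-in for an object's contexts; a real pipeline would
    // read and patch these through the GCS API instead.
    private final TreeMap<String, String> contexts = new TreeMap<>();

    // Mimics an objects.patch-style update: set or overwrite a single
    // key-value pair without touching the rest of the object.
    public ContextPatchSketch patch(String key, String value) {
        contexts.put(key, value);
        return this;
    }

    public String get(String key) {
        return contexts.get(key);
    }
}
```

Each stage of a pipeline can record its own state (`patch("state", "processed")`) without disturbing keys written by earlier stages, which is exactly what makes Object Contexts useful for tracking progress.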
The “how”: A simple, powerful, event-driven pattern
You can build incredibly powerful enrichment pipelines with just a few serverless components. The core pattern is simple:
1. Ingest: A new file (e.g., product_image.jpg, invoice.pdf) lands in a GCS bucket.
2. Trigger: An Eventarc trigger (listening for google.cloud.storage.object.v1.finalized) fires a message.
3. Enrich: The event invokes a serverless service, such as Cloud Run or a Cloud Function, which:
   - Reads the new object from GCS.
   - Sends its content to the Gemini API with a specific prompt.
4. Annotate: The service receives the JSON response from Gemini and uses the objects.patch method to write the findings directly as Object Contexts on that object.
Here’s a diagram illustrating this flow:
Let’s look at some examples of the data flow:
- For an image (product_defect.jpg):
  - Gemini prompt: "Describe the main object in this image, identify any visible brands, and point out potential defects. Assign relevant tags. Respond in JSON."
  - Object Contexts:
    - product_type="smartphone"
    - brand="AcmeTech"
    - defect_detected="true"
    - defect_type="screen_scratch"
    - processed_image="true"
- For a text document (contract_draft.pdf):
  - Gemini prompt: "Extract the parties involved, effective date, and any clauses related to termination from this document. Respond in JSON."
  - Object Contexts:
    - party_a="GlobalCorp"
    - party_b="InnovateX"
    - effective_date="2024-03-15"
    - termination_clauses_exist="true"
    - processed_document="true"
Putting it all together: Code examples
Let’s look at the code for the “Enrich” (Step 3) and “Annotate” (Step 4) parts of our workflow, as well as how to filter on this data later. These examples use the Java client libraries.
1. The enrichment service (a Java Cloud Function)
This function is triggered by a GCS event. It downloads the file, sends it to Gemini, and updates the file’s contexts.
You would need these dependencies in your pom.xml:
<dependencies>
  <dependency>
    <groupId>com.google.cloud.functions</groupId>
    <artifactId>functions-framework-api</artifactId>
    <version>1.1.0</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>com.google.events</groupId>
    <artifactId>google-cloudevent-types</artifactId>
    <version>0.8.0</version>
  </dependency>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-storage</artifactId>
    <version>2.38.0</version>
  </dependency>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-vertexai</artifactId>
    <version>1.2.0</version>
  </dependency>
  <dependency>
    <groupId>org.json</groupId>
    <artifactId>json</artifactId>
    <version>20240303</version>
  </dependency>
</dependencies>
Here is the Java Cloud Function code:
package com.example;

import com.google.cloud.functions.CloudEventsFunction;
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.api.GenerateContentResponse;
import com.google.cloud.vertexai.generativeai.ContentMaker;
import com.google.cloud.vertexai.generativeai.GenerativeModel;
import com.google.cloud.vertexai.generativeai.PartMaker;
import com.google.cloud.vertexai.generativeai.ResponseHandler;
import com.google.events.cloud.storage.v1.StorageObjectData;
import io.cloudevents.CloudEvent;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;
import org.json.JSONObject;

public class GcsEnrichmentFunction implements CloudEventsFunction {

  private static final Logger logger = Logger.getLogger(GcsEnrichmentFunction.class.getName());

  private static final String PROJECT_ID = "your-gcp-project-id";
  private static final String LOCATION = "us-central1";
  private static final String MODEL_NAME = "gemini-1.5-pro-001";

  private static final String IMAGE_PROMPT = """
      Analyze this product image and provide the following details in a clean JSON format:
      - "product_type": What is the product?
      - "brand": What brand is visible? (or "unknown")
      - "defect_detected": (true/false) Is a defect visible?
      - "defect_type": (e.g., "scratch", "dent", "none")
      """;

  private final Storage storage;
  private final VertexAI vertexAI;

  public GcsEnrichmentFunction() {
    // Initialize clients once per instance; they are reused across invocations.
    storage = StorageOptions.getDefaultInstance().getService();
    vertexAI = new VertexAI(PROJECT_ID, LOCATION);
  }

  @Override
  public void accept(CloudEvent event) throws IOException {
    // Deserialize the event payload.
    StorageObjectData data = StorageObjectData.parseFrom(event.getData().toBytes());
    String bucketName = data.getBucket();
    String fileName = data.getName();
    String contentType = data.getContentType();
    BlobId blobId = BlobId.of(bucketName, fileName);

    // 1. Read the file from GCS.
    byte[] imageBytes = storage.readAllBytes(blobId);

    // 2. Call the Gemini API (the "Enrich" step).
    logger.info("Analyzing " + fileName + " with Gemini...");
    try {
      GenerativeModel model = new GenerativeModel(MODEL_NAME, vertexAI);
      GenerateContentResponse response = model.generateContent(
          ContentMaker.fromMultiModalData(
              PartMaker.fromMimeTypeAndData(contentType, imageBytes),
              IMAGE_PROMPT));

      // 3. Set Object Contexts (the "Annotate" step).
      // Gemini may wrap its answer in Markdown code fences; strip them first.
      String jsonResponseText = ResponseHandler.getText(response)
          .trim().replace("```json", "").replace("```", "");
      JSONObject analysisResults = new JSONObject(jsonResponseText);

      // Build the context map. Keys *must* start with "context."
      Map<String, String> newContexts = new HashMap<>();
      newContexts.put("context.product_type", analysisResults.optString("product_type", "unknown"));
      newContexts.put("context.brand", analysisResults.optString("brand", "unknown"));
      newContexts.put("context.defect_detected",
          String.valueOf(analysisResults.optBoolean("defect_detected", false)));
      newContexts.put("context.defect_type", analysisResults.optString("defect_type", "none"));
      newContexts.put("context.processed_image", "true");

      // The key API call: storage.update issues an objects.patch request.
      // Fetch the current blob metadata and update it with the new contexts.
      Blob blob = storage.get(blobId);
      BlobInfo updatedInfo = blob.toBuilder().setContexts(newContexts).build();
      storage.update(updatedInfo);

      logger.info("Successfully enriched " + fileName);
    } catch (Exception e) {
      logger.severe("Error processing " + fileName + ": " + e.getMessage());
    }
  }
}
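The chained `replace` calls used above to strip Markdown fences from Gemini's reply are brittle: they would also mangle any legitimate backtick runs inside the response. A slightly more defensive alternative, shown here as an illustrative helper rather than an SDK utility, removes only a leading fence line and a trailing fence:

```java
public class GeminiJsonCleaner {

    // Strips an optional leading ```json (or bare ```) fence line and an
    // optional trailing ``` fence, leaving the inner JSON untouched.
    public static String stripCodeFence(String raw) {
        String text = raw.trim();
        if (text.startsWith("```")) {
            int firstNewline = text.indexOf('\n');
            text = (firstNewline >= 0) ? text.substring(firstNewline + 1) : "";
        }
        if (text.endsWith("```")) {
            text = text.substring(0, text.length() - 3);
        }
        return text.trim();
    }
}
```

With a helper like this, the function body above would call `GeminiJsonCleaner.stripCodeFence(...)` before handing the text to `JSONObject`.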
2. The processing workflow (filtering by context)
Now, a separate process (e.g., a nightly batch job) can efficiently find only the objects that need attention. This Java application finds all images that Gemini flagged with a defect but which have not yet been reviewed.
package com.example.filter;

import com.google.api.gax.paging.Page;
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.util.Arrays;
import java.util.List;

public class FilterByContext {

  public static void getDefectiveImagesForReview(String bucketName) {
    Storage storage = StorageOptions.getDefaultInstance().getService();
    System.out.println("Finding defective images for QA review...");

    // Define our filters based on the documentation:
    // https://cloud.google.com/storage/docs/use-object-contexts#filter_objects_by_contexts

    // 1. We WANT objects where defect_detected is 'true'.
    List<String> includeFilters = Arrays.asList("defect_detected=true");

    // 2. We WANT TO SKIP objects that are already reviewed.
    List<String> excludeFilters = Arrays.asList("qa_reviewed=true");

    // 3. Make the API call using the filters.
    Page<Blob> blobs = storage.list(
        bucketName,
        Storage.BlobListOption.objectContextInclude(includeFilters),
        Storage.BlobListOption.objectContextExclude(excludeFilters));

    // This 'blobs' iterator now *only* contains the files that match
    // our exact criteria, without us downloading a full bucket listing.
    System.out.println("Found the following files for review:");
    int count = 0;
    for (Blob blob : blobs.iterateAll()) {
      count++;
      System.out.println("- " + blob.getName());
      // Next step: publish blob.getName() to a Pub/Sub topic
      // for a human review task.
    }
    if (count == 0) {
      System.out.println("No new defective images found for review.");
    }
  }

  public static void main(String[] args) {
    // Example usage:
    // getDefectiveImagesForReview("your-product-image-bucket");
  }
}
The “why”: Driving actions and efficient workflows
As the code shows, once your objects are enriched, you can drive powerful, efficient workflows at scale.
Use case 1: Efficient, resilient processing pipelines
Object Contexts are perfect for managing state. Instead of maintaining a separate database to track which files have been processed, the object itself holds its state. A video transcoding job can efficiently find files where context.transcribed="true" AND context.transcoded="false", skipping files that aren't ready or are already complete.
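The include/exclude semantics behind that kind of query can be pictured with a small in-memory stand-in, using the same `key=value` filter strings the listing API accepts. This sketch is purely illustrative; the real matching happens server-side in GCS.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ContextFilterSketch {

    private static Map<String, String> parse(List<String> pairs) {
        Map<String, String> map = new HashMap<>();
        for (String pair : pairs) {
            String[] kv = pair.split("=", 2);
            map.put(kv[0], kv[1]);
        }
        return map;
    }

    // True if the object's contexts satisfy every include filter and
    // none of the exclude filters, mirroring a list-with-filters call.
    public static boolean matches(List<String> objectContexts,
                                  List<String> includeFilters,
                                  List<String> excludeFilters) {
        Map<String, String> contexts = parse(objectContexts);
        boolean included = includeFilters.stream().allMatch(f -> {
            String[] kv = f.split("=", 2);
            return kv[1].equals(contexts.get(kv[0]));
        });
        boolean excluded = excludeFilters.stream().anyMatch(f -> {
            String[] kv = f.split("=", 2);
            return kv[1].equals(contexts.get(kv[0]));
        });
        return included && !excluded;
    }
}
```

A file tagged transcribed=true and transcoded=false matches the transcoding job's query; once a later stage patches transcoded=true, the same query skips it.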
Use case 2: Filter and act on relevant data
You can now filter objects based on their content and attributes to automate critical business processes.
- Security & Compliance: A DLP tool (or Gemini) scans documents and patches them with context.has_sensitive_data="true". You can then use Storage Transfer Service with an object context filter to automatically move all such files to a highly secure, restricted "quarantine" bucket.
- Customer Support Triage: Automatically identify high-priority support calls by filtering audio files where context.sentiment="negative" and context.topics="critical_issue".
- Product Quality Assurance: As shown in our code example, quickly find all images where context.defect_detected="true" and context.qa_reviewed != "true", and route them to a human review queue.
Use case 3: Create flexible, "no-code" enrichment services
The “Jarvis” demo pattern showcases a Cloud Run service that’s triggered by files landing in different folders (e.g., /images, /audio, /documents). The service intelligently selects a different prompt for Gemini based on the folder the file arrived in. This way, you can add new content analysis workflows (e.g., adding a /legal_contracts folder) simply by adding a new prompt file—no code change required to the processing service itself.
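A minimal sketch of that folder-to-prompt dispatch might look like the following. The folder names and prompt text here are hypothetical, and a production service would load prompts from files so that new workflows need no code change.

```java
import java.util.Map;

public class PromptSelector {

    // Hypothetical mapping from top-level folder to analysis prompt.
    private static final Map<String, String> PROMPTS_BY_FOLDER = Map.of(
            "images", "Describe the main object and any defects. Respond in JSON.",
            "audio", "Transcribe this audio and classify its sentiment. Respond in JSON.",
            "documents", "Extract parties, dates, and key clauses. Respond in JSON.");

    // Picks a prompt based on the object's top-level folder, e.g.
    // "images/product_defect.jpg" selects the image prompt.
    public static String promptFor(String objectName) {
        int slash = objectName.indexOf('/');
        String folder = (slash > 0) ? objectName.substring(0, slash) : "";
        return PROMPTS_BY_FOLDER.getOrDefault(folder,
                "Summarize this file's content. Respond in JSON.");
    }
}
```

Adding a new workflow, say a legal_contracts folder, then amounts to adding one more entry (or prompt file) rather than touching the processing service.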
Get started today
By combining Gemini’s deep understanding with the flexibility of GCS Object Contexts, you are no longer just storing files—you are creating a semantic, intelligent, and “aware” storage system. This pattern unlocks the full potential of your unstructured data, allowing you to automate processes, discover new insights, and build smarter applications faster.
- To learn more about Object Contexts, read the official documentation.
- To start building with Gemini, explore the Vertex AI documentation.


