I’m trying to build a solution that accomplishes the following:
Passes text files from a GCS bucket to the embeddings API (I think the files will need to be chunked first? Not sure.)
Saves the returned embeddings into a .json file in the same GCS bucket
Loads the .json file into Vector search
Allows me to have multi-turn conversations with my data
So I guess the first question is, are the steps I’ve listed above the appropriate steps to build a RAG solution from data in a GCS bucket?
I’ve gone through several notebooks on the Google Gen AI GitHub repo. I can get those to work just fine, but I can’t seem to get anywhere when I attempt to customize them to accomplish what I’ve listed above. Is anyone aware of any good step-by-step documentation or code samples that perform what I’m trying to do?
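For step 1, the chunking part can be done with plain Python before any API calls. Below is a minimal sketch of fixed-size chunking with overlap; the chunk size, overlap, and the embedding/JSONL details in the comments are assumptions to adapt to your model's token limits and the Vector Search input format, not a definitive recipe:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks so each stays within the embedding model's input limit."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# With real GCP credentials you would then embed each chunk, e.g. (hypothetical model name):
#   from vertexai.language_models import TextEmbeddingModel
#   model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
#   vectors = [model.get_embeddings([c])[0].values for c in chunks]
#
# Vector Search ingests JSON Lines, one record per chunk, roughly:
#   {"id": "doc1-chunk0", "embedding": [0.0123, ...]}
```

The overlap keeps sentences that straddle a chunk boundary from being lost to both chunks; smarter splitting (on paragraphs or sentences) usually retrieves better than fixed character windows.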
Hi Jason – Google Cloud has a couple of different offerings for building a RAG app. Based on your description above, Vertex AI Search & Conversation (VASC) might be a good pick. (This product previously went by a few other names: Discovery Engine, Gen AI App Builder, Enterprise Search.)
First, you can create a VASC Data Store. You give VASC a Cloud Storage bucket with your text files. No need to pre-chunk yourself. VASC will take care of processing + creating the underlying vector embeddings in your Data Store. (Docs)
Then you can write code to query your VASC Data Store with regular text queries. This is the prompt augmentation step for RAG.
Once you get back search results and augment your prompt, you can then use the Gemini API on Vertex, for the multi turn chat. (Docs)
Here’s a code example in Java Spring, from some recent experimentation I did.
import com.google.cloud.discoveryengine.v1.SearchRequest;
import com.google.cloud.discoveryengine.v1.SearchResponse;
import com.google.cloud.discoveryengine.v1.SearchServiceClient;
import com.google.cloud.discoveryengine.v1.SearchServiceSettings;
import com.google.cloud.discoveryengine.v1.ServingConfigName;
import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.api.GenerateContentResponse;
import com.google.cloud.vertexai.generativeai.preview.ChatSession;
import com.google.cloud.vertexai.generativeai.preview.GenerativeModel;
import com.google.cloud.vertexai.generativeai.preview.ResponseHandler;
import com.google.protobuf.ListValue;
import com.google.protobuf.Struct;
import com.google.protobuf.Value;
import java.util.Map;
...
@PostMapping(value = "/chat", consumes = "application/json", produces = "application/json")
public ChatMessage message(@RequestBody ChatMessage message) {
    String userPrompt = message.getPrompt();
    logger.info("POST /chat, prompt: " + userPrompt);

    // 1 - Query Vertex AI Search (VASC, aka the discoveryengine API) for matching documents
    String projectId = "YOUR_PROJECT_ID";
    String location = "global";
    String collectionId = "default_collection";
    String dataStoreId = "YOUR_VASC_DATASTORE";
    String servingConfigId = "default_search";
    String searchQuery = userPrompt;
    // The "global" location uses the endpoint with no regional prefix;
    // regional data stores prepend the region, e.g. "us-discoveryengine.googleapis.com:443".
    String endpoint = "discoveryengine.googleapis.com:443";
    String augment = "";
    try {
        SearchServiceSettings settings = SearchServiceSettings.newBuilder().setEndpoint(endpoint).build();
        // try-with-resources so the client is closed when we're done
        try (SearchServiceClient searchServiceClient = SearchServiceClient.create(settings)) {
            SearchRequest request = SearchRequest.newBuilder()
                .setServingConfig(
                    ServingConfigName.formatProjectLocationCollectionDataStoreServingConfigName(
                        projectId, location, collectionId, dataStoreId, servingConfigId))
                .setQuery(searchQuery)
                .setPageSize(10)
                .build();
            SearchResponse response = searchServiceClient.search(request).getPage().getResponse();
            // Concatenate the first extractive answer from each result into the grounding text
            for (SearchResponse.SearchResult element : response.getResultsList()) {
                Struct derivedStructData = element.getDocument().getDerivedStructData();
                Map<String, Value> fields = derivedStructData.getFieldsMap();
                Value extractiveAnswersValue = fields.get("extractive_answers");
                ListValue listValue = extractiveAnswersValue.getListValue();
                Value firstValue = listValue.getValues(0);
                Struct structValue = firstValue.getStructValue();
                Map<String, Value> innerFields = structValue.getFieldsMap();
                Value contentValue = innerFields.get("content");
                augment += contentValue.getStringValue();
            }
        }
    } catch (Exception e) {
        logger.error("⚠️ Vertex AI ERROR: " + e);
    }

    // 2 - Use the augmented prompt to query Gemini (Vertex AI API)
    String geminiPrompt = "You are a helpful car manual chatbot. Answer the car owner's question about their car. Human prompt: "
            + userPrompt
            + ",\n Use the following grounding data as context. This came from the relevant vehicle owner's manual: "
            + augment;
    logger.info("GEMINI PROMPT: " + geminiPrompt);
    String geminiLocation = "us-central1";
    String modelName = "gemini-pro";
    try {
        VertexAI vertexAI = new VertexAI(projectId, geminiLocation);
        GenerativeModel model = new GenerativeModel(modelName, vertexAI);
        ChatSession chatSession = new ChatSession(model);
        GenerateContentResponse response = chatSession.sendMessage(geminiPrompt);
        String strResp = ResponseHandler.getText(response);
        logger.info("GEMINI RESPONSE: " + strResp);
        message.setResponse(strResp);
    } catch (Exception e) {
        logger.error("⚠️ GEMINI ERROR: " + e);
    }
    return message;
}
}
Could anyone steer me in the right direction? The moment I try to sort on a numeric field with order_by, the search results go haywire: everything seems to be returned, in no particular order. modelYear below is a numeric field. In the sample below I also tried boosting, but that likewise influences everything.
# Refer to the `SearchRequest` reference for all supported fields:
# https://cloud.google.com/python/docs/reference/discoveryengine/latest/google.cloud.discoveryengine_v1.types.SearchRequest
request = discoveryengine.SearchRequest(
    serving_config=serving_config,
    query=search_query,
    page_size=10,
    content_search_spec=content_search_spec,
    query_expansion_spec=discoveryengine.SearchRequest.QueryExpansionSpec(
        condition=discoveryengine.SearchRequest.QueryExpansionSpec.Condition.DISABLED,
    ),
    spell_correction_spec=discoveryengine.SearchRequest.SpellCorrectionSpec(
        mode=discoveryengine.SearchRequest.SpellCorrectionSpec.Mode.MODE_UNSPECIFIED,
    ),
    # Optional: Boost search results based on conditions
    boost_spec=discoveryengine.SearchRequest.BoostSpec(
        condition_boost_specs=[
            discoveryengine.SearchRequest.BoostSpec.ConditionBoostSpec(
                condition="modelYear = 2023",
                boost=1,
            ),
        ]
    ),
    # Optional: Use fine-tuned model for this request
    # custom_fine_tuning_spec=discoveryengine.CustomFineTuningSpec(
    #     enable_search_adaptor=True
    # ),
    # order_by="modelYear desc",
    # filter="modelYear = 2025",
)