This tip is a follow-up to the earlier tip linked below.
Integrate AppSheet with Google Gemini AI API using… - Google Cloud Community
We can analyze images with the Google Gemini AI API.
Use Cases
The tip demonstrates the following use cases:
1. Taking a photo of an instrument and extracting the readings into an AppSheet table record field: the sample app demonstrates photographing electrical energy meters and converting the reading into an AppSheet record field. This can be extended to similar use cases on the factory floor, on construction sites, or in any comparable outdoor application.
2. OCR functionality: one can take a photo of a business card or a document and convert its contents into text to be stored in an AppSheet record.
Google Gemini Model Used and Pricing
I used the Google Gemini 1.5 Pro API free tier along with Google Apps Script.
Please note the Gemini 1.5 Pro pricing given in the link below.
Implementation and acknowledgements:
I have mainly extended the earlier tip that analyzed textual data. This time we use the Google Gemini API to analyze images.
Here is the link to the earlier tip:
Integrate AppSheet with Google Gemini AI API using… - Google Cloud Community
I have mainly used the Google Apps Script code given in the Google codelab mentioned below. Please take a look at sections 4 and 5. I have made minor modifications to the code so that it reads inputs from an AppSheet app and returns the results back to AppSheet.
Automate Google Workspace tasks with the Gemini API | Google Codelabs
For getting the token count of the inline data (prompt plus image in an AppSheet record), I referred to and adapted code from the following article by the Google Apps Script expert Tanaike.
The images in the sample app are from Google image search and the business card is based on a sample in Microsoft Word.
The GAS Code:
const properties = PropertiesService.getScriptProperties().getProperties();
const geminiApiKey = properties['GOOGLE_API_KEY'];
const geminiProVisionEndpoint = `https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro:generateContent?key=${geminiApiKey}`;

// The function takes inputs in two modalities: the prompt (the text input from
// the AppSheet column [Instruction]) and the image from an AppSheet app record.
// The image is base64-encoded into the constant imageData.
function callGeminiProVision(prompt, image, temperature = 0) {
  const imageData = Utilities.base64Encode(image.getAs('image/png').getBytes());
  const payload = {
    "contents": [
      {
        "parts": [
          { "text": prompt },
          {
            "inlineData": {
              "mimeType": "image/png",
              "data": imageData
            }
          }
        ]
      }
    ],
    "generationConfig": {
      "temperature": temperature
    }
  };

  // The code block below computes the total token count that the AI query
  // will have as an input. The query is bimodal: it has the prompt as text
  // input and the image input converted to a base64 string.
  const baseUrl = "https://generativelanguage.googleapis.com";
  const model = "models/gemini-1.5-pro";
  const version = "v1beta";
  const payload1 = { contents: [{ parts: [{ text: prompt + imageData }] }] };
  const totalTokens = JSON.parse(
    UrlFetchApp.fetch(`${baseUrl}/${version}/${model}:countTokens?key=${geminiApiKey}`,
      { payload: JSON.stringify(payload1), contentType: "application/json" }).getContentText());
  Logger.log(totalTokens);

  const options = {
    'method': 'post',
    'contentType': 'application/json',
    'payload': JSON.stringify(payload)
  };
  const response = UrlFetchApp.fetch(geminiProVisionEndpoint, options);
  const data = JSON.parse(response.getContentText());
  const content = data["candidates"][0]["content"]["parts"][0]["text"];
  return content;
}
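The request body above follows the generateContent multimodal schema. As a sanity check, the same shape can be built by a small pure helper — a hypothetical function of my own, not part of the original script — so the structure can be inspected without a network call or an AppSheet record:

```javascript
// Hypothetical helper (not in the original script): builds the same
// multimodal request body that callGeminiProVision sends, with no network
// or AppSheet dependency, so the shape can be unit-tested.
function buildGeminiVisionPayload(prompt, base64ImageData, temperature = 0) {
  return {
    contents: [{
      parts: [
        { text: prompt },                                                  // text modality
        { inlineData: { mimeType: "image/png", data: base64ImageData } }   // image modality
      ]
    }],
    generationConfig: { temperature: temperature }
  };
}
```

The same object could replace the inline literal in callGeminiProVision if you prefer to keep payload construction separate from the fetch logic.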
// The function below calls callGeminiProVision() defined above.
// The output returned by Gemini is passed back to the AppSheet app using the
// "Apps Script task return value" option.
// The inputs to the function are the image from the current app record and the
// prompt. The user can write the prompt to ask the Gemini API for specific
// information from the image.
// The function reads the input image and the prompt and returns the requested
// details from the image.
function testGeminiVision(prompt, imageInput) {
  const image = UrlFetchApp.fetch(imageInput).getBlob();
  const output = callGeminiProVision(prompt, image);
  console.log(prompt, output);
  return output;
}
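Note that callGeminiProVision indexes straight into candidates[0]; if the API returns no candidates (for example, when a safety filter blocks the response), that line throws. A defensive variant of the extraction step could look like the sketch below (extractGeminiText is a hypothetical helper of my own, not in the original script):

```javascript
// Hypothetical helper: safely extracts the generated text from a parsed
// generateContent response, returning an empty string instead of throwing
// when the expected fields are missing (e.g. a blocked or empty response).
function extractGeminiText(data) {
  const candidates = data.candidates || [];
  if (candidates.length === 0) return "";
  const content = candidates[0].content;
  if (!content || !content.parts || content.parts.length === 0) return "";
  return content.parts[0].text || "";
}
```

Returning an empty string keeps the AppSheet bot from failing outright; the app can then surface "no result" to the user instead of an error.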
Some GIFs to show the Gemini AI API with AppSheet in action:
Read the energy meter and business card
The images to be read are captured in an AppSheet card view, as the screenshot below shows.
The GIF below shows the energy meter reading returned by Gemini after processing the image and the prompt shown in blue font (Prompt: Read the meter only in digits without any units). The GAS is invoked by a data-change bot triggered by the action at the bottom right of the card view.
The meter reading output returned by the Gemini AI API is shown in red font for easy identification.
It takes a while for Gemini to process the input and return the results, so please wait for each GIF below to run to completion.
GIF for reading the second energy meter:
A GIF for reading a business card is shown below.
Some insights on tokens
The Gemini AI pricing is based on tokens, so it is good to know how many tokens a test consumes.
The energy meter photos seem to contain denser information; their API calls consumed substantial tokens, as captured in the GAS log screenshots shared below.
The energy meter image calls consumed around 300 K tokens.
The business card image analysis consumed only about 20 K tokens.
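One likely reason for the large counts: the countTokens request in the script concatenates the prompt with the base64 string and counts the result as plain text, so the figure scales with the length of the encoded image rather than reflecting a fixed per-image cost. Base64 expands every 3 bytes of image data into 4 output characters, which the small arithmetic sketch below (my own illustration, not from the original post) makes concrete:

```javascript
// Base64 encodes each 3-byte group as 4 output characters (with padding),
// so an n-byte image becomes a string roughly 4/3 times as long when it is
// counted as text by the countTokens call.
function base64Length(numBytes) {
  return Math.ceil(numBytes / 3) * 4;
}
```

A photo of a few hundred kilobytes therefore turns into hundreds of thousands of characters of base64 text, which helps explain why denser, larger images produce much higher counts in the logs.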
Disclaimer
The tip shared is purely for demonstrating the possibilities of the Gemini AI API.
Please read the applicable Google terms and pricing before integrating the Gemini AI API into your apps, and ensure it fits your use case in terms of accuracy and the type of your business. The AI field is still relatively new, so please satisfy yourself with proper testing and with the legal and commercial terms of use of AI before deploying it in production-level apps.
What other use cases can community colleagues think of for the image-reading capability of the Gemini AI API in an AppSheet app?