In today’s interconnected world, businesses operate across borders and cater to a diverse global audience. This presents challenges in providing customer support in multiple languages and handling inquiries phrased in many different ways (customers may use different terminology to ask for the same information).
One of the revolutionary foundation models that successfully addresses both issues is chat-bison, based on Google’s Generative AI (GenAI) technology. The Google Cloud console provides a Model Garden where multiple Vertex AI foundation models and APIs can be explored. For answering questions, this blog relies specifically on chat-bison.
But wait…
What are foundational models, and how are they named?
Foundation models are pre-trained multitask large language models that can be tuned or customized for specific tasks using Generative AI Studio, the Vertex AI API, or the SDK for Python.
Their names have only two components, ‘use case’ and ‘model size’, when they refer to the latest version. The naming convention for a model’s stable counterpart appends a three-digit version number, giving the format ‘<use case>-<model size>@<version>’: ‘chat-bison’ refers to the latest version, while ‘chat-bison@002’ pins a specific stable one.
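The naming convention above can be sketched with a tiny helper. This is purely illustrative; the function name is our own, and it simply composes the ‘<use case>-<model size>@<version>’ pattern described in the docs:

```python
# Illustrative only: compose a Vertex AI foundation model name from its parts.
# "chat" is the use case, "bison" the model size; stable versions append "@NNN".
def model_name(use_case, size, version=""):
    base = f"{use_case}-{size}"
    return f"{base}@{version}" if version else base

print(model_name("chat", "bison"))         # latest version
print(model_name("chat", "bison", "002"))  # pinned stable version
```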
Here, our use case is so beginner-friendly that you may not notice any difference between the outputs of different model versions.
Addressing multilingual FAQ challenges: Cater Globally
Traditional FAQ systems often struggle with multilingual content, requiring manual translation and the maintenance of separate FAQ databases for each language. This can be time-consuming, costly, and error-prone.
Chat-Bison overcomes this limitation by leveraging GenAI to automatically translate FAQs into multiple languages while adapting to regional nuances. This ensures that customers can access accurate and relevant support information regardless of their preferred language.
Tackling synonymous FAQs: A Hidden Cost Center
Another common challenge in FAQ management is handling synonymous inquiries. Customers may ask for the same information using different wordings, potentially impeding the model’s ability to deliver accurate results.
Chat-Bison understands the semantic relationships between words and phrases, enabling it to identify synonymous FAQs. This ensures that customers can find the information they need, regardless of the specific terms they use.
How does chat-bison answer FAQs?
Chat-bison is a fine-tuned version of the PaLM 2 model, specifically designed for multi-turn conversation use cases. When answering an FAQ from question-answer pairs alone, it uses a process similar to the one it uses for general questions. Here’s a simplified overview:
- Question processing: Chat-bison analyzes the input question, extracting keywords, phrases, and the overall intent of the query.
- FAQ matching: It matches the extracted keywords and phrases against the available FAQ pairs, identifying the most relevant FAQ that potentially answers the question.
- Answer extraction: It extracts the answer portion of the matched FAQ, ensuring it directly addresses the user’s query. If needed, it may also rephrase the extracted answer into a concise and informative response tailored to the specific question.
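The matching step above can be made concrete with a toy lexical matcher. To be clear, this is our own simplification: chat-bison matches questions semantically inside the model, whereas the sketch below only scores word overlap. It illustrates the flow, not the actual mechanism:

```python
# Toy FAQ matcher: score each stored question by word overlap (Jaccard
# similarity) with the query and return the best-matching (question, answer)
# pair. chat-bison's real matching is semantic, not lexical.
def match_faq(query, faq_pairs):
    query_words = set(query.lower().rstrip("?").split())
    best_pair, best_score = None, 0.0
    for question, answer in faq_pairs:
        words = set(question.lower().rstrip("?").split())
        score = len(query_words & words) / len(query_words | words)
        if score > best_score:
            best_pair, best_score = (question, answer), score
    return best_pair

pairs = [
    ("How many leave days are there in company XYZ?", "23 days"),
    ("What is the leave policy for employees at XYZ?", "See the HR portal"),
]
print(match_faq("How many leave days in XYZ?", pairs))
```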
How to implement chat-bison?
First things first! Install the required packages, set up authentication, a project ID, location variables, etc. Also, import the required libraries.
# For interacting with Generative AI Studio, install the Vertex AI SDK.
!pip install "shapely<2.0.0" -q
!pip install google-cloud-aiplatform langchain pandas google-api-python-client config --upgrade -q

import vertexai
from vertexai.language_models import ChatModel, InputOutputTextPair

project_id = "<Project-ID>"  # @param {type:"string"}
vertexai.init(project=project_id, location="us-central1")
chat_model = ChatModel.from_pretrained("chat-bison")
Then, we articulated 100 questions in various phrasings and languages. For the demo, we take the example of asking about the number of leave days at the hypothetical company ‘XYZ’, with the answer ‘23 days’. The question ‘How many leave days are there in company XYZ?’ and its answer ‘23 days’ thus form one of the input-output text pairs, coded as follows:
faqs = ["How many leave days are there in company XYZ?", ....]
answers = ["23 days", .....]

faq_pairs = []
for i in range(len(faqs)):
    faq_pairs.append(InputOutputTextPair(input_text=faqs[i],
                                         output_text=answers[i]))

chat = chat_model.start_chat(examples=faq_pairs)
parameters = {
    "max_output_tokens": 512,  # maximum response length ([1-2048] tokens; default: 1024 tokens ≈ 600-800 words)
    "temperature": 0.2,        # higher temperature -> more creative response (degree of randomness, [0.0-1.0]; default: 0.0)
    "top_p": 0.8,              # higher top_p -> more diverse response ([0.0-1.0]; default: 0.95)
    "top_k": 40,               # higher top_k -> more candidates explored, more fluent text ([1-40]; default: 40)
    "candidate_count": 1,      # number of response variations ([1-4]; default: 1)
}
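To build intuition for what top_k and top_p do, here is a toy sketch of how the two filters narrow the pool of candidate tokens before sampling. The probabilities are invented for illustration; real models work over vocabularies of tens of thousands of tokens:

```python
# Toy illustration of the top_k / top_p filters: keep the top_k most likely
# tokens, then cut the list off once their cumulative probability reaches
# top_p (the "nucleus"). Sampling then happens among the survivors.
def filter_candidates(probs, top_k, top_p):
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

probs = {"23": 0.60, "twenty-three": 0.25, "17": 0.10, "42": 0.05}
print(filter_candidates(probs, top_k=3, top_p=0.8))  # ['23', 'twenty-three']
```

Lowering either knob shrinks the candidate pool, which is why small values make responses more deterministic.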
We ask our simple question, ‘How many leave days are there in company XYZ?’, in various languages and get the answer ‘23 days’ as follows:
mlingual_questions = [
"Combien de jours de congé dans l'entreprise XYZ?", # French
"¿Cuántos días de licencia en la empresa XYZ?", # Spanish
"Wie viele Urlaubstage gibt es im Unternehmen XYZ?", # German
"XYZ公司有多少天假?", # Chinese
"XYZ社には何日休暇がありますか?", # Japanese
"XYZ 회사에는 몇 일의 휴가가 있습니까?", # Korean
"Сколько дней отпуска в компании XYZ?", # Russian
"Quantos dias de licença na empresa XYZ?", # Portuguese
"Quanti giorni di ferie nell'azienda XYZ?", # Italian
"كم عدد أيام الإجازة في شركة XYZ؟", # Arabic
"XYZ कंपनी में कितने दिन की छुट्टी होती है?", # Hindi
"XYZ কোম্পানিতে কত দিনের ছুটি হয়?", # Bengali
"Hoeveel verlofdagen heeft het bedrijf XYZ?", # Dutch
"Ile dni wolnych ma firma XYZ?", # Polish
"Kolik dní dovolené má společnost XYZ?", # Czech
"Hány nap szabadsága van az XYZ cégnek?", # Hungarian
"Câte zile de concediu are compania XYZ?", # Romanian
"Πόσες ημέρες άδειας έχει η εταιρεία XYZ;", # Greek
"Kuinka monta lomapäivää XYZ-yrityksellä on?", # Finnish
"Hur många semesterdagar har företaget XYZ?", # Swedish
"Hvor mange feriedage har virksomheden XYZ?", # Danish
"Hvor mange feriedager har selskapet XYZ?", # Norwegian
"Mitme puhkepäeva on ettevõttel XYZ?", # Estonian
"Cik daudz atvašu dienu ir uzņēmumam XYZ?", # Latvian
"Kiek atostogų dienų turi įmonė XYZ?", # Lithuanian
"Скільки днів відпустки є в компанії XYZ?", # Ukrainian
"Колко дни отпуск има компания XYZ?", # Bulgarian
"Koliko dana odsustva ima kompanija XYZ?", # Serbian
"Koliko dni dopusta ima podjetje XYZ?", # Slovenian
"Koľko dní dovolenky má spoločnosť XYZ?", # Slovakian
"Koliko dana bolovanja ima tvrtka XYZ?" # Croatian
]
for question in mlingual_questions:
    response = chat.send_message(question, **parameters)
    print("\033[91m" + f"{question}" + "\033[94m" + f" Answer:{response.text}")
Note: Close to 40 languages are supported; for the other languages, the model outputs, ‘I’m not able to help with that, as I’m only a language model. If you believe this is an error, please send us your feedback.’
Now, we ask the identical question in English only, employing diverse approaches:
similar_questions = [
"XYZ leaves?", "XYZ leave days?", "leaves in XYZ?",
"leave days at XYZ?", "leave days for XYZ?",
"leave days available at XYZ?", "leave policy for XYZ?",
"paid time off for XYZ?", "vacation policy for XYZ?",
"sick leave for XYZ?", "personal leave policy for XYZ?",
"PTO for XYZ?", "leave entitlement for XYZ?",
"How many paid leaves do XYZ employees have?",
"How many leaves are there in company XYZ?",
"What is the number of leaves for employees at XYZ?",
"What is the average number of leaves for XYZ employees?",
"How many paid leaves are employees entitled to at XYZ?",
"How many paid leave days do XYZ employees have?",
"What is the minimum paid leave entitlement for XYZ workers?",
"How many days of paid leave are XYZ employees granted annually?",
"What is the average number of leave days for XYZ employees?",
"How many paid leave days are employees entitled to at XYZ?",
"What is the minimum number of paid leave days for XYZ workers?",
"How many days of paid leave are granted annually to XYZ employees?",
"How many leave days are there in company XYZ?",
"What is the number of leave days for employees at XYZ?",
"How many days of leave are available to employees at XYZ?",
"What is the leave policy for employees at XYZ?",
"How many days of paid time off do employees at XYZ receive?",
"What is the vacation policy for employees at XYZ?",
"How many days of sick leave do employees at XYZ get?",
"What is the personal leave policy for employees at XYZ?",
"How many days of PTO do employees at XYZ have?",
"What is the leave entitlement for employees at XYZ?"
]
for question in similar_questions:
    response = chat.send_message(question, **parameters)
    print("\033[95m" + f"{question}" + "\033[90m" + f" Answer:{response.text}")
Multi-candidate text generation: This returns a more detailed response that also includes ‘safety attribute scores’, ‘grounding metadata’, etc.
Do you notice differences in the ‘safetyAttributes’ in the output when the question is asked in different languages or ways?
from pprint import pprint

chat = chat_model.start_chat(examples=faq_pairs)
sample_questions = [
    "How many leave days in company XYZ?",
    "XYZ कंपनी में कितने दिन की छुट्टी होती है?",  # Hindi
    "XYZ社には何日休暇がありますか?",  # Japanese
    "What is the minimum paid leave entitlement for XYZ workers?",
]
for question in sample_questions:
    response = chat.send_message(question, **parameters)
    print("\033[91m" + f"{question}" + "\033[94m")
    pprint(f" Answer:{response}")
    print("\n")
Once we introduce a ‘context’ with conflicting information, ‘XYZ offers 17 leaves’, the response changes to ‘17 days’, and ‘23 days’ is no longer picked from the declared ‘examples’ pairs!
chat = chat_model.start_chat(context="XYZ offers 17 leaves", examples=faq_pairs)
sample_questions = [
    "How many leave days in company XYZ?",
    "XYZ कंपनी में कितने दिन की छुट्टी होती है?",  # Hindi
    "XYZ社には何日休暇がありますか?",  # Japanese
    "XYZ leaves?", "XYZ leave days?", "leaves in XYZ?",
    "leave days at XYZ?", "leave days for XYZ?",
    "leave days available at XYZ?", "leave policy for XYZ?",
]
for question in sample_questions:
    response = chat.send_message(question, **parameters)
    print("\033[91m" + f"{question}" + "\033[94m" + f" Answer:{response.text}")
You can also stream the output of the conversation model, meaning you receive chunks of the response as soon as their tokens are generated. Calling the next() function on the returned generator fetches the next available chunk:
for question in sample_questions:
    response = chat.send_message_streaming(question, **parameters)
    response = next(response)  # first streamed chunk
    print("\033[91m" + f"{question}" + "\033[94m" + f" Answer:{response.text}")
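Note that next() pulls only one chunk at a time, while iterating the generator yields them all. A stand-in generator makes the difference concrete (the chunk texts below are invented for illustration; send_message_streaming returns response objects, not bare strings):

```python
# A stand-in for a streaming response: a generator that yields chunks of text
# as they become available.
def fake_stream():
    for chunk in ["23", " days", "."]:
        yield chunk

stream = fake_stream()
print(next(stream))            # first chunk only: "23"

full = "".join(fake_stream())  # iterate to consume every chunk
print(full)                    # "23 days."
```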
If we change the parameters to have a maximum number of candidate answers of 4 and change the context to have multiple types of leaves, we can observe the following output:
parameters = {
    "max_output_tokens": 2000,
    "temperature": 0.2,
    "top_p": 0.0,
    "top_k": 40,
    "candidate_count": 4,
}
chat = chat_model.start_chat(context="XYZ offers 17 annual leaves. Also, its policies indicate 18 medical and 7 casual leave days.", examples=faq_pairs)
For a single response, the model may even intelligently sum the given numbers of annual, medical, and casual leaves: 17 + 18 + 7 = 42 leave days, as per the context.
Consider adjusting the other parameters, including top_p, top_k, and temperature, to ‘heat up’ or ‘cool down’ the model! This may not affect the response in such a simple use case. Also, experiment with the output token limit: if you keep it extremely low (< 3), the response may be cut off even when only the number of leave days is requested!
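The ‘heating’ and ‘cooling’ metaphor comes from how temperature rescales token scores before sampling. The sketch below shows this with a plain softmax over invented logits; the actual internals of chat-bison are of course more involved:

```python
import math

# Temperature divides the logits before softmax: a small temperature
# exaggerates the gap between candidates ("cooling" toward the top token),
# while a large one flattens the distribution ("heating" it up).
def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.2))  # sharply peaked on the top token
print(softmax_with_temperature(logits, 1.0))  # noticeably flatter
```

This is why temperature 0.0 (the default) makes responses essentially deterministic: the distribution collapses onto the most likely token.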
Limitations and Considerations using Chat-Bison
While chat-bison can be a valuable tool for answering FAQs from question-answer pairs, it has certain limitations that should be considered when using it for this task. A few of them are as follows:
- Reliance on question formulation: The ability to identify the most relevant FAQ depends heavily on how the input question is formulated. If the question is poorly phrased, ambiguous, or incomplete, chat-bison may struggle to match it with the appropriate FAQ.
- Picking an answer from its own knowledge base: Chat-bison may not always pick the answer from the supplied pairs, even if the asked question is very similar to one of the input questions. This mainly happens for factual questions, whose answers can also be drawn from the model’s own training. For example, the question ‘How many leaves in India?’ may prompt the model to answer from what it was trained on instead of fetching the answer from the Q&A pairs specified under ‘examples’.
- Language limitations: Support for additional languages is expected in the future, as newer versions of the model were under development at the time of writing (‘chat-bison@latest’ indicates the latest available version).
- Inconsistency: When the same set of questions is input again, the model may hallucinate and fail to respond with the same answer or accuracy in the next run. The order in which the Q&A pairs are listed may also impact the results. For example, try playing around with the order of these pairs:
questions = [
    "How many leave days in company XYZ?",  # general query at the top
    "How many medical leave days in company XYZ?",
    "How many casual leave days in company XYZ?",
]
answers = [
    "17 days",
    "18 days",
    "7 days",
]
When you test, you may notice that if the general leave-days query is placed below the queries for particular leave types and you ask, ‘How many leave days in company XYZ?’, it may match one of the questions above it!
This pretty much sums up the intricacies of the model from a use-case perspective. It’s important to consider these limitations when using chat-bison for a particular task and to implement strategies to mitigate them. This may include combining ‘chat-bison’, human expertise, and other tools to handle more complex or nuanced queries and adapt to dynamic FAQ content.
Conclusion
Chat-bison is a conversational AI model trained to engage in natural, engaging dialogue. It is a valuable tool for enhancing human-computer interactions and represents a significant advancement in FAQ management, leveraging GenAI to address the challenges of multilingual and synonymous FAQs. By automating FAQ resolution, it empowers businesses to provide a consistent, personalized customer experience across languages and regions.
References
More details about the chat-bison model:
Text chat | Vertex AI | Google Cloud
Language support for Vertex AI PaLM API:
Available models in Generative AI Studio | Vertex AI | Google Cloud
Multi-candidate text generation response and safety attributes:
Class TextGenerationResponse (1.36.2) | Python client library | Google Cloud
Responsible AI | Vertex AI | Google Cloud
Thanks for reading! Have questions? Please leave a comment below.