Multimodal Gemini-powered marketing: scalable and personalized

Over the past several years, marketing professionals have pursued the aspiration of creating personalized experiences, captivating content and seamless scalability. Envision an environment where each customer interaction is meticulously crafted, content production is no longer a limiting factor, and marketing campaigns are optimized in real-time.

Now that Generative Artificial Intelligence (GenAI) has moved from a theoretical concept to a practical reality, it has the potential to unlock these opportunities. GenAI can reshape the marketing landscape by enabling personalized experiences, dynamic content creation, and real-time campaign optimization.

In this article, we will examine the various marketing workflows in which GenAI is likely to have a significant impact. We will also explore the various services and approaches that can be adopted to harness the potential of GenAI and achieve this impact.

This represents a non-exhaustive overview; alternative approaches and workflows may exist beyond the scope of this article.

Generative AI for marketing:

Generative artificial intelligence (GenAI) possesses the ability to construct various forms of text, including marketing content, social media posts, and even personalized messaging. Additionally, it can facilitate the generation of new images through text-to-image capabilities. However, to harness the full potential of underlying GenAI models, proficiency in prompt engineering techniques is essential. This raises the question of whether it is feasible to delegate the complex aspects of prompt engineering to the model itself by simply providing guidelines. Furthermore, it is also relevant to consider the potential impact of GenAI adoption on go-to-market (GTM) strategies and its applicability in improving performance of marketing campaigns.

GenAI presents significant opportunities for enhancing overall efficiency and productivity across various processes and domains. It holds the promise of improving the customer experience by streamlining interactions and delivering more personalized and engaging content.

Let’s take a deep dive into each phase of marketing content creation and identify the specific areas where GenAI can have the most significant impact.

New content creation:

GenAI can transform the process of creating marketing content. It not only generates text and descriptions but also produces prompts for image generation, based on a provided sample template image or guideline documents. By incorporating user persona details, themes, and festival details, it can deliver personalized messaging and marketing content, enhancing the overall customer experience.

In the provided examples, the model was given Google RCS Business Messaging platform guidelines along with a concise product description or offer and product type. Using the guidelines, the Gemini model could produce comprehensive rich content cards with required details, an image prompt, and a negative prompt that could be utilized to generate a completely new image for a given product marketing campaign.

INPUT:

  • Media platform and/or company guidelines.
  • Sample template image if required.
  • Product specifications and any available offers.
  • Product classification
  • Additionally, the following information can be used:
  1. Intended user persona
  2. Theme
  3. Seasonal marketing initiatives

OUTPUT:

  • Title - Within the maximum character limit
  • Description - Within the maximum character limit
  • Call to action - Appropriate call-to-action options for a product category
  • Image prompt text - To be used to create a new marketing image using Imagen
  • Negative prompt - Elements that should not be included in the generated image
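
The inputs and outputs listed above can be sketched as a prompt builder plus a validator for the model's reply. This is a minimal illustration rather than the workflow's actual code: the function names `build_card_prompt` and `validate_card`, the JSON key names, and the character limits are all hypothetical, and the limits should be replaced with the values from the platform guidelines in use.

```python
import json

# Hypothetical limits; take the real values from the platform guidelines.
MAX_TITLE_CHARS = 200
MAX_DESCRIPTION_CHARS = 2000
REQUIRED_KEYS = ("title", "description", "call_to_action",
                 "image_prompt", "negative_prompt")


def build_card_prompt(guidelines, product, category,
                      persona=None, theme=None, season=None):
    """Assemble a text prompt from the required and optional inputs."""
    lines = [
        "Follow these platform guidelines:",
        guidelines,
        f"Product or offer: {product}",
        f"Product category: {category}",
    ]
    # Optional personalization inputs are added only when provided.
    for label, value in (("Target persona", persona),
                         ("Theme", theme),
                         ("Seasonal initiative", season)):
        if value:
            lines.append(f"{label}: {value}")
    lines.append("Return JSON with keys: " + ", ".join(REQUIRED_KEYS))
    return "\n".join(lines)


def validate_card(raw_json):
    """Return a list of problems found in the model's JSON reply."""
    card = json.loads(raw_json)
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in card]
    if len(card.get("title", "")) > MAX_TITLE_CHARS:
        problems.append("title exceeds character limit")
    if len(card.get("description", "")) > MAX_DESCRIPTION_CHARS:
        problems.append("description exceeds character limit")
    return problems
```

Validating the reply in code, instead of trusting the model to respect the limits, keeps an over-length title or a missing field from reaching the delivery step.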

During the demonstration of the RCS workflow, we effectively generated and distributed marketing content specific to RCS to test devices solely based on a concise statement encompassing product information and its related category.

Search existing marketing contents:

Why reinvent the wheel if a reusable solution already exists? The challenge lies in making all content from past marketing campaigns searchable across the organization. The obvious solution would be to maintain all metadata and documentation in a standardized format, although this is difficult in practice. An alternative approach is to use multimodal embeddings.

Embeddings for Multimodal creates 1408-dimensional vectors from video, image, or text inputs. Because the inputs are embedded in a shared semantic space, the vectors can be used interchangeably for tasks such as image classification and content moderation, and for cross-modal search: video by image, image by text, text by image, and so on.

The integration of marketing content through multimodal embedding will provide the versatility to conduct searches for any content using text, image screenshots, or brief video clips, thereby enabling the identification of all existing similar content.

from vertexai.vision_models import MultiModalEmbeddingModel

mm_embedding_model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding")
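
Once every asset has an embedding, search reduces to nearest-neighbor ranking. The sketch below is a minimal illustration using cosine similarity in plain Python. The tiny three-dimensional vectors and the asset names are made-up stand-ins for the 1408-dimensional vectors the model would return, and a production system would more likely store the vectors in a vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, index, top_k=2):
    """Rank stored assets by similarity to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), name)
              for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical index: asset name -> embedding (toy 3-d vectors for illustration).
index = {
    "summer_sale_banner.png": [0.9, 0.1, 0.0],
    "winter_coat_video.mp4": [0.0, 0.2, 0.9],
    "beachwear_copy.txt": [0.8, 0.3, 0.1],
}

# The query vector could come from a text, image, or video embedding.
query = [1.0, 0.2, 0.0]
print(search(query, index))
```

Because the query and the stored assets live in the same vector space, the same `search` function serves text-to-image, image-to-video, and any other combination.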

Enhance the existing images or newly generated images:

In marketing, clients commonly ask either for new product imagery or for existing marketing materials to be reused. In addition, the same generated image may need to be rescaled to different aspect ratios or further enhanced for different media platforms. In such situations, the image enhancement capabilities of Imagen can improve the quality of existing images, making them more visually appealing and better suited to marketing applications.

In this workflow, features of Imagen such as the enhancement of backgrounds, upscaling of resolutions, alterations of aspect ratios, and the overlay of images and logos based on Python templates have been demonstrated.

Manually composing a detailed, model-specific background prompt for every existing or generated image is impractical and, even with additional human effort, could delay overall GTM timelines. Instead, Gemini’s multimodal capabilities can be used to generate several background prompt options from the provided image, the product category, and optionally the targeted persona or theme.

Additionally, this workflow can be leveraged within the retail domain as well to enhance the existing images of the entire catalog.

Depending on the chosen media platform, the image may need to be rescaled to a different aspect ratio or upscaled to a higher resolution. Generating the new image at a 1:1 aspect ratio offers a significant advantage here: during rescaling, the primary subject can be aligned to the right, left, center, top, or bottom simply by creating a corresponding mask.
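
The alignment step can be sketched as simple geometry: given a source image and a target aspect ratio, compute the padded canvas and where the original image sits on it. The region outside that box is what an outpainting mask would mark for generation. The function name `outpaint_layout` and its parameters are hypothetical, and the sketch only computes coordinates; it does not call Imagen.

```python
def outpaint_layout(src_w, src_h, target_ratio, align="center"):
    """Return (canvas_w, canvas_h, left, top) for placing the source image.

    target_ratio is width / height of the desired output.
    align is one of "left", "right", "center", "up", "down".
    """
    src_ratio = src_w / src_h
    if target_ratio >= src_ratio:
        # Wider target: keep the height, extend the width.
        canvas_w, canvas_h = round(src_h * target_ratio), src_h
    else:
        # Taller target: keep the width, extend the height.
        canvas_w, canvas_h = src_w, round(src_w / target_ratio)

    # Start centered, then push the subject toward the requested edge.
    left = (canvas_w - src_w) // 2
    top = (canvas_h - src_h) // 2
    if align == "left":
        left = 0
    elif align == "right":
        left = canvas_w - src_w
    elif align == "up":
        top = 0
    elif align == "down":
        top = canvas_h - src_h
    return canvas_w, canvas_h, left, top


# A 1024x1024 image extended to 16:9 with the subject kept on the right.
print(outpaint_layout(1024, 1024, 16 / 9, align="right"))
```

Starting from a square image means only one dimension ever needs extending, so a single offset decides where the subject lands.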

We have now reached the final stage of image generation, where logos, titles, or marketing taglines are superimposed onto the image. Many text-to-image models can also render text, but it is difficult to control typos, fonts, logo placement, and color styles in the final image. Future models may address this issue. However, because we need granular control over font, text, and logo placement, an alternative is a template-based method, as illustrated in the image below, that overlays the content accordingly. In the subsequent section, we will investigate how this process can be automated.

Template to code:

Upon obtaining the requisite information and content, the subsequent phase entails ensuring that the placement and overlay of all elements comply with the template specifications for a specific media platform.

To guarantee that logo placements, titles, and taglines adhere to corporate style guidelines, they must be integrated using a photo editing tool or custom code to create a template.

The multimodal capability offers distinct advantages here. When a user uploads the template image alongside relevant inputs such as the logo, marketing image, and overlay text, Gemini can generate template code based on the provided template image. A human can then review and modify the generated code as a one-time activity, so that the same template can be reused for all similar content generation tasks.

For example, the PIL-based template logic below was generated by providing template images along with all the required text and images.

(Note: Actual text, font style and image paths are edited in the sample code.)

from PIL import Image, ImageDraw, ImageFont

# Configuration
COMPANY_LOGO_PATH = "logo.png"
MARKETING_IMAGE_PATH = "marketing_image.png"
OUTPUT_IMAGE_PATH = "output_image.png"
FONT_PATH = "<font>.ttf"

# Load images (RGBA so the alpha channel can be used as a paste mask)
company_logo = Image.open(COMPANY_LOGO_PATH).convert("RGBA")
marketing_image = Image.open(MARKETING_IMAGE_PATH).convert("RGBA")

# Define scaling factors
company_logo_scaling_factor = 0.15
marketing_image_scaling_factor = 0.5

# Resize images
company_logo = company_logo.resize((int(company_logo.width * company_logo_scaling_factor),
    int(company_logo.height * company_logo_scaling_factor)))
marketing_image = marketing_image.resize((int(marketing_image.width * marketing_image_scaling_factor),
    int(marketing_image.height * marketing_image_scaling_factor)))

# Create the background canvas.
# Assumption: the sample never defines a background, so a white canvas twice the
# width of the marketing image is used here, with text and logo on the left half
# and the marketing image on the right half.
background = Image.new("RGBA", (marketing_image.width * 2, marketing_image.height), "white")

# Calculate placement for the marketing image (right half, vertically centered)
marketing_image_x = marketing_image.width
marketing_image_y = (background.height - marketing_image.height) // 2

# Paste the marketing image onto the background
background.paste(marketing_image, (marketing_image_x, marketing_image_y), marketing_image)

# Define text content and styles
title_text = "<Title generated for marketing>"
title_font_size = int(marketing_image.width * 0.05)
title_font = ImageFont.truetype(FONT_PATH, title_font_size)
title_color = "#202124"

# Calculate text placement
title_x = int(marketing_image.width * 0.05)
title_y = int(marketing_image.height * 0.17)

# Draw text on the background
draw = ImageDraw.Draw(background)
draw.multiline_text((title_x, title_y), title_text, font=title_font, fill=title_color)

# Calculate placement for the company logo
company_logo_x = int(marketing_image.width * 0.05)
company_logo_y = int(marketing_image.height * 0.67)

# Paste the company logo onto the background
background.paste(company_logo, (company_logo_x, company_logo_y), company_logo)

# Save the final image
background.save(OUTPUT_IMAGE_PATH)

Marketing template evaluation:

All marketing content must undergo a thorough review before publication, and the review must check compliance with both company marketing guidelines and platform-specific guidelines. Because the work is monotonous, this process can be time-consuming, error-prone, and exhausting.

Using Gemini’s multimodal document understanding, all PDF guidelines can be appended to the prompt to evaluate generated or manually created marketing content, whether text, images, video, or audio. This approach can substantially expedite the overall review process and optimize the go-to-market (GTM) strategy.

For instance, in the screenshot provided below, the PDF document of RCS guidelines, along with all the GenAI generated content, is passed to the Gemini model to obtain the final verdicts. In such cases, the reasoning assists any human reviewer in quickly comprehending the rationale employed by the model to produce such outcomes.

A similar methodology can also serve as a quick quality assessment of all GenAI-generated content, using a scoring and reasoning mechanism to filter out content that falls below a predetermined threshold.

{
    "evaluation": [
        {
            "category": "Character Limit",
            "score": "1.0",
            "reasoning": "The text length is well within the 178 character limit."
        },
        {
            "category": "Image Content: Logo",
            "score": "1.0",
            "reasoning": "The image does not contain any visible logos."
        },
        {
            "category": "Image Content: Distortion",
            "score": "1.0",
            "reasoning": "The image is clear and free from any distortions."
        },
        {
            "category": "Image Content: Text",
            "score": "1.0",
            "reasoning": "The image does not have any overlaid text."
        }
    ]
}
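
The threshold-based filtering mentioned above could look like the following sketch. It assumes the model returns JSON in the shape shown, with scores encoded as strings between 0 and 1. The function name `passes_review`, the 0.8 threshold, and the sample payload are illustrative choices, not values from the workflow.

```python
import json


def passes_review(evaluation_json, threshold=0.8):
    """Return (passed, failures) for a model evaluation payload.

    Content passes only if every category scores at or above the threshold.
    """
    payload = json.loads(evaluation_json)
    failures = [
        (item["category"], item["reasoning"])
        for item in payload["evaluation"]
        if float(item["score"]) < threshold
    ]
    return len(failures) == 0, failures


# Made-up sample payload for illustration only.
sample = json.dumps({
    "evaluation": [
        {"category": "Character Limit", "score": "1.0",
         "reasoning": "The text length is well within the limit."},
        {"category": "Image Content: Distortion", "score": "0.4",
         "reasoning": "The product edges appear warped."},
    ]
})

passed, failures = passes_review(sample)
print(passed, failures)
```

Returning the failing categories together with their reasoning lets a human reviewer, or a follow-up prompt, see exactly what needs to change.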

Dialogflow agents:

Let us now analyze from the perspective of enhancing the end customer experience after marketing messages are delivered to the customer. How can we fundamentally gain a deeper understanding of the customer’s specific requirements, address fundamental and evident questions, and capture pertinent details to provide a competitive quotation or discounts?

Relying on human agents to respond to every inquiry does not scale, and effort is frequently wasted on invalid leads or on customers who are merely gathering information without an immediate need. Assigning virtual agents in such situations helps offload repetitive queries and gather additional data about the customer’s request.

An agent is a virtual entity capable of handling concurrent conversations with end-users. It is a natural language understanding module that comprehends the intricacies of human language. GenAI-based agents possess reasoning abilities, enabling them to engage in human-like conversations and gather all pertinent information to better assist customer service agents.

In accordance with the specific application and operational workflow of the business, Natural Language Processing (NLP), Generative Artificial Intelligence (GenAI), or a hybrid combination of both (NLP+GenAI) agents can be strategically deployed within the media platform’s interface. This deployment aims to effectively support customer inquiries and queries. Additionally, the integration of a multimodal GenAI model within the agent will facilitate the digitization of unstructured documents. Furthermore, the model’s capabilities extend to analyzing uploaded images or videos as an integral part of the business workflow, thereby enhancing the customer support experience.

Scaled & personalized marketing workflow:

The image above depicts a potential workflow for the generation of scaled and personalized marketing content. Let’s explore each step in detail.

Marketing campaign list:

The process begins with a marketing campaign list and detailed campaign information (offers, product details, theme, target audience, template image, channel guidelines, etc.).

Generate marketing text and image prompts:

Gemini uses multimodal capabilities to produce marketing text and image prompts based on input campaign parameters and reference images associated with each campaign.

Image Generation & Refinement:

  • Generate New Image: Using the image prompt auto-generated by Gemini, Imagen creates an initial image relevant to the campaign.
  • Enhance Background: The background of the generated image is further enhanced using another auto-generated prompt and Imagen’s edit functionality.
  • Rescale/Upscale: The image is resized or upscaled as needed using the inpainting and outpainting features of Imagen.

Quality Check (Image):

The generated images undergo a quality check against the provided guidelines using Gemini. The “GenAI Generated Quality Evaluations” step indicates whether the image meets the required standards or whether further corrections are required.

Decision Point:

A decision point checks if the generated image adheres to the guidelines.

  • No: If the image doesn’t meet the guidelines, the process loops back to generate a new image using refined prompts (the steps above are repeated).
  • Yes: If the image meets the guidelines, the process proceeds for content personalization.
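
The loop described by this decision point can be sketched as a bounded retry. The callables `generate_image`, `evaluate_image`, and `refine_prompt` are hypothetical placeholders for the Imagen and Gemini calls, and the attempt cap is an added assumption so that the loop cannot run indefinitely.

```python
def generate_until_approved(prompt, generate_image, evaluate_image,
                            refine_prompt, max_attempts=3):
    """Regenerate an image until it passes the guideline check.

    Returns (image, attempts) on success, or (None, max_attempts) if no
    attempt passed, so a human reviewer can take over.
    """
    for attempt in range(1, max_attempts + 1):
        image = generate_image(prompt)
        passed, feedback = evaluate_image(image)
        if passed:
            return image, attempt
        # Feed the evaluator's reasoning back into the next prompt.
        prompt = refine_prompt(prompt, feedback)
    return None, max_attempts
```

Passing the three steps in as functions keeps the control flow testable without calling any model, and makes it easy to swap in a different generator or evaluator later.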

Content Personalization & Template Generation:

  • Template Code Generation: Code is generated via Gemini to integrate the personalized text and logos onto the approved image, using the provided template image as a reference.
  • Final Image Generation: The final marketing image is generated by combining the approved image, personalized text, and logos based on the generated template code.

Quality Check (Content):

The final marketing content (image with text and logos) is evaluated based on the company, platform and/or marketing guidelines to expedite the review process.

Conclusion:

Gemini, with its multimodal capabilities, offers an innovative solution for processing diverse data types through its novel Mixture-of-Experts (MoE) architecture. This enables various industries to process a wide range of input sources, including images, videos, and audio, beyond just text. This expanded capability facilitates the generation of insights and corrections to enhance the overall response quality.

In this article, we have showcased a few workflows where Gemini’s multimodal capabilities can have a significant impact. Nevertheless, there are additional use cases, particularly in the marketing realm, that were not covered in this discussion but where Gemini could play a pivotal role.

Authors:

Other contributors:

  • Aejaz Saiyed | Cloud GTM Specialist, Google Cloud
  • Abhinav Jha | Head of Business Development, Communications Partnerships, Google