The Gemini Enterprise Flowchart
Ways to Connect a Data Source to Gemini Enterprise
| Feature | 1. Ingestion (Index) | 2. Federation (Live) |
|---|---|---|
| How it Works | Ingests data, creates vector embeddings, and builds a searchable index. | Queries the external API live at search time (no indexing). |
| Privacy & Security | Stored in Google Cloud; encrypted (CMEK supported); HIPAA / FedRAMP. | |
| Pros & Cons | | |
| Identity | Flexible: WIF + Identity Mapping | Strict: requires user OAuth context |
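The seven paths detailed below can be read as a small decision procedure. The Python sketch that follows encodes one reading of it: Workspace sources go native, public sites go to the crawler, sources with a pre-built connector go to federation only when live data is required, and everything else needs a custom connector. WIF and identity mapping are added as extra steps when a third-party IdP or mismatched source IDs are involved. The `DataSource` fields and the order of the checks are assumptions made for illustration; this is not an official Google flowchart or API.

```python
from dataclasses import dataclass

# Path labels mirror the numbered sections of this guide.
PREBUILT = "[1] Pre-built Connector Strategy"
FEDERATED = "[2] Federated Success (WIF + OAuth)"
NATIVE = "[3] Google-on-Google (Native)"
WIF = "[4] Configure Workforce Identity (WIF)"
CUSTOM = "[5] The Builder (Custom Connector)"
ID_MAP = "[6] Identity Mismatch (External Identity Mapping)"
CRAWLER = "[7] Website Crawler (Public Data)"


@dataclass
class DataSource:
    """Hypothetical description of a source; field names are illustrative only."""
    name: str
    is_google_workspace: bool = False     # Drive, Gmail, Sites, Slides, Sheets
    is_public_website: bool = False       # docs site, blog, public help center
    has_prebuilt_connector: bool = False
    supports_federation: bool = False     # e.g. Salesforce, Slack, ServiceNow (Live Mode)
    needs_live_data: bool = False         # highly sensitive or fast-changing data
    uses_third_party_idp: bool = False    # Okta, Entra ID, Ping
    source_ids_differ_from_idp: bool = False  # e.g. "jdoe" instead of an email


def plan(src: DataSource) -> list[str]:
    """Return the guide sections to work through, in order."""
    steps: list[str] = []
    # WIF is a prerequisite whenever a third-party IdP is involved.
    if src.uses_third_party_idp:
        steps.append(WIF)

    # Exactly one primary connection path per source.
    if src.is_google_workspace:
        steps.append(NATIVE)
    elif src.is_public_website:
        steps.append(CRAWLER)
    elif src.has_prebuilt_connector and src.supports_federation and src.needs_live_data:
        steps.append(FEDERATED)
    elif src.has_prebuilt_connector:
        steps.append(PREBUILT)
    else:
        steps.append(CUSTOM)

    # Identity mapping fixes empty results when source IDs don't match IdP IDs.
    if src.source_ids_differ_from_idp:
        steps.append(ID_MAP)
    return steps


if __name__ == "__main__":
    for source in [
        DataSource("Google Drive", is_google_workspace=True),
        DataSource("Salesforce", has_prebuilt_connector=True, supports_federation=True,
                   needs_live_data=True, uses_third_party_idp=True),
        DataSource("Jira Data Center", has_prebuilt_connector=True,
                   source_ids_differ_from_idp=True),
        DataSource("Workday"),
    ]:
        print(source.name, "->", plan(source))
```

Keeping the primary path as a single `if/elif` chain makes the "one source, one path" rule explicit, while WIF and identity mapping stay independent add-ons that can apply to any path.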
Deep Dive: Execution Guide (The “How-To” for Each Step)
Use this reference to assign tasks to the right people and avoid common pitfalls.
[ 1 ] Pre-built Connector Strategy
| Category | Details |
|---|---|
| Permissions | GCP: roles/discoveryengine.admin (create the data store). Source app: admin access (e.g., Jira Admin) to generate API keys/tokens. |
| Network | SaaS: requires public internet egress. On-prem: requires Private Service Connect (PSC) to bridge Google Cloud to your data center. |
| Pitfalls | API limits: large indexes (1M+ items) can hit SaaS API rate limits. VPC-SC: if you use VPC Service Controls, you must allowlist the SaaS provider’s IPs. PSC Global Access: for on-prem sources, you must enable “Global Access” on your internal load balancer, or Gemini (a global service) cannot reach it. |
| Examples | Jira Cloud, Salesforce, ServiceNow, Confluence, SharePoint On-Prem. |
| Guides | Connector List & Guides |
[ 2 ] Federated Success (WIF + OAuth)
Choose this for live, real-time access to highly sensitive or fast-changing data.
| Category | Details |
|---|---|
| Permissions | GCP: roles/discoveryengine.admin. End user: must grant OAuth consent (“Allow Gemini to access Salesforce?”). Admin: must allow the OAuth app in the IdP. |
| Network | Live query: traffic flows from Google’s backend directly to the app’s API at search time. |
| Pitfalls | Latency: search speed depends entirely on the external API’s speed. VPC-SC & Actions: if VPC Service Controls are on, Actions (e.g., “Create Ticket”) are blocked by default to prevent data exfiltration; you must explicitly allow the method. Availability: if Salesforce is down, search is down. |
| Examples | Salesforce, Slack, ServiceNow (Live Mode). |
| Guides | 1. Configure Workforce Identity Federation; 2. Example: Connect to Salesforce; 3. Example: Connect to Slack |
[ 3 ] Google-on-Google (Native)
The default path for Workspace customers.
| Category | Details |
|---|---|
| Permissions | Workspace: a Super Admin must enable “Gemini for Google Workspace”. |
| Network | Internal Google traffic (zero setup required). |
| Pitfalls | Context-Aware Access (CAA): if you use CAA to block users from accessing Drive from certain IPs, make sure Gemini’s service agents are not inadvertently blocked by these policies. |
| Examples | Google Drive, Gmail, Sites, Slides, Sheets. |
| Guides | 1. Set up a Google Drive data store; 2. Connect to a Storage data store; 3. Connect to BigQuery |
[ 4 ] Configure Workforce Identity (WIF)
The prerequisite for ANY 3rd party Identity Provider (Okta, Entra, Ping).
| Category | Details |
|---|---|
| Permissions | GCP: roles/iam.workforcePoolAdmin. IdP: Global Admin (to create the OIDC/SAML app). |
| Network | Public OIDC/SAML handshake: your IdP must be reachable via the public internet. |
| Pitfalls | Case sensitivity: Jane@Co.Com != jane@co.com, so always use .lowerAscii() in your attribute mapping. ADFS: if you use on-prem ADFS, you must expose the metadata endpoint or use a proxy so Google can verify the token. Sync lag: group membership changes in Entra ID can take about an hour to be reflected. |
| Examples | Connecting Entra ID (Azure AD), Okta, or PingIdentity. |
| Guides | 1. Configure identity provider; 2. Configuring Workforce Identity for Entra ID; 3. Okta ID |
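The case-sensitivity pitfall is easiest to see side by side. The Python sketch below shows an illustrative attribute mapping (the CEL expressions you would place in the workforce pool provider’s attribute mapping) and a small `lower_ascii` helper that mimics CEL’s `lowerAscii()`, which lowercases only ASCII letters. The claim names `assertion.email` and `assertion.groups` are assumptions; the actual claims depend on how your IdP app is configured.

```python
# Illustrative CEL attribute mapping for a workforce pool provider.
# Assumes the IdP sends "email" and "groups" claims; adjust to your IdP.
ATTRIBUTE_MAPPING = {
    "google.subject": "assertion.email.lowerAscii()",
    "google.groups": "assertion.groups",
}


def lower_ascii(value: str) -> str:
    """Mimic CEL's lowerAscii(): lowercase A-Z only, leave other characters untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in value)


def same_identity(a: str, b: str) -> bool:
    """Compare two identities the way a lowercased mapping would see them."""
    return lower_ascii(a) == lower_ascii(b)


if __name__ == "__main__":
    print("Jane@Co.Com" == "jane@co.com")               # False: raw strings differ
    print(same_identity("Jane@Co.Com", "jane@co.com"))  # True: match after normalization
```

Lowercasing in the mapping means every downstream comparison (ACLs, identity maps) only has to match one canonical form, so sources that store emails in lowercase line up with whatever casing the IdP emits.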
[ 5 ] The Builder (Custom Connector)
For apps with no pre-built connector (Workday, Legacy SQL).
| Category | Details |
|---|---|
| Permissions | GCP: roles/discoveryengine.editor (service account). Source: read-only API key. |
| Network | Ingress: the script must reach the source data. Egress: the script must reach discoveryengine.googleapis.com. |
| Pitfalls | You are the IdP: you define the ACLs manually. If you push readers: ["public"], the document is public. Private access: if you run the script inside a private VPC, enable “Private Google Access” to reach the Gemini API without the public internet. Staleness: data is only as fresh as your cron job frequency. Code sample: your JSON payload must structure ACLs correctly. |
| Examples | Workday, SAP, Oracle DB, homegrown HR portals. |
| Guides | 1. Create a Custom Connector; 2. Auto Schema Detection |
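Because a custom connector makes you responsible for ACLs, it helps to build each document payload through one function that always attaches explicit readers. The Python sketch below assumes the Discovery Engine document shape `aclInfo → readers → principals` with `userId` / `groupId` entries; treat those field names as an assumption to verify against the Document API reference, and `build_document` as a hypothetical helper, not part of any SDK. It only builds JSON and makes no API calls.

```python
import json

# Illustrative only: field names follow our understanding of the Discovery Engine
# Document resource (aclInfo -> readers -> principals); verify before use.


def build_document(doc_id: str, title: str, body: str,
                   user_ids: list[str], group_ids: list[str]) -> dict:
    """Build one document payload with an explicit ACL for a custom connector push."""
    # Lowercase IDs so they match a lowercased IdP attribute mapping (see the WIF section).
    principals = [{"userId": u.lower()} for u in user_ids]
    principals += [{"groupId": g.lower()} for g in group_ids]
    if not principals:
        # Fail fast instead of pushing a document with an empty ACL.
        raise ValueError(f"document {doc_id!r} has no readers")
    return {
        "id": doc_id,
        "structData": {"title": title, "body": body},
        "aclInfo": {"readers": [{"principals": principals}]},
    }


if __name__ == "__main__":
    doc = build_document(
        "hr-portal-001", "Example policy", "Example body text.",
        user_ids=["Jane@Co.Com"], group_ids=["hr-team@co.com"],
    )
    print(json.dumps(doc, indent=2))
```

Centralizing ACL construction in one place makes it harder for a single script change to accidentally widen access across every document it pushes.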
[ 6 ] Identity Mismatch (External Identity Mapping)
The “Rosetta Stone” for fixing empty search results.
| Category | Details |
|---|---|
| Permissions | GCP: roles/discoveryengine.admin |
| Network | Ingress: Script → Data Source. Egress: Script → discoveryengine.googleapis.com. |
| Pitfalls | Orphaned docs: if a user ID changes in the source app but not in your mapping file, the user loses access. Mapping limits: 500k identities per load; large organizations may need to batch uploads. Maintenance: you must automate the upload of this mapping file, or new employees won’t see legacy data. |
| Examples | Jira Data Center (jdoe), Legacy SQL (db_user), Windows File Shares (DOMAIN\User). |
| Guides | 1. Manage Identity Maps |
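The three pitfalls above (orphaned IDs, the per-load limit, and the need to automate uploads) map onto a few small functions. The Python sketch below turns (source ID, IdP user) pairs into mapping entries, splits them into loads of at most 500k, and lists source IDs that have no mapping entry. The entry field names `externalIdentity` and `userId` are our assumption about the identity-mapping entry format, and all helper names are hypothetical; nothing here calls a Google API.

```python
from typing import Iterable, Iterator

MAX_ENTRIES_PER_LOAD = 500_000  # "500k identities per load" from the table above


def mapping_entries(pairs: Iterable[tuple[str, str]]) -> Iterator[dict]:
    """Turn (source ID, IdP user) pairs into identity-mapping entries."""
    for external_id, idp_user in pairs:
        # Lowercase the IdP side to match a lowercased attribute mapping.
        yield {"externalIdentity": external_id, "userId": idp_user.lower()}


def batches(entries: Iterable[dict], size: int = MAX_ENTRIES_PER_LOAD) -> Iterator[list[dict]]:
    """Split entries into loads no larger than `size`."""
    batch: list[dict] = []
    for entry in entries:
        batch.append(entry)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def unmapped_ids(source_ids: Iterable[str], entries: Iterable[dict]) -> set[str]:
    """Source IDs with no mapping entry: these users will get empty results."""
    mapped = {e["externalIdentity"] for e in entries}
    return set(source_ids) - mapped


if __name__ == "__main__":
    pairs = [("jdoe", "Jane.Doe@co.com"), ("db_user", "svc-reports@co.com"),
             ("DOMAIN\\asmith", "a.smith@co.com")]
    entries = list(mapping_entries(pairs))
    print([len(b) for b in batches(entries, size=2)])  # [2, 1]
    # "jdoe2" simulates a user ID renamed in the source app but not in the mapping.
    print(unmapped_ids(["jdoe", "db_user", "DOMAIN\\asmith", "jdoe2"], entries))  # {'jdoe2'}
```

Running the `unmapped_ids` check in the same scheduled job that uploads the mapping turns silent “orphaned doc” failures into a list you can act on.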
[ 7 ] Website Crawler (Public Data)
The shortcut for public documentation and marketing sites.
| Category | Details |
|---|---|
| Permissions | GCP: roles/discoveryengine.admin. Web: verified owner in Google Search Console. |
| Network | Public DNS & HTTP/S. |
| Pitfalls | No auth: cannot crawl behind a login page. Robots.txt: must allow Googlebot or Google-Cloud-Vertex-AI-Agent-Builder. WAF: make sure your corporate firewall doesn’t block the crawler’s user agent. |
| Examples | docs.company.com, company.com/blog, public help center. |
| Guides | 1. Create a public web data store |
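Before creating the data store, you can sanity-check a site’s robots.txt offline. The Python sketch below uses the standard library’s `urllib.robotparser` to test whether either of the two user agents named in the table may fetch a URL. Python’s parser applies the first matching rule, while Google’s crawlers use the most specific match, so results may differ in edge cases; treat this as a quick pre-check, not a guarantee of crawler behavior.

```python
from urllib.robotparser import RobotFileParser

# Crawler user agents named in the table above; either one being allowed is enough.
CRAWLER_AGENTS = ("Googlebot", "Google-Cloud-Vertex-AI-Agent-Builder")


def crawler_can_fetch(robots_txt: str, url: str, agents=CRAWLER_AGENTS) -> bool:
    """Return True if robots.txt lets at least one of the crawler agents fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from a string, no network access
    return any(parser.can_fetch(agent, url) for agent in agents)


if __name__ == "__main__":
    blocks_everyone = "User-agent: *\nDisallow: /\n"
    allows_googlebot = (
        "User-agent: Googlebot\nAllow: /\n\n"
        "User-agent: *\nDisallow: /\n"
    )
    url = "https://docs.company.com/getting-started"
    print(crawler_can_fetch(blocks_everyone, url))   # False
    print(crawler_can_fetch(allows_googlebot, url))  # True
```

In practice you would fetch the live robots.txt (for example with `RobotFileParser.set_url()` and `read()`) and run the same check; a WAF that blocks the crawler’s user agent will not show up here, so test that separately.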