If you’re a data professional and haven’t heard the term “AI” at least a dozen times a day in the past year, please send an invite to the rock you’re living under. I’ll bring wings! The constant barrage of this term signifies that the age of Artificial Intelligence (AI) is here, and every data professional — from the ELT engineer to the BI analyst — is scrambling to figure out how to best leverage it.
One of the most exciting use cases in the world of data analytics is the ability to use natural language to query our complex data and extract insights faster than ever before. However, if your data isn’t configured to speak back to you with a single, unified voice, your brilliant AI agent will do what an insecure human does when they aren’t sure of the answer: make something up… or, more accurately in the LLM world, “hallucinate”.
The unsung hero in this story is the semantic layer. It’s the infrastructure that ensures your AI agent is getting consistent and governed context. It’s the difference between a reliable data navigator and a confidently wrong chatbot. But what is a semantic layer, exactly? A semantic layer is a translation layer that maps complex data to familiar business terms. This creates a unified, governed view of data across your organization, empowering both human users and AI agents with access to accurate insights. Ok, great, but what is a semantic layer, actually? Like, what does it look like and how do I get one?
Stop the Semantic Drift
Let’s be honest, we live in a world of semantic drift. Ask three different departments for the definition of Gross Margin and you’ll get four different answers. Ask them again next year and you may get five different answers.
- Merchandisers might be excluding gift card purchases.
- Finance might be excluding freight costs.
- The teams in Europe might be including VAT.
In smaller organizations where an email chain can resolve the ambiguity, this might be fine. But in large enterprises, data silos are real and they need to be tackled with consistency and automation.
If you ask an AI agent a question about Gross Margin and it has access to different, conflicting sources from your underlying datasets, it will, best case, acknowledge the ambiguity. But more often, it will confidently provide an answer based on a best guess, which, without the proper context, will very likely be incorrect, and this can lead to mass confusion and finger-pointing. As data professionals know all too well: garbage in… garbage out.
The semantic layer addresses this by creating a single, definitive glossary for your organization’s data. I like to think of the semantic layer as playing three distinct, but equally critical, roles:
1. The Business Data Translator
Consistency is key. The semantic layer is the unifying dictionary for your organization’s key performance indicators (KPIs). It translates the raw, and often cryptic, underlying data structure (tables, columns, joins) into clear, business-friendly metrics, complete with quantitative calculations, natural language descriptions, labels, tags and other relevant metadata. When Finance asks for Gross_Margin, they know that it’s calculated the same way every time for everyone in the organization, which also allows an AI agent to reliably provide accurate answers. In Looker, Gross_Margin would be considered a measure, which is typically an aggregate calculation of one or more dimensions or other measures. Read more here on how Looker thinks about dimensions and measures.
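To make the dimension/measure distinction concrete, here’s a minimal LookML sketch. The view, field, and table names (order_items, revenue, cogs) are hypothetical, and your actual Gross Margin definition may include other exclusions:

```lookml
view: order_items {
  # Dimensions are row-level attributes of the underlying table
  dimension: revenue {
    type: number
    sql: ${TABLE}.revenue ;;
  }
  dimension: cogs {
    type: number
    sql: ${TABLE}.cogs ;;
    description: "Cost of goods sold"
  }

  # Measures aggregate dimensions; each is defined once for the whole org
  measure: total_revenue {
    type: sum
    sql: ${revenue} ;;
  }
  measure: total_cogs {
    type: sum
    sql: ${cogs} ;;
  }
  measure: gross_margin {
    type: number
    sql: ${total_revenue} - ${total_cogs} ;;
    value_format_name: usd
    description: "Total revenue minus total cost of goods sold."
  }
}
```

Because gross_margin is declared once, every department (and every AI agent) inherits the same calculation and the same natural-language description.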
2. The Security Guard
You’ve got lots of data. Some of that data is sensitive and/or should only be accessed by certain individuals. The semantic layer should ensure that all data requests — whether from a human analyst or an AI agent — adhere to strict governance rules. It’s the guard at the gate, enforcing content security as well as row-level and column-level restrictions so that sensitive data doesn’t end up in the wrong hands or is included as part of an AI agent’s response to an unauthorized user. Looker facilitates this with its powerful and robust concept of user attributes.
3. The Engineer
This role is responsible for optimization and abstraction. The semantic layer enables end-users to ask complex business questions, whether via a traditional point-and-click BI interface or via natural language with the help of AI agents, without having to get into the low-level details of the underlying SQL. Like a brilliant engineer, it figures out the best way to join tables, aggregate values, and apply filters to deliver the correct answer with efficiency. To the end user, it just seems like magic. What’s really happening is that Looker is translating the user intent into optimized SQL on the fly. The object relationships and aggregations have already been deterministically pre-defined in Looker explores.
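As a rough sketch of that pre-defined, deterministic wiring, here’s what an explore with its join logic might look like in LookML (table and key names are hypothetical):

```lookml
# The join relationship is declared once; Looker uses it to generate
# correct, optimized SQL every time, so end users never write joins
explore: orders {
  join: customers {
    type: left_outer
    relationship: many_to_one
    sql_on: ${orders.customer_id} = ${customers.id} ;;
  }
}
```

When a user (or an AI agent) asks for sales by customer region, Looker already knows how orders relates to customers and builds the SQL accordingly.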
LookML: The Blueprint for Consistency
Looker and its declarative modeling language, LookML, are the most powerful way to build your semantic layer: your single version of truth. A core principle here is metric consistency. Every metric is defined once, in one place. Let’s look at a simple, but important, question companies often ask: How many active customers do we have?
Without a semantic layer, a SQL analyst might answer the question like this:
SQL
SELECT
  COUNT(DISTINCT customer_id)
FROM
  orders
WHERE
  order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
  AND status = 'completed'
;
Meanwhile, another team might use a 120 day interval or include pending orders. If an AI agent tries to answer the question, “How many active customers do we have?”, without context (read as “a semantic layer”), it’s unlikely to return the correct response because it doesn’t know how your company defines “active customer”.
In LookML, you define the metric (a measure) once in a central view file:
LookML
measure: active_customers {
  type: count_distinct
  sql: ${customer_id} ;; # Counts unique customer IDs
  filters: [order_status: "completed", order_date: "90 days ago for 90 days"]
  description: "Customers who have completed an order in the last 90 days."
}
Now, every tool that connects to this semantic layer — every dashboard, every embedded visualization, and crucially, every natural language query from an AI agent — uses that single, definitive definition: active_customers.
If the business decides that “active” should change to an interval of 180 days, you simply change one line of LookML. Instantly, all reporting is updated with the new, consistent definition. The alternative—updating dozens (or hundreds) of SQL scripts, dropping/recreating database objects, reloading tables and potentially retraining AI models—is a recipe for governance chaos and opens the door nice and wide for the introduction of operational errors.
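To make that “one line” concrete, here’s the same measure with the new 180-day definition. The only functional change is the filter interval (updating the description to match is good hygiene):

```lookml
measure: active_customers {
  type: count_distinct
  sql: ${customer_id} ;; # Counts unique customer IDs
  # Only this filter line changes: 90 days becomes 180 days
  filters: [order_status: "completed", order_date: "180 days ago for 180 days"]
  description: "Customers who have completed an order in the last 180 days."
}
```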
LookML: Safe Data Access
Beyond consistency, Looker’s semantic layer, powered by LookML, is a solid line of defense for data governance. It ensures that even when interacting with an AI agent, users only see the data they are authorized to access.
Imagine you have Sales data, and different Sales regions should only see their own numbers. With LookML, you can implement row-level security using the access_filter parameter in combination with user attributes:
LookML
explore: sales_data {
  access_filter: {
    field: sales_data.region
    user_attribute: user_region # Assumes a user attribute called 'user_region' is set for each user
  }
  # … [other explore parameters here]
}
With this simple LookML in place, if a user from the “East” region asks the AI agent “What were our total sales last quarter?”, Looker will dynamically apply a WHERE clause to the SQL it generates and return only Sales from the “East” region. The AI agent doesn’t need to try to figure this out on its own because it’s already been deterministically defined in the LookML.
Similarly, column-level security can be enforced by redacting data from sensitive fields (or hiding the fields altogether) based on user attribute values and access grants, ensuring the fields/data are never exposed to unauthorized queries, whether from a human or an AI agent.
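Here’s a rough sketch of that column-level pattern using an access grant tied to a user attribute. The grant name, attribute, and field names (can_view_pii, department, email) are hypothetical:

```lookml
# Model-level grant: only users whose 'department' attribute matches may see
# fields that require it
access_grant: can_view_pii {
  user_attribute: department
  allowed_values: ["finance", "compliance"]
}

view: customers {
  dimension: email {
    type: string
    sql: ${TABLE}.email ;;
    # Without the grant, this field is invisible to the user and to any
    # AI agent querying on their behalf
    required_access_grants: [can_view_pii]
  }
}
```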
Governance + Accuracy = Trust
The relationship between the semantic layer and AI is simple: Govern the data, guarantee the context.
Imagine you just hired an amazing business analyst. They check all the qualification boxes for the job, they’re excited about the direction of your organization, they bring great ideas and experience, and they’re an exceptional culture add to the team. Day one on the job, you give them direct query access to your enterprise data warehouse (EDW). Do you trust them to deliver business reporting to your leadership and your field teams? Are you confident they’ll know all the join keys and filter criteria? Are you confident they know your business vernacular and how your KPIs are calculated? Even more critically, do you trust they know who should and shouldn’t have access to certain data?
Your optimism (or insouciance) is enviable if you answered “yes” to those questions. But for the rest of us who would prefer to remain employed, the answer is a hard “No”. However, this is effectively what you’re doing when you let an LLM loose on your EDW. With all the ambiguity that’s inherent in an EDW, you’re leaving a lot to chance without giving the LLM some clear guidelines. As discussed above, the semantic layer is where we establish context:
- This metric is defined like this…
- These tables are joined on these keys and these filters should be applied.
- When a user asks for this, they really mean…
The semantic layer is also where we establish governance and security:
- Users in Group A should only see these rows from these tables.
- Ensure that users always apply a date filter on this very large dataset.
- Make sure results are fresh and not pulled from cache when queried at this time.
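The last two governance controls above can be sketched in LookML as well. This is a minimal illustration, and the datagroup, table, and field names (nightly_etl, etl_log, events) are hypothetical:

```lookml
# A datagroup defines cache policy: invalidate when the ETL finishes,
# and never serve results older than 4 hours regardless
datagroup: nightly_etl {
  sql_trigger: SELECT MAX(etl_completed_at) FROM etl_log ;;
  max_cache_age: "4 hours"
}

explore: events {
  persist_with: nightly_etl
  # Require a date filter on this very large dataset (users can change the
  # value, but a filter is always applied by default)
  always_filter: {
    filters: [events.event_date: "7 days"]
  }
}
```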
By having all your metrics and definitions centralized and governed, you achieve two primary outcomes that are non-negotiable for AI:
- Metric Stability: Your definitions are consistent. An AI agent powered by stable, reliable declarations of your business rules and KPIs will be measurably more accurate and less prone to generating “creative” answers.
- Governed Insights: Your data is secure. Since the Semantic Layer is the Security Guard, you can rest assured that the AI assistant is not inadvertently exposing PII or other sensitive data that users shouldn’t have access to, reducing compliance risk and maintaining trust.
The data cloud may be the engine of the AI revolution, but the semantic layer is the blueprint and the central nervous system. It’s the essential tool that takes messy data and transforms it into the clean, consistent and governed context required for trustworthy, accurate, and truly intelligent AI.
Thanks for reading! If you’d like to see Looker’s semantic layer + AI in action, check out this short video.