A Scalable Looker Architecture: Taming Complexity & Empowering Teams

This article is an architectural guide for the administrators of a new or growing Looker instance. It provides a reference architecture designed to maximize governance and data quality without sacrificing the flexibility that developers and analysts need to move quickly. The core principle of this approach is to create clear lines of ownership that reduce administrative overhead and empower teams to manage their own data assets.

Does any of this sound familiar?

  • Users complain that “It’s difficult to know what dashboards are useful or outdated.”
  • Your instance is cluttered with countless Explores, and developers feel it’s easier to create new ones than to update what already exists.
  • Dashboards or Explores break, and it’s unclear who is responsible for fixing them.
  • As an administrator, you are overwhelmed by requests from users and LookML developers.

If these challenges resonate, this architectural guide offers a path toward a more scalable, governed, and low-touch Looker environment.

The Core Philosophy: Team-Centric Ownership

The foundation of this architecture is organizing your Looker instance around teams. A team is a group of analysts and developers who collaborate on data projects and who typically share data sources, subject matter expertise, and a common set of primary stakeholders.

Each team is given ownership over a corresponding set of assets:

  • A dedicated GCP billing project and service account (or equivalent for other data warehouse dialects).
  • A unique Looker connection.
  • A single Looker model and associated model configuration.
  • A dedicated LookML folder within a project.
  • A dedicated content folder for dashboards and Looks.

This one-to-one mapping creates an unambiguous chain of responsibility, from query cost all the way to dashboard quality.
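One way to keep this mapping unambiguous is to derive every asset name from a single team identifier. The Python sketch below assumes a hypothetical naming convention: a lowercase team slug such as team_a, a looker- prefix for GCP resources, and the folder and content names shown. Looker does not require any of this, so adapt the rules to your own conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamAssets:
    """One team's Looker assets, all derived from a single team slug.

    The naming scheme is a hypothetical convention, not a Looker requirement.
    """

    slug: str  # assumed lowercase letters, digits, and underscores, e.g. "team_a"

    @property
    def gcp_billing_project(self) -> str:
        # GCP project IDs cannot contain underscores, so swap them for hyphens.
        return "looker-" + self.slug.replace("_", "-")

    @property
    def service_account(self) -> str:
        # The impersonated service account lives in the team's billing project.
        account_id = "looker-" + self.slug.replace("_", "-")
        return f"{account_id}@{self.gcp_billing_project}.iam.gserviceaccount.com"

    @property
    def connection(self) -> str:
        return self.slug

    @property
    def model(self) -> str:
        # The model name matches the connection name one-to-one.
        return self.slug

    @property
    def lookml_folder(self) -> str:
        return f"/{self.slug.replace('_', '-')}-folder/"

    @property
    def content_folder(self) -> str:
        # Human-readable name for the dashboard/Look folder, e.g. "Team A".
        return self.slug.replace("_", " ").title()


team_a = TeamAssets("team_a")
for name in ("gcp_billing_project", "service_account", "connection",
             "model", "lookml_folder", "content_folder"):
    print(f"{name:>20}: {getattr(team_a, name)}")
```

Deriving names this way lets code check the one-to-one mapping, so nobody has to remember it.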

Key Technical Elements

Here is a breakdown of the key Looker components of this architecture and how to manage them.

LookML Projects

Definition: A Looker project is a collection of LookML files that are version-controlled in the project’s own Git repository. A project is the highest-level container for your semantic models and defines a community of developers who share the same code base and CI/CD workflow.

Management: We recommend organizing your LookML into three distinct projects to serve different developer communities and security requirements:

  • general-project: This is the main project, which contains the majority of your LookML. Any team can request development access here. Explores created in this project are generally accessible to all Looker users.
  • secret-project: This project houses LookML models that require access to sensitive data, such as material non-public information (MNPI) or personally identifiable information (PII). Development is restricted to teams with a legitimate need to model this data. You might have more than one secret project if there are multiple types of sensitive data with different need-to-know developer audiences. For details on using automation to control access to the Explores in this project’s models, see another article I co-wrote with Reddit: Keep data secure: Linting LookML for access filters and access grants with LAMS.
  • curated-project: This project is owned by a central team (e.g., Data Science, BI & Analytics) responsible for creating and maintaining high-quality, reusable, general-purpose LookML. Other teams can import LookML from this project into their own using remote project import, allowing them to build on a trusted foundation without needing developer access to the curated project itself.

Separating projects by developer community is crucial. While Looker allows restricting query access to specific models within a project, it’s operationally simpler and less confusing for developers if they can access all the code within a project. If a developer can’t query a model, they shouldn’t be developing on it.

LookML Folders

Definition: The Looker IDE allows you to create folders within a project, enabling you to organize your LookML files into a logical hierarchy.

Management: Each team is assigned a dedicated folder within the general-project and, if needed, the secret-project. This folder contains their model file and all associated view and explore files.

This folder structure is a powerful governance tool. Map each team’s folder to a team in your Git provider’s CODEOWNERS file and require code owner review in your branch protection settings. Together, these ensure that any pull request that modifies files in that folder must be approved by the owning team.

# Example CODEOWNERS file
# Assign ownership of all files in the team-a-folder to the 'team-a-developers' GitHub team.
/team-a-folder/ @my-org/team-a-developers


This structure ensures that a clear owner is responsible for the quality and maintenance of the code, while still allowing other developers to contribute by submitting a pull request for the owners to review.
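If you have many teams, you may prefer to generate the CODEOWNERS file from a single team-to-folder mapping rather than edit it by hand. Here is a minimal Python sketch. The folder names, the my-org organization, and the team slugs are placeholders.

```python
def render_codeowners(owners: dict[str, str], org: str) -> str:
    """Render CODEOWNERS lines mapping each team's LookML folder to a Git team.

    owners maps a LookML folder name to a Git provider team slug. The folder
    names and org/team slugs used below are hypothetical.
    """
    lines = ["# Generated file: one owning team per LookML folder."]
    for folder, git_team in sorted(owners.items()):
        # A leading and trailing slash anchors the rule to that top-level
        # folder and applies it to everything inside it.
        lines.append(f"/{folder.strip('/')}/ @{org}/{git_team}")
    return "\n".join(lines) + "\n"


print(render_codeowners(
    {"team-a-folder": "team-a-developers", "team-b-folder": "team-b-developers"},
    org="my-org",
), end="")
```

On GitHub, the last matching pattern in CODEOWNERS takes precedence. Keeping team folders non-overlapping, as this layout does, avoids surprising ownership rules.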

Connections

Definition: A connection powers Looker’s ability to run queries against your data warehouse. For BigQuery, a connection requires a billing project (where query costs are allocated) and an authentication method.

Management: Every team gets its own connection, which maps one-to-one with their model. This architecture uses machine default credentials (MDC) with an impersonated service account as the authentication method.

  • The Looker instance authenticates to Google Cloud using its own principal (the MDC).
  • For each query, that principal impersonates a team-specific service account. This requires granting the MDC roles/iam.serviceAccountTokenCreator on each billing project.
  • This team-specific service account is granted roles/bigquery.dataViewer on the necessary datasets and roles/bigquery.jobUser on its team’s billing project.
  • Grant the MDC roles/bigquery.dataEditor on each project to enable PDTs.
  • This setup ensures that query costs are billed to the team that owns the LookML model, not the user who is running the query. This incentivizes teams to write efficient LookML and provides a clear path for remediation when queries are slow or expensive.
  • Note that MDCs are available only for Looker (Google Cloud core). Both Looker-hosted and customer-hosted Looker (original) instances need a downloaded JSON key for the service account credentials, and typically a separate key is used for each connection.

With this in place, data access is primarily controlled by granting the team’s impersonated service account viewer access to BigQuery datasets. If a team needs access to a dataset in another team’s project, you simply grant their impersonated service account the necessary IAM role (roles/bigquery.dataViewer) on that dataset.
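Writing these grants down as data makes them easier to review or to feed into whatever tooling you use to manage IAM. Here is a minimal Python sketch with placeholder principal, project, and dataset names. The PDT bullet says "each project", which this sketch assumes means the team’s billing project; adjust that binding if your PDT scratch datasets live elsewhere. The sketch only builds a list and does not call any Google Cloud API.

```python
from typing import NamedTuple


class Binding(NamedTuple):
    member: str    # IAM principal, e.g. "serviceAccount:..."
    role: str      # predefined IAM role named in the text
    resource: str  # human-readable label for where the role is granted


def iam_plan(mdc: str, team_sa: str, billing_project: str,
             datasets: list[str]) -> list[Binding]:
    """List the grants described above for one team (all names are placeholders)."""
    plan = [
        # Lets the Looker instance's principal impersonate the team service account.
        Binding(f"serviceAccount:{mdc}", "roles/iam.serviceAccountTokenCreator",
                f"project:{billing_project}"),
        # Lets the Looker principal write PDTs (assumed: in the billing project).
        Binding(f"serviceAccount:{mdc}", "roles/bigquery.dataEditor",
                f"project:{billing_project}"),
        # Query jobs run, and are billed, in the team's own billing project.
        Binding(f"serviceAccount:{team_sa}", "roles/bigquery.jobUser",
                f"project:{billing_project}"),
    ]
    for dataset in datasets:
        # Read access is granted dataset by dataset; a cross-team grant is
        # simply another entry here.
        plan.append(Binding(f"serviceAccount:{team_sa}", "roles/bigquery.dataViewer",
                            f"dataset:{dataset}"))
    return plan


plan = iam_plan(
    mdc="looker-instance@looker-host.iam.gserviceaccount.com",
    team_sa="looker-team-a@looker-team-a.iam.gserviceaccount.com",
    billing_project="looker-team-a",
    datasets=["team-a-data.sales", "team-b-data.shared_dims"],  # 2nd is cross-team
)
for binding in plan:
    print(*binding)
```

A cross-team grant is just one more dataViewer binding on the other team’s dataset. That is why the second dataset in the example needs no special handling.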

Models

Definition: A model file specifies a database connection and defines the Explores that use it. It’s where you define joins, relationships, and other model-wide settings. To be functional, every LookML model file (.model.lkml) must have a corresponding model configuration created in the Looker UI, which links the LookML model to its allowed connection(s).

Management: Each team has one model file inside their LookML folder. This file brings together all the views and Explores developed by that team. The model name should align with the team’s connection name, content folder name, and GCP project name for consistency.

Since each model maps one-to-one with a connection, the configuration is simple: in the model configuration settings, allow only the single connection created for that team.
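The one-model-one-connection rule and the naming alignment above can be audited periodically. This Python sketch works on plain dictionaries. Their keys (name, allowed_db_connection_names, unlimited_db_connections) are loosely modeled on the Looker API’s LookML model fields, so verify them against your SDK version. In practice you might populate the list from the API, for example with the Python SDK’s all_lookml_models call, but that wiring is left out here.

```python
def audit_model_configs(models: list[dict]) -> list[str]:
    """Flag model configurations that break the one-model-one-connection rule.

    Each dict is assumed to carry "name", "allowed_db_connection_names", and
    "unlimited_db_connections"; these keys are an assumption to verify.
    """
    problems = []
    for m in models:
        name = m["name"]
        if m.get("unlimited_db_connections"):
            problems.append(f"{name}: allows all connections")
            continue
        allowed = m.get("allowed_db_connection_names") or []
        if len(allowed) != 1:
            problems.append(f"{name}: expected 1 allowed connection, found {len(allowed)}")
        elif allowed[0] != name:
            # Naming convention from the text: model name == connection name.
            problems.append(f"{name}: connection '{allowed[0]}' does not match model name")
    return problems


configs = [
    {"name": "team_a", "allowed_db_connection_names": ["team_a"], "unlimited_db_connections": False},
    {"name": "team_b", "allowed_db_connection_names": ["team_b", "team_a"], "unlimited_db_connections": False},
    {"name": "team_c", "allowed_db_connection_names": [], "unlimited_db_connections": True},
    {"name": "team_d", "allowed_db_connection_names": ["warehouse"], "unlimited_db_connections": False},
]
for problem in audit_model_configs(configs):
    print(problem)
```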

Explores

Definition: An Explore is the starting point for a query in Looker. It defines a logical object that users can query, composed of a base view and, optionally, one or more joined views.

Management: Explores can be defined directly in the model file, but consider instead defining each one in its own dedicated file (e.g., my_explore.explore.lkml) inside an /explores sub-folder within the team’s main LookML folder. The team’s model file then includes every Explore file in that folder with a single wildcard: include: "/team-a-folder/explores/*.explore.lkml". In each Explore file, include only the view files that Explore needs. This approach improves validation times and simplifies Explore maintenance.
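The include conventions above are easy to check mechanically. This rough Python sketch assumes simple include statements, each on its own line. The two rules it enforces are this article’s conventions, not built-in Looker validation. Its regular expression also ignores comments and other edge cases.

```python
import re

# Matches include statements such as: include: "/team-a-folder/explores/*.explore.lkml"
INCLUDE_RE = re.compile(r'^\s*include:\s*"([^"]+)"', re.MULTILINE)


def check_includes(file_text: str, is_model: bool) -> list[str]:
    """Return warnings for include patterns that pull in more than needed.

    Rules (a sketch of the conventions above, not an official Looker check):
      * a model file should only wildcard-include .explore.lkml files;
      * an Explore file should name its view files explicitly (no wildcards).
    """
    warnings = []
    for pattern in INCLUDE_RE.findall(file_text):
        has_wildcard = "*" in pattern
        if is_model and has_wildcard and not pattern.endswith(".explore.lkml"):
            warnings.append(f"model wildcard include is too broad: {pattern}")
        if not is_model and has_wildcard:
            warnings.append(f"explore file uses a wildcard include: {pattern}")
    return warnings


model_file = '''
connection: "team_a"
include: "/team-a-folder/explores/*.explore.lkml"
include: "/views/*.view"
'''
explore_file = '''
include: "/team-a-folder/views/orders.view.lkml"
include: "/team-a-folder/views/*.view.lkml"
explore: orders {}
'''
print(check_includes(model_file, is_model=True))
print(check_includes(explore_file, is_model=False))
```

A check like this could run in CI alongside the Looker validator, so broad includes are caught before they slow down a model.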

Here are a few rules of thumb for designing effective Explores:

  • Create as few Explores as possible. Combine related use cases into a single, well-structured Explore to simplify the user experience. However, also avoid creating one giant Explore that tries to do everything.
  • Stick to many_to_one joins. Though Looker’s symmetric aggregation feature makes it possible to model fanouts without introducing any aggregation errors, avoiding them will improve query performance. If you need to add information from a table that requires a many_to_many join, consider pre-aggregating the right-hand table to a grain that allows for a many_to_one join instead.
  • Build around fact tables. A common pattern is to create one Explore per fact table, which joins to a shared constellation of dimension tables.
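To see why fanouts matter, here is a plain-Python illustration with hypothetical orders and order_items rows. It is not the SQL that Looker generates.

* Joining orders one_to_many to order_items repeats each order once per item, so a naive sum of an order-level measure is inflated.
* Looker’s symmetric aggregates correct for this in SQL, at some cost to performance.
* Pre-aggregating the right-hand table to the order grain avoids the duplication entirely.

```python
from collections import defaultdict

# Hypothetical sample data: one row per order, one row per item in an order.
orders = [
    {"order_id": 1, "shipping_cost": 10.0},
    {"order_id": 2, "shipping_cost": 5.0},
]
order_items = [
    {"order_id": 1, "sku": "A", "qty": 2},
    {"order_id": 1, "sku": "B", "qty": 1},
    {"order_id": 2, "sku": "A", "qty": 4},
]


def fanout_total_shipping() -> float:
    """Join orders one_to_many to order_items, then naively sum shipping cost."""
    joined = [(o, i) for o in orders for i in order_items
              if i["order_id"] == o["order_id"]]
    # Order 1 has two items, so its shipping cost is counted twice.
    return sum(o["shipping_cost"] for o, _ in joined)


def preaggregated_total_shipping() -> tuple[float, dict[int, int]]:
    """Summarize order_items to the order grain first, then join."""
    qty_per_order: dict[int, int] = defaultdict(int)
    for item in order_items:
        qty_per_order[item["order_id"]] += item["qty"]
    # Each order now matches at most one summary row, so nothing is duplicated.
    joined = [(o, qty_per_order.get(o["order_id"], 0)) for o in orders]
    total_shipping = sum(o["shipping_cost"] for o, _ in joined)
    return total_shipping, dict(qty_per_order)


print(fanout_total_shipping())         # 25.0 (inflated: order 1 counted twice)
print(preaggregated_total_shipping())  # (15.0, {1: 3, 2: 4})
```

In LookML terms, the pre-aggregated table might be a derived table at the order grain, joined with a many_to_one or one_to_one relationship.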

For sensitive data in the secret-project, you can use access filters or access grants to implement row-level or column-level security within an Explore. To ensure that these security rules are never accidentally removed, you can enforce their presence with a linter. For a detailed guide, see this Looker Community article: Keep data secure: Linting LookML for access filters and access grants with LAMS.
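As a rough idea of what such a lint rule checks, this Python sketch flags Explores in a LookML string that declare neither an access_filter nor required_access_grants. It is a hypothetical, deliberately naive stand-in, not LAMS and not how LAMS works.

* It matches braces without understanding comments or strings.
* It skips refinements (explore: +name).
* It counts a required_access_grants on a join as covering the whole Explore.

```python
import re

EXPLORE_START = re.compile(r"\bexplore:\s*(\w+)\s*\{")


def explores_missing_security(lookml: str) -> list[str]:
    """Return names of Explores with neither access_filter nor required_access_grants.

    Naive sketch: braces are matched without regard to comments or strings.
    """
    missing = []
    for match in EXPLORE_START.finditer(lookml):
        depth, pos = 1, match.end()
        # Walk forward until the Explore's opening brace is closed.
        while depth and pos < len(lookml):
            if lookml[pos] == "{":
                depth += 1
            elif lookml[pos] == "}":
                depth -= 1
            pos += 1
        body = lookml[match.end():pos]
        if "access_filter:" not in body and "required_access_grants:" not in body:
            missing.append(match.group(1))
    return missing


secret_model = '''
explore: payroll {
  access_filter: {
    field: employees.department
    user_attribute: department
  }
}
explore: salaries {
  required_access_grants: [can_see_salaries]
}
explore: headcount {
  join: employees {
    relationship: many_to_one
    sql_on: ${headcount.employee_id} = ${employees.id} ;;
  }
}
'''
print(explores_missing_security(secret_model))  # ['headcount']
```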

Content Folders

Definition: Content folders are where users save and organize dashboards and Looks.

Management: Each team gets a dedicated content folder. To maximize discoverability, the built-in “All Users” group is granted View access to all team folders.

Each team manages a corresponding Looker group (e.g., “Team A Content Editors”) that is granted Manage Access, Edit permissions on its folder. The members of this group are responsible for curating the folder’s content: ensuring it is accurate, up-to-date, and well-organized. Keep this group small to maintain a strong sense of ownership.
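The folder permission pattern reduces to two grants per team folder. This small Python sketch describes them as data. The folder names, group names, and access labels are hypothetical, and the sketch does not call the Looker API.

```python
from typing import NamedTuple


class FolderGrant(NamedTuple):
    folder: str
    group: str
    access: str  # "view" or "manage_access_edit" (labels are hypothetical)


def content_access_plan(team_folders: dict[str, str]) -> list[FolderGrant]:
    """team_folders maps a team's content folder name to its editor group name."""
    grants = []
    for folder, editor_group in sorted(team_folders.items()):
        # Everyone can discover and view every team's content.
        grants.append(FolderGrant(folder, "All Users", "view"))
        # Only the small curator group can edit and manage access.
        grants.append(FolderGrant(folder, editor_group, "manage_access_edit"))
    return grants


folder_plan = content_access_plan({"Team A": "Team A Content Editors",
                                   "Team B": "Team B Content Editors"})
for grant in folder_plan:
    print(grant)
```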

Architecture Diagram

The following diagram provides a visual representation of this architecture, illustrating how different teams, BigQuery projects, LookML projects, LookML folders, LookML models, connections, and repositories interact within the Looker ecosystem. It shows the flow of data access, code management, and query billing.

Conclusion

By implementing this team-centric architecture, you can build a Looker instance that scales gracefully, providing robust governance while empowering your teams to own their data products from end to end.

Hello

Thank you for the great guide. I just have one question about the Looker models. You say:

Each team has one model file inside their LookML folder.
This file brings together all the views and Explores developed by that team.

But the Looker performance overview documentation says:

Limit the number of views included within a model when a large number of view files are present.
Including all views in a single model can slow performance.
When a large number of views are present within a project, consider including only the view files needed within each model.
Consider using strategic naming conventions for view file names, to enable easy inclusion of groups of views within a model.
An example is outlined in the includes parameter documentation.

So what I’m trying to understand is this: what if a team with its own Looker project has around 50 to 100 fact tables that it needs to expose to users? Should it still create only a single model? We are seeing a lot of slowness with all of a team’s views in a single model.

What would you recommend here? Should the strategy be to divide the code across multiple models? If so, how should we choose what goes in which model, and how should we name them?

Thanks a lot. Your input would be really helpful.

Kind Regards

PS: Assume that the team has only one connection, but with a lot of fact tables exposed, so all models are mapped to that connection.

Thanks for the question!

You are right that validation and Explore loading slow down as the amount of code the LookML compiler must handle within a model grows. I usually see this at its worst in Development Mode, where the code is checked more often because it is subject to change. I also mostly see this when a model has an include that adds view files it doesn’t actually need. That is often the not-actually-best-practice line automatically added to every new model file: include: "/views/*.view".

But it seems your issue might be neither of those. I do wonder whether 100 Explores really ought to belong to one team and one model access grain. Are all of them really used by the same audience and managed by the same group of developers? Consider whether your 50-100 fact tables could, or should, be consolidated into fewer tables so that each Explore can present more information.

That said, Explores should only be as big as they need to be. Full outer joining two fact tables so that all of that information fits in one Explore is another very common anti-pattern that results in slow queries. In those cases, I generally recommend summarizing the right-hand fact table to a grain that avoids x_to_many joins.

But I’ll assume that, while interesting, those pieces of advice don’t help you here. Is there a problem with splitting a single team’s Explores into two or more models in the same project? I can’t see one! It might even be a really good organizing paradigm for Explore users and developers alike. If the same developers maintain both sets of Explores, I would consider keeping them in the same team LookML folder.

As for choosing which Explores go in which model: for any subjective question like that, I like to think from the perspective of an Explore user who joined the company yesterday. That persona should be able to get value out of the Looker instance with no supervision. Make the model names (or Explore group labels) a useful tool for finding the Explores relevant to them. Is there a decision tree you can encourage for how people approach analytical questions? For example, “Topic_A - Core Explores” vs. “Topic_A - Deep Dives”, “Gold” vs. “Silver”, or “Product” vs. “Accounting”. That kind of thing.
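To make the naming-convention idea concrete, here is a small Python sketch of splitting one team’s Explores into two models that share the same connection. The file names, tier prefixes, and model names are hypothetical. Python’s fnmatch only approximates how Looker resolves include patterns; for example, fnmatch’s * also matches /. Treat this as a planning aid rather than a faithful reimplementation.

```python
from fnmatch import fnmatch

# Hypothetical Explore files in one team folder, named with a tier prefix
# so that a single wildcard include can select each group.
explore_files = [
    "/team-a-folder/explores/core_orders.explore.lkml",
    "/team-a-folder/explores/core_customers.explore.lkml",
    "/team-a-folder/explores/deep_dive_returns.explore.lkml",
    "/team-a-folder/explores/deep_dive_cohorts.explore.lkml",
]

# One model per audience; each model includes only its own group of files.
models = {
    "team_a_core": "/team-a-folder/explores/core_*.explore.lkml",
    "team_a_deep_dives": "/team-a-folder/explores/deep_dive_*.explore.lkml",
}

for model, pattern in models.items():
    matched = [f for f in explore_files if fnmatch(f, pattern)]
    print(f'{model}.model.lkml -> include: "{pattern}" ({len(matched)} files)')

# Every Explore file should land in exactly one model.
unassigned = [f for f in explore_files
              if not any(fnmatch(f, p) for p in models.values())]
print("unassigned:", unassigned)
```

Because each model includes only its own Explore files, and each of those files includes only the views it needs, the compiler handles less code per model.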

Please ask follow up questions if any of these topics piqued your interest!
