A Scalable Looker Architecture: Taming Complexity & Empowering Teams
This article is meant to be an architectural guide for the administrators of a new or growing Looker instance. It provides a reference architecture that is designed to maximize governance and data quality without sacrificing the flexibility that developers and analysts need to move quickly. The core principle of this approach is to create clear lines of ownership that reduce administrative overhead and empower teams to manage their own data assets.
Does any of this sound familiar?
- Users complain that “It’s difficult to know what dashboards are useful or outdated.”
- Your instance is cluttered with countless Explores, and developers feel it’s easier to create new ones than to update what already exists.
- Dashboards or Explores break, and it’s unclear who is responsible for fixing them.
- As an administrator, you are overwhelmed by requests from users and LookML developers.
If these challenges resonate, this architectural guide offers a path toward a more scalable, governed, and low-touch Looker environment.
The Core Philosophy: Team-Centric Ownership
The foundation of this architecture is organizing your Looker instance around teams. A team is a group of analysts and developers who collaborate on data projects and who typically share data sources, subject matter expertise, and a common set of primary stakeholders.
Each team is given ownership over a corresponding set of assets:
- A dedicated GCP billing project and service account (or equivalent for other data warehouse dialects).
- A unique Looker connection.
- A single Looker model and associated model configuration.
- A dedicated LookML folder within a project.
- A dedicated content folder for dashboards and Looks.
This one-to-one mapping creates an unambiguous chain of responsibility, from query cost all the way to dashboard quality.
Key Technical Elements
Here is a breakdown of the key Looker components of this architecture and how to manage them.
LookML Projects
Definition: A Looker project is a collection of LookML files that are version-controlled by the project’s own Git repository. A project is the highest-level container for your semantic models and defines a community of developers who share the same code base and CI/CD workflow.
Management: We recommend organizing your LookML into three distinct projects to serve different developer communities and security requirements:
- general-project: This is the main project, which contains the majority of your LookML. Any team can request development access here. Explores created in this project are generally accessible to all Looker users.
- secret-project: This project houses LookML models that require access to sensitive data, such as material non-public information (MNPI) or personally identifiable information (PII). Development is restricted to teams with a legitimate need to model this data. You might have more than one secret project if there are multiple types of sensitive data with different need-to-know developer audiences. For info on how access to Explores presented in this project’s models can be controlled using automation, check out another article I wrote alongside Reddit: Keep data secure: Linting LookML for access filters and access grants with LAMS.
- curated-project: This project is owned by a central team (e.g., Data Science, BI & Analytics) responsible for creating and maintaining high-quality, reusable, general-purpose LookML. Other teams can import LookML from this project into their own using remote project import, allowing them to build on a trusted foundation without needing developer access to the curated project itself.
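As a rough illustration of the remote project import mentioned for the curated-project, a team's project could declare the dependency in its manifest file and then include curated files by path. This is only a sketch: the repository URL, the ref, the dependency name curated_project, and the view file path are all hypothetical placeholders, and a private repository would also need import credentials configured in Looker.

```lookml
# manifest.lkml in the importing project (all names and URLs are hypothetical)
remote_dependency: curated_project {
  url: "https://github.com/my-org/curated-project"
  # A branch name is used here for brevity; a release tag or commit SHA pins the import.
  ref: "main"
}

# In one of the team's own LookML files: include a curated view by its
# path inside the imported project (note the leading double slash).
include: "//curated_project/views/customers.view.lkml"
```

Once included, the imported view can be joined into the team's own Explores like any local view, while changes to the curated code still go through the central team's repository.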
Separating projects by developer community is crucial. While Looker allows restricting query access to specific models within a project, it’s operationally simpler and less confusing for developers if they can access all the code within a project. If a developer can’t query a model, they shouldn’t be developing on it.
LookML Folders
Definition: The Looker IDE allows you to create folders within a project, enabling you to organize your LookML files into a logical hierarchy.
Management: Each team is assigned a dedicated folder within the general-project and, if needed, the secret-project. This folder contains their model file and all associated view and explore files.
This folder structure is a powerful governance tool. By mapping each team’s folder to a team in your Git provider’s CODEOWNERS file, and requiring code-owner review on the protected branch, you enforce that any pull request that modifies files in that folder must be approved by the owning team.
# Example CODEOWNERS file
# Assign ownership of all files in the team-a-folder to the 'team-a-developers' GitHub team.
/team-a-folder/ @my-org/team-a-developers
This structure ensures that a clear owner is responsible for the quality and maintenance of the code, while still allowing other developers to contribute by submitting a pull request for the owners to review.
Connections
Definition: A connection powers Looker’s ability to run queries against your data warehouse. For BigQuery, a connection requires a billing project (where query costs are allocated) and an authentication method.
Management: Every team gets its own connection, which maps one-to-one with their model. This architecture uses machine default credentials (MDC) with an impersonated service account as the authentication method.
- The Looker instance authenticates to Google Cloud using its own principal (the MDC).
- For each query, that principal impersonates a team-specific service account. This requires granting the MDC roles/iam.serviceAccountTokenCreator on each billing project.
- This team-specific service account is granted roles/bigquery.dataViewer on the necessary datasets and roles/bigquery.jobUser on its team’s billing project.
- Grant the MDC roles/bigquery.dataEditor on each project to enable PDTs.
- This model ensures that query costs are billed to the team that owns the LookML model, not the user who is running the query. This incentivizes teams to write efficient LookML and provides a clear path for remediation when queries are slow or expensive.
- Note that MDCs are only available for Looker (Google Cloud Core); for either Looker-hosted or customer-hosted Looker (original) a downloaded JSON key for the service account credentials is needed, and typically a separate one is used for each connection.
With this in place, data access is primarily controlled by granting the team’s impersonated service account viewer access to BigQuery datasets. If a team needs access to a dataset in another team’s project, you simply grant their impersonated service account the necessary IAM role (roles/bigquery.dataViewer) on that dataset.
Models
Definition: A model file specifies a database connection and defines the Explores that use it. It’s where you define joins, relationships, and other model-wide settings. To be functional, every LookML model file (.model.lkml) must have a corresponding model configuration created in the Looker UI, which links the LookML model to its allowed connection(s).
Management: Each team has one model file inside their LookML folder. This file brings together all the views and Explores developed by that team. The model name should align with the team’s connection name, content folder name, and GCP project name for consistency.
Since each model maps one-to-one with a connection, the configuration is simple: in the model configuration settings, allow only the single connection created for that team.
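A team's model file under this layout can stay very small. The sketch below assumes a hypothetical team called Team A whose connection is named team_a and whose LookML lives in /team-a-folder; none of these names come from a real instance.

```lookml
# /team-a-folder/team_a.model.lkml (hypothetical file and names)

# The single connection allowed for this model in its model configuration.
connection: "team_a"

# Label shown to users in the Explore menu.
label: "Team A"

# Pull in every Explore file owned by the team.
include: "/team-a-folder/explores/*.explore.lkml"
```

Keeping the model name, connection name, and folder name aligned, as recommended above, makes it easy to trace a query back to the owning team.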
Explores
Definition: An Explore is the starting point for a query in Looker. It defines a logical object that users can query, composed of a base view and one or more joined views.
Management: Explores can be defined directly in the model file, but consider instead defining them in their own dedicated files (e.g., my_explore.explore.lkml) inside an /explores sub-folder within the team’s main LookML folder. The team’s model file then includes all Explores from that folder using a wildcard: include: "/team-a-folder/explores/*.explore.lkml". In each Explore file, include only the view files necessary for that Explore. This approach improves validation times and simplifies Explore maintenance.
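A dedicated Explore file following this pattern might look like the sketch below. The view names, file paths, and join keys are hypothetical; the point is that the file includes only the two views it actually uses.

```lookml
# /team-a-folder/explores/orders.explore.lkml (hypothetical file and names)

# Include only the views this Explore needs.
include: "/team-a-folder/views/orders.view.lkml"
include: "/team-a-folder/views/customers.view.lkml"

explore: orders {
  label: "Orders"

  # Each order belongs to exactly one customer.
  join: customers {
    type: left_outer
    relationship: many_to_one
    sql_on: ${orders.customer_id} = ${customers.id} ;;
  }
}
```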
Here are a few rules of thumb for designing effective Explores:
- Create as few Explores as possible. Combine related use cases into a single, well-structured Explore to simplify the user experience. However, also avoid creating one giant Explore that tries to do everything.
- Stick to many_to_one joins. Though Looker’s symmetric aggregation feature makes it possible to model fanouts without introducing any aggregation errors, avoiding them will improve query performance. If you need to add information from a table that requires a many_to_many join, consider pre-aggregating the right-hand table to a grain that allows for a many_to_one join instead.
- Build around fact tables. A common pattern is to create one Explore per fact table, which joins to a shared constellation of dimension tables.
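To make the pre-aggregation advice concrete, here is a minimal sketch. It assumes a hypothetical orders fact table and a support tickets table that both carry a customer_id, so joining them directly would be many_to_many. Rolling the tickets up to one row per customer first allows a many_to_one join. The table name, view names, and fields are all illustrative.

```lookml
# /team-a-folder/views/customer_ticket_totals.view.lkml (hypothetical)
# One row per customer, pre-aggregated from a tickets table.
view: customer_ticket_totals {
  derived_table: {
    sql:
      SELECT
        customer_id,
        COUNT(*) AS ticket_count
      FROM `team-a-project.support.tickets`
      GROUP BY customer_id ;;
  }

  dimension: customer_id {
    primary_key: yes
    hidden: yes
    type: number
    sql: ${TABLE}.customer_id ;;
  }

  dimension: ticket_count {
    type: number
    sql: ${TABLE}.ticket_count ;;
  }
}

# /team-a-folder/explores/orders.explore.lkml (hypothetical)
include: "/team-a-folder/views/orders.view.lkml"
include: "/team-a-folder/views/customer_ticket_totals.view.lkml"

explore: orders {
  # Many orders map to one pre-aggregated row per customer.
  join: customer_ticket_totals {
    type: left_outer
    relationship: many_to_one
    sql_on: ${orders.customer_id} = ${customer_ticket_totals.customer_id} ;;
  }
}
```

The trade-off is that ticket-level detail is no longer available in this Explore; if users need it, a separate Explore built on the tickets fact table is usually the cleaner option.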
For sensitive data in the secret-project, you can use access filters or access grants to implement row-level or column-level security within an Explore. To ensure that these security rules are never accidentally removed, you can enforce their presence with a linter. For a detailed guide, see this Looker Community article: Keep data secure: Linting LookML for access filters and access grants with LAMS.
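As a sketch of what those rules can look like, the example below combines an access filter for row-level security with an access grant for column-level security. The user attribute names (allowed_region, pii_access), the table, and the fields are hypothetical and would need to exist in your own instance.

```lookml
# In the team's model file: access grants are defined at the model level.
access_grant: can_view_pii {
  user_attribute: pii_access
  allowed_values: ["yes"]
}

explore: customers {
  # Row-level security: users only see rows whose region matches
  # the value of their allowed_region user attribute.
  access_filter: {
    field: customers.region
    user_attribute: allowed_region
  }
}

# In the view file.
view: customers {
  sql_table_name: `secret-project-data.crm.customers` ;;

  dimension: region {
    type: string
    sql: ${TABLE}.region ;;
  }

  # Column-level security: only users holding the grant can see this field.
  dimension: email {
    type: string
    required_access_grants: [can_view_pii]
    sql: ${TABLE}.email ;;
  }
}
```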
Content Folders
Definition: Content folders are where users save and organize dashboards and Looks.
Management: Each team gets a dedicated content folder. To maximize discoverability, the built-in “All Users” group is granted View access to all team folders.
Each team manages a corresponding Looker Group (e.g., “Team A Content Editors”) which is granted Manage Access, Edit permissions on their folder. The members of this group are responsible for curating the folder’s content — ensuring it is accurate, up-to-date, and well-organized. This list should be kept small to maintain a strong sense of ownership.
Architecture Diagram
The following diagram provides a visual representation of this architecture, illustrating how different teams, BigQuery projects, LookML projects, LookML folders, LookML models, connections, and repositories interact within the Looker ecosystem. It shows the flow of data access, code management, and query billing.
Conclusion
By implementing this team-centric architecture, you can build a Looker instance that scales gracefully, providing robust governance while empowering your teams to own their data products from end to end.
