Product Announcement
We’re excited to share that we now have three new API endpoints available for PDT management starting with Looker version 22.2. These new API endpoints provide finer-grained control over PDT management and pave the way for powerful integrations with workflow orchestration tools.
This new functionality is gated behind the “develop” permission, so a user would need to have this permission in order to use the API for PDT management.
Getting started
In order to manage PDTs via the API, a toggle called Enable PDT API Control needs to be enabled in the connection settings. This is a new checkbox that will appear at the bottom of the settings page for any existing or new database connection starting in Looker version 22.2.
The New Endpoints
To provide more robust PDT scalability, we now have three new API endpoints that can be used for orchestrating PDT transformation processes outside of Looker.
Start
The start endpoint can be used to start building a specified PDT based on the model name and view name that you input. It will also generate a materialization_id for the PDT build job.
We know many customers have been looking for a way to rebuild incremental PDTs in full. The force_full_incremental parameter can be used for this by simply setting this parameter to true. When this parameter is set to false, which is the default behavior, an incremental PDT will rebuild incrementally based on the definition set in the LookML.
Stop
The stop endpoint can be used to cancel the materialization of a PDT. This requires input of a materialization_id to ensure cancellation of the appropriate build job (which can be retrieved when starting a PDT build via the API).
Status
The check status endpoint can be used to check materialization status for a given PDT.
Because a materialization ID is required as an input for the stop and check status API calls, only PDTs that have been started via API can be monitored or canceled via API at this time.
Rebuild Behavior
The behavior of PDT builds that are triggered using the API will be consistent with builds triggered manually using the “Rebuild Derived Tables & Run” option in the Explore dropdown menu.
When a PDT is rebuilt via the API, the last triggered time will be updated to the API build time. This means that:
-
The clock for PDTs using persist_for or
interval_triggerwill start (or restart) based on the API rebuild time -
PDTs with a
sql_trigger_valueordatagroup_triggerwill consider the API materialization to be a forced rebuild and will compute a new trigger value at the start of the build.
Cascading Persistent Derived Tables
When you use the “Rebuild Derived Tables & Run” functionality to manually rebuild a PDT with dependencies, all upstream PDTs that it depends on will be rebuilt automatically. Rebuilding using “Rebuild Derived Tables & Run” will not force rebuilds of downstream PDTs.
This rebuild behavior is slightly different when forcing a rebuild via the new API functionality. A PDT directly requested via the API will be rebuilt, but any upstream PDTs that it depends on will not be rebuilt automatically. In order to rebuild a PDT and its dependencies, the force_rebuild parameter must be set to true when the build request is made. Note that neither using “Rebuild Derived Tables & Run” from within Looker nor rebuilding via the API with the force_rebuild parameter will force rebuilds of downstream PDTs.
In order to complete a full rebuild of an incremental PDT and force rebuilds of any PDT dependencies, you would need to set the following parameters:
-
force_full_incremental=true -
force_rebuild=true
In this situation, the target PDT will be rebuilt in full and any upstream PDTs will be rebuilt according to their LookML definitions (i.e. upstream incremental PDTs will be triggered to rebuild a new increment but would not fully rebuild).
Persistence Strategies
For derived tables that will be managed primarily outside of Looker, implementing a strategy that persists the table without triggering the regenerator from periodically rebuilding it is recommended. There are a few ways in which this could be accomplished:
-
Using single-firing SQL trigger statements, such as SELECT 1
-
Setting a long persistence time period using
persist_fororinterval_trigger
Using a SQL trigger like SELECT 1 will ensure that the internal regenerator never kicks in and builds the table, but it will still ping the database regularly. Using persist_for or interval_trigger would avoid regular database pings, but if the set timeframe is reached, then the table would be dropped or rebuilt.
The Looker team is currently working on adding a new PDT parameter to indicate if a PDT is externally managed. Once available, adding this new parameter to a derived table will enable you to persist that table without setting a persistence strategy inside of Looker.
Check it out
An easy way to get started with these new endpoints is by using the API Explorer. Check out the parameters available and even try using the API to trigger a rebuild! You’ll be able to see build status in-product in the Queries section of the Admin Panel. You will also see information for each table in the Persistent Derived Tables section of the Admin Panel.
And these new endpoints are just the beginning! We have new integrations with Apache Airflow and Cloud Composer coming your way soon to provide even more scalable mechanisms for monitoring and managing PDTs. Stay tuned!