We have recently started using Dataform, and its benefits are obvious: collaboration, versioning, assertions, and so on. I am curious how widely people use it in their daily practice. Has it completely replaced scheduled queries? And how are people organizing it: a single repository for all transformations, or separate repositories per project?
Dataform is a robust tool designed for a range of data operations, including scheduled queries and transformations. It introduces several advantages over traditional methods, such as enhanced collaboration, versioning through integration with platforms like Git, and the ability to set assertions for data quality.
While Dataform is a relatively recent addition to the data tooling landscape, it’s quickly gaining traction. Many professionals are incorporating it into their daily workflows for scheduled queries and transformations. However, its adoption can vary based on factors like industry, company size, and specific data requirements. While many teams are embracing Dataform, others continue to rely on their established tools.
Users typically employ Dataform for scheduled queries and transformations in two primary ways:
A single repository for all different transformations: This centralized approach is prevalent as it fosters collaboration, facilitates sharing of transformations, and streamlines the management of dependencies between transformations.
Separate repositories per project: Larger organizations or those with multiple teams working on distinct projects might prefer this method. It lets each team maintain its own repository for transformations, keeping ownership, access control, and release schedules separate.
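To make the dependency point concrete, here is a minimal sketch of how two files in one repository might relate. The project, dataset, table, and column names are all hypothetical. The first file declares an existing source table so that other files can reference it:

```sqlx
config {
  type: "declaration",
  database: "my-project",  // hypothetical project ID
  schema: "raw",           // hypothetical dataset
  name: "orders"
}
```

The second file builds a reporting table on top of it. Because it uses ref(), Dataform can place the table in the dependency graph without any manual ordering:

```sqlx
config {
  type: "table",
  schema: "reporting"      // hypothetical dataset
}

-- ref() resolves to the full table name and records the dependency
SELECT
  order_date,
  SUM(amount) AS revenue
FROM ${ref("orders")}
GROUP BY order_date
```

With separate repositories, a table built in another repository would typically need its own declaration like the first file, since ref() only resolves names defined in the same repository.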
The optimal strategy for using Dataform hinges on an organization’s unique requirements. Nevertheless, Dataform’s capabilities make it a compelling choice in various contexts. Here are some additional insights into the benefits of using Dataform:
Collaboration: Dataform’s integration with version control systems, like Git, simplifies collaboration on transformations. Multiple users can work on the same transformation in their own workspaces or branches, with changes tracked in Git and merged through the usual commit and pull workflow. This can save significant time and helps keep transformations current.
Versioning: With its support for version control, Dataform allows users to monitor transformations over time. If needed, they can revert to a previous state, aiding in debugging and ensuring the reliability of transformations.
Assertions: Dataform’s assertion feature is pivotal for maintaining data quality and integrity. After transformations, assertions validate the data against specific criteria or conditions, ensuring the desired results and catching potential errors early.
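As an illustration of the assertions point, here is a minimal sketch of a table definition with built-in assertions. The dataset, table, and column names are hypothetical, and the conditions are only examples:

```sqlx
config {
  type: "table",
  schema: "reporting",     // hypothetical dataset
  assertions: {
    // fail if order_id is duplicated
    uniqueKey: ["order_id"],
    // fail if either column contains NULL
    nonNull: ["order_id", "customer_id"],
    // fail if any row violates this SQL condition
    rowConditions: ["amount >= 0"]
  }
}

SELECT
  order_id,
  customer_id,
  amount
FROM ${ref("orders")}
```

For checks that do not fit these built-in options, a separate file with type: "assertion" can hold a query that returns the offending rows. The assertion fails if that query returns any rows.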
For those seeking a dynamic tool for scheduled queries and transformations, Dataform stands out. It offers numerous advantages over traditional methods and is witnessing a steady rise in adoption.
I haven’t checked back here in a while but still wanted to contribute to this post.
My team previously had 100% of transformations, snapshots, data transfers, etc. set up via scheduled queries. With the introduction of Dataform, I decided to let DF manage transformations and left scheduled queries to manage maintenance-type operations.
So this year I’ve been migrating all of our transformations that end up in reporting tools into DF workflows, while leaving the backups, snapshots, data transfers, and other maintenance operations in scheduled queries.
Is this the best solution? Probably not. I’d prefer to fully commit to one or the other but there doesn’t seem to be any published standard operating procedures or community supported best practices yet for DF.
I’m hesitant to mention it as it may break a forum rule, but I reference dbt’s documentation for ideas on best practices and how to translate them into Dataform.
Using Dataform (DF) for transformations while keeping scheduled queries for maintenance operations is a reasonable, strategic split.
While there might not be a one-size-fits-all answer, your approach seems well-thought-out given the current landscape. As with any technology decision, it’s essential to periodically reassess the strategy as the tools evolve and as more best practices emerge.