The state of the art in Snowflake Governance

Arnaud Col Mon April 29, 2024

The Context

Imagine you're the new CIO of an international group.

One of the first projects you want to undertake is to harmonize the data ecosystem.

You understand that the Data Mesh approach must provide strong autonomy to teams, but you don’t forget that one of the most important aspects of Data Mesh is Federated Governance.

This article discusses how to implement Federated Governance on Snowflake.

 

A Bit of Vocabulary

In Snowflake, you have several containers at your disposal to “organize” your data.

Here is their hierarchy:

 
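[Figure: Snowflake container hierarchy, from organization to account, database, schema, and finally objects such as tables and views]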

That’s all for the technical part. Let’s now talk about what makes data architectures exciting.

 

Snowflake Governance

To ensure proper data usage and management within Snowflake, you may want to monitor the following:

  • Data classification.
  • Application of mandatory tags (e.g., PRIVACY_CATEGORY and SEMANTIC_CATEGORY) to identify sensitive data.
  • Adherence to naming conventions.
  • Implementation of masking rules (masking policy).
  • Usage of data access restrictions (row access policy, aggregation policy, projection policy).
  • Access history to certain sensitive objects.
  • Financial tracking of resource usage.
  • Network security rules or management of computing units (warehouses) with specific rules (resource monitor).
  • Secure access management (SSO, MFA, key-pair authentication).
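
As an illustration of the "adherence to naming conventions" item, here is a minimal sketch of an automated check. The convention itself (`<DEPARTMENT>_<ENV>_<NAME>` for database names) is a hypothetical example, not something prescribed by Snowflake; in practice the list of names would come from your own metadata.

```python
import re

# Hypothetical convention (an assumption for this example):
# database names look like <DEPARTMENT>_<ENV>_<NAME>, e.g. FINANCE_PROD_SALES.
DATABASE_NAME_PATTERN = re.compile(r"^[A-Z]+_(DEV|TEST|PROD)_[A-Z0-9_]+$")


def find_naming_violations(database_names):
    """Return the names that do not match the hypothetical convention."""
    return [
        name
        for name in database_names
        if not DATABASE_NAME_PATTERN.fullmatch(name)
    ]


if __name__ == "__main__":
    names = ["FINANCE_PROD_SALES", "HR_DEV_PAYROLL", "tmp_db", "MARKETING_SANDBOX"]
    # Lists the names a governance report would flag.
    print(find_naming_violations(names))
```

The same pattern-based approach extends to schemas, roles, or warehouses by adding one pattern per object type.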

All governance elements are now under a new umbrella called Snowflake Horizon, which continues to expand.

3 Options for Managing Your Governance

We have three options for managing accounts:

1️⃣ Use a single account and separate departments into different databases.

2️⃣ Use multiple accounts and deploy your governance from your CI/CD.

3️⃣ Use multiple accounts, including a Zero Data Account that carries the governance.

 

Option 1: Single Account

It's possible, and indeed it has been done for years, to use a single Snowflake account and isolate departments into separate databases.

Example in a diagram:

[Figure: Single Account]

Advantages:

  • Joins are possible directly.
  • Governance is very simple: direct access to all metadata.

Disadvantages:

  • Data isolation relies solely on role management.
  • Roles will proliferate, quickly leading to several dozen roles.
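
To see how quickly roles multiply in a single account, here is a rough sketch that builds one role per combination of department, environment, and access level. The department names, environments, access levels, and the naming scheme are all assumptions chosen for illustration.

```python
from itertools import product


def functional_roles(departments, environments, access_levels):
    """Build one role name per (department, environment, access level) combination."""
    return [
        f"{dept}_{env}_{level}"
        for dept, env, level in product(departments, environments, access_levels)
    ]


if __name__ == "__main__":
    roles = functional_roles(
        departments=["FINANCE", "HR", "MARKETING", "SALES"],  # hypothetical departments
        environments=["DEV", "PROD"],
        access_levels=["READ", "WRITE", "ADMIN"],
    )
    # 4 departments x 2 environments x 3 access levels = 24 roles
    print(len(roles))
```

Four departments are already enough to reach two dozen roles, before counting any technical or service roles.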

 

Option 2: Multiple Accounts with CI/CD Deployment

We can use a CIO account plus one account for each department that is capable of operating its own Snowflake account. For departments that are not, it may be preferable to keep their objects in the CIO account and provide a complete service to those subsidiaries.

To deploy governance elements (tags, masking policies, etc.), we won't manually execute scripts. No, not here.

We will use CI/CD (e.g., GitHub Actions + schemachange) to automatically execute scripts across the different Snowflake accounts.
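
As a minimal sketch of what such a pipeline could deploy, the snippet below renders one governance script (a tag and a masking policy) and pairs it with each target account. The account names, object names, role name, and helper functions are hypothetical; the file name only assumes schemachange's versioned-script pattern `V<version>__<description>.sql`. Actually running the script against each account is left to the CI/CD tool.

```python
# Hypothetical target accounts; real identifiers would come from your organization.
TARGET_ACCOUNTS = ["CIO_ACCOUNT", "FINANCE_ACCOUNT", "HR_ACCOUNT"]

# Assumed to follow schemachange's versioned-script naming pattern.
SCRIPT_NAME = "V1.0.0__governance_tags_and_policies.sql"


def render_governance_script(tag_name, allowed_values, policy_name, unmasked_role):
    """Render idempotent SQL that creates one tag and one string masking policy."""
    values = ", ".join(f"'{value}'" for value in allowed_values)
    return (
        f"CREATE TAG IF NOT EXISTS {tag_name} ALLOWED_VALUES {values};\n"
        f"CREATE MASKING POLICY IF NOT EXISTS {policy_name}\n"
        f"  AS (val STRING) RETURNS STRING ->\n"
        f"  CASE WHEN CURRENT_ROLE() IN ('{unmasked_role}') THEN val\n"
        f"       ELSE '***MASKED***' END;\n"
    )


def plan_deployments(accounts, script_name):
    """Pair every target account with the script the CI/CD job should run there."""
    return [(account, script_name) for account in accounts]


if __name__ == "__main__":
    print(render_governance_script(
        tag_name="GOVERNANCE.TAGS.DATA_SENSITIVITY",     # hypothetical tag
        allowed_values=["PUBLIC", "CONFIDENTIAL"],
        policy_name="GOVERNANCE.POLICIES.MASK_STRING",   # hypothetical policy
        unmasked_role="DATA_STEWARD",                    # hypothetical role
    ))
    for account, script in plan_deployments(TARGET_ACCOUNTS, SCRIPT_NAME):
        print(f"{account}: {script}")
```

Using `IF NOT EXISTS` keeps the script safe to run on an account that already received an earlier version of the governance objects.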

We can also use Snowflake's native Git integration and write a stored procedure that manages deployments. I'll talk about this in a future article.

 
[Figure: Multi-accounts, deployment by CI/CD]

Note: You will not be able to perform joins from one account to another directly in your queries.

We will publish the shared object (table, view, etc.) on a Private Listing accessible to one or more accounts.

I see this constraint as an opportunity to manage the publication of data products in a more controlled way. When you know you are a data producer, you must provide your consumers with a quality experience (documentation, quality data, no changes to the interface contract, etc.).

Advantages:

  • Clearer separation of responsibilities and data.
  • Fewer roles.

Disadvantages:

  • Cross-account joins are not possible directly; data must first go through a publication step.

 

Option 3: Multiple Accounts Including a Zero Data Account

The Zero Data Account approach is now possible in Snowflake thanks to the introduction of Replication Groups. The principle itself is simple and widespread in DevOps practices (e.g., AWS Control Tower).

Instead of using CI/CD, we will deploy governance elements directly from Snowflake through replication groups, which are replicated to the other accounts in the same organization.

 
[Figure: Replication from the Zero Data Account]

We can centralize all previously mentioned governance information at this Zero Data Account level, which, as the name suggests, does not intend to host data.
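
A minimal sketch of the statements involved, assuming the governance objects (tags, policies) live in a dedicated database that is replicated as a whole. The group, database, organization, and account names are hypothetical, and the exact options should be checked against the Snowflake documentation for `CREATE REPLICATION GROUP` before use.

```python
def primary_group_sql(group, databases, target_accounts, schedule="60 MINUTE"):
    """SQL to run in the Zero Data Account: declare what is replicated and to whom."""
    return (
        f"CREATE REPLICATION GROUP IF NOT EXISTS {group}\n"
        f"  OBJECT_TYPES = DATABASES\n"
        f"  ALLOWED_DATABASES = {', '.join(databases)}\n"
        f"  ALLOWED_ACCOUNTS = {', '.join(target_accounts)}\n"
        f"  REPLICATION_SCHEDULE = '{schedule}';"
    )


def secondary_group_sql(group, organization, source_account):
    """SQL to run in each target account: create the replica of the primary group."""
    return (
        f"CREATE REPLICATION GROUP IF NOT EXISTS {group}\n"
        f"  AS REPLICA OF {organization}.{source_account}.{group};"
    )


if __name__ == "__main__":
    # All identifiers below are hypothetical.
    print(primary_group_sql(
        group="GOVERNANCE_RG",
        databases=["GOVERNANCE"],
        target_accounts=["MYORG.FINANCE_ACCOUNT", "MYORG.HR_ACCOUNT"],
    ))
    print(secondary_group_sql("GOVERNANCE_RG", "MYORG", "ZERO_DATA_ACCOUNT"))
```

The primary statement is executed once in the Zero Data Account, while the secondary statement is executed in every department account that should receive the governance objects.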

The Zero Data Account must be on the Business Critical edition to use governance object replication.

I haven’t found in the documentation whether the target accounts also need to be Business Critical, but it seems they do not.

To verify that accounts have properly applied the governance elements, we could ask them to share the content of views like SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES with us.

But we can also talk to each other during federated governance meetings. I prefer that.
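
For the automated check based on tag references, a minimal sketch could look like this. It assumes the rows have already been fetched from a department's tag reference view into dictionaries keyed by the `OBJECT_NAME`, `COLUMN_NAME`, and `TAG_NAME` columns, and that every listed column is expected to carry both mandatory tags; the sample table and columns are hypothetical.

```python
MANDATORY_TAGS = {"PRIVACY_CATEGORY", "SEMANTIC_CATEGORY"}


def missing_mandatory_tags(columns, tag_rows):
    """Return {(table, column): [missing tags]} for columns lacking a mandatory tag.

    columns:  iterable of (table_name, column_name) pairs expected to be tagged.
    tag_rows: iterable of dicts with OBJECT_NAME, COLUMN_NAME and TAG_NAME keys.
    """
    applied = {}
    for row in tag_rows:
        key = (row["OBJECT_NAME"], row["COLUMN_NAME"])
        applied.setdefault(key, set()).add(row["TAG_NAME"])

    report = {}
    for column in columns:
        missing = MANDATORY_TAGS - applied.get(column, set())
        if missing:
            report[column] = sorted(missing)
    return report


if __name__ == "__main__":
    # Hypothetical sample data standing in for shared tag reference rows.
    expected_columns = [("CUSTOMERS", "EMAIL"), ("CUSTOMERS", "ID")]
    rows = [
        {"OBJECT_NAME": "CUSTOMERS", "COLUMN_NAME": "EMAIL", "TAG_NAME": "PRIVACY_CATEGORY"},
        {"OBJECT_NAME": "CUSTOMERS", "COLUMN_NAME": "EMAIL", "TAG_NAME": "SEMANTIC_CATEGORY"},
        {"OBJECT_NAME": "CUSTOMERS", "COLUMN_NAME": "ID", "TAG_NAME": "SEMANTIC_CATEGORY"},
    ]
    print(missing_mandatory_tags(expected_columns, rows))
```

A report like this can serve as input to the federated governance meetings rather than replace them.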

 

Advantages:

  • Deployment is managed from Snowflake.
  • Supervision and auditing are centralized.

Disadvantages:

  • A certain number of accounts are needed to justify creating the Zero Data Account.

 

Conclusion

Using Snowflake is simple.

Administering Snowflake within a large company while ensuring governance consistency across various accounts requires a bit more thought.

Without that, where would the fun be?!

 

Sources

Webinar : Gouvernance des données dans un data mesh — DCWT 23 — Jade Le Van, Nicolas Lerose

Masterclass: Deliver a Domain-Driven Data Mesh Architecture Successfully
