Snowflake Architecture 101: Getting the Best Out of the Popular Data Warehouse
Data warehouse architecture affects the speed of business processes and unlocks data-driven insights that result in faster and more accurate business decisions. As a data platform offering a modern data warehouse architecture, Snowflake enables explicit separation between three particular layers, which makes it easy to tailor to even complex business needs. How? Find out below.
When it comes to storing all your data, there are plenty of systems that do so efficiently. Although we can still rely on older, proven solutions, Snowflake architecture tackles this problem with a fresh approach. It combines data warehousing features with big data insights. The result is a very powerful tool that lets us harness the potential of analytical data workloads through advanced data engineering.
Being a SaaS (Software-as-a-Service) solution, Snowflake offers extreme flexibility and ease of use. That allows companies to shift data engineers to more creative and demanding tasks, while routine ones, like the cluster management that a Hadoop setup demands, are handled without engaging a human workforce. With the Snowflake data warehouse architecture model, it’s far more straightforward to create, maintain, and get the best results from your data.
How does Snowflake work?
Snowflake is a cloud data platform which offers data management simplicity. Four key takeaways of its offer are:
- It runs completely on a cloud provider of our choice (find out more in the “What is Snowflake architecture” chapter below) and is built for the cloud from the ground up
- There is absolutely no hardware to select, configure, or maintain
- There is absolutely no software to install, configure, or maintain
- All the maintenance and updates are conducted by the Snowflake team with no downtime
Therefore, its design reduces data management from the user perspective to almost zero. Moreover, it provides many more features, ranging from traditional data warehouse offerings such as query processing and metadata management to innovative ones like time traveling and zero-copy data sharing.
What is Snowflake architecture?
Snowflake’s documentation explains concisely what the architecture is:
Snowflake’s architecture is a hybrid of traditional shared disk architecture and shared nothing database architecture with Massively Parallel Processing.
Without explaining what each of these approaches means in detail, the main takeaway is that Snowflake takes full advantage of the benefits of both. It has a central data repository where it stores all the data (shared disk architecture) but still manages to apply MPP (Massively Parallel Processing) to immensely boost computing power (shared nothing architecture).
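To make the hybrid idea more tangible, here is a minimal Python sketch. It is a toy model and not Snowflake's implementation: the names SHARED_STORAGE, scan_partition and parallel_total are invented for illustration. It only shows the general pattern of one central store that every worker can read, combined with independent workers that each process their own slice in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Central repository: every worker can read any partition ("shared disk").
# The partition names and values are made up for this illustration.
SHARED_STORAGE = {
    "part-0": [12, 7, 30],
    "part-1": [5, 18],
    "part-2": [22, 1, 9, 4],
}


def scan_partition(name):
    """One compute node sums its assigned partition independently."""
    return sum(SHARED_STORAGE[name])


def parallel_total(partition_names, workers=3):
    """Fan the partitions out to workers, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(scan_partition, partition_names))
    return sum(partials)


total = parallel_total(sorted(SHARED_STORAGE))
print(total)  # 108
```

Each worker touches only the partition it was given, yet none of them owns the data, which is the combination the paragraph above describes.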
Moreover, a third approach, shared data, emerges. Snowflake architecture relies heavily on metadata management, so sharing cloud data between users or accounts is, in the majority of cases, free of charge.
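Why can metadata make sharing so cheap? One way to picture it, as a simplified assumption rather than a description of Snowflake internals, is that a table is just a list of references to immutable data files. Giving someone else access, or cloning a table, then means copying references and not the data itself. The FILES and tables structures and the function names below are hypothetical.

```python
# Immutable data files keyed by id; tables only hold references to them.
FILES = {"f1": ("alice", "bob"), "f2": ("carol",)}
tables = {"customers": ["f1", "f2"]}


def clone(source, target):
    """Create a new table by copying the file list, not the files."""
    tables[target] = list(tables[source])


def append_rows(table, file_id, rows):
    """A write adds a new immutable file and references it from one table only."""
    FILES[file_id] = tuple(rows)
    tables[table].append(file_id)


def read(table):
    """Resolve the references of a table into its rows."""
    return [row for file_id in tables[table] for row in FILES[file_id]]


clone("customers", "customers_dev")
append_rows("customers_dev", "f3", ["dave"])

print(read("customers"))      # ['alice', 'bob', 'carol']
print(read("customers_dev"))  # ['alice', 'bob', 'carol', 'dave']
print(len(FILES))             # 3
```

Only three files exist after the clone and the write, even though two tables are readable, and the original table is unaffected by changes to the copy.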
What are the three layers of Snowflake architecture?
Beyond the overall view, let’s zoom in a little to discover the three main layers of Snowflake data warehouse architecture:
- Database storage layer
- Query processing layer
- Cloud services layer
Several sources include a fourth layer as well, which is something we already discussed above - the cloud-agnostic layer, which allows us to switch cloud providers with relatively little effort. Snowflake’s behavior will remain the same from the user’s perspective, no matter which cloud provider we choose. Therefore, let’s focus on the remaining three layers below.
Database storage layer
Even though business data is shown in a plain table format, it would be inaccurate to assume it’s stored like this in the cloud storage layer. In fact, when data is put into Snowflake (after being loaded into a special location called a stage), it is reorganized into an optimized internal format. This is where compression, partitioning, clustering, and other optimization techniques (like collecting metadata) take place.
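As a rough illustration of why partitioning plus metadata pays off, the Python sketch below splits loaded values into small partitions and records a minimum and maximum for each. A query can then skip partitions whose metadata proves they hold no matching rows. This is a deliberately simplified model under stated assumptions: the Partition class, the partition size of three, and sorting on load are inventions for the example, not Snowflake's actual storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    rows: tuple      # values stored together in one partition
    min_value: int   # metadata recorded when the data is loaded
    max_value: int


def load(values, partition_size=3):
    """Sort incoming values and split them into partitions with min/max metadata."""
    ordered = sorted(values)
    chunks = [ordered[i:i + partition_size]
              for i in range(0, len(ordered), partition_size)]
    return [Partition(tuple(c), c[0], c[-1]) for c in chunks]


def query_greater_than(partitions, threshold):
    """Return matching rows and the number of partitions actually scanned."""
    scanned = 0
    matches = []
    for p in partitions:
        if p.max_value <= threshold:
            continue  # the metadata alone proves no row here can match
        scanned += 1
        matches.extend(v for v in p.rows if v > threshold)
    return matches, scanned


partitions = load([41, 3, 17, 88, 25, 9, 60, 12, 54])
rows, scanned = query_greater_than(partitions, 50)
print(rows, scanned, len(partitions))  # [54, 60, 88] 1 3
```

Here only one of three partitions is read to answer the query; the other two are ruled out from metadata without touching their rows.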
Additionally, it’s important to remember that it’s not possible to get to the data (or other objects) by using our cloud provider’s accounts. It may seem inconvenient, but in reality, it is a great extra layer of security. Everything that Snowflake stores can be accessed only by its interfaces.
Query processing layer
A query processing or computing layer is a place where the real magic starts. It’s one of the main features that everybody should remember while considering Snowflake architecture and the platform itself - processing and data storage are explicitly decoupled.
This layer is built around the concept of virtual warehouses, which are essentially compute resources that can be divided between multiple teams or even tasks. Each warehouse is a completely independent compute cluster, although all of them have access to the same data. Actual access depends on the role permissions attached to particular users.
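The relationship between warehouses, shared data and roles can be sketched in a few lines of Python. This is a conceptual toy and not Snowflake's API: the STORAGE and ROLE_GRANTS dictionaries, the Warehouse class, and the warehouse and role names are all hypothetical.

```python
# One shared storage layer; no warehouse owns any of it.
STORAGE = {"orders": [120, 80, 45], "salaries": [5000, 6200]}

# Hypothetical role grants: which tables each role may read.
ROLE_GRANTS = {"analyst": {"orders"}, "hr": {"orders", "salaries"}}


class Warehouse:
    """An independent compute cluster with its own state but no data of its own."""

    def __init__(self, name):
        self.name = name
        self.queries_run = 0  # per-warehouse counter, never shared

    def total(self, role, table):
        """Sum a table, provided the role is allowed to read it."""
        if table not in ROLE_GRANTS.get(role, set()):
            raise PermissionError(f"role {role!r} cannot read {table!r}")
        self.queries_run += 1
        return sum(STORAGE[table])


sales_wh = Warehouse("sales_wh")
loading_wh = Warehouse("loading_wh")

print(sales_wh.total("analyst", "orders"))  # 245
print(loading_wh.total("hr", "orders"))     # 245
```

Both warehouses return the same answer because they read the same data, while their workloads stay separate. A call such as sales_wh.total("analyst", "salaries") raises PermissionError, since access depends on the role and not on the warehouse.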
Cloud services layer
Apart from data storage and compute clusters, there are plenty of other functionalities that need to be provided. The cloud services layer provides the whole collection of features that users may need while working with Snowflake. It includes the following:
- Authentication and security
- Data management simplicity
- Infrastructure management
- Metadata management
- Query parsing and optimization
- Access control
As you can see, even though in the beginning it sounded rather supplementary, cloud services play a crucial role in the whole Snowflake data warehouse architecture.
Building efficient data architecture with Snowflake
Regardless of which product your organization chooses in the end, there are plenty of considerations to bear in mind when building data warehouse architecture. If Snowflake is your choice, below you can find a handy recipe to help you set up a powerful data platform:
- Remember the importance of database storage and compute decoupling. Create a distinct virtual warehouse for each group in your organization (sales, data science) or task category (bulk loading). Even though compute resources will be different, they will still have access to the same data architecture by default.
- Experiment with warehouse sizes. There’s no silver bullet, so you need to find the best configuration for your own purposes.
- Automate whatever you can by leveraging Snowflake’s built-in mechanisms. Streams, tasks, zero-copy cloning and time traveling are just the tip of the iceberg of what Snowflake can do. Making use of these features can create much more for your organization than just another data warehouse.
- Research thoroughly what operations and features are costless. It may turn out that the metadata management Snowflake implements can save your organization lots of money. On the other hand, there may be functionalities that will surprise you with accrued costs - so be aware of them as well!
- Apply resource monitoring for virtual warehouses, as well as cost notifications and restrictions, so you don’t need to worry all the time about your organization’s budget.
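Parts of the recipe above can be turned into a small script. The Python sketch below only builds SQL text for one warehouse per team or task category, each with a starting size and a resource monitor; it does not connect to Snowflake. The warehouse names, sizes, quotas and the 80/100 percent thresholds are hypothetical examples, and the generated statements should be checked against the current Snowflake SQL reference before being run.

```python
# Hypothetical plan: one warehouse per team or task category, with a size to
# start experimenting from and a credit quota. All names and numbers are made up.
PLAN = {
    "sales_wh": {"size": "XSMALL", "credit_quota": 50},
    "data_science_wh": {"size": "MEDIUM", "credit_quota": 200},
    "bulk_load_wh": {"size": "SMALL", "credit_quota": 100},
}

# A subset of warehouse sizes, used here only to validate input.
VALID_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"}


def check_identifier(name):
    """Reject anything that is not a plain identifier before building SQL text."""
    if not name.isidentifier():
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def statements_for(warehouse, size, credit_quota, auto_suspend_seconds=60):
    """Return the SQL statements for one warehouse and its resource monitor."""
    wh = check_identifier(warehouse)
    if size not in VALID_SIZES:
        raise ValueError(f"unknown warehouse size: {size!r}")
    monitor = f"{wh}_monitor"
    return [
        f"CREATE WAREHOUSE IF NOT EXISTS {wh} WITH WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = {int(auto_suspend_seconds)} AUTO_RESUME = TRUE;",
        f"CREATE RESOURCE MONITOR {monitor} WITH CREDIT_QUOTA = {int(credit_quota)} "
        "TRIGGERS ON 80 PERCENT DO NOTIFY ON 100 PERCENT DO SUSPEND;",
        f"ALTER WAREHOUSE {wh} SET RESOURCE_MONITOR = {monitor};",
    ]


def build_script(plan):
    """Flatten the plan into one ordered list of statements."""
    script = []
    for warehouse, settings in plan.items():
        script.extend(
            statements_for(warehouse, settings["size"], settings["credit_quota"])
        )
    return script


for statement in build_script(PLAN):
    print(statement)
```

Keeping the plan as data makes it easy to experiment: changing a size or quota is a one-line edit, and the same script can be regenerated and reviewed before anything touches the account.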
When is Snowflake your best choice?
When compared to other data warehouses, Snowflake has its pros and cons. AWS Redshift, GCP BigQuery, Azure Synapse Analytics, Firebolt - each one has its own advantages and drawbacks. Nevertheless, Snowflake has a few characteristics that put it at the top of the list of products to consider when thinking about data warehousing:
- Ease of use and full automation - it goes without saying that today’s business runs on insights delivered by straightforward, yet powerful tools. Snowflake takes care of almost all administration, allowing organizations to fully focus on what they know best.
- Extra features - it’s not only about data storage and simple data analytics queries anymore. Snowflake provides a compelling set of extra functionalities that can make your life easier, e.g., time traveling, task execution, zero-copy data sharing, metadata management, and masking policies. As Snowflake power users, we know all about how helpful these features are.
- Security and encryption - Snowflake takes care of comprehensive data security, both at rest and in motion, with AES-256 encryption. Fulfilling strict security requirements enables businesses to operate, even if they have highly sensitive data onboard (SOC1, SOC2, HIPAA, PCI DSS compliance).
- Cost management - by decoupling data storage and compute resources, it’s much easier to control costs on all fronts. Moreover, admin accounts have very detailed information about all spending, so costs can be optimized in real time.
- Flexibility - not sure yet which cloud provider to choose, or what exact infrastructure to set up? With Snowflake, it’s convenient to administer and modify things that remain rigidly fixed with other data warehouse providers.
If the above list sounds like a set of pros that your organization can benefit from greatly, the choice is very simple! Snowflake has its own set of restrictions, but they fade quickly when placed next to such powerful advantages.
Ready to get the best from Snowflake architecture?
Creating a data warehouse that fits into your organization’s ecosystem is not an easy task. In fact, in some cases, it may take months, if not years, to do so. It’s crucial to choose the proper methodology and tools right at the start of this challenging task.
Among others, Snowflake seems to be a very reasonable choice. With its storage and compute decoupling, and many other features mentioned above, it provides a powerful, flexible data platform. If used wisely, Snowflake can be a top-notch solution that will help you leverage the organization’s data warehousing architecture and data analytics capabilities.
Data science holds many potential benefits for business, whatever industry you’re in. The only thing you need to harness its potential is a team of great experts.