Well, the PASS Summit has come and gone and it was *awesome*. What a difference from my first Summit experience, when I went to every session I could and then went back to the hotel. This time around there was something going on every evening and I made the most of it. I think the fact that I slept for 14 hours when I got home on Saturday speaks volumes.
Besides sessions and networking, I also managed to pass 2 of my certification exams last week. That leaves the much-dreaded 70-463 exam. Dun-Dun-Duuunnnn! So it’s time to start preparing for that exam, and as I go along I’ll post here so you can study along with me. Ready?
Exam 70-463 focuses on implementing a data warehouse in SQL Server. Obviously this is a pretty big topic that goes way beyond even what the exam covers, but let’s set the foundation with some fundamentals. Most of us are very familiar with OLTP database design. It’s highly normalized and great for line of business (LOB) applications that are primarily using lots of small transactions of very short duration. The problem with a normalized design is that it’s not so great for analytical reporting. First of all, normalization means joining many tables which leads to poorer query performance and also makes for much more complex queries. The other problem is that all of your LOB data reside in different databases, so comparing or aggregating data between different systems is difficult. To get around this, we build a data warehouse where all of our disparate OLTP data can be merged for analysis and historical reporting. What makes the data warehouse so great for this type of reporting is its fundamental design, and that’s what we’re going to cover in this post.
Tables
There are two main types of tables in the data warehouse environment: fact tables and dimension tables.
- Fact – Fact tables store quantitative information for analysis or reporting. An example of this would be a sales table like the FactInternetSales table in the AdventureWorksDW2012 database. Every row in the FactInternetSales table records a sale that happened. Customer A purchased Product B on date C, and so on. However, your typical sales fact table wouldn’t contain the actual customer information. Take a look at FactInternetSales and you won’t see customer data. Rather it stores a foreign key reference to the customer data. And that’s where dimension tables come in.
- Dimension – A dimension table contains descriptive attributes that can be used to filter, group, and label facts in the fact table. They provide context. For example, the FactInternetSales table mentioned earlier would likely be joined to a Customers dimension table, as well as a Products dimension. The more dimensions a fact table is associated with, the more ways you can slice the data. This is referred to as dimensionality or granularity.
Schemas
It’s the logical design or schema of the data warehouse, or how those fact and dimension tables are laid out and connected to each other, that makes the data warehouse so great for reporting. And your data warehouse will typically use one of two schemas: the Star schema and the Snowflake schema.
- Star schema – in a Star schema, a central fact table is surrounded by a single layer of dimension tables. One Star schema will cover one business area in the DW, but you can (and probably will) have multiple Star schemas in your DW. These schemas should be connected by sharing dimension tables (shared dimensions). If the various Star schemas in your DW don’t share dimension tables, you lose the connection between fact tables, making it difficult, if not impossible, to compare data between different fact tables.
- Snowflake schema – a Snowflake schema will start out as a Star schema, but then some of the dimensions are normalized, creating the snowflake-like pattern. An advantage of the Snowflake schema is that is can make hierarchies more apparent. For example, a Product dimension might be normalized to join to a ProductSubCategory dimension, and that might in turn be joined to a ProductCategory dimension. In general, Snowflake schemas are discouraged for performance (more joins) and maintainability reasons.
Summary
Today we covered the two primary types of table found in the data warehouse, the fact table that holds your quantitative data and the dimension table that holds your qualitative, contextual data. We also reviewed the two most common schema designs, the Star schema and the Snowflake schema. Next time we’ll talk more about dimension tables, their columns, and how to handle updates.
Nice post! I just started studying for the 70-463 as well. Good luck! In regards to snowflaking a couple more disadvantages worth noting:
1. Snowflaked dimension tables penalize cross attribute browsing and prohibit the use of bit mapped indexes.
2. Snowflaking also adds overhead to cube processing time and query processing time. The reason they add overhead is because the data from all involved tables must be joined at the time of processing. This means that the data must be sorted and matched prior to being fetched or loading into the OLAP dimension.
Thanks Anthony!
Hello Colleen. I passed 70-461 and 70-462 this time last year and have been putting off 70-463. I started looking at it last week so will look forward to following your posts on the subject. I was hoping to take the exam before the new year, but we’ll see…
Good luck Simon! Let me know how it goes.
Pingback: Exam 70-463 – Implementing a data warehouse | MS Excel | Power Pivot | DAX
Your articles is easy to understand and not too much reading at one time.
I will follow your series on my way to the 70-463 exam.
Thanks for your great work!
Pingback: Exam Prep 70-463: More about Dimensions | ColleenMorrow.com
Pingback: Exam Prep 70-463: Just the facts, ma'am | ColleenMorrow.com
Pingback: Exam Prep 70-463: Intro to Columnstore Indexes | ColleenMorrow.com
Pingback: Exam Prep 70-463: Columnstore Indexes and Partitioning | ColleenMorrow.com
Pingback: Exam Prep 70-463: Introduction to SSDT | ColleenMorrow.com
Pingback: Exam Prep 70-463: SSIS Connection Managers | ColleenMorrow.com
Pingback: Exam Prep 70-463: Control Flow Tasks and Containers | ColleenMorrow.com
Pingback: Exam Prep 70-463: Control Flow Tasks and Containers - SQL Server - SQL Server - Toad World
Pingback: Exam Prep 70-463: Control Flow Tasks and Containers | blog1
Pingback: Exam Prep 70-463: Data Flow Source Adapters | ColleenMorrow.com
Pingback: Exam Prep 70-463: Data Flow Source Adapters - SQL Server - SQL Server - Toad World
Pingback: Exam Prep 70-463: Data Flow Destination Adapters | ColleenMorrow.com
Pingback: Exam Prep 70-463: Data Flow Destination Adapters - SQL Server - SQL Server - Toad World
Pingback: Exam Prep 70-463: Data Flow Transformations | ColleenMorrow.com
Pingback: Exam Prep 70-463: Data Flow Transformations - SQL Server - SQL Server - Toad World
Pingback: Exam Prep 70-463: Variables & Parameters | ColleenMorrow.com
Pingback: Exam Prep 70-463: Variables and Parameters - SQL Server - SQL Server - Toad World
Pingback: Exam Prep 70-463: SSIS Expressions | ColleenMorrow.com
Pingback: Exam Prep 70-463: SSIS Expressions - SQL Server - SQL Server - Toad World
Pingback: Exam Prep 70-463: SSIS Transactions and Checkpoints - SQL Server - SQL Server - Toad World
Pingback: Exam Prep 70-463: Logging in SSIS 2012 | ColleenMorrow.com
Pingback: Exam Prep 70-463: Deploying projects in SSIS 2012 | ColleenMorrow.com
Pingback: Exam Prep 70-463: Logging in SSIS 2012 - SQL Server - SQL Server - Toad World
Pingback: Exam Prep 70-463: Deploying projects in SSIS 2012 - SQL Server - SQL Server - Toad World
Pingback: Exam Prep 70-463: Executing packages in SSIS 2012 | ColleenMorrow.com
Pingback: Exam Prep 70-463: Executing packages in SSIS 2012 - SQL Server - SQL Server - Toad World
Pingback: Exam Prep 70-463: Introduction to Master Data - SQL Server - SQL Server - Toad World
Pingback: Exam Prep 70-463: Installing Master Data Services | ColleenMorrow.com
Pingback: Exam Prep 70-463: Installing Master Data Services - SQL Server - SQL Server - Toad World
Pingback: Exam Prep 70-463: Master Data Services Models | ColleenMorrow.com
Pingback: Exam Prep 70-463: Master Data Services Models - SQL Server - SQL Server - Toad World
Thanks very much for this article. Explained in very easy to understand language. Definitely this will help for my 70-463 examination.
I want to write the 70-463 exam, but before I do that what should I study and practice. Can someone guide me as to how to do the basics ?
How to create staging tables from say AdventureWorks database, then create dimensions and cubes (dimensional and tabular), then do the measures, etc. Then deploy it to SSAS, and finally view the cube via a report and SharePoint.
There is a lot of info on the internet, but not a website we one can follow a step-by-step approach to get what I mentioned above.
Thanks
Hi Coleen: I just passed the 70-463 exam, and used your great blog series as a ‘tune-up’ as part of my preparations. Thank you for your effort and contribution to the SQL Server community. I’m sure you’ll do well when the time comes for you to write. Good luck!
Thanks Simon, and congrats! I passed the exam, I just never updated the blog to let people know. 🙂