Ok kids, last time I introduced row compression, today I want to take the next step and talk about page compression. Now, whereas row compression removed NULL and 0 values from each individual record, page compression removes redundant values from a table. Redundant data is stored on the page only once, and subsequent occurrences will reference this value. So you can probably imagine the kind of space you could save in a table with many duplicate values.
So, how does it work? Glad you asked!
The first step in the page compression process is something we’re already familiar with: row compression. Makes sense, right? After all, if you’re trying to compress the data on a given page as much as possible, you should probably start by making each row as small as possible.
The second step in page compression is something called prefix compression. What the who? SQL Server looks at the beginning bytes all the columns on the page and sees if any of the values in that column share a common pattern. This common pattern is the prefix. If they do, SQL Server takes that byte string (prefix) and stores it at the beginning of the page, right after the page header. Any instances of the prefix in that column are then replaced with a reference to that prefix. Any given prefix is local to a specific column, so a value in columnB can’t reference a prefix for columnA. For this reason, prefix compression is sometimes referred to as column-prefix compression. I think a picture would help illustrate the concept. So here’s one I shamelessly “borrowed” from BOL.
Take a look at the first record before and after compression. That “4b” means the row should contain the first 4 bytes of the referenced prefix, plus the letter b. Also note how each value is only referencing a prefix in its own column. Got it? Good. If not, I think you will once you see the next step. Onwards!
The third, and final, step in page compression is dictionary compression. Dictionary compression is pretty much the same as prefix compression, only it applies to any column on the page. So a given prefix can be referenced by a value in columnA or columnB, just as “4b” is referenced by two different columns below.
And that, my friends, is page compression. Page compression can be enabled when the table is created, in which case rows will be compressed as they are inserted and the page will be compressed after the page is full and a new row is inserted. Once the page is compressed, if the new row will fit on the page, it’s placed there, if not a new page is created and the row is written there. Additionally, you can enable page compression on an existing table using the ALTER TABLE statement. Just as with row compression, though, this can be quite a resource-consuming process. And I keep saying “table” but in reality compression can be used on heaps, clustered indexes, non-clustered indexes, table and index partitions, and indexed views.
In my next post I’ll talk briefly about Unicode compression, Backup compression, and discuss some of the things you might want to keep in mind when using compression. Until next time, folks…