Page compression and performance


I know, I know.  This is the last one on compression, I promise.  Well, the last one for… at least a month.

So, we’ve talked about the different types of data compression and some things to keep in mind when deploying compression.  But we all know that the biggest factor that might keep you from using compression is the potential performance hit.  I mean, saving disk space is great and all, but what if it means your app takes a 10%, 20%, even 50% performance hit?  Well, maybe not so much.  So how do you know what to row compress, what to page compress, and what to leave alone?

The first place I would start is with this awesome article on MSDN:  Data Compression: Strategy, Capacity Planning, and Best Practices.  If you don’t have time to read it now, print it out and read it later.  It’s well worth the paper.
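By the way, if you want a quick estimate of what compression might buy you on a given table before you commit to it, SQL Server 2008 and up ship with sp_estimate_data_compression_savings.  A minimal example (the schema and table names here are placeholders, substitute your own):

exec sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'mytab',          -- placeholder table name
     @index_id         = null,             -- null = all indexes
     @partition_number = null,             -- null = all partitions
     @data_compression = 'PAGE';           -- or 'ROW' or 'NONE'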

Reads

In general with data compression, you get the biggest bang for your buck on tables/indexes that are scanned.  You’ll still see some performance benefit on single-row lookups, but it won’t be nearly as pronounced.  For this reason, data compression really shines in data warehouses.  To see this for myself, I set up some tests with two clustered tables of about 25 million records each: one using no compression, the other using page compression.
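For reference, the page-compressed table was set up roughly like this (the column types and widths are assumptions for illustration; only auindex and aureason figure into the tests below):

create table mytab_page_compression
(
    auindex  int       not null,   -- clustering key used in the seek tests
    aureason char(4)   not null,   -- non-key column used in the scan test
    payload  char(100) not null    -- filler to give the rows some width
);

create clustered index cix_mytab_page
    on mytab_page_compression (auindex)
    with (data_compression = page);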

Before I show the test results, let’s take a quick look at the physical size difference between the tables using sp_spaceused:
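For reference, the calls are simply:

exec sp_spaceused 'mytab_no_compression';
exec sp_spaceused 'mytab_page_compression';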

I’d call that some pretty impressive space savings using page compression.  Ok, on to the SELECT tests.

For the first test, I queried on a range of key values (Clustered Index Seek).

set statistics io on
select COUNT(*) from mytab_no_compression where auindex between 215457 and 15555555;
select COUNT(*) from mytab_page_compression where auindex between 215457 and 15555555;

(1 row(s) affected)
Table 'mytab_no_compression'. Scan count 5, logical reads 252577, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)
Table 'mytab_page_compression'. Scan count 5, logical reads 56271, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The mytab_no_compression table required more than four times as many logical reads (252,577 vs. 56,271) for the same query.

Next I queried on a single key value (Clustered Index Seek).

set statistics io on
select COUNT(*) from mytab_no_compression where auindex = 2153545;
select COUNT(*) from mytab_page_compression where auindex = 2153545;

(1 row(s) affected)
Table 'mytab_no_compression'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)
Table 'mytab_page_compression'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Meh.  As I stated earlier, the performance gains on single-row queries aren’t very impressive.

My third test was a select on a non-key value (Clustered Index Scan).

set statistics io on
select COUNT(*) from mytab_no_compression where aureason = '0518';
select COUNT(*) from mytab_page_compression where aureason = '0518';

(1 row(s) affected)
Table 'mytab_no_compression'. Scan count 5, logical reads 422394, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)
Table 'mytab_page_compression'. Scan count 5, logical reads 92418, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Here again we see a large performance gain with the page-compressed table: less than a quarter of the logical reads.

What about writes?

Well, I’m sold, how about you?  I know, I know.  Reads are one thing.  But what about writes?  What kind of impact does page compression have on OLTP applications?  The article I referenced earlier does a great job of exploring the effect of page compression on heavily updated tables, even providing a general formula for determining which tables should use page compression, row compression, or no compression.  In my environment, however, the application I’m looking at doesn’t really do updates.  It does delete/insert instead.  So how would page compression affect that?  And what about its impact on loads in our data warehouse?  More tests were needed.

The first write scenario I looked at was a large insert of roughly 25 million records.  There were four separate tests, and I ran each test twice (a sketch of the load pattern follows the list):

  1. Heap with no compression
  2. Heap with page compression
  3. Clustered index with no compression
  4. Clustered index with page compression
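The load itself was just one big insert per target table, something along these lines (mytab_source is a stand-in for my real source table, and the exact insert…select is an assumption, not my actual harness):

set statistics time on;

-- Test 4: ~25 million rows into the page-compressed clustered table
insert into mytab_page_compression (auindex, aureason, payload)
select auindex, aureason, payload
from   mytab_source;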

Ok, so we took a little hit on the compressed heap load vs. the non-compressed heap, but personally I don’t think that’s too bad for 25 million records.  But holy schnikes Batman, look at the compressed clustered load!  It’s probably common sense, but really folks, build a heap and add the clustered index after it’s loaded, mkay?
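In other words, something like this (the table and index names are illustrative):

-- Load the heap first...
insert into mytab_staging (auindex, aureason, payload)
select auindex, aureason, payload
from   mytab_source;

-- ...then build the compressed clustered index in a single pass.
create clustered index cix_staging
    on mytab_staging (auindex)
    with (data_compression = page, sort_in_tempdb = on);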

Next I wanted to look at how compression affected singleton deletes/inserts.  I wrote a small bit of code that pulled 250,000 random records into a temp table and then cursored through them to delete the original record and insert a new record.  In this case I had six separate tests, and again each was executed twice (a rough sketch of the harness follows the list):

  1. Heap with no compression, index with no compression
  2. Heap with page compression, index with no compression
  3. Heap with no compression, index with page compression
  4. Heap with page compression, index with page compression
  5. Clustered index with no compression
  6. Clustered index with page compression
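The harness looked roughly like this; the table and column names are stand-ins, and the real code differed in the details:

-- Grab 250,000 random rows to replay as singleton delete/inserts
select top (250000) auindex, aureason, payload
into   #work
from   mytab_page_compression
order  by newid();

declare @auindex int, @aureason char(4), @payload char(100);

declare cur cursor local fast_forward for
    select auindex, aureason, payload from #work;

open cur;
fetch next from cur into @auindex, @aureason, @payload;

while @@fetch_status = 0
begin
    delete from mytab_page_compression where auindex = @auindex;

    insert into mytab_page_compression (auindex, aureason, payload)
    values (@auindex, @aureason, @payload);

    fetch next from cur into @auindex, @aureason, @payload;
end

close cur;
deallocate cur;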

I’d call the first pairing a wash.  In the second pairing, a slight advantage goes to the compressed heap with the compressed index.  The third pairing, the clustered indexes, is where we see the biggest impact of page compression, with a 5% average increase in the timings.  But nothing here would make me rethink deploying page compression in an environment that had the CPU cycles to spare.  And to that end, I also monitored processor utilization during my tests.  For the delete/insert tests, average CPU utilization was 0.2% higher on the compressed tables, and on the initial load tests the average utilization was about 3% higher.  Keep in mind this was on a pretty beefy server with no concurrent usage, so your mileage may vary.

So there you have it.  I hope this series of posts was helpful, I know I learned quite a bit.  Have a great weekend!
