SQL Server A to Z – Heaps


The Trash Heap

What a heap might look like.

Happy Monday, SQL peeps. I hope everyone had a great weekend. I, for one, had so much fun on Saturday, that I spent Sunday recovering. When did I get old?  Anyway, time for another installment of SQL Server A to Z. Today we’re talking about heaps.

What is a heap?

A heap is a table without a clustered index, it’s a collection of unordered records. When you insert new records into a heap, you can never be sure where they’ll be physically located, it all depends on where there’s available space. If the table is one that experiences a lot of inserts and deletes, a new record could be physically inserted into an empty space left by a previous delete. If the table is one that only sees inserts, all the inserts will be appended at the end of the table.

How is a heap structured?

Physically, a heap is made up of one or more Index Allocation Map (IAM) pages and data pages. An IAM page is a page that tracks 4GB of space in a datafile. If the datafile is bigger than 4GB, or if the database has multiple datafiles, the individual IAM pages are linked together. When you perform a table scan of a heap, SQL Server really scans through these IAM files to find all of the extents that contain data for the heap. Non-clustered indexes that are created on heaps contain the index key and the physical record id (RID) that tells SQL Server where to find the corresponding record in the heap.

What’s the problem with a heap?

There are a couple of potential problems with heaps. The first is that, if no indexes exist on the heap, any singleton lookups cause a full scan of the table. Not too efficient. Another issue is something called forwarded records. Imagine you have a table with a varchar(2000) column in it, and you insert a bunch of records, and the records just fill up a page. Now let’s say you update one of those records and expand the data in that varchar field from 20 characters to 2000. Suddenly that record is too big to fit on the existing page. Now, SQL Server could either move that record and update any non-clustered indexes with the new RID (mucho overhead). Or, it could move the record and leave a marker at its former location that says “Hey, I’m over here now!”. And that’s what SQL Server does, and that, my friends, is a forwarded record. As you might imagine, while saving resources on the update operation, forwarded records do have an impact on read operations, and they take up space in the datafile.

So how do I know if I’ve got forwarded records? And what do I do about them?

The way to know whether forwarded records are a problem in your database is by querying the sys.dm_db_index_physical_stats DMV using the DETAILED option.

SELECT OBJECT_NAME(object_id), forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED')
WHERE index_level = 0
AND forwarded_record_count >0
ORDER BY forwarded_record_count DESC;
GO

This query will return all heaps that contain forwarded records, that is to say, all fragmented tables. The simplest way to correct the problem and get rid of the forwarded records is to build a clustered index on the table and then drop the clustered index. This will physically reorder all the rows based on the clustered index key. Keep in mind that this will also rebuild any non-clustered indexes on the table, so you might just want to drop them first and recreate them when you’re done. Also, this is only a stop-gap measure. The forwarded records will eventually creep back in, and you’ll find yourself repeating this process somewhere down the road.

Also recommended:

Leave a comment

Your email address will not be published.