Database Snapshots – Part 1 – An Introduction


Let’s say you support the database for a financial system, and periodically, whether it be monthly, quarterly, or yearly, they close out the period and run a bunch of reports before they let users back on the system to start making new transactions.  Depending on your system, the size of your database, and how those reports are written, that could take hours.

Or let’s say you have a datawarehouse environment that pulls from this financial database, and it has to be balanced against the source database.  It can be difficult and time consuming to balance against a moving target.

Or what if you wanted to offload both of the reporting and the datawarehouse extracts onto another server, so they wouldn’t impact other users?

Wouldn’t it be awesome if you could quickly create a read-only copy of that database at a specific point in time that your financial users could report against while opening the main database back up for next-period transactions, without hours of downtime?  Wouldn’t it be nice if your datawarehouse team had a static copy of the data to run their extracts and balancing against?  Wouldn’t it be a little slice of heaven if you could even offload all this onto another server??  Where do I sign?

The thing is, with database snapshots you can do all this, and more (yes more!).  Database snapshots were introduced in SQL Server 2005 and have some feature enhancements in SQL Server 2008.

What is a database snapshot?

In a nutshell, a snapshot is a read-only “copy” of your database as of a point in time.  I put “copy” in quotes because it’s not really a full copy of the database.  How database snapshots work is, initially an empty file is created, called a sparse file.  Only when data is changed in the original database does anything get written to the sparse file.  And what gets written is the data prior to the change.  So if you update a record in the customers table, before it updates the page SQL Server copies it in its original form to the sparse file, then it completes the update.  A picture’s worth a thousand words, so here are some words from the Microsoft.

How NTFS sparse files work

So, how do you know what data to pull from the snapshot and what data to pull from the original database?  You don’t need to know.  To read the snapshot, you treat it just as you would any other database.  It knows what data it has and what’s still in the original database and handles that logic behind the scenes.  Make sense?  Here are a couple more pics from Microsoft to illustrate the read process.  The first is at the initial snapshot creation, before any data changes.  You can see that, even though the user is querying the snapshot, SQL Server is reading entirely from the original database.

Reading from a new snapshot

Now let’s say someone’s gone in and changed a record in the database.  Prior to writing that page, SQL Server copies it to the snapshot.  Now if you query the snapshot, SQL Server knows to grab that page from the snapshot and the rest of the unchanged data from the database.  This is how you’re able to get a view of the data at a specific point in time, even with transactions going on.

image

How do I create a snapshot?

Unfortunately, you can’t create a snapshot via Management Studio, don’t ask me why.  So you’re stuck with T-Sql.  No biggie, though, it’s just a variation of the CREATE DATABASE statement.

USE master;
GO
CREATE DATABASE AW_Snap1
   ON
       ( NAME = 'AdventureWorks_Data',
         FILENAME = 'D:MSSQLDataAW_Snap1.ss')
AS SNAPSHOT OF Adventureworks;
GO

If you take a peek at the snapshot file at the filesystem level, it appears to be the same size as the AdventureWorks datafile.  But if you look at the file properties or query sys.dm_io_virtual_file_stats, you’ll see the actual size is very small, in this case 131072 bytes.

USE AW_Snap1;
GO
SELECT size_on_disk_bytes from sys.dm_io_virtual_file_stats (DB_ID(),1)    ;
GO
image

A quick query of the snapshot and original database returns the same data:

USE AW_Snap1;
GO
SELECT * FROM Person.Contact WHERE ContactID = 200;
SELECT * FROM AdventureWorks.Person.Contact WHERE ContactID = 200;
GO

image

Now let’s update some data and run the query again.

/*    Update the original database    */
USE AdventureWorks;
GO
UPDATE Person.Contact SET MiddleName = 'Eugene' WHERE ContactID = 200;
GO
/*    Query the snap again    */
USE AW_Snap1;
GO
SELECT * FROM Person.Contact WHERE ContactID = 200;
SELECT * FROM AdventureWorks.Person.Contact WHERE ContactID = 200;
GO

image

As you can see, querying the snapshot returns the original data whereas querying the database returns the updated data.  Also, if we check sys.dm_io_virtual_file_stats again, we can see that the physical size of the snapshot file has gone up to 196608 bytes.

You can’t refresh an existing snapshot.  If you want a more current snap of the database, you’ll need to create a new snapshot.  Be careful with this.  SQL Server will continue to maintain any old snapshots until they’re dropped, so the more snapshots you have laying around out there the more overhead you’re going to incur.  As a rule, if you don’t need a snapshot anymore, drop it.

USE master;
GO
DROP DATABASE AW_Snap1;
GO

Stay tuned for future installments in this series focusing on the fun things you can do with snapshots.

Also recommended:


Leave a comment

Your email address will not be published.

2 thoughts on “Database Snapshots – Part 1 – An Introduction