Partitioning is a feature that was first introduced in SQL Server 2005, though in theory it existed well before that in the form of partitioned views. I never had the pleasure of working with these partitioned views but reading about them makes me shudder. But it was in SQL 2005 that partitioning really came into its own, and you could now partition the actual tables.
Confession: I’m a former Oracle DBA and I was used to how partitioning was implemented in that environment. So when I first looked into SQL Server partitioning, it was a bit confusing. What the heck is this partition function and partition scheme business?? Well, really, the concept is pretty simple to grasp, once you understand a couple key elements. And that’s what I’m here to explain to you today. Ready?
Partition Function. The partition function is a user defined function that maps out how data rows will be broken out into partitions. For example:
CREATE PARTITION FUNCTION pf_RangeByYear (datetime) AS RANGE RIGHT FOR VALUES ('20020101', '20030101', '20040101');
The function above will partition data based on a date column, using the boundaries we’ve provided. In this case, we’re partitioning the table into years. Notice the “RANGE RIGHT” clause. That tells us that any records with that boundary value will be placed in the partition to the right. In other words, the first partition will contain values < 2002/01/01, all values in 2001 and earlier. The next partition will contain values >= 2002/01/01 and < 2003/01/01, or all values for 2002.
Partition Scheme. The partition scheme maps out what filegroup each partition will be assigned to. Example:
CREATE PARTITION SCHEME ps_RangeByYear AS PARTITION pf_RangeByYear TO ( Filegroup1, Filegroup2, Filegroup3 );
With this code, we’re telling SQL Server where to physically place the partitions created by our partitioning function. For example, the 2007 values will go into Filegroup2. You don’t have to specify a different filegroup for each partition. In fact, you can have all the partitions go to the same filegroup if you want.
CREATE PARTITION SCHEME ps_RangeByYear AS PARTITION pf_RangeByYear ALL TO ( [PRIMARY] );
Partitioning a Table, step by step
The first step in partitioning any table is to determine the partitioning key and how many partitions to have. This is by far the hardest part, and the easiest thing to get wrong. You want to make sure you know your data and know how your application is using that data. For instance, if you’re working with a table that contains a lot of historical data which isn’t accessed often, you might partition it such that the most recent, and most accessed, data is in one partition and the rest is located in one or more “history” partitions. Kind of like we did above.
The next step is to map out where each partition will go, i.e. filegroups and create any new filegroups you need. If you’re luck enough to have faster and slower disk available to you, you also want to work with your SAN admin to map out where these filegroups should be physically located on disk. Obviously you’re going to want your more heavily accessed partitions on the faster disk.
Once we’ve done all the heavy lifting of planning all this out, the rest is cake. Create your partition function and your partition scheme in your database. Finally, you’re ready to create your partitioned table. Here I’m creating a copy of the Sales.SalesOrderHeader table and I’m partitioning it on the OrderDate using the function we created earlier, placing all my partitions in the primary filegroup.
CREATE TABLE [Sales].[SalesOrderHeader_Part]( [SalesOrderID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL, [RevisionNumber] [tinyint] NOT NULL, [OrderDate] [datetime] NOT NULL, [DueDate] [datetime] NOT NULL, [ShipDate] [datetime] NULL, [Status] [tinyint] NOT NULL, [OnlineOrderFlag] [dbo].[Flag] NOT NULL, [SalesOrderNumber] AS (isnull(N'SO'+CONVERT([nvarchar](23),[SalesOrderID],0),N'*** ERROR ***')), [PurchaseOrderNumber] [dbo].[OrderNumber] NULL, [AccountNumber] [dbo].[AccountNumber] NULL, [CustomerID] [int] NOT NULL, [ContactID] [int] NOT NULL, [SalesPersonID] [int] NULL, [TerritoryID] [int] NULL, [BillToAddressID] [int] NOT NULL, [ShipToAddressID] [int] NOT NULL, [ShipMethodID] [int] NOT NULL, [CreditCardID] [int] NULL, [CreditCardApprovalCode] [varchar](15) NULL, [CurrencyRateID] [int] NULL, [SubTotal] [money] NOT NULL, [TaxAmt] [money] NOT NULL, [Freight] [money] NOT NULL, [TotalDue] AS (isnull(([SubTotal]+[TaxAmt])+[Freight],(0))), [Comment] [nvarchar](128) NULL, [rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL, [ModifiedDate] [datetime] NOT NULL, ) ON ps_RangeByYear (OrderDate) GO
The only special syntax here is the ON clause. Where we would normally specify the filegroup, we’re now specifying the partition scheme and what column we want to partition on.
What about indexes?
By default any indexes on a partitioned table will use the same partitioning scheme. This means that, if the column contains the partitioning column, it will also be partitioned. When this happens we say that the index is aligned with the table. A unique index must contain the partitioning column. If a non-unique index does not contain the partitioning column, SQL Server will add it to the index definition as an included column in order to align the index to the table. You can also choose a different partitioning scheme for your index or not partition your index at all.
What about data compression?
If you’re using data compression you have the flexibility to compress or not compress individual partitions as best fits your situation. You can also use different compression types in different partitions.
When to use Partitioning
Maintaining partitioned tables takes work, so you probably want to make sure it’s worth your time and energy. You don’t want to partition just for the sake of partitioning. Make sure you have a real need, whether it be performance or maybe to make database maintenance easier. Large tables, like you find in a data warehouse, make excellent candidates because partitioning allows for better parallelism for large operations. The query optimizer can also eliminate entire partitions from queries, thus reducing the amount of data that needs to be read and processed. Index maintenance can be targeted to more dynamic partitions on a more frequent basis. And you can utilize filegroup backups to only backup those partitions that are changing.
What if I only have 1 disk/mount point?
In a perfect world all of your partitions would be in separate filegroups and the datafiles of all of those filegroups would be located on different LUNs assigned to physically separate disk arrays in your SAN. Yeah, and unicorns would fly out of my… ahem, let’s just say most of us live in the real world. Maybe your SAN admin only gives you one mount point/LUN on a RAID5 array. Does that mean you can’t reap the benefits of partitioning? Absolutely not. You can still use partitioning to ease administration as I mentioned earlier. And the optimizer will still use the partitioning to exclude partitions from query plans. So don’t let that physical limitation stop you from using partitioning if it makes sense for your situation.
Obviously partitioning is a big topic and really I’ve only scratched the surface here. For more information, here are some great resources.
P.S. A GIS of “unicorns rainbows” produces some, um, interesting images.