I’ve been using my RYO Maintenance scripts for a little while now, and they’ve been working great. Almost. One day I noticed that the backup script hadn’t backed up all of the databases on my instance. I checked the job history, no errors. I checked the SQL Server log, no errors. I checked the Windows event logs, nothing. Odd, and a little disconcerting (to put it mildly). So, I asked the Great Gazoogle.
Apparently, I’m not the only one with this problem. Others have run into the same odd behavior, and while the fix was simple, I’ve yet to find an explanation as to the cause. There have been some hypotheses thrown out, but nothing definitive.
The common theory is that something about the backup process is changing records in the sys.databases catalog. When the cursor tries to move to the next row, it can’t. So far, due to the random nature of the problem, I’ve not been able to reproduce and debug it. If anyone out there has found the cause, I’d love to hear from you.
In the meantime, the fix: use a STATIC cursor. Like so:
DECLARE getdbs CURSOR STATIC FOR SELECT d.database_id, d.name FROM sys.databases d WHERE d.name <> 'tempdb' AND d.state = 0 AND d.name LIKE @dbs AND d.database_id > @diff_dbid --no system dbs for diff backups AND d.recovery_model < @trn_rm --no simple mode dbs for log backups
This will create a temporary, static, copy of the rows your cursor will be stepping through. That way, if the data in sys.databases is changing, it won’t effect your cursor. Another workaround would be to explicitly create a temporary table for your sys.databases records and cursor through that. Potayto. Potahto.
I’ve updated the script in my original post to use the STATIC cursor, but I wanted to bring this to your attention, just in case you happened to run into the same problem.
This happens due to a collision with Windows and the volume shadow copy service. You might not be using shadow copies, but if your sysadmin is running OS level backups, even if they have excluded your database files from those backups, the software will kick off the shadow copy service. The shadow copy service the comes through and connects into the SQL Server and winds up changing entries in sys.databases, because it is flipping backup bits, even though it isn’t ever running a backup. We had this happening on a system where our tran log backups kept failing, because the recovery model was being changed. The fix was to make sure that 3rd party SQL Server backup agents were removed from every system in the environment and the shadow copy service was disabled on all of our SQL Server.
Interesting! That would never have occurred to me as the culprit. I’ll have to investigate that further. Thanks Michael.
Fortunately this is not because of Volume shadow copy service.
I have tried to figure out this problem and it seems many times users posted the solution that if you make the cursor declaration insensitive then it starts working. So I tested it and yes it does just make sure you cursor declare is static and it starts working.
The fact for which it fails is – Check your server level cursor threshold setting – if it is configured as -1 then it means all cursors are getting populated synchronously in other words while trying to read cursor key set data is synchronous and all is tried to read at same time. If we change this value to 0 that tells SQL server to do a asynchronous population in simple words cursor can fetch the records while key set will still be populating and you will see that after doing this change at server level you will never miss any database using cursors.
Solutions: Either declare Cursor Static Or Change Server level setting “Cursor Threshold” to 0 from -1.
Thanks, Gaurav Mishra | Senior DBA
Hi Gaurav – I can’t confirm whether changing the cursor threshold setting will, in fact, resolve this issue. However, even if it did, I would be hesitant to make an instance-level configuration change, one that could have a serious impact on server performance and application behavior, just to resolve this specific issue. It’s a bit like using a backhoe to bury a goldfish.
Thank you for sharing this excellent workaround. What I cannot figure out is why not use standard, SSIS based maintenance plans? They are not known for skipping databases…
That’s a valid point, Dmytro. In my case, there are a couple of reasons. First, I prefer methods that are scriptable, and therefore easier to standardize across multiple systems. Now, it could just be a limitation in my knowledge of SSIS-based maintenance plans, but I know of no way of scripting them. But the second reason deals with backups in particular: I need to be able to write my backups across multiple locations. I can’t do that in an SSIS-based plan.
As you have mentioned you can use a temporary table but rather than using a cursor you could just create the temporary table and loop through it removing the row you have just used.
— Table variable to store a list of databases to backup
DECLARE @DbList TABLE (DBName VARCHAR(200))
— Variable to store the current database name in
DECLARE @DBName NVARCHAR(200)
— Insert db names into a table variable ignoring the system db’s
INSERT INTO @DbList (DBName)
SELECT [name]
FROM sys.databases
WHERE [name] NOT IN (‘master’,’model’,’msdb’,’distribution’)
— Begin Loop
WHILE (SELECT COUNT(DBName) FROM @DbList) 0
BEGIN
SELECT TOP 1 @DBName = DBName FROM @DbList
— code to handle backups / maintenance here
DELETE @DbList WHERE DBName = @DBName
END
Another valid workaround. Thanks Gareth!
Hi
I have had this exact same problem. Weird! I have a backup script similar to what you are using (but much less comprehensive). It works fine on two servers, but when scheduled on a third server twice a week one instance works, one doesn’t. Same symptoms – stops part way through the DBs, exits the job as successfully completed etc.
I tried changing the time it ran with limited success i.e. the job got further, but still failed without reporting an error. I have now changed the cursor to be STATIC, but will have to wait a week to see whether it will run correctly at its scheduled time. If not, will try a temp table and cursor through that, rather than relying on sys.databases.
The other issue that has come about as a result of this is that @@FETCH_STATUS is Global. Hadn’t thought about that before, but I don’t believe it is the problem in this case.
Thanks for your post
Phil
Glad it helped, Phil. Check out MIchael’s comment for a possible explanation.
Wow, I saw this weird behavior like 1 year ago with one of our customers and I never really thought it could be hapenning to others too.
Everyone in the dba team were scratching our heads but we didn’t give it much importance then, we came up with the same solution that you are proposing and simply moved on.
My best explanation then was that, like Phil Garty said, the @@fetch_status function is global and perhaps another cursor was conflicting with the backup job.
Well,
As far as I know it happens because the backup processes are run inside the FETCH…NEXT.
But a cursor has a specific time-out.
Your backup process is too long – bang, cursor is timed-out, your backup script has finished, you still have some databases without backups.
My solution (how I fixed it, at least) – don’t run it inside the FETCH…NEXT, create a string T-SQL statement, close your cursor, then execute the created statement using the EXEC().
Thank you! Because of problems with Snapamager backups, I created a job to temporarily backup databases to disk and on 2 SQL2K5 servers ran into this problem. I also got the feeling that it has to do with the cursor timing out, but could find no info on that. I have just added STATIC to my cursor statement and am curious of the results. If this does solve the problem, I guess it had nothing to do with time-outs.
(The shaddow copy service on both these servers is disabled.)
Henriette – You’re welcome, glad it helped. I’ve had no problems with databases being skipped since making that cursor static.
Just to let you know that this has solved my problem and my backups have been running fine after having added ‘STATIC’ to my cursor!
Thanks again!
FYI … If your instance has any “database snapshots” on it and you’re using a script like this. You’ll want to add the following line to the script, so it won’t try and backup the snapshots
AND d.source_database_id = NULL — no database snapshots
Great script – we’ve been using it for years 🙂