Backup script skipping databases – and the fix!


A real head-scratcher

I’ve been using my RYO Maintenance scripts for a little while now, and they’ve been working great. Almost. One day I noticed that the backup script hadn’t backed up all of the databases on my instance. I checked the job history, no errors. I checked the SQL Server log, no errors. I checked the Windows event logs, nothing. Odd, and a little disconcerting (to put it mildly). So, I asked the Great Gazoogle.

Apparently, I’m not the only one with this problem. Others have run into the same odd behavior, and while the fix was simple, I’ve yet to find an explanation as to the cause. There have been some hypotheses thrown out, but nothing definitive.

The common theory is that something about the backup process is changing records in the sys.databases catalog. When the cursor tries to move to the next row, it can’t. So far, due to the random nature of the problem, I’ve not been able to reproduce and debug it. If anyone out there has found the cause, I’d love to hear from you.

In the meantime, the fix: use a STATIC cursor. Like so:

DECLARE getdbs CURSOR STATIC FOR
    SELECT d.database_id, d.name FROM sys.databases d
    WHERE d.name <> 'tempdb'
    AND d.state = 0
    AND d.name LIKE @dbs
    AND d.database_id > @diff_dbid       --no system dbs for diff backups
    AND d.recovery_model < @trn_rm       --no simple mode dbs for log backups
 

This will create a temporary, static, copy of the rows your cursor will be stepping through. That way, if the data in sys.databases is changing, it won’t effect your cursor. Another workaround would be to explicitly create a temporary table for your sys.databases records and cursor through that. Potayto. Potahto.

I’ve updated the script in my original post to use the STATIC cursor, but I wanted to bring this to your attention, just in case you happened to run into the same problem.

Also recommended:


Leave a comment

Your email address will not be published.

16 thoughts on “Backup script skipping databases – and the fix!

  • Michael Hotek

    This happens due to a collision with Windows and the volume shadow copy service. You might not be using shadow copies, but if your sysadmin is running OS level backups, even if they have excluded your database files from those backups, the software will kick off the shadow copy service. The shadow copy service the comes through and connects into the SQL Server and winds up changing entries in sys.databases, because it is flipping backup bits, even though it isn’t ever running a backup. We had this happening on a system where our tran log backups kept failing, because the recovery model was being changed. The fix was to make sure that 3rd party SQL Server backup agents were removed from every system in the environment and the shadow copy service was disabled on all of our SQL Server.

    • Gaurav Mishra

      Fortunately this is not because of Volume shadow copy service.

      I have tried to figure out this problem and it seems many times users posted the solution that if you make the cursor declaration insensitive then it starts working. So I tested it and yes it does just make sure you cursor declare is static and it starts working.

      The fact for which it fails is – Check your server level cursor threshold setting – if it is configured as -1 then it means all cursors are getting populated synchronously in other words while trying to read cursor key set data is synchronous and all is tried to read at same time. If we change this value to 0 that tells SQL server to do a asynchronous population in simple words cursor can fetch the records while key set will still be populating and you will see that after doing this change at server level you will never miss any database using cursors.

      Solutions: Either declare Cursor Static Or Change Server level setting “Cursor Threshold” to 0 from -1.

      Thanks, Gaurav Mishra | Senior DBA

      • Colleen M. Morrow Post author

        Hi Gaurav – I can’t confirm whether changing the cursor threshold setting will, in fact, resolve this issue. However, even if it did, I would be hesitant to make an instance-level configuration change, one that could have a serious impact on server performance and application behavior, just to resolve this specific issue. It’s a bit like using a backhoe to bury a goldfish.

  • Dmytro

    Thank you for sharing this excellent workaround. What I cannot figure out is why not use standard, SSIS based maintenance plans? They are not known for skipping databases…

    • Colleen M. Morrow Post author

      That’s a valid point, Dmytro. In my case, there are a couple of reasons. First, I prefer methods that are scriptable, and therefore easier to standardize across multiple systems. Now, it could just be a limitation in my knowledge of SSIS-based maintenance plans, but I know of no way of scripting them. But the second reason deals with backups in particular: I need to be able to write my backups across multiple locations. I can’t do that in an SSIS-based plan.

  • Gareth Ashby

    As you have mentioned you can use a temporary table but rather than using a cursor you could just create the temporary table and loop through it removing the row you have just used.

    — Table variable to store a list of databases to backup
    DECLARE @DbList TABLE (DBName VARCHAR(200))

    — Variable to store the current database name in
    DECLARE @DBName NVARCHAR(200)

    — Insert db names into a table variable ignoring the system db’s
    INSERT INTO @DbList (DBName)
    SELECT [name]
    FROM sys.databases
    WHERE [name] NOT IN (‘master’,’model’,’msdb’,’distribution’)

    — Begin Loop
    WHILE (SELECT COUNT(DBName) FROM @DbList) 0
    BEGIN
    SELECT TOP 1 @DBName = DBName FROM @DbList

    — code to handle backups / maintenance here

    DELETE @DbList WHERE DBName = @DBName
    END

  • Phil Garty

    Hi
    I have had this exact same problem. Weird! I have a backup script similar to what you are using (but much less comprehensive). It works fine on two servers, but when scheduled on a third server twice a week one instance works, one doesn’t. Same symptoms – stops part way through the DBs, exits the job as successfully completed etc.
    I tried changing the time it ran with limited success i.e. the job got further, but still failed without reporting an error. I have now changed the cursor to be STATIC, but will have to wait a week to see whether it will run correctly at its scheduled time. If not, will try a temp table and cursor through that, rather than relying on sys.databases.

    The other issue that has come about as a result of this is that @@FETCH_STATUS is Global. Hadn’t thought about that before, but I don’t believe it is the problem in this case.

    Thanks for your post

    Phil

  • Aaron

    Wow, I saw this weird behavior like 1 year ago with one of our customers and I never really thought it could be hapenning to others too.
    Everyone in the dba team were scratching our heads but we didn’t give it much importance then, we came up with the same solution that you are proposing and simply moved on.

    My best explanation then was that, like Phil Garty said, the @@fetch_status function is global and perhaps another cursor was conflicting with the backup job.

  • Simon Murin

    Well,

    As far as I know it happens because the backup processes are run inside the FETCH…NEXT.

    But a cursor has a specific time-out.

    Your backup process is too long – bang, cursor is timed-out, your backup script has finished, you still have some databases without backups.

    My solution (how I fixed it, at least) – don’t run it inside the FETCH…NEXT, create a string T-SQL statement, close your cursor, then execute the created statement using the EXEC().

  • Henriette

    Thank you! Because of problems with Snapamager backups, I created a job to temporarily backup databases to disk and on 2 SQL2K5 servers ran into this problem. I also got the feeling that it has to do with the cursor timing out, but could find no info on that. I have just added STATIC to my cursor statement and am curious of the results. If this does solve the problem, I guess it had nothing to do with time-outs.

    (The shaddow copy service on both these servers is disabled.)

  • Henriette

    Just to let you know that this has solved my problem and my backups have been running fine after having added ‘STATIC’ to my cursor!
    Thanks again!

  • Harry

    FYI … If your instance has any “database snapshots” on it and you’re using a script like this. You’ll want to add the following line to the script, so it won’t try and backup the snapshots

    AND d.source_database_id = NULL — no database snapshots

    Great script – we’ve been using it for years 🙂