Quantcast
Channel: Articles tagged with SQL Server - Simple Talk
Viewing all articles
Browse latest Browse all 62

It’s a recovery strategy, not a backup strategy

$
0
0

I’ve talked about it before; you shouldn’t have a backup strategy, you should have a recovery strategy. I can’t possibly care if my backups succeed if I’m not bothering to test that they can be restored. And if they can’t be restored then, both technically and practically, I don’t have backups.

In one of the systems I manage, they built a very simple “test restore” process long before I became involved. Every night, it would pull the full backup for each database, restore it on a test system, and run DBCC CHECKDB against it. It would alert on any failure, of course, but the primary purpose was to always be confident that the backups could, in fact, be restored.

The process could afford to be simple because it assumed the following would always be true:

  • All databases took full backups every night.
  • Each database was backed up to a single .bak file named <DatabaseName>.FULL.<YYYYMMDD>.bak.
  • Each database we were restoring this way consisted of a single data file (called DataFile) and a single log file (LogFile).
  • All databases were backed up to the same location (and in a subfolder exactly matching the database name), and all databases would always be able to fit in that location.
  • The biggest database, once restored, would always fit on a single drive on the test system.
  • All databases could be restored and CHECKDB‘d in under 24 hours.

There is a table that tells the code which databases to restore:

CREATE TABLE dbo.SourceDatabases
 (
   DatabaseName sysname
   /* other columns not pertinent to this post */
 );

And another table to tell the code, for example, where to find backups, and where to move data/log files on restore:

CREATE TABLE dbo.RestoreConfig
 (
   SourceFolder           nvarchar(4000),
   DestinationDataFolder  nvarchar(4000),
   DestinationLogFolder   nvarchar(4000),
   TempDBFolder           nvarchar(4000),
   ...
 );

 INSERT dbo.RestoreConfig
 (
   SourceFolder,
   DestinationDataFolder,
   DestinationLogFolder,
   TempDBFolder
 )
 VALUES
 (
   N'\\corp-backup-share\sql-backups\', 
   N'D:\SQL\Data\',
   N'L:\SQL\Log\',
   N'T:\SQL\TempDB\'
 );

Then the (vastly simplified) code looked something like this:

DECLARE @Source       nvarchar(4000), 
         @DataFolder   nvarchar(4000),
         @LogFolder    nvarchar(4000),
         @dbname       sysname,
         @ymd          char(8) = CONVERT(char(8), getutcdate(), 112),
         @sql          nvarchar(max),
         @c            cursor;

  SELECT @Source     = SourceFolder,
         @DataFolder = DestinationDataFolder,
         @LogFolder  = DestinationLogFolder
    FROM dbo.RestoreConfig;

     SET @c = CURSOR STATIC READ_ONLY FORWARD_ONLY
          FOR SELECT DatabaseName 
                FROM dbo.SourceDatabases
               ORDER BY DatabaseName;

 OPEN @c;
 FETCH NEXT FROM @c INTO @dbname;

 WHILE @@FETCH_STATUS <> -1
 BEGIN
   SET @sql = N'RESTORE DATABASE ' + QUOTENAME(@dbname + N'_testing') + N'
              FROM DISK = N''' + @Source + @dbname + N'\'
            + @dbname + N'.FULL.' + @ymd + N'.bak'''
              WITH REPLACE, RECOVERY,
              MOVE N''DataFile'' TO N''' + @DataFolder
            + @dbname + N'.' + @ymd + N'.mdf''',
              MOVE N''LogFile'' TO N''' + @LogFolder
            + @dbname + N'.' + @ymd + N'.ldf'';';

   EXEC sys.sp_executesql @sql;

   /* then do CHECKDB things */

   FETCH NEXT FROM @c INTO @dbname;
 END

I’ll let you guess how many of those assumptions remain facts today. If you guessed zero, you’d be right. Life got complicated, the restore system is now also being used for other purposes, and source databases have grown significantly. I’d have to help adjust the code to deal with those changes, and this post is about how.

Here is their new reality relative to the “truths” above:

  • They started performing full backups weekly, with differential backups in between.
  • They started striping backups to multiple files.
  • Many databases had additional files and filegroups; even those that didn’t started having logical file names straying from the existing convention.
  • Different databases had to be backed up to different locations.
  • The biggest database outgrew the D:\ drive on the test system.
  • The biggest database also squeezed out all the others in terms of duration, so that they couldn’t all be backed up and CHECKDB‘d within a calendar day.

Some things did remain true, though:

  • They always produce backups WITH INIT, so we never have to figure out which file (backup set) we need within any given .bak file.
  • They don’t use features like FILESTREAM, FILETABLE, or In-Memory OLTP, so we never have to consider those differences. They do use full-text search, but there are no dedicated files for full-text catalogs involved in backup/restore testing.
  • They encrypt backups, but we’re going to leave that complication out for now.

Also, while backups might be striped to 8 or 16 files (or any number, really), they produce file names in a predictable format, e.g. <DatabaseName>.FULL.<YYYYMMDD>.<file number>.bak. Backups are generated using Ola Hallengren’s solution, so this is easy to do by passing in the right @FileName and @NumberOfFiles arguments:

EXEC dbo.DatabaseBackup
   @FileName      = N'{DatabaseName}.{Type}.{Year}{Month}{Day}.{StripeNumber}.bak',
   @NumberOfFiles = 16
   /* , ... other parameters */

We actually store that pattern in the (distributed) config table too, so we can change it in one place instead of in every SQL Server Agent backup job on every instance.

Similarly, I created table types for things we do often enough that we don’t want to define them over and over again. Because the test server is on a different subnet, and because the backups come from servers all over the network, we can’t always query msdb on a source instance to get backup information about a given database. So, instead, we rely on looking at the folder and finding the most recent file, then examining the output of these RESTORE commands:

RESTORE LABELONLY (for determining how many stripes exist for a given backup):

CREATE TYPE dbo.RestoreLabelOnlyOutput AS TABLE
 (
   MediaName             nvarchar(128),
   MediaSetID            uniqueidentifier,
   FamilyCount           int,
   FamilySequenceNumber  int,
   MediaFamilityID       uniqueidentifier,
   MediaSequenceNumber   int,
   MediaLabelPresent     tinyint,
   MediaDescription      nvarchar(255),
   SoftwareName          nvarchar(128),
   SoftwareVendorID      int,
   MediaDate             datetime,
   MirrorCount           int,
   IsCompressed          bit
 );

RESTORE FILELISTONLY (for determining the logical data and log files present in the backup):

CREATE TYPE dbo.RestoreFilelistOnlyOutput AS TABLE
 (
   LogicalName           nvarchar(128),
   PhysicalName          nvarchar(260), 
   Type                  char(1), 
   FileGroupName         nvarchar(128), 
   Size                  numeric(20,0),
   MaxSize               numeric(20,0), 
   FileID                bigint, 
   CreateLSN             numeric(25,0), 
   DropLSN               numeric(25,0), 
   UniqueID              uniqueidentifier, 
   ReadOnlyLSN           numeric(25,0), 
   ReadWriteLSN          numeric(25,0),
   BackupSizeInBytes     bigint, 
   SourceBlockSize       int, 
   FileGroupID           int, 
   LogGroupGUID          uniqueidentifier, 
   DifferentialBaseLSN   numeric(25,0), 
   DifferentialBaseGUID  uniqueidentifier, 
   IsReadOnly            bit, 
   IsPresent             bit, 
   TDEThumbprint         varbinary(32), 
   SnapshotUrl           nvarchar(360)
 );

We don’t need all of those columns, of course, but there’s no way to say:

SELECT <only these columns> FROM (RESTORE FILELISTONLY ...);

Next, we added columns to the SourceDatabases table to indicate where each database sends its backups and on what day of the week the fulls run. Then we used snake sorting to distribute the backups and restores across days of the week as evenly as possible. Let’s say we have this:

CREATE TABLE dbo.SourceDatabases
 (
   DatabaseName     sysname,
   BackupFolder     nvarchar(4000),
   FullBackupDay    tinyint
   /* other columns not pertinent to this post */
 );

 INSERT dbo.SourceDatabases
 (
   DatabaseName,
   BackupFolder,
   FullBackupDay
 )
 VALUES(N'DB_A', N'\\backup-share-1\SourceServerA\Full\', 1),
       (N'DB_B', N'\\backup-share-1\SourceServerA\Full\', 1),
       (N'DB_C', N'\\backup-share-2\SourceServerB\Full\', 2);

 /* so, for example, DB_A is backed up to share 1 on Sundays */

With the table types in place and the knowledge of where and when to find backups, we could make the code a lot more dynamic and determine exactly how to restore each database – even if it has been striped to multiple backup files and contains multiple data files. If there are multiple data files, we can alternate them onto the two drives we have available – even though one drive is supposed to be for tempdb, it can serve as a temporary workaround as it has plenty of available space.

CREATE PROCEDURE dbo.TestRestores
   @RunCheckDB   bit = 0,
   @debug        bit = 0
 AS
 BEGIN
   SET NOCOUNT ON;

   DECLARE @DataFolderA     nvarchar(4000),
           @DataFolderB     nvarchar(4000),
           @LogFolder       nvarchar(4000),
           @SourceRoot      nvarchar(4000),
           @Source          nvarchar(4000),
           @DirCommand      nvarchar(4000),
           @BaseFile        nvarchar(4000),
           @FileCount       tinyint,
           @TestDBName      sysname,
           @DatabaseName    sysname,
           @Today           date = getutcdate(),
           @ymd             char(8) = CONVERT(char(8), getutcdate(), 112),
           @Stripe          tinyint,
           @FromCommands    nvarchar(max),
           @MoveCommands    nvarchar(max),
           @LabelCommand    nvarchar(4000),
           @FileListCommand nvarchar(4000),
           @c               cursor;

    SELECT @DataFolderA   = DestinationDataFolder,
           @DataFolderB   = TempDBFolder,
           @LogFolder     = DestinationLogFolder
      FROM dbo.RestoreConfig;

   /* get only databases with full backups today */

   SET @c = CURSOR STATIC READ_ONLY FORWARD_ONLY
        FOR SELECT DatabaseName, BackupFolder
              FROM dbo.SourceDatabases
             WHERE FullBackupDay = DATEPART(WEEKDAY, @Today)
             ORDER BY DatabaseName;
   OPEN @c;
   FETCH NEXT FROM @c INTO @DatabaseName, @SourceRoot;

   WHILE @@FETCH_STATUS <> -1
   BEGIN
     /* where are we looking for backups? */

     SELECT @Source     = CONCAT(@SourceRoot, @DatabaseName, N'\'),
            @TestDBName = CONCAT(@DatabaseName, N'_Testing');
   
     /* perform a directory listing, newest first, stuff into @table */
     /* xp_cmdshell must be enabled, and service account needs access */

     SET @DirCommand = CONCAT('dir /b /O:D "', @Source, '*.bak"');
     DECLARE @List TABLE(fn nvarchar(128));
     INSERT @List EXEC master.sys.xp_cmdshell @DirCommand;

     /* is there one created today? is it striped? */
     /* bit messy since database names may contain delimiters */
     /* uses SQL Server 2022's STRING_SPLIT with ordinal */
     /* if on earlier version, find ordered string splitter */

     SELECT @BaseFile = fn, @Stripe = REVERSE(value)
     FROM 
     (
       SELECT TOP (1) fn 
         FROM @List
        WHERE fn LIKE @DatabaseName + N'.FULL.' + @ymd + N'.%bak'
        ORDER BY fn DESC
     ) AS y 
       CROSS APPLY STRING_SPLIT(REVERSE(fn), N'.', 1) AS x
       WHERE ordinal = 2;

     IF @BaseFile IS NOT NULL
     BEGIN
       IF TRY_CONVERT(tinyint, @Stripe) BETWEEN 1 AND 99
       BEGIN
         /* striped backup! Need to build FROM DISK = args */

         SET @LabelCommand = CONCAT
                 (N'RESTORE LABELONLY FROM DISK = N''', @Source, @BaseFile, N''';');

         DECLARE @Labels dbo.RestoreLabelOnlyOutput;
         INSERT @Labels EXEC sys.sp_executesql @LabelCommand;
         SELECT @FileCount = FamilyCount FROM @Labels;

         /* relies on SQL Server 2017's STRING_AGG */
         /* relies on SQL Server 2022's GENERATE_SERIES */
         /* if earlier, find group concat / sequence generation functions */

         SELECT @FromCommands = N' FROM '
              + STRING_AGG
                (CONVERT(nvarchar(max), CONCAT(N'  
                 DISK = N''', @Source, @DatabaseName, N'.FULL.', @ymd, N'.',
                   value, N'.bak''')), N',')
           FROM GENERATE_SERIES(1, CONVERT(int, @Stripe)) AS x;
       END
       ELSE
       BEGIN
         /* just a single file */

         SET @FromCommands = CONCAT(N' FROM DISK = N''', @Source, @BaseFile, N'''');
       END
     END

     /* generate data/log file args from any file */

     SET @FileListCommand = CONCAT
                 (N'RESTORE FILELISTONLY FROM DISK = N''', @Source, @BaseFile, ''';');
     DECLARE @LogicalFiles dbo.RestoreFilelistOnlyOutput;
     INSERT @LogicalFiles EXEC sys.sp_executesql @FileListCommand;

     /* alternate data files between data folder A & B */
     /* note: assumes LogicalName is always valid in file name */

     SELECT @MoveCommands = STRING_AGG
            (CONVERT(nvarchar(max), CONCAT(N' 
             MOVE N''', LogicalName, N''' TO N''', 
                  CASE FileID % 2 WHEN 1 THEN @DataFolderA ELSE @DataFolderB END, 
                  @TestDBName, N'.', LogicalName, N'.', @ymd, N'.mdf''')), N',') 
       FROM @LogicalFiles
      WHERE Type = 'D';

     /* also add log file(s) - yes, that happens */

     SELECT @MoveCommands += N',' + STRING_AGG
            (CONVERT(nvarchar(max), CONCAT(N'
             MOVE N''', LogicalName, N''' TO N''', @LogFolder, 
                  @TestDBName, N'.', LogicalName, N'.', @ymd, N'.ldf''')), N',')
       FROM @LogicalFiles
      WHERE Type = 'L';

     DECLARE @RestoreCommand nvarchar(max) = CONCAT(
       N'RESTORE DATABASE ', QUOTENAME(@TestDBName), N'
        ', @FromCommands, N'
          WITH REPLACE, RECOVERY,', @MoveCommands, N';'
       );

     IF @debug = 1
     BEGIN
       PRINT @RestoreCommand;
     END
     ELSE
     BEGIN
       EXEC sys.sp_executesql @RestoreCommand;
     END

     IF @RunCheckDB = 1
     BEGIN
       /* also check @debug here */
       PRINT '/* then do CHECKDB things */';
     END
   
     FETCH NEXT FROM @c INTO @DatabaseName, @SourceRoot;
   END
 END

Imagine DB_A has 4 data files and a log file, and is backed up to 8 stripes, this command would restore the database using the following command (though maybe not quite as pretty, formatting-wise):

RESTORE DATABASE [DB_A_Testing]
   FROM 
     DISK = N'\\backup-share-1\SourceServerA\Full\DB_A.FULL.20240501.01.bak',
     DISK = N'\\backup-share-1\SourceServerA\Full\DB_A.FULL.20240501.02.bak',
     DISK = N'\\backup-share-1\SourceServerA\Full\DB_A.FULL.20240501.03.bak',
     DISK = N'\\backup-share-1\SourceServerA\Full\DB_A.FULL.20240501.04.bak',
     DISK = N'\\backup-share-1\SourceServerA\Full\DB_A.FULL.20240501.05.bak',
     DISK = N'\\backup-share-1\SourceServerA\Full\DB_A.FULL.20240501.06.bak',
     DISK = N'\\backup-share-1\SourceServerA\Full\DB_A.FULL.20240501.07.bak',
     DISK = N'\\backup-share-1\SourceServerA\Full\DB_A.FULL.20240501.08.bak'
   WITH REPLACE, RECOVERY,
     MOVE N'A_DataFile1' TO N'D:\SQL\Data\DB_A_Testing.A_DataFile1.20240501.mdf',
     MOVE N'A_DataFile2' TO N'T:\SQL\TempDB\DB_A_Testing.A_DataFile2.20240501.mdf',
     MOVE N'A_DataFile3' TO N'D:\SQL\Data\DB_A_Testing.A_DataFile3.20240501.mdf',
     MOVE N'A_DataFile4' TO N'T:\SQL\TempDB\DB_A_Testing.A_DataFile5.20240501.mdf',
     MOVE N'A_Log'       TO N'L:\SQL\Log\DB_A_Testing.A_Log.20240501.ldf';

This seems like complicated code, but we think it is worth the investment because we don’t just want a backup strategy. We want to always be confident that we can restore from our backups. And the only way to be truly confident is to always be doing it – and by always improving this process, it makes us that much more prepared for dealing with a disaster recovery situation when it happens for real.

Future considerations

The code could, in theory, determine in advance if a database has grown enough to come close to or even exceed the available space of both available drives. In that case, maybe it would be time to add some storage (if possible) or move these operations to a bigger server.

We also want to add into the mix occasional full + differential + log restores, since the current process only really tests full backups and not necessarily that we could recover to a specific point in time if we had to. This would complicate the code slightly – we’d have to identify the latest differential, and all subsequent log backups (or at least the first one), in order to generate the right sequence of RESTORE commands. But, overall, the logic would remain unchanged.

And we may later have to change it if we can no longer test all the databases in a week. Maybe we only CHECKDB any given database every other week, or distribute all the databases across two weeks, or only test each database monthly, or build additional test servers and split the work that way. That’s not a tomorrow or next week problem, but it’s coming someday.

Further reading

Some other information, particularly if you get snagged on an older version and need to swap in alternative functions:

The post It’s a recovery strategy, not a backup strategy appeared first on Simple Talk.


Viewing all articles
Browse latest Browse all 62

Trending Articles