There are many times, we face the LSN mismatch issue in alwaysON and other HA technologies. It is a bit hard to find the missing transaction log backup to apply. Since, there are hundreds of thousands log generated, depends on the transaction log frequency and it can be run in any secondary alwaysON database server.
Think about “the VLDB” and “the backup is in different data center” and database are out of sync in DR site because of LSN mismatch. For VLDB 8 TB database, we cannot take a full or differential backup to fix this. Since, it will take more and more time. Backup is in different data center in CIFS share restore over the WAN will kill the performance and time.
LSN: Every record in the SQL Server transaction log is uniquely identified by a log sequence number (LSN). The restore database will work with the sequence of LSN order, no break in log chain.
It is easy if we understand the LSN chains a bit internally. Let me try to show in this post.
Example:
For the first log backup, the first LSN is 100 and Last LSN is 200, second log backup the first LSN should be 200 and last LSN is 300 and third log backup the first LSN should be 300 and last LSN will be some number and the chain goes on.
How to track, where the LSN breaks out and how to fix it. There are many methods, I used. All it depends on the issue and situation. Let me show all the methods and try it out.
Try is in testing not in production: Easy way to break the LSN in AlwaysON database.
Remove a database from an AlwaysON group in a secondary.
ALTER DATABASE [dba3] SET HADR OFF;
Take couple more log backup in primary.
DECLARE @MyFileName varchar(200) SELECT @MyFileName='\\Sharepatht\dba3_' + REPLACE(convert(nvarchar(20),GetDate(),120),':','-') + '.trn' --select @MyFileName BACKUP log dba3 TO DISK=@MyFileName
Try to add the database back to secondary.
ALTER DATABASE [dba3] SET HADR AVAILABILITY GROUP = [AG-Test];
Error:
Msg 1478, Level 16, State 211, Line 1
The mirror database, has insufficient transaction log data to preserve the log backup chain of the principal database. This may happen if a log backup from the principal database has not been taken or has not been restored on the mirror database.
Use the backup table in MSDB to retrieve the backup history with dynamic SQL.
Note: The filter where clause is important and it will be modified based on my steps. I just uncommented all conditions.
Very easy fix: If we know the date when it got broken, just add that as condition and generate restore script and run it all in once, it will pick up the right file and follow it, get it in asc order.
Dynamic Backup script:
SELECT 'restore database dba3 from disk= ''' +f.physical_device_name+''' with norecovery', b.backup_finish_date,b.first_lsn,b.backup_size /1024/1024 AS size_MB,b.type,b.recovery_model, b.server_name ,b.database_name,b.user_name, f.physical_device_name,b.* FROM MSDB.DBO.BACKUPMEDIAFAMILY F JOIN MSDB.DBO.BACKUPSET B ON (f.media_set_id=b.media_set_id) WHERE database_name like'dba3' --and b.backup_finish_date>='2016-08-29 00:00:00.000' --and b.first_lsn<=40000000007600001 -- and last_lsn= 39000000047800001 --and b.checkpoint_lsn =39000000047800001 --and database_backup_lsn =39000000047800001 --AND B.type='L' --ORDER BY b.backup_finish_date desc --order by b.first_lsn desc
First method – A little tough, but good to understand the first and last LSN and the restore failures of LSN mismatch. This can be used for any other DR LSN issues, not only for alwaysON.
Step 1:
Run the dynamic backup script to get the latest transaction log backup from the server. Run this in all alwaysON group replica servers and make sure, you got the latest one.
Uncomment “ORDER BY b.backup_finish_date desc” to display in first line. Get the backup file and pass on it to the following restore command.
restore database dba3 from disk= '\\share\dba3_2016-08-29 09-43-58.trn' with norecovery
It will error and display the required LSN – “includes LSN 41000000017300001 can be restored”, which means, we need the one before the LSN of includes this.
To understand better, following is the example of restoring error, which is a very recent log backup and some old log backup restore.
An Old log backup error:
Msg 4326, Level 16, State 1, Line 1
The log in this backup set terminates at LSN 39000000023900001,
which is too early to apply to the database. A more recent log backup that includes LSN 41000000017300001 can be restored.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.
The Latest log backup error:
Msg 4305, Level 16, State 1, Line 1
The log in this backup set begins at LSN 41000000017600001,
which is too recent to apply to the database. An earlier log backup that includes LSN 41000000017300001 can be restored.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.
I just run the two restore command with the latest and old log backup, one says it is “too early to apply” which means it is a before LSN and other says “too recent to apply” which means it is a after LSN.
Example:
39000000023900001 – Before/early LSN
41000000017300001 – Required/includes LSN
41000000017600001 – After/Recent LSN
It is a little hard to find which backup set holds the required LSN which is included.
If we know the time when it breaks, you can run restore command one by one from that time or before and can find out the backup file. OR run “restore headeronly from disk = ‘\share\dba3_2016-08-28 07-22-32.bak’” to match closer to the required LSN comparing with first LSN.
Again, it is tough when it generated lots of transaction log backup. The above example the LSN are from the “restore headonly” and “restore database” both are showing the same.
Step 2:
We know the required LSN set”41000000017300001” from the restore error.
Use the dynamic backup script add the condition “and b.first_lsn<=41000000017300001” to get the LSN which is early or smaller than the needed (OR) which includes the required LSN with “order by b.first_lsn desc” clause.
Copy the first_LSN in the first row “41000000016700001”. It is the breaking point. (Error: log backup that includes LSN “41000000017300001” can be restored.)
SELECT 'restore database dba3 from disk= ''' +f.physical_device_name+''' with norecovery', b.backup_finish_date,b.first_lsn,b.backup_size /1024/1024 AS size_MB,b.type,b.recovery_model, b.server_name ,b.database_name,b.user_name, f.physical_device_name,b.* FROM MSDB.DBO.BACKUPMEDIAFAMILY F JOIN MSDB.DBO.BACKUPSET B ON (f.media_set_id=b.media_set_id) WHERE database_name like'dba3' and b.first_lsn<=41000000017300001 order by b.first_lsn desc
Step 3:
Use the dynamic backup script add “and b.first_lsn>=41000000016700001” with “ORDER BY b.backup_finish_date”. It will take all the required backup files to be applied in a sequence.
SELECT 'restore database dba3 from disk= ''' +f.physical_device_name+''' with norecovery', b.backup_finish_date,b.first_lsn,b.backup_size /1024/1024 AS size_MB,b.type,b.recovery_model, b.server_name ,b.database_name,b.user_name, f.physical_device_name,b.* FROM MSDB.DBO.BACKUPMEDIAFAMILY F JOIN MSDB.DBO.BACKUPSET B ON (f.media_set_id=b.media_set_id) WHERE database_name like'dba3' and b.first_lsn>=41000000016700001 ORDER BY b.backup_finish_date
Another Method: It is a bit easy one for alwaysON.
Use the following command it will display the exact required LSN in the “s.truncation_lsn”, which is same as “First_LSN” =41000000016700001. The “s.recovery_lsn & s.last_hardened_lsn” will show the recovery LSN which is same as “includes LSN” = 41000000017300001.
Find the required LSN of using “DMV of alwaysON” and note the truncation_lsn.
select database_id,db_name(database_id),name,replica_server_name,synchronization_health_desc,synchronization_state_desc, s.truncation_lsn,s.recovery_lsn,s.last_hardened_lsn,s.last_sent_time,s.last_received_time,s.last_redone_lsn, s.end_of_log_lsn,s.last_commit_lsn,s.last_sent_lsn,s.last_received_lsn,s.last_commit_lsn from sys.dm_hadr_database_replica_states S join sys.availability_groups G on (s.group_id=g.group_id) join sys.availability_replicas GR on (GR.group_id=g.group_id) --where synchronization_health_desc <>'HEALTHY' and db_name(database_id)='dba3' and synchronization_health_desc ='NOT_HEALTHY' -- change the DB name and health status
For more about columns read the MS DMVs link:
https://msdn.microsoft.com/en-us/library/ff877972.aspx
Run the dynamic backup script with the filter clause of the “s.truncation_lsn” “41000000016700001”
SELECT 'restore database dba3 from disk= ''' +f.physical_device_name+''' with norecovery', b.backup_finish_date,b.first_lsn,b.last_lsn,b.database_backup_lsn,b.checkpoint_lsn, b.type,b.is_copy_only,b.recovery_model,b.backup_size /1024/1024 AS size_MB, b.server_name ,b.database_name,b.user_name, f.physical_device_name,b.* FROM MSDB.DBO.BACKUPMEDIAFAMILY F JOIN MSDB.DBO.BACKUPSET B ON (f.media_set_id=b.media_set_id) WHERE database_name like'dba3' --and b.backup_finish_date>='2016-08-29 05:50:00.000' AND b.first_lsn>=41000000016700001 AND B.type='L' ORDER BY b.backup_finish_date
Copy the restore script and execute in the LSN break server, if you know the log backup only configured and run on the server. Mostly in two node alwaysON, it is configured and run more server see the following and use it.
Join the database into the alwaysON group.
ALTER DATABASE [dba3] SET HADR AVAILABILITY GROUP = [AG-Test];
How to fix if the log backup runs more than one secondary replica, when the break happens.
I had a case in more than one replica server, which are two in the primary data center and two are in DR data center.
We can use the above command and steps and can run on all the available replica server and compare the results and can prepare the restore command. But, it will take a little time and maybe some confusion as well. I used “Xp_cmdshell” to get the files from the backup folder and insert into a table to prepare the restore command.
To find the required transaction backup file, we need the required LSN:
1. Get the required LSN either from “DMV of alwaysON” OR from restore error 2. Get the backup file name by using “Dynamic backup script” by passing the required LSN. Filter and compare it with the “Xp_cmdshell” results.
It is a different test, so the LSN will differ from the previous one.
- You can use the above command “DMV of alwaysON” to find the required LSN, but it will show all the replica server, you just note the unhealthy of database server of required LSN from “s.truncation_lsn”
2. Use “Dynamic Backup script” and filter it “AND b.first_lsn>=41000000021500001” with “ORDER BY b.backup_finish_date”. Note the “f.physical_device_name”. That is “\share\dba3_2016-08-30 05-10-00.trn”
Step 1: Enable the xp_cmdshell
EXEC sp_configure 'show advanced options', 1; RECONFIGURE; EXEC sp_configure 'xp_cmdshell', 1; RECONFIGURE;
Step 2: Create table to export the backup files
--drop table tbl_backup_filename create table tbl_backup_filename (id int identity, Bak_filename varchar(500)) insert into tbl_backup_filename exec xp_cmdshell 'dir \\share\SQL-Backup\QA\DBA_Test /b /O:D' -- Bare format sort by date
Step 3: Compare the backup file name with the export backup file name and note the identity ID.
select * from tbl_backup_filename --\\share\dba3_2016-08-30 05-10-00.trn select * from tbl_backup_filename where Bak_filename like '%dba3_2016-08-30 05-10-00%'
Step 4: Run the dynamic SQL restore command to generate the restore script with the condition of identity number, which you got it from above step.
select 'restore headeronly from disk=''\\sclfilip13\MFGProcess\SQL-Backup\QA\DBA_Test\'+Bak_filename+'''', * from tbl_backup_filename where id >=76 -- DB select 'restore database dba3 from disk=''\\sclfilip13\MFGProcess\SQL-Backup\QA\DBA_Test\'+Bak_filename+''' with norecovery', * from tbl_backup_filename where id >=76
Step 5: Disable the xp_cmdshell
EXEC sp_configure 'xp_cmdshell', 0; RECONFIGURE; EXEC sp_configure 'show advanced options', 0; RECONFIGURE;
If you have a backup folder separate for a replica server, then you need to export all and can compare it with all replica servers.
If anyone interested in the test script, please drop me a note. It is always great to share the my findings and learning with you all.
Happy learning!
12 Comments
Pingback:
Vinoth
Muthu you have had skilled a lot. I will have a test on it. Just now read your tempDB full post.
Muthukkumaran kaliyamoorthy
Thanks Vinoth 🙂 Had good time here. I hope you too.
Johnson Weltch
Fantastic blog, full of useful information which was very helpful to me. There I found an informative article describing Log sequence number for backup which was also very helpful to me. You may have a look:http://www.sqlmvp.org/sql-server-log-sequence-numbers-for-backups/
Muthukkumaran kaliyamoorthy
Thanks for your kind words Johnson!
Pingback:
Mike
Very good blog with alot information about sqlserver, practical information not theory.
Muthukkumaran Kaliyamoorthy
Glad it helped you.
Donbosco
In a recovery pending state of a secondary replica database caused due to force manual failover for a very large size busy say 30 TB database, with a high volume transactions, how does one keep the AG sync and healthy?
Pingback:
Pingback:
Vivek Sharma
HI Muthu,
Awesome Blog. I enjoyed reading it. Could you please share Test Scripts you mentioned to Re-Create the Scenario?
Thanks in advance.