AlwaysON database NOT SYNCHRONIZED and RECOVERY PENDING
Check the AG database by expanding group and resume it.
A quick fix: Suspend and resume it.
select name,role_desc,operational_state_desc,recovery_health_desc, synchronization_health_desc,getdate() from sys.dm_hadr_availability_replica_cluster_states a join sys.dm_hadr_availability_replica_states b on a.group_id =b.group_id join sys.availability_groups_cluster c on b.group_id =c.group_id where b.synchronization_health_desc<>'HEALTHY' -- Get the replica ID and pass it select 'alter database ['+database_name+'] set HADR suspend' from sys.dm_hadr_database_replica_cluster_states where replica_id='' select 'alter database ['+database_name+'] set HADR resume' from sys.dm_hadr_database_replica_cluster_states where replica_id=''
Bug:
I have had one more issue Always ON database went into In Recovery mode, it was 10 GB database and I have read the log no improvement and rollback percentage etc. I had left 4 hours, still I had same.
Solution: Remove the database from AG group, the secondary database will go into restoring mode from in recovery. Try re-add, if it throw an error for small database drop database from secondary and re add with Automatic seeding option in the primary. For larger database, check the error and correct the LSN issue.
Reproduce one of the issue
I have recently come across an issue that, one of my alwaysON secondary replica databases are went into “NOT SYNCHRONIZED and RECOVERY PENDING” state. It is a geo cluster alwaysON with 4+2 nodes configured in both synchronous and asynchronous mode.
The issue is when the database removed from the primary replica, with the secondary disconnection the higher database IDs on the secondary went into “NOT SYNCHRONIZED and RECOVERY PENDING” state, but the lower database IDs are good and synchronous state only.
I just checked with one of the Microsoft MSFT and he said it considers as a bug. “Since, it occurs a DDL change on the Availability Group as the primary, while a secondary replica server is down”.
I just reproduced the same. It is a two node synchronous replica server.
Version: Microsoft SQL Server 2012 (SP1) – 11.0.3000.0
I have not tested it on SQL 2014.
Testing:
- Create five databases.
CREATE DATABASE dba
CREATE DATABASE dba1
CREATE DATABASE dba2
CREATE DATABASE dba3
CREATE DATABASE dba4
SELECT * FROM SYS.SYSDATABASES
2. Created a test availability group and added a five databases in synchronous mode.
3. Stop SQL Instance for Secondary Replica.
4. On the Primary Replica issue TSQL to remove a database from availability group.
ALTER AVAILABILITY GROUP [AG-Test] REMOVE DATABASE dba2;
- Start SQL Instance for the Secondary Replica.
You can see that, databases whose IDs are higher than the one removed dba2 are going to “NOT SYNCHRONIZED / RECOVERY PENDING” mode.
DB ID – 9, 10 for dba,dba1 – Good
DB ID – 11 for dba2 – Removed DB
DB ID – 12,13 for dba3, dba4 went into SYNCHRONIZED / RECOVERY PENDING.
SELECT DATABASE_ID,DB_NAME(DATABASE_ID),NAME,REPLICA_SERVER_NAME,
SYNCHRONIZATION_HEALTH_DESC,SYNCHRONIZATION_STATE_DESC,S.*
FROM SYS.DM_HADR_DATABASE_REPLICA_STATES S JOIN SYS.AVAILABILITY_GROUPS G ON (S.GROUP_ID=G.GROUP_ID)
JOIN SYS.AVAILABILITY_REPLICAS GR ON (GR.GROUP_ID=G.GROUP_ID)
–WHERE SYNCHRONIZATION_HEALTH_DESC <>’HEALTHY’
–AND REPLICA_SERVER_NAME =”
The fix and work around is resume other databases that are stuck in secondary replica.
- On the Secondary replica that has stuck databases, for each stuck database remove (will make it DB into restoring mode) and add (will make it DB into synchronized mode).
Remove a Secondary Database from an AG –> https://msdn.microsoft.com/en-us/library/hh231120.aspx
ALTER DATABASE [DBA3] SET HADR OFF
ALTER DATABASE [DBA4] SET HADR OFF
Join a Secondary Database to an AG –> https://msdn.microsoft.com/en-us/library/ff878535.aspx
ALTER DATABASE [DBA3] SET HADR AVAILABILITY GROUP = [AG-TEST]
ALTER DATABASE [DBA4] SET HADR AVAILABILITY GROUP = [AG-TEST]
Repeat for each stuck database.
The database that had been removed <dba2> could be in any number of states, but after it finishes its recovery process, it should return to a RESTORING state.
7. On primary replica, Add the removed database –> primary replica –> right click AG databases–>select database –> join only –> connect the secondary –> finish.
In case, if you get any following error, check the backup table and restore a log backup with norecovery mode in all the secondary replica. Which are all missed or run during that time.
Msg 1478, Level 16, State 211, Line 1
The mirror database, ‘DBA2’, has insufficient transaction log data to preserve the log backup chain of the principal database. This may happen if a log backup from the principal database has not been taken or has not been restored on the mirror database.
LSN issue fix –> https://sqlserverblogforum.com/alwayson/how-to-solve-the-lsn-mismatch-in-sql-server/
Added: Thanks Amit Banerjee for your reply. I just reported to the connect.
https://connect.microsoft.com/SQLServer/feedback/details/3022019
One Comment
Pingback: