AlwaysON

AlwaysON database NOT SYNCHRONIZED and RECOVERY PENDING

Advertisements

AlwaysON database NOT SYNCHRONIZED and RECOVERY PENDING

 

Check the AG database by expanding group and resume it.

A quick fix: Suspend and resume it.

select name,role_desc,operational_state_desc,recovery_health_desc,
synchronization_health_desc,getdate()
from sys.dm_hadr_availability_replica_cluster_states a join 
sys.dm_hadr_availability_replica_states b on a.group_id =b.group_id 
join sys.availability_groups_cluster c on b.group_id =c.group_id 
where b.synchronization_health_desc<>'HEALTHY'

-- Get the replica ID and pass it
select 'alter database ['+database_name+'] set HADR suspend' from sys.dm_hadr_database_replica_cluster_states where replica_id=''
select 'alter database ['+database_name+'] set HADR resume' from sys.dm_hadr_database_replica_cluster_states where replica_id=''

Bug:

I have had one more issue  Always ON database went into In Recovery mode, it was 10 GB database and I have read the log no improvement and rollback percentage etc. I had left 4 hours, still I had same.

Solution: Remove the database from AG group, the secondary database will go into restoring mode from in recovery. Try re-add, if it throw an error for small database drop database from secondary and re add with Automatic seeding option in the primary. For larger database, check the error and correct the LSN issue.

Reproduce one of the issue

I have recently come across an issue that, one of my alwaysON secondary replica databases are went into “NOT SYNCHRONIZED and RECOVERY PENDING” state. It is a geo cluster alwaysON with 4+2 nodes configured in both synchronous and asynchronous mode. 

The issue is when the database removed from the primary replica, with the secondary disconnection the higher database IDs on the secondary went into “NOT SYNCHRONIZED and RECOVERY PENDING” state, but the lower database IDs are good and synchronous state only.

I just checked with one of the Microsoft MSFT and he said it considers as a bug. “Since, it occurs a DDL change on the Availability Group as the primary, while a secondary replica server is down”.

I just reproduced the same. It is a two node synchronous replica server.

Version: Microsoft SQL Server 2012 (SP1) – 11.0.3000.0

I have not tested it on SQL 2014.

 

Testing:

  1. Create five databases.

CREATE DATABASE dba

CREATE DATABASE dba1

CREATE DATABASE dba2

CREATE DATABASE dba3

CREATE DATABASE dba4

SELECT * FROM SYS.SYSDATABASES

2. Created a test availability group and added a five databases in synchronous mode.

3. Stop SQL Instance for Secondary Replica.

4. On the Primary Replica issue TSQL to remove a database from availability group.

 

ALTER AVAILABILITY GROUP [AG-Test] REMOVE DATABASE dba2;

 

  1. Start SQL Instance for the Secondary Replica.

You can see that, databases whose IDs are higher than the one removed dba2 are going to “NOT SYNCHRONIZED / RECOVERY PENDING” mode.

DB ID – 9, 10 for dba,dba1 – Good

DB ID – 11 for dba2 – Removed DB

DB ID – 12,13 for dba3, dba4 went into SYNCHRONIZED / RECOVERY PENDING.

 

SELECT DATABASE_ID,DB_NAME(DATABASE_ID),NAME,REPLICA_SERVER_NAME,

SYNCHRONIZATION_HEALTH_DESC,SYNCHRONIZATION_STATE_DESC,S.*

FROM SYS.DM_HADR_DATABASE_REPLICA_STATES S JOIN  SYS.AVAILABILITY_GROUPS G ON (S.GROUP_ID=G.GROUP_ID)

JOIN SYS.AVAILABILITY_REPLICAS   GR ON (GR.GROUP_ID=G.GROUP_ID)

–WHERE SYNCHRONIZATION_HEALTH_DESC <>’HEALTHY’

–AND REPLICA_SERVER_NAME =”

 

The fix and work around is resume other databases that are stuck in secondary replica.

  1. On the Secondary replica that has stuck databases, for each stuck database remove (will make it DB into restoring mode) and add (will make it DB into synchronized mode).

Remove a Secondary Database from an AG –> https://msdn.microsoft.com/en-us/library/hh231120.aspx

 

ALTER DATABASE [DBA3] SET HADR OFF

ALTER DATABASE [DBA4] SET HADR OFF

Join a Secondary Database to an AG –> https://msdn.microsoft.com/en-us/library/ff878535.aspx

 

ALTER DATABASE [DBA3] SET HADR AVAILABILITY GROUP = [AG-TEST]

ALTER DATABASE [DBA4] SET HADR AVAILABILITY GROUP = [AG-TEST]

Repeat for each stuck database.

The database that had been removed <dba2> could be in any number of states, but after it finishes its recovery process, it should return to a RESTORING state.

7. On primary replica, Add the removed database –> primary replica –> right click AG databases–>select database –> join only –> connect the secondary –> finish.

In case, if you get any following error, check the backup table and restore a log backup with norecovery mode in all the secondary replica. Which are all missed or run during that time.

Msg 1478, Level 16, State 211, Line 1

The mirror database, ‘DBA2’, has insufficient transaction log data to preserve the log backup chain of the principal database.  This may happen if a log backup from the principal database has not been taken or has not been restored on the mirror database.

LSN issue fix –> https://sqlserverblogforum.com/alwayson/how-to-solve-the-lsn-mismatch-in-sql-server/

 

Added: Thanks Amit Banerjee for your reply. I just reported to the connect.

https://connect.microsoft.com/SQLServer/feedback/details/3022019

 

 

I’m currently working as a SQL server DBA in one of the top MNC. I’m passionate about SQL Server And I’m specialized in Administration and Performance tuning. I’m an active member of SQL server Central and MSDN forum. I also write articles in SQL server Central. For more Click here

One Comment

Leave a Reply

Your email address will not be published. Required fields are marked *

5 + 2 =