---
title: Operate availability group SQL Server on Linux | Microsoft Docs
description:
author: MikeRayMSFT
ms.author: mikeray
manager: jhubbard
ms.date: 07/20/2017
ms.topic: article
ms.prod: sql-linux
ms.technology: database-engine
ms.assetid:
---
[!INCLUDEtsql-appliesto-sslinux-only]
## Fail over

Use the cluster management tools to fail over an availability group that is managed by an external cluster manager. For example, if a solution uses Pacemaker to manage a Linux cluster, use `pcs` to perform manual failovers on RHEL or Ubuntu. On SLES, use `crm`.
> [!IMPORTANT]
> Under normal operations, do not fail over with Transact-SQL or SQL Server management tools like SSMS or PowerShell. When `CLUSTER_TYPE = EXTERNAL`, the only acceptable value for `FAILOVER_MODE` is `EXTERNAL`. With these settings, all manual or automatic failover actions are executed by the external cluster manager.
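For context, the following is a sketch of the Transact-SQL options that put an availability group under external cluster management. The availability group name, replica name, and endpoint URL are placeholders, not values from this article:

```sql
-- Placeholder names: ag1, node1, and the endpoint URL are illustrative.
CREATE AVAILABILITY GROUP [ag1]
    WITH (CLUSTER_TYPE = EXTERNAL)
    FOR REPLICA ON
        N'node1' WITH (
            ENDPOINT_URL = N'tcp://node1:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = EXTERNAL,  -- required when CLUSTER_TYPE = EXTERNAL
            SEEDING_MODE = AUTOMATIC
        );
```

With these options set, failover requests must go through the cluster manager, as described in this article.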
Manually fail over the availability group with the external cluster management tools. Under normal operations, do not initiate failover with Transact-SQL. If the external cluster management tools do not respond, you can force the availability group to fail over. For instructions to force the manual failover, see Manual move when cluster tools are not responsive.
Complete the manual failover in two steps:

- Move the availability group resource from the cluster node that owns the resources to a new node. The cluster manager moves the availability group resource and adds a location constraint. This constraint configures the resource to run on the new node. You must remove this constraint in order to fail over again, either manually or automatically, in the future.

- Remove the location constraint.
To manually fail over an availability group resource named `ag_cluster` to the cluster node named `nodeName2`, run the appropriate command for your distribution:

- RHEL/Ubuntu example

  ```bash
  sudo pcs resource move ag_cluster-master nodeName2 --master
  ```

- SLES example

  ```bash
  crm resource migrate ag_cluster nodeName2
  ```
> [!IMPORTANT]
> After you manually fail over a resource, you need to remove the location constraint that is automatically added during the move.
During a manual move, the `pcs` command `move` or the `crm` command `migrate` adds a location constraint that places the resource on the new target node. To see the new constraint, run the following command after manually moving the resource:

- RHEL/Ubuntu example

  ```bash
  sudo pcs constraint --full
  ```

- SLES example

  ```bash
  crm config show
  ```
You need to remove the location constraint so that future moves, including automatic failover, succeed. To remove the constraint, run the following command.
- RHEL/Ubuntu example

  In this example, `ag_cluster-master` is the name of the resource that was moved.

  ```bash
  sudo pcs resource clear ag_cluster-master
  ```

- SLES example

  In this example, `ag_cluster` is the name of the resource that was moved.

  ```bash
  crm resource clear ag_cluster
  ```
Alternatively, you can run the following command to remove the location constraint.

- RHEL/Ubuntu example

  In the following command, `cli-prefer-ag_cluster-master` is the ID of the constraint that needs to be removed. `sudo pcs constraint --full` returns this ID.

  ```bash
  sudo pcs constraint remove cli-prefer-ag_cluster-master
  ```

- SLES example

  In the following commands, `cli-prefer-ms-ag_cluster` is the ID of the constraint. `crm config show` returns this ID.

  ```bash
  crm configure delete cli-prefer-ms-ag_cluster
  commit
  ```
> [!NOTE]
> Automatic failover does not add a location constraint, so no cleanup is necessary.
For more information:

- Red Hat - Managing Cluster Resources
- Pacemaker - Move Resources Manually
- SLES Administration Guide - Resources
## Manual move when cluster tools are not responsive

In extreme cases, if you cannot use the cluster management tools to interact with the cluster (for example, the cluster is unresponsive or the cluster management tools behave incorrectly), you might have to fail over manually, bypassing the external cluster manager. This is not recommended for regular operations; use it only when the cluster fails to execute the failover action through the cluster management tools.
If you cannot fail over the availability group with the cluster management tools, follow these steps to fail over from SQL Server tools:
- Verify that the availability group resource is no longer managed by the cluster.

  - Attempt to set the resource to unmanaged mode. This signals the resource agent to stop monitoring and managing the resource. For example:

    ```bash
    sudo pcs resource unmanage <resourceName>
    ```

  - If the attempt to set the resource to unmanaged mode fails, delete the resource. For example:

    ```bash
    sudo pcs resource delete <resourceName>
    ```

  > [!NOTE]
  > When you delete a resource, all of its associated constraints are also deleted.
- Manually set the session context variable `external_cluster`.

  ```sql
  EXEC sp_set_session_context @key = N'external_cluster', @value = N'yes';
  ```

- Fail over the availability group with Transact-SQL. In the example below, replace `<MyAg>` with the name of your availability group. Connect to the instance of SQL Server that hosts the target secondary replica and run the following command:

  ```sql
  ALTER AVAILABILITY GROUP <MyAg> FAILOVER;
  ```

- Restart cluster resource monitoring and management. Run the following commands:

  ```bash
  sudo pcs resource manage <resourceName>
  sudo pcs resource cleanup <resourceName>
  ```
## Database level monitoring and failover

For `CLUSTER_TYPE = EXTERNAL`, the failover trigger semantics differ from WSFC. When the availability group is on an instance of SQL Server in a WSFC, transitioning out of the `ONLINE` state for the database causes the availability group health to report a fault. This signals the cluster manager to trigger a failover action. On Linux, the SQL Server instance cannot communicate with the cluster, so monitoring for database health is done "outside-in". If the user opted in to database level failover monitoring and failover (by setting the option `DB_FAILOVER = ON` when creating the availability group), the cluster checks whether the database state is `ONLINE` every time it runs a monitoring action. The cluster queries the state in `sys.databases`. For any state other than `ONLINE`, the cluster triggers a failover automatically (if automatic failover conditions are met). The actual time of the failover depends on the frequency of the monitoring action and on when the database state is updated in `sys.databases`.
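As an illustrative sketch, you can inspect the same state that the monitoring action evaluates; `db1` below is a placeholder database name:

```sql
-- Placeholder database name db1. Any state other than ONLINE can trigger
-- an automatic failover when DB_FAILOVER = ON and the automatic failover
-- conditions are met.
SELECT name, state_desc
FROM sys.databases
WHERE name = N'db1';
```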
## Upgrade availability group

Before you upgrade an availability group, review the best practices at Upgrading availability group replica instances.

The following sections explain how to perform a rolling upgrade of SQL Server instances on Linux with availability groups.
When availability group replicas are on instances of SQL Server on Linux, the cluster type of the availability group is either `EXTERNAL` or `NONE`. An availability group that is managed by a cluster manager other than Windows Server Failover Cluster (WSFC) is `EXTERNAL`. Pacemaker with Corosync is an example of an external cluster manager. An availability group with no cluster manager has cluster type `NONE`. The upgrade steps outlined here are specific to availability groups of cluster type `EXTERNAL` or `NONE`.
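To check which cluster type an existing availability group uses, you can query the availability group metadata; the `cluster_type_desc` column reports `WSFC`, `EXTERNAL`, or `NONE`:

```sql
-- Lists each availability group on this instance with its cluster type.
SELECT name, cluster_type_desc
FROM sys.availability_groups;
```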
- Before you begin, back up each database.

- Upgrade the instances of SQL Server that host secondary replicas.

  a. Upgrade asynchronous secondary replicas first.

  b. Upgrade synchronous secondary replicas.

  > [!NOTE]
  > If an availability group has only asynchronous replicas, change one replica to synchronous to avoid any data loss, and wait until it is synchronized. Then upgrade this replica.
  b.1. Stop the resource on the node that hosts the secondary replica targeted for upgrade.

  Before running the upgrade command, stop the resource so that the cluster does not monitor it and fail it unnecessarily. The following example adds a location constraint on the node, which causes the resource to be stopped. Update `ag_cluster-master` with the resource name and `nodeName1` with the node that hosts the replica targeted for upgrade.

  ```bash
  pcs constraint location ag_cluster-master avoids nodeName1
  ```
  b.2. Upgrade SQL Server on the secondary replica.

  The following example upgrades the `mssql-server` and `mssql-server-ha` packages.

  ```bash
  sudo yum update mssql-server
  sudo yum update mssql-server-ha
  ```
  b.3. Remove the location constraint.

  After the upgrade completes, remove the location constraint that was added in step b.1 so that the resource can run on the node again. In the following example, `location-ag_cluster-master-rhel1--INFINITY` is the ID of the constraint.

  ```bash
  pcs constraint remove location-ag_cluster-master-rhel1--INFINITY
  ```
  As a best practice, ensure that the resource is started (use the `pcs status` command) and that the secondary replica is connected and synchronized after the upgrade.
- After all secondary replicas are upgraded, manually fail over to one of the synchronous secondary replicas.

  For availability groups with `EXTERNAL` cluster type, use the cluster management tools to fail over; for availability groups with `NONE` cluster type, use Transact-SQL to fail over. The following example fails over an availability group with the cluster management tools. Replace `<targetReplicaName>` with the name of the synchronous secondary replica that will become primary:

  ```bash
  sudo pcs resource move ag_cluster-master <targetReplicaName> --master
  ```
  > [!IMPORTANT]
  > The following steps apply only to availability groups that do not have a cluster manager.

  If the availability group cluster type is `NONE`, manually fail over. Complete the following steps in order:

  a. The following command sets the primary replica to secondary. Replace `ag1` with the name of your availability group. Run the Transact-SQL command on the instance of SQL Server that hosts the primary replica.

  ```sql
  ALTER AVAILABILITY GROUP [ag1] SET (ROLE = SECONDARY);
  ```

  b. The following command sets a synchronous secondary replica to primary. Run the following Transact-SQL command on the target instance of SQL Server, the instance that hosts the synchronous secondary replica.

  ```sql
  ALTER AVAILABILITY GROUP [ag1] FAILOVER;
  ```
- After failover, upgrade SQL Server on the old primary replica by repeating the procedure described in steps b.1 through b.3 above.

  The following example upgrades the `mssql-server` and `mssql-server-ha` packages.

  ```bash
  # add a constraint so the resource stops on the node targeted for upgrade
  # replace 'nodeName2' with the name of the cluster node targeted for upgrade
  pcs constraint location ag_cluster-master avoids nodeName2

  # upgrade the mssql-server and mssql-server-ha packages
  sudo yum update mssql-server
  sudo yum update mssql-server-ha

  # remove the constraint; make sure the resource is started
  # and the replica is connected and synchronized
  pcs constraint remove location-ag_cluster-master-rhel1--INFINITY
  ```
- For an availability group with an external cluster manager (cluster type `EXTERNAL`), clean up the location constraint that the manual failover added.

  ```bash
  sudo pcs constraint remove cli-prefer-ag_cluster-master
  ```
- Resume data movement for the newly upgraded secondary replica, the former primary replica. This step is required when a higher version instance of SQL Server is transferring log blocks to a lower version instance in an availability group. Run the following command on the new secondary replica (the previous primary replica).

  ```sql
  ALTER DATABASE database_name SET HADR RESUME;
  ```
After upgrading all servers, you can fail back to the original primary if necessary.
## Drop an availability group

To delete an availability group, run DROP AVAILABILITY GROUP. If the cluster type is `EXTERNAL` or `NONE`, run the command on every instance of SQL Server that hosts a replica. For example, to drop an availability group named `group_name`, run the following command:

```sql
DROP AVAILABILITY GROUP group_name;
```
## Next steps

- Configure Red Hat Enterprise Linux Cluster for SQL Server Availability Group Cluster Resources
- Configure SUSE Linux Enterprise Server Cluster for SQL Server Availability Group Cluster Resources
- Configure Ubuntu Cluster for SQL Server Availability Group Cluster Resources