
Commit a8c02c6

MashaMSFT authored and committed
Light Freshness Edit: Azure SQL (Failover clustering)
1 parent 483c8ab commit a8c02c6

24 files changed

Lines changed: 1672 additions & 1544 deletions

docs/database-engine/availability-groups/windows/failover-clustering-and-always-on-availability-groups-sql-server.md

Lines changed: 103 additions & 107 deletions
Large diffs are not rendered by default.

docs/database-engine/availability-groups/windows/prereqs-restrictions-recommendations-always-on-availability.md

Lines changed: 140 additions & 85 deletions
Large diffs are not rendered by default.

docs/database-engine/availability-groups/windows/upgrading-always-on-availability-group-replica-instances.md

Lines changed: 45 additions & 51 deletions
Large diffs are not rendered by default.
Lines changed: 10 additions & 8 deletions
@@ -1,22 +1,24 @@
 ---
-title: Business continuity and database recovery - SQL Server
+title: Business Continuity and Database Recovery - SQL Server
 description: Use this overview of business continuity solutions for high availability and disaster recovery in SQL Server to provide resources with minimal interruption.
 author: MashaMSFT
 ms.author: mathoma
 ms.reviewer: mikeray, randolphwest
-ms.date: 11/23/2022
+ms.date: 08/21/2025
+ms.update-cycle: 1825-days
 ms.service: sql
 ms.subservice: availability-groups
-ms.custom: linux-related-content
 ms.topic: conceptual
+ms.custom:
+  - linux-related-content
 ---
 # Business continuity and database recovery - SQL Server
 
-[!INCLUDE[sqlserver2016](../includes/applies-to-version/sqlserver2016.md)]
+[!INCLUDE [sqlserver2016](../includes/applies-to-version/sqlserver2016.md)]
 
-[!INCLUDE[business-continuity](../includes/business-continuity/business-continuity.md)]
+[!INCLUDE [business-continuity](../includes/business-continuity/business-continuity.md)]
 
-## Next steps
+## Related content
 
-- [Availability groups](availability-groups/windows/overview-of-always-on-availability-groups-sql-server.md)
-- [Failover clusters](../sql-server/failover-clusters/install/sql-server-failover-cluster-installation.md)
+- [What is an Always On availability group?](availability-groups/windows/overview-of-always-on-availability-groups-sql-server.md)
+- [SQL Server failover cluster installation](../sql-server/failover-clusters/install/sql-server-failover-cluster-installation.md)

docs/includes/business-continuity/business-continuity.md

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ With [!INCLUDE [ssnoversion-md](../ssnoversion-md.md)] now supported on both Win
 
 ## Distributed availability groups
 
-Distributed AGs are designed to span AG configurations, whether those two underlying clusters underneath the AGs are two different WSFCs, Linux distributions, or one on a WSFC and the other on Linux. A distributed AG will be the primary method of having a cross platform solution. A distributed AG is also the primary solution for migrations such as converting from a Windows Server-based [!INCLUDE [ssnoversion-md](../ssnoversion-md.md)] infrastructure to a Linux-based one if that is what your company wants to do. As noted above, AGs, and especially distributed AGs, would minimize the time that an application would be unavailable for use. An example of a distributed AG that spans a WSFC and Pacemaker is shown below.
+Distributed AGs are designed to span AG configurations, whether the two underlying clusters are two different WSFCs, two Linux clusters, or one WSFC and one Linux cluster. A distributed AG is the primary method for a cross-platform solution. A distributed AG is also the primary solution for migrations, such as converting a Windows Server-based [!INCLUDE [ssnoversion-md](../ssnoversion-md.md)] infrastructure to a Linux-based one. As noted above, AGs, and especially distributed AGs, minimize the time that an application is unavailable. An example of a distributed AG that spans a WSFC and Pacemaker is shown in the following diagram:
 
 :::image type="content" source="media/business-continuity/distributed-availability-group-span.png" alt-text="Diagram showing a distributed availability group that spans a WSFC and Pacemaker.":::
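As a hedged sketch of what creating such a distributed AG might look like (the AG names, listener URLs, and port are hypothetical placeholders for two existing AGs, one per underlying cluster), run on the global primary replica:

```sql
-- Sketch only: joins two existing AGs, [AG1] and [AG2], into a
-- distributed AG. All names and URLs are hypothetical.
CREATE AVAILABILITY GROUP [DistributedAG]
    WITH (DISTRIBUTED)
    AVAILABILITY GROUP ON
        'AG1' WITH
        (
            LISTENER_URL = 'tcp://ag1-listener.contoso.com:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SEEDING_MODE = AUTOMATIC
        ),
        'AG2' WITH
        (
            LISTENER_URL = 'tcp://ag2-listener.contoso.com:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SEEDING_MODE = AUTOMATIC
        );
```

A matching `ALTER AVAILABILITY GROUP [DistributedAG] JOIN ...` statement would then be run on the primary replica of the second AG.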

docs/includes/business-continuity/high-availability.md

Lines changed: 12 additions & 12 deletions
@@ -20,7 +20,7 @@ Before [!INCLUDE[sssql22-md](../sssql22-md.md)], AGs only provide database-level
 
 Starting with [!INCLUDE[sssql22-md](../sssql22-md.md)], you can manage metadata objects including users, logins, permissions and [!INCLUDE[ssnoversion-md](../ssnoversion-md.md)] Agent jobs at the AG level in addition to the instance level. For more information, see [Contained availability groups](../../database-engine/availability-groups/windows/contained-availability-groups-overview.md).
 
-An AG also has another component called the *listener*, which allows applications and end users to connect without needing to know which [!INCLUDE[ssnoversion-md](../ssnoversion-md.md)] instance is hosting the primary replica. Each AG would have its own listener. While the implementations of the listener are slightly different on Windows Server versus Linux, the functionality it provides and how it's used is the same. The diagram below shows a Windows Server-based AG that is using a Windows Server Failover Cluster (WSFC). An underlying cluster at the OS layer is required for availability whether it is on Linux or Windows Server. The example shows a simple configuration with two servers, or *nodes*, with a WSFC as the underlying cluster.
+An AG also has another component called the *listener*, which allows applications and end users to connect without needing to know which [!INCLUDE[ssnoversion-md](../ssnoversion-md.md)] instance is hosting the primary replica. Each AG would have its own listener. While the implementations of the listener are slightly different on Windows Server versus Linux, the functionality it provides and how it's used is the same. The following diagram shows a Windows Server-based AG that is using a Windows Server Failover Cluster (WSFC). An underlying cluster at the OS layer is required for availability whether it is on Linux or Windows Server. The example shows a simple configuration with two servers, or *nodes*, with a WSFC as the underlying cluster.
 
 :::image type="content" source="media/business-continuity/simple-availability-group.png" alt-text="Diagram of a simple availability group.":::

@@ -31,7 +31,7 @@ Standard and Enterprise edition have different maximums when it comes to replica
 
 When it comes to availability, AGs can provide either automatic or manual failover. Automatic failover can occur if synchronous data movement is configured and the database on the primary and secondary replica are in a synchronized state. As long as the listener is used and the application uses a later version of .NET Framework (3.5 with an update, or 4.0 and above), the failover should be handled with minimal to no effect on end users. Failing over to a secondary replica to make it the new primary replica can be configured to be automatic or manual, and is generally measured in seconds.
 
-The list below highlights some differences with AGs on Windows Server versus Linux:
+The following list highlights some differences with AGs on Windows Server versus Linux:
 
 - Owing to differences in the way the underlying cluster works on Linux and Windows Server, all failovers (manual or automatic) of AGs are done via the cluster on Linux. On Windows Server-based AG deployments, manual failovers must be done via [!INCLUDE[ssnoversion-md](../ssnoversion-md.md)]. Automatic failovers are handled by the underlying cluster on both Windows Server and Linux.
 - For [!INCLUDE [ssnoversion-md](../ssnoversion-md.md)] on Linux, the recommended configuration for AGs is a minimum of three replicas. This is due to the way that the underlying clustering works.
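On Windows Server, the manual failover described above is initiated through [!INCLUDE[ssnoversion-md](../ssnoversion-md.md)] itself. A minimal sketch (the AG name is hypothetical), run on the synchronized secondary replica that should become the new primary:

```sql
-- Sketch: run on the target secondary replica. Requires the replica
-- to be in a synchronized state for a no-data-loss failover.
ALTER AVAILABILITY GROUP [AG1] FAILOVER;
```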
@@ -40,7 +40,7 @@ The list below highlights some differences with AGs on Windows Server versus Lin
 Starting with [!INCLUDE[sssql17-md](../sssql17-md.md)], there are some new features and enhancements to AGs:
 
 - Cluster types
-- REQUIRED_SECONDARIES_TO_COMMIT
+- `REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT`
 - Enhanced Microsoft Distributed Transaction Coordinator (DTC) support for Windows Server-based configurations
 - Additional scale out scenarios for read-only databases (described later in this article)
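The renamed setting in the list above controls how many synchronized secondary replicas must acknowledge a transaction before the primary commits it. A hedged sketch (the AG name is hypothetical):

```sql
-- Sketch: commits on the primary wait for at least one synchronized
-- secondary. [AG1] is a hypothetical availability group name.
ALTER AVAILABILITY GROUP [AG1]
    SET (REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT = 1);
```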

@@ -67,9 +67,9 @@ A cluster type of None can be used with both Windows Server and Linux AGs. Setti
 > [!IMPORTANT]
 > Starting with [!INCLUDE[sssql17-md](../sssql17-md.md)], you can't change a cluster type for an AG after it is created. This means that an AG cannot be switched from None to External or WSFC, or vice versa.
 
-For those who are only looking to just add additional read-only copies of a database, or like what an AG provides for migration/upgrades but don't want to be tied to the additional complexity of an underlying cluster or even the replication, an AG with a cluster type of None is a perfect solution. For more information, see the sections [Migrations and Upgrades](#Migrations) and [read-scale](#ReadScaleOut).
+For those who only want to add read-only copies of a database, or who like what an AG provides for migrations and upgrades but don't want the additional complexity of an underlying cluster, an AG with a cluster type of None is a good solution. For more information, see the sections [Migrations and upgrades](#Migrations) and [read-scale](#ReadScaleOut).
 
-The screenshot below shows the support for the different kinds of cluster types in SQL Server Management Studio (SSMS). You must be running version 17.1 or later. The screenshot below is from version 17.2.
+The following screenshot shows support for the different kinds of cluster types in SQL Server Management Studio (SSMS). You must be running version 17.1 or later. The following screenshot is from version 17.2:
 
 :::image type="content" source="media/business-continuity/availability-group-options.png" alt-text="Screenshot of SSMS AG options." lightbox="media/business-continuity/availability-group-options.png":::
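A hedged sketch of a clusterless (cluster type None) AG of the kind described above; every name, database, and endpoint URL here is a hypothetical placeholder:

```sql
-- Sketch only: a read-scale AG with no underlying WSFC or Pacemaker
-- cluster. [SalesDB], SQLNODE1/2, and the URLs are hypothetical.
CREATE AVAILABILITY GROUP [ReadScaleAG]
    WITH (CLUSTER_TYPE = NONE)
    FOR DATABASE [SalesDB]
    REPLICA ON
        N'SQLNODE1' WITH (
            ENDPOINT_URL = N'tcp://sqlnode1.contoso.com:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SEEDING_MODE = AUTOMATIC,
            SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
        ),
        N'SQLNODE2' WITH (
            ENDPOINT_URL = N'tcp://sqlnode2.contoso.com:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SEEDING_MODE = AUTOMATIC,
            SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
        );
```

Because there is no cluster, failover of an AG like this is manual and intended for read-scale or migration scenarios, not high availability.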

@@ -100,19 +100,19 @@ In [!INCLUDE[sssql17-md](../sssql17-md.md)] and later versions, DTC support can
 
 ### Failover cluster instances
 
-Clustered installations have been a feature of [!INCLUDE[ssnoversion-md](../ssnoversion-md.md)] since version 6.5. FCIs are a proven method of providing availability for the entire installation of [!INCLUDE[ssnoversion-md](../ssnoversion-md.md)], known as an instance. This means that everything inside the instance, including databases, [!INCLUDE[ssnoversion-md](../ssnoversion-md.md)] Agent jobs, linked servers, and so on, will move to another server should the underlying server encounter a problem. All FCIs require some sort of shared storage, even if it's provided via networking. The FCI's resources can only be running and owned by one node at any given time. In the diagram below, the first node of the cluster owns the FCI, which also means it owns the shared storage resources associated with it denoted by the solid line to the storage.
+Clustered installations have been a feature of [!INCLUDE[ssnoversion-md](../ssnoversion-md.md)] since version 6.5. FCIs are a proven method of providing availability for the entire installation of [!INCLUDE[ssnoversion-md](../ssnoversion-md.md)], known as an instance. This means that everything inside the instance, including databases, [!INCLUDE[ssnoversion-md](../ssnoversion-md.md)] Agent jobs, linked servers, and so on, will move to another server should the underlying server encounter a problem. All FCIs require some sort of shared storage, even if it's provided via networking. The FCI's resources can only be running and owned by one node at any given time. In the following diagram, the first node of the cluster owns the FCI, which also means it owns the shared storage resources associated with it, denoted by the solid line to the storage.
 
-:::image type="content" source="media/business-continuity/failover-cluster-instance.png" alt-text="Diagram of a Failover Cluster Instance.":::
+:::image type="content" source="media/business-continuity/failover-cluster-instance.png" alt-text="Diagram of a failover cluster instance.":::
 
-After a failover, ownership changes as is seen in the diagram below.
+After a failover, ownership changes, as seen in the following diagram:
 
-:::image type="content" source="media/business-continuity/failover-cluster-instance-post-failover.png" alt-text="Diagram of a Failover Cluster Instance, post failover.":::
+:::image type="content" source="media/business-continuity/failover-cluster-instance-post-failover.png" alt-text="Diagram of a failover cluster instance, post failover.":::
 
 There's zero data loss with an FCI, but the underlying shared storage is a single point of failure since there's one copy of the data. FCIs are often combined with another availability method, such as an AG or log shipping, to have redundant copies of databases. The additional method deployed should use physically separate storage from the FCI. When the FCI fails over to another node, it stops on one node and starts on another, not unlike powering off a server and turning it on. An FCI goes through the normal recovery process, meaning any transactions that need to be rolled forward will be, and any transactions that are incomplete will be rolled back. Therefore, the database is consistent from a data point to the time of the failure or manual failover, hence no data loss. Databases are only available after recovery is complete, so recovery time will depend on many factors, and will generally be longer than failing over an AG. The tradeoff is that when you fail over an AG, there may be extra tasks required to make a database usable, such as enabling a [!INCLUDE[ssnoversion-md](../ssnoversion-md.md)] Agent job.
 
 Like an AG, FCIs abstract which node of the underlying cluster is hosting it. An FCI always retains the same name. Applications and end users never connect to the nodes; the unique name assigned to the FCI is used. An FCI can participate in an AG as one of the instances hosting either a primary or secondary replica.
 
-The list below highlights some differences with FCIs on Windows Server versus Linux:
+The following list highlights some differences with FCIs on Windows Server versus Linux:
 
 - On Windows Server, an FCI is part of the installation process. An FCI on Linux is configured after installing [!INCLUDE[ssnoversion-md](../ssnoversion-md.md)].
 - Linux only supports a single installation of [!INCLUDE[ssnoversion-md](../ssnoversion-md.md)] per host, so all FCIs will be a default instance. Windows Server supports up to 25 FCIs per WSFC.
@@ -124,7 +124,7 @@ If recovery point and recovery time objectives are more flexible, or databases a
 
 :::image type="content" source="media/business-continuity/log-shipping.png" alt-text="Diagram of Log Shipping.":::
 
-Arguably the biggest advantage of using log shipping in some capacity is that it accounts for human error. The application of transaction logs can be delayed. Therefore, if someone issues something like an UPDATE without a WHERE clause, the standby may not have the change so you could switch to that while you repair the primary system. While log shipping is easy to configure, switching from the primary to a warm standby, known as a role change, is always manual. A role change is initiated via Transact-SQL, and like an AG, all objects not captured in the transaction log must be manually synchronized. Log shipping also needs to be configured per database, whereas a single AG can contain multiple databases.
+Arguably the biggest advantage of using log shipping in some capacity is that it accounts for human error. The application of transaction logs can be delayed. Therefore, if someone issues something like an `UPDATE` without a `WHERE` clause, the standby may not have the change, so you could switch to that while you repair the primary system. While log shipping is easy to configure, switching from the primary to a warm standby, known as a role change, is always manual. A role change is initiated via Transact-SQL, and like an AG, all objects not captured in the transaction log must be manually synchronized. Log shipping also needs to be configured per database, whereas a single AG can contain multiple databases.
 
 Unlike an AG or FCI, log shipping has no abstraction for a role change, which applications must be able to handle. Techniques such as a DNS alias (CNAME) could be employed, but there are pros and cons, such as the time it takes for DNS to refresh after the switch.
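To make the human-error scenario above concrete (the database objects are hypothetical), a delayed log-restore window on the standby gives you time to react before a statement like this is replayed there:

```sql
-- Hypothetical accident on the primary: the missing WHERE clause
-- cancels every order in the table, not just one.
UPDATE dbo.Orders
SET OrderStatus = 'Cancelled';

-- What was intended:
-- UPDATE dbo.Orders SET OrderStatus = 'Cancelled' WHERE OrderID = 42;
```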

@@ -134,7 +134,7 @@ When your primary availability location experiences a catastrophic event like an
 
 ### Availability groups
 
-One of the benefits of AGs is that both high availability and disaster recovery can be configured using a single feature. Without the requirement for ensuring that shared storage is also highly available, it's much easier to have replicas that are local in one data center for high availability, and remote ones in other data centers for disaster recovery each with separate storage. Having extra copies of the database is the tradeoff for ensuring redundancy. An example of an AG that spans multiple data centers is shown below. One primary replica is responsible for keeping all secondary replicas synchronized.
+One of the benefits of AGs is that both high availability and disaster recovery can be configured using a single feature. Without the requirement for ensuring that shared storage is also highly available, it's much easier to have replicas that are local in one data center for high availability, and remote ones in other data centers for disaster recovery, each with separate storage. Having extra copies of the database is the tradeoff for ensuring redundancy. An example of an AG that spans multiple data centers is shown in the following diagram. One primary replica is responsible for keeping all secondary replicas synchronized.
 
 :::image type="content" source="media/business-continuity/availability-group-span.png" alt-text="Diagram of an availability group spanning data centers.":::

docs/includes/business-continuity/migrations.md

Lines changed: 2 additions & 2 deletions
@@ -33,7 +33,7 @@ AGs can provide minimal downtime during patching of the underlying OS by manuall
 ### Failover cluster instances
 
 FCIs on their own can't assist with a traditional migration or upgrade; an AG or log shipping would have to be configured for the databases in the FCI and all other objects accounted for. However, FCIs under Windows Server are still a popular option for when the underlying Windows Servers need to be patched. A manual failover can be initiated, which means a brief outage instead of having the instance unavailable for the entire time that Windows Server is being patched.
-An FCI can be upgraded in place to later versions of [!INCLUDE[ssnoversion-md](../ssnoversion-md.md)]. For information, see [Upgrade a SQL Server Failover Cluster Instance](../../sql-server/failover-clusters/windows/upgrade-a-sql-server-failover-cluster-instance.md).
+An FCI can be upgraded in place to later versions of [!INCLUDE[ssnoversion-md](../ssnoversion-md.md)]. For information, see [Upgrade a SQL Server failover cluster instance](../../sql-server/failover-clusters/windows/upgrade-a-sql-server-failover-cluster-instance.md).
 
 ### Log shipping

@@ -78,6 +78,6 @@ Starting with [!INCLUDE[sssql17-md](../sssql17-md.md)], it's possible to create
 
 The only major caveat is that due to no underlying cluster with a cluster type of None, configuring read-only routing is a little different. From a [!INCLUDE[ssnoversion-md](../ssnoversion-md.md)] perspective, a listener is still required to route the requests even though there's no cluster. Instead of configuring a traditional listener, the IP address or name of the primary replica is used. The primary replica is then used to route the read-only requests.
 
-A log shipping warm standby can technically be configured for readable usage by restoring the database WITH STANDBY. However, because the transaction logs require exclusive use of the database for restoration, it means that users can't be accessing the database while that happens. This makes log shipping a less than ideal solution - especially if near real-time data is required.
+A log shipping warm standby can technically be configured for readable usage by restoring the database `WITH STANDBY`. However, because the transaction logs require exclusive use of the database for restoration, users can't access the database while that happens. This makes log shipping a less than ideal solution, especially if near real-time data is required.
 
 One thing that should be noted for all read-scale scenarios with AGs is that, unlike transactional replication where all of the data is live, each secondary replica isn't in a state where unique indexes can be applied; the replica is an exact copy of the primary. If any indexes are required for reporting, or data needs to be manipulated, they must be created on the database(s) on the primary replica. If you need that flexibility, replication is a better solution for readable data.
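A hedged sketch of the `WITH STANDBY` restore described above (the database name and file paths are hypothetical); the standby undo file holds incomplete transactions so the database stays readable between restores:

```sql
-- Sketch: restore a shipped log backup on the warm standby, leaving
-- the database read-only. Users must be disconnected while it runs.
RESTORE LOG [SalesDB]
FROM DISK = N'\\fileshare\logship\SalesDB_20250821.trn'
WITH STANDBY = N'D:\standby\SalesDB_undo.dat';
```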
