Commit 38b2d26

Refresh low-scoring Linux include file (Pacemaker cluster concepts)
1 parent d25f51a commit 38b2d26

1 file changed

Lines changed: 20 additions & 21 deletions

@@ -1,8 +1,7 @@
11
---
22
author: rwestMSFT
33
ms.author: randolphwest
4-
ms.reviewer: randolphwest
5-
ms.date: 09/15/2022
4+
ms.date: 09/14/2023
65
ms.service: sql
76
ms.subservice: linux
87
ms.topic: include
@@ -11,30 +10,30 @@ ms.custom:
1110
---
1211
## <a id="pacemakerNotify"></a> Understand SQL Server resource agent for Pacemaker
1312

14-
[!INCLUDE [sssql17-md](../../includes/sssql17-md.md)] introduced `sequence_number` to `sys.availability_groups` to show if a replica marked as `SYNCHRONOUS_COMMIT` was up to date. `sequence_number` is a monotonically increasing BIGINT that represents how up-to-date the local availability group replica is with respect to the rest of the replicas in the availability group. Performing failovers, adding or removing replicas, and other availability group operations update this number. The number is updated on the primary, then pushed to secondary replicas. Thus a secondary replica that is up-to-date will have the same sequence_number as the primary.
13+
[!INCLUDE [sssql17-md](../../includes/sssql17-md.md)] introduced `sequence_number` to `sys.availability_groups` to show if a replica marked as `SYNCHRONOUS_COMMIT` was up to date. `sequence_number` is a monotonically increasing BIGINT that represents how up-to-date the local availability group replica is with respect to the rest of the replicas in the availability group. Performing failovers, adding or removing replicas, and other availability group operations update this number. The number is updated on the primary, then pushed to secondary replicas. Thus a secondary replica that is up-to-date has the same `sequence_number` as the primary.
1514

16-
When Pacemaker decides to promote a replica to primary, it first sends a notification to all replicas to extract the sequence number and store it (we call this the pre-promote notification). Next, when Pacemaker actually tries to promote a replica to primary, the replica only promotes itself if its sequence number is the highest of all the sequence numbers from all replicas and rejects the promote operation otherwise. In this way only the replica with the highest sequence number can be promoted to primary, ensuring no data loss.
15+
When Pacemaker decides to promote a replica to primary, it first sends a notification to all replicas to extract the sequence number and store it (this notification is called the pre-promote notification). Next, when Pacemaker tries to promote a replica to primary, the replica only promotes itself if its sequence number is the highest of all the sequence numbers from all replicas; otherwise, it rejects the promote operation. In this way, only the replica with the highest sequence number can be promoted to primary, ensuring no data loss.
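The promotion rule described above can be sketched as a small shell function. This is only an illustration of the comparison, not the resource agent's actual code: the function name `can_promote` and the sample sequence numbers are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the resource agent): succeeds only when
# the local sequence number is at least as high as every sequence number
# collected during the pre-promote notification.
can_promote() {
    local local_seq=$1
    shift
    local seq
    for seq in "$@"; do
        if (( seq > local_seq )); then
            return 1   # another replica is more up to date; reject the promotion
        fi
    done
    return 0
}

# Sample values are made up for illustration.
if can_promote 42 42 41; then
    echo "promote"
else
    echo "reject"
fi
```

The first argument is the local replica's sequence number, and the remaining arguments are the values reported by the other replicas.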
1716

18-
This is only guaranteed to work as long as at least one replica available for promotion has the same sequence number as the previous primary. To ensure this, the default behavior is for the Pacemaker resource agent to automatically set `REQUIRED_COPIES_TO_COMMIT` such that at least one synchronous commit secondary replica is up to date and available to be the target of an automatic failover. With each monitoring action, the value of `REQUIRED_COPIES_TO_COMMIT` is computed (and updated if necessary) as ('number of synchronous commit replicas' / 2). Then, at failover time, the resource agent will require (`total number of replicas` - `required_copies_to_commit` replicas) to respond to the pre-promote notification to be able to promote one of them to primary. The replica with the highest `sequence_number` will be promoted to primary.
17+
Promotion is only guaranteed to work as long as at least one replica available for promotion has the same sequence number as the previous primary. The default behavior is for the Pacemaker resource agent to automatically set `REQUIRED_COPIES_TO_COMMIT` such that at least one synchronous commit secondary replica is up to date and available to be the target of an automatic failover. With each monitoring action, the value of `REQUIRED_COPIES_TO_COMMIT` is computed (and updated if necessary) as ('number of synchronous commit replicas' / 2). Then, at failover time, the resource agent requires (`total number of replicas` - `required_copies_to_commit` replicas) to respond to the pre-promote notification to be able to promote one of them to primary. The replica with the highest `sequence_number` is promoted to primary.
1918

2019
For example, consider an availability group with three synchronous replicas: one primary replica and two synchronous commit secondary replicas.
2120

2221
- `REQUIRED_COPIES_TO_COMMIT` is 3 / 2 = 1
2322

24-
- The required number of replicas to respond to pre-promote action is 3 - 1 = 2. So 2 replicas have to be up for the failover to be triggered. This means that, in the case of primary outage, if one of the secondary replicas is unresponsive and only one of the secondaries responds to the pre-promote action, the resource agent can't guarantee that the secondary that responded has the highest sequence_number, and a failover isn't triggered.
23+
- The required number of replicas to respond to pre-promote action is 3 - 1 = 2. So two replicas have to be up for the failover to be triggered. When a primary outage occurs, if one of the secondary replicas is unresponsive and only one of the secondaries responds to the pre-promote action, the resource agent can't guarantee that the secondary that responded has the highest `sequence_number`, and a failover isn't triggered.
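The arithmetic in the example above can be reproduced with shell integer division. This is a minimal sketch: the function names are hypothetical and aren't part of the resource agent, and integer division is assumed because the example evaluates 3 / 2 to 1.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that mirror the arithmetic in the text; the function
# names are made up for illustration and aren't part of the resource agent.

# REQUIRED_COPIES_TO_COMMIT = (number of synchronous commit replicas) / 2,
# using integer division.
required_copies_to_commit() {
    local sync_replicas=$1
    echo $(( sync_replicas / 2 ))
}

# Replicas that must answer the pre-promote notification =
# (total number of replicas) - REQUIRED_COPIES_TO_COMMIT.
required_responders() {
    local total_replicas=$1
    local sync_replicas=$2
    echo $(( total_replicas - $(required_copies_to_commit "$sync_replicas") ))
}

required_copies_to_commit 3   # three synchronous replicas
required_responders 3 3       # three replicas in total, all synchronous
```

For the three-replica case, the two calls print 1 and 2, matching the values in the list.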
2524

26-
A user can choose to override the default behavior, and configure the availability group resource to not set `REQUIRED_COPIES_TO_COMMIT` automatically as above.
25+
A user can choose to override the default behavior, and configure the availability group resource to not set `REQUIRED_COPIES_TO_COMMIT` automatically as shown previously.
2726

2827
> [!IMPORTANT]
29-
> When `REQUIRED_COPIES_TO_COMMIT` is 0 there is risk of data loss. In the case of an outage of the primary, the resource agent will not automatically trigger a failover. The user has to decide if they want to wait for primary to recover or manually fail over.
28+
> When `REQUIRED_COPIES_TO_COMMIT` is `0` there's risk of data loss. In the case of an outage of the primary, the resource agent will not automatically trigger a failover. The user has to decide if they want to wait for primary to recover or manually fail over.
3029
31-
To set `REQUIRED_COPIES_TO_COMMIT` to 0, run:
30+
To set `REQUIRED_COPIES_TO_COMMIT` to `0`, run:
3231

3332
```bash
3433
sudo pcs resource update <ag_cluster> required_copies_to_commit=0
3534
```
3635

37-
The equivalent command using crm (on SLES) is:
36+
The equivalent command using **crm** (on SLES) is:
3837

3938
```bash
4039
sudo crm resource param <ag_cluster> set required_synchronized_secondaries_to_commit 0
@@ -47,37 +46,37 @@ sudo pcs resource update <ag_cluster> required_copies_to_commit=
4746
```
4847

4948
> [!NOTE]
50-
> Updating resource properties causes all replicas to stop and restart. This means primary will temporarily be demoted to secondary, then promoted again which will cause temporary write unavailability. The new value for REQUIRED_COPIES_TO_COMMIT will only be set once replicas are restarted, so it won't be instantaneous with running the pcs command.
49+
> Updating resource properties causes all replicas to stop and restart. The primary is temporarily demoted to secondary and then promoted again, which causes temporary write unavailability. The new value for `REQUIRED_COPIES_TO_COMMIT` is only set once the replicas are restarted, so it doesn't take effect immediately after you run the **pcs** command.
5150
5251
## Balance high availability and data protection
5352

54-
The above default behavior applies to the case of 2 synchronous replicas (primary + secondary) as well. Pacemaker will default `REQUIRED_COPIES_TO_COMMIT = 1` to ensure the secondary replica is always up to date for maximum data protection.
53+
The default behavior described previously also applies to the case of two synchronous replicas (primary + secondary). Pacemaker defaults `REQUIRED_COPIES_TO_COMMIT` to `1` to ensure the secondary replica is always up to date for maximum data protection.
5554

5655
> [!WARNING]
57-
> This comes with higher risk of unavailability of the primary replica due to planned or unplanned outages on the secondary. The user can choose to change the default behavior of the resource agent and override the `REQUIRED_COPIES_TO_COMMIT` to 0:
56+
> This comes with higher risk of unavailability of the primary replica due to planned or unplanned outages on the secondary. The user can choose to change the default behavior of the resource agent and override the `REQUIRED_COPIES_TO_COMMIT` to `0`:
5857
5958
```bash
6059
sudo pcs resource update <ag1> required_copies_to_commit=0
6160
```
6261

63-
Once overridden, the resource agent will use the new setting for `REQUIRED_COPIES_TO_COMMIT` and stop computing it. This means that users have to manually update it accordingly (for example, if they increase the number of replicas).
62+
Once overridden, the resource agent uses the new setting for `REQUIRED_COPIES_TO_COMMIT` and stops computing it. Users have to manually update it accordingly (for example, if they increase the number of replicas).
6463

65-
The tables below describes the outcome of an outage for primary or secondary replicas in different availability group resource configurations:
64+
The following tables describe the outcome of an outage for primary or secondary replicas in different availability group resource configurations:
6665

67-
### Availability group - 2 sync replicas
66+
### Availability group - two sync replicas
6867

6968
| Configuration | Primary outage | One secondary replica outage |
7069
| :--- | :--- | :--- |
71-
| `REQUIRED_COPIES_TO_COMMIT=0` | User has to issue a manual FAILOVER.<br />Might have data loss.<br />New primary is R/W | Primary is R/W, running exposed to data loss |
72-
| `REQUIRED_COPIES_TO_COMMIT=1` <sup>1</sup> | Cluster will automatically issue FAILOVER<br />No data loss.<br />New primary will reject all connections until former primary recovers and joins availability group as secondary. | Primary will reject all connections until secondary recovers. |
70+
| `REQUIRED_COPIES_TO_COMMIT = 0` | User has to issue a manual `FAILOVER`.<br />Might have data loss.<br />New primary is R/W | Primary is R/W, running exposed to data loss. |
71+
| `REQUIRED_COPIES_TO_COMMIT = 1` <sup>1</sup> | Cluster automatically issues `FAILOVER`.<br />No data loss.<br />New primary rejects all connections until former primary recovers and joins availability group as secondary. | Primary rejects all connections until secondary recovers. |
7372

7473
<sup>1</sup> SQL Server resource agent for Pacemaker default behavior.
7574

76-
### Availability group - 3 sync replicas
75+
### Availability group - three sync replicas
7776

7877
| Configuration | Primary outage | One secondary replica outage |
7978
| :--- | :--- | :--- |
80-
|`REQUIRED_COPIES_TO_COMMIT=0`|User has to issue a manual FAILOVER.<br />Might have data loss.<br />New primary is R/W |Primary is R/W
81-
|`REQUIRED_COPIES_TO_COMMIT=1` <sup>1</sup> |Cluster will automatically issue FAILOVER.<br />No data loss.<br />New primary is RW |Primary is R/W
79+
| `REQUIRED_COPIES_TO_COMMIT = 0` | User has to issue a manual `FAILOVER`.<br />Might have data loss.<br />New primary is R/W | Primary is R/W |
80+
| `REQUIRED_COPIES_TO_COMMIT = 1` <sup>1</sup> | Cluster automatically issues `FAILOVER`.<br />No data loss.<br />New primary is R/W | Primary is R/W |
8281

8382
<sup>1</sup> SQL Server resource agent for Pacemaker default behavior.
