docs/linux/sql-server-linux-availability-group-cluster-rhel.md
Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ sudo pcs property set stonith-enabled=false
`Start-failure-is-fatal` indicates whether a failure to start a resource on a node prevents further start attempts on that node. When set to `false`, the cluster will decide whether to try starting on the same node again based on the resource's current failure count and migration threshold. So, after failover occurs, Pacemaker will retry starting the availability group resource on the former primary once the SQL instance is available. Pacemaker will take care of demoting the replica to secondary and it will automatically rejoin the availability group.
To update the property value to `false`, run:
```bash
-pcs property set start-failure-is-fatal=false
+sudo pcs property set start-failure-is-fatal=false
```
If the property has the default value of `true` and the first attempt to start the resource fails, user intervention is required after an automatic failover to clean up the resource failure count and reset the configuration by running the `pcs resource cleanup <resourceName>` command.
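For illustration only, a minimal sketch of that manual cleanup, assuming the availability group resource is named `ag_cluster` (substitute the name used in your cluster):

```bash
# Clear the failure count for the availability group resource so Pacemaker
# attempts to start it again on this node (the resource name is an assumption).
sudo pcs resource cleanup ag_cluster
```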
docs/linux/sql-server-linux-availability-group-cluster-sles.md
Lines changed: 65 additions & 25 deletions
@@ -31,7 +31,7 @@ This guide provides instructions to create a two-node cluster for SQL Server on
For more details on cluster configuration, resource agent options, management, best practices, and recommendations, see [SUSE Linux Enterprise High Availability Extension 12 SP2](https://www.suse.com/documentation/sle-ha-12/index.html).
>[!NOTE]
->At this point, SQL Server's integration with Pacemaker on Linux is not as coupled as with WSFC on Windows. SQL Server service on Linux is not cluster aware. Pacemaker controls all of the orchestration of the cluster resources as if SQL Server were a standalone instance. Also, virtual network name is specific to WSFC, there is no equivalent of the same in Pacemaker. On Linux, Always On Availability Group Dynamic Management Views (DMVs) will return empty rows. You can still create a listener to use it for transparent reconnection after failover, but you will have to manually register the listener name in the DNS server with the IP used to create the virtual IP resource (as explained below).
+>At this point, SQL Server's integration with Pacemaker on Linux is not as tightly coupled as it is with WSFC on Windows. The SQL Server service on Linux is not cluster aware. Pacemaker controls all of the orchestration of the cluster resources, including the availability group resource. On Linux, you should not rely on Always On Availability Group Dynamic Management Views (DMVs) that provide cluster information, such as `sys.dm_hadr_cluster`. Also, the virtual network name is specific to WSFC; there is no equivalent in Pacemaker. You can still create a listener to use for transparent reconnection after failover, but you will have to manually register the listener name in the DNS server with the IP used to create the virtual IP resource (as explained below).
## Roadmap
@@ -49,38 +49,39 @@ The steps to create an availability group on Linux servers for high availability
>[!IMPORTANT]
>Production environments require a fencing agent, like STONITH for high availability. The demonstrations in this documentation do not use fencing agents. The demonstrations are for testing and validation only.
->A Linux cluster uses fencing to return the cluster to a known state. The way to configure fencing depends on the distribution and the environment. At this time, fencing is not available in some cloud environments. See [SUSE Linux Enterprise High Availability Extension](https://www.suse.com/documentation/sle-ha-12/singlehtml/book_sleha/book_sleha.html#cha.ha.fencing).
+>A Pacemaker cluster uses fencing to return the cluster to a known state. The way to configure fencing depends on the distribution and the environment. At this time, fencing is not available in some cloud environments. See [SUSE Linux Enterprise High Availability Extension](https://www.suse.com/documentation/sle-ha-12/singlehtml/book_sleha/book_sleha.html#cha.ha.fencing).
5. [Add the availability group as a resource in the cluster](sql-server-linux-availability-group-cluster-sles.md#configure-the-cluster-resources-for-sql-server).
## Prerequisites
-To complete the end-to-end scenario below you need two machines to deploy the two nodes cluster. The steps below outline how to configure these servers.
+To complete the end-to-end scenario below, you need three machines to deploy the three-node cluster. The steps below outline how to configure these servers.
## Setup and configure the operating system on each cluster node
The first step is to configure the operating system on the cluster nodes. For this walk through, use SLES 12 SP2 with a valid subscription for the HA add-on.
### Install and configure SQL Server service on each cluster node
-1. Install and setup SQL Server service on both nodes. For detailed instructions see [Install SQL Server on Linux](sql-server-linux-setup.md).
+1. Install and set up the SQL Server service on all nodes. For detailed instructions, see [Install SQL Server on Linux](sql-server-linux-setup.md).
-1. Designate one node as primary and the other as secondary, for purposes of configuration. Use these terms throughout this guide.
+1. Designate one node as primary and the other nodes as secondaries. Use these terms throughout this guide.
1. Make sure nodes that are going to be part of the cluster can communicate with each other.
-The following example shows `/etc/hosts` with additions for two nodes named SLES1and SLES2.
+The following example shows `/etc/hosts` with additions for three nodes named SLES1, SLES2, and SLES3.
```
127.0.0.1 localhost
-10.128.18.128 SLES1
+10.128.16.33 SLES1
10.128.16.77 SLES2
+10.128.16.22 SLES3
```
All cluster nodes must be able to access each other via SSH. Tools like `hb_report` or `crm_report` (for troubleshooting) and Hawk's History Explorer require passwordless SSH access between the nodes; otherwise, they can only collect data from the current node. If you use a non-standard SSH port, use the `-X` option (see the `man` page). For example, if your SSH port is 3479, invoke `crm_report` with:
```bash
-crm_report -X "-p 3479" [...]
+sudo crm_report -X "-p 3479" [...]
```
For additional information, see the [SLES Administration Guide - Miscellaneous section](http://www.suse.com/documentation/sle-ha-12/singlehtml/book_sleha/book_sleha.html#sec.ha.troubleshooting.misc).
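A minimal sketch of enabling passwordless SSH from one node to the others, assuming root access and the host names from the `/etc/hosts` example above; adjust the user, key type, and port for your environment:

```bash
# Generate an SSH key pair on the node (run once per node).
ssh-keygen -t rsa -b 4096

# Copy the public key to the other cluster nodes so tools like crm_report
# can reach them without a password (host names are assumptions taken from
# the /etc/hosts example above).
ssh-copy-id root@SLES2
ssh-copy-id root@SLES3
```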
@@ -113,7 +114,7 @@ On Linux servers configure the availability group and then configure the cluster
1. Log in as `root` to the physical or virtual machine you want to use as a cluster node.
2. Start the bootstrap script by executing:
```bash
-ha-cluster-init
+sudo ha-cluster-init
```
If NTP has not been configured to start at boot time, a message appears.
@@ -134,15 +135,15 @@ On Linux servers configure the availability group and then configure the cluster
4. For any details of the setup process, check `/var/log/sleha-bootstrap.log`. You now have a running one-node cluster. Check the cluster status with `crm status`:
```bash
-crm status
+sudo crm status
```
You can also see cluster configuration with `crm configure show xml` or `crm configure show`.
5. The bootstrap procedure creates a Linux user named `hacluster` with the password `linux`. Replace the default password with a secure one as soon as possible:
```bash
-passwd hacluster
+sudo passwd hacluster
```
## Add nodes to the existing cluster
@@ -157,7 +158,7 @@ If you have configured the existing cluster nodes with the `YaST` cluster module
2. Start the bootstrap script by executing:
```bash
-ha-cluster-join
+sudo ha-cluster-join
```
If NTP has not been configured to start at boot time, a message appears.
@@ -168,14 +169,14 @@ After logging in to the specified node, the script will copy the Corosync config
6. For details of the process, check `/var/log/ha-cluster-bootstrap.log`.
-1. Check the cluster status with `crm status`. If you have successfully added a second node, the output will be similar to the following:
+1. Check the cluster status with `sudo crm status`. If you have successfully added a second node, the output will be similar to the following:
```bash
-crm status
+sudo crm status

-2 nodes configured
+3 nodes configured
1 resource configured
-Online: [ SLES1 SLES2 ]
+Online: [ SLES1 SLES2 SLES3 ]
Full list of resources:
admin_addr (ocf::heartbeat:IPaddr2): Started SLES1
```
@@ -185,29 +186,68 @@ After logging in to the specified node, the script will copy the Corosync config
After adding all nodes, check if you need to adjust the `no-quorum-policy` in the global cluster options. This is especially important for two-node clusters. For more information, refer to Section 4.1.2, Option no-quorum-policy.
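A minimal sketch, assuming crmsh, of how to inspect and adjust the global cluster options; the right `no-quorum-policy` value depends on your cluster size and availability requirements:

```bash
# Show the current cluster configuration, including global options.
sudo crm configure show

# Example only: relax the quorum policy, which is sometimes done for
# two-node test clusters. Choose the value appropriate to your environment.
sudo crm configure property no-quorum-policy=ignore
```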
+## Set cluster property start-failure-is-fatal to false
+
+`Start-failure-is-fatal` indicates whether a failure to start a resource on a node prevents further start attempts on that node. When set to `false`, the cluster decides whether to try starting on the same node again based on the resource's current failure count and migration threshold. So, after failover occurs, Pacemaker retries starting the availability group resource on the former primary once the SQL instance is available. Pacemaker takes care of demoting the replica to secondary, and it automatically rejoins the availability group. Also, if `start-failure-is-fatal` is set to `false`, the cluster falls back to the failure-count limits configured with `migration-threshold`, so make sure the default migration threshold is updated accordingly.
+
+If the property has the default value of `true` and the first attempt to start the resource fails, user intervention is required after an automatic failover to clean up the resource failure count and reset the configuration using the `sudo crm resource cleanup <resourceName>` command.
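A minimal sketch of how these settings could be applied with crmsh (the values are illustrative assumptions; tune `migration-threshold` for your deployment):

```bash
# Allow Pacemaker to retry starting a failed resource on the same node.
sudo crm configure property start-failure-is-fatal=false

# Example resource default only: how many failures are tolerated on a node
# before the resource is moved elsewhere. Pick a value suited to your workload.
sudo crm configure rsc_defaults migration-threshold=5000
```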
+
+For more details on Pacemaker cluster properties see [Configuring Cluster Resources](https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_config_crm_resources.html).
+
+## Configure fencing (STONITH)
+Pacemaker cluster vendors require STONITH to be enabled and a fencing device configured for a supported cluster setup. When the cluster resource manager cannot determine the state of a node or of a resource on a node, fencing is used to bring the cluster to a known state again.
+
+Resource level fencing mainly ensures that there is no data corruption in case of an outage by configuring a resource. You can use resource level fencing, for instance, with DRBD (Distributed Replicated Block Device) to mark the disk on a node as outdated when the communication link goes down.
+
+Node level fencing ensures that a node does not run any resources. This is done by resetting the node, and the Pacemaker implementation of it is called STONITH (which stands for "shoot the other node in the head"). Pacemaker supports a great variety of fencing devices, for example, an uninterruptible power supply or management interface cards for servers.
+
+For more details, see [Pacemaker Clusters from Scratch](http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/ch05.html), [Fencing and Stonith](http://clusterlabs.org/doc/crm_fencing.html) and [SUSE HA documentation: Fencing and STONITH](https://www.suse.com/documentation/sle_ha/book_sleha/data/cha_ha_fencing.html).
+
+At cluster initialization time, STONITH is disabled if no configuration is detected. It can be enabled later by running the following command:
+
+```bash
+sudo crm configure property stonith-enabled=true
+```
+
+>[!IMPORTANT]
+>Disabling STONITH is just for testing purposes. If you plan to use Pacemaker in a production environment, you should plan a STONITH implementation depending on your environment and keep it enabled. Note that SUSE does not provide fencing agents for any cloud environments (including Azure) or Hyper-V. Consequently, the cluster vendor does not offer support for running production clusters in these environments. We are working on a solution for this gap that will be available in future releases.
## Configure the cluster resources for SQL Server
Refer to the [SLES Administration Guide](https://www.suse.com/documentation/sle-ha-12/singlehtml/book_sleha/book_sleha.html#cha.ha.manual_config).
### Create availability group resource
-The following command creates and configures the availability group resource for 2 replicas of availability group [ag1]. Run the command on one of the nodes in the cluster:
+The following command creates and configures the availability group resource for 3 replicas of availability group [ag1]. The monitor operations and timeouts have to be specified explicitly in SLES because timeouts are highly workload-dependent and need to be carefully adjusted for each deployment.
+
+Run the command on one of the nodes in the cluster:
1. Run `crm configure` to open the crm prompt:
```bash
-crm configure
+sudo crm configure
```
1. In the crm prompt, run the command below to configure the resource properties.
```bash
-primitive ag_cluster \
-   ocf:mssql:ag \
-   params ag_name="ag1"
-ms ms-ag_cluster ag_cluster \
-   meta notify="true"
-commit
+primitive ag_cluster \
+   ocf:mssql:ag \
+   params ag_name="ag1" \
+   op start timeout=60s \
+   op stop timeout=60s \
+   op promote timeout=60s \
+   op demote timeout=10s \
+   op monitor timeout=60s interval=10s \
+   op monitor timeout=60s interval=11s role="Master" \
+   op monitor timeout=60s interval=12s role="Slave" \
+   op notify timeout=60s
+ms ms-ag_cluster ag_cluster \
+   meta master-max="1" master-node-max="1" clone-max="3" \
+   clone-node-max="1" notify="true"
+commit
```
### Create virtual IP resource
@@ -246,7 +286,7 @@ To prevent the IP address from temporarily pointing to the node with the pre-fai
To add an ordering constraint, run the following command on one node:
```bash
-crm configure \
+sudo crm configure \
order ag_first inf: ms-ag_cluster:promote admin_addr:start
docs/linux/sql-server-linux-availability-group-cluster-ubuntu.md
Lines changed: 1 addition & 1 deletion
@@ -158,7 +158,7 @@ sudo pcs property set stonith-enabled=false
To update the property value to `false`, run:
```bash
-pcs property set start-failure-is-fatal=false
+sudo pcs property set start-failure-is-fatal=false
```
If the property has the default value of `true` and the first attempt to start the resource fails, user intervention is required after an automatic failover to clean up the resource failure count and reset the configuration by running the `pcs resource cleanup <resourceName>` command.
0 commit comments