
Commit 8885164

Merge pull request #14702 from MikeRayMSFT/20200424-bdc-troubleshoot
Stage AD reverse lookup zone.
2 parents 73cf5ab + f5c7bc9 commit 8885164

3 files changed

Lines changed: 221 additions & 66 deletions

docs/big-data-cluster/troubleshoot-active-directory.md

Lines changed: 74 additions & 65 deletions
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,5 @@
11
---
2-
title: Troubleshoot Active Directory integration
2+
title: Troubleshoot Active Directory domain group scope
33
titleSuffix: SQL Server Big Data Cluster
44
description: Troubleshoot deployment of a SQL Server Big Data Cluster in an Active Directory domain.
55
author: rl-msft
@@ -17,25 +17,81 @@ ms.technology: big-data-cluster
1717

1818
This article explains how to troubleshoot deployment of a SQL Server Big Data Cluster in Active Directory mode.
1919

20-
## Check deployment progress
20+
## Symptom
2121

22-
Deployment can take several minutes. If the cluster is not ready after 15 minutes, check controller logs for more details.
22+
You started deploying BDC in AD mode, but the deployment is stuck and not moving forward.
2323

24-
While the cluster is deploying, check the pods.
24+
The following example shows the deployment results in a bash shell.
2525

26-
```console
27-
kubectl get pods -n mssql-cluster
26+
```
27+
The privacy statement can be viewed at:
28+
https://go.microsoft.com/fwlink/?LinkId=853010
29+
 
30+
The license terms for SQL Server Big Data Cluster can be viewed at:
31+
Enterprise: https://go.microsoft.com/fwlink/?linkid=2104292
32+
Standard: https://go.microsoft.com/fwlink/?linkid=2104294
33+
Developer: https://go.microsoft.com/fwlink/?linkid=2104079
34+
 
35+
Cluster deployment documentation can be viewed at:
36+
https://aka.ms/bdc-deploy
37+
 
38+
NOTE: Cluster creation can take a significant amount of time depending on
39+
configuration, network speed, and the number of nodes in the cluster.
40+
 
41+
Starting cluster deployment.
42+
Cluster controller endpoint is available at bdc-control.contoso.com:30080, 193.168.5.14:30080.
43+
Waiting for control plane to be ready after 5 minutes.
44+
Waiting for control plane to be ready after 10 minutes.
45+
Waiting for control plane to be ready after 15 minutes.
46+
Waiting for control plane to be ready after 20 minutes.
47+
Waiting for control plane to be ready after 25 minutes.
2848
```
2949

30-
Verify that the list of pods returned includes:
50+
Check the currently deployed pods.
3151

32-
- `compute-`$
33-
- `data-`
34-
- `storage-`
52+
```bash
53+
kubectl get pods -n mssql-cluster
54+
```
3555

36-
If the compute, data, and storage pods are not created, check the logs to identify why.
56+
The following list shows that only pods that belong to the controller have been deployed. No compute, data, or storage pool pods are being created.
57+
58+
```
59+
NAME READY STATUS RESTARTS AGE
60+
appproxy-6q4rm 2/2 Running 0 32m
61+
compute-0-0 3/3 Running 0 32m
62+
control-n8jqh 3/3 Running 0 35m
63+
controldb-0 2/2 Running 0 35m
64+
controlwd-fgpj8 1/1 Running 0 34m
65+
data-0-0 3/3 Running 0 32m
66+
data-0-1 3/3 Running 0 32m
67+
dns-fjp7n 2/2 Running 0 34m
68+
gateway-0 2/2 Running 0 32m
69+
logsdb-0 1/1 Running 0 34m
70+
logsui-d26c5 1/1 Running 0 34m
71+
master-0 3/4 Running 0 32m
72+
master-1 3/4 Running 0 32m
73+
master-2 3/4 Running 0 32m
74+
metricsdb-0 1/1 Running 0 34m
75+
metricsdc-c2kbh 1/1 Running 0 34m
76+
metricsdc-lmqzx 1/1 Running 0 34m
77+
metricsdc-r6499 1/1 Running 0 34m
78+
metricsdc-tj99w 1/1 Running 0 34m
79+
metricsui-dg8rz 1/1 Running 0 34m
80+
mgmtproxy-dvzpc 2/2 Running 0 34m
81+
nmnode-0-0 2/2 Running 0 32m
82+
nmnode-0-1 2/2 Running 0 32m
83+
operator-27gt9 1/1 Running 0 32m
84+
sparkhead-0 4/4 Running 0 31m
85+
sparkhead-1 4/4 Running 0 31m
86+
storage-0-0 4/4 Running 0 31m
87+
storage-0-1 4/4 Running 0 31m
88+
storage-0-2 4/4 Running 0 31m
89+
zookeeper-0 2/2 Running 0 32m
90+
zookeeper-1 2/2 Running 0 32m
91+
zookeeper-2 2/2 Running 0 32m
92+
```
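
The READY column in a listing like the one above can be scanned mechanically. The following is a minimal sketch and not part of the original article: `pods_not_ready` is a hypothetical helper name, and it assumes the default `kubectl get pods` column layout (pod name first, READY second).

```bash
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# every pod whose READY column shows fewer ready containers than expected
# (for example, 3/4). The header row is skipped because its second column
# does not look like "n/m".
pods_not_ready() {
  awk '$2 ~ /^[0-9]+\/[0-9]+$/ {
    split($2, r, "/")
    if (r[1] + 0 < r[2] + 0) print $1, $2
  }'
}

# Usage (assumes kubectl is configured for the cluster):
#   kubectl get pods -n mssql-cluster | pods_not_ready
```

Applied to the listing above, this would print the three `master-` pods, each at `3/4`.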
3793

38-
## Check logs
94+
### Check logs
3995

4096
To identify why deployment quit without creating compute, data, or storage pods, check the following logs:
4197

@@ -60,9 +116,11 @@ To identify why deployment quit without creating compute, data, or storage pods,
60116
WARNING | Retrying.
61117
```
62118

63-
In the example above, the deployment fails to create a login for the domain user because the domain group is scoped as domain local. Use domain global or domain universal scoped groups. [Deploy [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)] in Active Directory mode](deploy-active-directory.md) explains AD group scope requirements.
119+
## Cause
64120

65-
## Check the scope of domain groups.
121+
In the example above, the deployment fails to create a login for the domain user because the domain group is scoped as domain local. Use domain global or domain universal scoped groups. [Deploy [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)] in Active Directory mode](deploy-active-directory.md) explains AD group scope requirements.
122+
123+
## Resolution
66124

67125
Check the scope of the domain group (<`domain-group`>). Use [get-adgroup](/powershell/module/addsadministration/get-adgroup/).
68126

@@ -112,56 +170,7 @@ catch {
112170
$ClusterUsersGroupScope_Result
113171
```
114172

115-
## Check security-support container
116-
117-
Review the security-support container logs.
118-
119-
The following command collects the security-support logs in a cluster at namespace `mssql-cluster`.
120-
121-
```console
122-
azdata bdc debug copy-logs -n mssql-cluster -c security-support
123-
```
124-
125-
Extract the logs and locate `\mssql-cluster\control-<identifier>\controller\control-rts5t-controller-stdout.log`.
173+
## Resolution
126174

127-
Look for the following entries in the log:
128-
129-
```
130-
ERROR | Failed to create AD user account 'cntrl-controller'. Error code: 53. Message: Failed to create user object: Failed to add object 'CN=cntrl-controller,OU=bdc, DC=CONTOSO, DC=com' to ' <domain>.<top-level-domain> ': Server is unwilling to perform.
131-
ERROR | Failed to create AD user account 'ldap-user'. Error code: 53. Message: Failed to create user object: Failed to add object 'CN=ldap-user,OU=bdc, DC=CONTOSO, DC=com' to ' <domain>.<top-level-domain> ': Server is unwilling to perform.
132-
ERROR | Failed to create AD user account 'nginx-mgmtproxy'. Error code: 53. Message: Failed to create user object: Failed to add object 'CN=nginx-mgmtproxy,OU=bdc, DC=CONTOSO, DC=com' to ' <domain>.<top-level-domain> ': Server is unwilling to perform.
133-
```
134-
135-
These entries can happen when the domain controller DNS server is missing reverse DNS entry (PTR record).
136-
137-
## Verify reverse lookup (PTR record)
138-
139-
Run the following PowerShell script to confirm if you have reverse DNS entry (PTR record) configured.
140-
141-
```powershell
142-
#Domain Controller FQDN 'DCserver01.contoso.local'
143-
$Domain_controller_FQDN = 'DCserver01.contoso.local'
144-
145-
#Performing Domain Controller DNS record, reverse PTR Checks...
146-
$DcControllerDnsPtr_Result = New-Object System.Collections.ArrayList
147-
try {
148-
    $Domain_controller_DNS_Record = Resolve-DnsName $Domain_controller_FQDN -Type A -Server $Domain_DNS_IP_address -ErrorAction Stop
149-
    foreach ($ip in $Domain_controller_DNS_Record.IPAddress) {
150-
        #resolving hostname by IP address to make sure we have reverse PTR record 
151-
        if ((Resolve-DnsName $ip).NameHost -eq $Domain_controller_FQDN) {
152-
            [void]$DcControllerDnsPtr_Result.add("OK - $Domain_controller_FQDN has an A record with an IP $ip, Reverse PTR record is in place") 
153-
        }
154-
        else {
155-
            [void]$DcControllerDnsPtr_Result.add("Missing - $Domain_controller_FQDN has an A record with an IP $ip, But no reverse PTR record was found for the host")
156-
        }
157-
    }
158-
}
159-
catch {
160-
    [void]$DcControllerDnsPtr_Result.add("Error - " + $_.exception.message)
161-
}
162-
163-
#show the results
164-
$DcControllerDnsPtr_Result
165-
```
175+
To resolve the problem, create the AD groups with either universal or global scope and run the deployment again.
166176

167-
[Verify reverse DNS entry (PTR record) for domain controller](deploy-active-directory.md#verify-reverse-dns-entry-for-domain-controller).
docs/big-data-cluster/troubleshoot-ad-reverse-lookup-zone.md

Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
1+
---
2+
title: AD mode deployment stopped - missing reverse lookup zone entry for DC
3+
titleSuffix: SQL Server Big Data Cluster
4+
description: Deployment of BDC with AD mode stuck due to missing reverse lookup zone entry for the domain controller in the domain controller DNS server.
5+
author: MikeRayMSFT
6+
ms.author: mikeray
7+
ms.reviewer: mikeray
8+
ms.date: 04/21/2020
9+
ms.topic: how-to
10+
ms.prod: sql
11+
ms.technology: big-data-cluster
12+
---
13+
14+
# AD mode deployment stopped - missing reverse lookup zone entry for DC
15+
16+
Deployment in Active Directory (AD) mode freezes. Check the symptoms to see whether the cause is a missing reverse lookup zone entry on the domain controller DNS server.
17+
18+
## Symptom
19+
20+
You started deploying BDC in AD mode, but the deployment is stuck and not moving forward.
21+
22+
The following example shows the deployment results in a bash shell.
23+
24+
```
25+
The privacy statement can be viewed at:
26+
https://go.microsoft.com/fwlink/?LinkId=853010
27+
 
28+
The license terms for SQL Server Big Data Cluster can be viewed at:
29+
Enterprise: https://go.microsoft.com/fwlink/?linkid=2104292
30+
Standard: https://go.microsoft.com/fwlink/?linkid=2104294
31+
Developer: https://go.microsoft.com/fwlink/?linkid=2104079
32+
 
33+
Cluster deployment documentation can be viewed at:
34+
https://aka.ms/bdc-deploy
35+
 
36+
NOTE: Cluster creation can take a significant amount of time depending on
37+
configuration, network speed, and the number of nodes in the cluster.
38+
 
39+
Starting cluster deployment.
40+
Cluster controller endpoint is available at bdc-control.contoso.com:30080, 193.168.5.14:30080.
41+
Waiting for control plane to be ready after 5 minutes.
42+
Waiting for control plane to be ready after 10 minutes.
43+
Waiting for control plane to be ready after 15 minutes.
44+
Waiting for control plane to be ready after 20 minutes.
45+
Waiting for control plane to be ready after 25 minutes.
46+
```
47+
48+
Check the currently deployed pods.
49+
50+
```bash
51+
kubectl get pods -n mssql-cluster
52+
```
53+
54+
The results below indicate that only pods belonging to the controller have been deployed. The pods for compute, data, or storage are not being created.
55+
56+
```
57+
NAME READY STATUS RESTARTS AGE
58+
control-rts5t 3/3 Running 0 18m
59+
controldb-0 2/2 Running 0 18m
60+
controlwd-csgst 1/1 Running 0 16m
61+
dns-7kfnz 2/2 Running 0 16m
62+
logsdb-0 1/1 Running 0 16m
63+
logsui-2pc29 1/1 Running 0 16m
64+
metricsdb-0 1/1 Running 0 16m
65+
metricsdc-4rtm4 1/1 Running 0 16m
66+
metricsdc-6lr2t 1/1 Running 0 16m
67+
metricsdc-ftx9m 1/1 Running 0 16m
68+
metricsdc-h59jb 1/1 Running 0 16m
69+
metricsui-lvdpt 1/1 Running 0 16m
70+
mgmtproxy-mkmxp 2/2 Running 0 16m
71+
```
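
The check for missing pool pods can also be scripted. The sketch below is illustrative only: `missing_pool_pods` is a hypothetical name, and the prefixes `compute-`, `data-`, and `storage-` are taken from the pod names shown elsewhere in this change.

```bash
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# each pool prefix for which no pod exists. The body runs in a subshell so
# the helper does not leak variables into the caller.
missing_pool_pods() (
  pods="$(cat)"
  for prefix in compute- data- storage-; do
    # awk exits 0 when at least one pod name starts with the prefix.
    printf '%s\n' "$pods" |
      awk -v p="$prefix" 'index($1, p) == 1 { found = 1 } END { exit !found }' ||
      printf '%s\n' "$prefix"
  done
)

# Usage (assumes kubectl is configured for the cluster):
#   kubectl get pods -n mssql-cluster | missing_pool_pods
```

For the controller-only listing above, all three prefixes would be reported as missing. Matching on the start of the name avoids false hits from pods such as `metricsdb-0`.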
72+
73+
Inspect the security support container logs. Look for LDAP errors.
74+
75+
## Check security-support container
76+
77+
Review the security-support container logs.
78+
79+
The following command collects the security-support logs in a cluster at namespace `mssql-cluster`.
80+
81+
```bash
82+
azdata bdc debug copy-logs -n mssql-cluster -c security-support
83+
```
84+
85+
Extract the logs and locate `\mssql-cluster\control-<identifier>\controller\control-rts5t-controller-stdout.log`.
86+
87+
> [!TIP]
88+
> There are multiple ways to collect the logs. Instead of copying the logs with `azdata`, you can use a notebook in Azure Data Studio.
89+
> In Azure Data Studio, connect to the Kubernetes cluster, and run an appropriate troubleshooting notebook. The following are examples of notebooks.
90+
>
91+
> - TSG027 - Observe cluster deployment
92+
> - TSG061 - Get tail of all container logs for pods in BDC namespace
93+
> - TSG001 - Run `azdata` copy-logs
94+
>
95+
96+
## Inspect the logs
97+
98+
Locate the log. The following example points to a controller deployment log.
99+
100+
`<folderOfDebugCopyLog>\debuglogs-mssql-cluster-YYYYMMDD-HHMMSS\<namespace>\control-<identifier>\controller\control-<identifier>-controller-stdout.log`
101+
102+
```
103+
YYYY-MM-DD HH:MM:SS.ms | ERROR | Failed to create AD user account 'cntrl-controller'. Error code: 53. Message: Failed to create user object: Failed to add object 'CN=cntrl-controller,OU=bdc, DC=CONTOSO, DC=com' to 'CONTOSO.COM': Server is unwilling to perform.
104+
YYYY-MM-DD HH:MM:SS.ms | ERROR | Failed to create AD user account 'ldap-user'. Error code: 53. Message: Failed to create user object: Failed to add object 'CN=ldap-user,OU=bdc, DC=CONTOSO, DC=com' to 'CONTOSO.COM': Server is unwilling to perform.
105+
YYYY-MM-DD HH:MM:SS.ms | ERROR | Failed to create AD user account 'nginx-mgmtproxy'. Error code: 53. Message: Failed to create user object: Failed to add object 'CN=nginx-mgmtproxy,OU=bdc, DC=CONTOSO, DC=com' to 'CONTOSO.COM': Server is unwilling to perform.
106+
```
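
Once the log is extracted, the failing account names can be pulled out with standard text tools. This is a minimal sketch, assuming the message format shown above; `ad_account_failures` is a hypothetical helper name, and the log path is whatever location you extracted the file to.

```bash
# Hypothetical helper: print the unique AD account names that the
# controller failed to create, based on the error text shown above.
# $1 is the path to the extracted controller stdout log.
ad_account_failures() {
  grep -F "Failed to create AD user account" "$1" |
    sed -n "s/.*Failed to create AD user account '\([^']*\)'.*/\1/p" |
    sort -u
}

# Usage (the path is a placeholder):
#   ad_account_failures /path/to/control-<identifier>-controller-stdout.log
```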
107+
108+
## Cause
109+
110+
The domain controller DNS server is missing the reverse lookup zone entry (PTR record) for the domain controller.
111+
112+
## Resolution
113+
114+
Run the following PowerShell script to confirm whether a reverse DNS entry (PTR record) is configured.
115+
116+
```powershell
117+
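#Domain Controller FQDN 'DCserver01.contoso.local'. $Domain_DNS_IP_address is not set in this script; set it to the domain DNS server IP before running.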
118+
$Domain_controller_FQDN = 'DCserver01.contoso.local'
119+
120+
#Performing Domain Controller DNS record, reverse PTR Checks...
121+
$DcControllerDnsPtr_Result = New-Object System.Collections.ArrayList
122+
try {
123+
    $Domain_controller_DNS_Record = Resolve-DnsName $Domain_controller_FQDN -Type A -Server $Domain_DNS_IP_address -ErrorAction Stop
124+
    foreach ($ip in $Domain_controller_DNS_Record.IPAddress) {
125+
        #resolving hostname by IP address to make sure we have reverse PTR record 
126+
        if ((Resolve-DnsName $ip).NameHost -eq $Domain_controller_FQDN) {
127+
            [void]$DcControllerDnsPtr_Result.add("OK - $Domain_controller_FQDN has an A record with an IP $ip, Reverse PTR record is in place") 
128+
        }
129+
        else {
130+
            [void]$DcControllerDnsPtr_Result.add("Missing - $Domain_controller_FQDN has an A record with an IP $ip, But no reverse PTR record was found for the host")
131+
        }
132+
    }
133+
}
134+
catch {
135+
    [void]$DcControllerDnsPtr_Result.add("Error - " + $_.exception.message)
136+
}
137+
138+
#show the results
139+
$DcControllerDnsPtr_Result
140+
```
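
The same reverse lookup can be checked from a bash shell. The sketch below is an illustration rather than the article's method: `ptr_name` is a hypothetical helper that builds the `in-addr.arpa` name under which an IPv4 PTR record is stored, and `192.0.2.10` is a placeholder documentation address, not a value from this cluster.

```bash
# Hypothetical helper: print the reverse-lookup (in-addr.arpa) name for an
# IPv4 address by reversing its four octets.
ptr_name() {
  printf '%s\n' "$1" |
    awk -F. '{ printf "%s.%s.%s.%s.in-addr.arpa\n", $4, $3, $2, $1 }'
}

# Usage (assumes `dig` is installed; replace the placeholders):
#   ptr_name 192.0.2.10
#   dig +short -x 192.0.2.10 @<dns-server-ip>
```

If the PTR record exists, the `dig` query typically prints the domain controller FQDN. An empty result suggests the reverse lookup zone entry is missing.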
141+
142+
## Next steps
143+
144+
[Verify reverse DNS entry (PTR record) for domain controller](deploy-active-directory.md#verify-reverse-dns-entry-for-domain-controller).

docs/toc.yml

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -221,8 +221,10 @@
221221
items:
222222
- name: Troubleshoot Kubernetes
223223
href: big-data-cluster/cluster-troubleshooting-commands.md
224-
- name: Active directory integration
224+
- name: Active directory domain group scope
225225
href: big-data-cluster/troubleshoot-active-directory.md
226+
- name: Active directory DNS
227+
href: big-data-cluster/troubleshoot-ad-reverse-lookup-zone.md
226228
- name: HDFS admin rights
227229
href: big-data-cluster/troubleshoot-hdfs-restore-admin.md
228230
- name: Reference
