Commit 45183b1

Merge branch 'release-sqlseattle' of https://github.com/MicrosoftDocs/sql-docs-pr into heidist-ctp2-toc
2 parents 00aefd7 + 087ece1 commit 45183b1

---
title: How to deploy SQL Server Big Data cluster on Kubernetes | Microsoft Docs
description:
author: rothja
ms.author: jroth
manager: craigg
ms.date: 09/07/2018
ms.topic: conceptual
ms.prod: sql
---

# How to deploy SQL Server Big Data cluster on Kubernetes

SQL Server Big Data cluster can be deployed as Docker containers on a Kubernetes cluster. This article provides an overview of the setup and configuration steps:

- Set up a Kubernetes cluster on a single VM, a cluster of VMs, or in Azure Kubernetes Service (AKS)
- Install the cluster configuration tool `mssqlctl` on your client machine
- Deploy SQL Server Big Data cluster in a Kubernetes cluster

## Kubernetes prerequisites

SQL Server Big Data cluster requires Kubernetes version 1.10 or later, for both server and client (kubectl). To install a specific version of the kubectl client, see [Install kubectl binary via curl](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl). The latest versions of Minikube and AKS are at least 1.10. For AKS, use the `--kubernetes-version` parameter to specify a version different from the default.

Also, note that the supported client/server Kubernetes version skew is +/-1 minor version. The Kubernetes documentation states that "a client should be skewed no more than one minor version from the master, but may lead the master by up to one minor version. For example, a v1.3 master should work with v1.1, v1.2, and v1.3 nodes, and should work with v1.2, v1.3, and v1.4 clients." For more information, see [Kubernetes supported releases and component skew](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/release/versioning.md#supported-releases-and-component-skew).
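The skew rule above can be illustrated with a small sketch that extracts the minor version from two version strings and checks whether the difference is within one (the version values here are hypothetical; in practice you would read them from `kubectl version`):

```shell
# Hypothetical client/server versions; in practice, read them from `kubectl version`.
client="v1.10.4"
server="v1.11.2"

# Extract the minor version (the middle dotted component).
client_minor=$(echo "$client" | cut -d. -f2)
server_minor=$(echo "$server" | cut -d. -f2)

# The supported client/server skew is +/-1 minor version.
skew=$((client_minor - server_minor))
if [ "$skew" -ge -1 ] && [ "$skew" -le 1 ]; then
  echo "supported skew"
else
  echo "unsupported skew"
fi
```

Here a v1.10 client against a v1.11 server falls within the supported range.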

## <a id="kubernetes"></a> Kubernetes cluster setup

If you already have a Kubernetes cluster that meets the above prerequisites, then you can skip directly to the [deployment step](#deploy). This section assumes a basic understanding of Kubernetes concepts. For detailed information on Kubernetes, see the [Kubernetes documentation](https://kubernetes.io/docs/home).

You can choose to deploy Kubernetes in any of three ways:

| Deploy Kubernetes on: | Description |
|---|---|
| **Minikube** | A single-node Kubernetes cluster in a VM. |
| **Azure Kubernetes Service (AKS)** | A managed Kubernetes container service in Azure. |
| **Multiple VMs** | A Kubernetes cluster deployed on your own VMs using `kubeadm`. |

For guidance on configuring one of these Kubernetes cluster options for SQL Server Big Data cluster, see one of the following articles:

- [Configure Minikube](deploy-on-minikube.md)
- [Configure Kubernetes on Azure Kubernetes Service](deploy-on-aks.md)
- [Configure Kubernetes on multiple VMs](deploy-on-vms.md)

## <a id="deploy"></a> Deploy SQL Server Big Data cluster

After you have configured your Kubernetes cluster, you can proceed with the deployment of SQL Server Big Data cluster. To deploy a big data cluster with all default configurations for a dev/test environment, follow the instructions in this article:

[Quickstart: Deploy SQL Server Big Data cluster on Kubernetes](quickstart-big-data-cluster-deploy.md)

If you want to customize your big data cluster configuration according to your workload needs, follow the remaining instructions in this article.

## Verify Kubernetes configuration

Run the following `kubectl` command to view the cluster configuration. Ensure that `kubectl` is pointed to the correct cluster context.

```bash
kubectl config view
```

## Install mssqlctl CLI management tool for SQL Server Big Data cluster

`mssqlctl` is a command-line utility written in Python that enables cluster administrators to bootstrap and manage the big data cluster via REST APIs. The minimum Python version required is 3.5. You must also have `pip`, which is used to download and install the `mssqlctl` tool.

On a Windows client, you can download the necessary Python package from [https://www.python.org/downloads/](https://www.python.org/downloads/). For Python 3.5.3 and later, pip3 is also installed when you install Python. If you did not select the option to add Python to your PATH during installation, locate pip3 and add it to your PATH manually.

On Linux (for example, WSL or an Ubuntu client), these commands install the latest Python 3 version and pip:

```bash
sudo apt-get update
sudo apt-get install python3
sudo apt-get install python3-pip
sudo pip3 install --upgrade pip
```

## Download and install mssqlctl tool

Run the following command to install `mssqlctl`:

TBD Fix the right path and name

```bash
pip install mssqlctl-1.0.0-py3-none-any.whl
```

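As a quick sanity check of the minimum Python requirement, the following sketch compares a dotted version string against the 3.5 floor (the `ver` value shown is a placeholder; in practice you would take it from `python3 --version`):

```shell
# Placeholder version string; in practice:
#   ver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
ver="3.6"

major=$(echo "$ver" | cut -d. -f1)
minor=$(echo "$ver" | cut -d. -f2)

# mssqlctl requires Python 3.5 or later.
if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 5 ]; }; then
  echo "Python version OK"
else
  echo "Python version too old"
fi
```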
## Define environment variables

The cluster configuration can be customized using a set of environment variables that are passed to the `mssqlctl create cluster` command. Most of the environment variables are optional, with the default values listed below. Note that some environment variables, such as credentials, require user input.

| Environment variable | Required | Default value | Description |
|---|---|---|---|
| **ACCEPT_EULA** | Yes | N/A | Accept the SQL Server license agreement (for example, 'Y'). |
| **CLUSTER_NAME** | Yes | N/A | The name of the Kubernetes namespace to deploy SQL Server Big Data cluster into. |
| **CLUSTER_PLATFORM** | Yes | N/A | The platform the Kubernetes cluster is deployed on. Can be `aks` or `minikube`. |
| **CLUSTER_COMPUTE_POOL_REPLICAS** | No | 1 | The number of compute pool replicas to build out. In CTP 2.0, the only value allowed is 1. |
| **CLUSTER_DATA_POOL_REPLICAS** | No | 2 | The number of data pool replicas to build out. |
| **CLUSTER_STORAGE_POOL_REPLICAS** | No | 2 | The number of storage pool replicas to build out. |
| **DOCKER_REGISTRY** | Yes | TBD | The private registry where the images used to deploy the cluster are stored. See this <TBD add link> for a complete list of images. |
| **DOCKER_REPOSITORY** | Yes | TBD | The private repository within the above registry where images are stored. |
| **DOCKER_USERNAME** | Yes | N/A | The username to access the container images in case they are stored in a private repository. It is required for the duration of the gated public preview. |
| **DOCKER_PASSWORD** | Yes | N/A | The password to access the above private repository. It is required for the duration of the gated public preview. |
| **DOCKER_EMAIL** | Yes | N/A | The email associated with the above private repository. It is required for the duration of the gated public preview. |
| **DOCKER_IMAGE_TAG** | No | latest | The label used to tag the images. |
| **DOCKER_IMAGE_POLICY** | No | Always | Always force a pull of the images. |
| **DOCKER_PRIVATE_REGISTRY** | Yes | 1 | For the timeframe of the gated public preview, this value has to be set to 1. |
| **CONTROLLER_USERNAME** | Yes | N/A | The username for the cluster administrator. |
| **CONTROLLER_PASSWORD** | Yes | N/A | The password for the cluster administrator. |
| **KNOX_USERNAME** | Yes | N/A | The username for the Knox user. |
| **KNOX_PASSWORD** | Yes | N/A | The password for the Knox user. |
| **MSSQL_SA_PASSWORD** | Yes | N/A | The password of the SA user for the SQL Server master instance. |
| **USE_PERSISTENT_VOLUME** | No | true | `true` to use Kubernetes Persistent Volume Claims for pod storage. `false` to use ephemeral host storage for pod storage. |
| **STORAGE_CLASS_NAME** | No | default | If `USE_PERSISTENT_VOLUME` is `true`, this indicates the name of the Kubernetes Storage Class to use. |
| **MASTER_SQL_PORT** | No | 31433 | The TCP/IP port on which the master SQL Server instance listens on the public network. |
| **KNOX_PORT** | No | 30443 | The TCP/IP port on which Apache Knox listens on the public network. |
| **GRAFANA_PORT** | No | 30888 | The TCP/IP port on which the Grafana monitoring application listens on the public network. |
| **KIBANA_PORT** | No | 30999 | The TCP/IP port on which the Kibana log search application listens on the public network. |

Setting the environment variables required for deploying a big data cluster differs depending on whether you are using a Windows or Linux client. Choose the steps below depending on which operating system you are using.

> [!IMPORTANT]
> Make sure you wrap passwords in double quotes if they contain any special characters.
>
> You can set the MSSQL_SA_PASSWORD to whatever you like, but make sure it is sufficiently complex and does not use the `!`, `&`, or `'` characters.

Initialize the following environment variables; they are required for deploying the cluster:

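The special-character restriction in the note above can be checked mechanically. This is a minimal sketch with an example password (substitute your own candidate value):

```shell
# Example candidate password; the !, &, and ' characters are disallowed.
candidate='Str0ngPassw0rd'

case "$candidate" in
  *'!'* | *'&'* | *"'"*)
    echo "password contains a disallowed character" ;;
  *)
    echo "password ok" ;;
esac
```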
### Windows

Using a CMD window (not PowerShell), configure the following environment variables:

```cmd
SET ACCEPT_EULA=Y
SET CLUSTER_PLATFORM=<minikube or aks>
SET CLUSTER_NAME=<your SQL Server Big Data cluster name>

SET CONTROLLER_USERNAME=<controller_admin_name – can be anything>
SET CONTROLLER_PASSWORD=<controller_admin_password – can be anything, password complexity compliant>
SET KNOX_USERNAME=<knox_username – can be anything>
SET KNOX_PASSWORD=<knox_password – can be anything, password complexity compliant>
SET MSSQL_SA_PASSWORD=<sa_password_of_master_sql_instances>

SET DOCKER_REGISTRY=private-repo.microsoft.com
SET DOCKER_REPOSITORY=mssql-private-preview
SET DOCKER_USERNAME=<your username>
SET DOCKER_PASSWORD=<your password>
SET DOCKER_PRIVATE_REGISTRY="1"
```

### Linux

Initialize the following environment variables:

```bash
export ACCEPT_EULA=Y
export CLUSTER_PLATFORM=<minikube or aks>
export CLUSTER_NAME=<your SQL Server Big Data cluster name>
export CLUSTER_NODE_REPLICAS=<number_of_nodes_excluding_master>

export CONTROLLER_USERNAME=<controller_admin_name – can be anything>
export CONTROLLER_PASSWORD=<controller_admin_password – can be anything, password complexity compliant>
export KNOX_USERNAME=<knox_username – can be anything>
export KNOX_PASSWORD=<knox_password – can be anything, password complexity compliant>
export MSSQL_SA_PASSWORD=<sa_password_of_master_sql_instances>

export DOCKER_REGISTRY=private-repo.microsoft.com
export DOCKER_REPOSITORY=mssql-private-preview
export DOCKER_USERNAME=<your username>
export DOCKER_PASSWORD=<your password>
export DOCKER_PRIVATE_REGISTRY="1"
```

> [!NOTE]
> For an on-premises cluster built with kubeadm, when `USE_PERSISTENT_VOLUME=true`, you must pre-provision a Kubernetes storage class and pass its name through the `STORAGE_CLASS_NAME` environment variable.

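Before running `mssqlctl create cluster`, a sketch like the following can confirm that required variables are present in the environment. The variable list mirrors the table above, and the exported values here are placeholders so the check has something to verify:

```shell
# Placeholder values; in a real deployment these come from the steps above.
export ACCEPT_EULA=Y
export CLUSTER_NAME=sqlbigdata1
export MSSQL_SA_PASSWORD="placeholder"

missing=0
for var in ACCEPT_EULA CLUSTER_NAME MSSQL_SA_PASSWORD; do
  if [ -z "$(printenv "$var")" ]; then
    echo "missing required variable: $var"
    missing=1
  fi
done
[ "$missing" -eq 0 ] && echo "all required variables set"
```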
## Deploy SQL Server Big Data cluster

The create cluster API is used to initialize the Kubernetes namespace and deploy all the application pods into the namespace. To deploy SQL Server Big Data cluster on your Kubernetes cluster, run the following command:

```bash
mssqlctl create cluster <name of your cluster>
```

> [!NOTE]
> The name of your cluster must contain only lowercase alphanumeric characters, with no spaces. All Kubernetes artifacts (containers, pods, stateful sets, services) for the cluster are created in a namespace with the same name as the cluster name specified.

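Because a name that violates the naming rule causes the deployment to fail, it can help to validate the name first. A minimal sketch, with an example name:

```shell
# Example cluster name; substitute your own.
CLUSTER_NAME="sqlbigdata1"

# The name may contain only lowercase letters and digits, with no spaces.
if printf '%s' "$CLUSTER_NAME" | grep -Eq '^[a-z0-9]+$'; then
  echo "valid cluster name"
else
  echo "invalid cluster name"
fi
```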
The client uses the Get Cluster and Get Logs APIs to check the status of the create cluster operation. The command window will output the deployment status. You can also check the deployment status by running these commands in a different command window:

```bash
kubectl get all -n <name of your cluster>
kubectl get pods -n <name of your cluster>
kubectl get svc -n <name of your cluster>
```

You can see a more granular status and configuration for each pod by running:

```bash
kubectl describe pod <pod name> -n <name of your cluster>
```

Once the controller pod is running, you can use the Cluster Administration Portal to monitor the deployment. The portal is launched automatically.

## <a id="masterip"></a> Get the master instance IP address

After the deployment script has completed successfully, you can obtain the IP address of the SQL Server master instance using the steps outlined below. You will use this IP address and port number 31433 to connect to the SQL Server master instance (for example: **\<ip-address\>,31433**). Similarly, you can obtain the IP address of the Knox gateway endpoint. All cluster endpoints are also outlined on the Service Endpoints tab in the Cluster Administration Portal.

### AKS

If you are using AKS, Azure provides the Azure LoadBalancer service. Run the following commands:

```bash
kubectl get svc service-master-lb -n <name of your cluster>
kubectl get svc service-security-lb -n <name of your cluster>
```

Look for the **External-IP** value assigned to each service. Then, connect to the SQL Server master instance using the IP address of `service-master-lb` at port 31433 (for example: **\<ip-address\>,31433**), and to the Knox/HDFS gateway endpoint using the external IP of the `service-security-lb` service.

### Minikube

If you are using Minikube, run the following command to get the IP address to connect to. In addition to the IP address, specify the port for the endpoint you need to connect to.

```bash
minikube ip
```
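Whichever platform you use, the IP address you obtain is combined with the endpoint port into the SQL Server `host,port` address form mentioned above. A small sketch, with a hypothetical IP address:

```shell
# Hypothetical IP address; in practice, take it from the commands above.
MASTER_IP="203.0.113.10"
MASTER_SQL_PORT=31433

# SQL Server tools accept the "host,port" address form.
CONNECT_TO="${MASTER_IP},${MASTER_SQL_PORT}"
echo "$CONNECT_TO"
```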

## Next steps

After successfully deploying SQL Server Big Data cluster to Kubernetes, [install the big data tools](deploy-big-data-tools.md) and learn more in the [getting started quickstart](quickstart-big-data-cluster-get-started.md).

docs/relational-databases/polybase/polybase-type-mapping.md

[!INCLUDE[appliesto-ss-xxxx-asdw-pdw-md-winonly](../../includes/appliesto-ss-xxxx-xxxx-xxx-md-winonly.md)]

This article describes the mapping between PolyBase external data sources and SQL Server. You can use this information to correctly define external tables with the [CREATE EXTERNAL TABLE](../../t-sql/statements/create-external-table-transact-sql.md) Transact-SQL command.

## Overview

When creating an external table with PolyBase, the column definitions, including the data types and number of columns, must match the data in the external files. If there is a mismatch, the file rows are rejected when querying the actual data.

For external tables that reference files in external data sources, the column and type definitions must map to the exact schema of the external file. When defining data types that reference data stored in Hadoop/Hive, use the following mappings between SQL and Hive data types, and cast the type into a SQL data type when selecting from it. The types include all versions of Hive unless stated otherwise.

> [!NOTE]
> SQL Server does not support the Hive *infinity* data value in any conversion. PolyBase will fail with a data type conversion error.

## Type mapping reference

| SQL Data Type | .NET Data Type | Hive Data Type | Hadoop/Java Data Type | Comments |
| ------------- | ------------------------- | -------------- | --------------------- | ------------------------------ |
| tinyint | Byte | tinyint | ByteWritable | For unsigned numbers only. |
| smallint | Int16 | smallint | ShortWritable | |
| int | Int32 | int | IntWritable | |
| bigint | Int64 | bigint | LongWritable | |
| bit | Boolean | boolean | BooleanWritable | |
| float | Double | double | DoubleWritable | |
| real | Single | float | FloatWritable | |
| money | Decimal | double | DoubleWritable | |
| smallmoney | Decimal | double | DoubleWritable | |
| nchar | String<br /><br /> Char[] | string | Text | |
| nvarchar | String<br /><br /> Char[] | string | Text | |
| char | String<br /><br /> Char[] | string | Text | |
| varchar | String<br /><br /> Char[] | string | Text | |
| binary | Byte[] | binary | BytesWritable | Applies to Hive 0.8 and later. |
| varbinary | Byte[] | binary | BytesWritable | Applies to Hive 0.8 and later. |
| date | DateTime | timestamp | TimestampWritable | |
| smalldatetime | DateTime | timestamp | TimestampWritable | |
| datetime2 | DateTime | timestamp | TimestampWritable | |
| datetime | DateTime | timestamp | TimestampWritable | |
| time | TimeSpan | timestamp | TimestampWritable | |
| decimal | Decimal | decimal | BigDecimalWritable | Applies to Hive 0.11 and later. |
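As a hedged illustration of how these mappings shape an external table definition (the table, data source, file format, and path names below are hypothetical, not from this article), a Hive file with `int`, `string`, and `timestamp` columns might be declared as:

```sql
-- Hypothetical names throughout; the external data source and file format
-- objects are assumed to have been created beforehand.
CREATE EXTERNAL TABLE dbo.SensorReadings (
    SensorId    int          NOT NULL,  -- Hive int       / IntWritable
    SensorName  nvarchar(50) NOT NULL,  -- Hive string    / Text
    ReadingTime datetime2    NOT NULL   -- Hive timestamp / TimestampWritable
)
WITH (
    LOCATION = '/sensordata/',
    DATA_SOURCE = MyHadoopDataSource,
    FILE_FORMAT = MyTextFileFormat
);
```

If the column types do not match the file schema as described above, rows are rejected at query time.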

## Next steps

For more information, see the Transact-SQL reference article for [CREATE EXTERNAL TABLE](../../t-sql/statements/create-external-table-transact-sql.md).