Commit 45183b1

Merge branch 'release-sqlseattle' of https://github.com/MicrosoftDocs/sql-docs-pr into heidist-ctp2-toc
2 parents 00aefd7 + 087ece1 commit 45183b1

---
title: How to deploy SQL Server Big Data cluster on Kubernetes | Microsoft Docs
description:
author: rothja
ms.author: jroth
manager: craigg
ms.date: 09/07/2018
ms.topic: conceptual
ms.prod: sql
---

# How to deploy SQL Server Big Data cluster on Kubernetes

SQL Server Big Data cluster can be deployed as Docker containers on a Kubernetes cluster. This article provides an overview of the setup and configuration steps:

- Set up a Kubernetes cluster on a single VM, a cluster of VMs, or in Azure Kubernetes Service (AKS)
- Install the cluster configuration tool `mssqlctl` on your client machine
- Deploy SQL Server Big Data cluster in a Kubernetes cluster

## Kubernetes prerequisites

SQL Server Big Data cluster requires Kubernetes version 1.10 or later, for both server and client (kubectl). To install a specific version of the kubectl client, see [Install kubectl binary via curl](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl). The latest versions of Minikube and AKS are at least 1.10. For AKS, use the `--kubernetes-version` parameter to specify a version different from the default.

Also, note that the supported client/server Kubernetes version skew is +/-1 minor version. The Kubernetes documentation states that "a client should be skewed no more than one minor version from the master, but may lead the master by up to one minor version. For example, a v1.3 master should work with v1.1, v1.2, and v1.3 nodes, and should work with v1.2, v1.3, and v1.4 clients." For more information, see [Kubernetes supported releases and component skew](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/release/versioning.md#supported-releases-and-component-skew).
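The skew rule above can be illustrated with a small sketch that extracts the minor version from two version strings and checks whether the difference is within one (the version values here are hypothetical; in practice you would read them from `kubectl version`):

```shell
# Hypothetical client/server versions; in practice, read them from `kubectl version`.
client="v1.10.4"
server="v1.11.2"

# Extract the minor version (the middle dotted component).
client_minor=$(echo "$client" | cut -d. -f2)
server_minor=$(echo "$server" | cut -d. -f2)

# The supported client/server skew is +/-1 minor version.
skew=$((client_minor - server_minor))
if [ "$skew" -ge -1 ] && [ "$skew" -le 1 ]; then
  echo "supported skew"
else
  echo "unsupported skew"
fi
```

Here a v1.10 client against a v1.11 server falls within the supported range.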

## <a id="kubernetes"></a> Kubernetes cluster setup

If you already have a Kubernetes cluster that meets the above prerequisites, then you can skip directly to the [deployment step](#deploy). This section assumes a basic understanding of Kubernetes concepts. For detailed information on Kubernetes, see the [Kubernetes documentation](https://kubernetes.io/docs/home).

You can choose to deploy Kubernetes in any of three ways:

| Deploy Kubernetes on: | Description |
|---|---|
| **Minikube** | A single-node Kubernetes cluster in a VM. |
| **Azure Kubernetes Service (AKS)** | A managed Kubernetes container service in Azure. |
| **Multiple VMs** | A Kubernetes cluster deployed on your own VMs using `kubeadm`. |

For guidance on configuring one of these Kubernetes cluster options for SQL Server Big Data cluster, see one of the following articles:

- [Configure Minikube](deploy-on-minikube.md)
- [Configure Kubernetes on Azure Kubernetes Service](deploy-on-aks.md)
- [Configure Kubernetes on multiple VMs](deploy-on-vms.md)

## <a id="deploy"></a> Deploy SQL Server Big Data cluster

After you have configured your Kubernetes cluster, you can proceed with the deployment of SQL Server Big Data cluster. To deploy a big data cluster with all default configurations for a dev/test environment, follow the instructions in this article:

[Quickstart: Deploy SQL Server Big Data cluster on Kubernetes](quickstart-big-data-cluster-deploy.md)

If you want to customize your big data cluster configuration according to your workload needs, follow the remaining instructions in this article.

## Verify Kubernetes configuration

Run the following `kubectl` command to view the cluster configuration. Ensure that `kubectl` is pointed to the correct cluster context.

```bash
kubectl config view
```

## Install mssqlctl CLI management tool for SQL Server Big Data cluster

`mssqlctl` is a command-line utility written in Python that enables cluster administrators to bootstrap and manage the big data cluster via REST APIs. The minimum Python version required is 3.5. You must also have `pip`, which is used to download and install the `mssqlctl` tool.

On a Windows client, you can download the necessary Python package from [https://www.python.org/downloads/](https://www.python.org/downloads/). For Python 3.5.3 and later, pip3 is also installed when you install Python. If you did not select the option to add Python to your PATH during installation, locate pip3 and add it to your PATH manually.

On Linux (for example, WSL or an Ubuntu client), these commands install the latest Python 3 version and pip:

```bash
sudo apt-get update
sudo apt-get install python3
sudo apt-get install python3-pip
sudo pip3 install --upgrade pip
```

## Download and install mssqlctl tool

Run the following command to install `mssqlctl`:

TBD Fix the right path and name

```bash
pip install mssqlctl-1.0.0-py3-none-any.whl
```

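As a quick sanity check of the minimum Python requirement, the following sketch compares a dotted version string against the 3.5 floor (the `ver` value shown is a placeholder; in practice you would take it from `python3 --version`):

```shell
# Placeholder version string; in practice:
#   ver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
ver="3.6"

major=$(echo "$ver" | cut -d. -f1)
minor=$(echo "$ver" | cut -d. -f2)

# mssqlctl requires Python 3.5 or later.
if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 5 ]; }; then
  echo "Python version OK"
else
  echo "Python version too old"
fi
```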
## Define environment variables

The cluster configuration can be customized using a set of environment variables that are passed to the `mssqlctl create cluster` command. Most of the environment variables are optional, with the default values listed below. Note that some environment variables, such as credentials, require user input.

| Environment variable | Required | Default value | Description |
|---|---|---|---|
| **ACCEPT_EULA** | Yes | N/A | Accept the SQL Server license agreement (for example, 'Y'). |
| **CLUSTER_NAME** | Yes | N/A | The name of the Kubernetes namespace to deploy SQL Server Big Data cluster into. |
| **CLUSTER_PLATFORM** | Yes | N/A | The platform the Kubernetes cluster is deployed on. Can be `aks` or `minikube`. |
| **CLUSTER_COMPUTE_POOL_REPLICAS** | No | 1 | The number of compute pool replicas to build out. In CTP 2.0, the only value allowed is 1. |
| **CLUSTER_DATA_POOL_REPLICAS** | No | 2 | The number of data pool replicas to build out. |
| **CLUSTER_STORAGE_POOL_REPLICAS** | No | 2 | The number of storage pool replicas to build out. |
| **DOCKER_REGISTRY** | Yes | TBD | The private registry where the images used to deploy the cluster are stored. See this <TBD add link> for a complete list of images. |
| **DOCKER_REPOSITORY** | Yes | TBD | The private repository within the above registry where images are stored. |
| **DOCKER_USERNAME** | Yes | N/A | The username to access the container images in case they are stored in a private repository. It is required for the duration of the gated public preview. |
| **DOCKER_PASSWORD** | Yes | N/A | The password to access the above private repository. It is required for the duration of the gated public preview. |
| **DOCKER_EMAIL** | Yes | N/A | The email associated with the above private repository. It is required for the duration of the gated public preview. |
| **DOCKER_IMAGE_TAG** | No | latest | The label used to tag the images. |
| **DOCKER_IMAGE_POLICY** | No | Always | Always force a pull of the images. |
| **DOCKER_PRIVATE_REGISTRY** | Yes | 1 | For the timeframe of the gated public preview, this value has to be set to 1. |
| **CONTROLLER_USERNAME** | Yes | N/A | The username for the cluster administrator. |
| **CONTROLLER_PASSWORD** | Yes | N/A | The password for the cluster administrator. |
| **KNOX_USERNAME** | Yes | N/A | The username for the Knox user. |
| **KNOX_PASSWORD** | Yes | N/A | The password for the Knox user. |
| **MSSQL_SA_PASSWORD** | Yes | N/A | The password of the SA user for the SQL Server master instance. |
| **USE_PERSISTENT_VOLUME** | No | true | `true` to use Kubernetes Persistent Volume Claims for pod storage. `false` to use ephemeral host storage for pod storage. |
| **STORAGE_CLASS_NAME** | No | default | If `USE_PERSISTENT_VOLUME` is `true`, this indicates the name of the Kubernetes Storage Class to use. |
| **MASTER_SQL_PORT** | No | 31433 | The TCP/IP port on which the master SQL Server instance listens on the public network. |
| **KNOX_PORT** | No | 30443 | The TCP/IP port on which Apache Knox listens on the public network. |
| **GRAFANA_PORT** | No | 30888 | The TCP/IP port on which the Grafana monitoring application listens on the public network. |
| **KIBANA_PORT** | No | 30999 | The TCP/IP port on which the Kibana log search application listens on the public network. |

Setting the environment variables required for deploying a big data cluster differs depending on whether you are using a Windows or Linux client. Choose the steps below depending on which operating system you are using.

> [!IMPORTANT]
> Make sure you wrap passwords in double quotes if they contain any special characters.
>
> You can set the MSSQL_SA_PASSWORD to whatever you like, but make sure it is sufficiently complex and does not use the `!`, `&`, or `'` characters.

Initialize the following environment variables; they are required for deploying the cluster:

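The special-character restriction in the note above can be checked mechanically. This is a minimal sketch with an example password (substitute your own candidate value):

```shell
# Example candidate password; the !, &, and ' characters are disallowed.
candidate='Str0ngPassw0rd'

case "$candidate" in
  *'!'* | *'&'* | *"'"*)
    echo "password contains a disallowed character" ;;
  *)
    echo "password ok" ;;
esac
```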
### Windows

Using a CMD window (not PowerShell), configure the following environment variables:

```cmd
SET ACCEPT_EULA=Y
SET CLUSTER_PLATFORM=<minikube or aks>
SET CLUSTER_NAME=<your SQL Server Big Data cluster name>

SET CONTROLLER_USERNAME=<controller_admin_name – can be anything>
SET CONTROLLER_PASSWORD=<controller_admin_password – can be anything, password complexity compliant>
SET KNOX_USERNAME=<knox_username – can be anything>
SET KNOX_PASSWORD=<knox_password – can be anything, password complexity compliant>
SET MSSQL_SA_PASSWORD=<sa_password_of_master_sql_instances>

SET DOCKER_REGISTRY=private-repo.microsoft.com
SET DOCKER_REPOSITORY=mssql-private-preview
SET DOCKER_USERNAME=<your username>
SET DOCKER_PASSWORD=<your password>
SET DOCKER_PRIVATE_REGISTRY="1"
```

### Linux

Initialize the following environment variables:

```bash
export ACCEPT_EULA=Y
export CLUSTER_PLATFORM=<minikube or aks>
export CLUSTER_NAME=<your SQL Server Big Data cluster name>
export CLUSTER_NODE_REPLICAS=<number_of_nodes_excluding_master>

export CONTROLLER_USERNAME=<controller_admin_name – can be anything>
export CONTROLLER_PASSWORD=<controller_admin_password – can be anything, password complexity compliant>
export KNOX_USERNAME=<knox_username – can be anything>
export KNOX_PASSWORD=<knox_password – can be anything, password complexity compliant>
export MSSQL_SA_PASSWORD=<sa_password_of_master_sql_instances>

export DOCKER_REGISTRY=private-repo.microsoft.com
export DOCKER_REPOSITORY=mssql-private-preview
export DOCKER_USERNAME=<your username>
export DOCKER_PASSWORD=<your password>
export DOCKER_PRIVATE_REGISTRY="1"
```

> [!NOTE]
> For an on-premises cluster built with kubeadm, when `USE_PERSISTENT_VOLUME=true`, you must pre-provision a Kubernetes storage class and pass its name through the `STORAGE_CLASS_NAME` environment variable.

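Before running `mssqlctl create cluster`, a sketch like the following can confirm that required variables are present in the environment. The variable list mirrors the table above, and the exported values here are placeholders so the check has something to verify:

```shell
# Placeholder values; in a real deployment these come from the steps above.
export ACCEPT_EULA=Y
export CLUSTER_NAME=sqlbigdata1
export MSSQL_SA_PASSWORD="placeholder"

missing=0
for var in ACCEPT_EULA CLUSTER_NAME MSSQL_SA_PASSWORD; do
  if [ -z "$(printenv "$var")" ]; then
    echo "missing required variable: $var"
    missing=1
  fi
done
[ "$missing" -eq 0 ] && echo "all required variables set"
```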
## Deploy SQL Server Big Data cluster

The create cluster API is used to initialize the Kubernetes namespace and deploy all the application pods into the namespace. To deploy SQL Server Big Data cluster on your Kubernetes cluster, run the following command:

```bash
mssqlctl create cluster <name of your cluster>
```

> [!NOTE]
> The name of your cluster must contain only lowercase alphanumeric characters, with no spaces. All Kubernetes artifacts (containers, pods, stateful sets, services) for the cluster are created in a namespace with the same name as the cluster name specified.

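Because a name that violates the naming rule causes the deployment to fail, it can help to validate the name first. A minimal sketch, with an example name:

```shell
# Example cluster name; substitute your own.
CLUSTER_NAME="sqlbigdata1"

# The name may contain only lowercase letters and digits, with no spaces.
if printf '%s' "$CLUSTER_NAME" | grep -Eq '^[a-z0-9]+$'; then
  echo "valid cluster name"
else
  echo "invalid cluster name"
fi
```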
The client uses the Get Cluster and Get Logs APIs to check the status of the create cluster operation. The command window will output the deployment status. You can also check the deployment status by running these commands in a different command window:

```bash
kubectl get all -n <name of your cluster>
kubectl get pods -n <name of your cluster>
kubectl get svc -n <name of your cluster>
```

You can see a more granular status and configuration for each pod by running:

```bash
kubectl describe pod <pod name> -n <name of your cluster>
```

Once the controller pod is running, you can use the Cluster Administration Portal to monitor the deployment. The portal is launched automatically.

## <a id="masterip"></a> Get the master instance IP address

After the deployment script has completed successfully, you can obtain the IP address of the SQL Server master instance using the steps outlined below. You will use this IP address and port number 31433 to connect to the SQL Server master instance (for example: **\<ip-address\>,31433**). Similarly, you can obtain the IP address of the Knox gateway endpoint. All cluster endpoints are also outlined on the Service Endpoints tab in the Cluster Administration Portal.

### AKS

If you are using AKS, Azure provides the Azure LoadBalancer service. Run the following commands:

```bash
kubectl get svc service-master-lb -n <name of your cluster>
kubectl get svc service-security-lb -n <name of your cluster>
```

Look for the **External-IP** value assigned to each service. Then, connect to the SQL Server master instance using the IP address of `service-master-lb` at port 31433 (for example: **\<ip-address\>,31433**), and to the Knox/HDFS gateway endpoint using the external IP of the `service-security-lb` service.

### Minikube

If you are using Minikube, run the following command to get the IP address to connect to. In addition to the IP address, specify the port for the endpoint you need to connect to.

```bash
minikube ip
```
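Whichever platform you use, the IP address you obtain is combined with the endpoint port into the SQL Server `host,port` address form mentioned above. A small sketch, with a hypothetical IP address:

```shell
# Hypothetical IP address; in practice, take it from the commands above.
MASTER_IP="203.0.113.10"
MASTER_SQL_PORT=31433

# SQL Server tools accept the "host,port" address form.
CONNECT_TO="${MASTER_IP},${MASTER_SQL_PORT}"
echo "$CONNECT_TO"
```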

## Next steps

After successfully deploying SQL Server Big Data cluster to Kubernetes, [install the big data tools](deploy-big-data-tools.md) and learn more in the [getting started quickstart](quickstart-big-data-cluster-get-started.md).

docs/relational-databases/polybase/polybase-type-mapping.md

[!INCLUDE[appliesto-ss-xxxx-asdw-pdw-md-winonly](../../includes/appliesto-ss-xxxx-xxxx-xxx-md-winonly.md)]

This article describes the mapping between PolyBase external data sources and SQL Server. You can use this information to correctly define external tables with the [CREATE EXTERNAL TABLE](../../t-sql/statements/create-external-table-transact-sql.md) Transact-SQL command.

## Overview

When creating an external table with PolyBase, the column definitions, including the data types and number of columns, must match the data in the external files. If there is a mismatch, the file rows are rejected when querying the actual data.

For external tables that reference files in external data sources, the column and type definitions must map to the exact schema of the external file. When defining data types that reference data stored in Hadoop/Hive, use the following mappings between SQL and Hive data types, and cast the type into a SQL data type when selecting from it. The types include all versions of Hive unless stated otherwise.

> [!NOTE]
> SQL Server does not support the Hive *infinity* data value in any conversion. PolyBase will fail with a data type conversion error.

## Type mapping reference

| SQL Data Type | .NET Data Type | Hive Data Type | Hadoop/Java Data Type | Comments |
| ------------- | ------------------------- | -------------- | --------------------- | ------------------------------ |
| tinyint | Byte | tinyint | ByteWritable | For unsigned numbers only. |
| smallint | Int16 | smallint | ShortWritable | |
| int | Int32 | int | IntWritable | |
| bigint | Int64 | bigint | LongWritable | |
| bit | Boolean | boolean | BooleanWritable | |
| float | Double | double | DoubleWritable | |
| real | Single | float | FloatWritable | |
| money | Decimal | double | DoubleWritable | |
| smallmoney | Decimal | double | DoubleWritable | |
| nchar | String<br /><br /> Char[] | string | Text | |
| nvarchar | String<br /><br /> Char[] | string | Text | |
| char | String<br /><br /> Char[] | string | Text | |
| varchar | String<br /><br /> Char[] | string | Text | |
| binary | Byte[] | binary | BytesWritable | Applies to Hive 0.8 and later. |
| varbinary | Byte[] | binary | BytesWritable | Applies to Hive 0.8 and later. |
| date | DateTime | timestamp | TimestampWritable | |
| smalldatetime | DateTime | timestamp | TimestampWritable | |
| datetime2 | DateTime | timestamp | TimestampWritable | |
| datetime | DateTime | timestamp | TimestampWritable | |
| time | TimeSpan | timestamp | TimestampWritable | |
| decimal | Decimal | decimal | BigDecimalWritable | Applies to Hive 0.11 and later. |
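As a hedged illustration of how these mappings shape an external table definition (the table, data source, file format, and path names below are hypothetical, not from this article), a Hive file with `int`, `string`, and `timestamp` columns might be declared as:

```sql
-- Hypothetical names throughout; the external data source and file format
-- objects are assumed to have been created beforehand.
CREATE EXTERNAL TABLE dbo.SensorReadings (
    SensorId    int          NOT NULL,  -- Hive int       / IntWritable
    SensorName  nvarchar(50) NOT NULL,  -- Hive string    / Text
    ReadingTime datetime2    NOT NULL   -- Hive timestamp / TimestampWritable
)
WITH (
    LOCATION = '/sensordata/',
    DATA_SOURCE = MyHadoopDataSource,
    FILE_FORMAT = MyTextFileFormat
);
```

If the column types do not match the file schema as described above, rows are rejected at query time.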

## Next steps

For more information, see the Transact-SQL reference article for [CREATE EXTERNAL TABLE](../../t-sql/statements/create-external-table-transact-sql.md).