
Commit c60ac15

Merge branch 'master' of https://github.com/MicrosoftDocs/sql-docs-pr into 10-06-prep-articles-for-mi
2 parents fd0f79a + 89ea9b3 commit c60ac15

322 files changed

Lines changed: 1122 additions & 1131 deletions

Only a subset of the 322 changed files is shown below.

docs/azdata/reference/reference-azdata-bdc-spark-batch.md

Lines changed: 6 additions & 6 deletions
@@ -62,7 +62,7 @@ azdata bdc spark batch create --file -f
 ### Examples
 Create a new Spark batch.
 ```bash
-azdata spark batch create --code "2+2"
+azdata bdc spark batch create --code "2+2"
 ```
 ### Required Parameters
 #### `--file -f`
@@ -115,7 +115,7 @@ azdata bdc spark batch list
 ### Examples
 List all the active batches.
 ```bash
-azdata spark batch list
+azdata bdc spark batch list
 ```
 ### Global Arguments
 #### `--debug`
@@ -137,7 +137,7 @@ azdata bdc spark batch info --batch-id -i
 ### Examples
 Get batch info for batch with ID of 0.
 ```bash
-azdata spark batch info --batch-id 0
+azdata bdc spark batch info --batch-id 0
 ```
 ### Required Parameters
 #### `--batch-id -i`
@@ -162,7 +162,7 @@ azdata bdc spark batch log --batch-id -i
 ### Examples
 Get batch log for batch with ID of 0.
 ```bash
-azdata spark batch log --batch-id 0
+azdata bdc spark batch log --batch-id 0
 ```
 ### Required Parameters
 #### `--batch-id -i`
@@ -187,7 +187,7 @@ azdata bdc spark batch state --batch-id -i
 ### Examples
 Get batch state for batch with ID of 0.
 ```bash
-azdata spark batch state --batch-id 0
+azdata bdc spark batch state --batch-id 0
 ```
 ### Required Parameters
 #### `--batch-id -i`
@@ -212,7 +212,7 @@ azdata bdc spark batch delete --batch-id -i
 ### Examples
 Delete a batch.
 ```bash
-azdata spark batch delete --batch-id 0
+azdata bdc spark batch delete --batch-id 0
 ```
 ### Required Parameters
 #### `--batch-id -i`
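
Taken together, the corrections in this file add the `bdc` command group to every Spark batch example. A minimal sketch of the corrected end-to-end batch workflow, assuming an authenticated `azdata login` session against a deployed big data cluster (the batch ID `0` is illustrative):

```bash
# Submit a trivial Spark batch, then inspect it and clean it up.
azdata bdc spark batch create --code "2+2"    # returns a batch ID, assumed to be 0 below
azdata bdc spark batch list                   # list all active batches
azdata bdc spark batch info --batch-id 0      # metadata for batch 0
azdata bdc spark batch log --batch-id 0       # driver log for batch 0
azdata bdc spark batch state --batch-id 0     # current state of batch 0
azdata bdc spark batch delete --batch-id 0    # delete batch 0
```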

docs/azdata/reference/reference-azdata-bdc-spark-session.md

Lines changed: 6 additions & 6 deletions
@@ -60,7 +60,7 @@ azdata bdc spark session create [--session-kind -k]
 ### Examples
 Create a session.
 ```bash
-azdata spark session create --session-kind pyspark
+azdata bdc spark session create --session-kind pyspark
 ```
 ### Optional Parameters
 #### `--session-kind -k`
@@ -110,7 +110,7 @@ azdata bdc spark session list
 ### Examples
 List all the active sessions.
 ```bash
-azdata spark session list
+azdata bdc spark session list
 ```
 ### Global Arguments
 #### `--debug`
@@ -132,7 +132,7 @@ azdata bdc spark session info --session-id -i
 ### Examples
 Get session info for session with ID of 0.
 ```bash
-azdata spark session info --session-id 0
+azdata bdc spark session info --session-id 0
 ```
 ### Required Parameters
 #### `--session-id -i`
@@ -157,7 +157,7 @@ azdata bdc spark session log --session-id -i
 ### Examples
 Get session log for session with ID of 0.
 ```bash
-azdata spark session log --session-id 0
+azdata bdc spark session log --session-id 0
 ```
 ### Required Parameters
 #### `--session-id -i`
@@ -182,7 +182,7 @@ azdata bdc spark session state --session-id -i
 ### Examples
 Get session state for session with ID of 0.
 ```bash
-azdata spark session state --session-id 0
+azdata bdc spark session state --session-id 0
 ```
 ### Required Parameters
 #### `--session-id -i`
@@ -207,7 +207,7 @@ azdata bdc spark session delete --session-id -i
 ### Examples
 Delete a session.
 ```bash
-azdata spark session delete --session-id 0
+azdata bdc spark session delete --session-id 0
 ```
 ### Required Parameters
 #### `--session-id -i`
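
The session commands gain the same `bdc` prefix. A minimal sketch of the corrected session lifecycle, assuming an authenticated `azdata` connection (the session ID `0` is illustrative):

```bash
# Create a PySpark session, check on it, then tear it down.
azdata bdc spark session create --session-kind pyspark   # returns a session ID, assumed to be 0 below
azdata bdc spark session list                            # list all active sessions
azdata bdc spark session state --session-id 0            # current state of session 0
azdata bdc spark session delete --session-id 0           # delete session 0
```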

docs/azdata/reference/reference-azdata-bdc-spark-statement.md

Lines changed: 4 additions & 4 deletions
@@ -34,7 +34,7 @@ azdata bdc spark statement list --session-id -i
 ### Examples
 List all the session statements.
 ```bash
-azdata spark statement list --session-id 0
+azdata bdc spark statement list --session-id 0
 ```
 ### Required Parameters
 #### `--session-id -i`
@@ -59,7 +59,7 @@ azdata bdc spark statement create --session-id -i
 ### Examples
 Run a statement.
 ```bash
-azdata spark statement create --session-id 0 --code "2+2"
+azdata bdc spark statement create --session-id 0 --code "2+2"
 ```
 ### Required Parameters
 #### `--session-id -i`
@@ -86,7 +86,7 @@ azdata bdc spark statement info --session-id -i
 ### Examples
 Get statement info for session with ID of 0 and statement ID of 0.
 ```bash
-azdata spark statement info --session-id 0 --statement-id 0
+azdata bdc spark statement info --session-id 0 --statement-id 0
 ```
 ### Required Parameters
 #### `--session-id -i`
@@ -113,7 +113,7 @@ azdata bdc spark statement cancel --session-id -i
 ### Examples
 Cancel a statement.
 ```bash
-azdata spark statement cancel --session-id 0 --statement-id 0
+azdata bdc spark statement cancel --session-id 0 --statement-id 0
 ```
 ### Required Parameters
 #### `--session-id -i`
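
Statements run inside an existing session, so the corrected statement commands combine naturally with the session commands above. A minimal sketch, assuming session `0` was created with `azdata bdc spark session create` (the statement ID `0` is illustrative):

```bash
# Run a statement in session 0, inspect it, and cancel it if it is still running.
azdata bdc spark statement create --session-id 0 --code "2+2"     # returns a statement ID, assumed to be 0 below
azdata bdc spark statement list --session-id 0                    # list all statements in session 0
azdata bdc spark statement info --session-id 0 --statement-id 0   # result and status of statement 0
azdata bdc spark statement cancel --session-id 0 --statement-id 0 # cancel statement 0
```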

docs/big-data-cluster/concept-compute-pool.md

Lines changed: 30 additions & 5 deletions
@@ -5,30 +5,55 @@ description: This article describes the compute pool in a SQL Server 2019 big da
 author: MikeRayMSFT
 ms.author: mikeray
 ms.reviewer: mihaelab
-ms.date: 11/04/2019
+ms.date: 10/15/2020
 ms.topic: conceptual
 ms.prod: sql
 ms.technology: big-data-cluster
 ---
 
-# What are compute pools in a SQL Server big data cluster?
+# What are compute pools in SQL Server Big Data Clusters?
 
 [!INCLUDE[SQL Server 2019](../includes/applies-to-version/sqlserver2019.md)]
 
-This article describes the role of *SQL Server compute pools* in a SQL Server big data cluster. Compute pools provide scale-out computational resources for a big data cluster. The following sections describe the architecture and functionality of a compute pool.
+This article describes the role of *SQL Server compute pools* in SQL Server Big Data Clusters. Compute pools provide scale-out computational resources for a Big Data Cluster. They are used to offload computational work, or intermediate result sets, from the SQL Server master instance. The following sections describe the architecture, functionality, and usage scenarios of a compute pool.
 
 You can also watch this 5-minute video for an introduction into compute pools:
 
 > [!VIDEO https://channel9.msdn.com/Shows/Data-Exposed/Overview-Big-Data-Cluster-Compute-Pool/player?WT.mc_id=dataexposed-c9-niner]
 
-
 ## Compute pool architecture
 
 A compute pool is made of one or more compute pods running in Kubernetes. The automated creation and management of these pods is coordinated by the [SQL Server master instance](concept-master-instance.md). Each pod contains a set of base services and an instance of the SQL Server database engine.
 
+![Compute pool architecture](media/concept-compute-pool/compute-pool-architecture.png)
+
 ## Scale-out groups
 
-A compute pool can act as a PolyBase scale-out group for distributed queries over different data sources--such as HDFS, Oracle, MongoDB, or Teradata. By using compute pods in Kubernetes, big data clusters can automate creating and configuring compute pods for PolyBase scale-out groups.
+A compute pool can act as a PolyBase scale-out group for distributed queries over different external data sources such as SQL Server, Oracle, MongoDB, Teradata, and HDFS. By using compute pods in Kubernetes, Big Data Clusters can automate creating and configuring compute pods for PolyBase scale-out groups.
+
+## Compute pool scenarios
+
+Scenarios where the compute pool is used include:
+
+- When queries submitted to the master instance use one or more tables located in the [Storage Pool](concept-storage-pool.md).
+
+- When queries submitted to the master instance use one or more tables with round-robin distribution located in the [Data Pool](concept-data-pool.md).
+
+- When queries submitted to the master instance use **partitioned** tables with external data sources of SQL Server, Oracle, MongoDB, and Teradata. For this scenario, the query hint OPTION (FORCE SCALEOUTEXECUTION) must be enabled.
+
+- When queries submitted to the master instance use one or more tables located in [HDFS Tiering](hdfs-tiering.md).
+
+Scenarios where the compute pool is **not** used include:
+
+- When queries submitted to the master instance use one or more tables in an external Hadoop HDFS cluster.
+
+- When queries submitted to the master instance use one or more tables in Azure Blob Storage.
+
+- When queries submitted to the master instance use **non-partitioned** tables with external data sources of SQL Server, Oracle, MongoDB, and Teradata.
+
+- When the query hint OPTION (DISABLE SCALEOUTEXECUTION) is enabled.
+
+- When queries submitted to the master instance apply to databases located on the master instance.
 
 ## Next steps
 
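
The new architecture section describes compute pods that run in Kubernetes and are coordinated by the master instance. A minimal sketch for confirming those pods exist, assuming `kubectl` is configured against the cluster; the namespace name and the `compute` name prefix are assumptions, not part of the source article:

```bash
# List pods in the big data cluster namespace and filter for the compute pool.
# "mssql-cluster" and the "compute" prefix are illustrative assumptions.
kubectl get pods --namespace mssql-cluster | grep -i compute
```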

docs/big-data-cluster/data-ingestion-curl.md

Lines changed: 18 additions & 6 deletions
@@ -42,51 +42,63 @@ For example:
 
 `https://13.66.190.205:30443/gateway/default/webhdfs/v1/`
 
+## Authentication with Active Directory
+
+For deployments with Active Directory, use the `--anyauth` authentication parameter with `curl` so that Negotiate authentication can be used.
+
+To use `curl` with Active Directory authentication, run this command:
+
+```
+kinit <username>
+```
+
+The command generates a Kerberos token for `curl` to use. The commands demonstrated in the next sections specify the `--anyauth` parameter for `curl`. For URLs that require Negotiate authentication, `curl` automatically detects and uses the generated Kerberos token instead of a username and password to authenticate.
+
 ## List a file
 
 To list file under **hdfs:///product_review_data**, use the following curl command:
 
 ```terminal
-curl -i -k -u root:<AZDATA_PASSWORD> -X GET 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/product_review_data/?op=liststatus'
+curl -i -k --anyauth -u root:<AZDATA_PASSWORD> -X GET 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/product_review_data/?op=liststatus'
 ```
 
 [!INCLUDE [big-data-cluster-root-user](../includes/big-data-cluster-root-user.md)]
 
 For endpoints that do not use root, use the following curl command:
 
 ```terminal
-curl -i -k -u <AZDATA_USERNAME>:<AZDATA_PASSWORD> -X GET 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/product_review_data/?op=liststatus'
+curl -i -k --anyauth -u <AZDATA_USERNAME>:<AZDATA_PASSWORD> -X GET 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/product_review_data/?op=liststatus'
 ```
 
 ## Put a local file into HDFS
 
 To put a new file **test.csv** from local directory to product_review_data directory, use the following curl command (the **Content-Type** parameter is required):
 
 ```terminal
-curl -i -L -k -u root:<AZDATA_PASSWORD> -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/product_review_data/test.csv?op=create' -H 'Content-Type: application/octet-stream' -T 'test.csv'
+curl -i -L -k --anyauth -u root:<AZDATA_PASSWORD> -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/product_review_data/test.csv?op=create' -H 'Content-Type: application/octet-stream' -T 'test.csv'
 ```
 
 [!INCLUDE [big-data-cluster-root-user](../includes/big-data-cluster-root-user.md)]
 
 For endpoints that do not use root, use the following curl command:
 
 ```terminal
-curl -i -L -k -u <AZDATA_USERNAME>:<AZDATA_PASSWORD> -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/product_review_data/test.csv?op=create' -H 'Content-Type: application/octet-stream' -T 'test.csv'
+curl -i -L -k --anyauth -u <AZDATA_USERNAME>:<AZDATA_PASSWORD> -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/product_review_data/test.csv?op=create' -H 'Content-Type: application/octet-stream' -T 'test.csv'
 ```
 
 ## Create a directory
 
 To create a directory **test** under `hdfs:///`, use the following command:
 
 ```terminal
-curl -i -L -k -u root:<AZDATA_PASSWORD> -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/test?op=MKDIRS'
+curl -i -L -k --anyauth -u root:<AZDATA_PASSWORD> -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/test?op=MKDIRS'
 ```
 
 [!INCLUDE [big-data-cluster-root-user](../includes/big-data-cluster-root-user.md)]
 For endpoints that do not use root, use the following curl command:
 
 ```terminal
-curl -i -L -k -u <AZDATA_USERNAME>:<AZDATA_PASSWORD> -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/test?op=MKDIRS'
+curl -i -L -k --anyauth -u <AZDATA_USERNAME>:<AZDATA_PASSWORD> -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/test?op=MKDIRS'
 ```
 
 ## Next steps
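
For the new Active Directory path added in this file, the only local prerequisite is a valid Kerberos ticket before `curl` is invoked. A minimal sketch, assuming the Kerberos client tools are installed and `user1@CONTOSO.COM` is a placeholder principal:

```bash
# Obtain and verify a Kerberos ticket, then call the gateway with --anyauth so
# curl can negotiate with the ticket instead of the supplied password.
kinit user1@CONTOSO.COM
klist    # confirm a ticket-granting ticket was issued
curl -i -k --anyauth -u <AZDATA_USERNAME>:<AZDATA_PASSWORD> -X GET 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/product_review_data/?op=liststatus'
```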

docs/connect/odbc/linux-mac/known-issues-in-this-version-of-the-driver.md

Lines changed: 2 additions & 2 deletions
@@ -28,7 +28,7 @@ Additional issues will be posted on the [SQL Server Drivers blog](https://techco
 
 - If the client encoding is UTF-8, the driver manager does not always correctly convert from UTF-8 to UTF-16. Currently, data corruption occurs when one or more characters in the string are not valid UTF-8 characters. ASCII characters are mapped correctly. The driver manager attempts this conversion when calling the SQLCHAR versions of the ODBC API (for example, SQLDriverConnectA). The driver manager will not attempt this conversion when calling the SQLWCHAR versions of the ODBC API (for example, SQLDriverConnectW).
 
-- The *ColumnSize* parameter of **SQLBindParameter** refers to the number of characters in the SQL type, while *BufferLength* is the number of bytes in the application's buffer. However, if the SQL data type is `varchar(n)` or `char(n)`, the application binds the parameter as SQL_C_CHAR or SQL_C_VARCHAR, and the character encoding of the client is UTF-8, you may get a "String data, right truncation" error from the driver even if the value of *ColumnSize* is aligned with the size of the data type on the server. This error occurs since conversions between character encodings may change the length of the data. For example, a right apostrophe character (U+2019) is encoded in CP-1252 as the single byte 0x92, but in UTF-8 as the 3-byte sequence 0xe2 0x80 0x99.
+- The *ColumnSize* parameter of **SQLBindParameter** refers to the number of characters in the SQL type, while *BufferLength* is the number of bytes in the application's buffer. However, if the SQL data type is `varchar(n)` or `char(n)`, the application binds the parameter as SQL_C_CHAR for the C type, and SQL_CHAR or SQL_VARCHAR for the SQL type, and the character encoding of the client is UTF-8, you may get a "String data, right truncation" error from the driver even if the value of *ColumnSize* is aligned with the size of the data type on the server. This error occurs since conversions between character encodings may change the length of the data. For example, a right apostrophe character (U+2019) is encoded in CP-1252 as the single byte 0x92, but in UTF-8 as the 3-byte sequence 0xe2 0x80 0x99.
 
 For example, if your encoding is UTF-8 and you specify 1 for both *BufferLength* and *ColumnSize* in **SQLBindParameter** for an out-parameter, and then attempt to retrieve the preceding character stored in a `char(1)` column on the server (using CP-1252), the driver attempts to convert it to the 3-byte UTF-8 encoding, but cannot fit the result into a 1-byte buffer. In the other direction, it compares *ColumnSize* with the *BufferLength* in **SQLBindParameter** before doing the conversion between the different code pages on the client and server. Because a *ColumnSize* of 1 is less than a *BufferLength* of (for example) 3, the driver generates an error. To avoid this error, ensure that the length of the data after conversion fits into the specified buffer or column. Note that *ColumnSize* cannot be greater than 8000 for the `varchar(n)` type.
 
@@ -86,4 +86,4 @@ For ODBC driver installation instructions, see the following articles:
 - [Installing the Microsoft ODBC Driver for SQL Server on Linux](installing-the-microsoft-odbc-driver-for-sql-server.md)
 - [Installing the Microsoft ODBC Driver for SQL Server on macOS](install-microsoft-odbc-driver-sql-server-macos.md)
 
-For more information, see the [Programming guidelines](programming-guidelines.md) and the [Release notes](release-notes-odbc-sql-server-linux-mac.md).
+For more information, see the [Programming guidelines](programming-guidelines.md) and the [Release notes](release-notes-odbc-sql-server-linux-mac.md).

docs/connect/spark/connector.md

Lines changed: 8 additions & 5 deletions
@@ -20,7 +20,7 @@ This library contains the source code for the Apache Spark Connector for SQL Ser
 
 [Apache Spark](https://spark.apache.org/) is a unified analytics engine for large-scale data processing.
 
-You can build the connector from source or download the jar from the Release section in GitHub. For the latest information about the connector, see [SQL Spark connector GitHub repository](https://github.com/microsoft/sql-spark-connector).
+You can import the connector into your project through the Maven coordinates: `com.microsoft.azure:spark-mssql-connector:1.0.0`. You can also build the connector from source or download the jar from the Release section in GitHub. For the latest information about the connector, see [SQL Spark connector GitHub repository](https://github.com/microsoft/sql-spark-connector).
 
 ## Supported Features
 
@@ -44,14 +44,15 @@ You can build the connector from source or download the jar from the Release sec
 ### Supported Options
 The Apache Spark Connector for SQL Server and Azure SQL supports the options defined here: [SQL DataSource JDBC](https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html)
 
-In addition, the following options are supported
+In addition, the following options are supported:
 
 | Option | Default | Description |
 | --------- | ------------------ | ------------------------------------------ |
-| `reliabilityLevel` | `BEST_EFFORT` | `BEST_EFFORT` or `NO_DUPLICATES`. `NO_DUPLICATES` implements a reliable insert in executor restart scenarios |
-| `dataPoolDataSource` | `none` | `none` implies the value is not set and the connector should write to a single instance of SQL Server. Set this value to data source name to write to a data pool table in a SQL Server Big Data Cluster|
+| `reliabilityLevel` | `BEST_EFFORT` | `BEST_EFFORT` or `NO_DUPLICATES`. `NO_DUPLICATES` implements a reliable insert in executor restart scenarios |
+| `dataPoolDataSource` | `none` | `none` implies the value is not set and the connector should write to a single SQL Server instance. Set this value to the data source name to write to a data pool table in Big Data Clusters |
 | `isolationLevel` | `READ_COMMITTED` | Specify the isolation level |
 | `tableLock` | `false` | Implements an insert with `TABLOCK` option to improve write performance |
+| `schemaCheckEnabled` | `true` | Disables strict data frame and SQL table schema check when set to false |
 
 Other [bulk copy options](../jdbc/using-bulk-copy-with-the-jdbc-driver.md#sqlserverbulkcopyoptions) can be set as options on the `dataframe` and will be passed to `bulkcopy` APIs on write
 
@@ -223,4 +224,6 @@ The Apache Spark Connector for Azure SQL and SQL Server is an open-source projec
 
 ## Next steps
 
-Visit the [SQL Spark connector GitHub repository](https://github.com/microsoft/sql-spark-connector).
+Visit the [SQL Spark connector GitHub repository](https://github.com/microsoft/sql-spark-connector).
+
+For information about isolation levels, see [SET TRANSACTION ISOLATION LEVEL (Transact-SQL)](../../t-sql/statements/set-transaction-isolation-level-transact-sql.md).
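
The Maven coordinates introduced in the first hunk of this file can be passed straight to the Spark launchers instead of bundling the jar. A minimal sketch, assuming network access to Maven Central; the application script name is a placeholder:

```bash
# Resolve the connector from Maven Central at submit time.
spark-submit \
  --packages com.microsoft.azure:spark-mssql-connector:1.0.0 \
  my_app.py    # placeholder application that reads or writes through the connector

# The same coordinates work for an interactive shell.
pyspark --packages com.microsoft.azure:spark-mssql-connector:1.0.0
```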
