Commit 21e745e

Merge pull request #18678 from WilliamDAssafMSFT/20210225-what-is-polybase
20210225 what is polybase? better definition of pushdown. update image
2 parents ac55708 + b8e85be commit 21e745e

5 files changed

Lines changed: 159 additions & 65 deletions

Binary file changed: -8.93 KB

docs/relational-databases/polybase/polybase-configuration.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ ms.topic: conceptual
 author: MikeRayMSFT
 ms.author: mikeray
 ms.reviewer: ""
-monikerRange: ">= sql-server-2016"
+monikerRange: ">= sql-server-2016 "
 ---

 # PolyBase configuration and security for Hadoop
Lines changed: 58 additions & 35 deletions
@@ -1,7 +1,7 @@
 ---
-title: "What is PolyBase? | Microsoft Docs"
-description: PolyBase enables your SQL Server instance to process Transact-SQL queries that read data from external data sources such as Hadoop and Azure Blob Storage.
-ms.date: 12/14/2019
+title: "Introducing data virtualization with PolyBase"
+description: PolyBase enables your SQL Server instance to process Transact-SQL queries that read data from external data sources such as Hadoop and Azure blob storage.
+ms.date: 03/23/2021
 ms.prod: sql
 ms.technology: polybase
 ms.topic: "overview"
@@ -19,53 +19,76 @@ ms.custom: contperf-fy21q2
 author: MikeRayMSFT
 ms.author: mikeray
 ms.reviewer: ""
-monikerRange: ">=sql-server-2016||>=sql-server-linux-2017||>=aps-pdw-2016||=azure-sqldw-latest"
+monikerRange: ">=sql-server-2016||>=sql-server-linux-ver15||>=aps-pdw-2016||=azure-sqldw-latest"
 ---

-# What is PolyBase?
+# Introducing data virtualization with PolyBase

 [!INCLUDE[appliesto-ss-xxxx-asdw-pdw-md](../../includes/appliesto-ss-xxxx-asdw-pdw-md.md)]

-PolyBase enables your SQL Server instance to process Transact-SQL queries that read data from external data sources. The same query can also access relational tables in your instance of SQL Server. PolyBase enables the same query to also join the data from external sources and SQL Server.
+PolyBase is a data virtualization feature for [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)].

-To use PolyBase, in an instance of SQL Server:
+## What is PolyBase?

-1. [Install PolyBase on Windows](polybase-installation.md)
-1. Create an [external data source](../../t-sql/statements/create-external-data-source-transact-sql.md)
-1. Create an [external table](../../t-sql/statements/create-external-table-transact-sql.md)
+PolyBase enables your [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instance to query data with T-SQL directly from [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)], Oracle, Teradata, MongoDB, Hadoop clusters, Cosmos DB without separately installing client connection software. You can also use the generic ODBC connector to connect to additional providers using third-party ODBC drivers. PolyBase allows T-SQL queries to join the data from external sources to relational tables in an instance of [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)].

-Together, these provide the connection to the external data source.
+A key use case for data virtualization with the PolyBase feature is to allow the data to stay in its original location and format. You can virtualize the external data through the [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instance, so that it can be queried in place like any other table in [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)]. This process minimizes the need for ETL processes for data movement. This data virtualization scenario is possible with the use of PolyBase connectors.

-SQL Server 2016 introduces PolyBase with support for connections to Hadoop and Azure Blob Storage.
+> [!NOTE]
+> Some functionality of the PolyBase feature is in private preview for **Azure SQL managed instances**, including the ability to query external data (Parquet files) in Azure Data Lake Storage (ADLS) Gen2. Private preview includes access to client libraries and documentation for testing purposes that are not yet available publicly. If you are interested and ready to invest some time in trying out the functionalities and sharing your feedback and questions, please review the [Azure SQL Managed Instance PolyBase Private Preview Guide](https://sqlmipg.blob.core.windows.net/azsqlpolybaseshare/Azure_SQL_Managed_Instance_Polybase_Private_Preview_Onboarding_Guide.pdf).

-SQL Server 2019 introduces additional connectors, including SQL Server, Oracle, Teradata, and MongoDB.
+### Supported SQL products and services

-![PolyBase logical](../../relational-databases/polybase/media/polybase-logical.png "PolyBase logical")
+PolyBase provides these same functionalities for the following SQL products from Microsoft:

-PolyBase pushes some computations to the external source to optimize the overall query. PolyBase external access is not limited to Hadoop. Other unstructured non-relational tables are also supported, such as delimited text files.
+- [!INCLUDE[sssql16-md](../../includes/sssql16-md.md)] and later versions (Windows only)
+- [!INCLUDE[sssql19-md](../../includes/sssql19-md.md)] and later versions (Linux)
+- [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] [!INCLUDE[pdw](../../includes/sspdw-md.md)] (PDW), hosted in the Analytics Platform System (APS)
+- [!INCLUDE[ssazuresynapse_md](../../includes/ssazuresynapse_md.md)]

-Examples of external connectors include:
+### PolyBase connectors

-- [SQL Server](polybase-configure-sql-server.md)
+The PolyBase feature provides connectivity to the following external data sources:
+
+| External data sources | [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] with PolyBase | APS PDW | [!INCLUDE[ssazuresynapse_md](../../includes/ssazuresynapse_md.md)] |
+|---------------------------|--------------------------|------------|---------------|
+| Oracle, MongoDB, Teradata | Read | **No** | **No** |
+| Generic ODBC | Read (Windows Only) | **No** | **No** |
+| Azure Storage | Read/Write | Read/Write | Read/Write |
+| Hadoop | Read/Write | Read/Write | **No** |
+| [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] | Read | **No** | **No** |
+| | | | |
+
+
+* [!INCLUDE[sssql16-md](../../includes/sssql16-md.md)] introduced PolyBase with support for connections to Hadoop and Azure blob storage.
+* [!INCLUDE[sssql19-md](../../includes/sssql19-md.md)] introduced additional connectors, including [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)], Oracle, Teradata, and MongoDB.
+
+Examples of external connectors include:
+
+- [[!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)]](polybase-configure-sql-server.md)
 - [Oracle](polybase-configure-oracle.md)
 - [Teradata](polybase-configure-teradata.md)
 - [MongoDB](polybase-configure-mongodb.md)
+- [Hadoop](polybase-configure-hadoop.md)*

-### Supported SQL products and services
+\* PolyBase supports two Hadoop providers, Hortonworks Data Platform (HDP) and Cloudera Distributed Hadoop (CDH).
+
+To use PolyBase in an instance of [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)]:
+
+1. [Install PolyBase on Windows](polybase-installation.md) or [Install PolyBase on Linux](polybase-linux-setup.md).
+1. Starting with [!INCLUDE[sssql19-md](../../includes/sssql19-md.md)], [enable PolyBase in sp_configure](polybase-installation.md#enable), if necessary.
+1. Create an [external data source](../../t-sql/statements/create-external-data-source-transact-sql.md).
+1. Create an [external table](../../t-sql/statements/create-external-table-transact-sql.md).

-PolyBase provides these same functionalities for the following SQL products from Microsoft:

-- SQL Server 2016 and later versions (Windows only)
-- Analytics Platform System (formerly Parallel Data Warehouse)
-- Azure Synapse Analytics

 ### Azure integration

-With the underlying help of PolyBase, T-SQL queries can also import and export data from Azure Blob Storage. Further, PolyBase enables Azure Synapse Analytics to import and export data from Azure Data Lake Store, and from Azure Blob Storage.
+With the underlying help of PolyBase, T-SQL queries can also import and export data from Azure blob storage. Further, PolyBase enables [!INCLUDE[ssazuresynapse_md](../../includes/ssazuresynapse_md.md)] to import and export data from Azure Data Lake Store, and from Azure blob storage.

 ## Why use PolyBase?

-PolyBase allows you to join data from a SQL Server instance with external data. Prior to PolyBase to join data to external data sources you could either:
+PolyBase allows you to join data from a [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instance with external data. Prior to PolyBase to join data to external data sources you could either:

 - Transfer half your data so that all the data was in one location.
 - Query both sources of data, then write custom query logic to join and integrate the data at the client level.
@@ -76,32 +99,32 @@ PolyBase does not require you to install additional software to your Hadoop envi

 ### PolyBase uses

-PolyBase enables the following scenarios in SQL Server:
+PolyBase enables the following scenarios in [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)]:

-- **Query data stored in Hadoop from a SQL Server instance or PDW.** Users are storing data in cost-effective distributed and scalable systems, such as Hadoop. PolyBase makes it easy to query the data by using T-SQL.
+- **Query data stored in Hadoop from a [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instance or PDW.** Users are storing data in cost-effective distributed and scalable systems, such as Hadoop. PolyBase makes it easy to query the data by using T-SQL.

-- **Query data stored in Azure Blob Storage.** Azure blob storage is a convenient place to store data for use by Azure services. PolyBase makes it easy to access the data by using T-SQL.
+- **Query data stored in Azure blob storage.** Azure blob storage is a convenient place to store data for use by Azure services. PolyBase makes it easy to access the data by using T-SQL.

-- **Import data from Hadoop, Azure Blob Storage, or Azure Data Lake Store.** Leverage the speed of Microsoft SQL's columnstore technology and analysis capabilities by importing data from Hadoop, Azure Blob Storage, or Azure Data Lake Store into relational tables. There is no need for a separate ETL or import tool.
+- **Import data from Hadoop, Azure blob storage, or Azure Data Lake Store.** Leverage the speed of Microsoft SQL's columnstore technology and analysis capabilities by importing data from Hadoop, Azure blob storage, or Azure Data Lake Store into relational tables. There is no need for a separate ETL or import tool.

-- **Export data to Hadoop, Azure Blob Storage, or Azure Data Lake Store.** Archive data to Hadoop, Azure Blob Storage, or Azure Data Lake Store to achieve cost-effective storage and keep it online for easy access.
+- **Export data to Hadoop, Azure blob storage, or Azure Data Lake Store.** Archive data to Hadoop, Azure blob storage, or Azure Data Lake Store to achieve cost-effective storage and keep it online for easy access.

-- **Integrate with BI tools.** Use PolyBase with Microsoft's business intelligence and analysis stack, or use any third party tools that are compatible with SQL Server.
+- **Integrate with BI tools.** Use PolyBase with Microsoft's business intelligence and analysis stack, or use any third-party tools that are compatible with [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)].

 ## Performance

-- **Push computation to Hadoop.** The query optimizer makes a cost-based decision to push computation to Hadoop, if that will improve query performance. The query optimizer uses statistics on external tables to make the cost-based decision. Pushing computation creates MapReduce jobs and leverages Hadoop's distributed computational resources.
+- **Push computation to Hadoop.** PolyBase pushes some computations to the external source to optimize the overall query. The query optimizer makes a cost-based decision to push computation to Hadoop, if that will improve query performance. The query optimizer uses statistics on external tables to make the cost-based decision. Pushing computation creates MapReduce jobs and leverages Hadoop's distributed computational resources. For more information, see [Pushdown computations in PolyBase](polybase-pushdown-computation.md).

-- **Scale compute resources.** To improve query performance, you can use SQL Server [PolyBase scale-out groups](../../relational-databases/polybase/polybase-scale-out-groups.md). This enables parallel data transfer between SQL Server instances and Hadoop nodes, and it adds compute resources for operating on the external data.
+- **Scale compute resources.** To improve query performance, you can use [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] [PolyBase scale-out groups](../../relational-databases/polybase/polybase-scale-out-groups.md). This enables parallel data transfer between [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instances and Hadoop nodes, and it adds compute resources for operating on the external data.

 ## Next steps

-Before using PolyBase, you must [install the PolyBase feature](polybase-installation.md). Then see the following configuration guides depending on your data source:
+Before using PolyBase, you must [install PolyBase on Windows](polybase-installation.md) or [install PolyBase on Linux](polybase-linux-setup.md), and [enable PolyBase in sp_configure](polybase-installation.md#enable) if necessary. Then see the following configuration guides depending on your data source:

 - [Hadoop](polybase-configure-hadoop.md)
-- [Azure Blob Storage](polybase-configure-azure-blob-storage.md)
+- [Azure blob storage](polybase-configure-azure-blob-storage.md)
 - [SQL Server](polybase-configure-sql-server.md)
 - [Oracle](polybase-configure-oracle.md)
 - [Teradata](polybase-configure-teradata.md)
 - [MongoDB](polybase-configure-mongodb.md)
-- [ODBC Generic Types](polybase-configure-odbc-generic.md)
+- [ODBC Generic Types](polybase-configure-odbc-generic.md)
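
For orientation, the setup sequence that the revised article lists (enable PolyBase, create an external data source, create an external table) might look roughly like the T-SQL below. This is a sketch and not part of the commit. It assumes the SQL Server connector on SQL Server 2019 or later with PolyBase already installed. The host name, credential, database, table, and column names are all hypothetical placeholders.

```sql
-- Illustrative sketch only: every object name, host, login, and password
-- below is a hypothetical placeholder, not something defined by this commit.

-- Enable the PolyBase feature (SQL Server 2019 and later), if not already enabled.
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;

-- A database master key must exist before a database scoped credential can be created.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';

-- Credential used to authenticate against the remote SQL Server instance.
CREATE DATABASE SCOPED CREDENTIAL RemoteSqlCredential
WITH IDENTITY = 'remote_login', SECRET = '<remote login password>';

-- External data source pointing at the remote instance.
CREATE EXTERNAL DATA SOURCE RemoteSqlServer
WITH (
    LOCATION = 'sqlserver://remote-host.example.com:1433',
    PUSHDOWN = ON,
    CREDENTIAL = RemoteSqlCredential
);

-- External table mapped to a table in the remote database (three-part name).
CREATE EXTERNAL TABLE dbo.RemoteCustomers (
    CustomerId   INT           NOT NULL,
    CustomerName NVARCHAR(100) NOT NULL
)
WITH (
    LOCATION = 'SalesDb.dbo.Customers',
    DATA_SOURCE = RemoteSqlServer
);

-- Join the external table to a local table (dbo.Orders is assumed to exist locally).
SELECT o.OrderId, c.CustomerName
FROM dbo.Orders AS o
JOIN dbo.RemoteCustomers AS c
    ON c.CustomerId = o.CustomerId;
```

Other connectors (Oracle, Teradata, MongoDB, Hadoop, generic ODBC) use a different `LOCATION` prefix and additional options, so the linked configuration guides remain the reference for each source.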
