Commit e1e31fd

20210225 1909
1 parent fe53812 commit e1e31fd

1 file changed

Lines changed: 6 additions & 7 deletions

docs/relational-databases/polybase/polybase-guide.md

@@ -28,27 +28,26 @@ monikerRange: ">=sql-server-2016||>=sql-server-linux-2017||>=aps-pdw-2016||=azur
 
 PolyBase is a data virtualization feature for [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)].
 
-Data virtualization allows you to use your [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] as a data hub, directly querying data from [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)], Oracle, SAP HANA, MongoDB, Hadoop clusters, Cosmos DB using T-SQL, and without separately installing client connection software. Data virtualization allows one T-SQL query to join the data from external sources and other SQL Server instances to relational tables in an instance of SQL Server.
+Data virtualization enables your [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] to query data directly from [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)], Oracle, SAP HANA, MongoDB, Hadoop clusters, Cosmos DB using T-SQL, without separately installing client connection software. Data virtualization allows one T-SQL query to join the data from external sources and other SQL Server instances to relational tables in an instance of SQL Server.
 
-A key use case for data virtualization is to allow the data to stay in its original location and format. You can virtualize the external data through the [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instance, so that it can be queried in place like any other table in SQL Server. This process minimizes the need for ETL processes to facilitate data movement. This data virtualization scenario is possible with the use of PolyBase connectors.
+A key use case for data virtualization is to allow the data to stay in its original location and format. You can virtualize the external data through the [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instance, so that it can be queried in place like any other table in SQL Server. This process minimizes the need for ETL processes to facilitate data movement. This data virtualization scenario is possible with the use of PolyBase connectors.
 
 To use PolyBase in an instance of [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)]:
 
 1. [Install PolyBase on Windows](polybase-installation.md) or [Install PolyBase on Linux](polybase-linux-setup.md)
 1. Create an [external data source](../../t-sql/statements/create-external-data-source-transact-sql.md)
 1. Create an [external table](../../t-sql/statements/create-external-table-transact-sql.md)
 
+### PolyBase connectors
+
 The PolyBase feature provides the connection to the external data source.
 
 * [!INCLUDE[sssql16-md](../../includes/sssql16-md.md)] introduced PolyBase with support for connections to Hadoop and Azure Blob Storage.
 * [!INCLUDE[sssql19-md](../../includes/sssql19-md.md)] introduced additional connectors, including SQL Server, Oracle, Teradata, and MongoDB.
+* Other unstructured non-relational tables are also supported with PolyBase, such as delimited text files.
 
 ![PolyBase logical](../../relational-databases/polybase/media/polybase-logical.png "PolyBase logical")
 
-PolyBase pushes some computations to the external source to optimize the overall query. For more information, see [Pushdown computations in PolyBase](polybase-pushdown-computation.md).
-
-PolyBase external access is not limited to Hadoop, other unstructured non-relational tables are also supported, such as delimited text files.
-
 Examples of external connectors include:
 
 - [SQL Server](polybase-configure-sql-server.md)
@@ -96,7 +95,7 @@ PolyBase enables the following scenarios in [!INCLUDE[ssNoVersion](../../include
 
 ## Performance
 
-- **Push computation to Hadoop.** The query optimizer makes a cost-based decision to push computation to Hadoop, if that will improve query performance. The query optimizer uses statistics on external tables to make the cost-based decision. Pushing computation creates MapReduce jobs and leverages Hadoop's distributed computational resources.
+- **Push computation to Hadoop.** PolyBase pushes some computations to the external source to optimize the overall query. The query optimizer makes a cost-based decision to push computation to Hadoop, if that will improve query performance. The query optimizer uses statistics on external tables to make the cost-based decision. Pushing computation creates MapReduce jobs and leverages Hadoop's distributed computational resources. For more information, see [Pushdown computations in PolyBase](polybase-pushdown-computation.md).
 
 - **Scale compute resources.** To improve query performance, you can use [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] [PolyBase scale-out groups](../../relational-databases/polybase/polybase-scale-out-groups.md). This enables parallel data transfer between [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instances and Hadoop nodes, and it adds compute resources for operating on the external data.
 