PolyBase is a data virtualization feature for [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)].

Data virtualization enables your [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instance to use T-SQL to query data directly from other [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instances, Oracle, SAP HANA, MongoDB, Hadoop clusters, and Cosmos DB, without separately installing client connection software. A single T-SQL query can join data from these external sources with relational tables in an instance of SQL Server.

A key use case for data virtualization is to leave the data in its original location and format. You can virtualize the external data through the [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instance so that it can be queried in place like any other table in SQL Server. This approach minimizes the need for ETL processes to move data. PolyBase connectors make this data virtualization scenario possible.

To use PolyBase in an instance of [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)]:
1. [Install PolyBase on Windows](polybase-installation.md) or [Install PolyBase on Linux](polybase-linux-setup.md)
1. Create an [external data source](../../t-sql/statements/create-external-data-source-transact-sql.md)
1. Create an [external table](../../t-sql/statements/create-external-table-transact-sql.md)
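The T-SQL below is a minimal sketch of steps 2 and 3 for a remote SQL Server source. It assumes SQL Server 2019 or later with PolyBase already installed. After creating the external table, it joins that table to a local table in one query. Every name is a hypothetical placeholder:

- the credential `RemoteSqlCredential` and the login `remote_reader`
- the data source `RemoteSqlServer` and the host `remote-host`
- the remote table `SalesDb.dbo.Orders`
- the local table `dbo.Customers`

```sql
-- Assumption: SQL Server 2019 or later with the PolyBase feature installed.
-- Enable PolyBase on the instance.
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;

-- A database master key protects the secret stored in the scoped credential.
-- Skip this statement if the database already has a master key.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- Credential used to sign in to the remote instance (hypothetical login).
CREATE DATABASE SCOPED CREDENTIAL RemoteSqlCredential
WITH IDENTITY = 'remote_reader', SECRET = '<remote password>';

-- Step 2: external data source pointing at the remote SQL Server instance.
CREATE EXTERNAL DATA SOURCE RemoteSqlServer
WITH (
    LOCATION = 'sqlserver://remote-host:1433',
    PUSHDOWN = ON,
    CREDENTIAL = RemoteSqlCredential
);

-- Step 3: external table; LOCATION is the remote three-part name.
-- Column names and types are assumed to match the remote table.
CREATE EXTERNAL TABLE dbo.RemoteOrders
(
    OrderID    INT  NOT NULL,
    CustomerID INT  NOT NULL,
    OrderDate  DATE NOT NULL
)
WITH (
    LOCATION = 'SalesDb.dbo.Orders',
    DATA_SOURCE = RemoteSqlServer
);

-- One T-SQL query that joins remote rows with a local table (hypothetical).
SELECT c.CustomerName, o.OrderID, o.OrderDate
FROM dbo.Customers AS c
JOIN dbo.RemoteOrders AS o
    ON o.CustomerID = c.CustomerID
WHERE o.OrderDate >= '2019-01-01';
```

Because `PUSHDOWN = ON` is set on the data source, the optimizer may choose to evaluate the `WHERE` filter on the remote instance rather than transferring every row.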
### PolyBase connectors

The PolyBase feature provides the connection to the external data source.

* [!INCLUDE[sssql16-md](../../includes/sssql16-md.md)] introduced PolyBase with support for connections to Hadoop and Azure Blob Storage.
* [!INCLUDE[sssql19-md](../../includes/sssql19-md.md)] introduced additional connectors, including SQL Server, Oracle, Teradata, and MongoDB.
* PolyBase also supports other non-relational data, such as delimited text files.

Examples of external connectors include:
- [SQL Server](polybase-configure-sql-server.md)
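The sketch below is for illustration only. It shows how the `LOCATION` prefix selects the connector for a few of the sources named above, and adds a Hadoop external table over delimited text files. Host names, ports, paths, and object names are hypothetical.

The 2019-era connectors assume existing database scoped credentials named `OracleCredential`, `TeradataCredential`, and `MongoCredential`. The Hadoop part assumes two things:

- a version that still supports `TYPE = HADOOP`, such as SQL Server 2016 through 2019
- the `hadoop connectivity` configuration option already set for the cluster

```sql
-- SQL Server 2019+ connectors: the LOCATION scheme selects the connector.
-- The credentials are assumed to exist already (hypothetical names).
CREATE EXTERNAL DATA SOURCE OracleSource
WITH (LOCATION = 'oracle://oracle-host:1521', CREDENTIAL = OracleCredential);

CREATE EXTERNAL DATA SOURCE TeradataSource
WITH (LOCATION = 'teradata://teradata-host:1025', CREDENTIAL = TeradataCredential);

CREATE EXTERNAL DATA SOURCE MongoSource
WITH (LOCATION = 'mongodb://mongo-host:27017', CREDENTIAL = MongoCredential);

-- Hadoop data source (TYPE = HADOOP syntax, SQL Server 2016 through 2019).
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = HADOOP, LOCATION = 'hdfs://hadoop-namenode:8020');

-- Describes comma-separated text files with double-quoted string values.
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"')
);

-- External table over the delimited files in the /clickstream/ directory.
CREATE EXTERNAL TABLE dbo.ClickStream
(
    url        VARCHAR(200) NOT NULL,
    event_date DATE         NOT NULL,
    user_ip    VARCHAR(50)  NOT NULL
)
WITH (
    LOCATION = '/clickstream/',
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = CsvFormat
);
```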
## Performance
- **Push computation to Hadoop.** PolyBase pushes some computations to the external source to optimize the overall query. The query optimizer uses statistics on external tables to make a cost-based decision to push computation to Hadoop when doing so improves query performance. Pushing computation creates MapReduce jobs and takes advantage of Hadoop's distributed computational resources. For more information, see [Pushdown computations in PolyBase](polybase-pushdown-computation.md).

- **Scale compute resources.** To improve query performance, you can use [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] [PolyBase scale-out groups](../../relational-databases/polybase/polybase-scale-out-groups.md). A scale-out group enables parallel data transfer between [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instances and Hadoop nodes, and it adds compute resources for operating on the external data.
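The following is a hedged sketch of both performance levers. It assumes an existing Hadoop-backed external table named `dbo.ClickStream` (hypothetical, with `url` and `event_date` columns). Its external data source is assumed to specify `RESOURCE_MANAGER_LOCATION`, which PolyBase requires before it can push computation to Hadoop. The head node name passed to `sp_polybase_join_group` is also a placeholder. Scale-out groups are available only on versions and platforms that support them.

```sql
-- Statistics on the external table give the optimizer the row counts and
-- value distributions it uses for its cost-based pushdown decision.
CREATE STATISTICS ClickStream_event_date
ON dbo.ClickStream (event_date)
WITH FULLSCAN;

-- Default behavior: the optimizer decides, based on cost, whether to push
-- work such as the WHERE filter to Hadoop as MapReduce jobs.
SELECT url, COUNT(*) AS hits
FROM dbo.ClickStream
WHERE event_date >= '2019-01-01'
GROUP BY url;

-- Override the cost-based decision for this query only.
SELECT url, COUNT(*) AS hits
FROM dbo.ClickStream
WHERE event_date >= '2019-01-01'
GROUP BY url
OPTION (FORCE EXTERNALPUSHDOWN);  -- or OPTION (DISABLE EXTERNALPUSHDOWN)

-- Scale-out group (where supported): run on a compute-node instance to join
-- the group. Arguments: head node address, DMS control channel port,
-- head node instance name (placeholder values).
EXEC sp_polybase_join_group N'head-node-host', 16450, N'MSSQLSERVER';
```

Query hints such as `FORCE EXTERNALPUSHDOWN` affect only the query they are attached to. Every other query keeps the statistics-driven default.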