Commit a680ef8

Merge pull request #30643 from MicrosoftDocs/WilliamDAssafMSFT-patch-1
20240513 Update polybase-pushdown-computation.md
2 parents bcfcf70 + b47ad14 commit a680ef8

1 file changed

Lines changed: 16 additions & 17 deletions

docs/relational-databases/polybase/polybase-pushdown-computation.md

@@ -5,7 +5,7 @@ description: Enable pushdown computation to improve performance of queries on yo
 author: MikeRayMSFT
 ms.author: mikeray
 ms.reviewer: wiassaf, nathansc
-ms.date: 7/11/2023
+ms.date: 5/13/2024
 ms.service: sql
 ms.subservice: polybase
 ms.topic: conceptual
@@ -37,7 +37,7 @@ This table summarizes pushdown computation support on different external data so
 | Data Source | Joins | Projections | Aggregations | Filters | Statistics |
 |------------------|--------|-------------|--------------|-----------|------------|
 | **Generic ODBC** | Yes | Yes | Yes | Yes | Yes |
-| **Oracle** | Yes | Yes | Yes | Yes | Yes |
+| **Oracle** | Yes\+ | Yes | Yes | Yes | Yes |
 | **[!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)]** | Yes | Yes | Yes | Yes | Yes |
 | **Teradata** | Yes | Yes | Yes | Yes | Yes |
 | **MongoDB\*** | **No** | Yes | Yes\*\*\* | Yes\*\*\* | Yes |
@@ -50,14 +50,16 @@ This table summarizes pushdown computation support on different external data so
 
 \*\*\* Pushdown support for aggregations and filters for the MongoDB ODBC connector for SQL Server 2019 was introduced with SQL Server 2019 CU18.
 
+\+ Oracle supports pushdown for joins but you might need to create statistics on the join columns to achieve pushdown.
+
 > [!NOTE]
 > Pushdown computation can be blocked by some T-SQL syntax. For more information, review [Syntax that prevents pushdown](polybase-pushdown-computation.md#syntax-that-prevents-pushdown).
 
 ### Pushdown computation and Hadoop providers
 
 PolyBase currently supports two Hadoop providers: Hortonworks Data Platform (HDP) and Cloudera Distributed Hadoop (CDH). There are no differences between the two features in terms of pushdown computation.
 
-To use the computation pushdown functionality with Hadoop, the target Hadoop cluster must have the core components of HDFS, YARN and MapReduce, with the job history server enabled. PolyBase submits the pushdown query via MapReduce and pulls status from the job history server. Without either component, the query fails.
+To use the computation pushdown functionality with Hadoop, the target Hadoop cluster must have the core components of HDFS, YARN, and MapReduce, with the job history server enabled. PolyBase submits the pushdown query via MapReduce and pulls status from the job history server. Without either component, the query fails.
 
 Some aggregation must occur after the data reaches [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)]. But a portion of the aggregation occurs in Hadoop. This method is common in computing aggregations in massively parallel processing systems.
 
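The Oracle footnote added in the hunk above says join pushdown might require statistics on the join columns. A minimal T-SQL sketch of what that could look like, assuming two hypothetical Oracle-backed external tables, `dbo.OracleOrders` and `dbo.OracleCustomers`, joined on a hypothetical `CustomerID` column; the statistics names are made up as well:

```sql
-- Hypothetical external tables that point at Oracle; substitute your own names.
-- Single-column statistics on each side of the join condition.
CREATE STATISTICS stats_OracleOrders_CustomerID
    ON dbo.OracleOrders (CustomerID);

CREATE STATISTICS stats_OracleCustomers_CustomerID
    ON dbo.OracleCustomers (CustomerID);

-- The join that the statistics are meant to help the optimizer push down to Oracle.
SELECT o.OrderID, c.CustomerName
FROM dbo.OracleOrders AS o
JOIN dbo.OracleCustomers AS c
    ON o.CustomerID = c.CustomerID;
```

Creating the statistics does not by itself guarantee pushdown; the footnote only says they might be needed to achieve it.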
@@ -85,14 +87,14 @@ In many cases, PolyBase can facilitate pushdown of the join operator for the joi
 
 If the join can be done at the external data source, this reduces the amount of data movement and improves the query's performance. Without join pushdown, the data from the tables to be joined must be brought locally into tempdb, then joined.
 
-Note that in the case of *distributed joins* (joining a local table to an external table), unless there is some filtering criteria on the external table that is applied to the join condition, all of the data in the external table must be brought locally into `tempdb` in order to perform the join operation. For example, the following query has no filtering on the external table join condition, which will result in all of the data from the external table being read.
+In the case of *distributed joins* (joining a local table to an external table), unless there is a filter on the joined external table, all of the data in the external table must be brought locally into `tempdb` in order to perform the join operation. For example, the following query has no filtering on the external table join condition, which will result in all of the data from the external table being read.
 
 ```sql
 SELECT * FROM LocalTable L
 JOIN ExternalTable E on L.id = E.id
 ```
 
-Since the join is on `E.id` column of the external table, if a filter condition is added to that column, the filter can be pushed down thereby reducing the number of rows read from the external table
+Since the join is on `E.id` column of the external table, if a filter condition is added to that column, the filter can be pushed down thereby reducing the number of rows read from the external table.
 
 ```sql
 SELECT * FROM LocalTable L
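The distributed-join lines in the hunk above explain that a filter on the external table's join column can be pushed down. A minimal T-SQL sketch contrasting the two shapes, reusing the placeholder names `LocalTable` and `ExternalTable` from the diff; the cutoff value `1000` is an arbitrary illustration:

```sql
-- No predicate on ExternalTable: all of its rows are read into tempdb for the join.
SELECT *
FROM LocalTable AS L
JOIN ExternalTable AS E ON L.id = E.id;

-- Predicate on the external join column E.id: the filter can be pushed down,
-- so fewer rows are read from ExternalTable. (1000 is an arbitrary example value.)
SELECT *
FROM LocalTable AS L
JOIN ExternalTable AS E ON L.id = E.id
WHERE E.id < 1000;
```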
@@ -115,7 +117,7 @@ SELECT * FROM SensorData WHERE Speed > 65;
 
 Use predicate pushdown to improve performance for a query that selects a subset of columns from an external table.
 
-In this query, [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] initiates a map-reduce job to pre-process the Hadoop delimited-text file so that only the data for the two columns, customer.name and customer.zip_code, will be copied to [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)].
+In this query, [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] initiates a map-reduce job to preprocess the Hadoop delimited-text file so that only the data for the two columns, customer.name and customer.zip_code, will be copied to [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)].
 
 ```sql
 SELECT customer.name, customer.zip_code
@@ -187,7 +189,7 @@ Date & time functions
 
 ## Syntax that prevents pushdown
 
-The following T-SQL functions or syntax will prevent pushdown computation:
+The following T-SQL functions or syntax prevents pushdown computation:
 
 - `AT TIME ZONE`
 - `CONCAT_WS`
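Per the list in the hunk above, a predicate that uses `AT TIME ZONE` blocks pushdown. A sketch against a hypothetical external table `dbo.ExternalOrders` with a hypothetical `datetime2` column `OrderDate`; the date literal is arbitrary:

```sql
-- AT TIME ZONE in the WHERE clause prevents pushdown of this filter,
-- so SQL Server evaluates it locally after reading the rows.
SELECT OrderID, OrderDate
FROM dbo.ExternalOrders
WHERE (OrderDate AT TIME ZONE 'UTC') > '2024-01-01';

-- The same comparison written without AT TIME ZONE avoids the blocking syntax.
SELECT OrderID, OrderDate
FROM dbo.ExternalOrders
WHERE OrderDate > '2024-01-01';
```

Because `OrderDate` is assumed to be `datetime2`, `AT TIME ZONE 'UTC'` only attaches a +00:00 offset, so both queries should return the same rows; only the first one contains syntax from the list.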
@@ -212,7 +214,7 @@ Pushdown support for the `FORMAT` and `TRIM` syntax was introduced in [!INCLUDE[
 
 ### Filter clause with variable
 
-If you are specifying a variable in a filter clause, by default this will prevent pushdown of the filter clause. For example, if you run the following query, the filter clause will not be pushed down:
+When specifying a variable in a filter clause, by default this prevents pushdown of the filter clause. For example, if you run the following query, the filter clause will not be pushed down:
 
 ```sql
 DECLARE @BusinessEntityID INT
@@ -223,21 +225,21 @@ WHERE BusinessEntityID = @BusinessEntityID;
 
 To achieve pushdown of the variable, you need to enable query optimizer hotfixes functionality. This can be done in any of the following ways:
 - Instance Level: Enable trace flag 4199 as a startup parameter for the instance
-- Database Level: In the context of the database that has the PolyBase external objects, execute ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON
+- Database Level: In the context of the database that has the PolyBase external objects, execute `ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON`
 - Query level:
-Use query hint OPTION (QUERYTRACEON 4199) or OPTION (USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES'))
+Use query hint `OPTION (QUERYTRACEON 4199)` or `OPTION (USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES'))`
 
 This limitation applies to execution of [sp_executesql](../system-stored-procedures/sp-executesql-transact-sql.md). The limitation also applies to utilization of some functions in the filter clause.
 
-Note: The ability to pushdown the variable was first introduced in SQL Server 2019 CU5.
+The ability to pushdown the variable was first introduced in SQL Server 2019 CU5.
 
 ### Collation conflict
 
-When working with data with different collation pushdown might not be possible, operators like `COLLATE` can also interfere with the outcome. Equal collations or binary collations are supported. For more information, see [How to tell if pushdown occurred](polybase-how-to-tell-pushdown-computation.md).
+Pushdown might not be possible with data with different collations. Operators like `COLLATE` can also interfere with the outcome. Equal collations or binary collations are supported. For more information, see [How to tell if pushdown occurred](polybase-how-to-tell-pushdown-computation.md).
 
 ## Pushdown for parquet files
 
-Starting in [!INCLUDE[sssql22-md](../../includes/sssql22-md.md)], PolyBase introduced support for parquet files. SQL Server is capable of performing both row and column elimination when performing pushdown with parquet. When working with parquet files, the following operations can be pushed down:
+Starting in [!INCLUDE[sssql22-md](../../includes/sssql22-md.md)], PolyBase introduced support for parquet files. SQL Server is capable of performing both row and column elimination when performing pushdown with parquet. With parquet files, the following operations can be pushed down:
 
 - Binary comparison operators (>, >=, <=, <) for numeric, date, and time values.
 - Combination of comparison operators (> AND <, >= AND <, > AND <=, <= AND >=).
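The *Filter clause with variable* lines in the hunks above list three ways to enable query optimizer hotfixes so a variable in a filter can be pushed down. A sketch of the database-level and query-level options, assuming a hypothetical external table `dbo.ExternalPerson` with a `BusinessEntityID` column; the instance-level option (trace flag 4199 as a startup parameter) is set outside T-SQL and is not shown:

```sql
-- Database level: run in the database that holds the PolyBase external objects.
ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON;

-- Query level: enable the hotfixes for this statement only.
-- dbo.ExternalPerson is a hypothetical external table; 1 is an arbitrary value.
DECLARE @BusinessEntityID INT = 1;

SELECT *
FROM dbo.ExternalPerson
WHERE BusinessEntityID = @BusinessEntityID
OPTION (USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES'));
```

Either option should be enough on its own: the database-scoped setting affects every query in that database, while the hint limits the change to a single statement.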
@@ -293,10 +295,7 @@ WHERE Speed > 65
 OPTION (DISABLE EXTERNALPUSHDOWN);
 ```
 
-## Next steps
+## Related content
 
 - For more information about PolyBase, see [Introducing data virtualization with PolyBase](polybase-guide.md)
-
-## See also
-
 - [How to tell if external pushdown occurred](polybase-how-to-tell-pushdown-computation.md)
