Commit d3299b2

Merge pull request #21200 from WilliamDAssafMSFT/release-bdc-2022
20220211 bdc adjacent feature retirement
2 parents cee9f24 + 6e4b431 commit d3299b2

9 files changed

Lines changed: 72 additions & 26 deletions

docs/big-data-cluster/big-data-options.md

Lines changed: 19 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ description: This article discusses migration strategies for SQL Server 2019 Big
55
author: WilliamDAssafMSFT
66
ms.author: wiassaf
77
ms.reviewer: dacoelho
8-
ms.date: 02/15/2022
8+
ms.date: 02/22/2022
99
ms.topic: conceptual
1010
ms.prod: sql
1111
ms.technology: big-data-cluster
@@ -23,6 +23,21 @@ The [!INCLUDE[ssbigdataclusters-ver15](../includes/ssbigdataclusters-ver15.md)]
2323

2424
On January 31, 2025, we will be retiring [!INCLUDE[ssbigdataclusters-ver15](../includes/ssbigdataclusters-ver15.md)]. All existing users of SQL Server 2019 with Software Assurance will be fully supported on the platform and the software will continue to be maintained through SQL Server cumulative updates until that time. **For more information, see [the announcement blog post](https://aka.ms/sqlserver_bigdataclusters).**
2525

26+
## Changes to PolyBase support in SQL Server
27+
28+
The [!INCLUDE[ssbigdataclusters-ver15](../includes/ssbigdataclusters-ver15.md)] retirement also affects some features related to scale-out queries.
29+
30+
The PolyBase scale-out groups feature of Microsoft SQL Server has been retired. Scale-out group functionality will be removed from the product in SQL Server 2022. In-market SQL Server 2019, 2017 and 2016 will continue to support the functionality to the end of life of those products. PolyBase data virtualization will continue to be fully supported as a scale-up feature in SQL Server.
31+
32+
Cloudera (CDP) and Hortonworks (HDP) external data sources will also be retired for all in-market versions of SQL Server and will not be included in SQL Server 2022. Moving forward, support for external data sources will be limited to product versions in mainstream support by the respective vendor. You are encouraged to use the new object storage integration available in SQL Server 2022. Integration with HDFS will also be added to SQL Server 2022 in a future CTP using a new webhdfs connector.
33+
34+
Connectivity to HDFS and object storage will now use publicly documented REST APIs instead of a Java Hadoop client. In SQL Server 2022, users will need to configure their external data sources to use new connectors when connecting to Azure Storage. The following table summarizes the change:
35+
36+
| External Data Source | From | To |
37+
|:--|:--|:--|
38+
| Azure Blob Storage | wasb[s] | abs |
39+
| ADLS Gen 2 | abfs[s] | adls |
40+
2641
## Understanding the Big Data Clusters architecture for replacement and migration options
2742

2843
To create your replacement solution for a Big Data storage and processing system, it's important to understand what [!INCLUDE[ssbigdataclusters-ver15](../includes/ssbigdataclusters-ver15.md)] provides, and its architecture can help inform your choices. The architecture of a big data cluster is as follows:
@@ -37,7 +52,7 @@ This architecture provides the following functionality mapping:
3752
|Big Data Clusters Controller | Provides management and security for the cluster. It contains the control service, the configuration store, and other cluster-level services such as Kibana, Grafana, and Elastic Search. |
3853
|Compute Pool | Provides computational resources to the cluster. It contains nodes running SQL Server on Linux pods. The pods in the compute pool are divided into SQL Compute instances for specific processing tasks. This component also provides Data Virtualization using PolyBase to query external data sources without moving or copying the data.|
3954
|Data Pool | Provides data persistence for the cluster. The data pool consists of one or more pods running SQL Server on Linux. It is used to ingest data from SQL queries or Spark jobs.|
40-
|Storage Pool | The storage pool consists of storage pool pods comprised of SQL Server on Linux, Spark, and HDFS. All the storage nodes in a SQL Server big data cluster are members of an HDFS cluster.|
55+
|Storage Pool | The storage pool consists of storage pool pods comprised of SQL Server on Linux, Spark, and HDFS. All the storage nodes in a big data cluster are members of an HDFS cluster.|
4156
| App Pool | Enables the deployment of applications on a big data cluster by providing interfaces to create, manage, and run applications.|
4257
|||
4358

@@ -174,13 +189,13 @@ SQL Server 2022 (either on-premises, in-cloud, or both) contains a new feature t
174189

175190
For your operational and even much of your analytic workloads, SQL Server can handle massive database sizes. For more information on maximum capacity specifications for SQL Server, see [Compute capacity limits by edition of SQL Server](../sql-server/maximum-capacity-specifications-for-sql-server.md). Using multiple SQL Server instances on separate machines with partitioned T-SQL requests allows a scale-out environment for applications.
176191

177-
Using PolyBase enables your SQL Server instance to query data with T-SQL directly from SQL Server, Oracle, Teradata, MongoDB, commercial Hadoop clusters, and Cosmos DB without separately installing client connection software. You can also use the generic ODBC connector on a Microsoft Windows-based Instance to connect to additional providers using third-party ODBC drivers. PolyBase allows T-SQL queries to join the data from external sources to relational tables in an instance of SQL Server. This allows the data to stay in its original location and format. You can virtualize the external data through the SQL Server instance, so that it can be queried in place like any other table in SQL Server. SQL Server 2022 also allows ad-hoc queries and backup/restore over Object-Store (using the S3-API) hardware or software storage options.
192+
Using PolyBase enables your SQL Server instance to query data with T-SQL directly from SQL Server, Oracle, Teradata, MongoDB, and Cosmos DB without separately installing client connection software. You can also use the generic ODBC connector on a Microsoft Windows-based Instance to connect to additional providers using third-party ODBC drivers. PolyBase allows T-SQL queries to join the data from external sources to relational tables in an instance of SQL Server. This allows the data to stay in its original location and format. You can virtualize the external data through the SQL Server instance, so that it can be queried in place like any other table in SQL Server. SQL Server 2022 also allows ad-hoc queries and backup/restore over Object-Store (using the S3-API) hardware or software storage options.
178193

179194
Two general reference architectures are to use SQL Server on a stand-alone server for structured data queries and a separate installation of a scale-out non-relational system (such as Apache Hadoop or Apache Spark) for on-premises Link to Synapse, and the other option is to use a set of containers in a Kubernetes cluster with all components for your solution.
180195

181196
### Microsoft SQL Server on Windows, Apache Spark, and Object Storage On-Premises
182197

183-
You can install SQL Server on Windows or Linux, and scale up the hardware architecture, leveraging the SQL Server 2022 object-storage query capability and the PolyBase Feature to enable queries across all data in your system.
198+
You can install SQL Server on Windows or Linux, and scale up the hardware architecture, leveraging the SQL Server 2022 object-storage query capability and the PolyBase feature to enable queries across all data in your system.
184199

185200
Installing and configuring a scale-out platform such as Apache Hadoop or Apache Spark allows for querying non-relational data at scale. Using a central set of Object-Storage systems that support the S3-API allows both SQL Server 2022 and Spark to access the same set of data across all systems.
186201

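The connector prefix change listed in the table added to big-data-options.md above (`wasb[s]` to `abs`, `abfs[s]` to `adls`) can be sketched as a small rewrite of the location string. This is a minimal illustration, not part of the commit: the helper name `migrate_location` and the sample location are hypothetical, and the sketch only swaps the prefix. It assumes the rest of the location, as well as any credential settings, are reviewed separately against the SQL Server 2022 documentation.

```python
# Hypothetical helper, not part of the commit: maps the legacy prefixes
# from the table to the SQL Server 2022 prefixes.
PREFIX_MAP = {
    "wasb": "abs",    # Azure Blob Storage
    "wasbs": "abs",
    "abfs": "adls",   # ADLS Gen 2
    "abfss": "adls",
}


def migrate_location(location: str) -> str:
    """Return the location with a legacy prefix replaced, if one is present."""
    scheme, sep, rest = location.partition("://")
    if not sep:
        # No "prefix://" part found, so leave the value untouched.
        return location
    new_scheme = PREFIX_MAP.get(scheme.lower())
    if new_scheme is None:
        # Prefix is not one of the retired ones, so leave it as-is.
        return location
    return f"{new_scheme}://{rest}"


if __name__ == "__main__":
    # Placeholder container and account names, for illustration only.
    print(migrate_location("wasbs://mycontainer@myaccount.blob.core.windows.net"))
```

Locations that carry any other prefix pass through unchanged, so the helper can be run over a mixed list of external data source locations.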
docs/big-data-cluster/index.yml

Lines changed: 7 additions & 3 deletions
@@ -1,7 +1,7 @@
11
### YamlMime:Landing
22

33
title: Big Data Clusters
4-
summary: SQL Server Big Data Clusters is the multi-cloud, open data platform for analytics at any scale. Big Data Clusters unites SQL Server with Apache Spark to deliver the best compute engines available for analytics in a single, easy to use deployment. With these engines, Big Data Clusters is the ideal data platform for AI, ML, M/R, Streaming, BI, T-SQL, and Spark. Delivered as part of the SQL Server 2019 release, Big Data Clusters is a cloud-native solution orchestrated by Kubernetes. Our mission is to accelerate, delight, and empower our users as they quench their thirst for data driven insights.
4+
summary: SQL Server 2019 Big Data Clusters is the multi-cloud, open data platform for analytics at any scale. Big Data Clusters unites SQL Server with Apache Spark to deliver the best compute engines available for analytics in a single, easy to use deployment. With these engines, Big Data Clusters is the ideal data platform for AI, ML, M/R, Streaming, BI, T-SQL, and Spark. Delivered as part of the SQL Server 2019 release, Big Data Clusters is a cloud-native solution orchestrated by Kubernetes. Our mission is to accelerate, delight, and empower our users as they quench their thirst for data driven insights. The Microsoft SQL Server 2019 Big Data Clusters add-on will be retired. Support for SQL Server 2019 Big Data Clusters will end on January 14, 2025.
55

66
metadata:
77
title: Big Data Clusters - Learn how to manage, deploy, and use
@@ -11,7 +11,7 @@ metadata:
1111
ms.technology: big-data-cluster
1212
author: WilliamDAssafMSFT
1313
ms.author: wiassaf
14-
ms.date: 09/07/2021
14+
ms.date: 02/22/2022
1515
ms.prod: sql
1616

1717
# linkListType: architecture | concept | deploy | download | get-started | how-to-guide | learn | overview | quickstart | reference | sample | tutorial | video | whats-new
@@ -22,12 +22,16 @@ landingContent:
2222
# Card (Get started)
2323
- title: About Big Data Clusters
2424
linkLists:
25+
- linkListType: architecture
26+
links:
27+
- text: Big data options on the Microsoft SQL Server platform
28+
url: big-data-options.md
2529
- linkListType: overview
2630
links:
2731
- text: Introducing Big Data Clusters
2832
url: big-data-cluster-overview.md
2933
- text: Big Data Clusters FAQ
30-
url: big-data-cluster-faq.yml
34+
url: big-data-cluster-faq.yml
3135
- linkListType: whats-new
3236
links:
3337
- text: What's new

docs/includes/bdc-banner-retirement.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ ms.author: wiassaf
44
ms.prod: sql
55
ms.technology: big-data-cluster
66
ms.topic: include
7-
ms.date: 01/26/2022
7+
ms.date: 02/22/2022
88
---
99

1010
> [!IMPORTANT]
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
---
2+
author: WilliamDAssafMSFT
3+
ms.author: wiassaf
4+
ms.prod: sql
5+
ms.technology: polybase
6+
ms.topic: include
7+
ms.date: 02/22/2022
8+
---
9+
10+
SQL Server support for HDFS Cloudera (CDP) and Hortonworks (HDP) external data sources will be retired and will not be included in SQL Server 2022. For more information, see [Big data options on the Microsoft SQL Server platform](../big-data-cluster/big-data-options.md).
Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
---
2+
author: WilliamDAssafMSFT
3+
ms.author: wiassaf
4+
ms.prod: sql
5+
ms.technology: polybase
6+
ms.topic: include
7+
ms.date: 02/22/2022
8+
---
9+
10+
> [!IMPORTANT]
11+
> The Microsoft SQL Server PolyBase scale-out groups will be retired. Scale-out group functionality will be removed from the product in SQL Server 2022. PolyBase data virtualization will continue to be fully supported as a scale-up feature in SQL Server. For more information, see [Big data options on the Microsoft SQL Server platform](../big-data-cluster/big-data-options.md).

docs/relational-databases/polybase/configure-scale-out-groups-windows.md

Lines changed: 3 additions & 1 deletion
@@ -1,7 +1,7 @@
11
---
22
title: "Configure PolyBase scale-out groups on Windows"
33
description: Set up a PolyBase scale-out group to create a cluster of SQL Server instances. This improves query performance for large data sets from external sources.
4-
ms.date: 08/05/2021
4+
ms.date: 02/22/2022
55
ms.prod: sql
66
ms.technology: polybase
77
ms.topic: "tutorial"
@@ -16,6 +16,8 @@ monikerRange: ">= sql-server-2016"
1616

1717
This article describes how to set up a [PolyBase scale-out group](polybase-scale-out-groups.md) on Windows. This creates a cluster of SQL Server instances to process large data sets from external data sources, such as Hadoop or Azure Blob Storage, in a scale-out fashion for better query performance.
1818

19+
[!INCLUDE[polybase-scaleout-banner-retirement](../../includes/polybase-scaleout-banner-retirement.md)]
20+
1921
## Prerequisites
2022

2123
- More than one machine in the same domain.

docs/relational-databases/polybase/polybase-guide.md

Lines changed: 5 additions & 3 deletions
@@ -1,7 +1,7 @@
11
---
22
title: "Introducing data virtualization with PolyBase"
33
description: PolyBase enables your SQL Server instance to process Transact-SQL queries that read data from external data sources such as Hadoop and Azure blob storage.
4-
ms.date: 03/23/2021
4+
ms.date: 02/22/2022
55
ms.prod: sql
66
ms.technology: polybase
77
ms.topic: "overview"
@@ -73,7 +73,7 @@ PolyBase provides these same functionalities for the following SQL products from
7373
- [MongoDB](polybase-configure-mongodb.md)
7474
- [Hadoop](polybase-configure-hadoop.md)*
7575

76-
\* PolyBase supports two Hadoop providers, Hortonworks Data Platform (HDP) and Cloudera Distributed Hadoop (CDH).
76+
\* PolyBase supports two Hadoop providers, Hortonworks Data Platform (HDP) and Cloudera Distributed Hadoop (CDH), through SQL Server 2019. [!INCLUDE[polybase-java-connector-banner-retirement](../../includes/polybase-java-connector-banner-retirement.md)]
7777

7878
To use PolyBase in an instance of [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)]:
7979

@@ -117,7 +117,9 @@ PolyBase enables the following scenarios in [!INCLUDE[ssNoVersion](../../include
117117

118118
- **Push computation to Hadoop.** PolyBase pushes some computations to the external source to optimize the overall query. The query optimizer makes a cost-based decision to push computation to Hadoop, if that will improve query performance. The query optimizer uses statistics on external tables to make the cost-based decision. Pushing computation creates MapReduce jobs and leverages Hadoop's distributed computational resources. For more information, see [Pushdown computations in PolyBase](polybase-pushdown-computation.md).
119119

120-
- **Scale compute resources.** To improve query performance, you can use [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] [PolyBase scale-out groups](../../relational-databases/polybase/polybase-scale-out-groups.md). This enables parallel data transfer between [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instances and Hadoop nodes, and it adds compute resources for operating on the external data.
120+
- **Scale compute resources.** To improve query performance, you can use [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] [PolyBase scale-out groups](../../relational-databases/polybase/polybase-scale-out-groups.md). This enables parallel data transfer between [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instances and Hadoop nodes, and it adds compute resources for operating on the external data.
121+
122+
[!INCLUDE[polybase-scaleout-banner-retirement](../../includes/polybase-scaleout-banner-retirement.md)]
121123

122124
## Next steps
123125

docs/relational-databases/polybase/polybase-scale-out-groups.md

Lines changed: 7 additions & 6 deletions
@@ -1,7 +1,7 @@
11
---
22
title: "PolyBase scale-out groups | Microsoft Docs"
33
description: Use the PolyBase Group feature to create a cluster of SQL Server instances. This improves query performance for large data sets from external sources.
4-
ms.date: 04/23/2019
4+
ms.date: 02/22/2022
55
ms.prod: sql
66
ms.technology: polybase
77
ms.topic: conceptual
@@ -11,9 +11,8 @@ helpviewer_keywords:
1111
- "PolyBase"
1212
- "PolyBase, scale-out groups"
1313
- "scale-out PolyBase"
14-
ms.assetid: c7810135-4d63-4161-93ab-0e75e9d10ab5
15-
author: MikeRayMSFT
16-
ms.author: mikeray
14+
author: WilliamDAssafMSFT
15+
ms.author: wiassaf
1716
ms.reviewer: ""
1817
monikerRange: ">= sql-server-2016"
1918
---
@@ -23,6 +22,8 @@ monikerRange: ">= sql-server-2016"
2322

2423
A standalone SQL Server instance with PolyBase can become a performance bottleneck when dealing with massive data sets in Hadoop or Azure Blob Storage. The PolyBase Group feature allows you to create a cluster of SQL Server instances to process large data sets from external data sources, such as Hadoop or Azure Blob Storage, in a scale-out fashion for better query performance. You can now scale your SQL Server compute to meet the performance demands of your workload. PolyBase Scale-out Groups, a group of SQL Server instances, enable you to process large external data sets in a parallel processing architecture. Data loading and query performance can increase linearly as you add more SQL Server instances to the group.
2524

25+
[!INCLUDE[polybase-scaleout-banner-retirement](../../includes/polybase-scaleout-banner-retirement.md)]
26+
2627
See [Get started with PolyBase](./polybase-guide.md) and [PolyBase Guide](../../relational-databases/polybase/polybase-guide.md).
2728

2829
![Diagram showing PolyBase scale-out groups.](../../relational-databases/polybase/media/polybase-scale-out-groups.png "PolyBase scale-out groups")
@@ -39,10 +40,10 @@ A compute node contains the SQL Server instance that assists with scale-out quer
3940

4041
When querying external SQL Server, Oracle or Teradata instances, partitioned tables will benefit from scale-out reads. Each node in a PolyBase scale-out group can start up to 8 readers to read external data, and each reader is assigned one partition of the external table to read.
4142

42-
For e.g., let's say you have an external SQL Server table with 12 monthly partitions and a 3-node PolyBase scale-out group, each node will use 4 PolyBase readers to process each of the 12 partitions. This is illustrated in the image below.
43+
For example, say you have an external SQL Server table with 12 monthly partitions and a 3-node PolyBase scale-out group. Each node will use 4 PolyBase readers, so that together the nodes process all 12 partitions. This is illustrated in the following image.
4344

4445
> [!NOTE]
45-
> that this is different from scale-out reads over Hadoop.
46+
> This is different from scale-out reads over Hadoop.
4647
4748
![PolyBase scale-out reads](../../relational-databases/polybase/media/polybase-scale-out-groups2.png "PolyBase scale-out groups")
4849

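The reader arithmetic in the polybase-scale-out-groups.md example above (12 partitions, 3 nodes, 4 readers per node) can be checked with a short sketch. The function name `readers_per_node` is hypothetical. The even spread of partitions across nodes is an assumption made for illustration, and the sketch does not model how the product schedules partitions beyond the 8-reader limit stated in the article.

```python
import math

# From the article: each node can start up to 8 readers.
MAX_READERS_PER_NODE = 8


def readers_per_node(partitions: int, nodes: int) -> int:
    """Estimate readers per node, assuming partitions are spread evenly.

    Each reader is assigned one partition, so a node needs one reader per
    partition it handles, capped at MAX_READERS_PER_NODE.
    """
    if partitions < 0 or nodes <= 0:
        raise ValueError("partitions must be >= 0 and nodes must be > 0")
    return min(MAX_READERS_PER_NODE, math.ceil(partitions / nodes))


if __name__ == "__main__":
    # The article's example: 12 monthly partitions on a 3-node group.
    print(readers_per_node(12, 3))
```

With the article's numbers, the sketch yields 4 readers per node, which matches the figure given in the text.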