Merge pull request #21200 from WilliamDAssafMSFT/release-bdc-2022

hmacgregor1 · web-flow · commit d3299b25b3a1 · 2022-02-17T21:24:01.000-08:00
20220211 bdc adjacent feature retirement
diff --git a/docs/big-data-cluster/big-data-options.md b/docs/big-data-cluster/big-data-options.md
@@ -5,7 +5,7 @@ description: This article discusses migration strategies for SQL Server 2019 Big
 author: WilliamDAssafMSFT
 ms.author: wiassaf
 ms.reviewer: dacoelho
-ms.date: 02/15/2022
+ms.date: 02/22/2022
 ms.topic: conceptual
 ms.prod: sql
 ms.technology: big-data-cluster
@@ -23,6 +23,21 @@ The [!INCLUDE[ssbigdataclusters-ver15](../includes/ssbigdataclusters-ver15.md)]
 
 On January 31, 2025, we will be retiring [!INCLUDE[ssbigdataclusters-ver15](../includes/ssbigdataclusters-ver15.md)]. All existing users of SQL Server 2019 with Software Assurance will be fully supported on the platform and the software will continue to be maintained through SQL Server cumulative updates until that time. **For more information, see [the announcement blog post](https://aka.ms/sqlserver_bigdataclusters).**
 
+## Changes to PolyBase support in SQL Server 
+
+Along with the [!INCLUDE[ssbigdataclusters-ver15](../includes/ssbigdataclusters-ver15.md)] retirement, some features related to scale-out queries are also being retired.
+
+The PolyBase scale-out groups feature of Microsoft SQL Server has been retired. Scale-out group functionality will be removed from the product in SQL Server 2022. In-market SQL Server 2019, 2017 and 2016 will continue to support the functionality to the end of life of those products. PolyBase data virtualization will continue to be fully supported as a scale-up feature in SQL Server. 
+
+Cloudera (CDP) and Hortonworks (HDP) external data sources will also be retired for all in-market versions of SQL Server and will not be included in SQL Server 2022. Moving forward, support for external data sources will be limited to product versions in mainstream support by the respective vendor. You are encouraged to use the new object storage integration available in SQL Server 2022. Integration with HDFS will also be added to SQL Server 2022 in a future CTP using a new webhdfs connector.
+
+Connectivity to HDFS and object storage will now use publicly documented REST APIs instead of a Java Hadoop client. In SQL Server 2022, users will need to configure their external data sources to use new connectors when connecting to Azure Storage. The following table summarizes the change:
+
+| External Data Source | From | To |
+|:--|:--|:--|
+| Azure Blob Storage | wasb[s] | abs |
+| ADLS Gen 2 | abfs[s] | adls |
+
 ## Understanding the Big Data Clusters architecture for replacement and migration options
 
 To create your replacement solution for a Big Data storage and processing system, it's important to understand what [!INCLUDE[ssbigdataclusters-ver15](../includes/ssbigdataclusters-ver15.md)] provides, and its architecture can help inform your choices. The architecture of a big data cluster is as follows:
@@ -37,7 +52,7 @@ This architecture provides the following functionality mapping:
 |Big Data Clusters Controller | Provides management and security for the cluster. It contains the control service, the configuration store, and other cluster-level services such as Kibana, Grafana, and Elastic Search. |
 |Compute Pool | Provides computational resources to the cluster. It contains nodes running SQL Server on Linux pods. The pods in the compute pool are divided into SQL Compute instances for specific processing tasks. This component also provides Data Virtualization using PolyBase to query external data sources without moving or copying the data.|
 |Data Pool | Provides data persistence for the cluster. The data pool consists of one or more pods running SQL Server on Linux. It is used to ingest data from SQL queries or Spark jobs.|
-|Storage Pool | The storage pool consists of storage pool pods comprised of SQL Server on Linux, Spark, and HDFS. All the storage nodes in a SQL Server big data cluster are members of an HDFS cluster.|
+|Storage Pool | The storage pool consists of storage pool pods comprised of SQL Server on Linux, Spark, and HDFS. All the storage nodes in a big data cluster are members of an HDFS cluster.|
 | App Pool | Enables the deployment of applications on a big data cluster by providing interfaces to create, manage, and run applications.|
 |||
 
@@ -174,13 +189,13 @@ SQL Server 2022 (either on-premises, in-cloud, or both) contains a new feature t
 
 For your operational and even much of your analytic workloads, SQL Server can handle massive database sizes - for more information on maximum capacity specifications for SQL Server, see [Compute capacity limits by edition of SQL Server]()../sql-server/maximum-capacity-specifications-for-sql-server.md). Using multiple SQL Server Instances on separate machines with partitioned T-SQL requests allow a scale-out environment for applications. 
 
-Using PolyBase enables your SQL Server instance to query data with T-SQL directly from SQL Server, Oracle, Teradata, MongoDB, commercial Hadoop clusters, and Cosmos DB without separately installing client connection software. You can also use the generic ODBC connector on a Microsoft Windows-based Instance to connect to additional providers using third-party ODBC drivers. PolyBase allows T-SQL queries to join the data from external sources to relational tables in an instance of SQL Server. This allows the data to stay in its original location and format. You can virtualize the external data through the SQL Server instance, so that it can be queried in place like any other table in SQL Server. SQL Server 2022 also allows ad-hoc queries and backup/restore over Object-Store (using the S3-API) hardware or software storage options.
+Using PolyBase enables your SQL Server instance to query data with T-SQL directly from SQL Server, Oracle, Teradata, MongoDB, and Cosmos DB without separately installing client connection software. You can also use the generic ODBC connector on a Microsoft Windows-based Instance to connect to additional providers using third-party ODBC drivers. PolyBase allows T-SQL queries to join the data from external sources to relational tables in an instance of SQL Server. This allows the data to stay in its original location and format. You can virtualize the external data through the SQL Server instance, so that it can be queried in place like any other table in SQL Server. SQL Server 2022 also allows ad-hoc queries and backup/restore over Object-Store (using the S3-API) hardware or software storage options.
 
 Two general reference architectures are to use SQL Server on a stand-alone server for structured data queries and a separate installation of a scale-out non-relational system (such as Apache Hadoop or Apache Spark) for on-premises Link to Synapse, and the other option is to use a set of containers in a Kubernetes cluster with all components for your solution.
 
 ### Microsoft SQL Server on Windows, Apache Spark, and Object Storage On-Premises
 
-You can install SQL Server on Windows or Linux, and scale up the hardware architecture, leveraging the SQL Server 2022 object-storage query capability and the PolyBase Feature to enable queries across all data in your system.
+You can install SQL Server on Windows or Linux, and scale up the hardware architecture, leveraging the SQL Server 2022 object-storage query capability and the PolyBase feature to enable queries across all data in your system.
 
 Installing and configuring a scale-out platform such as Apache Hadoop or Apache Spark allows for querying non-relational data at scale. Using a central set of Object-Storage systems that support the S3-API allows both SQL Server 2022 and Spark to access the same set of data across all systems.
 
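Editorial sketch (not part of the commit): the connector table added in `big-data-options.md` above maps `wasb[s]` to `abs` and `abfs[s]` to `adls`. As a rough illustration of that mapping only, the following Python helper swaps the prefix of an external data source location string. The function name `new_connector_prefix` and the sample location strings are hypothetical, and the sketch assumes that only the prefix changes; the diff does not say whether the rest of the location needs to change, so check the `CREATE EXTERNAL DATA SOURCE` documentation before relying on it.

```python
# Prefix mapping taken from the table in this change:
#   Azure Blob Storage: wasb[s] -> abs
#   ADLS Gen 2:         abfs[s] -> adls
CONNECTOR_PREFIXES = {
    "wasb": "abs",
    "wasbs": "abs",
    "abfs": "adls",
    "abfss": "adls",
}


def new_connector_prefix(location: str) -> str:
    """Return `location` with its prefix swapped for the SQL Server 2022 one.

    Hypothetical helper: only the prefix before "://" is rewritten. The rest
    of the string is passed through untouched, and prefixes that are not in
    the table are left as they are.
    """
    scheme, sep, rest = location.partition("://")
    if not sep:
        raise ValueError(f"expected a 'prefix://...' location, got {location!r}")
    new_scheme = CONNECTOR_PREFIXES.get(scheme.lower(), scheme)
    return f"{new_scheme}://{rest}"


if __name__ == "__main__":
    # Placeholder account and container names, for illustration only.
    print(new_connector_prefix("wasbs://container@account.blob.core.windows.net"))
    print(new_connector_prefix("abfss://container@account.dfs.core.windows.net"))
```

A lookup table keeps the sketch aligned with the two rows of the documented table; anything outside those rows is deliberately passed through rather than guessed at.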
diff --git a/docs/big-data-cluster/index.yml b/docs/big-data-cluster/index.yml
@@ -1,7 +1,7 @@
 ### YamlMime:Landing
 
 title: Big Data Clusters
-summary: SQL Server Big Data Clusters is the multi-cloud, open data platform for analytics at any scale. Big Data Clusters unites SQL Server with Apache Spark to deliver the best compute engines available for analytics in a single, easy to use deployment. With these engines, Big Data Clusters is the ideal data platform for AI, ML, M/R, Streaming, BI, T-SQL, and Spark. Delivered as part of the SQL Server 2019 release, Big Data Clusters is a cloud-native solution orchestrated by Kubernetes. Our mission is to accelerate, delight, and empower our users as they quench their thirst for data driven insights.
+summary: SQL Server 2019 Big Data Clusters is the multi-cloud, open data platform for analytics at any scale. Big Data Clusters unites SQL Server with Apache Spark to deliver the best compute engines available for analytics in a single, easy to use deployment. With these engines, Big Data Clusters is the ideal data platform for AI, ML, M/R, Streaming, BI, T-SQL, and Spark. Delivered as part of the SQL Server 2019 release, Big Data Clusters is a cloud-native solution orchestrated by Kubernetes. Our mission is to accelerate, delight, and empower our users as they quench their thirst for data driven insights. The Microsoft SQL Server 2019 Big Data Clusters add-on will be retired. Support for SQL Server 2019 Big Data Clusters will end on January 14, 2025.
 
 metadata:
   title: Big Data Clusters - Learn how to manage, deploy, and use
@@ -11,7 +11,7 @@ metadata:
   ms.technology: big-data-cluster
   author: WilliamDAssafMSFT
   ms.author: wiassaf
-  ms.date: 09/07/2021
+  ms.date: 02/22/2022
   ms.prod: sql
 
 # linkListType: architecture | concept | deploy | download | get-started | how-to-guide | learn | overview | quickstart | reference | sample | tutorial | video | whats-new
@@ -22,12 +22,16 @@ landingContent:
   # Card (Get started)
   - title: About Big Data Clusters
     linkLists:
+      - linkListType: architecture
+        links:
+          - text: Big data options on the Microsoft SQL Server platform
+            url: big-data-options.md
       - linkListType: overview
         links:
           - text: Introducing Big Data Clusters
             url: big-data-cluster-overview.md
           - text: Big Data Clusters FAQ
-            url: big-data-cluster-faq.yml
+            url: big-data-cluster-faq.yml         
       - linkListType: whats-new
         links:
           - text: What's new
diff --git a/docs/includes/bdc-banner-retirement.md b/docs/includes/bdc-banner-retirement.md
@@ -4,7 +4,7 @@ ms.author: wiassaf
 ms.prod: sql
 ms.technology: big-data-cluster
 ms.topic: include
-ms.date: 01/26/2022
+ms.date: 02/22/2022
 ---
 
 > [!IMPORTANT]
diff --git a/docs/includes/polybase-java-connector-banner-retirement.md b/docs/includes/polybase-java-connector-banner-retirement.md
@@ -0,0 +1,10 @@
+---
+author: WilliamDAssafMSFT
+ms.author: wiassaf
+ms.prod: sql
+ms.technology: polybase
+ms.topic: include
+ms.date: 02/22/2022
+---
+
+SQL Server support for HDFS Cloudera (CDP) and Hortonworks (HDP) external data sources will be retired and will not be included in SQL Server 2022. For more information, see [Big data options on the Microsoft SQL Server platform](../big-data-cluster/big-data-options.md).
diff --git a/docs/includes/polybase-scaleout-banner-retirement.md b/docs/includes/polybase-scaleout-banner-retirement.md
@@ -0,0 +1,11 @@
+---
+author: WilliamDAssafMSFT
+ms.author: wiassaf
+ms.prod: sql
+ms.technology: polybase
+ms.topic: include
+ms.date: 02/22/2022
+---
+
+> [!IMPORTANT]
+> The Microsoft SQL Server PolyBase scale-out groups feature will be retired. Scale-out group functionality will be removed from the product in SQL Server 2022. PolyBase data virtualization will continue to be fully supported as a scale-up feature in SQL Server. For more information, see [Big data options on the Microsoft SQL Server platform](../big-data-cluster/big-data-options.md).
diff --git a/docs/relational-databases/polybase/configure-scale-out-groups-windows.md b/docs/relational-databases/polybase/configure-scale-out-groups-windows.md
@@ -1,7 +1,7 @@
 ---
 title: "Configure PolyBase scale-out groups on Windows"
 description: Set up a PolyBase scale-out group to create a cluster of SQL Server instances. This improves query performance for large data sets from external sources.
-ms.date: 08/05/2021
+ms.date: 02/22/2022
 ms.prod: sql
 ms.technology: polybase
 ms.topic: "tutorial"
@@ -16,6 +16,8 @@ monikerRange: ">= sql-server-2016"
 
 This article describes how to set up a [PolyBase scale-out group](polybase-scale-out-groups.md) on Windows. This creates a cluster of SQL Server instances to process large data sets from external data sources, such as Hadoop or Azure Blob Storage, in a scale-out fashion for better query performance.
 
+[!INCLUDE[polybase-scaleout-banner-retirement](../../includes/polybase-scaleout-banner-retirement.md)]
+
 ## Prerequisites
   
 - More than one machine in the same domain.  
diff --git a/docs/relational-databases/polybase/polybase-guide.md b/docs/relational-databases/polybase/polybase-guide.md
@@ -1,7 +1,7 @@
 ---
 title: "Introducing data virtualization with PolyBase"
 description: PolyBase enables your SQL Server instance to process Transact-SQL queries that read data from external data sources such as Hadoop and Azure blob storage.
-ms.date: 03/23/2021
+ms.date: 02/22/2022
 ms.prod: sql
 ms.technology: polybase
 ms.topic: "overview"
@@ -73,7 +73,7 @@ PolyBase provides these same functionalities for the following SQL products from
 - [MongoDB](polybase-configure-mongodb.md)
 - [Hadoop](polybase-configure-hadoop.md)*
 
-\* PolyBase supports two Hadoop providers, Hortonworks Data Platform (HDP) and Cloudera Distributed Hadoop (CDH).
+\* PolyBase supports two Hadoop providers, Hortonworks Data Platform (HDP) and Cloudera Distributed Hadoop (CDH), through SQL Server 2019. [!INCLUDE[polybase-java-connector-banner-retirement](../../includes/polybase-java-connector-banner-retirement.md)]
 
  To use PolyBase in an instance of [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)]:
 
@@ -117,7 +117,9 @@ PolyBase enables the following scenarios in [!INCLUDE[ssNoVersion](../../include
 
 - **Push computation to Hadoop.** PolyBase pushes some computations to the external source to optimize the overall query. The query optimizer makes a cost-based decision to push computation to Hadoop, if that will improve query performance.  The query optimizer uses statistics on external tables to make the cost-based decision. Pushing computation creates MapReduce jobs and leverages Hadoop's distributed computational resources. For more information, see [Pushdown computations in PolyBase](polybase-pushdown-computation.md). 
 
-- **Scale compute resources.** To improve query performance, you can use [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] [PolyBase scale-out groups](../../relational-databases/polybase/polybase-scale-out-groups.md). This enables parallel data transfer between [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instances and Hadoop nodes, and it adds compute resources for operating on the external data.
+- **Scale compute resources.** To improve query performance, you can use [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] [PolyBase scale-out groups](../../relational-databases/polybase/polybase-scale-out-groups.md). This enables parallel data transfer between [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instances and Hadoop nodes, and it adds compute resources for operating on the external data. 
+
+[!INCLUDE[polybase-scaleout-banner-retirement](../../includes/polybase-scaleout-banner-retirement.md)]
 
 ## Next steps
 
diff --git a/docs/relational-databases/polybase/polybase-scale-out-groups.md b/docs/relational-databases/polybase/polybase-scale-out-groups.md
@@ -1,7 +1,7 @@
 ---
 title: "PolyBase scale-out groups | Microsoft Docs"
 description: Use the PolyBase Group feature to create a cluster of SQL Server instances. This improves query performance for large data sets from external sources.
-ms.date: 04/23/2019
+ms.date: 02/22/2022
 ms.prod: sql
 ms.technology: polybase
 ms.topic: conceptual
@@ -11,9 +11,8 @@ helpviewer_keywords:
   - "PolyBase"
   - "PolyBase, scale-out groups"
   - "scale-out PolyBase"
-ms.assetid: c7810135-4d63-4161-93ab-0e75e9d10ab5
-author: MikeRayMSFT
-ms.author: mikeray
+author: WilliamDAssafMSFT
+ms.author: wiassaf
 ms.reviewer: ""
 monikerRange: ">= sql-server-2016"
 ---
@@ -23,6 +22,8 @@ monikerRange: ">= sql-server-2016"
 
 A standalone SQL Server instance with PolyBase can become a performance bottleneck when dealing with massive data sets in Hadoop or Azure Blob Storage. The PolyBase Group feature allows you to create a cluster of SQL Server instances to process large data sets from external data sources, such as Hadoop or Azure Blob Storage, in a scale-out fashion for better query performance. You can now scale your SQL Server compute to meet the performance demands of your workload. PolyBase Scale-out Groups, a group of SQL Server instances, enable you to process large external data sets in a parallel processing architecture. Data loading and query performance can increase linearly as you add more SQL Server instances to the group. 
   
+[!INCLUDE[polybase-scaleout-banner-retirement](../../includes/polybase-scaleout-banner-retirement.md)]
+
 See [Get started with PolyBase](./polybase-guide.md) and [PolyBase Guide](../../relational-databases/polybase/polybase-guide.md).
   
 ![Diagram showing PolyBase scale-out groups.](../../relational-databases/polybase/media/polybase-scale-out-groups.png "PolyBase scale-out groups")  
@@ -39,10 +40,10 @@ A compute node contains the SQL Server instance that assists with scale-out quer
 
 When querying external SQL Server, Oracle or Teradata instances, partitioned tables will benefit from scale-out reads. Each node in a PolyBase scale-out group can spin up to 8 readers to read external data. And each reader is assigned one partition to read in the external table. 
 
-For e.g., let's say you have an external SQL Server table with 12 monthly partitions and a 3-node PolyBase scale-out group, each node will use 4 PolyBase readers to process each of the 12 partitions. This is illustrated in the image below. 
+For example, say you have an external SQL Server table with 12 monthly partitions and a 3-node PolyBase scale-out group. Each node will use 4 PolyBase readers, and each reader processes one of the 12 partitions. This is illustrated in the following image.
 
 > [!NOTE]
->  that this is different from scale-out reads over Hadoop. 
+> This is different from scale-out reads over Hadoop. 
 
 ![PolyBase scale-out reads](../../relational-databases/polybase/media/polybase-scale-out-groups2.png "PolyBase scale-out groups")
   
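Editorial sketch (not part of the commit): the scale-out read example in `polybase-scale-out-groups.md` above (12 partitions, 3 nodes, 4 readers per node, at most 8 readers per node) can be reproduced with a little arithmetic. The Python below assumes partitions are spread as evenly as possible across nodes, which matches the article's example but is not stated as a general rule. The name `readers_per_node` is hypothetical, and the article does not describe what happens when a node's share exceeds 8 partitions, so the sketch only reports the capped reader count.

```python
from typing import List

MAX_READERS_PER_NODE = 8  # from the article: each node can spin up to 8 readers


def readers_per_node(partitions: int, nodes: int) -> List[int]:
    """Estimate how many readers each node uses, one reader per partition.

    Assumption (not from the source): partitions are divided as evenly as
    possible across nodes. Counts are capped at MAX_READERS_PER_NODE.
    """
    if partitions < 0 or nodes < 1:
        raise ValueError("need partitions >= 0 and nodes >= 1")
    base, extra = divmod(partitions, nodes)
    # The first `extra` nodes take one additional partition each.
    shares = [base + (1 if i < extra else 0) for i in range(nodes)]
    return [min(share, MAX_READERS_PER_NODE) for share in shares]


if __name__ == "__main__":
    # The article's example: 12 monthly partitions on a 3-node group.
    print(readers_per_node(12, 3))  # [4, 4, 4]
```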
diff --git a/docs/t-sql/statements/create-external-data-source-transact-sql.md b/docs/t-sql/statements/create-external-data-source-transact-sql.md