
Commit 68cd9ae

Merge pull request #17719 from Peter-Msft/bdc-product-name-standardization

Bdc product name standardization

2 parents: 2ecf5d6 + 45207c8

7 files changed: 28 additions & 28 deletions

docs/big-data-cluster/big-data-cluster-overview.md

Lines changed: 4 additions & 6 deletions

@@ -17,9 +17,7 @@ ms.technology: big-data-cluster

  Starting with [!INCLUDE[SQL Server 2019](../includes/sssqlv15-md.md)], [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)] allow you to deploy scalable clusters of SQL Server, Spark, and HDFS containers running on Kubernetes. These components are running side by side to enable you to read, write, and process big data from Transact-SQL or Spark, allowing you to easily combine and analyze your high-value relational data with high-volume big data.

- [!INCLUDE[SQL Server 2019](../includes/sssqlv15-md.md)] introduces SQL Server Big Data Clusters.
-
- Use SQL Server Big Data Clusters to:
+ Use [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)] to:

  - [Deploy scalable clusters](./deploy-get-started.md) of SQL Server, Spark, and HDFS containers running on Kubernetes.
  - Read, write, and process big data from Transact-SQL or Spark.

@@ -40,7 +38,7 @@ For more information about new features and known issues for latest release, see

  ### Data virtualization

- By leveraging [SQL Server PolyBase](../relational-databases/polybase/polybase-guide.md), [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)] can query external data sources without moving or copying the data. [!INCLUDE[SQL Server 2019](../includes/sssqlv15-md.md)] introduces new connectors to data sources.
+ By leveraging [PolyBase](../relational-databases/polybase/polybase-guide.md), [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)] can query external data sources without moving or copying the data. [!INCLUDE[SQL Server 2019](../includes/sssqlv15-md.md)] introduces new connectors to data sources.

  ![Data virtualization](media/big-data-cluster-overview/data-virtualization.png)

@@ -91,7 +89,7 @@ In [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)]

  ### Big data clusters architecture

- The following diagram shows the components of a big data cluster for SQL Server.
+ The following diagram shows the components of a SQL Server big data cluster:

  ![Architecture overview](media/big-data-cluster-overview/architecture-diagram-overview.png)

@@ -116,4 +114,4 @@ The storage pool consists of storage pool pods comprised of SQL Server on Linux,

  ## Next steps

- For more information about deploying SQL Server Big Data Clusters, see [Get started with SQL Server Big Data Clusters](deploy-get-started.md).
+ For more information about deploying [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)], see [Get started with [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)]](deploy-get-started.md).
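The data virtualization paragraph touched by this file refers to querying external sources in place through PolyBase. A minimal T-SQL sketch of that pattern might look like the following; all object names (`OracleSales`, `OracleCredential`, `dbo.SalesOrders`, the server address) are illustrative assumptions, not part of the commit:

```sql
-- Hypothetical sketch: expose an Oracle table through PolyBase so it can be
-- queried from SQL Server without moving or copying the data.
CREATE EXTERNAL DATA SOURCE OracleSales
    WITH (LOCATION = 'oracle://oracle-server:1521', CREDENTIAL = OracleCredential);

CREATE EXTERNAL TABLE dbo.SalesOrders (
    order_id INT,
    amount   DECIMAL(10, 2)
)
WITH (LOCATION = 'ORCL.SALES.ORDERS', DATA_SOURCE = OracleSales);

-- The data stays in Oracle; SQL Server federates the query to the source.
SELECT TOP 10 * FROM dbo.SalesOrders;
```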

docs/big-data-cluster/concept-application-deployment.md

Lines changed: 4 additions & 2 deletions

@@ -12,9 +12,11 @@ ms.prod: sql

  ms.technology: big-data-cluster
  ---

- # What is application deployment on a Big Data Cluster?
+ # What is application deployment on a SQL Server big data cluster?

- Application deployment enables the deployment of applications on the big data cluster by providing interfaces to create, manage, and run applications. Applications deployed on the big data cluster benefit from the computational power of the cluster and can access the data that is available on the cluster. This increases scalability and performance of the applications, while managing the applications where the data lives. The supported application runtimes on SQL Server Big Data Clusters are R, Python, SSIS, MLeap.
+ [!INCLUDE[SQL Server 2019](../includes/applies-to-version/sqlserver2019.md)]
+
+ Application deployment enables the deployment of applications on a SQL Server big data cluster by providing interfaces to create, manage, and run applications. Applications deployed on a SQL Server big data cluster benefit from the computational power of the cluster and can access the data that is available on the cluster. This increases scalability and performance of the applications, while managing the applications where the data lives. The supported application runtimes on [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)] are R, Python, SSIS, MLeap.

  The following sections describe the architecture and functionality of application deployment.

docs/big-data-cluster/concept-compute-pool.md

Lines changed: 6 additions & 6 deletions

@@ -11,11 +11,11 @@ ms.prod: sql

  ms.technology: big-data-cluster
  ---

- # What are compute pools SQL Server Big Data Clusters?
+ # What are compute pools in a SQL Server big data cluster?

  [!INCLUDE[SQL Server 2019](../includes/applies-to-version/sqlserver2019.md)]

- This article describes the role of *SQL Server compute pools* in SQL Server Big Data Clusters. Compute pools provide scale-out computational resources for a Big Data Cluster. They are used to offload computational work, or intermediate result sets, from the SQL Server master instance. The following sections describe the architecture, functionality and usage scenarios of a compute pool.
+ This article describes the role of *SQL Server compute pools* in a SQL Server big data cluster. Compute pools provide scale-out computational resources for a SQL Server big data cluster. They are used to offload computational work, or intermediate result sets, from the SQL Server master instance. The following sections describe the architecture, functionality and usage scenarios of a compute pool.

  You can also watch this 5-minute video for an introduction into compute pools:

@@ -29,19 +29,19 @@ A compute pool is made of one or more compute pods running in Kubernetes. The au

  ## Scale-out groups

- A compute pool can act as a PolyBase scale-out group for distributed queries over different external data sources such as SQL Server, Oracle, MongoDB, Teradata and HDFS. By using compute pods in Kubernetes, Big Data Clusters can automate creating and configuring compute pods for PolyBase scale-out groups.
+ A compute pool can act as a PolyBase scale-out group for distributed queries over different external data sources such as SQL Server, Oracle, MongoDB, Teradata and HDFS. By using compute pods in Kubernetes, a SQL Server big data cluster can automate creating and configuring compute pods for PolyBase scale-out groups.

  ## Compute pool scenarios

  Scenarios where the compute pool is used include:

- - When queries submitted to the master instance use one or more tables located in the [Storage Pool](concept-storage-pool.md).
+ - When queries submitted to the master instance use one or more tables located in the [storage pool](concept-storage-pool.md).

- - When queries submitted to the master instance use one or more tables with round-robin distribution located in the [Data Pool](concept-data-pool.md).
+ - When queries submitted to the master instance use one or more tables with round-robin distribution located in the [data pool](concept-data-pool.md).

  - When queries submitted to the master instance use **partitioned** tables with external data sources of SQL Server, Oracle, MongoDB, and Teradata. For this scenario, the query hint OPTION (FORCE SCALEOUTEXECUTION) must be enabled.

- - When queries submitted to the master instance use one or more tables located in [HDFS Tiering](hdfs-tiering.md).
+ - When queries submitted to the master instance use one or more tables located in [HDFS tiering](hdfs-tiering.md).

  Scenarios where the compute pool is **not** used include:

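For the partitioned-table scenario above, the diffed text quotes the query hint OPTION (FORCE SCALEOUTEXECUTION). A hedged sketch of how such a query might look, with the table and column names being hypothetical:

```sql
-- Hypothetical query against a partitioned external table; the hint (quoted in
-- the doc above) asks the master instance to distribute execution across the
-- compute pool rather than run it locally.
SELECT customer_id, SUM(amount) AS total_amount
FROM dbo.PartitionedExternalSales
GROUP BY customer_id
OPTION (FORCE SCALEOUTEXECUTION);
```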
docs/big-data-cluster/concept-controller.md

Lines changed: 1 addition & 1 deletion

@@ -15,7 +15,7 @@ ms.technology: big-data-cluster

  [!INCLUDE[SQL Server 2019](../includes/applies-to-version/sqlserver2019.md)]

- The controller hosts the core logic for deploying and managing a big data cluster. It takes care of all interactions with Kubernetes, SQL Server instances that are part of the cluster and other components like HDFS and Spark.
+ The controller hosts the core logic for deploying and managing a SQL Server big data cluster. It takes care of all interactions with Kubernetes, SQL Server instances that are part of the cluster and other components like HDFS and Spark.

  The controller service provides the following core functionality:

docs/big-data-cluster/concept-data-pool.md

Lines changed: 3 additions & 3 deletions

@@ -15,19 +15,19 @@ ms.technology: big-data-cluster

  [!INCLUDE[SQL Server 2019](../includes/applies-to-version/sqlserver2019.md)]

- This article describes the role of *SQL Server data pools* in a [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ver15.md)]. The following sections describe the architecture, functionality, and usage scenarios of a SQL data pool.
+ This article describes the role of *SQL Server data pools* in a SQL Server big data cluster. The following sections describe the architecture, functionality, and usage scenarios of a data pool.

  This 5-minute video introduces data pools and shows you how to query data from data pools:

  > [!VIDEO https://channel9.msdn.com/Shows/Data-Exposed/Querying-Data-from-Big-Data-Cluster-Data-Pool/player?WT.mc_id=dataexposed-c9-niner]

  ## Data pool architecture

- A data pool consists of one or more SQL Server data pool instances that provide persistent SQL Server storage for the cluster. It allows for performance querying of cached data against external data sources and offloading of work. Data is ingested into the data pool using either T-SQL queries or from Spark jobs. In order to enhanced performance across large data sets, the ingested data is distributed into shards and stored across all SQL Server instances in the pool. Supported distributions methods are round robin and replicated. For read access optimization, a clustered columnstore index is created on each table in each data pool instance. A data pool serves as the scale-out data mart for SQL Big Data Clusters.
+ A data pool consists of one or more SQL Server data pool instances that provide persistent SQL Server storage for the cluster. It allows for performance querying of cached data against external data sources and offloading of work. Data is ingested into the data pool using either T-SQL queries or from Spark jobs. In order to enhanced performance across large data sets, the ingested data is distributed into shards and stored across all SQL Server instances in the pool. Supported distributions methods are round robin and replicated. For read access optimization, a clustered columnstore index is created on each table in each data pool instance. A data pool serves as the scale-out data mart for [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)].

  ![Scale-out data mart](media/concept-data-pool/data-virtualization-improvements.png)

- Access to the SQL server instances in the data pool is managed from the SQL Server Master instance. An external data source to the data pool is created, along with the PolyBase external tables to store the data cache. In the background, the controller creates a database in the data pool with tables that match the external tables. From the SQL Server Master instance the workflow is transparent; the controller redirects the specific external table requests to the SQL Server instances in the data pool, which may be through the Compute pool, executes queries and returns the result set. Data in the data pool can only be ingested or queried and cannot be modified. Any data refreshes would therefore require a drop of the table, followed by table recreation and subsequent data repopulation.
+ Access to the SQL server instances in the data pool is managed from the SQL Server master instance. An external data source to the data pool is created, along with the PolyBase external tables to store the data cache. In the background, the controller creates a database in the data pool with tables that match the external tables. From the SQL Server master instance the workflow is transparent; the controller redirects the specific external table requests to the SQL Server instances in the data pool, which may be through the compute pool, executes queries and returns the result set. Data in the data pool can only be ingested or queried and cannot be modified. Any data refreshes would therefore require a drop of the table, followed by table recreation and subsequent data repopulation.

  ## Data pool scenarios

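The data pool architecture described in this file (T-SQL ingestion, round-robin sharding across pool instances) can be sketched as follows. This is an illustrative sketch, not the commit's content: the external table, its columns, and the staging table are hypothetical, and it assumes the built-in `SqlDataPool` data source name used by big data clusters:

```sql
-- Hypothetical sketch of T-SQL ingestion into the data pool. The external
-- table is backed by matching tables the controller creates in each data
-- pool instance; rows are distributed round robin, as the doc describes.
CREATE EXTERNAL TABLE dbo.WebClickstream (
    click_id BIGINT,
    url      NVARCHAR(400)
)
WITH (
    DATA_SOURCE  = SqlDataPool,   -- assumed built-in data source for the pool
    DISTRIBUTION = ROUND_ROBIN    -- alternative: REPLICATED
);

-- Ingest from a (hypothetical) staging table; data lands sharded across the
-- pool's SQL Server instances and is read-only until dropped and reloaded.
INSERT INTO dbo.WebClickstream
SELECT click_id, url FROM dbo.StagingClicks;
```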
docs/big-data-cluster/concept-master-instance.md

Lines changed: 5 additions & 5 deletions

@@ -15,7 +15,7 @@ ms.technology: big-data-cluster

  [!INCLUDE[SQL Server 2019](../includes/applies-to-version/sqlserver2019.md)]

- This article describes the role of the *SQL Server master instance* in a big data cluster for SQL Server 2019. The master instance is a SQL Server instance running in a big data cluster to manage connectivity, scale-out queries, metadata and user databases, and machine learning services.
+ This article describes the role of the *SQL Server master instance* in a SQL Server big data cluster. The master instance is a SQL Server instance running in a SQL Server big data cluster to manage connectivity, scale-out queries, metadata and user databases, and machine learning services.

  The SQL Server master instance provides the following functionality:

@@ -31,8 +31,8 @@ The SQL Server master instance contains the scale-out query engine that is used

  In addition to the standard SQL Server system databases, the SQL master instance also contains the following:

- - A metadata database that holds HDFS-table metadata
- - A data plane shard map
+ - A metadata database that holds HDFS-table metadata.
+ - A data plane shard map.
  - Details of external tables that provide access to the cluster data plane.
  - PolyBase external data sources and external tables defined in user databases.

@@ -46,9 +46,9 @@ As part of a SQL Server big data cluster, machine learning services will be avai

  ### Advantages of machine learning services in a big data cluster

- SQL Server 2019 makes it easy for big data to be joined to the dimensional data typically stored in the enterprise database. The value of the big data greatly increases when it is not just in the hands of parts of an organization, but is also included in reports, dashboards, and applications. At the same time, data scientists can continue to use the Spark/HDFS ecosystem tools and have easy, real time access to the data in the SQL Server master instance and in external data sources accessible _through_ the SQL Server master instance.
+ [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)] makes it easy for big data to be joined to the dimensional data typically stored in the enterprise database. The value of the big data greatly increases when it is not just in the hands of parts of an organization, but is also included in reports, dashboards, and applications. At the same time, data scientists can continue to use the Spark/HDFS ecosystem tools and have easy, real time access to the data in the SQL Server master instance and in external data sources accessible _through_ the SQL Server master instance.

- With [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ver15.md)], you can do more with your enterprise data lakes. SQL Server developers and analysts can:
+ With [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)], you can do more with your enterprise data lakes. SQL Server developers and analysts can:

  * Build applications consuming data from enterprise data lakes.
  * Reason over all data with Transact-SQL queries.

docs/big-data-cluster/concept-storage-pool.md

Lines changed: 5 additions & 5 deletions

@@ -11,23 +11,23 @@ ms.prod: sql

  ms.technology: big-data-cluster
  ---

- # What is the storage pool ([!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)])?
+ # What is the storage pool in a SQL Server big data cluster?

  [!INCLUDE[SQL Server 2019](../includes/applies-to-version/sqlserver2019.md)]

- This article describes the role of the *SQL Server storage pool* in a [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ver15.md)] (BDC). The following sections describe the architecture and functionality of a SQL storage pool.
+ This article describes the role of the *SQL Server storage pool* in a SQL Server big data cluster. The following sections describe the architecture and functionality of a storage pool.

  ## Storage pool architecture

- The storage pool is the local HDFS (Hadoop) cluster in the SQL Server BDC ecosystem. It provides persistent storage for unstructured and semi-structured data. Data files, such as Parquet or delimited text, can be stored in the storage pool. To persist storage each pod in the pool has a Persistent Volume attached to it. The storage pool files are accessible via [PolyBase](../relational-databases/polybase/polybase-guide.md) through SQL Server or directly using an Apache Knox Gateway.
+ The storage pool is the local HDFS (Hadoop) cluster in a SQL Server big data cluster. It provides persistent storage for unstructured and semi-structured data. Data files, such as Parquet or delimited text, can be stored in the storage pool. To persist storage each pod in the pool has a persistent volume attached to it. The storage pool files are accessible via [PolyBase](../relational-databases/polybase/polybase-guide.md) through SQL Server or directly using an Apache Knox Gateway.

- A classical HDFS setup consists of a set of commodity-hardware computers with storage attached. The data is spread in blocks across the nodes for fault tolerance and leveraging of parallel processing. One of the nodes in the cluster functions as the Name Node and contains the metadata information about the files located in the data nodes.
+ A classical HDFS setup consists of a set of commodity-hardware computers with storage attached. The data is spread in blocks across the nodes for fault tolerance and leveraging of parallel processing. One of the nodes in the cluster functions as the name node and contains the metadata information about the files located in the data nodes.

  ![Classic HDFS setup](media/concept-storage-pool/classic-hdfs-setup.png)

  The storage pool consists of storage nodes that are members of a HDFS cluster. It runs one or more Kubernetes pods with each pod hosting the following containers:

- - A Hadoop container linked to a Persistent Volume (storage). All containers of this type together form the Hadoop cluster. Within the Hadoop container is a YARN node manager process that can create on-demand Apache Spark worker processes. The Spark head node hosts the hive metastore, spark history, and YARN job history containers.
+ - A Hadoop container linked to a persistent volume (storage). All containers of this type together form the Hadoop cluster. Within the Hadoop container is a YARN node manager process that can create on-demand Apache Spark worker processes. The Spark head node hosts the hive metastore, spark history, and YARN job history containers.
  - A SQL Server instance to read data from HDFS using OpenRowSet technology.
  - `collectd` for collecting of metrics data.
  - `fluentbit` for collecting of log data.

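The storage pool section above describes HDFS files being accessible from SQL Server via PolyBase. A hedged sketch of that access path follows; it assumes the built-in `SqlStoragePool` data source name used by big data clusters, and the file path, format name, and table are illustrative:

```sql
-- Hypothetical sketch: expose delimited text files stored in the storage
-- pool's HDFS as an external table queryable from the master instance.
CREATE EXTERNAL FILE FORMAT csv_file
    WITH (FORMAT_TYPE = DELIMITEDTEXT,
          FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2));

CREATE EXTERNAL TABLE dbo.HdfsProducts (
    product_id INT,
    name       NVARCHAR(100)
)
WITH (
    DATA_SOURCE = SqlStoragePool,        -- assumed built-in HDFS data source
    LOCATION    = '/clickstream/products',  -- illustrative HDFS directory
    FILE_FORMAT = csv_file
);

SELECT COUNT(*) FROM dbo.HdfsProducts;
```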