docs/big-data-cluster/big-data-cluster-overview.md (4 additions, 6 deletions)
@@ -17,9 +17,7 @@ ms.technology: big-data-cluster
Starting with [!INCLUDE[SQL Server 2019](../includes/sssqlv15-md.md)], [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)] allow you to deploy scalable clusters of SQL Server, Spark, and HDFS containers running on Kubernetes. These components run side by side to enable you to read, write, and process big data from Transact-SQL or Spark, allowing you to easily combine and analyze your high-value relational data with high-volume big data.
- [!INCLUDE[SQL Server 2019](../includes/sssqlv15-md.md)] introduces SQL Server Big Data Clusters.
-
- Use SQL Server Big Data Clusters to:
+ Use [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)] to:
- [Deploy scalable clusters](./deploy-get-started.md) of SQL Server, Spark, and HDFS containers running on Kubernetes.
- Read, write, and process big data from Transact-SQL or Spark.
@@ -40,7 +38,7 @@ For more information about new features and known issues for the latest release, see
### Data virtualization
- By leveraging [SQL Server PolyBase](../relational-databases/polybase/polybase-guide.md), [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)] can query external data sources without moving or copying the data. [!INCLUDE[SQL Server 2019](../includes/sssqlv15-md.md)] introduces new connectors to data sources.
+ By leveraging [PolyBase](../relational-databases/polybase/polybase-guide.md), [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)] can query external data sources without moving or copying the data. [!INCLUDE[SQL Server 2019](../includes/sssqlv15-md.md)] introduces new connectors to data sources.
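To make the data virtualization claim concrete, here is a minimal Transact-SQL sketch of the PolyBase pattern this hunk describes. The Oracle server, credential, and table names are illustrative assumptions, not values taken from these docs:

```sql
-- Hypothetical Oracle source; requires an existing database master key.
CREATE DATABASE SCOPED CREDENTIAL OracleCredential
WITH IDENTITY = 'oracle_user', SECRET = 'oracle_user_password';

CREATE EXTERNAL DATA SOURCE OracleSales
WITH (LOCATION = 'oracle://oracle-server:1521', CREDENTIAL = OracleCredential);

-- The external table reads the Oracle data in place; nothing is moved or copied.
CREATE EXTERNAL TABLE dbo.OrdersExternal
(
    order_id INT,
    order_total DECIMAL(18, 2)
)
WITH (LOCATION = '[SALESDB].[DBO].[ORDERS]', DATA_SOURCE = OracleSales);

SELECT TOP 10 * FROM dbo.OrdersExternal;
```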
@@ -116,4 +114,4 @@ The storage pool consists of storage pool pods comprised of SQL Server on Linux,
## Next steps
- For more information about deploying SQL Server Big Data Clusters, see [Get started with SQL Server Big Data Clusters](deploy-get-started.md).
+ For more information about deploying [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)], see [Get started with [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)]](deploy-get-started.md).
docs/big-data-cluster/concept-application-deployment.md (4 additions, 2 deletions)
@@ -12,9 +12,11 @@ ms.prod: sql
ms.technology: big-data-cluster
---
- # What is application deployment on a Big Data Cluster?
+ # What is application deployment on a SQL Server big data cluster?
- Application deployment enables the deployment of applications on the big data cluster by providing interfaces to create, manage, and run applications. Applications deployed on the big data cluster benefit from the computational power of the cluster and can access the data that is available on the cluster. This increases scalability and performance of the applications, while managing the applications where the data lives. The supported application runtimes on SQL Server Big Data Clusters are R, Python, SSIS, MLeap.
+ [!INCLUDE[SQL Server 2019](../includes/applies-to-version/sqlserver2019.md)]
+
+ Application deployment enables the deployment of applications on a SQL Server big data cluster by providing interfaces to create, manage, and run applications. Applications deployed on a SQL Server big data cluster benefit from the computational power of the cluster and can access the data that is available on the cluster. This increases the scalability and performance of the applications, while managing the applications where the data lives. The supported application runtimes on [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)] are R, Python, SSIS, and MLeap.
The following sections describe the architecture and functionality of application deployment.
docs/big-data-cluster/concept-compute-pool.md (6 additions, 6 deletions)
@@ -11,11 +11,11 @@ ms.prod: sql
ms.technology: big-data-cluster
---
- # What are compute pools SQL Server Big Data Clusters?
+ # What are compute pools in a SQL Server big data cluster?
[!INCLUDE[SQL Server 2019](../includes/applies-to-version/sqlserver2019.md)]
- This article describes the role of *SQL Server compute pools* in SQL Server Big Data Clusters. Compute pools provide scale-out computational resources for a Big Data Cluster. They are used to offload computational work, or intermediate result sets, from the SQL Server master instance. The following sections describe the architecture, functionality and usage scenarios of a compute pool.
+ This article describes the role of *SQL Server compute pools* in a SQL Server big data cluster. Compute pools provide scale-out computational resources for a SQL Server big data cluster. They are used to offload computational work, or intermediate result sets, from the SQL Server master instance. The following sections describe the architecture, functionality, and usage scenarios of a compute pool.
You can also watch this 5-minute video for an introduction to compute pools:
@@ -29,19 +29,19 @@ A compute pool is made of one or more compute pods running in Kubernetes. The au
## Scale-out groups
- A compute pool can act as a PolyBase scale-out group for distributed queries over different external data sources such as SQL Server, Oracle, MongoDB, Teradata and HDFS. By using compute pods in Kubernetes, Big Data Clusters can automate creating and configuring compute pods for PolyBase scale-out groups.
+ A compute pool can act as a PolyBase scale-out group for distributed queries over different external data sources such as SQL Server, Oracle, MongoDB, Teradata, and HDFS. By using compute pods in Kubernetes, a SQL Server big data cluster can automate creating and configuring compute pods for PolyBase scale-out groups.
## Compute pool scenarios
Scenarios where the compute pool is used include:
- - When queries submitted to the master instance use one or more tables located in the [Storage Pool](concept-storage-pool.md).
+ - When queries submitted to the master instance use one or more tables located in the [storage pool](concept-storage-pool.md).
- - When queries submitted to the master instance use one or more tables with round-robin distribution located in the [Data Pool](concept-data-pool.md).
+ - When queries submitted to the master instance use one or more tables with round-robin distribution located in the [data pool](concept-data-pool.md).
- When queries submitted to the master instance use **partitioned** tables with external data sources of SQL Server, Oracle, MongoDB, and Teradata. For this scenario, the query hint OPTION (FORCE SCALEOUTEXECUTION) must be enabled (see the sketch after this list).
- - When queries submitted to the master instance use one or more tables located in [HDFS Tiering](hdfs-tiering.md).
+ - When queries submitted to the master instance use one or more tables located in [HDFS tiering](hdfs-tiering.md).
Scenarios where the compute pool is **not** used include:
docs/big-data-cluster/concept-controller.md (1 addition, 1 deletion)
@@ -15,7 +15,7 @@ ms.technology: big-data-cluster
[!INCLUDE[SQL Server 2019](../includes/applies-to-version/sqlserver2019.md)]
- The controller hosts the core logic for deploying and managing a big data cluster. It takes care of all interactions with Kubernetes, SQL Server instances that are part of the cluster and other components like HDFS and Spark.
+ The controller hosts the core logic for deploying and managing a SQL Server big data cluster. It takes care of all interactions with Kubernetes, the SQL Server instances that are part of the cluster, and other components such as HDFS and Spark.
The controller service provides the following core functionality:
docs/big-data-cluster/concept-data-pool.md

[!INCLUDE[SQL Server 2019](../includes/applies-to-version/sqlserver2019.md)]
- This article describes the role of *SQL Server data pools* in a [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ver15.md)]. The following sections describe the architecture, functionality, and usage scenarios of a SQL data pool.
+ This article describes the role of *SQL Server data pools* in a SQL Server big data cluster. The following sections describe the architecture, functionality, and usage scenarios of a data pool.
This 5-minute video introduces data pools and shows you how to query data from data pools:
- A data pool consists of one or more SQL Server data pool instances that provide persistent SQL Server storage for the cluster. It allows for performance querying of cached data against external data sources and offloading of work. Data is ingested into the data pool using either T-SQL queries or from Spark jobs. In order to enhanced performance across large data sets, the ingested data is distributed into shards and stored across all SQL Server instances in the pool. Supported distributions methods are round robin and replicated. For read access optimization, a clustered columnstore index is created on each table in each data pool instance. A data pool serves as the scale-out data mart for SQL Big Data Clusters.
+ A data pool consists of one or more SQL Server data pool instances that provide persistent SQL Server storage for the cluster. It allows for performant querying of cached data against external data sources and offloading of work. Data is ingested into the data pool using either T-SQL queries or Spark jobs. To enhance performance across large data sets, the ingested data is distributed into shards and stored across all SQL Server instances in the pool. Supported distribution methods are round-robin and replicated. For read access optimization, a clustered columnstore index is created on each table in each data pool instance. A data pool serves as the scale-out data mart for [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)].

- Access to the SQL server instances in the data pool is managed from the SQL Server Master instance. An external data source to the data pool is created, along with the PolyBase external tables to store the data cache. In the background, the controller creates a database in the data pool with tables that match the external tables. From the SQL Server Master instance the workflow is transparent; the controller redirects the specific external table requests to the SQL Server instances in the data pool, which may be through the Compute pool, executes queries and returns the result set. Data in the data pool can only be ingested or queried and cannot be modified. Any data refreshes would therefore require a drop of the table, followed by table recreation and subsequent data repopulation.
+ Access to the SQL Server instances in the data pool is managed from the SQL Server master instance. An external data source to the data pool is created, along with the PolyBase external tables to store the data cache. In the background, the controller creates a database in the data pool with tables that match the external tables. From the SQL Server master instance the workflow is transparent; the controller redirects the specific external table requests to the SQL Server instances in the data pool, which may be routed through the compute pool, executes the queries, and returns the result set. Data in the data pool can only be ingested or queried and cannot be modified. Any data refresh therefore requires dropping the table, followed by table recreation and subsequent data repopulation.
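A minimal Transact-SQL sketch of the ingestion path these paragraphs describe, assuming the built-in SqlDataPool data source name; the table and column names are invented for illustration:

```sql
-- Round-robin sharded cache table in the data pool; names are hypothetical.
CREATE EXTERNAL TABLE dbo.WebClickstream
(
    click_id BIGINT,
    url NVARCHAR(400),
    click_time DATETIME2
)
WITH (DATA_SOURCE = SqlDataPool, DISTRIBUTION = ROUND_ROBIN);

-- Ingest via T-SQL; rows are distributed across the data pool instances.
INSERT INTO dbo.WebClickstream
SELECT click_id, url, click_time
FROM dbo.StagedClicks;
```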
docs/big-data-cluster/concept-master-instance.md (5 additions, 5 deletions)
@@ -15,7 +15,7 @@ ms.technology: big-data-cluster
[!INCLUDE[SQL Server 2019](../includes/applies-to-version/sqlserver2019.md)]
- This article describes the role of the *SQL Server master instance* in a big data cluster for SQL Server 2019. The master instance is a SQL Server instance running in a big data cluster to manage connectivity, scale-out queries, metadata and user databases, and machine learning services.
+ This article describes the role of the *SQL Server master instance* in a SQL Server big data cluster. The master instance is a SQL Server instance running in a SQL Server big data cluster to manage connectivity, scale-out queries, metadata and user databases, and machine learning services.
The SQL Server master instance provides the following functionality:
@@ -31,8 +31,8 @@ The SQL Server master instance contains the scale-out query engine that is used
In addition to the standard SQL Server system databases, the SQL master instance also contains the following:
- - A metadata database that holds HDFS-table metadata
- - A data plane shard map
+ - A metadata database that holds HDFS-table metadata.
+ - A data plane shard map.
- Details of external tables that provide access to the cluster data plane.
- PolyBase external data sources and external tables defined in user databases.
@@ -46,9 +46,9 @@ As part of a SQL Server big data cluster, machine learning services will be avai
### Advantages of machine learning services in a big data cluster
- SQL Server 2019 makes it easy for big data to be joined to the dimensional data typically stored in the enterprise database. The value of the big data greatly increases when it is not just in the hands of parts of an organization, but is also included in reports, dashboards, and applications. At the same time, data scientists can continue to use the Spark/HDFS ecosystem tools and have easy, real time access to the data in the SQL Server master instance and in external data sources accessible _through_ the SQL Server master instance.
+ [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)] makes it easy for big data to be joined to the dimensional data typically stored in the enterprise database. The value of the big data greatly increases when it is not just in the hands of parts of an organization, but is also included in reports, dashboards, and applications. At the same time, data scientists can continue to use the Spark/HDFS ecosystem tools and have easy, real-time access to the data in the SQL Server master instance and in external data sources accessible _through_ the SQL Server master instance.
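A short illustrative Transact-SQL sketch of the join this paragraph describes. Both tables are assumptions: WebLogsHdfs stands in for an external table over HDFS data, and DimCustomer for a local dimensional table in the master instance:

```sql
-- Combine high-volume big data with high-value relational data in one query.
SELECT c.customer_name, COUNT(*) AS page_views
FROM dbo.WebLogsHdfs AS w
INNER JOIN dbo.DimCustomer AS c
    ON c.customer_id = w.customer_id
GROUP BY c.customer_name;
```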
- With [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ver15.md)], you can do more with your enterprise data lakes. SQL Server developers and analysts can:
+ With [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)], you can do more with your enterprise data lakes. SQL Server developers and analysts can:
* Build applications consuming data from enterprise data lakes.
docs/big-data-cluster/concept-storage-pool.md (5 additions, 5 deletions)
@@ -11,23 +11,23 @@ ms.prod: sql
ms.technology: big-data-cluster
---
- # What is the storage pool ([!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ss-nover.md)])?
+ # What is the storage pool in a SQL Server big data cluster?
[!INCLUDE[SQL Server 2019](../includes/applies-to-version/sqlserver2019.md)]
- This article describes the role of the *SQL Server storage pool* in a [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ver15.md)] (BDC). The following sections describe the architecture and functionality of a SQL storage pool.
+ This article describes the role of the *SQL Server storage pool* in a SQL Server big data cluster. The following sections describe the architecture and functionality of a storage pool.
## Storage pool architecture
- The storage pool is the local HDFS (Hadoop) cluster in the SQL Server BDC ecosystem. It provides persistent storage for unstructured and semi-structured data. Data files, such as Parquet or delimited text, can be stored in the storage pool. To persist storage each pod in the pool has a Persistent Volume attached to it. The storage pool files are accessible via [PolyBase](../relational-databases/polybase/polybase-guide.md) through SQL Server or directly using an Apache Knox Gateway.
+ The storage pool is the local HDFS (Hadoop) cluster in a SQL Server big data cluster. It provides persistent storage for unstructured and semi-structured data. Data files, such as Parquet or delimited text, can be stored in the storage pool. To persist storage, each pod in the pool has a persistent volume attached to it. The storage pool files are accessible via [PolyBase](../relational-databases/polybase/polybase-guide.md) through SQL Server or directly using an Apache Knox Gateway.
- A classical HDFS setup consists of a set of commodity-hardware computers with storage attached. The data is spread in blocks across the nodes for fault tolerance and leveraging of parallel processing. One of the nodes in the cluster functions as the Name Node and contains the metadata information about the files located in the data nodes.
+ A classical HDFS setup consists of a set of commodity-hardware computers with storage attached. The data is spread in blocks across the nodes for fault tolerance and leveraging of parallel processing. One of the nodes in the cluster functions as the name node and contains the metadata information about the files located in the data nodes.
The storage pool consists of storage nodes that are members of an HDFS cluster. It runs one or more Kubernetes pods, with each pod hosting the following containers:
- - A Hadoop container linked to a Persistent Volume (storage). All containers of this type together form the Hadoop cluster. Within the Hadoop container is a YARN node manager process that can create on-demand Apache Spark worker processes. The Spark head node hosts the hive metastore, spark history, and YARN job history containers.
+ - A Hadoop container linked to a persistent volume (storage). All containers of this type together form the Hadoop cluster. Within the Hadoop container is a YARN node manager process that can create on-demand Apache Spark worker processes. The Spark head node hosts the Hive metastore, Spark history, and YARN job history containers.
- A SQL Server instance to read data from HDFS using OpenRowSet technology.
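For illustration, a hedged sketch of how the master instance reads storage pool files through an external table, assuming the built-in SqlStoragePool data source; the path, format, and columns are invented:

```sql
-- Delimited text files under /clickstream_data in the storage pool (HDFS).
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2)
);

CREATE EXTERNAL TABLE dbo.ClickstreamHdfs
(
    click_id BIGINT,
    url NVARCHAR(400)
)
WITH (
    DATA_SOURCE = SqlStoragePool,
    LOCATION = '/clickstream_data',
    FILE_FORMAT = CsvFormat
);

SELECT TOP 10 * FROM dbo.ClickstreamHdfs;
```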