
Commit 081dc5a

Merge pull request #8444 from mcadenaroa/patch-2
Adding explanation note on how number of partitions (file splits)
2 parents 46c01d6 + 2a21353 commit 081dc5a

1 file changed

Lines changed: 13 additions & 14 deletions

File tree

docs/t-sql/statements/copy-into-transact-sql.md

@@ -5,7 +5,7 @@ description: Use the COPY statement in Azure Synapse Analytics for loading from
 author: MikeRayMSFT
 ms.author: mikeray
 ms.reviewer: wiassaf
-ms.date: 01/04/2022
+ms.date: 01/17/2023
 ms.service: sql
 ms.subservice: t-sql
 ms.topic: language-reference
@@ -96,8 +96,8 @@ When a column list is not specified, COPY will map columns based on the source a
 #### *External location(s)*
 This is where the files containing the data are staged. Currently, Azure Data Lake Storage (ADLS) Gen2 and Azure Blob Storage are supported:

-- *External location* for Blob Storage: https://\<account\>.blob.core.windows.net/\<container\>/\<path\>
-- *External location* for ADLS Gen2: https://\<account\>.dfs.core.windows.net/\<container\>/\<path\>
+- *External location* for Blob Storage: `https://<account>.blob.core.windows.net/<container>/<path>`
+- *External location* for ADLS Gen2: `https://<account>.dfs.core.windows.net/<container>/<path>`

 > [!NOTE]
 > The .blob endpoint is available for ADLS Gen2 as well and currently yields the best performance. Use the .blob endpoint when .dfs is not required for your authentication method.
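As a hedged illustration of that note, a COPY statement targeting an ADLS Gen2 account through the .blob endpoint might look like the following sketch. The account, container, path, and table names are hypothetical, and the credential clause assumes managed identity authentication:

```sql
-- Hypothetical sketch: load from an ADLS Gen2 account via the .blob endpoint,
-- which the note above recommends when .dfs is not required for authentication.
COPY INTO dbo.[lineitem]
FROM 'https://myadlsgen2acct.blob.core.windows.net/datasets/folder1/lineitem.csv'
WITH (
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);
```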
@@ -119,7 +119,7 @@ Wildcards can be included in the path where

 Multiple file locations can only be specified from the same storage account and container via a comma-separated list such as:

-- `https://\<account\>.blob.core.windows.net/\<container\>/\<path\>`, `https://\<account\>.blob.core.windows.net\<container\>/\<path\>`
+- `https://<account>.blob.core.windows.net/<container>/<path>`, `https://<account>.blob.core.windows.net/<container>/<path>`

 #### *FILE_TYPE = { 'CSV' | 'PARQUET' | 'ORC' }*
 *FILE_TYPE* specifies the format of the external data.
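The comma-separated multi-location form and the *FILE_TYPE* option described above can be sketched together as follows. The storage account, container, folder paths, and table name are hypothetical:

```sql
-- Hypothetical sketch: two paths from the same storage account and container,
-- with the file format stated explicitly via FILE_TYPE.
COPY INTO dbo.[lineitem]
FROM 'https://myaccount.blob.core.windows.net/mycontainer/folder1/',
     'https://myaccount.blob.core.windows.net/mycontainer/folder2/'
WITH (
    FILE_TYPE = 'PARQUET'
);
```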
@@ -302,7 +302,8 @@ Requires INSERT and ADMINISTER BULK OPERATIONS permissions. In [!INCLUDE[ssazure
 The following example is the simplest form of the COPY command, which loads data from a public storage account. For this example, the COPY statement's defaults match the format of the line item CSV file.

 ```sql
-COPY INTO dbo.[lineitem] FROM 'https://unsecureaccount.blob.core.windows.net/customerdatasets/folder1/lineitem.csv'
+COPY INTO dbo.[lineitem]
+FROM 'https://unsecureaccount.blob.core.windows.net/customerdatasets/folder1/lineitem.csv'
 ```

 The default values of the COPY command are:
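As a sketch, those defaults could be spelled out explicitly as shown below. The option values written here are assumptions drawn from the surrounding documentation, not an authoritative or exhaustive list:

```sql
-- Hypothetical sketch: the simple example above with common CSV defaults
-- written out explicitly (values assumed, not exhaustive).
COPY INTO dbo.[lineitem]
FROM 'https://unsecureaccount.blob.core.windows.net/customerdatasets/folder1/lineitem.csv'
WITH (
    FILE_TYPE = 'CSV',       -- default file format
    FIELDQUOTE = '"',        -- default quote character
    FIELDTERMINATOR = ',',   -- default field delimiter
    ROWTERMINATOR = '0x0A',  -- default row delimiter (line feed)
    FIRSTROW = 1             -- load starting from the first row
);
```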
@@ -432,7 +433,7 @@ WITH (
 The COPY command's performance depends on your workload. For best loading performance, consider splitting your input into multiple files when loading CSV. This guidance applies to gzip-compressed files as well.

 ### What is the file splitting guidance for the COPY command loading CSV files?
-Guidance on the number of files is outlined in the table below. Once the recommended number of files is reached, larger files perform better. For a simple file splitting experience, refer to the following [documentation](https://techcommunity.microsoft.com/t5/azure-synapse-analytics/how-to-maximize-copy-load-throughput-with-file-splits/ba-p/1314474).
+Guidance on the number of files is outlined in the table below. Once the recommended number of files is reached, larger files perform better. The number of partitions (file splits) is determined by the number of compute nodes multiplied by 60. For example, at DWU6000 there are 12 compute nodes, and 12 * 60 = 720 partitions. For a simple file splitting experience, refer to [How to maximize COPY load throughput with file splits](https://techcommunity.microsoft.com/t5/azure-synapse-analytics/how-to-maximize-copy-load-throughput-with-file-splits/ba-p/1314474).

 | **DWU** | **#Files** |
 | :-----: | :--------: |
@@ -461,18 +462,16 @@ There is no need to split Parquet and ORC files because the COPY command will au
 There are no limitations on the number or size of files; however, for best performance, we recommend files that are at least 4 MB.

 ### Are there any known issues with the COPY statement?
-If you have an Azure Synapse workspace that was created prior to 12/07/2020, you may run into a similar error message when authenticating using Managed Identity:
-
-*com.microsoft.sqlserver.jdbc.SQLServerException: Managed Service Identity has not been enabled on this server. Please enable Managed Service Identity and try again.*
+If you have an Azure Synapse workspace that was created prior to 12/07/2020, you may run into a similar error message when authenticating using Managed Identity: `com.microsoft.sqlserver.jdbc.SQLServerException: Managed Service Identity has not been enabled on this server. Please enable Managed Service Identity and try again.`

 Follow these steps to work around this issue by re-registering the workspace's managed identity:

-1. Go to your Synapse workspace in the Azure portal
-2. Go to the Managed identities page
-3. If the "Allow Pipelines" option is already checked, you must uncheck this setting and save
-4. Check the "Allow Pipelines" option and save
+1. Go to your Azure Synapse workspace in the Azure portal.
+2. Go to the Managed identities page.
+3. If the "Allow Pipelines" option is already checked, you must uncheck this setting and save.
+4. Check the "Allow Pipelines" option and save.


-## See also
+## Next steps

 [Loading overview with [!INCLUDE[ssazuresynapse-md](../../includes/ssazuresynapse-md.md)]](/azure/sql-data-warehouse/design-elt-data-loading)
