docs/t-sql/statements/copy-into-transact-sql.md (13 additions & 14 deletions)
@@ -5,7 +5,7 @@ description: Use the COPY statement in Azure Synapse Analytics for loading from
 author: MikeRayMSFT
 ms.author: mikeray
 ms.reviewer: wiassaf
-ms.date: 01/04/2022
+ms.date: 01/17/2023
 ms.service: sql
 ms.subservice: t-sql
 ms.topic: language-reference
@@ -96,8 +96,8 @@ When a column list is not specified, COPY will map columns based on the source a
 #### *External location(s)*
 Is where the files containing the data are staged. Currently Azure Data Lake Storage (ADLS) Gen2 and Azure Blob Storage are supported:

--*External location* for Blob Storage: https://\<account\>.blob.core.windows.net/\<container\>/\<path\>
--*External location* for ADLS Gen2: https://\<account\>.dfs.core.windows.net/\<container\>/\<path\>
+-*External location* for Blob Storage: `https://<account>.blob.core.windows.net/<container>/<path>`
+-*External location* for ADLS Gen2: `https://<account>.dfs.core.windows.net/<container>/<path>`

 > [!NOTE]
 > The .blob endpoint is available for ADLS Gen2 as well and currently yields the best performance. Use the .blob endpoint when .dfs is not required for your authentication method.
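The note above can be illustrated with a short sketch. This is a hypothetical example, not from the changed file: the account, container, and table names are placeholders, and it assumes SAS authentication; it shows an ADLS Gen2 account addressed through the .blob endpoint, as the note recommends.

```sql
-- Hypothetical: ADLS Gen2 data loaded via the .blob endpoint.
-- <account>, <container>, and the SAS secret are placeholders.
COPY INTO dbo.[lineitem]
FROM 'https://myaccount.blob.core.windows.net/mycontainer/lineitem/'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>')
);
```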
@@ -119,7 +119,7 @@ Wildcards cards can be included in the path where
 Multiple file locations can only be specified from the same storage account and container via a comma-separated list such as:

 *FILE_TYPE* specifies the format of the external data.
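The comma-separated list mentioned above can be sketched as follows. This is a hypothetical example with placeholder account, container, and file names; both paths are in the same storage account and container, as required.

```sql
-- Hypothetical: two files from one account and container,
-- passed as a single comma-separated string.
COPY INTO dbo.[lineitem]
FROM 'https://myaccount.blob.core.windows.net/mycontainer/folder1/file1.csv, https://myaccount.blob.core.windows.net/mycontainer/folder1/file2.csv'
WITH (
    FILE_TYPE = 'CSV'
);
```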
@@ -302,7 +302,8 @@ Requires INSERT and ADMINISTER BULK OPERATIONS permissions. In [!INCLUDE[ssSDW](
 The following example is the simplest form of the COPY command, which loads data from a public storage account. For this example, the COPY statement's defaults match the format of the line item CSV file.

 ```sql
-COPY INTO dbo.[lineitem] FROM 'https://unsecureaccount.blob.core.windows.net/customerdatasets/folder1/lineitem.csv'
 ```
 The COPY command can deliver better performance depending on your workload. For best loading performance, consider splitting your input into multiple files when loading CSV. This guidance applies to gzip-compressed files as well.

 ### What is the file splitting guidance for the COPY command loading CSV files?
-Guidance on the number of files is outlined in the table below. Once the recommended number of files are reached, you will have better performance the larger the files. The number of files is determined by number of compute nodes multiplied by 60 - At 6000DWU we have 12 compute nodes for 12*60 = 720 partitions. For a simple file splitting experience, refer to the following [documentation](https://techcommunity.microsoft.com/t5/azure-synapse-analytics/how-to-maximize-copy-load-throughput-with-file-splits/ba-p/1314474).
+Guidance on the number of files is outlined in the table below. Once the recommended number of files is reached, larger files yield better performance. The number of files is determined by the number of compute nodes multiplied by 60. For example, at 6000 DWU there are 12 compute nodes, giving 12*60 = 720 partitions. For a simple file splitting experience, refer to [How to maximize COPY load throughput with file splits](https://techcommunity.microsoft.com/t5/azure-synapse-analytics/how-to-maximize-copy-load-throughput-with-file-splits/ba-p/1314474).
 |**DWU**|**#Files**|
 | :-----: | :--------: |
@@ -461,18 +462,16 @@ There is no need to split Parquet and ORC files because the COPY command will au
 There are no limitations on the number or size of files; however, for best performance, we recommend files that are at least 4 MB.
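Since Parquet and ORC files need no manual splitting, loading them reduces to a single statement. This is a hypothetical sketch with placeholder account, container, and table names, assuming Managed Identity authentication; the COPY command parallelizes over the files it finds under the path.

```sql
-- Hypothetical: Parquet files are split automatically by the COPY
-- command; no file-count tuning is needed. Names are placeholders.
COPY INTO dbo.[lineitem]
FROM 'https://myaccount.blob.core.windows.net/mycontainer/parquet/'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);
```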
 ### Are there any known issues with the COPY statement?
-If you have a Azure Synapse workspace that was created prior to 12/07/2020, you may run into a similar error message when authenticating using Managed Identity:
-
-*com.microsoft.sqlserver.jdbc.SQLServerException: Managed Service Identity has not been enabled on this server. Please enable Managed Service Identity and try again.*
+If you have an Azure Synapse workspace that was created prior to 12/07/2020, you may run into a similar error message when authenticating using Managed Identity: `com.microsoft.sqlserver.jdbc.SQLServerException: Managed Service Identity has not been enabled on this server. Please enable Managed Service Identity and try again.`
 Follow these steps to work around this issue by re-registering the workspace's managed identity:

-1. Go to your Synapse workspace in the Azure portal
-2. Go to the Managed identities page
-3. If the "Allow Pipelines" option is already checked, you must uncheck this setting and save
-4. Check the "Allow Pipelines" option and save
+1. Go to your Azure Synapse workspace in the Azure portal.
+2. Go to the Managed identities page.
+3. If the "Allow Pipelines" option is already checked, you must uncheck this setting and save.
+4. Check the "Allow Pipelines" option and save.
-## See also
+## Next steps
 [Loading overview with [!INCLUDE[ssSDW](../../includes/sssdwfull-md.md)]](/azure/sql-data-warehouse/design-elt-data-loading)