| title | Access external data: S3-compatible object storage - PolyBase | |
|---|---|---|
| description | The article explains how to use PolyBase on a SQL Server instance to query external data in S3-compatible object storage. Create external tables to reference the external data. | |
| ms.date | 03/05/2021 | |
| ms.metadata | seo-lt-2019 | |
| ms.prod | sql | |
| ms.technology | polybase | |
| ms.topic | conceptual | |
| ms.custom |
|
|
| author | WilliamDAssafMSFT | |
| ms.author | wiassaf | |
| ms.reviewer | hudequei | |
| monikerRange | >= sql-server-linux-ver16 || >= sql-server-ver16 |
[!INCLUDE SQL Server 2022]
This article explains how to use PolyBase to query external data in an S3-compatible object storage.
To use the S3-compatible object storage integration features, you will need the following tools and resources:
- Install the PolyBase feature for SQL Server.
- Install SQL Server Management Studio (SSMS) or Azure Data Studio.
- S3-compatible storage.
- An S3 bucket created. Buckets cannot be created or configured from SQL Server.
- A user (
Access Key ID) has been configured and the secret (Secret Key ID) and that user is known to you. You will need both to authenticate against the S3 object storage endpoint. - ListBucket permission on S3 user for browse privileges.
- ReadOnly permission on S3 user for read privileges.
- WriteOnly permission on S3 user for write privileges.
- TLS must have been configured. It is assumed that all connections will be securely transmitted over HTTPS not HTTP. The endpoint will be validated by a certificate installed on the SQL Server OS Host.
In order for the proxy user to read the content of an S3 bucket, the user will need to be allowed to perform the following actions against the S3 endpoint:
- ListBucket;
- ReadOnly;
- Enable PolyBase in
sp_configure:
exec sp_configure @configname = 'polybase enabled', @configvalue = 1
;
RECONFIGURE
;
exec sp_configure @configname = 'polybase enabled'
;- Before you create a database scoped credential, the user database must have a master key to protect the credential. For more information, see CREATE MASTER KEY.
The following sample script creates a database scoped credential s3-dc in the source user database in SQL Server. For more information, see CREATE DATABASE SCOPED CREDENTIAL (Transact-SQL).
USE [database_name];
GO
IF NOT EXISTS(SELECT * FROM sys.credentials WHERE name = 's3_dc')
BEGIN
CREATE DATABASE SCOPED CREDENTIAL s3_dc
WITH IDENTITY = 'S3 Access Key',
SECRET = '<AccessKeyID>:<SecretKeyID>' ;
END
GOVerify the new database-scoped credential with sys.database_scoped_credentials (Transact-SQL):
SELECT * FROM sys.database_scoped_credentials;The following sample script creates an external data source s3_ds in the source user database in SQL Server. The external data source references the s3_dc database scoped credential. For more information, see CREATE EXTERNAL DATA SOURCE.
CREATE EXTERNAL DATA SOURCE s3_ds
WITH
( LOCATION = 's3://<ip_address>:<port>/'
, CREDENTIAL = s3_dc
);
GOVerify the new external data source with sys.external_data_sources.
SELECT * FROM sys.external_data_sources;- SQL Server queries on an external table backed by S3-compliant object storage are limited to 1000 objects per prefix. This is because S3-compliant object listing is limited to 1000 object keys per prefix.
- For S3-compliant object storage, customers are not allowed to create their access key ID with a
:character in it. - The total URL length is limited to 259 characters. This means
s3://<hostname>/<objectkey>shouldn't exceed 259 characters. Thes3://counts towards this limit, so the path length cannot exceed 259-5 = 254 characters. - The SQL credential name is limited by 128 characters in UTF-16 format.
- The credential name created must contain the bucket name unless this credential is for a new external data source.
- Access Key ID and Secret Key ID must only contain alphanumeric values.
- To learn more about PolyBase, see Overview of SQL Server PolyBase
- Virtualize parquet file in a S3-compatible object storage with PolyBase
- For more tutorials on creating external data sources and external tables to a variety of data sources, see PolyBase Transact-SQL reference.