---
title: "Access external data: Azure Blob Storage - PolyBase"
description: The article uses PolyBase on a SQL Server instance with Azure Blob Storage. PolyBase is suited for ad-hoc queries of external tables and data import/export.
ms.date: 12/13/2019
ms.prod: sql
ms.technology: polybase
ms.topic: conceptual
author: MikeRayMSFT
ms.author: mikeray
ms.reviewer: ""
monikerRange: ">= sql-server-2016 || =sqlallproducts-allversions"
ms.custom: seo-dt-2019, seo-lt-2019
---

# Configure PolyBase to access external data in Azure Blob Storage

[!INCLUDE[appliesto-ss-xxxx-asdw-pdw-md-winonly](../../includes/appliesto-ss-xxxx-xxxx-xxx-md-winonly.md)]

This article explains how to use PolyBase on a SQL Server instance to query external data in Azure Blob Storage.

## Prerequisites

If you haven't installed PolyBase, see [PolyBase installation](polybase-installation.md). The installation article explains the prerequisites.

### Configure Azure Blob Storage connectivity

First, configure SQL Server PolyBase to use Azure Blob Storage.

1. Run [sp_configure](../../relational-databases/system-stored-procedures/sp-configure-transact-sql.md) with 'hadoop connectivity' set to an Azure Blob Storage provider. To find the value for providers, see [PolyBase Connectivity Configuration](../../database-engine/configure-windows/polybase-connectivity-configuration-transact-sql.md). By default, Hadoop connectivity is set to 7.

   ```sql
   -- Values map to various external data sources.
   -- Example: value 7 stands for Hortonworks HDP 2.1 to 2.6 on Linux,
   -- 2.1 to 2.3 on Windows Server, and Azure Blob Storage.
   sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
   GO

   RECONFIGURE
   GO
   ```

2. You must restart SQL Server using **services.msc**.
   Restarting SQL Server restarts these services:

   - SQL Server PolyBase Data Movement Service
   - SQL Server PolyBase Engine

   ![stop and start PolyBase services in services.msc](../../relational-databases/polybase/media/polybase-stop-start.png "stop and start PolyBase services in services.msc")

## Configure an external table

To query the data in your Azure Blob Storage data source, you must define an external table to use in Transact-SQL queries. The following steps describe how to configure the external table.

1. Create a master key on the database. The master key is required to encrypt the credential secret.

   ```sql
   CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo';
   ```

1. Create a database scoped credential for Azure Blob Storage.

   ```sql
   -- IDENTITY: any string (this is not used for authentication to Azure storage).
   -- SECRET: your Azure storage account key.
   CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
   WITH IDENTITY = 'user', SECRET = '<azure_storage_account_key>';
   ```

1. Create an external data source with [CREATE EXTERNAL DATA SOURCE](../../t-sql/statements/create-external-data-source-transact-sql.md).

   ```sql
   -- LOCATION: Azure storage account name and blob container name.
   -- CREDENTIAL: The database scoped credential created above.
   CREATE EXTERNAL DATA SOURCE AzureStorage WITH (
       TYPE = HADOOP,
       LOCATION = 'wasbs://<blob_container_name>@<azure_storage_account_name>.blob.core.windows.net',
       CREDENTIAL = AzureStorageCredential
   );
   ```

1. Create an external file format with [CREATE EXTERNAL FILE FORMAT](../../t-sql/statements/create-external-file-format-transact-sql.md).

   ```sql
   -- FORMAT_TYPE: Type of format in Hadoop (DELIMITEDTEXT, RCFILE, ORC, PARQUET).
   CREATE EXTERNAL FILE FORMAT TextFileFormat WITH (
       FORMAT_TYPE = DELIMITEDTEXT,
       FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE)
   );
   ```

1. Create an external table pointing to data stored in Azure storage with [CREATE EXTERNAL TABLE](../../t-sql/statements/create-external-table-transact-sql.md). In this example, the external data contains car sensor data.

   ```sql
   -- LOCATION: path to file or directory that contains the data (relative to HDFS root).
   CREATE EXTERNAL TABLE [dbo].[CarSensor_Data] (
       [SensorKey] int NOT NULL,
       [CustomerKey] int NOT NULL,
       [GeographyKey] int NULL,
       [Speed] float NOT NULL,
       [YearMeasured] int NOT NULL
   )
   WITH (
       LOCATION = '/Demo/',
       DATA_SOURCE = AzureStorage,
       FILE_FORMAT = TextFileFormat
   );
   ```

1. Create statistics on the external table.

   ```sql
   CREATE STATISTICS StatsForSensors ON CarSensor_Data(CustomerKey, Speed);
   ```

## PolyBase queries

There are three functions that PolyBase is suited for:

- Ad-hoc queries against external tables.
- Importing data.
- Exporting data.

The following queries provide examples with fictional car sensor data.

### Ad-hoc queries

The following ad-hoc query joins relational data with external data. It selects customers who drive faster than 35 mph, joining structured customer data stored in SQL Server with car sensor data stored in Azure Blob Storage.

```sql
SELECT DISTINCT Insured_Customers.FirstName, Insured_Customers.LastName,
    Insured_Customers.YearlyIncome, CarSensor_Data.Speed
FROM Insured_Customers, CarSensor_Data
WHERE Insured_Customers.CustomerKey = CarSensor_Data.CustomerKey
    AND CarSensor_Data.Speed > 35
ORDER BY CarSensor_Data.Speed DESC
OPTION (FORCE EXTERNALPUSHDOWN); -- or OPTION (DISABLE EXTERNALPUSHDOWN)
```

### Importing data

The following query imports external data into SQL Server. This example imports data for fast drivers into SQL Server for more in-depth analysis. To improve performance, it uses a clustered columnstore index on the imported table.

```sql
SELECT DISTINCT
    Insured_Customers.FirstName,
    Insured_Customers.LastName,
    Insured_Customers.YearlyIncome,
    Insured_Customers.MaritalStatus
INTO Fast_Customers
FROM Insured_Customers
INNER JOIN (
    SELECT * FROM CarSensor_Data WHERE Speed > 35
) AS SensorD
    ON Insured_Customers.CustomerKey = SensorD.CustomerKey
ORDER BY YearlyIncome;

CREATE CLUSTERED COLUMNSTORE INDEX CCI_FastCustomers ON Fast_Customers;
```

### Exporting data

The following query exports data from SQL Server to Azure Blob Storage. To do this, you first have to enable PolyBase export. Then create an external table for the destination before exporting data to it.

```sql
-- Enable INSERT into external table
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;
GO

-- Create an external table for the export destination.
CREATE EXTERNAL TABLE [dbo].[FastCustomers2009] (
    [FirstName] char(25) NOT NULL,
    [LastName] char(25) NOT NULL,
    [YearlyIncome] float NULL,
    [MaritalStatus] char(1) NOT NULL
)
WITH (
    LOCATION = '/old_data/2009/customerdata',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = TextFileFormat,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0
);

-- Export data: Move old data to Azure Blob Storage while keeping it query-able via an external table.
-- The column list matches the four columns of the external table.
INSERT INTO dbo.FastCustomers2009
SELECT T1.FirstName, T1.LastName, T1.YearlyIncome, T1.MaritalStatus
FROM Insured_Customers T1
JOIN CarSensor_Data T2 ON (T1.CustomerKey = T2.CustomerKey)
WHERE T2.YearMeasured = 2009 AND T2.Speed > 40;
```

## View PolyBase objects in SSMS

In SSMS, external tables are displayed in a separate folder **External Tables**. External data sources and external file formats are in subfolders under **External Resources**.

![PolyBase objects in SSMS](media/polybase-management.png)

## Next steps

Explore more ways to use and monitor PolyBase in the following articles:

[PolyBase scale-out groups](../../relational-databases/polybase/polybase-scale-out-groups.md).

[PolyBase troubleshooting](polybase-troubleshooting.md).
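
As a T-SQL complement to browsing the SSMS folders, the external objects created in this article can also be listed from the catalog views. The following is a minimal sketch, assuming the objects were created in the current database; the column selection and the aliases (`t`, `ds`, `ff`, `table_name`, `data_source_name`, `file_format_name`) are illustrative choices, not part of the original walkthrough.

```sql
-- External data sources (for example, AzureStorage)
SELECT name, location, type_desc
FROM sys.external_data_sources;

-- External file formats (for example, TextFileFormat)
SELECT name, format_type, field_terminator
FROM sys.external_file_formats;

-- External tables with the data source and file format each one uses
SELECT t.name AS table_name,
       t.location,
       ds.name AS data_source_name,
       ff.name AS file_format_name
FROM sys.external_tables AS t
JOIN sys.external_data_sources AS ds
    ON t.data_source_id = ds.data_source_id
JOIN sys.external_file_formats AS ff
    ON t.file_format_id = ff.file_format_id;
```

If the steps in this article were followed as written, the last query would be expected to show `CarSensor_Data` and `FastCustomers2009` alongside their data source and file format.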