---
title: "Configure PolyBase Hadoop security"
description: Explains how to configure PolyBase in Parallel Data Warehouse to connect to external Hadoop.
author: mzaman1
ms.prod: sql
ms.technology: data-warehouse
ms.topic: conceptual
ms.date: 10/26/2018
ms.author: murshedz
ms.reviewer: martinle
ms.custom: seo-dt-2019
---

# PolyBase configuration and security for Hadoop

This article provides a reference for the configuration settings that affect APS PolyBase connectivity to Hadoop. For an overview of PolyBase, see [What is PolyBase](configure-polybase-connectivity-to-external-data.md).

> [!NOTE]
> On APS, changes to XML files must be made on all compute nodes and on the control node.
>
> Take special care when modifying XML files in APS. Any missing tags or unwanted characters can invalidate the XML file and hinder the usability of the feature.
>
> Hadoop configuration files are located in the following path:
> ```
> C:\Program Files\Microsoft SQL Server Parallel Data Warehouse\100\Hadoop\conf
> ```
> Any changes to the XML files require a service restart to take effect.

## <a id="rpcprotection"></a> Hadoop.RPC.Protection setting

A common way to secure communication in a Hadoop cluster is to change the hadoop.rpc.protection configuration to `privacy` or `integrity`. By default, PolyBase assumes the configuration is set to `authentication`. To override this default, add the following property to the core-site.xml file, with the value set to match the Hadoop cluster. Changing this configuration enables secure data transfer among the Hadoop nodes and an SSL connection to SQL Server.

```xml
<property>
  <name>hadoop.rpc.protection</name>
  <value></value>
</property>
```

## Kerberos configuration

When PolyBase authenticates to a Kerberos-secured cluster, it expects the hadoop.rpc.protection setting to be `authentication` by default. This leaves the data communication between Hadoop nodes unencrypted. To use the `privacy` or `integrity` setting for hadoop.rpc.protection, update the core-site.xml file on the PolyBase server.
For more information, see the previous section, [Hadoop.RPC.Protection setting](#rpcprotection).

To connect to a Kerberos-secured Hadoop cluster using MIT KDC, make the following changes on all APS compute nodes and on the control node:

1. Find the Hadoop configuration directories in the installation path of APS. Typically, the path is:

   ```
   C:\Program Files\Microsoft SQL Server Parallel Data Warehouse\100\Hadoop\conf
   ```

2. Find the Hadoop-side values of the configuration keys listed in the table. (On the Hadoop machine, find the files in the Hadoop configuration directory.)

3. Copy the configuration values into the value property in the corresponding files on the SQL Server machine.

   |**#**|**Configuration file**|**Configuration key**|**Action**|
   |------------|----------------|---------------------|----------|
   |1|core-site.xml|polybase.kerberos.kdchost|Specify the KDC hostname. For example: kerberos.your-realm.com.|
   |2|core-site.xml|polybase.kerberos.realm|Specify the Kerberos realm. For example: YOUR-REALM.COM|
   |3|core-site.xml|hadoop.security.authentication|Find the Hadoop side configuration and copy to SQL Server machine. For example: KERBEROS<br><br>**Security note:** KERBEROS must be written in upper case. If it is written in lower case, Kerberos authentication might not be turned on.|
   |4|hdfs-site.xml|dfs.namenode.kerberos.principal|Find the Hadoop side configuration and copy to SQL Server machine. For example: hdfs/_HOST@YOUR-REALM.COM|
   |5|mapred-site.xml|mapreduce.jobhistory.principal|Find the Hadoop side configuration and copy to SQL Server machine. For example: mapred/_HOST@YOUR-REALM.COM|
   |6|mapred-site.xml|mapreduce.jobhistory.address|Find the Hadoop side configuration and copy to SQL Server machine. For example: 10.193.26.174:10020|
   |7|yarn-site.xml|yarn.resourcemanager.principal|Find the Hadoop side configuration and copy to SQL Server machine. For example: yarn/_HOST@YOUR-REALM.COM|

   **core-site.xml**

   ```xml
   <property>
     <name>polybase.kerberos.realm</name>
     <value></value>
   </property>
   <property>
     <name>polybase.kerberos.kdchost</name>
     <value></value>
   </property>
   <property>
     <name>hadoop.security.authentication</name>
     <value>KERBEROS</value>
   </property>
   ```

   **hdfs-site.xml**

   ```xml
   <property>
     <name>dfs.namenode.kerberos.principal</name>
     <value></value>
   </property>
   ```

   **mapred-site.xml**

   ```xml
   <property>
     <name>mapreduce.jobhistory.principal</name>
     <value></value>
   </property>
   <property>
     <name>mapreduce.jobhistory.address</name>
     <value></value>
   </property>
   ```

   **yarn-site.xml**

   ```xml
   <property>
     <name>yarn.resourcemanager.principal</name>
     <value></value>
   </property>
   ```

4. Create a database-scoped credential object to specify the authentication information for each Hadoop user. See [PolyBase T-SQL objects](../relational-databases/polybase/polybase-t-sql-objects.md).

## Hadoop Encryption Zone setup

If you are using a Hadoop encryption zone, modify core-site.xml and hdfs-site.xml as follows. Provide the IP address where the KMS service is running, together with the corresponding port number; the address goes between `http@` and the port in each value. The default port for KMS on CDH is 16000.

**core-site.xml**

```xml
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@:16000/kms</value>
</property>
```

**hdfs-site.xml**

```xml
<property>
  <name>dfs.encryption.key.provider.uri</name>
  <value>kms://http@:16000/kms</value>
</property>
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@:16000/kms</value>
</property>
```
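
As an illustration of the encryption zone settings, the following sketch shows how the core-site.xml property might look once an address is filled in. The address 192.0.2.10 is a hypothetical placeholder taken from the range reserved for documentation, not a value from a real cluster, and 16000 is the CDH default port mentioned earlier. Substitute the address and port of your own KMS service.

```xml
<!-- Illustrative only: 192.0.2.10 is a hypothetical KMS address. -->
<!-- Replace it with the IP address where your KMS service runs. -->
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@192.0.2.10:16000/kms</value>
</property>
```

The same filled-in URI would be used for the two key provider properties in hdfs-site.xml, so that all three values point to the same KMS endpoint.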