---
title: Apache Spark and Apache Hadoop
titleSuffix: Configure Apache Spark and Apache Hadoop in Big Data Clusters
description: SQL Server Big Data Clusters allow Spark and HDFS solutions. Learn how to configure them.
author: rajmera3
ms.author: raajmera
ms.reviewer: mikeray
ms.date: 02/13/2020
ms.topic: conceptual
ms.prod: sql
ms.technology: big-data-cluster
---

# Configure Apache Spark and Apache Hadoop in Big Data Clusters

To configure Apache Spark and Apache Hadoop in Big Data Clusters, you modify the cluster profile at deployment time.

## Supported Configurations

There are currently four configuration categories:

- `sql`
- `hdfs`
- `spark`
- `gateway`

Big Data Clusters define three services: `hdfs`, `spark`, and `sql`. Each service maps to the configuration category of the same name, and all gateway configurations go to the `gateway` category. For example, all configurations in the `hdfs` service belong to the `hdfs` category. Note that all Hadoop (core-site), HDFS, and ZooKeeper configurations belong to the `hdfs` category, while all Livy, Spark, YARN, and Hive Metastore configurations belong to the `spark` category.
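In the cluster profile, each setting key is prefixed with the sub-category (configuration file) it belongs to. As an illustrative sketch only, a service-level `settings` block for the `spark` service might look like the following; the structure mirrors the examples later in this article, but the two values shown (`spark.driver.memory` and `yarn.scheduler.maximum-allocation-mb`) are hypothetical choices, not recommendations:

```json
{
  "spec": {
    "services": {
      "spark": {
        "settings": {
          "spark-defaults-conf.spark.driver.memory": "2g",
          "yarn-site.yarn.scheduler.maximum-allocation-mb": "12288"
        }
      }
    }
  }
}
```

Both keys belong to the `spark` category: the first targets `spark-defaults.conf`, the second targets `yarn-site.xml`.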
You can find all possible configurations for each category at the associated Apache documentation site:

- Apache Spark: https://spark.apache.org/docs/latest/configuration.html
- Apache Hadoop:
  * HDFS hdfs-site: https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
  * HDFS core-site: https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/core-default.xml
  * YARN: https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/ResourceModel.html
- Hive: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-MetaStore
- Livy: https://github.com/cloudera/livy/blob/master/conf/livy.conf.template
- Apache Knox Gateway: https://knox.apache.org/books/knox-0-14-0/user-guide.html#Gateway+Details

## Unsupported Configurations

The following configurations are unsupported and cannot be changed in the context of the Big Data Cluster.

| Category | Sub-Category | File | Unsupported Configurations |
|----------|--------------|------|----------------------------|
| spark | | | |
| | yarn-site | yarn-site.xml | yarn.log-aggregation-enable |
| | | | yarn.log.server.url |
| | | | yarn.nodemanager.pmem-check-enabled |
| | | | yarn.nodemanager.vmem-check-enabled |
| | | | yarn.nodemanager.aux-services |
| | | | yarn.resourcemanager.address |
| | | | yarn.nodemanager.address |
| | | | yarn.client.failover-no-ha-proxy-provider |
| | | | yarn.client.failover-proxy-provider |
| | | | yarn.http.policy |
| | | | yarn.nodemanager.linux-container-executor.secure-mode.use-pool-user |
| | | | yarn.nodemanager.linux-container-executor.secure-mode.pool-user-prefix |
| | | | yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user |
| | | | yarn.acl.enable |
| | | | yarn.admin.acl |
| | | | yarn.resourcemanager.hostname |
| | | | yarn.resourcemanager.principal |
| | | | yarn.resourcemanager.keytab |
| | | | yarn.resourcemanager.webapp.spnego-keytab-file |
| | | | yarn.resourcemanager.webapp.spnego-principal |
| | | | yarn.nodemanager.principal |
| | | | yarn.nodemanager.keytab |
| | | | yarn.nodemanager.webapp.spnego-keytab-file |
| | | | yarn.nodemanager.webapp.spnego-principal |
| | | | yarn.resourcemanager.ha.enabled |
| | | | yarn.resourcemanager.cluster-id |
| | | | yarn.resourcemanager.zk-address |
| | | | yarn.resourcemanager.ha.rm-ids |
| | | | yarn.resourcemanager.hostname.* |
| | capacity-scheduler | capacity-scheduler.xml | yarn.scheduler.capacity.root.acl_submit_applications |
| | | | yarn.scheduler.capacity.root.acl_administer_queue |
| | | | yarn.scheduler.capacity.root.default.acl_application_max_priority |
| | yarn-env | yarn-env.sh | |
| | spark-defaults-conf | spark-defaults.conf | spark.yarn.archive |
| | | | spark.yarn.historyServer.address |
| | | | spark.eventLog.enabled |
| | | | spark.eventLog.dir |
| | | | spark.sql.warehouse.dir |
| | | | spark.sql.hive.metastore.version |
| | | | spark.sql.hive.metastore.jars |
| | | | spark.extraListeners |
| | | | spark.metrics.conf |
| | | | spark.ssl.enabled |
| | | | spark.authenticate |
| | | | spark.network.crypto.enabled |
| | | | spark.ssl.keyStore |
| | | | spark.ssl.keyStorePassword |
| | | | spark.ui.enabled |
| | spark-env | spark-env.sh | SPARK_NO_DAEMONIZE |
| | | | SPARK_DIST_CLASSPATH |
| | spark-history-server-conf | spark-history-server.conf | spark.history.fs.logDirectory |
| | | | spark.ui.proxyBase |
| | | | spark.history.fs.cleaner.enabled |
| | | | spark.ssl.enabled |
| | | | spark.authenticate |
| | | | spark.network.crypto.enabled |
| | | | spark.ssl.keyStore |
| | | | spark.ssl.keyStorePassword |
| | | | spark.history.kerberos.enabled |
| | | | spark.history.kerberos.principal |
| | | | spark.history.kerberos.keytab |
| | | | spark.ui.filters |
| | | | spark.acls.enable |
| | | | spark.history.ui.acls.enable |
| | | | spark.history.ui.admin.acls |
| | | | spark.history.ui.admin.acls.groups |
| | livy-conf | livy.conf | livy.keystore |
| | | | livy.keystore.password |
| | | | livy.spark.master |
| | | | livy.spark.deploy-mode |
| | | | livy.rsc.jars |
| | | | livy.repl.jars |
| | | | livy.rsc.pyspark.archives |
| | | | livy.rsc.sparkr.package |
| | | | livy.repl.enable-hive-context |
| | | | livy.superusers |
| | | | livy.server.auth.type |
| | | | livy.server.launch.kerberos.keytab |
| | | | livy.server.launch.kerberos.principal |
| | | | livy.server.auth.kerberos.principal |
| | | | livy.server.auth.kerberos.keytab |
| | | | livy.impersonation.enabled |
| | | | livy.server.access-control.enabled |
| | | | livy.server.access-control.* |
| | livy-env | livy-env.sh | LIVY_SERVER_JAVA_OPTS |
| | hive-site | hive-site.xml | javax.jdo.option.ConnectionURL |
| | | | javax.jdo.option.ConnectionDriverName |
| | | | javax.jdo.option.ConnectionUserName |
| | | | javax.jdo.option.ConnectionPassword |
| | | | hive.metastore.uris |
| | | | hive.metastore.pre.event.listeners |
| | | | hive.security.authorization.enabled |
| | | | hive.security.metastore.authenticator.manager |
| | | | hive.security.metastore.authorization.manager |
| | | | hive.metastore.use.SSL |
| | | | hive.metastore.keystore.path |
| | | | hive.metastore.keystore.password |
| | | | hive.metastore.truststore.path |
| | | | hive.metastore.truststore.password |
| | | | hive.metastore.kerberos.keytab.file |
| | | | hive.metastore.kerberos.principal |
| | | | hive.metastore.sasl.enabled |
| | | | hive.metastore.execute.setugi |
| | | | hive.cluster.delegation.token.store.class |
| | hive-env | hive-env.sh | |
| hdfs | | | |
| | core-site | core-site.xml | fs.defaultFS |
| | | | ha.zookeeper.quorum |
| | | | hadoop.tmp.dir |
| | | | hadoop.rpc.protection |
| | | | hadoop.security.auth_to_local |
| | | | hadoop.security.authentication |
| | | | hadoop.security.authorization |
| | | | hadoop.http.authentication.simple.anonymous.allowed |
| | | | hadoop.http.authentication.type |
| | | | hadoop.http.authentication.kerberos.principal |
| | | | hadoop.http.authentication.kerberos.keytab |
| | | | hadoop.http.filter.initializers |
| | | | hadoop.security.group.mapping.* |
| | hadoop-env | hadoop-env.sh | JAVA_HOME |
| | | | HADOOP_CLASSPATH |
| | mapred-env | mapred-env.sh | |
| | hdfs-site | hdfs-site.xml | dfs.namenode.name.dir |
| | | | dfs.datanode.data.dir |
| | | | dfs.namenode.acls.enabled |
| | | | dfs.namenode.datanode.registration.ip-hostname-check |
| | | | dfs.client.retry.policy.enabled |
| | | | dfs.permissions.enabled |
| | | | dfs.nameservices |
| | | | dfs.ha.namenodes.nmnode-0 |
| | | | dfs.namenode.rpc-address.nmnode-0.* |
| | | | dfs.namenode.shared.edits.dir |
| | | | dfs.ha.automatic-failover.enabled |
| | | | dfs.ha.fencing.methods |
| | | | dfs.journalnode.edits.dir |
| | | | dfs.client.failover.proxy.provider.nmnode-0 |
| | | | dfs.namenode.http-address |
| | | | dfs.namenode.httpS-address |
| | | | dfs.http.policy |
| | | | dfs.encrypt.data.transfer |
| | | | dfs.block.access.token.enable |
| | | | dfs.data.transfer.protection |
| | | | dfs.encrypt.data.transfer.cipher.suites |
| | | | dfs.https.port |
| | | | dfs.namenode.keytab.file |
| | | | dfs.namenode.kerberos.principal |
| | | | dfs.namenode.kerberos.internal.spnego.principal |
| | | | dfs.datanode.data.dir.perm |
| | | | dfs.datanode.address |
| | | | dfs.datanode.http.address |
| | | | dfs.datanode.ipc.address |
| | | | dfs.datanode.https.address |
| | | | dfs.datanode.keytab.file |
| | | | dfs.datanode.kerberos.principal |
| | | | dfs.journalnode.keytab.file |
| | | | dfs.journalnode.kerberos.principal |
| | | | dfs.journalnode.kerberos.internal.spnego.principal |
| | | | dfs.web.authentication.kerberos.keytab |
| | | | dfs.web.authentication.kerberos.principal |
| | | | dfs.webhdfs.enabled |
| | | | dfs.permissions.superusergroup |
| | hdfs-env | hdfs-env.sh | HADOOP_HEAPSIZE_MAX |
| | zoo-cfg | zoo.cfg | secureClientPort |
| | | | clientPort |
| | | | dataDir |
| | | | dataLogDir |
| | | | 4lw.commands.whitelist |
| | zookeeper-java-env | java.env | ZK_LOG_DIR |
| | | | SERVER_JVMFLAGS |
| | zookeeper-log4j-properties | log4j.properties (zookeeper) | log4j.rootLogger |
| | | | log4j.appender.CONSOLE.* |
| gateway | | | |
| | gateway-site | gateway-site.xml | gateway.port |
| | | | gateway.path |
| | | | gateway.gateway.conf.dir |
| | | | gateway.hadoop.kerberos.secured |
| | | | java.security.krb5.conf |
| | | | java.security.auth.login.config |
| | | | gateway.websocket.feature.enabled |
| | | | gateway.scope.cookies.feature.enabled |
| | | | ssl.exclude.protocols |
| | | | ssl.include.ciphers |

## Configurations via Cluster Profile

In the cluster profile, there are resources and services. At deployment time, you can specify configurations in one of two ways:

* First, at the resource level:

The following examples are patch files for the profile:

```json
{
  "op": "add",
  "path": "spec.resources.zookeeper.spec.settings",
  "value": {
    "hdfs": {
      "zoo-cfg.syncLimit": "6"
    }
  }
}
```

Or:

```json
{
  "op": "add",
  "path": "spec.resources.gateway.spec.settings",
  "value": {
    "gateway": {
      "gateway-site.gateway.httpclient.socketTimeout": "95s"
    }
  }
}
```

* Second, at the service level. Assign multiple resources to a service, and specify configurations for the service.
The following is an example of a patch file for the profile:

```json
{
  "op": "add",
  "path": "spec.services.hdfs.settings",
  "value": {
    "core-site.hadoop.proxyuser.xyz.users": "*"
    …
  }
}
```

The service `hdfs` is defined as:

```json
{
  "spec": {
    "services": {
      "hdfs": {
        "resources": [
          "nmnode-0",
          "zookeeper",
          "storage-0",
          "sparkhead"
        ],
        "settings": {
          "hdfs-site.dfs.replication": "3"
        }
      }
    }
  }
}
```

> [!NOTE]
> Resource-level configurations override service-level configurations. One resource can be assigned to multiple services.

## Limitations

Configurations can be specified only at the category level. When specifying multiple configurations under the same sub-category, the common prefix cannot be extracted in the cluster profile. For example, the following is not supported:

```json
{
  "op": "add",
  "path": "spec.services.hdfs.settings.core-site.hadoop",
  "value": {
    "proxyuser.xyz.users": "*",
    "proxyuser.abc.users": "*"
  }
}
```

## Next steps

- [`azdata` reference](reference-azdata.md)
- [What are [!INCLUDE[big-data-clusters-2019](../includes/ssbigdataclusters-ver15.md)]?](big-data-cluster-overview.md)