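The previous big-data article explained how to set up a pseudo-distributed Hadoop YARN cluster on a single machine, which is convenient for learning. A real environment never has just one machine; it is a multi-node cluster. Large organizations build many Hadoop clusters, for example one per major department, or separate clusters for hot, warm, and cold data. In every case, a cluster is built from many servers running Linux.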
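1. Prepare the installation package
Download from https://archive.apache.org/dist/hadoop/common/hadoop-2.10.0/ to obtain hadoop-2.10.0.tar.gz.
2. Installation
* 2.1 System software and hardware environment
First install Linux on every machine, and install a Java JDK on each one.
The remaining machines in the cluster act as both DataNode and NodeManager; they are the slave nodes.
Note: the Hadoop YARN cluster I built is mainly used to run Spark and Flink jobs, not to store massive amounts of data in HDFS.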
vi /etc/hosts
Add four records, one per host (the IP address column is not shown here): hadoop1, hadoop2 hadoop-master, hadoop3, hadoop4
The master node is hadoop2, with the alias hadoop-master.
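As a rough illustration of those four records, the sketch below appends entries to a hosts-style file only when the hostname is not already listed. The 192.0.2.x addresses are placeholders from the documentation address range, not the addresses of the original cluster, and `add_host_entry` is a hypothetical helper. The demo writes to a temporary file rather than to /etc/hosts.

```shell
#!/bin/bash
# Hypothetical helper: append "IP name [alias...]" to a hosts-style file
# unless the first hostname is already listed there.
add_host_entry() {
  local file="$1" ip="$2"
  shift 2
  # Match the hostname as a whole whitespace-delimited word.
  if ! grep -qE "(^|[[:space:]])$1([[:space:]]|\$)" "$file" 2>/dev/null; then
    echo "$ip $*" >> "$file"
  fi
}

# Demo against a temporary file; the IP addresses are placeholders (assumption).
hosts_file="$(mktemp)"
add_host_entry "$hosts_file" 192.0.2.11 hadoop1
add_host_entry "$hosts_file" 192.0.2.12 hadoop2 hadoop-master
add_host_entry "$hosts_file" 192.0.2.13 hadoop3
add_host_entry "$hosts_file" 192.0.2.14 hadoop4
add_host_entry "$hosts_file" 192.0.2.11 hadoop1   # repeated call adds nothing
cat "$hosts_file"
```

On a real node the same records would go into /etc/hosts (as root), with the cluster's actual addresses.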
mkdir -p /opt/bigdata
tar zxvf hadoop-2.10.0.tar.gz -C /opt/bigdata
* 2.2 Create subdirectories
cd /opt/bigdata/hadoop-2.10.0/
mkdir -p dfs/data dfs/name
mkdir -p logs/hdfs logs/yarn
mkdir -p tmp/hdfs tmp/yarn
3. Set environment variables
vi ~/.profile, then add:
# Assumes the JDK is installed under /opt/bigdata/; adjust the paths to match your actual installation
export JAVA_HOME=/opt/bigdata/jdk1.8.0_144
export HADOOP_PREFIX=/opt/bigdata/hadoop-2.10.0
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_YARN_HOME=/opt/bigdata/hadoop-2.10.0
export HADOOP_CLASSPATH=`$HADOOP_PREFIX/bin/hadoop classpath`
To apply the changes, run: . ~/.profile
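After sourcing the profile, it can help to confirm that the variables this setup relies on are actually set in the current shell. The following is a minimal sketch; `missing_env_vars` is a hypothetical helper, not part of Hadoop.

```shell
#!/bin/bash
# Hypothetical helper: print the names (space-separated) of the given
# variables that are unset or empty; return non-zero if any are missing.
missing_env_vars() {
  local missing="" name value
  for name in "$@"; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="${missing:+$missing }$name"
    fi
  done
  echo "$missing"
  [ -z "$missing" ]
}

# Usage: check the variables the rest of this article depends on.
if not_set="$(missing_env_vars JAVA_HOME HADOOP_PREFIX HADOOP_HOME HADOOP_CONF_DIR HADOOP_YARN_HOME)"; then
  echo "environment looks complete"
else
  echo "not set: $not_set"
fi
```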
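4. Configuration
All configuration files and environment scripts live in the etc/hadoop/ directory. Enter it:
cd /opt/bigdata/hadoop-2.10.0/etc/hadoop/
There are many configuration files here, but only four .xml configuration files, two .sh scripts, and the slaves file need to be changed:
* 4.1 hadoop-env.sh
export HADOOP_PREFIX=/opt/bigdata/hadoop-2.10.0
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=/opt/bigdata/hadoop-2.10.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs/hdfs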
* 4.2 yarn-env.sh
export JAVA_HOME=/opt/bigdata/jdk1.8.0_144
* 4.3 core-site.xml
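Use the actual installation path, e.g. /opt/bigdata/hadoop-2.10.0
* 4.4 hdfs-site.xml
Use the actual installation path, e.g. /opt/bigdata/hadoop-2.10.0
Change the master node hostname, hadoop-master (it must already be configured in /etc/hosts on every server, or in CoreDNS; see section 2.1):
HDFS NameNode web UI address
HDFS SecondaryNameNode web UI address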
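Each of these settings is one `<property>` element inside `<configuration>`. As a sketch of that layout, the hypothetical helper `hadoop_property` below prints such an element; the two web UI names and values are the ones that appear in the full hdfs-site.xml at the end of this article. The helper does no XML escaping, so it is only suitable for simple values.

```shell
#!/bin/bash
# Hypothetical helper: print one Hadoop <property> element.
# Usage: hadoop_property NAME VALUE [DESCRIPTION]
hadoop_property() {
  local name="$1" value="$2" description="$3"
  printf '  <property>\n'
  printf '    <name>%s</name>\n' "$name"
  printf '    <value>%s</value>\n' "$value"
  if [ -n "$description" ]; then
    printf '    <description>%s</description>\n' "$description"
  fi
  printf '  </property>\n'
}

# Print the two web UI entries wrapped in a <configuration> element.
echo '<configuration>'
hadoop_property dfs.http.address hadoop2:50070 'HDFS NameNode web UI address'
hadoop_property dfs.secondary.http.address hadoop2:50090 'HDFS SecondaryNameNode web UI address'
echo '</configuration>'
```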
* 4.5 yarn-site.xml
Use the actual installation path, e.g. /opt/bigdata/hadoop-2.10.0
Change the master node address, hadoop-master (it must already be configured in /etc/hosts on every server, or in CoreDNS; see section 2.1).
Job resource scheduling policy: 1) CapacityScheduler: schedules by queue; 2) FairScheduler: shares resources evenly among applications.
This is an important setting. Make sure you understand how each scheduler works, then choose which one to enable.
Minimum memory that a single container can request from the RM, in MB:
The minimum allocation for every container request at the RM in MBs. Memory requests lower than this will be set to the value of this
property. Additionally, a node manager that is configured to have less memory
than this value will be shut down by the resource manager.
Maximum memory that a single container can request from the RM, in MB:
The maximum allocation for every container request at the RM in MBs. Memory requests higher than this will throw an InvalidResourceRequestException.
The minimum allocation for every container request at the RM in terms of virtual CPU cores. Requests lower than this will be set to the
value of this property. Additionally, a node manager that is configured to
have fewer virtual cores than this value will be shut down by the resource manager.
The maximum allocation for every container request at the RM in terms of virtual CPU cores. Requests higher than this will throw an InvalidResourceRequestException.
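****************** Must be adjusted to the server's actual memory ************************
Maximum memory available to a NodeManager node; set it according to the physical memory of the actual machine.
Number of maximum-size containers that fit on one NodeManager (with minimum-size containers, the upper bound is yarn.nodemanager.resource.memory-mb / yarn.scheduler.minimum-allocation-mb):
max(Container) = yarn.nodemanager.resource.memory-mb / yarn.scheduler.maximum-allocation-mb
Amount of physical memory, in MB, that can be allocated for containers. If set to -1 and
yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
automatically calculated (in case of Windows and Linux).
In other cases, the default is 8192MB.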
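To make the container arithmetic concrete, here is a small worked example. The 64 GB node size is a made-up figure chosen for illustration; the 8192 MB and 1024 MB limits are the yarn.scheduler.maximum-allocation-mb and yarn.scheduler.minimum-allocation-mb values from the yarn-site.xml listed at the end of this article. The calculation considers memory only and ignores vcores.

```shell
#!/bin/bash
# containers_per_node NODE_MB CONTAINER_MB
# Prints how many containers of CONTAINER_MB fit into NODE_MB (integer division).
containers_per_node() {
  local node_mb="$1" container_mb="$2"
  echo $(( node_mb / container_mb ))
}

node_mb=65536       # assumed yarn.nodemanager.resource.memory-mb (hypothetical 64 GB node)
max_alloc_mb=8192   # yarn.scheduler.maximum-allocation-mb
min_alloc_mb=1024   # yarn.scheduler.minimum-allocation-mb

echo "maximum-size containers per node: $(containers_per_node "$node_mb" "$max_alloc_mb")"   # 8
echo "minimum-size containers per node: $(containers_per_node "$node_mb" "$min_alloc_mb")"   # 64
```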
****************** Must be adjusted to the server's actual CPU count ************************
Number of vcores that can be allocated for containers. This is used by the RM scheduler when allocating
resources for containers. This is not used to limit the number of
CPUs used by YARN containers. If it is set to -1 and
yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
automatically determined from the hardware in case of Windows and Linux.
In other cases, number of vcores is 8 by default.
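A candidate value for yarn.nodemanager.resource.cpu-vcores can be derived from the CPU count of each node. The sketch below assumes GNU coreutils `nproc` is available (it reports logical processing units, which may differ from physical cores); `suggest_vcores` is a hypothetical helper, and the optional factor of 2 reflects the remark in this article's yarn-site.xml that a powerful machine can advertise twice its core count.

```shell
#!/bin/bash
# suggest_vcores CPUS [FACTOR]
# Prints CPUS * FACTOR; FACTOR defaults to 1 (one vcore per CPU).
suggest_vcores() {
  local cpus="$1" factor="${2:-1}"
  echo $(( cpus * factor ))
}

# Fall back to 1 if nproc is not installed on this system.
cpus="$(nproc 2>/dev/null || echo 1)"
echo "cpu-vcores candidate (1x): $(suggest_vcores "$cpus")"
echo "cpu-vcores candidate (2x): $(suggest_vcores "$cpus" 2)"
```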
The following four settings usually do not need to be changed, unless the tmp and logs subdirectories created earlier use different names.
* 4.6 capacity-scheduler.xml
The ResourceCalculator implementation to be used to compare
Resources in the scheduler.
The default i.e. DefaultResourceCalculator only uses Memory while
DominantResourceCalculator uses dominant-resource to compare
multi-dimensional resources such as Memory, CPU etc.
* 4.7 slaves: hostnames of the slave nodes
hadoop2 (only if the master node also runs the slave daemons)
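The slaves file is plain text with one hostname per line. A minimal sketch of generating it, assuming the three slave hosts named elsewhere in this article (hadoop1, hadoop3, hadoop4); `write_slaves` is a hypothetical helper and the demo writes to a temporary file instead of $HADOOP_CONF_DIR/slaves.

```shell
#!/bin/bash
# write_slaves FILE HOST...
# Overwrites FILE with one hostname per line.
write_slaves() {
  local file="$1"
  shift
  printf '%s\n' "$@" > "$file"
}

# Demo: write to a temporary file and show the result.
slaves_file="$(mktemp)"
write_slaves "$slaves_file" hadoop1 hadoop3 hadoop4
cat "$slaves_file"
```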
* 4.8 Distribute the configuration files
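One possible way to distribute the configuration is to copy the etc/hadoop directory from the master to the same path on every other node. The sketch below is an assumption about how this could be done, not the author's procedure: it uses `rsync -a` (which must be installed on both ends) and a hypothetical helper `distribute_conf`. With DRY_RUN=1 it only prints the commands.

```shell
#!/bin/bash
# distribute_conf CONF_DIR HOST...
# Copies the contents of CONF_DIR to the same path on each host.
# With DRY_RUN=1 the rsync commands are printed instead of executed.
distribute_conf() {
  local conf_dir="$1" host
  shift
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "rsync -a $conf_dir/ $host:$conf_dir/"
    else
      rsync -a "$conf_dir/" "$host:$conf_dir/"
    fi
  done
}

# Demo: print what would be run for the three slave hosts.
DRY_RUN=1 distribute_conf /opt/bigdata/hadoop-2.10.0/etc/hadoop hadoop1 hadoop3 hadoop4
```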
5. System user account for running the cluster
6. Set up passwordless SSH login
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
authorized_keys id_rsa id_rsa.pub known_hosts
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
scp ~/.ssh/id_rsa.pub test@
scp ~/.ssh/id_rsa.pub test@
scp ~/.ssh/id_rsa.pub test@
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh test@
ssh test@
ssh test@
ssh test@
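Whether passwordless login works can be checked non-interactively. The sketch below uses the standard OpenSSH options `BatchMode=yes` (fail instead of prompting for a password) and `ConnectTimeout`. The user name `test` matches the shell prompt shown in this article; the host list and the helper `check_ssh` are assumptions for illustration. With DRY_RUN=1 it only prints the commands.

```shell
#!/bin/bash
# check_ssh USER HOST...
# Tries a non-interactive login to each host and reports the result.
# Returns non-zero if any host could not be reached without a password.
check_ssh() {
  local user="$1" host failed=0
  shift
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "ssh -o BatchMode=yes -o ConnectTimeout=5 $user@$host true"
    elif ssh -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true; then
      echo "$host: ok"
    else
      echo "$host: passwordless login failed"
      failed=1
    fi
  done
  return "$failed"
}

# Demo: print the checks for the three slave hosts.
DRY_RUN=1 check_ssh test hadoop1 hadoop3 hadoop4
```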
7. Format HDFS
$HADOOP_PREFIX/bin/hdfs namenode -format
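Formatting wipes the NameNode metadata, so it should only be done once, on a new cluster. A formatted NameNode directory contains a current/VERSION file, which allows a simple guard. The sketch below assumes the dfs/name directory created in section 2.2; `namenode_is_formatted` is a hypothetical helper, and the demo only prints what it would do.

```shell
#!/bin/bash
# namenode_is_formatted NAME_DIR
# Succeeds if NAME_DIR already holds a formatted NameNode image
# (formatting creates NAME_DIR/current/VERSION).
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

name_dir="${HADOOP_PREFIX:-/opt/bigdata/hadoop-2.10.0}/dfs/name"
if namenode_is_formatted "$name_dir"; then
  echo "NameNode at $name_dir is already formatted; skipping"
else
  echo "would run: \$HADOOP_PREFIX/bin/hdfs namenode -format"
fi
```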
8. Start the Hadoop NameNode and DataNode daemons
* 8.1 Start/stop the HDFS NameNode:
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
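* 8.2 Start/stop the HDFS DataNodes:
hadoop-daemons.sh runs the command over SSH on every host listed in the slaves file (here the three machines hadoop1, hadoop3, and hadoop4), so run it once from the master node. To start a DataNode by hand on a single slave, run hadoop-daemon.sh on that node instead.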
$HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
$HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
* 8.3 Start/stop all HDFS nodes:
$HADOOP_PREFIX/sbin/start-dfs.sh
$HADOOP_PREFIX/sbin/stop-dfs.sh
9. Start the ResourceManager and NodeManager daemons
* 9.1 Start/stop the ResourceManager node:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
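* 9.2 Start/stop the NodeManager nodes:
yarn-daemons.sh runs the command over SSH on every host listed in the slaves file (here the three machines hadoop1, hadoop3, and hadoop4), so run it once from the master node. To start a NodeManager by hand on a single slave, run yarn-daemon.sh on that node instead.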
$HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR start nodemanager
$HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR stop nodemanager
* 9.3 Start/stop the standalone WebAppProxy server:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start proxyserver
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop proxyserver
* 9.4 Start/stop all YARN nodes:
$HADOOP_PREFIX/sbin/start-yarn.sh
$HADOOP_PREFIX/sbin/stop-yarn.sh
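After starting the daemons, the JDK tool `jps` lists the running Java processes as "PID ClassName" lines, which gives a quick way to confirm that the expected daemons are up on a node. The sketch below is a hypothetical helper that reads `jps` output on standard input; the sample output uses made-up PIDs.

```shell
#!/bin/bash
# missing_daemons EXPECTED...
# Reads jps output ("PID ClassName" per line) on stdin and prints each
# expected daemon name that does not appear in it.
missing_daemons() {
  local jps_output name
  jps_output="$(cat)"
  for name in "$@"; do
    if ! printf '%s\n' "$jps_output" | awk '{print $2}' | grep -qx "$name"; then
      echo "$name"
    fi
  done
}

# Sample jps output (made-up PIDs), as might be seen on the master node.
sample='2101 NameNode
2350 ResourceManager
2790 Jps'
printf '%s\n' "$sample" | missing_daemons NameNode ResourceManager WebAppProxyServer
```

On a live node the call would be `jps | missing_daemons NameNode ResourceManager` on the master, or `jps | missing_daemons DataNode NodeManager` on a slave.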
10. Integrated script
test@hadoop2:/opt/bigdata/hadoop-2.10.0$ ./run.sh
usage: ./run.sh [cmd]
./run.sh namenode_format [cluster name]
./run.sh start [namenode | datanode]
./run.sh stop [namenode | datanode]
./run.sh start dfs
./run.sh stop dfs
./run.sh start [resourcemanager | nodemanager]
./run.sh stop [resourcemanager | nodemanager]
./run.sh start yarn
./run.sh stop yarn
./run.sh start proxyserver
./run.sh stop proxyserver
./run.sh set_env
core-site.xml:
<configuration>
  <property>
    <name>hadoop.home</name>
    <value>/opt/bigdata/hadoop-2.10.0</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <description>The size of buffer for use in sequence files. The size of this buffer should probably be a multiple of hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>${hadoop.home}/tmp/hdfs</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>hadoop.home</name>
    <value>/opt/bigdata/hadoop-2.10.0</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>${hadoop.home}/dfs/name</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table(fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
  </property>
  <property>
    <name>dfs.hosts</name>
    <value></value>
    <description>Names a file that contains a list of hosts that are permitted to connect to the namenode. The full pathname of the file must be specified. If the value is empty, all hosts are permitted.</description>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <value></value>
    <description>Names a file that contains a list of hosts that are not permitted to connect to the namenode. The full pathname of the file must be specified. If the value is empty, no hosts are excluded.</description>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
    <description>The default block size for new files, in bytes. You can use the following suffix (case insensitive): k(kilo), m(mega), g(giga), t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g, etc.), Or provide complete size in bytes (such as 134217728 for 128 MB).</description>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>10</value>
    <description>The number of Namenode RPC server threads that listen to requests from clients. If dfs.namenode.servicerpc-address is not configured then Namenode RPC server threads listen to requests from all nodes.</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>${hadoop.home}/dfs/data</value>
    <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. The directories should be tagged with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS storage policies. The default storage type will be DISK if the directory does not have a storage type tagged explicitly. Directories that do not exist will be created if local filesystem permission allows.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
    <description>If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories.</description>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
    <description>Enable WebHDFS (REST API) in Namenodes and Datanodes.</description>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>hadoop2:50070</value>
    <description>HDFS NameNode web UI address</description>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>hadoop2:50090</value>
    <description>HDFS SecondaryNameNode web UI address</description>
  </property>
</configuration>
yarn-site.xml:
<configuration>
  <property>
    <name>hadoop.home</name>
    <value>/opt/bigdata/hadoop-2.10.0</value>
  </property>
  <property>
    <description>Are acls enabled.</description>
    <name>yarn.acl.enable</name>
    <value>false</value>
  </property>
  <property>
    <description>ACL of who can be admin of the YARN cluster.</description>
    <name>yarn.admin.acl</name>
    <value>*</value>
  </property>
  <property>
    <description>Whether to enable log aggregation. Log aggregation collects each container's logs and moves these logs onto a file-system, for e.g. HDFS, after the application completes. Users can configure the "yarn.nodemanager.remote-app-log-dir" and "yarn.nodemanager.remote-app-log-dir-suffix" properties to determine where these logs are moved to. Users can access the logs via the Application Timeline Server.</description>
    <name>yarn.log-aggregation-enable</name>
    <value>false</value>
  </property>
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop2</value>
  </property>
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>
  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>
  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>
  <property>
    <description>The http address of the RM web application. If only a host is provided as the value, the webapp will be served on a random port.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
  <property>
    <description>A comma separated list of services where service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
  <property>
    <description>The minimum allocation for every container request at the RM in MBs. Memory requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have less memory than this value will be shut down by the resource manager.</description>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <description>The maximum allocation for every container request at the RM in MBs. Memory requests higher than this will throw an InvalidResourceRequestException.</description>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
  <property>
    <description>The minimum allocation for every container request at the RM in terms of virtual CPU cores. Requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have fewer virtual cores than this value will be shut down by the resource manager.</description>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
  </property>
  <property>
    <description>The maximum allocation for every container request at the RM in terms of virtual CPU cores. Requests higher than this will throw an InvalidResourceRequestException.</description>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>4</value>
  </property>
  <property>
    <description>Amount of physical memory, in MB, that can be allocated for containers. If set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically calculated (in case of Windows and Linux). In other cases, the default is 8192MB. Maximum memory available to a NodeManager node; set it according to the physical memory of the actual machine. Number of maximum-size containers per NodeManager: max(Container) = yarn.nodemanager.resource.memory-mb / yarn.scheduler.maximum-allocation-mb</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>-1</value>
  </property>
  <property>
    <description>Ratio between virtual memory to physical memory when setting memory limits for containers. Container allocations are expressed in terms of physical memory, and virtual memory usage is allowed to exceed this allocation by this ratio.</description>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <property>
    <description>Number of vcores that can be allocated for containers. This is used by the RM scheduler when allocating resources for containers. This is not used to limit the number of CPUs used by YARN containers. If it is set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically determined from the hardware in case of Windows and Linux. In other cases, number of vcores is 8 by default. Number of virtual CPUs that YARN may use on this node. The default is 8; the recommended value equals the number of cores. If the node has fewer than 8 cores, lower this value, because YARN does not detect the physical core count on its own. On a powerful machine it can be set to twice the number of physical cores.</description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>32</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>${hadoop.home}/tmp</value>
  </property>
  <property>
    <name>yarn.log.dir</name>
    <value>${hadoop.home}/logs/yarn</value>
  </property>
  <property>
    <description>List of directories to store localized files in. An application's localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>${hadoop.tmp.dir}/yarn/nm-local-dir</value>
  </property>
  <property>
    <description>Where to store container logs. An application's localized log directory will be found in ${yarn.nodemanager.log-dirs}/application_${appid}. Individual containers' log directories will be below this, in directories named container_{$contid}. Each container directory will contain the files stderr, stdin, and syslog generated by that container.</description>
    <name>yarn.nodemanager.log-dirs</name>
    <value>${yarn.log.dir}/userlogs</value>
  </property>
  <property>
    <description>Time in seconds to retain user logs. Only applicable if log aggregation is disabled</description>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
  </property>
  <property>
    <description>Whether physical memory limits will be enforced for containers.</description>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <description>Whether virtual memory limits will be enforced for containers.</description>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>
capacity-scheduler.xml:
<configuration>
  <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>10000</value>
    <description>Maximum number of applications that can be pending and running.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.1</value>
    <description>Maximum percent of resources in the cluster which can be used to run application masters i.e. controls number of concurrent running applications.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    <description>The ResourceCalculator implementation to be used to compare Resources in the scheduler. The default i.e. DefaultResourceCalculator only uses Memory while DominantResourceCalculator uses dominant-resource to compare multi-dimensional resources such as Memory, CPU etc.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default</value>
    <description>The queues at this level (root is the root queue).</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>100</value>
    <description>Default queue target capacity.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
    <description>Default queue user limit a percentage from 0.0 to 1.0.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>100</value>
    <description>The maximum capacity of the default queue.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
    <description>The state of the default queue. State can be one of RUNNING or STOPPED.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>*</value>
    <description>The ACL of who can submit jobs to the default queue.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>*</value>
    <description>The ACL of who can administer jobs on the default queue.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.acl_application_max_priority</name>
    <value>*</value>
    <description>The ACL of who can submit applications with configured priority. For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-application-lifetime</name>
    <value>-1</value>
    <description>Maximum lifetime of an application which is submitted to a queue in seconds. Any value less than or equal to zero will be considered as disabled. This will be a hard time limit for all applications in this queue. If positive value is configured then any application submitted to this queue will be killed after exceeds the configured lifetime. User can also specify lifetime per application basis in application submission context. But user lifetime will be overridden if it exceeds queue maximum lifetime. It is point-in-time configuration. Note : Configuring too low value will result in killing application sooner. This feature is applicable only for leaf queue.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.default-application-lifetime</name>
    <value>-1</value>
    <description>Default lifetime of an application which is submitted to a queue in seconds. Any value less than or equal to zero will be considered as disabled. If the user has not submitted application with lifetime value then this value will be taken. It is point-in-time configuration. Note : Default lifetime can't exceed maximum lifetime. This feature is applicable only for leaf queue.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
    <description>Number of missed scheduling opportunities after which the CapacityScheduler attempts to schedule rack-local containers. When setting this parameter, the size of the cluster should be taken into account. We use 40 as the default value, which is approximately the number of nodes in one rack.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
    <value>-1</value>
    <description>Number of additional missed scheduling opportunities over the node-locality-delay ones, after which the CapacityScheduler attempts to schedule off-switch containers, instead of rack-local ones. Example: with node-locality-delay=40 and rack-locality-delay=20, the scheduler will attempt rack-local assignments after 40 missed opportunities, and off-switch assignments after 40+20=60 missed opportunities. When setting this parameter, the size of the cluster should be taken into account. We use -1 as the default value, which disables this feature. In this case, the number of missed opportunities for assigning off-switch containers is calculated based on the number of containers and unique locations specified in the resource request, as well as the size of the cluster.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value></value>
    <description>A list of mappings that will be used to assign jobs to queues The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]* Typically this list will be used to map users to queues, for example, u:%user:%user maps all users to queues with the same name as the user.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
    <value>false</value>
    <description>If a queue mapping is present, will it override the value specified by the user? This can be used by administrators to place jobs in queues that are different than the one specified by the user. The default is false.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments</name>
    <value>1</value>
    <description>Controls the number of OFF_SWITCH assignments allowed during a node's heartbeat. Increasing this value can improve scheduling rate for OFF_SWITCH containers. Lower values reduce "clumping" of applications on particular nodes. The default is 1. Legal values are 1-MAX_INT. This config is refreshable.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.workflow-priority-mappings</name>
    <value></value>
    <description>A list of mappings that will be used to override application priority. The syntax for this list is [workflowId]:[full_queue_name]:[priority][,next mapping]* where an application submitted (or mapped to) queue "full_queue_name" and workflowId "workflowId" (as specified in application submission context) will be given priority "priority".</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.workflow-priority-mappings-override.enable</name>
    <value>false</value>
    <description>If a priority mapping is present, will it override the value specified by the user? This can be used by administrators to give applications a priority that is different than the one specified by the user. The default is false.</description>
  </property>
</configuration>
hadoop-env.sh:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
JAVA_HOME=/opt/bigdata/jdk1.8.0_144
export JAVA_HOME=${JAVA_HOME}

# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol. Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}

export HADOOP_PREFIX=/opt/bigdata/hadoop-2.10.0
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=/opt/bigdata/hadoop-2.10.0
HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

# Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done

# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""

# Enable extra debugging of Hadoop's JAAS binding, used to set up
# Kerberos security.
# export HADOOP_JAAS_DEBUG=true

# Extra Java runtime options. Empty by default.
# For Kerberos debugging, an extended option set logs more information
# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug"
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"

export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS"
# set heap args when HADOOP_HEAPSIZE is empty
if [ "$HADOOP_HEAPSIZE" = "" ]; then
  export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
fi
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"

# On secure datanodes, user to run the datanode as after dropping privileges.
# This **MUST** be uncommented to enable secure HDFS if using privileged ports
# to provide authentication of data transfer protocol. This **MUST NOT** be
# defined if SASL is configured for authentication of data transfer protocol
# using non-privileged ports.
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}

# Where log files are stored. $HADOOP_HOME/logs by default.
#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs/hdfs

# Where log files are stored in the secure data environment.
#export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}

###
# HDFS Mover specific parameters
###
# Specify the JVM options to be used when starting the HDFS Mover.
# These options will be appended to the options specified as HADOOP_OPTS
# and therefore may override any similar flags set in HADOOP_OPTS
#
# export HADOOP_MOVER_OPTS=""

###
# Router-based HDFS Federation specific parameters
# Specify the JVM options to be used when starting the RBF Routers.
# These options will be appended to the options specified as HADOOP_OPTS
# and therefore may override any similar flags set in HADOOP_OPTS
#
# export HADOOP_DFSROUTER_OPTS=""
###

###
# Advanced Users Only!
###

# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by
# the user that will run the hadoop daemons. Otherwise there is the
# potential for a symlink attack.
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER
yarn-env.sh:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

HADOOP_YARN_USER=test
YARN_CONF_DIR=$HADOOP_YARN_HOME/etc/hadoop

# User for YARN daemons
export HADOOP_YARN_USER=${HADOOP_YARN_USER:-yarn}

# resolve links - $0 may be a softlink
export YARN_CONF_DIR="${YARN_CONF_DIR:-$HADOOP_YARN_HOME/conf}"

# some Java parameters
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/opt/bigdata/jdk1.8.0_144
if [ "$JAVA_HOME" != "" ]; then
  #echo "run java in $JAVA_HOME"
  JAVA_HOME=$JAVA_HOME
fi

if [ "$JAVA_HOME" = "" ]; then
  echo "Error: JAVA_HOME is not set."
  exit 1
fi

JAVA=$JAVA_HOME/bin/java
JAVA_HEAP_MAX=-Xmx1000m

# For setting YARN specific HEAP sizes please use this
# Parameter and set appropriately
# YARN_HEAPSIZE=1000

# check envvars which might override default args
if [ "$YARN_HEAPSIZE" != "" ]; then
  JAVA_HEAP_MAX="-Xmx""$YARN_HEAPSIZE""m"
fi

# Resource Manager specific parameters

# Specify the max Heapsize for the ResourceManager using a numerical value
# in the scale of MB. For example, to specify an jvm option of -Xmx1000m, set
# the value to 1000.
# This value will be overridden by an Xmx setting specified in either YARN_OPTS
# and/or YARN_RESOURCEMANAGER_OPTS.
# If not specified, the default value will be picked from either YARN_HEAPMAX
# or JAVA_HEAP_MAX with YARN_HEAPMAX as the preferred option of the two.
#export YARN_RESOURCEMANAGER_HEAPSIZE=1000

# Specify the max Heapsize for the timeline server using a numerical value
# in the scale of MB. For example, to specify an jvm option of -Xmx1000m, set
# the value to 1000.
# This value will be overridden by an Xmx setting specified in either YARN_OPTS
# and/or YARN_TIMELINESERVER_OPTS.
# If not specified, the default value will be picked from either YARN_HEAPMAX
# or JAVA_HEAP_MAX with YARN_HEAPMAX as the preferred option of the two.
#export YARN_TIMELINESERVER_HEAPSIZE=1000

# Specify the JVM options to be used when starting the ResourceManager.
# These options will be appended to the options specified as YARN_OPTS
# and therefore may override any similar flags set in YARN_OPTS
#export YARN_RESOURCEMANAGER_OPTS=

# Node Manager specific parameters

# Specify the max Heapsize for the NodeManager using a numerical value
# in the scale of MB. For example, to specify an jvm option of -Xmx1000m, set
# the value to 1000.
# This value will be overridden by an Xmx setting specified in either YARN_OPTS
# and/or YARN_NODEMANAGER_OPTS.
# If not specified, the default value will be picked from either YARN_HEAPMAX
# or JAVA_HEAP_MAX with YARN_HEAPMAX as the preferred option of the two.
#export YARN_NODEMANAGER_HEAPSIZE=1000

# Specify the JVM options to be used when starting the NodeManager.
# These options will be appended to the options specified as YARN_OPTS
# and therefore may override any similar flags set in YARN_OPTS
#export YARN_NODEMANAGER_OPTS=

# so that filenames w/ spaces are handled correctly in loops below
IFS=

# default log directory & file
if [ "$YARN_LOG_DIR" = "" ]; then
  YARN_LOG_DIR="$HADOOP_YARN_HOME/logs/yarn"
fi
if [ "$YARN_LOGFILE" = "" ]; then
  YARN_LOGFILE='yarn.log'
fi

# default policy file for service-level authorization
if [ "$YARN_POLICYFILE" = "" ]; then
  YARN_POLICYFILE="hadoop-policy.xml"
fi

# restore ordinary behaviour
unset IFS

YARN_OPTS="$YARN_OPTS -Dhadoop.log.dir=$YARN_LOG_DIR"
YARN_OPTS="$YARN_OPTS -Dyarn.log.dir=$YARN_LOG_DIR"
YARN_OPTS="$YARN_OPTS -Dhadoop.log.file=$YARN_LOGFILE"
YARN_OPTS="$YARN_OPTS -Dyarn.log.file=$YARN_LOGFILE"
YARN_OPTS="$YARN_OPTS -Dyarn.home.dir=$YARN_COMMON_HOME"
YARN_OPTS="$YARN_OPTS -Dyarn.id.str=$YARN_IDENT_STRING"
YARN_OPTS="$YARN_OPTS -Dhadoop.root.logger=${YARN_ROOT_LOGGER:-INFO,console}"
YARN_OPTS="$YARN_OPTS -Dyarn.root.logger=${YARN_ROOT_LOGGER:-INFO,console}"
if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
  YARN_OPTS="$YARN_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"
fi
YARN_OPTS="$YARN_OPTS -Dyarn.policy.file=$YARN_POLICYFILE"

###
# Router specific parameters
###

# Specify the JVM options to be used when starting the Router.
# These options will be appended to the options specified as HADOOP_OPTS
# and therefore may override any similar flags set in HADOOP_OPTS
#
# See ResourceManager for some examples
#
#export YARN_ROUTER_OPTS=
run.sh:
#!/bin/bash
#
# Wrapper around the Hadoop start/stop scripts used in sections 7 to 9.

# Format the NameNode; the optional first argument is the cluster name.
namenode_format() {
  cluster_name=$1
  echo "cluster_name=${cluster_name}, format..."
  $HADOOP_PREFIX/bin/hdfs namenode -format $cluster_name
}

# Start one daemon type, or all of HDFS ("dfs") or YARN ("yarn").
start() {
  type=$1
  if [ "$type" = "namenode" ]; then
    $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
  elif [ "$type" = "datanode" ]; then
    $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
  elif [ "$type" = "resourcemanager" ]; then
    $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
  elif [ "$type" = "nodemanager" ]; then
    # yarn-daemons.sh (plural) acts on every host in the slaves file, as in section 9.2
    $HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR start nodemanager
  elif [ "$type" = "proxyserver" ]; then
    $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start proxyserver
  elif [ "$type" = "dfs" ]; then
    $HADOOP_PREFIX/sbin/start-dfs.sh
  elif [ "$type" = "yarn" ]; then
    $HADOOP_PREFIX/sbin/start-yarn.sh
  else
    echo "not supported: $type"
  fi
}

# Stop one daemon type, or all of HDFS ("dfs") or YARN ("yarn").
stop() {
  type=$1
  if [ "$type" = "namenode" ]; then
    $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
  elif [ "$type" = "datanode" ]; then
    $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
  elif [ "$type" = "resourcemanager" ]; then
    $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
  elif [ "$type" = "nodemanager" ]; then
    $HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR stop nodemanager
  elif [ "$type" = "proxyserver" ]; then
    $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop proxyserver
  elif [ "$type" = "dfs" ]; then
    $HADOOP_PREFIX/sbin/stop-dfs.sh
  elif [ "$type" = "yarn" ]; then
    $HADOOP_PREFIX/sbin/stop-yarn.sh
  else
    echo "not supported: $type"
  fi
}

# Note: when run.sh is executed (not sourced), this export does not reach the calling shell.
set_env() {
  export HADOOP_CLASSPATH=`hadoop classpath`
}

usage() {
  echo "usage: ./run.sh [cmd]"
  echo " ./run.sh namenode_format [cluster name]"
  echo "----------------------------------------------"
  echo " ./run.sh start [namenode | datanode]"
  echo " ./run.sh stop [namenode | datanode]"
  echo " ./run.sh start dfs"
  echo " ./run.sh stop dfs"
  echo "----------------------------------------------"
  echo " ./run.sh start [resourcemanager | nodemanager]"
  echo " ./run.sh stop [resourcemanager | nodemanager]"
  echo " ./run.sh start yarn"
  echo " ./run.sh stop yarn"
  echo " ./run.sh start proxyserver"
  echo " ./run.sh stop proxyserver"
  echo "----------------------------------------------"
  echo " ./run.sh set_env"
}

# Dispatch on the first argument; the second argument is passed to the handler.
if [ "$#" -lt "1" ]; then
  usage
elif [ "$1" = "namenode_format" ]; then
  namenode_format $2
elif [ "$1" = "start" ]; then
  start $2
elif [ "$1" = "stop" ]; then
  stop $2
elif [ "$1" = "set_env" ]; then
  set_env
else
  usage
fi
