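Hadoop routine operations guide
HDFS
The production Hadoop environment is a 30-server cluster with a uniform installation and configuration, running version 2.7.7.
Deployment path: /opt/hadoop
Run-as user: hadoop
Configuration files:
- /opt/hadoop/config/hdfs-site.xml
- /opt/hadoop/config/core-site.xml
Hadoop runtime environment variable files:
- hadoop-env.sh
- journalnode.env
- datanode.env
- namenode.env
Hadoop systemd service unit files:
- zkfc.service
- journalnode.service
- namenode.service
- datanode.service
Directory for snapshot files: /data/hadoop/data
Runtime log directory: /data/hadoop/logs
When Hadoop is running normally, the following ports are in use:
- 50010: HDFS DataNode service port, used for data transfer
- 50075: HDFS DataNode HTTP service port
- 50020: HDFS DataNode IPC service port
- 50070: HDFS NameNode HTTP service port, open on NameNode hosts
- 8020: HDFS NameNode RPC port that accepts client connections, used to fetch filesystem metadata (8020 is the Hadoop default; the `hdfs getconf -nnRpcAddresses` output later in this section reports 9000 for this cluster)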
[hadoop@hostname-2 ~]$ netstat -ln|egrep "(50010|50075|50475|50020|50070|50470|8020|8019)"
tcp 0 0 172.0.0.2:50070 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN
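The port check above can be turned into a small helper that reports which expected ports are not listening. This is a minimal sketch, not part of the original runbook: the function name `missing_ports` is hypothetical, and it assumes the `netstat -ln` output format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every expected port that has no LISTEN socket.
# Reads `netstat -ln` output on stdin; the expected ports are the arguments.
missing_ports() {
  local listing port
  listing=$(cat)    # capture the netstat output once
  for port in "$@"; do
    # A port counts as open when ":<port>" appears on a LISTEN line.
    if ! printf '%s\n' "$listing" | grep -Eq ":${port}[[:space:]].*LISTEN"; then
      printf '%s\n' "$port"
    fi
  done
}
```

Usage on a DataNode host would look like `netstat -ln | missing_ports 50010 50075 50020`; empty output means every listed port is open.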
Official Hadoop reference documentation
Starting and stopping Hadoop components
# Start
sudo systemctl start namenode.service
sudo systemctl start datanode.service
sudo systemctl start journalnode.service
# Stop
sudo systemctl stop namenode.service
sudo systemctl stop datanode.service
sudo systemctl stop journalnode.service
# Check service status
sudo systemctl status namenode.service
sudo systemctl status datanode.service
sudo systemctl status journalnode.service
# Enable start at boot
sudo systemctl enable namenode.service
sudo systemctl enable datanode.service
sudo systemctl enable journalnode.service
Checking Hadoop component status and parameters
# List the NameNode hosts
[hadoop@hostname-2 ~]$ hdfs getconf -namenodes
hostname-3 hostname-2
# Show the path of the DataNode include file
[hadoop@hostname-2 ~]$ hdfs getconf -includeFile
/opt/hadoop/config/slaves
# Show the NameNode RPC addresses
[hadoop@hostname-2 ~]$ hdfs getconf -nnRpcAddresses
hostname-3:9000
hostname-2:9000
# Print the value of a configuration key
hdfs getconf -confKey [key]
# dfsadmin: report on live DataNodes
[hadoop@hostname-2 ~]$ hdfs dfsadmin -report -live
Configured Capacity: 422346469376 (393.34 GB)
Present Capacity: 317439557632 (295.64 GB)
DFS Remaining: 315510235136 (293.84 GB)
...
-------------------------------------------------
Live datanodes (3):
Name: 172.0.0.3:50010 (hostname-3)
Hostname: iZ8vbacq1jxnabyu7992d1Z
Decommission Status : Normal
...
Name: 172.0.0.1:50010 (hostname-1)
Hostname: iZ8vb2s7y1j8fqmqbmufz9Z
Decommission Status : Normal
...
Name: 172.0.0.2:50010 (iZ8vbacq1jxnabyu7992d2Z)
Hostname: iZ8vbacq1jxnabyu7992d2Z
Decommission Status : Normal
...
# haadmin: check whether a NameNode is active or standby
[hadoop@hostname-2 ~]$ hdfs haadmin -getServiceState hostname-2
active
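The `hdfs dfsadmin -report` output above can be checked automatically, for example to alert when fewer DataNodes are live than expected. The following is a minimal sketch under the assumption that the report contains a `Live datanodes (N):` line as in the sample; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract N from the "Live datanodes (N):" line.
# Reads `hdfs dfsadmin -report` output on stdin.
live_datanode_count() {
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}

# Hypothetical helper: succeed only if the report on stdin shows
# at least $1 live DataNodes.
has_live_datanodes() {
  local expected=$1 actual
  actual=$(live_datanode_count)
  [ -n "$actual" ] && [ "$actual" -ge "$expected" ]
}
```

For example, `hdfs dfsadmin -report -live | has_live_datanodes 3 || echo "DataNodes missing"` prints a warning when fewer than three DataNodes are live.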
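YARN
Run-as user: hadoop
Configuration files:
- /opt/hadoop/config/yarn-site.xml
Environment variable files:
- yarn.env
- zkfc.env
Systemd service unit files:
- yarn-nm.service
- yarn-rm.service
- zkfc.service
When the Hadoop YARN components are running normally, the following ports are in use:
- 8030: YARN ResourceManager scheduler IPC port
- 8031: YARN ResourceManager resource tracker RPC port, used by NodeManagers
- 8032: YARN ResourceManager applications manager (ASM) port
- 8033: YARN ResourceManager admin IPC port
- 8088: YARN ResourceManager HTTP service port
- 10020: JobHistory Server IPC port
- 18080: history server HTTP service port (18080 is the Spark History Server default, covered in the Spark section; the MapReduce JobHistory Server web UI defaults to 19888)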
[hadoop@hostname-2 ~]$ netstat -ln|egrep "(8032|8030|8031|8033|8088)"
tcp 0 0 172.0.0.2:8088 0.0.0.0:* LISTEN
tcp 0 0 172.0.0.2:8030 0.0.0.0:* LISTEN
tcp 0 0 172.0.0.2:8031 0.0.0.0:* LISTEN
tcp 0 0 172.0.0.2:8032 0.0.0.0:* LISTEN
tcp 0 0 172.0.0.2:8033 0.0.0.0:* LISTEN
[hadoop@hostname-1 ~]$ netstat -ln|egrep "(10020|18080)"
tcp 0 0 0.0.0.0:18080 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:10020 0.0.0.0:* LISTEN
Official YARN documentation
Starting and stopping YARN services
# Start
sudo systemctl start yarn-rm.service
sudo systemctl start yarn-nm.service
# Stop
sudo systemctl stop yarn-rm.service
sudo systemctl stop yarn-nm.service
# Check service status
sudo systemctl status yarn-rm.service
sudo systemctl status yarn-nm.service
# Enable start at boot
sudo systemctl enable yarn-rm.service
sudo systemctl enable yarn-nm.service
YARN status check commands
[hadoop@hostname-2 ~]$ yarn node -list
Total Nodes:2
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
iZ8vbacq1jxnabyu7992d1Z:46719 RUNNING iZ8vbacq1jxnabyu7992d1Z:8042 0
iZ8vb2s7y1j8fqmqbmufz9Z:40138 RUNNING iZ8vb2s7y1j8fqmqbmufz9Z:8042 0
Check the HA state of a ResourceManager
[hadoop@hostname-2 ~]$ yarn rmadmin -getServiceState hostname-2
active
Ask the service to run a health check. If the check fails, the rmadmin tool exits with a non-zero exit code.
[hadoop@hostname-2 ~]$ yarn rmadmin -checkHealth hostname-2 ; echo $?
0
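The `yarn node -list` output shown earlier in this section can also be summarized by a script. This is a minimal sketch, assuming the column layout of the sample output (node state in the second column); `running_nodemanagers` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count NodeManagers whose Node-State column is RUNNING.
# Reads `yarn node -list` output on stdin.
running_nodemanagers() {
  # n + 0 forces a numeric 0 when no line matched.
  awk '$2 == "RUNNING" { n++ } END { print n + 0 }'
}
```

For example, `yarn node -list | running_nodemanagers` prints the number of running NodeManagers, which can be compared against the number of hosts expected to run `yarn-nm.service`.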
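ZooKeeper routine operations guide
The production ZooKeeper environment is a three-server cluster with a uniform installation and configuration, running version 3.4.14.
Run-as user: logmanager
Deployment path: /opt/zookeeper
Configuration file: /opt/zookeeper/conf/zoo.cfg
Directory for snapshot files: /data/zookeeper/data
Transaction log directory: /data/zookeeper/logs
Runtime log directory: /data/zookeeper/logs
When ZooKeeper is running normally, three ports are in use: 2181, 2888, and 3888.
- 2181: client service port, open on every node
- 2888: port for Leader-Follower communication, open only on the leader
- 3888: port used for leader election, open on every node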
[hadoop@hostname-3 ~]$ netstat -ln|egrep "(2181|2888|3888)"
tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN
tcp 0 0 172.0.0.3:2888 0.0.0.0:* LISTEN
tcp 0 0 172.0.0.3:3888 0.0.0.0:* LISTEN
Starting and stopping ZooKeeper
# Start
sudo systemctl start zookeeper.service
# Check service status
systemctl status zookeeper.service
# Stop
sudo systemctl stop zookeeper.service
# Enable start at boot
sudo systemctl enable zookeeper.service
Checking ZooKeeper node status
Method 1
[hadoop@hostname-1 ~]$ /opt/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: leader
Method 2
[hadoop@hostname-1 ~]$ echo stat | nc 127.0.0.1 2181
Zookeeper version: 3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
Clients:
/127.0.0.1:60696[0](queued=0,recved=1,sent=0)
/172.0.0.2:53934[1](queued=0,recved=595720,sent=595742)
/172.0.0.3:42448[1](queued=0,recved=594837,sent=594837)
Latency min/avg/max: 0/0/137
Received: 1190603
Sent: 1190624
Connections: 3
Outstanding: 0
Zxid: 0x1240000e71a
Mode: follower
Node count: 229
Check whether the server is running; a reply of imok means it is up.
[hadoop@hostname-1 ~]$ echo ruok | nc 127.0.0.1 2181
imok
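To see the role of every ensemble member at a glance, the `stat` command above can be wrapped in a loop. This is a minimal sketch: the function names are hypothetical, it assumes `nc` is installed and accepts `-w` as a timeout in seconds, and it assumes the `Mode:` line format shown in the sample output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the "Mode:" line.
# Reads ZooKeeper `stat` output on stdin.
zk_mode() {
  awk -F': ' '$1 == "Mode" { print $2 }'
}

# Hypothetical helper: print "<host> <mode>" for each host argument.
# Hosts that do not answer are reported as "unreachable".
zk_ensemble_modes() {
  local host mode
  for host in "$@"; do
    # -w 2 is assumed to be a 2-second timeout so a dead host cannot hang the loop.
    mode=$(echo stat | nc -w 2 "$host" 2181 2>/dev/null | zk_mode)
    printf '%s %s\n' "$host" "${mode:-unreachable}"
  done
}
```

Running `zk_ensemble_modes 172.0.0.1 172.0.0.2 172.0.0.3` should show one leader and the remaining nodes as followers in a healthy ensemble.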
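Kafka routine operations guide
The production Kafka environment is a three-server cluster with a uniform installation and configuration, running version 2.11-1.10.
Run-as user: logmanager
Deployment path: /opt/kafka
Configuration file: /opt/kafka/config/server.properties
Data directory: /data/kafka/data
Log directory: /data/kafka/logs
When Kafka is running normally, two ports are involved: 2181 and 9092.
- 2181: ZooKeeper service port. Kafka stores broker and consumer information in ZooKeeper. ZooKeeper tracks the liveness of every broker, and each broker sends heartbeats to ZooKeeper to report its state.
- 9092: Kafka broker listener port, used by clients and for communication between brokers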
[hadoop@hostname-3 ~]$ netstat -ln|egrep "(2181|9092)"
tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:9092 0.0.0.0:* LISTEN
Starting and stopping the service
# Start
sudo systemctl start kafka.service
# Check service status
sudo systemctl status kafka.service
# Stop
sudo systemctl stop kafka.service
# Enable start at boot
sudo systemctl enable kafka.service
List the current Kafka topics
[hadoop@hostname-1 opt]$ kafka-topics.sh --list --zookeeper 172.0.0.3:2181
EXECUTE_LOG_TOPIC
METRICBEAT_LOG
ROBOT_MAIN_PROCESS_EXECUTE_MESSAGE
__consumer_offsets
agent-status
flume-sink
logmanager-filebeat
logmanager-flume
logstash-filebeat
origin-biz-log
Show topic details
[hadoop@hostname-1 opt]$ kafka-topics.sh --zookeeper 172.0.0.2:2181 --topic "agent-status" --describe
Topic: agent-status PartitionCount: 1 ReplicationFactor: 3 Configs:
Topic: agent-status Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
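In the describe output above, a partition is healthy when its `Isr` list is as long as its `Replicas` list. The following minimal sketch flags partitions where it is shorter. The function name `under_replicated` is hypothetical, and the parser assumes the `Label: value` layout of the sample output; other Kafka versions may format the line differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<topic> <partition>" for every partition whose
# in-sync replica list is shorter than its replica list.
# Reads `kafka-topics.sh --describe` output on stdin.
under_replicated() {
  awk '{
    topic = ""; part = ""; replicas = ""; isr = ""
    # Each value is the field that follows its label.
    for (i = 1; i < NF; i++) {
      if ($i == "Topic:")     { topic    = $(i + 1) }
      if ($i == "Partition:") { part     = $(i + 1) }
      if ($i == "Replicas:")  { replicas = $(i + 1) }
      if ($i == "Isr:")       { isr      = $(i + 1) }
    }
    # Summary lines have no Replicas field and are skipped.
    if (replicas != "" && split(isr, a, ",") < split(replicas, b, ",")) {
      print topic, part
    }
  }'
}
```

Empty output means every partition in the listing has its full replica set in sync.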
Show details for a consumer group
[hadoop@hostname-1 opt]$ ./kafka-consumer-groups.sh --new-consumer --bootstrap-server 192.168.52.131:9092 --group test2 --describe
Show the Kafka version
[hadoop@hostname-1 opt]$ find ./libs/ -name \*kafka_\* | head -1 | grep -o 'kafka_.*'
Describe all topics in the cluster
[hadoop@hostname-1 opt]$ bin/kafka-topics.sh --describe --zookeeper 127.0.0.1:2181
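Elasticsearch routine operations guide
The production Elasticsearch environment is a three-server cluster with a uniform installation and configuration, running version 5.4.3.
Run-as user: logmanager
Deployment path: /opt/elasticsearch
Configuration file: /opt/elasticsearch/config/elasticsearch.yml
Data directory: /data/elasticsearch/data
Log directory: /data/elasticsearch/logs
When Elasticsearch is running normally, two ports are in use: 9200 and 9300.
- 9200: HTTP port that serves client requests, open on every node
- 9300: transport port for communication between cluster nodes, open on every node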
[hadoop@hostname-3 ~]$ netstat -ln|egrep "(9300|9200)"
tcp 0 0 0.0.0.0:9200 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:9300 0.0.0.0:* LISTEN
Starting and stopping the service
# Start
sudo systemctl start elasticsearch.service
# Check service status
sudo systemctl status elasticsearch.service
# Stop
sudo systemctl stop elasticsearch.service
# Enable start at boot
sudo systemctl enable elasticsearch.service
Check the Elasticsearch version
[user@hostname-1 ~]$ curl -XGET localhost:9200
{
"name" : "hostname-1",
"cluster_name" : "elastic-cyclone",
"cluster_uuid" : "-OfufJGMQfylFBm34d0SKg",
"version" : {
"number" : "5.4.3",
"build_hash" : "eed30a8",
"build_date" : "2017-06-22T00:34:03.743Z",
"build_snapshot" : false,
"lucene_version" : "6.5.1"
},
"tagline" : "You Know, for Search"
}
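After an upgrade or a node rebuild it can be useful to confirm that a node reports the expected version. This is a minimal sketch that parses the JSON shown above with `sed` rather than a JSON parser; the function names are hypothetical, and it assumes the `"number"` field appears on a single line as in the sample.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the value of the "number" field.
# Reads the JSON printed by `curl -XGET localhost:9200` on stdin.
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: succeed only if the version on stdin equals $1.
es_version_is() {
  [ "$(es_version)" = "$1" ]
}
```

For example, `curl -s -XGET localhost:9200 | es_version_is 5.4.3 || echo "unexpected version"`.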
Common commands
Set the replica count: `curl -XPUT 'http://localhost:9200/yunxiaobai/_settings?pretty' -d '{"settings":{"index":{"number_of_replicas":"10"}}}'`
Create an index: `curl -XPUT 'localhost:9200/yunxiaobai?pretty'`
Insert a document: `curl -XPUT 'localhost:9200/yunxiaobai/external/1?pretty' -d '{"name":"yunxiaobai"}'`
Get a document: `curl -XGET 'localhost:9200/yunxiaobai/external/1?pretty'`
Delete an index: `curl -XDELETE 'localhost:9200/jiaozhenqing?pretty'`
Exclude a node from shard allocation: `curl -XPUT '127.0.0.1:9200/_cluster/settings?pretty' -d '{"transient":{"cluster.routing.allocation.exclude._ip":"10.0.0.1"}}'`
Delete a template: `curl -XDELETE http://127.0.0.1:9200/_template/metricbeat-6.2.4`
Set the index refresh interval: `curl -XPUT 'http://localhost:9200/metricbeat-6.2.4-2018.05.21/_settings?pretty' -d '{"settings":{"index":{"refresh_interval":"30s"}}}'`
Upload a template file: `curl -XPUT localhost:9200/_template/devops-logstore-template -d @devops-logstore.json`
Query a template: `curl -XGET localhost:9200/_template/devops-logstore-template`
Query the bulk thread pool: `curl -XGET 'http://localhost:9200/_cat/thread_pool/bulk?v&h=ip,name,active,rejected,completed'`
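MySQL routine operations guide
The production MySQL environment is a three-server cluster with a uniform installation and configuration, running version 5.7.
Run-as user: mysql
Deployment path: /usr/share/mysql
Configuration file: /etc/my.cnf
Data directory: /var/lib/mysql/mysql
Log file: /var/log/mysqld.log
Binary log directory: /var/lib/mysql
When MySQL is running normally, one port is in use: 3306.
- 3306: MySQL client service port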
[hadoop@hostname-3 ~]$ netstat -ln|egrep 3306
tcp6 0 0 :::3306 :::* LISTEN
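Starting and stopping the service
# Start
sudo systemctl start mysqld.service
# Check service status
sudo systemctl status mysqld.service
# Stop
sudo systemctl stop mysqld.service
# Enable start at boot
sudo systemctl enable mysqld.service
Test logging in to MySQL and list the databases
mysql -u <username> -p
# Enter the password when prompted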
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
| test |
+--------------------+
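Spark routine operations guide
In production, Spark is deployed as Spark on YARN: jobs that run in Spark are scheduled by YARN. The Spark history server must be configured separately and listens on port 18080 once started. The Spark version is 1.7.0.
Run-as user: hadoop
Deployment path: /opt/spark
Configuration file: /opt/spark/conf/spark-conf.properties
Data directory: /dat/spark/data
Log path: /data/spark/logs
When Spark is running normally, two ports are in use: 18080 and 8088.
- 18080: Spark history server service port, used to view records of past jobs
- 8088: YARN ResourceManager HTTP service port (see the YARN section)
# Start
sudo systemctl start spark-history.service
# Enable start at boot
sudo systemctl enable spark-history.service
# Stop
sudo systemctl stop spark-history.service
# Check service status
systemctl status spark-history.service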
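Flume routine operations guide
In production, Flume runs as independent nodes: it is deployed on each server that needs it and installed and configured separately. The version is 1.7.0.
Run-as user: logmanager
Deployment path: /opt/flume
Configuration file: /opt/flume/conf/flume-conf.properties
Data directory: /dat/flume/data
Log path: /data/flume/logs
When Flume is running normally, one port is in use: 4541.
- 4541: Flume service port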
[hadoop@hostname-2 ~]$ netstat -ln|egrep 4541
tcp 0 0 172.0.0.2:4541 0.0.0.0:* LISTEN
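Starting and stopping the service
# Start
sudo systemctl start flume.service
# Check service status
sudo systemctl status flume.service
# Stop
sudo systemctl stop flume.service
# Enable start at boot
sudo systemctl enable flume.service
Check the listening port and owning process on the master node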
[hadoop@hostname-2 ~]$ sudo netstat -lntp |grep 4541
tcp 0 0 172.0.0.2:4541 0.0.0.0:* LISTEN 7774/java
Viewing messages in the queue
To inspect messages in the queue, install kafka-tools and dump part of a topic's data to view its content.
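Logstash routine operations guide
The production Logstash environment is a three-server cluster with a uniform installation and configuration, running version 2.4.1.
Run-as user: logmanager
Deployment path: /opt/logstash
Configuration file: /opt/logstash/conf/logstash.yml
Data directory: /dat/logstash/data
Log path: /data/logstash/logs
When Logstash is running normally, two ports are in use: 5044 and 9600.
- 5044: Logstash service port, used to receive data
- 9600: port used to query basic Logstash information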
[hadoop@hostname-2 ~]$ netstat -ln|egrep "(9600|5044)"
tcp 0 0 0.0.0.0:5044 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:9600 0.0.0.0:* LISTEN
Starting and stopping the service
# Start
sudo systemctl start logstash.service
# Check service status
sudo systemctl status logstash.service
# Stop
sudo systemctl stop logstash.service
# Enable start at boot
sudo systemctl enable logstash.service
Query basic Logstash information through port 9600
[hadoop@hostname-2 ~]$ curl -XGET 'localhost:9600/?pretty'
{
"host" : "iZ8vbacq1jxnabyu7992d2Z",
"version" : "7.7.0",
"http_address" : "127.0.0.1:9600",
"id" : "c9662897-7c12-4eb3-a92c-772da4536730",
"name" : "logmanager",
"ephemeral_id" : "99c86cbf-182a-46c5-9cc9-05f5bd13075b",
"status" : "green",
"snapshot" : false,
"pipeline" : {
"workers" : 8,
"batch_size" : 125,
"batch_delay" : 50
},
"build_date" : "2020-05-12T04:34:14+00:00",
"build_sha" : "d8ed01157be10d78e9910f1fb21b137c5d25529e",
"build_snapshot" : false
}
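Tomcat routine operations guide
In production, Tomcat runs as a single node; clustering can be achieved through load balancing. The version is 8.5.60.
Run-as user: logmanager
Deployment path: /opt/tomcat
Configuration file: /opt/tomcat/conf/server.xml
Data directory: /dat/tomcat/data
Log path: /data/tomcat/logs
When Tomcat is running normally, two ports are in use: 8009, plus either 8080 or 8761.
- 8009: Tomcat AJP connector port
- 8080: Tomcat web service port
- 8761: Eureka (Spring service registry) service port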
[hadoop@hostname-1 conf]$ netstat -ln|egrep "(8009|8080)"
tcp 0 0 0.0.0.0:8009 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN
Starting and stopping the service
# Start
sudo systemctl start tomcat.service
# Check service status
sudo systemctl status tomcat.service
# Stop
sudo systemctl stop tomcat.service
# Enable start at boot
sudo systemctl enable tomcat.service