RabbitMQ fails to start with the following errors:
2023-11-07 16:38:52.682 [error] emulator Error in process <0.368.0> on node 'rabbit@rabbitmq-0.rabbitmq-discovery.openstack.svc.cluster.local' with exit value: {shutdown,[{mnesia_loader,handle_exit,2,[{file,"mnesia_loader.erl"},{line,963}]},{mnesia_loader,tab_receiver,5,[{file,"mnesia_loader.erl"},{line,440}]},{mnesia_loader,spawned_receiver,8,[{file,"mnesia_loader.erl"},{line,343}]}]}
2023-11-07 16:38:52.683 [error] emulator Error in process <0.367.0> on node 'rabbit@rabbitmq-0.rabbitmq-discovery.openstack.svc.cluster.local' with exit value: {badarg,[{ets,insert,[mnesia_gvar,{last_error,{{shutdown,[{mnesia_loader,handle_exit,2,[{file,"mnesia_loader.erl"},{line,963}]},{mnesia_loader,tab_receiver,5,[{file,"mnesia_loader.erl"},{line,440}]},{mnesia_loader,spawned_receiver,8,[{file,"mnesia_loader.erl"},{line,343}]}]},[{mnesia_loader,wait_on_load_complete,1,[{file,"mnesia_loader.erl"},{line,359}]},{mnesia_tm,apply_fun,3,[{file,"mnesia_tm.erl"},{line,840}]},{mnesia_tm,execute_transaction,5,[{file,"mnesia_tm.erl"},{line,816}]},{mnesia_loader,init_receiver,5,[{file,"mnesia_loader.erl"},{line,285}]},{mnesia_loader,do_get_network_copy,5,[{file,"mnesia_loader.erl"},{line,221}]},{mnesia_controller,'-load_table_fun/1-fun-4-',5,[{file,"mnesia_controller.erl"},{line,2186}]},{mnesia_controller,'-load_and_reply/2-fun-0-',2,[{file,"mnesia_controller.erl"},{line,2133}]}]}}],[]},{mnesia_lib,set,2,[{file,"mnesia_lib.erl"},{line,443}]},{mnesia_lib,fix_error,1,[{file,"mnesia_lib.erl"},{line,906}]},{mnesia_tm,return_abort,3,[{file,"mnesia_tm.erl"},{line,962}]},{mnesia_loader,init_receiver,5,[{file,"mnesia_loader.erl"},{line,285}]},{mnesia_loader,do_get_network_copy,5,[{file,"mnesia_loader.erl"},{line,221}]},{mnesia_controller,'-load_table_fun/1-fun-4-',5,[{file,"mnesia_controller.erl"},{line,2186}]},{mnesia_controller,'-load_and_reply/2-fun-0-',2,[{file,"mnesia_controller.erl"},{line,2133}]}]}
2023-11-07 16:38:52.685 [info] <0.43.0> Application mnesia exited with reason: stopped
2023-11-07 16:38:52.685 [info] <0.43.0> Application tools exited with reason: stopped
2023-11-07 16:38:52.685 [error] <0.8.0> Error description: init:do_boot/3 init:start_em/1 rabbit:start_it/1 line 465 rabbit:broker_start/1 line 341 rabbit:start_loaded_apps/2 line 586 app_utils:manage_applications/6 line 126 lists:foldl/3 line 1263 rabbit:'-handle_app_error/1-fun-0-'/3 line 709 throw:{could_not_start,ra, {ra, {{shutdown, {failed_to_start_child,ra_system_sup, {shutdown, {failed_to_start_child,ra_log_sup, {shutdown, {failed_to_start_child,ra_log_wal_sup, {shutdown, {failed_to_start_child,ra_log_wal, {{case_clause,{ok,<<>>}}, [{ra_log_wal,open_existing,1, [{file,"src/ra_log_wal.erl"},{line,556}]}, {ra_log_wal,'-recover_wal/2-lc$^0/1-0-',1, [{file,"src/ra_log_wal.erl"},{line,240}]}, {ra_log_wal,recover_wal,2, [{file,"src/ra_log_wal.erl"},{line,243}]}, {ra_log_wal,init,1, [{file,"src/ra_log_wal.erl"},{line,186}]}, {gen_batch_server,init_it,6, [{file,"src/gen_batch_server.erl"},{line,125}]}, {proc_lib,init_p_do_apply,3, [{file,"proc_lib.erl"},{line,249}]}]}}}}}}}}}, {ra_app,start,[normal,[]]}}}}
Log file(s) (may contain more information):

BOOT FAILED
===========
Error description: init:do_boot/3 init:start_em/1 rabbit:start_it/1 line 465 rabbit:broker_start/1 line 341 rabbit:start_loaded_apps/2 line 586 app_utils:manage_applications/6 line 126 lists:foldl/3 line 1263 rabbit:'-handle_app_error/1-fun-0-'/3 line 709 throw:{could_not_start,ra, {ra, {{shutdown, {failed_to_start_child,ra_system_sup, {shutdown, {failed_to_start_child,ra_log_sup, {shutdown, {failed_to_start_child,ra_log_wal_sup, {shutdown, {failed_to_start_child,ra_log_wal, {{case_clause,{ok,<<>>}}, [{ra_log_wal,open_existing,1, [{file,"src/ra_log_wal.erl"},{line,556}]}, {ra_log_wal,'-recover_wal/2-lc$^0/1-0-',1, [{file,"src/ra_log_wal.erl"},{line,240}]}, {ra_log_wal,recover_wal,2, [{file,"src/ra_log_wal.erl"},{line,243}]}, {ra_log_wal,init,1, [{file,"src/ra_log_wal.erl"},{line,186}]}, {gen_batch_server,init_it,6, [{file,"src/gen_batch_server.erl"},{line,125}]}, {proc_lib,init_p_do_apply,3, [{file,"proc_lib.erl"},{line,249}]}]}}}}}}}}}, {ra_app,start,[normal,[]]}}}}
Log file(s) (may contain more information):

{"init terminating in do_boot",{could_not_start,ra,{ra,{{shutdown,{failed_to_start_child,ra_system_sup,{shutdown,{failed_to_start_child,ra_log_sup,{shutdown,{failed_to_start_child,ra_log_wal_sup,{shutdown,{failed_to_start_child,ra_log_wal,{{case_clause,{ok,<<>>}},[{ra_log_wal,open_existing,1,[{file,"src/ra_log_wal.erl"},{line,556}]},{ra_log_wal,'-recover_wal/2-lc$^0/1-0-',1,[{file,"src/ra_log_wal.erl"},{line,240}]},{ra_log_wal,recover_wal,2,[{file,"src/ra_log_wal.erl"},{line,243}]},{ra_log_wal,init,1,[{file,"src/ra_log_wal.erl"},{line,186}]},{gen_batch_server,init_it,6,[{file,"src/gen_batch_server.erl"},{line,125}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}}}}}}}}},{ra_app,start,[normal,[]]}}}}}
init terminating in do_boot ({could_not_start,ra,{ra,{{shutdown,{_}},{ra_app,start,[_]}}}})
Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
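Fix (the boot fails in ra_log_wal:open_existing while recovering an existing WAL file, so the steps below locate that file on the pod's volume and remove it):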
(1) Find the PV used by RabbitMQ, for example the one bound to the rabbitmq-0 pod:
# kubectl get pv | grep rabbitmq-0
pvc-70ed48bf-bef8-4658-b530-1fd3a6ef5937 200Gi RWO Delete Bound openstack/rabbitmq-data-rabbitmq-0 ceph-ssd 6d17h
(2) Inspect the PV to find the backing RBD image:
# kubectl get pv pvc-70ed48bf-bef8-4658-b530-1fd3a6ef5937 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: rbd-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/rbd
  creationTimestamp: "2023-10-31T15:40:59Z"
  finalizers:
  - kubernetes.io/pv-protection
  name: pvc-70ed48bf-bef8-4658-b530-1fd3a6ef5937
  resourceVersion: "7552"
  uid: 6848417a-dd4f-430c-85e5-f3234a1ac6bf
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 200Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: rabbitmq-data-rabbitmq-0
    namespace: openstack
    resourceVersion: "4704"
    uid: 70ed48bf-bef8-4658-b530-1fd3a6ef5937
  persistentVolumeReclaimPolicy: Delete
  rbd:
    image: kubernetes-dynamic-pvc-c8a3585f-dc7b-438c-a22e-cca9d84c341f
    keyring: /etc/ceph/keyring
    monitors:
    - ceph-mon.ceph.svc.cluster.local:6789
    pool: ssdpool
    secretRef:
      name: pvc-ceph-client-key
    user: admin
  storageClassName: ceph-ssd
  volumeMode: Filesystem
status:
  phase: Bound
The field needed from this output is the RBD image name:
image: kubernetes-dynamic-pvc-c8a3585f-dc7b-438c-a22e-cca9d84c341f
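Picking the image name out of the YAML by eye is error-prone, so the lookup in step (2) can be scripted. The following is a minimal sketch, not part of the original procedure: `rbd_image_from_pv_yaml` is a hypothetical helper name, and it assumes the PV YAML contains a single `image:` key, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): read PV YAML on stdin
# and print the value of the first "image:" key, i.e. the RBD image name.
rbd_image_from_pv_yaml() {
    awk '$1 == "image:" { print $2; exit }'
}

# Intended usage on a machine with cluster access (not executed here):
#   kubectl get pv pvc-70ed48bf-bef8-4658-b530-1fd3a6ef5937 -o yaml | rbd_image_from_pv_yaml
# kubectl's own JSONPath output should give the same value:
#   kubectl get pv pvc-70ed48bf-bef8-4658-b530-1fd3a6ef5937 -o jsonpath='{.spec.rbd.image}'
```

The JSONPath form avoids text parsing altogether; the awk helper is mainly useful when only a saved copy of the YAML is available.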
(3) On the node where the pod runs, find the block device mapped to that image:
# ssh node-2 rbd showmapped | grep kubernetes-dynamic-pvc-c8a3585f-dc7b-438c-a22e-cca9d84c341f
0 ssdpool kubernetes-dynamic-pvc-c8a3585f-dc7b-438c-a22e-cca9d84c341f - /dev/rbd0
(4) Find where the device is mounted:
# ssh node-2 mount | grep rbd0
/dev/rbd0 on /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/ssdpool-image-kubernetes-dynamic-pvc-c8a3585f-dc7b-438c-a22e-cca9d84c341f type ext4 (rw,relatime,stripe=1024)
/dev/rbd0 on /var/lib/kubelet/pods/3a37e264-4fd5-4cb8-844b-6b6cd4a6859c/volumes/kubernetes.io~rbd/pvc-70ed48bf-bef8-4658-b530-1fd3a6ef5937 type ext4 (rw,relatime,stripe=1024)
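Steps (3) and (4) amount to two text lookups: image name to device, then device to pod mount point. A rough sketch of both follows; the function names are hypothetical, and it assumes the `rbd showmapped` and `mount` output formats shown above (device in the last column, mount paths without spaces).

```shell
#!/bin/sh
# Hypothetical helpers (not from the original post).

# Read `rbd showmapped` output on stdin; print the device of the given image.
# Assumes the device path is the last column, as in the output above.
device_for_image() {
    awk -v img="$1" '{ for (i = 1; i <= NF; i++) if ($i == img) { print $NF; exit } }'
}

# Read `mount` output on stdin; print the pod-volume mount point of a device.
# Assumes lines of the form "DEVICE on PATH type FSTYPE (OPTIONS)".
pod_mount_for_device() {
    awk -v dev="$1" '$1 == dev && $3 ~ /\/pods\// { print $3 }'
}

# Intended usage (not executed here):
#   ssh node-2 rbd showmapped | device_for_image kubernetes-dynamic-pvc-c8a3585f-dc7b-438c-a22e-cca9d84c341f
#   ssh node-2 mount | pod_mount_for_device /dev/rbd0
```

Filtering on `/pods/` selects the per-pod mount rather than the plugin-level mount, which is the path used in the following step.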
(5) Locate the WAL file, searching under the pod mount path from step (4). The remote command is quoted as a whole so that "*.wal" reaches find on node-2 unexpanded:
# ssh node-2 'find /var/lib/kubelet/pods/3a37e264-4fd5-4cb8-844b-6b6cd4a6859c/volumes/kubernetes.io~rbd/pvc-70ed48bf-bef8-4658-b530-1fd3a6ef5937 -name "*.wal"'
/var/lib/kubelet/pods/3a37e264-4fd5-4cb8-844b-6b6cd4a6859c/volumes/kubernetes.io~rbd/pvc-70ed48bf-bef8-4658-b530-1fd3a6ef5937/mnesia/rabbit@rabbitmq-0.rabbitmq-discovery.openstack.svc.cluster.local/quorum/rabbit@rabbitmq-0.rabbitmq-discovery.openstack.svc.cluster.local/00000025.wal
(6) Delete the WAL file
Be careful with this step: back up the file before deleting it.
# ssh node-2 rm -rf /var/lib/kubelet/pods/3a37e264-4fd5-4cb8-844b-6b6cd4a6859c/volumes/kubernetes.io~rbd/pvc-70ed48bf-bef8-4658-b530-1fd3a6ef5937/mnesia/rabbit@rabbitmq-0.rabbitmq-discovery.openstack.svc.cluster.local/quorum/rabbit@rabbitmq-0.rabbitmq-discovery.openstack.svc.cluster.local/00000025.wal
Warning: Permanently added 'node-2' (ED25519) to the list of known hosts.
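Because step (6) is destructive, a more cautious variant is to move the WAL file aside instead of deleting it, which keeps a copy for later inspection. The sketch below is one possible way to do that, meant to be run on the node itself; `quarantine_wal` and the backup directory are hypothetical names, and the backup directory is assumed to be outside the RabbitMQ data volume.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): move a WAL file into a
# backup directory rather than deleting it.
# Usage: quarantine_wal <wal-file> <backup-dir>
quarantine_wal() {
    wal="$1"
    backup_dir="$2"
    # Refuse anything that is not an existing regular file.
    [ -f "$wal" ] || { echo "not a regular file: $wal" >&2; return 1; }
    mkdir -p "$backup_dir" || return 1
    # Keep the original file name and add a .bak suffix.
    mv -- "$wal" "$backup_dir/$(basename "$wal").bak"
}

# Intended usage on node-2 (not executed here; /root/wal-backup is an assumed path):
#   quarantine_wal "<path found in step (5)>/00000025.wal" /root/wal-backup
```

If the broker still fails to start afterwards, the backed-up file can be examined or restored, which is not possible after `rm`.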
(7) Delete the pod so that it is recreated
# kubectl delete pods rabbitmq-0 -n openstack
pod "rabbitmq-0" deleted
Wait for the pod to start again; after a while it resynchronizes its data and recovers.
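Rather than polling by hand, the wait and a basic health check can be scripted. This is a sketch under assumptions: `check_rabbitmq_pod` is a hypothetical helper, the 600-second timeout is an arbitrary choice, and `rabbitmqctl` is assumed to be on the PATH inside the container.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): wait until the recreated
# pod is Ready, then print the RabbitMQ cluster status from inside it.
# KUBECTL can be overridden, for example to point at a wrapper script.
KUBECTL="${KUBECTL:-kubectl}"

check_rabbitmq_pod() {
    pod="$1"
    ns="$2"
    # Block until the pod reports Ready (the timeout value is an assumption).
    "$KUBECTL" wait --for=condition=Ready "pod/$pod" -n "$ns" --timeout=600s &&
        # List cluster members and running nodes as seen by this broker.
        "$KUBECTL" exec "$pod" -n "$ns" -- rabbitmqctl cluster_status
}

# Intended usage (not executed here):
#   check_rabbitmq_pod rabbitmq-0 openstack
```

A Ready pod with all expected nodes listed as running is only a first indication of recovery; queue contents may still be catching up from the other cluster members.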