• Welcome to LiuJason's Blog!

[Solved] Failed to start The Proxmox VE cluster filesystem | /etc/pve inaccessible | Cluster lost a node

Linux Notes | Jason | 02-23

Problem description

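One node dropped out of the Proxmox cluster. It still answered ping, but SSH logins with a key hung, while password logins worked normally.
Running ssh -vvv showed no response at all while the key was being verified, which suggested a problem with the location where the failed node stores its public keys.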
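Why a key login can stall while a password login works: on a Proxmox VE cluster node, /root/.ssh/authorized_keys is normally a symlink to /etc/pve/priv/authorized_keys, so the key check depends on the cluster filesystem. The sketch below reports where the key file points and whether it can be read within a time limit, so the check itself cannot hang the shell. check_key_file is a hypothetical helper, not a Proxmox tool.

```shell
#!/usr/bin/env bash
# check_key_file: hypothetical helper, not part of Proxmox.
# Usage: check_key_file [path] [seconds]
check_key_file() {
  local keyfile=${1:-/root/.ssh/authorized_keys}
  local secs=${2:-5}
  if [ -L "$keyfile" ]; then
    # Plain readlink (no -f) only reads the link itself; it never touches the target.
    echo "symlink -> $(readlink "$keyfile")"
  else
    echo "not a symlink: $keyfile"
  fi
  # timeout stops cat if the read blocks, e.g. on a hung FUSE mount.
  if timeout "$secs" cat "$keyfile" >/dev/null 2>&1; then
    echo "readable"
    return 0
  fi
  echo "not readable (error or no answer within ${secs}s)"
  return 1
}
```

Run with no arguments, it checks root's key file with a 5-second limit; a symlink into /etc/pve that is not readable matches the failure described in this post.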

Troubleshooting

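Proxmox keeps the cluster's SSH public keys under /etc/pve, and trying to cd into that directory failed.
/etc/pve is the Proxmox cluster filesystem (pmxcfs, run by the pve-cluster service), which replicates its contents between nodes over corosync. corosync's status looked normal, and all 5 PVE nodes were visible: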

# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled
   Active: active (running) since Sun 2020-02-23 11:06:37 CET; 2min 55s ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
 Main PID: 23596 (corosync)
    Tasks: 9 (limit: 4915)
   Memory: 128.8M
   CGroup: /system.slice/corosync.service
           └─23596 /usr/sbin/corosync -f
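systemctl status only shows that the corosync daemon is running, not which nodes it currently sees. A minimal sketch of the two checks from this step, assuming standard PVE tooling: corosync-quorumtool -l asks corosync directly for its member list, and probe_dir (a hypothetical helper) lists a directory under timeout so a hung /etc/pve cannot freeze the shell the way the plain cd did.

```shell
#!/usr/bin/env bash
# probe_dir: hypothetical helper. Lists a directory under a time limit.
# Usage: probe_dir [dir] [seconds]
probe_dir() {
  local dir=${1:-/etc/pve}
  local secs=${2:-5}
  if timeout "$secs" ls "$dir" >/dev/null 2>&1; then
    echo "ok: $dir"
    return 0
  fi
  echo "FAIL: $dir not listable within ${secs}s"
  return 1
}

# show_membership: hypothetical wrapper; corosync-quorumtool -l prints
# the nodes in the current corosync membership.
show_membership() {
  corosync-quorumtool -l
}
```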

Checking pve-cluster showed that the cluster service had failed and the node had dropped out of the cluster:

# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enab
   Active: failed (Result: exit-code) since Sun 2020-02-23 11:09:52 CET; 4s ago
  Process: 3799 ExecStart=/usr/bin/pmxcfs (code=exited, status=255/EXCEPTION)

Feb 23 11:09:51 PVE-EU-2 systemd[1]: pve-cluster.service: Control process exited, code=e
Feb 23 11:09:51 PVE-EU-2 systemd[1]: pve-cluster.service: Failed with result 'exit-code'
Feb 23 11:09:51 PVE-EU-2 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Feb 23 11:09:52 PVE-EU-2 systemd[1]: pve-cluster.service: Service RestartSec=100ms expir
Feb 23 11:09:52 PVE-EU-2 systemd[1]: pve-cluster.service: Scheduled restart job, restart
Feb 23 11:09:52 PVE-EU-2 systemd[1]: Stopped The Proxmox VE cluster filesystem.
Feb 23 11:09:52 PVE-EU-2 systemd[1]: pve-cluster.service: Start request repeated too qui
Feb 23 11:09:52 PVE-EU-2 systemd[1]: pve-cluster.service: Failed with result 'exit-code'
Feb 23 11:09:52 PVE-EU-2 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
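The log lines above are cut off at the terminal width. A sketch for getting the complete messages on a systemd host: --full stops systemctl from truncating lines, and journalctl -u with -b shows everything the unit logged during the current boot. full_unit_log and exit_status_from_status are hypothetical helpers; the second one pulls the ExecStart result (here 255/EXCEPTION from /usr/bin/pmxcfs) out of the status text.

```shell
#!/usr/bin/env bash
# full_unit_log: hypothetical helper. Untruncated status plus this boot's journal.
full_unit_log() {
  local unit=${1:-pve-cluster}
  systemctl status "$unit" --no-pager --full
  journalctl -u "$unit" -b --no-pager
}

# exit_status_from_status: hypothetical helper. Reads `systemctl status` text
# on stdin and prints the first "status=..." value of an exited process.
exit_status_from_status() {
  sed -n 's/.*code=exited, status=\([^)]*\)).*/\1/p' | head -n1
}
```

For example, `systemctl status pve-cluster --no-pager | exit_status_from_status` would print the pmxcfs exit status on its own.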

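Diagnosis: the cluster filesystem service had failed, which meant corosync still had a problem after all, even though its status looked clean.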

Solution

Restart corosync first, then pve-cluster; after that the node recovered:

systemctl restart corosync
systemctl restart pve-cluster
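The same fix as a small script with a check at the end, as a sketch only. Two parts are additions the article did not use: systemctl reset-failed, which clears the failed state and start-rate counter behind the "Start request repeated too quickly" message, and the final probe of /etc/pve. SYSTEMCTL and PVE_DIR are hypothetical overrides for dry runs; on a real node leave them unset.

```shell
#!/usr/bin/env bash
# recover_pve_node: hypothetical wrapper around the restart order above.
recover_pve_node() {
  local sc=${SYSTEMCTL:-systemctl}   # hypothetical override, default systemctl
  local dir=${PVE_DIR:-/etc/pve}     # hypothetical override, default /etc/pve
  # Addition: clear the start-limit/failed state left by the restart loop.
  $sc reset-failed pve-cluster || true
  # Order from the article: corosync first, then pve-cluster.
  $sc restart corosync || return 1
  $sc restart pve-cluster || return 1
  $sc is-active --quiet pve-cluster || return 1
  # Addition: /etc/pve should be listable again once pmxcfs is back.
  timeout 10 ls "$dir" >/dev/null 2>&1
}
```

The function stops at the first failing step, so a corosync restart that fails never leads to a pve-cluster restart.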


This article is under CC BY-NC-SA 4.0 license.
Please cite the original link: https://www.liujason.com/article/565.html