Replacing a Failed Drive with a Hot Spare Using MegaCli64: A Hands-On Walkthrough

Linux Notes, by Jason (2019-11-22)

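The failed machine is a 12-bay Dell EMC server with ten drives in a RAID 10 array plus one hot spare. For installing MegaCli64, see this post:
Installing MegaCli64 on Proxmox (Debian) to Manage a Hardware RAID Controller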

Strongly recommended reading: the MegaCli operations manual

Once it is installed, first check the array state, then look at the suspect drive in slot 3:

root@JS-2002:~/megacli/Linux# MegaCli64 -LDInfo -Lall -aALL
                                     
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 9.093 TB
Sector Size         : 512
Mirror Data         : 9.093 TB
State               : Degraded
Strip Size          : 64 KB
Number Of Drives per span:2
Span Depth          : 5
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: Yes
LD has drives that support T10 power conditions: Yes
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No

Exit Code: 0x00
root@JS-2002:~/megacli/Linux# MegaCli64 -pdinfo -physdrv[:3] -a0
                                     
Enclosure Device ID: N/A
Slot Number: 3
Drive's position: DiskGroup: 0, Span: 1, Arm: 1
Enclosure position: N/A
Device Id: 3
WWN: 5000C500260EACC4
Sequence Number: 2
Media Error Count: 0
Other Error Count: 5
Predictive Failure Count: 3
Last Predictive Failure Event Seq Number: 30255
PD Type: SAS

Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Sector Size:  0
Firmware state: Online, Spun Up
Device Firmware Level: 0008
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c500260eacc5
SAS Address(1): 0x0
Connected Port Number: 0(path0) 
Inquiry Data: SEAGATE ST32000444SS    00089WM3PSCZ            
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :29C (84.20 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : Yes

Exit Code: 0x00
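The two fields in that output that gave the drive away were "Predictive Failure Count" and the S.M.A.R.T alert flag. Below is a minimal sketch of a helper that reads them from the pdinfo output. The function name pd_health and the "suspect"/"ok" labels are my own, not part of MegaCli, and the field names are taken from the output shown above, so other firmware versions may word them differently.

```shell
#!/usr/bin/env bash
# pd_health (hypothetical helper): read `MegaCli64 -pdinfo ...` output on stdin
# and print "suspect" if the drive reports predictive failures or a
# S.M.A.R.T alert, otherwise "ok".
pd_health() {
    awk -F':' '
        # "Predictive Failure Count: 3" -> any value above zero is a warning sign
        /^Predictive Failure Count/ {
            gsub(/[ \t]/, "", $2)
            if ($2 + 0 > 0) bad = 1
        }
        # "Drive has flagged a S.M.A.R.T alert : Yes"
        /^Drive has flagged a S\.M\.A\.R\.T alert/ {
            gsub(/[ \t]/, "", $2)
            if ($2 == "Yes") bad = 1
        }
        END { print (bad ? "suspect" : "ok") }
    '
}

# Example on live hardware (slot 3, adapter 0, as in this post):
#   MegaCli64 -pdinfo "-physdrv[:3]" -a0 | pd_health
```

Quoting the "-physdrv[:3]" argument keeps the shell from treating the brackets as a glob pattern.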

Next, take this drive offline and mark it as missing:

root@JS-2002:~/megacli/Linux# MegaCli64 -PDOffline -PhysDrv [:3] -a0
Adapter: 0: EnclId-N/A SlotId-3 state changed to OffLine.
Exit Code: 0x00

root@JS-2002:~/megacli/Linux# MegaCli64 -pdmarkmissing -physdrv[:3] -aAll    
EnclId-N/A SlotId-3 is marked Missing.
Exit Code: 0x00

Then mark the drive as ready for removal:

root@JS-2002:~/megacli/Linux# MegaCli64 -pdprprmv -physdrv[:3] -a0
Prepare for removal Success
Exit Code: 0x00
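The three steps above (offline, mark missing, prepare for removal) can be wrapped in a small script. This is only a sketch: retire_drive, MEGACLI and DRY_RUN are names I made up, the option spellings are copied from the commands above, and I am assuming the "-PhysDrv [:N]" form with a space (which worked for -PDOffline here) is accepted by the other two commands as well. Run it with DRY_RUN=1 first to see what it would execute.

```shell
#!/usr/bin/env bash
# MEGACLI is an assumed variable naming the binary; adjust the path as needed.
MEGACLI="${MEGACLI:-MegaCli64}"

# retire_drive SLOT [ADAPTER] (hypothetical helper): run the three commands
# from this post in order, stopping at the first failure.
# With DRY_RUN=1 the commands are printed instead of executed.
retire_drive() {
    local slot="$1" adapter="${2:-0}" step
    for step in -PDOffline -pdmarkmissing -pdprprmv; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$MEGACLI $step -PhysDrv [:$slot] -a$adapter"
        else
            # "[:$slot]" is quoted so the shell does not glob-expand it
            "$MEGACLI" "$step" -PhysDrv "[:$slot]" "-a$adapter" || return 1
        fi
    done
}

# Example: preview the commands for slot 3 on adapter 0
#   DRY_RUN=1 retire_drive 3
```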

Checking the array again at this point, its state is Degraded:

root@JS-2002:~/megacli/Linux# MegaCli64 -LDInfo -Lall -aALL

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 9.093 TB
Sector Size         : 512
Mirror Data         : 9.093 TB
State               : Degraded
Strip Size          : 64 KB
Number Of Drives per span:2
Span Depth          : 5
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: Yes
LD has drives that support T10 power conditions: Yes
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No

Exit Code: 0x00
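Since the same LDInfo check gets repeated at every stage, it helps to boil the long output down to just the state line. Here is a rough sketch; ld_states is a hypothetical helper name, and it assumes the "Virtual Drive:" and "State :" line formats shown above.

```shell
#!/usr/bin/env bash
# ld_states (hypothetical helper): read `MegaCli64 -LDInfo -Lall -aALL` output
# on stdin and print one "VD <n>: <state>" line per virtual drive.
ld_states() {
    awk '
        # "Virtual Drive: 0 (Target Id: 0)" -> remember the drive number
        /^Virtual Drive:/ { vd = $3 }
        # "State               : Degraded" -> strip the label, keep the value
        /^State[ \t]*:/ {
            sub(/^State[ \t]*:[ \t]*/, "")
            print "VD " vd ": " $0
        }
    '
}

# Example on live hardware:
#   MegaCli64 -LDInfo -Lall -aALL | ld_states
```

A healthy array reports Optimal here, so piping the result through grep -v 'Optimal' is one way to show only the arrays that need attention.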

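Next, bring in the "hot spare". It had never actually been configured as a hot spare; it was merely plugged in. The key here is working out what the Array and row parameters should be, which took me a long time to find.
RAID 10 is a RAID 0 stripe across several RAID 1 mirror pairs, so our 10-drive RAID 10 is really made up of five RAID 1 groups. The number after Array selects one of those groups, and row is the drive's index (0 or 1) within that RAID 1 pair. Seen this way it is easy to work out: you only need the Span and Arm numbers from the drive's position: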

Enclosure Device ID: N/A
Slot Number: 3
Drive's position: DiskGroup: 0, Span: 1, Arm: 1
Enclosure position: N/A
Device Id: 3
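The mapping just described (Span becomes the Array number, Arm becomes the row number) can be pulled straight out of the pdinfo output. This is a sketch of my reading in this post, not something taken from the MegaCli manual; replace_flags is a made-up helper name.

```shell
#!/usr/bin/env bash
# replace_flags (hypothetical helper): read `MegaCli64 -pdinfo ...` output on
# stdin and print the -Array/-row flags implied by the "Drive's position"
# line, using this post's mapping: Span -> Array, Arm -> row.
replace_flags() {
    sed -n "s/^Drive's position:.*Span: \([0-9][0-9]*\), Arm: \([0-9][0-9]*\).*/-Array\1 -row\2/p"
}

# Example on live hardware (must be run while the old drive is still listed):
#   MegaCli64 -pdinfo "-physdrv[:3]" -a0 | replace_flags
```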

That gives Array 1, row 1, so:

root@JS-2002:~/megacli/Linux# MegaCli64 -PdReplaceMissing -PhysDrv[:10] -Array1 -row1 -a0
Adapter: 0: Failed to replace Missing PD at Array 1, Row 1.
FW error description: 
  The specified device is in a state that doesn't support the requested command.  
Exit Code: 0x32

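The replacement failed because this drive was present as an ordinary non-RAID disk. So we simply pulled it out and inserted it into slot 3, and, as if by magic, the rebuild started: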

Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Sector Size:  0
Firmware state: Rebuild
Device Firmware Level: HPD7
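To keep an eye on the rebuild without rereading the whole pdinfo dump, the "Firmware state" line can be polled. A minimal sketch follows; firmware_state, wait_for_rebuild and the MEGACLI variable are names I invented, and the loop simply stops as soon as the state is anything other than Rebuild (including an empty result if the command fails), so check the final line it prints.

```shell
#!/usr/bin/env bash
# MEGACLI is an assumed variable naming the binary; adjust the path as needed.
MEGACLI="${MEGACLI:-MegaCli64}"

# firmware_state (hypothetical helper): read `-pdinfo` output on stdin and
# print the value of the "Firmware state" line, e.g. "Rebuild".
firmware_state() {
    sed -n 's/^Firmware state:[[:space:]]*//p'
}

# wait_for_rebuild SLOT [ADAPTER] [INTERVAL_SECONDS] (hypothetical helper):
# print the state every INTERVAL seconds until it is no longer "Rebuild".
wait_for_rebuild() {
    local slot="$1" adapter="${2:-0}" interval="${3:-60}" state
    while :; do
        state=$("$MEGACLI" -pdinfo "-physdrv[:$slot]" "-a$adapter" | firmware_state)
        echo "slot $slot: $state"
        [ "$state" = "Rebuild" ] || break
        sleep "$interval"
    done
}

# Example: check slot 3 on adapter 0 once a minute
#   wait_for_rebuild 3
```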

Done!
https://paste.ubuntu.com/p/dVXG3qvnGF/

This article is under CC BY-NC-SA 4.0 license.
Please quote the original link:https://www.liujason.com/article/391.html