Degraded software RAID1
One morning I find something like this in the system mail:
This is an automatically generated mail message from mdadm
running on libeccio
A DegradedArray event had been detected on md device /dev/md/4.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda1[0]
487104 blocks super 1.2 [2/1] [U_]
md5 : active raid1 sda7[0]
283077632 blocks super 1.2 [2/1] [U_]
bitmap: 2/3 pages [8KB], 65536KB chunk
md4 : active raid1 sda6[0]
4975616 blocks super 1.2 [2/1] [U_]
md2 : active raid1 sda3[0]
97590272 blocks super 1.2 [2/1] [U_]
md1 : active raid1 sda2[0] sdb2[1]
4390912 blocks super 1.2 [2/2] [UU]
md3 : active raid1 sda5[0]
97589248 blocks super 1.2 [2/1] [U_]
unused devices: <none>
Oh crap...
A search through dmesg tells me:
md: bind<sda2>
[ 1.778287] hub 2-10:1.0: 2 ports detected
[ 1.779816] md: raid1 personality registered for level 1
[ 1.780237] md/raid1:md1: active with 2 out of 2 mirrors
[ 1.780265] md1: detected capacity change from 0 to 4496293888
[ 1.790760] md: bind<sda3>
[ 1.792595] md/raid1:md2: active with 1 out of 2 mirrors
[ 1.792630] md2: detected capacity change from 0 to 99932438528
[ 1.829295] md: bind<sda6>
[ 1.830683] md: bind<sdb7>
[ 1.831317] md/raid1:md4: active with 1 out of 2 mirrors
[ 1.831348] md4: detected capacity change from 0 to 5095030784
[ 1.832241] md: bind<sdb1>
[ 1.858850] md: bind<sda7>
[ 1.859793] md: kicking non-fresh sdb7 from array!
[ 1.859797] md: unbind<sdb7>
[ 1.879169] md: export_rdev(sdb7)
[ 1.880522] md/raid1:md5: active with 1 out of 2 mirrors
[ 1.982202] md: bind<sda1>
[ 2.065159] md: kicking non-fresh sdb1 from array!
[ 2.065165] md: unbind<sdb1>
[ 2.073287] md: bind<sda5>
[ 2.073889] md: kicking non-fresh sdb5 from array!
[ 2.073893] md: unbind<sdb5>
[ 2.079430] md: export_rdev(sdb1)
[ 2.080728] md/raid1:md0: active with 1 out of 2 mirrors
[ 2.080755] md0: detected capacity change from 0 to 498794496
[ 2.084402] created bitmap (3 pages) for device md5
[ 2.084601] md5: bitmap initialized from disk: read 1 pages, set 3 of 4320 bits
[ 2.091447] md: export_rdev(sdb5)
blah blah blah
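(To pull out just the interesting lines without scrolling through the whole boot log, a grep along these lines does the trick; it's the same information as above, just filtered.)
root@libeccio:/var/log# dmesg | grep -iE 'non-fresh|out of 2 mirrors'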
It's the first time this has happened to me and I don't know how to handle it; smartctl tells me the disks are fine, so at least on that front I'm OK.
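(The check amounts to something like the commands below; -H prints the overall health verdict, -a the full attribute dump. Worth running for both disks, sda and sdb.)
root@libeccio:/var/log# smartctl -H /dev/sdb
root@libeccio:/var/log# smartctl -a /dev/sdb | less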
A quick look around the Internet tells me this can happen when, for example, the machine goes down hard because of a power failure, and the solution
is relatively simple: you kick the degraded partitions out of the array and put them back in, and the rebuild happens by itself.
So I proceed like this:
root@libeccio:/var/log# /sbin/mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
mdadm: set device faulty failed for /dev/sdb1: No such device
root@libeccio:/var/log# /sbin/mdadm /dev/md0 --add /dev/sdb1
mdadm: added /dev/sdb1
The "No such device" error is harmless here: sdb1 had already been kicked out of the array at boot (see the dmesg above), so there was nothing left to mark as faulty. Then the same routine for the other md devices and their respective partitions.
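(Written out as a loop, the remaining re-adds look more or less like this; a sketch that assumes the md/partition pairs visible in the mdstat above, worth double-checking before running it.)
# pairs taken from /proc/mdstat: md2/sdb3, md3/sdb5, md4/sdb6, md5/sdb7 (md1 is still healthy)
for pair in md2:sdb3 md3:sdb5 md4:sdb6 md5:sdb7; do
    md=${pair%:*}; part=${pair#*:}
    /sbin/mdadm /dev/$md --remove /dev/$part   # may complain "No such device", as with md0 above
    /sbin/mdadm /dev/$md --add /dev/$part
done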
A cat of /proc/mdstat shows this situation:
root@libeccio:/var/log# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb1[2] sda1[0]
487104 blocks super 1.2 [2/2] [UU]
md5 : active raid1 sdb7[2] sda7[0]
283077632 blocks super 1.2 [2/1] [U_]
[==>.................] recovery = 14.1% (40111104/283077632) finish=34.4min speed=117694K/sec
bitmap: 1/3 pages [4KB], 65536KB chunk
md4 : active raid1 sdb6[2] sda6[0]
4975616 blocks super 1.2 [2/1] [U_]
resync=DELAYED
md2 : active raid1 sdb3[2] sda3[0]
97590272 blocks super 1.2 [2/1] [U_]
resync=DELAYED
md1 : active raid1 sda2[0] sdb2[1]
4390912 blocks super 1.2 [2/2] [UU]
md3 : active raid1 sdb5[2] sda5[0]
97589248 blocks super 1.2 [2/1] [U_]
resync=DELAYED
unused devices: <none>
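(To follow the rebuild without re-running cat by hand every few minutes, watch does the job, and mdadm --detail gives a per-array view including the rebuild progress.)
root@libeccio:/var/log# watch -n 10 cat /proc/mdstat
root@libeccio:/var/log# /sbin/mdadm --detail /dev/md5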
So everything is going back into place, and today too I've learned something new.