Degraded software RAID1
One morning I find something like this in the system mail:
This is an automatically generated mail message from mdadm
running on libeccio
A DegradedArray event had been detected on md device /dev/md/4.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda1[0]
487104 blocks super 1.2 [2/1] [U_]
md5 : active raid1 sda7[0]
283077632 blocks super 1.2 [2/1] [U_]
bitmap: 2/3 pages [8KB], 65536KB chunk
md4 : active raid1 sda6[0]
4975616 blocks super 1.2 [2/1] [U_]
md2 : active raid1 sda3[0]
97590272 blocks super 1.2 [2/1] [U_]
md1 : active raid1 sda2[0] sdb2[1]
4390912 blocks super 1.2 [2/2] [UU]
md3 : active raid1 sda5[0]
97589248 blocks super 1.2 [2/1] [U_]
unused devices: <none>
Oh crap...
A search through dmesg tells me:
md: bind<sda2>
[ 1.778287] hub 2-10:1.0: 2 ports detected
[ 1.779816] md: raid1 personality registered for level 1
[ 1.780237] md/raid1:md1: active with 2 out of 2 mirrors
[ 1.780265] md1: detected capacity change from 0 to 4496293888
[ 1.790760] md: bind<sda3>
[ 1.792595] md/raid1:md2: active with 1 out of 2 mirrors
[ 1.792630] md2: detected capacity change from 0 to 99932438528
[ 1.829295] md: bind<sda6>
[ 1.830683] md: bind<sdb7>
[ 1.831317] md/raid1:md4: active with 1 out of 2 mirrors
[ 1.831348] md4: detected capacity change from 0 to 5095030784
[ 1.832241] md: bind<sdb1>
[ 1.858850] md: bind<sda7>
[ 1.859793] md: kicking non-fresh sdb7 from array!
[ 1.859797] md: unbind<sdb7>
[ 1.879169] md: export_rdev(sdb7)
[ 1.880522] md/raid1:md5: active with 1 out of 2 mirrors
[ 1.982202] md: bind<sda1>
[ 2.065159] md: kicking non-fresh sdb1 from array!
[ 2.065165] md: unbind<sdb1>
[ 2.073287] md: bind<sda5>
[ 2.073889] md: kicking non-fresh sdb5 from array!
[ 2.073893] md: unbind<sdb5>
[ 2.079430] md: export_rdev(sdb1)
[ 2.080728] md/raid1:md0: active with 1 out of 2 mirrors
[ 2.080755] md0: detected capacity change from 0 to 498794496
[ 2.084402] created bitmap (3 pages) for device md5
[ 2.084601] md5: bitmap initialized from disk: read 1 pages, set 3 of 4320 bits
[ 2.091447] md: export_rdev(sdb5)
blah blah blah
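(To pull out just the interesting lines without scrolling through the whole boot log, a grep along these lines does the trick; it's the same information as above, just filtered.)
root@libeccio:/var/log# dmesg | grep -iE 'non-fresh|out of 2 mirrors'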
It's the first time this has happened to me and I don't know how to handle it; smartctl tells me the disks are fine, so at least on that front I'm OK.
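(The check amounts to something like the commands below; -H prints the overall health verdict, -a the full attribute dump. Worth running for both disks, sda and sdb.)
root@libeccio:/var/log# smartctl -H /dev/sdb
root@libeccio:/var/log# smartctl -a /dev/sdb | less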
A quick look around the Internet tells me this can happen when, for example, the machine goes down hard because of a power failure, and the solution
is relatively simple: you kick the degraded partitions out of the array and put them back in, and the rebuild happens by itself.
So I proceed like this:
root@libeccio:/var/log# /sbin/mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
mdadm: set device faulty failed for /dev/sdb1: No such device
root@libeccio:/var/log# /sbin/mdadm /dev/md0 --add /dev/sdb1
mdadm: added /dev/sdb1
The "No such device" error is harmless here: sdb1 had already been kicked out of the array at boot (see the dmesg above), so there was nothing left to mark as faulty. Then the same routine for the other md devices and their respective partitions.
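(Written out as a loop, the remaining re-adds look more or less like this; a sketch that assumes the md/partition pairs visible in the mdstat above, worth double-checking before running it.)
# pairs taken from /proc/mdstat: md2/sdb3, md3/sdb5, md4/sdb6, md5/sdb7 (md1 is still healthy)
for pair in md2:sdb3 md3:sdb5 md4:sdb6 md5:sdb7; do
    md=${pair%:*}; part=${pair#*:}
    /sbin/mdadm /dev/$md --remove /dev/$part   # may complain "No such device", as with md0 above
    /sbin/mdadm /dev/$md --add /dev/$part
done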
A cat of /proc/mdstat shows this situation:
root@libeccio:/var/log# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb1[2] sda1[0]
487104 blocks super 1.2 [2/2] [UU]
md5 : active raid1 sdb7[2] sda7[0]
283077632 blocks super 1.2 [2/1] [U_]
[==>.................] recovery = 14.1% (40111104/283077632) finish=34.4min speed=117694K/sec
bitmap: 1/3 pages [4KB], 65536KB chunk
md4 : active raid1 sdb6[2] sda6[0]
4975616 blocks super 1.2 [2/1] [U_]
resync=DELAYED
md2 : active raid1 sdb3[2] sda3[0]
97590272 blocks super 1.2 [2/1] [U_]
resync=DELAYED
md1 : active raid1 sda2[0] sdb2[1]
4390912 blocks super 1.2 [2/2] [UU]
md3 : active raid1 sdb5[2] sda5[0]
97589248 blocks super 1.2 [2/1] [U_]
resync=DELAYED
unused devices: <none>
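(To follow the rebuild without re-running cat by hand every few minutes, watch does the job, and mdadm --detail gives a per-array view including the rebuild progress.)
root@libeccio:/var/log# watch -n 10 cat /proc/mdstat
root@libeccio:/var/log# /sbin/mdadm --detail /dev/md5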
So everything is going back into place, and today too I've learned something new.