Wednesday 28 May 2014

RAID: creating an array and monitoring hard drive and RAID health

Create raid array

Create raid 6 array with 4 disks

$ mdadm --create --verbose /dev/md0 --level=6 --raid-devices=4 /dev/sd[b-e]

Save your raid configuration

$ sudo sh -c 'mdadm --detail --scan >> /etc/mdadm/mdadm.conf'

In /etc/mdadm/mdadm.conf, change the newly added line
ARRAY /dev/md0 metadata=1.2 name=ion:1 UUID=aa1f85b0:a2391657:cfd38029:772c560e
to:
ARRAY /dev/md0 UUID=aa1f85b0:a2391657:cfd38029:772c560e

Then recreate the initrd for the kernel so that it includes the configuration files relevant to the MD-RAID setup:

$ sudo update-initramfs -u

Create filesystem

To get the best performance from our file system we need to calculate the stride and stripe-width values.

Stride size: divide the array chunk size by the file system block size.

We find the array chunk size by looking at /proc/mdstat or using mdadm:

$ cat /proc/mdstat 
... 512k chunk ...
      
$ mdadm --detail /dev/md{...}
     ...
     Chunk Size : 512K
     ...

A block size of 4k offers best performance for ext4.

Therefore, in the above example, stride size is 512 / 4 = 128

Stripe width: multiply the stride size by the number of data disks.

In raid-6, 2 disks are used for parity, and in raid-5, 1 disk is used for parity.

In my example I have 4 disks and am using raid-6, therefore I have 2 data disks.

Therefore, I have a stripe width of 128 * 2 = 256.
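
If you'd rather not do the arithmetic by hand, a small shell sketch like the one below gives the same numbers (the chunk size, block size and data-disk count are the values from my array, substitute your own):

$ CHUNK_KB=512        # chunk size from /proc/mdstat or mdadm --detail, in KB
$ BLOCK_KB=4          # ext4 block size in KB (4096 bytes)
$ DATA_DISKS=2        # total disks minus parity disks (raid-6 uses 2 for parity)
$ echo "stride=$((CHUNK_KB / BLOCK_KB)) stripe-width=$(((CHUNK_KB / BLOCK_KB) * DATA_DISKS))"
stride=128 stripe-width=256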

Create the file system:

$ mkfs.ext4 -b 4096 -E stride=128,stripe-width=256 /dev/md0

Mount filesystem

$ sudo mkdir /mnt/raid
$ sudo chmod 1777 /mnt/raid
$ sudo mount -o noauto,rw,async -t ext4 /dev/md0 /mnt/raid

Make it permanent - add to /etc/fstab

/dev/md0      /mnt/raid     ext4    defaults    1 2
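
Device names like /dev/md0 can occasionally change between boots, so an alternative (shown here as a sketch, the UUID is a placeholder for whatever blkid reports on your system) is to mount the filesystem by UUID instead:

$ sudo blkid /dev/md0
/dev/md0: UUID="aaaabbbb-cccc-dddd-eeee-ffff00001111" TYPE="ext4"

UUID=aaaabbbb-cccc-dddd-eeee-ffff00001111  /mnt/raid  ext4  defaults  1 2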

Export via NFS

$ sudo apt-get install nfs-kernel-server

Mount the raid drive in the exported tree

$ sudo mkdir /export
$ sudo mkdir /export/raid
$ sudo mount --bind /mnt/raid /export/raid

Make it permanent - add to /etc/fstab

/mnt/raid /export/raid none bind 0 0

Configure /etc/exports

/export      192.168.1.0/24(rw,fsid=0,insecure,no_subtree_check,async)
/export/raid 192.168.1.0/24(rw,nohide,insecure,no_subtree_check,async)

Start the service

$ sudo service nfs-kernel-server restart
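
To check the export from a client on the 192.168.1.0/24 network (the server address below is an assumption, substitute your own), something like this should work; because /export is exported with fsid=0, NFSv4 clients mount paths relative to that root:

$ showmount -e 192.168.1.10
$ sudo mount -t nfs4 192.168.1.10:/raid /mnt/raid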

Monitoring drive health

smartmontools: monitors S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) attributes and runs hard drive self-tests.

enable SMART support if it's not already on
$ for i in /dev/sd[a-z]; do sudo smartctl -s on $i; done

turn on offline data collection
$ for i in /dev/sd[a-z]; do sudo smartctl -o on $i; done

enable autosave of device vendor-specific attributes
$ for i in /dev/sd[a-z]; do sudo smartctl -S on $i; done

check the overall health for each drive
$ for i in /dev/sd[a-z]; do RESULTS=`sudo smartctl -H $i | grep result | cut -f6 -d' '`; echo $i: $RESULTS; done

If any drive doesn't show PASSED, immediately back up all your data as that drive is probably about to fail.
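
smartctl can also kick off the drive self-tests mentioned above on demand; a short test only takes a few minutes and the result ends up in the drive's self-test log. A minimal sketch, assuming /dev/sdb is one of the array members:

$ sudo smartctl -t short /dev/sdb     # start a short self-test (runs on the drive in the background)
$ sudo smartctl -l selftest /dev/sdb  # view the self-test log once it has completed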

Configure smartd to automatically check drives
$ vim /etc/smartd.conf
DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q

DEVICESCAN means scan for all devices
-H means monitor SMART health status
-m root means mail to root
-M exec /usr/libexec/smartmontools/smartdnotify means run the smartdnotify script to email warnings
-n standby,10,q means skip the check if the disk is in standby (don't spin it up), unless 10 checks in a row have already been skipped, and don't log the skipped checks.

pick up changes
$ sudo service smartd restart

Other options for smartd.conf:

 -a              \ # Implies all standard testing and reporting.
 -n standby,10,q \ # Don't spin up disk if it is currently spun down
                 \ #   unless it is 10th attempt in a row. 
                 \ #   Don't report unsuccessful attempts anyway.
 -o on           \ # Automatic offline tests (usually every 4 hours).
 -S on           \ # Attribute autosave: tells the drive to save its
                 \ #   SMART attribute values across power cycles.
 -R 194          \ # Show real temperature in the logs.
 -R 231          \ # The same as above.
 -I 194          \ # Ignore temperature attribute changes
 -W 3,50,50      \ # Notify if the temperature changes 3 degrees
                 \ #   comparing to the last check or if
                 \ #   the temperature exceeds 50 degrees.
 -s (S/../.././02|L/../../1/22) \ # short test: every day 2-3am
                                \ # long test every Monday 10pm-2am
                                \ # (Long test takes a lot of time
                                \ # and it should be finished before
                                \ # daily short test starts.
                                \ # At 3am every day this disk will be
                                \ # used heavily as backup storage)
 -m root         \ # To whom we should send mails.
 -M exec /usr/libexec/smartmontools/smartdnotify
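
Putting those options together gives a single DEVICESCAN directive; a sketch (untested on my setup), using the same backslash continuations as the list above:

DEVICESCAN -a -o on -S on \
   -n standby,10,q \
   -W 3,50,50 \
   -s (S/../.././02|L/../../1/22) \
   -m root -M exec /usr/libexec/smartmontools/smartdnotify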

Note: this will email root. If you don't monitor root's mail, you may want to redirect mail sent to root to another email address:

$ vim /etc/aliases
root: user@gmail.com
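
Depending on the MTA in use (this assumes a sendmail-compatible setup such as postfix or exim), the aliases database usually needs rebuilding after editing /etc/aliases:

$ sudo newaliases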

Now we want to monitor the RAID array.

add the to/from email addresses to mdadm config

$ vim /etc/mdadm/mdadm.conf
MAILADDR user@gmail.com
MAILFROM user+mdadm@gmail.com

I tested that this was working by running /sbin/mdadm --monitor --scan --test

Gmail automatically marked the test mail as spam, so I had to create a filter to explicitly not mark emails sent from user+mdadm@gmail.com as spam (note the +mdadm part of the address, a neat Gmail trick).
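
For the alerts to actually arrive, the mdadm monitor needs to be running. On Debian/Ubuntu the mdadm package normally takes care of this (see /etc/default/mdadm); if it isn't, the monitor can be started by hand with something along these lines (the 1800-second poll interval is just an example):

$ sudo mdadm --monitor --scan --daemonise --delay=1800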

Replace a failed drive

find the drive's serial number
$ sudo hdparm -i /dev/sdd | grep SerialNo
 Model=WDC WD2003FZEX-00Z4SA0, FwRev=01.01A01, SerialNo=WD-WMC130D78F55

fail and remove the drive from the array
$ sudo mdadm --manage /dev/md0 --fail /dev/sdd
$ sudo mdadm --manage /dev/md0 --remove /dev/sdd

physically remove the old drive (using the serial number above to make sure you remove the correct one), insert the new drive, and add it to the array:
$ sudo mdadm --manage /dev/md0 --add /dev/sdd

the array should start rebuilding
$ cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid6 sdd[4] sde[3] sdb[1] sdc[0]
      3906765824 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/3] [UU_U]
      [>....................]  recovery =  0.5% (11702016/1953382912) finish=970.5min speed=33344K/sec
      
unused devices: <none>
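
The rebuild takes a while (the finish estimate above is in minutes), so I keep an eye on it with something like:

$ watch -n 60 cat /proc/mdstat
$ sudo mdadm --detail /dev/md0 | grep -E 'State|Rebuild'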

Tools for monitoring:
logwatch – monitors my /var/log/messages for anything out of the ordinary and mails me the output on a daily basis.
mdadm – mdadm will mail me if a disk has completely failed or the RAID fails for some other reason. A complete resync is done every week.
smartd – I have smartd running “short” tests every night and long tests every second week. Reports are mailed to me.
munin – graphical and historical monitoring of performance and all stats of the server.
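
For reference, the periodic consistency check that the mdadm package schedules from cron (checkarray on Debian/Ubuntu) can also be triggered by hand through sysfs; a sketch, assuming the array is md0:

$ echo check | sudo tee /sys/block/md0/md/sync_action   # start a consistency check
$ cat /proc/mdstat                                      # shows the check progress
$ cat /sys/block/md0/md/mismatch_cnt                    # non-zero after the check can indicate problems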
