Posted by: isaraffee | August 1, 2009

Exploring RAID in OpenSUSE 11.1 Part 3

Querying RAID

To view the status of all RAID arrays on your system, type:

neptune:~ # cat /proc/mdstat

Personalities : [raid1]

md0 : active raid1 sda2[0] sda3[1]

4192896 blocks [2/2] [UU]

unused devices: <none>

The Personalities line tells you which RAID levels the kernel supports. In this example there is one array, md0, and it is active. You can also read off the size of the array. [2/2] means two of the two devices are in use, and [UU] means both devices are up.
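If you only want a one-line summary of an array, mdadm's query mode should do the job (a quick sketch; the exact output format is described in the mdadm man page):

mdadm --query /dev/md0

You can also point --query at a member partition such as /dev/sda2 to find out whether it belongs to an array.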

To get the more detailed information for each array, type:

neptune:~ # mdadm --detail /dev/md0

/dev/md0:

Version : 0.90

Creation Time : Tue Apr 14 09:49:06 2009

Raid Level : raid1

Array Size : 4192896 (4.00 GiB 4.29 GB)

Used Dev Size : 4192896 (4.00 GiB 4.29 GB)

Raid Devices : 2

Total Devices : 2

Preferred Minor : 0

Persistence : Superblock is persistent

Update Time : Wed Apr 22 14:40:24 2009

State : clean

Active Devices : 2

Working Devices : 2

Failed Devices : 0

Spare Devices : 0

UUID : 29452c4e:71deeb27:d2c12575:8954d9e1

Events : 0.8

Number Major Minor RaidDevice State

0 8 2 0 active sync /dev/sda2

1 8 3 1 active sync /dev/sda3
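The --detail output can also be turned into configuration lines. A common practice (a sketch, assuming the configuration file lives at /etc/mdadm.conf on this system) is to append the scan output to that file so the array is assembled consistently at boot:

mdadm --detail --scan >> /etc/mdadm.conf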

You can also examine the individual member devices, using wildcards:

neptune:~ # mdadm --examine /dev/sda*

mdadm: No md superblock detected on /dev/sda.

mdadm: No md superblock detected on /dev/sda1.

/dev/sda2:

Magic : a92b4efc

Version : 0.90.00

UUID : 29452c4e:71deeb27:d2c12575:8954d9e1

Creation Time : Tue Apr 14 09:49:06 2009

Raid Level : raid1

Used Dev Size : 4192896 (4.00 GiB 4.29 GB)

Array Size : 4192896 (4.00 GiB 4.29 GB)

Raid Devices : 2

Total Devices : 2

Preferred Minor : 0

Update Time : Fri Apr 24 10:02:25 2009

State : clean

Active Devices : 2

Working Devices : 2

Failed Devices : 0

Spare Devices : 0

Checksum : 347a6a34 - correct

Events : 8

Number Major Minor RaidDevice State

this 0 8 2 0 active sync /dev/sda2

0 0 8 2 0 active sync /dev/sda2

1 1 8 3 1 active sync /dev/sda3

/dev/sda3:

Magic : a92b4efc

Version : 0.90.00

UUID : 29452c4e:71deeb27:d2c12575:8954d9e1

Creation Time : Tue Apr 14 09:49:06 2009

Raid Level : raid1

Used Dev Size : 4192896 (4.00 GiB 4.29 GB)

Array Size : 4192896 (4.00 GiB 4.29 GB)

Raid Devices : 2

Total Devices : 2

Preferred Minor : 0

Update Time : Fri Apr 24 10:02:25 2009

State : clean

Active Devices : 2

Working Devices : 2

Failed Devices : 0

Spare Devices : 0

Checksum : 347a6a37 - correct

Events : 8

Number Major Minor RaidDevice State

this 1 8 3 1 active sync /dev/sda3

0 0 8 2 0 active sync /dev/sda2

1 1 8 3 1 active sync /dev/sda3

mdadm: No md superblock detected on /dev/sda4.

mdadm: No md superblock detected on /dev/sda5.

mdadm: No md superblock detected on /dev/sda6.

mdadm: No md superblock detected on /dev/sda7.
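If you do not want to list partitions by hand, --examine also has a scan mode that reads the superblocks and prints ARRAY lines suitable for mdadm.conf (a sketch; which devices get scanned depends on your configuration):

mdadm --examine --scan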

Monitoring RAID

You can configure mdadm to send you email when an active disk fails or when it detects a degraded array. A degraded array is one that has not yet been populated with all of its disks, or one with a failed disk.

But first, you must make sure that mdadmd (mdadm's daemon) is running. Simply type:

neptune:/etc/init.d # ps -ef|grep mdadmd

root 5658 5531 0 10:07 pts/1 00:00:00 grep mdadmd

This shows that the mdadmd daemon is not running. To run it, type:

neptune:~ # cd /etc/init.d/

neptune:/etc/init.d # ./mdadmd start

Starting mdadmd done
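To have mdadmd start automatically at boot on this OpenSUSE release, the usual runlevel tools should work (a sketch, assuming the standard SUSE init scripts are in place):

chkconfig mdadmd on

insserv mdadmd should achieve the same thing.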

Now type the command that monitors your RAID and notifies you via email if there is a problem with it:

neptune:~ # mdadm --monitor --scan --mail=root@linux.local --delay=60 /dev/md0
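Running the monitor by hand like this only lasts for the current session. mdadm's monitor mode also reads the destination address from a MAILADDR line in /etc/mdadm.conf when no --mail option is given, which is handy if the daemon is started by the init script rather than by hand. As a sketch, you can add such a line and then send yourself a one-off test alert to confirm delivery:

MAILADDR root@linux.local

mdadm --monitor --scan --oneshot --test

The --test option generates a TestMessage alert for each array it finds, which should arrive in the same mailbox as real failure reports.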

Removing Software RAID Configuration

To remove a device from an array, you must first fail the device:

neptune:~ # mdadm /dev/md0 --fail /dev/sda3

mdadm: set /dev/sda3 faulty in /dev/md0

You can then check your mail to see whether the system alerts you of the RAID failure:

neptune:~ # mailx

Heirloom mailx version 12.2 01/07/07. Type ? for help.

“/var/spool/mail/root”: 3 messages 3 new

>N 1 root@linux.local Fri Apr 24 10:31 29/845 Fail event on /dev/md0:neptune

N 2 root@linux.local Fri Apr 24 10:31 29/851 Fail event on /dev/md0:neptune

N 3 root@linux.local Fri Apr 24 10:31 29/851 Fail event on /dev/md0:neptune

Yes, the email notification works.

The contents of the email are recorded as follows:

From root@linux.local Fri Apr 24 10:31:01 2009

X-Original-To: root@linux.local

Delivered-To: root@linux.local

From: mdadm monitoring <root@linux.local>

To: root@linux.local

Subject: Fail event on /dev/md0:neptune

Date: Fri, 24 Apr 2009 10:31:01 +0800 (SGT)

This is an automatically generated mail message from mdadm

running on neptune

A Fail event had been detected on md device /dev/md0.

It could be related to component device /dev/sda3.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1]

md0 : active raid1 sda2[0] sda3[1](F)

4192896 blocks [2/1] [U_]

unused devices: <none>

The email informs us that /dev/sda3 has failed.
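Once a member has been marked faulty, it can be pulled out of the array before you replace or re-add it (a sketch of the usual follow-up step, which is not shown in the transcript above):

mdadm /dev/md0 --remove /dev/sda3

After a --remove, the partition no longer appears in /proc/mdstat for that array, and it can be returned later with --add.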

Note

To wipe everything out and start over, you have to zero out the superblock on each device, or it will continue to think it belongs to a RAID array.

neptune:~ # mdadm --zero-superblock /dev/sda3
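A complete teardown would look roughly like this (a sketch, assuming the array is /dev/md0 with members /dev/sda2 and /dev/sda3, and that it is still listed in /etc/mdadm.conf):

umount /mnt

mdadm --stop /dev/md0

mdadm --zero-superblock /dev/sda2 /dev/sda3

Then delete the corresponding ARRAY line from /etc/mdadm.conf so the array is not assembled again at boot.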

You can still access the mounted RAID on /mnt and read the file contents.

neptune:~ # cd /mnt

neptune:/mnt # ls

lost+found test.txt

neptune:/mnt # cat test.txt

Howdy, RAID 😉

I am back from ICT

Now let's edit the file test.txt and see if we can restore the file content after we fix the disk problem.

test.txt now reads as follows:

neptune:/mnt # cat test.txt

Howdy, RAID 😉

I am back from ICT

I am now testing failure of one of the disk and writing to RAID

Now unmount /mnt before we stop the RAID.

neptune:~ # umount /mnt

Stop the RAID:

neptune:~ # mdadm --stop /dev/md0

mdadm: stopped /dev/md0

Next, mount one of the RAID's disk partitions read-only:

neptune:~ # mount -t ext3 -o ro /dev/sda2 /mnt

Next, open the file test.txt to see if the newly appended line is included.

neptune:~ # cat /mnt/test.txt

Howdy, RAID 😉

I am back from ICT

I am now testing failure of one of the disk and writing to RAID.

Yes, it is included. Now let's unmount /mnt and then mount /dev/sda3 to see if the file was updated even after we marked the partition as failed.

First, start the RAID again:

neptune:~ # mdadm -A /dev/md0

mdadm: /dev/md/0 has been started with 1 drive (out of 2).
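With -A and no member devices listed, mdadm typically relies on /etc/mdadm.conf (or its built-in defaults) to find the components. You can also be explicit (a sketch, using the member partitions from this example):

mdadm --assemble /dev/md0 /dev/sda2 /dev/sda3

or let it scan everything it knows about:

mdadm --assemble --scan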

Check the status of the RAID with cat /proc/mdstat:

md0 : active raid1 sda2[0]

4192896 blocks [2/1] [U_]

unused devices: <none>

It shows that the RAID has only one disk.

Next, add the disk /dev/sda3 back:

neptune:~ # mdadm /dev/md0 --add /dev/sda3

mdadm: re-added /dev/sda3

This will take some time to rebuild, just as when you create a new array.
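If you prefer to follow the progress continuously instead of re-running the command below, something like this works (a sketch using the standard watch utility):

watch -n 5 cat /proc/mdstat

Press Ctrl+C to stop watching once the recovery line disappears.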

To see the RAID being rebuilt, type:

neptune:~ # cat /proc/mdstat

Personalities : [raid1]

md0 : active raid1 sda3[2] sda2[0]

4192896 blocks [2/1] [U_]

[==>………………] recovery = 14.9% (625984/4192896) finish=6.4min speed=9180K/sec

unused devices: <none>

neptune:~ # cat /proc/mdstat

Personalities : [raid1]

md0 : active raid1 sda3[2] sda2[0]

4192896 blocks [2/1] [U_]

[================>….] recovery = 83.8% (3517568/4192896) finish=1.0min speed=10740K/sec

Eventually, the RAID is fully rebuilt:

neptune:~ # cat /proc/mdstat

Personalities : [raid1]

md0 : active raid1 sda3[1] sda2[0]

4192896 blocks [2/2] [UU]
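Once the array shows [UU] again, you can double-check that mdadm agrees (a sketch; these are the same fields shown in the --detail output earlier):

mdadm --detail /dev/md0 | grep -E 'State|Devices'

The State line should read clean, with two active and two working devices and no failed ones.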

Now let's check the disk partitions /dev/sda2 and /dev/sda3 to see if the test.txt file is updated.

Let's just check the /dev/sda3 partition:

neptune:~ # umount /mnt

neptune:~ # mount -t ext3 -o ro /dev/sda3 /mnt

neptune:~ # cat /mnt/test.txt

Howdy, RAID 😉

I am back from ICT

From the output above, we can conclude that the failed disk will not have its file test.txt updated.

So now let’s restore the disk partition and see if the file in /dev/sda3 will be updated.

You have to stop the RAID.

neptune:~ # mdadm --stop /dev/md0

mdadm: stopped /dev/md0

neptune:~ # mount -t ext3 -o ro /dev/sda3 /mnt

neptune:~ # cat /mnt/test.txt

Howdy, RAID 😉

I am back from ICT

I am now testing failure of one of the disk and writing to RAID.

Yes, the contents of the file are updated after the disk (/dev/sda3) was rebuilt.
