In another article, "Step by step replacing disks in Solaris volume manager", I described how to properly replace a system disk in Solaris Volume Manager. Sometimes, however, that procedure does not work because of metadevice replica corruption.

Here is one case I ran into.

After I replaced the failed disk using that procedure, everything appeared to work, but the resync always failed.

Here is the 'iostat -iE' output:

iostat-iE.out:

sd0       Soft Errors: 0 Hard Errors: 1 Transport Errors: 2
Vendor: ATA      Product: HITACHI H7210CA3 Revision: A3CB Device Id: id1,sd@n5000cca39cd3aebe       <----<<< Matches metadevadm -u c1t0d0.
...
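
When there are many disks, a small filter over the saved iostat output makes it easier to spot the ones with errors. This is only a sketch: it assumes the summary-line layout shown above (device name first, then the three error counters), and `disks_with_errors` is a name I made up for illustration.

```shell
# Print every device whose summary line shows hard or transport errors.
# Reads iostat -E style output on stdin. Assumes the summary line looks like:
#   sd0  Soft Errors: 0 Hard Errors: 1 Transport Errors: 2
disks_with_errors() {
    awk '/Soft Errors:/ && ($7 > 0 || $10 > 0) {
        print $1, "hard=" $7, "transport=" $10
    }'
}
```

It can be fed from the saved file, for example `disks_with_errors < iostat-iE.out`.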

Metastat output:

     Size: 20498940 blocks (9.8 GB)
     Stripe 0:
         Device     Start Block  Dbase        State Reloc Hot Spare
         c1t0d0s4          0     No     Maintenance   Yes

Device Relocation Information:
Device   Reloc  Device ID
c1t0d0   Yes    id1,sd@n5000cca39ce2ff9e <---<< Was not updated for some
...

All but one of the slices on the disk are in the Maintenance state (c1t0d0s0 shows Resyncing):

# grep c1t0d0 metastat.out
     Invoke: metareplace d70 c1t0d0s7 <new device>
         c1t0d0s7          0     No     Maintenance   Yes
     Invoke: metareplace d50 c1t0d0s5 <new device>
         c1t0d0s5          0     No     Maintenance   Yes
     Invoke: metareplace d30 c1t0d0s3 <new device>
         c1t0d0s3          0     No     Maintenance   Yes
     Invoke: metareplace d20 c1t0d0s1 <new device>
         c1t0d0s1          0     No     Maintenance   Yes
         c1t0d0s0          0     No       Resyncing   Yes
     Invoke: metareplace d40 c1t0d0s4 <new device>
         c1t0d0s4          0     No     Maintenance   Yes
c1t0d0   Yes    id1,sd@n5000cca39ce2ff9e
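
To pull just the affected slices out of a saved metastat output, a filter like the following can help. It is a minimal sketch that assumes the component rows look like the ones above (slice name in the first column, State in the fourth); the function name `maintenance_components` is my own.

```shell
# List component slices whose State column reads "Maintenance".
# Reads metastat output on stdin. Assumes component rows of the form:
#   c1t0d0s7   0   No   Maintenance   Yes
maintenance_components() {
    awk '$4 == "Maintenance" && $1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ { print $1 }'
}
```

For example, `maintenance_components < metastat.out` prints one slice per line, skipping the slice that is still resyncing.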

 

First fix attempt

1 - # metastat -p >> /etc/lvm/md.tab (this will be used by metainit later)

2 - # metadb -d c1t0d0s6
    # metadb (verify)

3 - # metadetach -f d70 d72
    # metaclear d72

    # metadetach -f d50 d52
    # metaclear d52

    # metadetach -f d30 d32
    # metaclear d32

    # metadetach -f d20 d22
    # metaclear d22

    # metadetach -f d10 d12
    # metaclear d12

    # metadetach -f d40 d42
    # metaclear d42

4 - # metastat | grep c1t0d0 (verify no artifacts for this device)

5 - # metadb -a -c 3 c1t0d0s6
    # metadb (verify)

6 - # metainit d72

7 - # metastat | grep c1t0d0

    This should show d72 and c1t0d0 in the Device Relocation Information section at the bottom of the output. It should look like:

    Device Relocation Information:
    Device   Reloc  Device ID
    c1t0d0   Yes    id1,sd@n5000cca39cd3aebe <----<<< Correct/current Device ID
    c0t0d0   Yes    id1,sd@n5000c50011357708

8 - # metattach d70 d72

9 - Continue creating and attaching:

    # metainit d52
    # metattach d50 d52

    # metainit d42
    # metattach d40 d42

    # metainit d32
    # metattach d30 d32

    # metainit d22
    # metattach d20 d22

    # metainit d12
    # metattach d10 d12

10 - # while true; do metastat | grep done; sleep 30; done (monitor sync progress)

     When this stops echoing output (after reaching into 90+ % done), control-c and run:

     # metastat | more (to verify all metadevices and components are in the 'Okay' state)
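
The detach-and-clear pairs in step 3 follow mechanically from the md.tab saved in step 1, so they can be generated instead of typed. The sketch below only prints the commands; it does not run them. It assumes md.tab lines in the `metastat -p` style (`d70 -m d71 d72 1` for a mirror, `d72 1 1 c1t0d0s7` for a one-slice submirror), and `detach_plan` is a name I invented for this example.

```shell
# Print (not run) the metadetach/metaclear pair for every submirror that
# lives on the given disk. Reads md.tab-style lines on stdin, e.g.
#   d70 -m d71 d72 1
#   d72 1 1 c1t0d0s7
detach_plan() {
    awk -v disk="$1" '
        # Mirror line: remember which mirror each submirror belongs to.
        $2 == "-m" { for (i = 3; i < NF; i++) parent[$i] = $1; next }
        # One-slice concat/stripe: remember its slice, in file order.
        NF == 4 && $2 == 1 && $3 == 1 { slice[$1] = $4; order[++n] = $1 }
        END {
            for (k = 1; k <= n; k++) {
                md = order[k]
                # Keep only submirrors whose slice is on the requested disk.
                if (index(slice[md], disk "s") == 1 && (md in parent)) {
                    print "metadetach -f " parent[md] " " md
                    print "metaclear " md
                }
            }
        }
    '
}
```

Running `detach_plan c1t0d0 < /etc/lvm/md.tab` and reading the output before pasting it into the shell keeps a human in the loop, which seems wise for commands that use -f.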

 

This attempt should have worked, but it got stuck at step 6:

# metainit d72
d72: Concat/Stripe is setup

It stayed stuck even after I ran:

# metadevadm -u c1t0d0
Updating Solaris Volume Manager device relocation information for c1t0d0

 Old device reloc information:

         id1,sd@n5000cca39cd3aebe

 New device reloc information:

         id1,sd@n5000cca39cd3aebe
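
A quick way to see whether the Device ID recorded by metastat agrees with the one the disk itself reports is to extract it from the Device Relocation Information section and compare. This is a sketch under the assumption that the section looks like the excerpts in this article; `reloc_device_id` and `check_device_id` are hypothetical helper names, and the expected ID is copied by hand from the iostat output.

```shell
# Print the Device ID that metastat records for a disk (e.g. c1t0d0).
# Reads metastat output on stdin and only looks inside the
# "Device Relocation Information" section.
reloc_device_id() {
    awk -v disk="$1" '
        /^Device Relocation Information:/ { in_reloc = 1; next }
        in_reloc && $1 == disk { print $3; exit }
    '
}

# Compare the recorded ID with the ID the OS reports for the disk
# (second argument). Returns non-zero on a mismatch.
check_device_id() {
    recorded=$(reloc_device_id "$1")
    if [ "$recorded" = "$2" ]; then
        echo "$1: device ID matches"
    else
        echo "$1: metastat has $recorded, disk reports $2"
        return 1
    fi
}
```

For the case in this article, `metastat | check_device_id c1t0d0 id1,sd@n5000cca39cd3aebe` would report the mismatch for as long as metastat still lists the ...ce2ff9e ID.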

 

Second fix attempt

This one worked. It requires reboots to clear the metadevice corruption.

Action-Plan:
-----------------

1 - # cd /etc

--

2 - # cp vfstab vfstab.svm
    # cp system system.svm

--

3 - Edit vfstab and change:

#device         device          mount           FS      fsck    mount   mount
#to mount       to fsck         point           type    pass    at boot options
#
fd      -       /dev/fd fd      -       no      -
/proc   -       /proc   proc    -       no      -
/dev/md/dsk/d20 -       -       swap    -       no      -
/dev/md/dsk/d10 /dev/md/rdsk/d10        /       ufs     1       no      -
/dev/md/dsk/d30 /dev/md/rdsk/d30        /usr    ufs     1       no      -
/dev/md/dsk/d40 /dev/md/rdsk/d40        /var    ufs     1       no      -
/dev/md/dsk/d70 /dev/md/rdsk/d70        /home   ufs     2       yes     -
/dev/md/dsk/d50 /dev/md/rdsk/d50        /opt    ufs     2       yes     -
/devices        -       /devices        devfs   -       no      -
sharefs -       /etc/dfs/sharetab       sharefs -       no      -
ctfs    -       /system/contract        ctfs    -       no      -
objfs   -       /system/object  objfs   -       no      -
swap    -       /tmp            tmpfs   -       yes     size=512m
swap    -       /var/run        tmpfs   -       no      size=1024m

To:

#device         device          mount           FS      fsck    mount   mount
#to mount       to fsck         point           type    pass    at boot options
#
fd      -       /dev/fd fd      -       no      -
/proc   -       /proc   proc    -       no      -
#/dev/md/dsk/d20 -       -       swap    -       no      -
/dev/dsk/c0t0d0s1    -    -    swap    -    no    -
#/dev/md/dsk/d10 /dev/md/rdsk/d10        /       ufs     1       no      -
/dev/dsk/c0t0d0s0    /dev/rdsk/c0t0d0s0    /    ufs    1    no    -
#/dev/md/dsk/d30 /dev/md/rdsk/d30        /usr    ufs     1       no      -
/dev/dsk/c0t0d0s3    /dev/rdsk/c0t0d0s3    /usr    ufs    1    no    -
#/dev/md/dsk/d40 /dev/md/rdsk/d40        /var    ufs     1       no      -
/dev/dsk/c0t0d0s4    /dev/rdsk/c0t0d0s4    /var    ufs    1    no    -
#/dev/md/dsk/d70 /dev/md/rdsk/d70        /home   ufs     2       yes     -
/dev/dsk/c0t0d0s7    /dev/rdsk/c0t0d0s7    /home    ufs    2    yes    -
#/dev/md/dsk/d50 /dev/md/rdsk/d50        /opt    ufs     2       yes     -
/dev/dsk/c0t0d0s5    /dev/rdsk/c0t0d0s5    /opt    ufs    2    yes    -
/devices        -       /devices        devfs   -       no      -
sharefs -       /etc/dfs/sharetab       sharefs -       no      -
ctfs    -       /system/contract        ctfs    -       no      -
objfs   -       /system/object  objfs   -       no      -
swap    -       /tmp            tmpfs   -       yes     size=512m
swap    -       /var/run        tmpfs   -       no      size=1024m

Double-check your work for typos. Fields are <tab>-separated by convention.
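
Since a typo in vfstab can leave the system unbootable, the edit in step 3 can also be produced by a script and then reviewed. The following is a sketch, not the procedure I actually used: `vfstab_to_slices` is a made-up name, and it takes the metadevice-to-slice mapping as arguments rather than working it out by itself.

```shell
# Rewrite vfstab (stdin) so that each /dev/md entry is commented out and
# followed by an equivalent, tab-separated entry on a physical slice.
# Arguments are metadevice=slice pairs, e.g. d10=c0t0d0s0.
vfstab_to_slices() {
    awk -v pairs="$*" '
        BEGIN {
            n = split(pairs, p, " ")
            for (i = 1; i <= n; i++) { split(p[i], kv, "="); map[kv[1]] = kv[2] }
        }
        $1 ~ /^\/dev\/md\/dsk\// {
            md = $1
            sub(/^\/dev\/md\/dsk\//, "", md)
            if (md in map) {
                print "#" $0
                # Swap has no raw device ("-"); keep it that way.
                raw = ($2 == "-") ? "-" : ("/dev/rdsk/" map[md])
                printf "/dev/dsk/%s\t%s\t%s\t%s\t%s\t%s\t%s\n", map[md], raw, $3, $4, $5, $6, $7
                next
            }
        }
        { print }
    '
}
```

One way to use it is `vfstab_to_slices d10=c0t0d0s0 d20=c0t0d0s1 d30=c0t0d0s3 d40=c0t0d0s4 d50=c0t0d0s5 d70=c0t0d0s7 < vfstab.svm > vfstab.new`, followed by a `diff vfstab.svm vfstab.new` before copying the result into place.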

--

4 - Edit system and remove these lines:

* Begin MDD root info (do not edit)
rootdev:/pseudo/md@0:0,10,blk
* End MDD root info (do not edit)
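
If you would rather not delete those lines by hand, a sed range delete does the same thing. This is a small sketch that matches on the Begin/End marker comments shown above; `strip_mdd_root` is just an illustrative name.

```shell
# Print /etc/system-style input (stdin) without the SVM root block,
# i.e. everything from the "Begin MDD root info" marker through the
# "End MDD root info" marker is dropped.
strip_mdd_root() {
    sed '/^\* Begin MDD root info/,/^\* End MDD root info/d'
}
```

Reading from the backup made in step 2 avoids editing the file in place: `strip_mdd_root < system.svm > system`.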

--

5 - # init 6

--

6 - # df -k (verify on physical slices)

      # swap -l (verify on c0t0d0s1)

--

7 - # metaclear -rf -a  (metaclear all devices)

       # metastat (verify)

--

8 - # metadb -d -f c0t0d0s6

      # metadb (verify no metadb's)

--

9 - # init 6

--

10 - # metadb (should return 'there are no existing databases')

--

11 - Edit /etc/lvm/md.tab (we had saved the configuration previously)

Change:

d70 -m d71 d72 1
d71 1 1 c0t0d0s7
d72 1 1 c1t0d0s7
d50 -m d51 d52 1
d51 1 1 c0t0d0s5
d52 1 1 c1t0d0s5
d30 -m d31 d32 1
d31 1 1 c0t0d0s3
d32 1 1 c1t0d0s3
d20 -m d21 d22 1
d21 1 1 c0t0d0s1
d22 1 1 c1t0d0s1
d10 -m d11 d12 1
d11 1 1 c0t0d0s0
d12 1 1 c1t0d0s0
d40 -m d41 d42 1
d41 1 1 c0t0d0s4
d42 1 1 c1t0d0s4

To:

d70 -m d71 1
d71 1 1 c0t0d0s7
d72 1 1 c1t0d0s7
d50 -m d51 1
d51 1 1 c0t0d0s5
d52 1 1 c1t0d0s5
d30 -m d31 1
d31 1 1 c0t0d0s3
d32 1 1 c1t0d0s3
d20 -m d21 1
d21 1 1 c0t0d0s1
d22 1 1 c1t0d0s1
d10 -m d11 1
d11 1 1 c0t0d0s0
d12 1 1 c1t0d0s0
d40 -m d41 1
d41 1 1 c0t0d0s4
d42 1 1 c1t0d0s4

Note that I removed the c1t0d0 submirrors from the -m (mirror) lines.
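
The same md.tab edit can be expressed as a one-line awk filter. It is a sketch that assumes every mirror line has the layout shown above (submirrors followed by a trailing pass number) and that the first submirror listed is the one on the surviving disk, c0t0d0, as it is here. The name `one_way_mirrors` is mine.

```shell
# Turn every mirror line into a one-way mirror by keeping only the first
# submirror and the trailing pass number; all other lines pass through.
#   d70 -m d71 d72 1   becomes   d70 -m d71 1
one_way_mirrors() {
    awk '$2 == "-m" && NF > 4 { print $1, $2, $3, $NF; next } { print }'
}
```

Writing the result to a new file (for example `one_way_mirrors < md.tab > md.tab.new`) and comparing it with the original before replacing it keeps the step easy to undo.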

--

12 - # metadb -a -f -c 3 c0t0d0s6

         # metadb -a -c 3 c1t0d0s6

         # metadb (verify)

--

13 - # metainit -f -a (create all metadevices; c0t0d0 will be in 1-way mirrors)

        #  metastat | more (verify)

        If you are missing any metadevice for either c0t0d0 or c1t0d0 (you shouldn't be), run:

        # metainit -f d<#> (where d<#> is the metadevice you need to create)

--

14 - # cd /etc

        # cp vfstab vfstab.c0t0d0

        # cp system system.nosvm

        # cp vfstab.svm vfstab

        # cp system.svm system

--

15 - # init 6

--

16 - # df -k (verify on metadevices)

          # swap -l (verify on d20)

--

17 - # metattach d70 d72

         # metattach d50 d52

         # metattach d30 d32

         # metattach d20 d22

         # metattach d10 d12

         # metattach d40 d42

--

18 - # while true; do metastat | grep done; sleep 30; done (monitor the sync progress)

         When this stops echoing output, control-c and run metastat manually to verify all metadevices and components are in the 'Okay' state.
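
The monitoring loop above runs until you interrupt it. A variant that stops by itself once no progress line is left could look like the sketch below. `wait_for_resync` is a made-up name, and the METASTAT and INTERVAL variables are my own additions so the loop can be tried on a machine without SVM; neither is a Solaris setting.

```shell
# Poll until metastat output no longer contains a "done" progress line.
# METASTAT (command to run, default: metastat) and INTERVAL (seconds
# between polls, default: 30) are illustrative overrides.
wait_for_resync() {
    # The assignment takes the exit status of grep, so the loop ends
    # as soon as no progress line is found.
    while progress=$("${METASTAT:-metastat}" | grep 'done'); do
        echo "$progress"
        sleep "${INTERVAL:-30}"
    done
    echo "no resync in progress"
}
```

This only tells you that nothing is resyncing any more. It is still worth running metastat afterwards to confirm that every component is in the 'Okay' state, since a resync that failed also stops printing progress.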