Shimming a disk into a VxVM disk group

We had a problem with an Oracle database on a four-node Veritas Cluster Server (VCS) cluster recently. The symptom was that the database had been brought down with VCS and just wouldn't come back up again. The problem was ultimately traced to the Veritas Volume Manager (VxVM) disk group level.

Initial analysis

The original diagnosis was that a SAN disk (LUN) had been accidentally unmapped from the host and hurriedly mapped back, leaving it invisible to the host. Volume migrations on our IBM XIV arrays have caused this sort of thing in the past, and regular SCSI bus scans don't bring the LUN back in.

So, we thought we'd take an outage, reboot and see if the LUN would come back in. The host had been up for a couple of years and had a lot of SAN storage presented to it (subject to relatively frequent change), so maybe its SCSI subsystem was being funky. We could patch while we were at it, too.

One reboot later and still no dice — the missing LUN was nowhere to be found. I had a closer look at the disk group’s configuration. Our missing LUN was only used in one of the VxVM volumes. It was a 256GB sub-disk tacked neatly onto the end of a healthy 1.4TB sub-disk.

There was only one data plex, so no mirroring in place. There’d be no way to recover any data that was on this disk if the disk group was deemed to be borked.

So, to the immediate question: “Where is the disk?”

If we could find it, reattaching it and recovering the volume would be a simple enough task. Through sheer luck, I found a text file from a couple of months back containing a list of all the LUNs visible from the host. Searching it for the DA name `xiv1_1234` threw up the LUN's human-friendly name (as configured on the array) along with its size and serial number.

Next stop was the array. I queried the array for a LUN of the same name. Nothing. Same size? Nothing. Serial number? Nothing.

There were only a few possibilities at this point:

  1. The LUN had been renamed and unmapped from the host.
  2. The LUN had been deleted.

At this stage, I turned to our storage guys. I gave them all of the details I had gathered. Turns out that the LUN had been deleted erroneously as part of a decommission of a completely different database some months earlier! And as it was a development database, there were no full backups to speak of.

However, I did think it was strange that our database hadn't noticed that one of its LUNs had disappeared! It had been running merrily for the past four months, oblivious. The cluster software had shut it down cleanly and deported its disks without a problem. It was only when we tried to bring its storage back in that we hit our problem.

I speculated that since the database had been running fine before it was shut down, it must not have had any of its data located on the deleted LUN. If the database had tried to issue any read or write operations to the deleted LUN, there’d have been I/O errors everywhere.

VxVM maps a concatenated volume linearly across its sub-disks, and data tends to fill from the front of the volume first. Our deleted LUN had been right at the end of the impacted volume. A colleague's `df` output increased my confidence in this: the filesystem's used space came in under the size of the healthy sub-disk, although it was a pretty close call!

Given all of this information, I wondered: what if I grabbed a blank LUN of the same size and tried to fool VxVM into importing the disk group by using it as a shim? Worth a shot, right? So, I had the storage guys present a new LUN and take a snapshot of the surviving disk (in case I royally borked stuff up) and got to work.


You’ve gotta destroy to rebuild…

Because our volume had only one plex, it would be impossible for VxVM to rebuild any data that it thought was on our failed sub-disk. Therefore, standard disk recovery commands such as `vxrecover` wouldn’t help us.

However, Symantec describe a method of destroying and recreating a disk group in place for the purposes of changing its unique disk group ID (DGID). We could use this method to recreate our disk group, too! Of course, we’d be making a few adjustments…

VxVM stores a lot of metadata describing the layout of its disk groups and volumes. For example, the sub-disk configuration contains the path to the multipath block device it lives on, which Veritas Dynamic Multi-Pathing (VxDMP) uses to distribute I/O across the storage paths. The metadata completely describes these data structures: as long as nothing has damaged it, having all of the metadata makes it possible to recreate a disk group that's been destroyed.

Because the disk group could still be partially imported (we had one good LUN, remember?), I was able to dump its configuration:

# vxprint -g datadb_DEV -hmvps >> /var/tmp/datadb_DEV.layout

Even if I hadn't been able to do this, there'd have been a good chance that `vxconfigbackupd` had stored backup configuration copies at some point in `/etc/vx/cbr/bk`.
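
For the record, those copies are kept per disk group under `/etc/vx/cbr/bk`, and in a less mangled situation `vxconfigrestore` can stage a restore straight from them. A quick sketch, assuming the default backup location:

    # configuration backups live in per-disk-group directories
    ls /etc/vx/cbr/bk/
    # in the happy path, stage a restore from those copies for inspection,
    # then commit it with vxconfigrestore -c
    vxconfigrestore -p datadb_DEV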

My troublesome volume was made up of two sub-disks. What a lucky situation: since I had one good sub-disk and one bad sub-disk, I could compare their metadata and work out exactly what I had to change to make the bad one look good again…

Great! I knew what I needed to add, and what I needed to change, to make a good sub-disk. I just needed a few more pieces of information first.

One thing I didn't already have was the `dev` string for my newly assigned shim disk. I also needed the disk's unique `guid`. Both were easy to find, but I had to temporarily configure the disk to read them.
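
A minimal sketch of how that might look, assuming the shim turned up with the (made-up) DA name `xiv1_9999`, and assuming the `dev` string corresponds to the major/minor pair of the DMP device node:

    # pick up the newly mapped shim LUN
    vxdisk scandisks
    # temporarily initialise it so VxVM assigns it a private region and a guid
    vxdisksetup -i xiv1_9999
    # the guid is in the detailed disk listing...
    vxdisk list xiv1_9999 | grep guid
    # ...and the dev major/minor numbers can be read from the DMP device node
    ls -l /dev/vx/dmp/xiv1_9999
    # undo the temporary setup; the real initialisation comes later with exact offsets
    vxdiskunsetup xiv1_9999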

I also needed what VxVM believed to be the correct associations between Disk Access (DA) names, also known as Dynamic Multipath (DMP) names, which identify the multipath block device, and Disk Media (DM) names, which are assigned by humans to each disk.
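
The partially imported disk group still showed those associations; something along these lines:

    # DEVICE column = DA/DMP name, DISK column = DM name
    vxdisk -o alldgs list | grep datadb_DEV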

Finally, I needed the public and private region offsets for each LUN in the disk group. These are required so that when we reinitialise our disks, we can tell VxVM where its disk header (the private region) lives and where the data (the public region) starts. Without the exact values, we'd risk writing new headers over areas that used to contain data, instantly corrupting the disk. For good.

These values were simple to get from the healthy LUN (`vxdisk list`). However, for the failed LUN, I had to go through a bunch of old configuration copy backups located in `/etc/vx/cbr/bk` and hope that I’d find config for my failed disk. It was my lucky day, again.
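
Roughly, with `xiv1_5678` standing in for the surviving LUN's DA name:

    # healthy LUN: public and private region offsets and lengths
    vxdisk list xiv1_5678 | egrep 'public|private'
    # failed LUN: hunt the old configuration copies for its disk header details
    grep -r xiv1_1234 /etc/vx/cbr/bk/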

With my information in hand, I edited the fields within my `/var/tmp/datadb_DEV.layout` file that described the dead LUN so that they matched my newly assigned shim LUN instead.
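
I won't reproduce the exact field names here (they differ a little between VxVM versions), but the edits boiled down to pointing every reference to the dead disk (its DA name, `dev` string and `guid`) at the shim instead. In spirit, with `xiv1_9999` again standing in for the shim's DA name:

    # find every reference to the dead disk in the dumped configuration
    grep -n xiv1_1234 /var/tmp/datadb_DEV.layout
    # swap the DA name for the shim's, then fix up the dev and guid values by
    # hand to match what vxdisk list reported for the temporarily initialised shim
    sed -i 's/xiv1_1234/xiv1_9999/g' /var/tmp/datadb_DEV.layout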

I now had everything I needed to destroy and recreate my disk group.

Put it all together and what do you get?

  1. Ensure the disk group is deported
  2. Reinitialise all disks with the correct public and private region offsets
  3. Use our updated metadata to recreate our disk group
  4. Activate our VxVM volumes and `fsck` them
  5. Mount our filesystems
  6. ???
  7. Profit!
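
In command form, steps 1 to 5 looked roughly like the sketch below. It's an outline rather than a transcript: the DM names, volume name, mount point and the offset/length values are placeholders (the real numbers are the ones recovered earlier), `xiv1_5678` and `xiv1_9999` are the made-up DA names from above, and the `fsck`/`mount` syntax shown is the Linux flavour.

    # 1. make sure the disk group is deported on every node
    vxdg deport datadb_DEV

    # 2. reinitialise both disks, forcing the original private/public region geometry
    #    (placeholder values; use the exact offsets and lengths recovered earlier)
    vxdisksetup -i xiv1_5678 privoffset=256 privlen=2048 puboffset=2304
    vxdisksetup -i xiv1_9999 privoffset=256 privlen=2048 puboffset=2304

    # 3. recreate the disk group and rebuild its objects from the edited metadata
    vxdg init datadb_DEV datadb_DEV01=xiv1_5678
    vxdg -g datadb_DEV adddisk datadb_DEV02=xiv1_9999
    vxmake -g datadb_DEV -d /var/tmp/datadb_DEV.layout

    # 4. bring the volume online without triggering a resync, then check it
    vxvol -g datadb_DEV init active datavol
    fsck -t vxfs /dev/vx/dsk/datadb_DEV/datavol

    # 5. mount the filesystem
    mount -t vxfs /dev/vx/dsk/datadb_DEV/datavol /oracle/DEV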

After this, the troublesome volume was marked as ENABLED. The filesystem checks returned okay. Everything mounted successfully. I asked a couple of our database guys to have a look and try to open the database instance.
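
(The state check is nothing exotic; the volume name below is, again, illustrative.)

    # KSTATE/STATE for the rebuilt volume and its plex should read ENABLED/ACTIVE
    vxprint -g datadb_DEV -ht datavol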

The database opened! So, the shim worked! VxVM had no idea that the disk had been swapped out from under it!

I have to concede that in this situation, I was extremely lucky. What were the chances of an unused (but configured) disk being deleted from an array? And what were the chances of the database not trying to write to that disk for a number of months afterwards? Most amazingly, what were the chances of being able to pull off some brazen disk butchery without a hitch?

Time for a beer!