
Re: LVM



I'm chiming in very late here, but I think I have something to add (especially an answer to Steve Hillman's question from two weeks ago).

I've got my original /opt/zimbra on a 1TB LV, because I assumed, as many of you did, that I would find the online expansion features useful. That has never proven to be true over the last 18 months. Linux LVM cannot take advantage of online expansion of SAN LUNs because unlike most commercial UNIXes and Windows, currently supported versions of the Linux kernel+device-mapper are unable to rescan and use expanded LUNs on the fly (/sys/block/sdX/device/rescan fails if anything is holding the device open). You either need to pvmove the data to a larger LUN, which can be done live but kills I/O, or append a second LUN to the volume group, which increases the risk of user error and means you can no longer get a consistent snapshot across the whole volume (at least in my SAN, where snapshots are serialized and non-atomic).
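
For the archives, the two workarounds look roughly like this. Device and VG names are placeholders, and whether online ext3 growth is resize2fs or ext2online depends on your distribution:

# Option A: migrate the VG to a larger LUN (can be done live, but the pvmove hammers I/O)
pvcreate /dev/mapper/newlun
vgextend VG /dev/mapper/newlun
pvmove /dev/mapper/oldlun /dev/mapper/newlun
vgreduce VG /dev/mapper/oldlun

# Option B: append a second LUN to the VG (no data movement, but the VG now
# spans two LUNs, so a SAN snapshot of it is no longer consistent)
pvcreate /dev/mapper/extralun
vgextend VG /dev/mapper/extralun

# Either way, grow the LV and the filesystem afterwards:
lvextend -L +200G /dev/VG/zimbra
resize2fs /dev/VG/zimbra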

Multiple volumes, each of them raw ext3 on a (statically named) multipath device, are the way to go, and that's how I've created all my other /opt/zimbra/storeX volumes.
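
Setting one of those up is nothing exotic; something like this, where the alias comes from the multipaths{} section of multipath.conf (shown below) and the mount options are just illustrative:

mkfs.ext3 -L store2 /dev/mapper/zcs5-store2
mkdir -p /opt/zimbra/store2
# /etc/fstab entry, keyed to the static multipath alias rather than an sdX name:
/dev/mapper/zcs5-store2  /opt/zimbra/store2  ext3  defaults,noatime  1 2
mount /opt/zimbra/store2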

LVM doesn't hurt performance, but it can add substantial overhead on the sysadmin's brain, especially in times of crisis. It isn't supported by Zimbra's cluster scripts, though that's easy to fix. I ended up deciding that clustering does more harm than good anyway.

John Fulton on Oct 16:

> does anyone have multiple primary and secondary mail stores on the same server?

Yes. It works great. Actually, I only have primary stores -- our Compellent SAN provides HSM at the block level, so we have no need of that at the application layer.
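
Adding another primary store is just a mount point plus one zmvolume command, roughly like this as the zimbra user (check zmvolume's help for the exact flags in your version):

zmvolume -a -n store2 -t primaryMessage -p /opt/zimbra/store2
zmvolume -l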

Steve Hillman on Oct 16:

> here's the catch: Only one can be "current" - i.e all new mail will be
> delivered to whichever volume is the 'active' one. Now that I think about
> it though, I don't know whether it's possible to move the 'active'
> primary back to a previous volume. I had been operating under the 
> assumption that you couldn't - i.e. you fill up vol1, so you add vol2
> and it becomes current, but as users delete old mail, or it migrates to
> HSM, vol1 would gradually gain space again. If you can switch back to
> making this the current one, then you can reuse that space.

Yes, you can. I've got 3 primary message stores that I use on a round-robin basis as they fill up and drain. One message store is on the same LUN/LVM as /opt/zimbra, which allows me to take a consistent snapshot by marking the unitary /opt/zimbra/store as active. The rest are on separate LUNs.
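
Switching which volume is current is a zmvolume one-liner, roughly (the IDs are whatever zmvolume -l reports on your system):

zmvolume -l            # list volumes and note their IDs
zmvolume -sc -id 1     # make volume 1 (say, the original /opt/zimbra/store) current again
zmvolume -dc           # confirm which volume is now current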

William Stearns on Oct 17:

> My best understanding is that LVM contributes essentially no overhead at all.

There can be a difference in sequential read throughput due to read-ahead defaults: a two-disk /dev/mdX can be as much as twice as fast as an LV layered on the same /dev/mdX. Not a consideration on a SAN, though.
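
If you're curious, you can see and close that gap with blockdev; device names are placeholders, the 4096 is an example, and the setting isn't persistent across reboots unless you also record it with lvchange:

blockdev --getra /dev/md0            # read-ahead on the md device, in 512-byte sectors
blockdev --getra /dev/VG/zimbra      # typically much smaller on the LV
blockdev --setra 4096 /dev/VG/zimbra # bump the LV's read-ahead for this boot
lvchange -r 4096 VG/zimbra           # or record it in the LVM metadata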

Tom Golson on Oct 17:

> If I've just missed the boat, and someone knows a flag to pass LVM to
> recognized physical drives by the /dev/disk-by-id/ entry

Native multipath should take care of that for you, unless you have a badly broken getuid_callout. Works for me. You might have something odd going on with multipath timeouts or priorities. When you run multipath -ll, do you see stale/dead device nodes, or do the old devices disappear entirely?
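
One quick sanity check, assuming RHEL5-ish scsi_id syntax and with sdX names as placeholders: every path to the same LUN should report the same WWID, and multipath -ll should show all of them active rather than faulty.

/sbin/scsi_id -g -u -s /block/sdb
/sbin/scsi_id -g -u -s /block/sdc
multipath -ll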

If your SAN (minus RDAC) has all paths simultaneously active without SCSI-level or EMC-style proprietary locking, as ours does, then you ought to be able to use our configuration.

Yes, LVM holds on to certain device information, which causes problems when enabling multipath for a volume group originally created single-pathed. Not sure how/why, but vgexport/vgimport will fix it. Do this at your next downtime, with Zimbra stopped and the filesystems on that volume group unmounted:

service multipathd stop
vgchange -an VG                  # deactivate the VG (fails if any of its LVs are still mounted)
vgexport VG
multipath -ll                    # note the component sdX devices behind the VG's LUN(s)
# For each component device sdX noted above:
echo 1 > /sys/block/sdX/device/delete
# Rescan the HBAs (host0 is the internal controller here, so skip it):
for i in /sys/class/scsi_host/host*/scan; do
  case $i in */host0/*) continue ;; esac
  echo '- - -' > $i
done
multipath -v2                    # rebuild the multipath maps
vgscan
vgimport VG
vgchange -ay VG
service multipathd start
mount                            # verify, then remount the filesystems (mount -a or individually)
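
A belt-and-suspenders step that isn't part of the procedure above: point LVM at the multipath devices only in /etc/lvm/lvm.conf, so it can't latch onto the single-path sdX nodes again. Adjust the filter if your root VG lives on an internal sd device:

# /etc/lvm/lvm.conf
filter = [ "a|^/dev/mapper/|", "r|^/dev/sd|" ]
# then rebuild LVM's device cache:
vgscan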

Here's the relevant part of our /etc/multipath.conf:

blacklist {
        # Blacklist vendors of internal SCSI/SAS/SATA drives
        device {
                vendor "FUJITSU"
        }
        device {
                vendor "SEAGATE"
        }
}
defaults {
        # Normally, we should use explicit names in multipath.conf. Make
        # sure that any exceptions stand out.
        user_friendly_names no
        polling_interval 10
        no_path_retry 300
        failback 2
        path_grouping_policy group_by_prio
        rr_weight priorities
        # Custom script to tell whether the device is on a FC or iSCSI path and
        # give higher priority to the former
        prio_callout    "/usr/local/sbin/prio_compellent.pl /dev/%n"
}
multipaths {
        multipath {
                wwid                    36000d3100008f20000000000000001d4
                alias                   zcs5-store1
        }
}
-- 
Rich Graves http://claimid.com/rcgraves
Carleton.edu Sr UNIX and Security Admin
CMC135: 507-222-7079 Cell: 952-292-6529