Hello Brian,
You might want to take a look at
http://zfsonlinux.org/lustre.html or mail the zfsonlinux list about this.
Lustre has ZFS support integrated and understands ZFS file systems, which takes
iSCSI out of the picture. As for failover, 'automated' is a fairly perilous
requirement. Generally speaking you'll want a human involved; with something as
hard as STONITH in the mix, you'll be looking at a forced import of the zpool.
You'll want something like DRBD on the back end for disk mirroring; if that
isn't there, you'd have to accept a replication delay on the pool and use a
zfs send/receive trick.
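That send/receive trick might look something like the following on the primary OSS. This is only a sketch; the pool, dataset, and host names ("ostpool/ost2", "standby-oss") are illustrative, not from this thread:

```shell
# Hedged sketch: keep a standby copy of an OST dataset roughly in sync.
# "ostpool/ost2" and "standby-oss" are made-up names for illustration.
zfs snapshot ostpool/ost2@rep1
zfs send ostpool/ost2@rep1 | ssh standby-oss zfs receive -F ostpool/ost2

# On each subsequent pass, send only the delta since the previous snapshot:
zfs snapshot ostpool/ost2@rep2
zfs send -i ostpool/ost2@rep1 ostpool/ost2@rep2 | ssh standby-oss zfs receive -F ostpool/ost2
```

The window between snapshots is your potential data loss on failover, which is why DRBD-style synchronous mirroring is the safer back end.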
Many options here. I'd do a little more digging.
- Brian Menges
-----Original Message-----
From: HPDD-discuss [mailto:hpdd-discuss-bounces@lists.01.org] On Behalf Of Brian Musson
Sent: Wednesday, November 05, 2014 12:15 AM
To: hpdd-discuss(a)lists.01.org
Subject: [HPDD-discuss] ZFS + Lustre + Pacemaker question
Hi everybody! This is my first post to this mailing-list.
I am setting up a simple five-node Lustre storage system in my basement for
experimental purposes. I am using RHEL 6.6 and Lustre server 2.5.4. The back end
is shared iSCSI storage; the iSCSI targets serve as the OSTs and can be mounted
on each of the OSSes. At this time I have a fully functional Lustre
configuration: four OSTs, four OSS nodes, and one dedicated MDS/MGS.
Pacemaker is up and fully functional, too. That is, I have fencing configured
with STONITH, and “pcs status” shows me everything is healthy. I can also
simulate a failure and mount the OST on the fail-node. No problems there. I just
can't figure out the automated process.
While researching how to configure OSTs as a cluster resource, I have found many examples
that explain how to set up “ldiskfs” for failover. However, from what I can
tell, there doesn't seem to be a document with as much detail for the ZFS side
of things.
How I understand it:
When Pacemaker detects an unstable node, fencing is performed on the offending
node while, at the same time, the resource (let's say OST2) is moved to the
fail-node. The fail-node then mounts OST2 and serves it.
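With ZFS, the mount step is really two operations, since the pool has to be imported before the Lustre dataset can be mounted. A hedged sketch of what the fail-node would have to do, with illustrative pool/dataset names:

```shell
# Force-import the pool; safe only because the previous owner has been fenced.
zpool import -f ostpool

# ZFS-backed Lustre targets mount as pool/dataset rather than a block device.
mkdir -p /mnt/lustre/ost2
mount -t lustre ostpool/ost2 /mnt/lustre/ost2
```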
The examples outlined here for use with ocf:heartbeat:Filesystem seem
straightforward:
primitive resMyOST ocf:heartbeat:Filesystem \
    meta target-role="stopped" \
    operations $id="resMyOST-operations" \
    op monitor interval="120" timeout="60" \
    op start interval="0" timeout="300" \
    op stop interval="0" timeout="300" \
    params device="device" directory="directory" fstype="lustre"
… How does this work for ZFS + Lustre? From what I can tell, it will try to
mount the OST at /mnt/lustre/foreign/ost2 on the fail-node but at
/mnt/lustre/local/ost2 on the primary OSS, so a static “directory” parameter
doesn't seem like it will work here. This resource example also appears to be
missing some additional steps for the ZFS piece.
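One possibility, sketched here in crm shell syntax purely as a guess: pair the Filesystem resource with a resource that imports/exports the zpool (newer resource-agents releases ship an ocf:heartbeat:ZFS agent for this), and group them so the import always precedes the mount. All names here are illustrative and this configuration is unverified:

```
# Hedged sketch, assuming the ocf:heartbeat:ZFS agent is available.
primitive resOST2pool ocf:heartbeat:ZFS \
    params pool="ostpool" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="90"
primitive resOST2 ocf:heartbeat:Filesystem \
    params device="ostpool/ost2" directory="/mnt/lustre/ost2" fstype="lustre" \
    op monitor interval="120" timeout="60"
# Group: import the pool first, then mount the Lustre dataset.
group grpOST2 resOST2pool resOST2
```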
Does somebody have a working Pacemaker example of configuring a ZFS OST as a
cluster resource? Documentation would be just as good.
It seems overly complicated for what I need it to do:
“if pacemaker detects a problem with an OSS, shoot it in the head and perform ‘service
lustre start <failed_ost>’ on the standby node.”
Thanks everybody in advance for taking time out of your busy day to answer my questions.
Any suggestions are welcome.
Regards,
Brian
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss(a)lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss