Hi everybody! This is my first post to this mailing-list.
I am setting up a simple five-node Lustre storage system in my basement for
experimental purposes. I am using RHEL 6.6 and Lustre server 2.5.4. The backend is
shared iSCSI storage: the iSCSI targets serve as the OSTs and can be mounted on each
of the OSS nodes. At this time I have a fully functional Lustre configuration: four
OSTs, four OSS nodes, and one dedicated MDS/MGS.
Pacemaker is up and fully functional, too. That is, I have fencing configured with STONITH,
and “pcs status” shows me everything is healthy. I can also simulate a failure and mount
the OST on the failover node by hand. No problems there. I just can’t figure out how to
automate that process.
While researching how to configure OSTs as cluster resources, I have found many examples
that explain how to set up failover for “ldiskfs”-backed targets. However, from what I
can tell, there is no comparably detailed documentation for the ZFS side of things.
How I understand it:
When Pacemaker detects an unstable node, the offending node is fenced, and at the same
time the resource (let’s say OST2) is moved to the failover node. The failover node
then mounts OST2 and serves it.
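In pcs terms I assume the primary/standby preference part boils down to a pair of
location constraints, something like this (untested; “resOST2”, “oss3”, and “oss4”
are placeholder names for the resource and my two OSS hosts):

    # Untested sketch: OST2 prefers its primary OSS, but may run on the
    # standby when the primary is fenced or offline.
    pcs constraint location resOST2 prefers oss3=100
    pcs constraint location resOST2 prefers oss4=50

Please correct me if that part of the picture is wrong, too.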
The examples outlined here for use with ocf:heartbeat:Filesystem seem straightforward:
primitive resMyOST ocf:heartbeat:Filesystem \
    meta target-role="stopped" \
    operations $id="resMyOST-operations" \
    op monitor interval="120" timeout="60" \
    op start interval="0" timeout="300" \
    op stop interval="0" timeout="300" \
    params device="device" directory="directory" fstype="lustre"
… How does this work for ZFS + Lustre? From what I can tell, the OST would be mounted
at /mnt/lustre/foreign/ost2 on the failover node but at /mnt/lustre/local/ost2 on the
primary OSS, so a static “directory” parameter doesn’t seem like it will work here.
This resource example also appears to be missing whatever additional steps the ZFS
piece needs (importing the zpool, for one).
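My best guess so far (completely untested) is that the ZFS piece needs two primitives:
an ocf:heartbeat:ZFS resource to import/export the pool, and the usual
ocf:heartbeat:Filesystem resource to mount the dataset, grouped so the pool is imported
before the mount. In the crm syntax of the example above, with a made-up pool name
“ostpool2” and made-up paths:

    primitive resOST2pool ocf:heartbeat:ZFS \
        params pool="ostpool2" \
        op start interval="0" timeout="300" \
        op stop interval="0" timeout="300"
    primitive resOST2fs ocf:heartbeat:Filesystem \
        params device="ostpool2/ost2" directory="/mnt/lustre/ost2" fstype="lustre" \
        op monitor interval="120" timeout="60"
    group grpOST2 resOST2pool resOST2fs

If that is roughly the right shape I can live with it, but confirmation from somebody
actually running ZFS-backed OSTs would be great.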
Does somebody have a working Pacemaker example of managing a ZFS-backed OST as a
cluster resource? A pointer to documentation is just as good.
It all seems overly complicated for what I need it to do:
“if Pacemaker detects a problem with an OSS, shoot it in the head and perform ‘service
lustre start <failed_ost>’ on the standby node.”
Thanks everybody in advance for taking time out of your busy day to answer my questions.
Any suggestions are welcome.
Regards,
Brian