Hi Aamir,
Let me see if I can address some of your questions inline.
Malcolm.
-----Original Message-----
From: HPDD-discuss [mailto:hpdd-discuss-bounces@lists.01.org] On Behalf
Of Aamir Rashid
Sent: Thursday, July 03, 2014 12:12 AM
To: hpdd-discuss(a)lists.01.org
Subject: [HPDD-discuss] Lustre 2.5.1. HSM
Greetings All,
I have a few basic questions regarding the HSM framework in general and
regarding the "copytool" in particular:
1) If multiple "agents" are registered with the "coordinator", are the
HSM requests sent to all "agents" serially or in parallel? What is the
overall state of a file if some agents report success and other agents
report failure?
The coordinator acts like a job scheduler, queuing and dispatching requests to HSM agents.
Each request is sent to one of the available agents and the requests work at the
granularity of an individual file. This means that individual files are _not_ split up and
sent in parallel across multiple agents.
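To make that concrete, here is a rough sketch of queuing archive requests from a client and listing the agents registered with the coordinator. The mount points and the MDT name "lustre-MDT0000" are illustrative, not from your setup:

```shell
# List copytool agents currently registered with the coordinator
# (run on the MDS; "lustre-MDT0000" is an illustrative target name)
lctl get_param mdt.lustre-MDT0000.hsm.agents

# Queue archive requests; each file is dispatched whole to a single agent
lfs hsm_archive --archive 1 /mnt/lustre/data/file1 /mnt/lustre/data/file2

# Check the per-file HSM state afterwards
lfs hsm_state /mnt/lustre/data/file1
```

Because the granularity is per file, a given file's state reflects the single agent that handled it; there is no partial success across agents for one file.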
2) Using
https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#lustrehsm
as a reference, section 22.3 states "Only one copytool could be run by
agent node." If I'm backing up to Tape and NFS, I cannot run a "tape
copytool" and a "posix copytool" on the same agent node? Must I have 2
separate agent nodes, one for "tape copytool", and another for "posix
copytool"?
That is correct. Each agent runs one copytool, so if multiple copytools are required, they
must each run on their own machine.
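For reference, starting the POSIX copytool on an agent node looks roughly like this (the archive backend path, Lustre mount point, and archive number are illustrative):

```shell
# Start the POSIX copytool on this agent node (one copytool per node).
# /mnt/archive is the HSM backend, /mnt/lustre the Lustre mount point,
# and --archive=1 the archive number; all three are assumptions here.
lhsmtool_posix --daemon --hsm-root /mnt/archive --archive=1 /mnt/lustre
```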
3) Using
https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#lustrehsm
as a reference, section 22.3.1 states "A Lustre file system supports an
unlimited number of copytool instances". Does it mean that I must have
unlimited agent nodes, each of which is running a single instance of
"copytool"?
Yes: since each agent node runs a single copytool, an unlimited number of copytool
instances would require an unlimited supply of servers. The software itself does not
impose a limit on the number of instances, but other considerations will force a
practical limit :).
4) When a copytool crashes, is there any mechanism within the HSM
framework to restart a copytool?
No. Generally, a configuration management system (CFEngine, Puppet, Chef) can be
employed to monitor services and restart them if they fail. Intel Manager for Lustre
also monitors copytool instances and restarts them on failure.
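As a sketch, on distributions with systemd a simple unit file could provide the same restart-on-failure behavior; the paths, archive number, and unit name below are illustrative assumptions:

```ini
[Unit]
Description=Lustre POSIX copytool (illustrative sketch)
After=network.target

[Service]
# Paths and archive number are assumptions for this sketch;
# note: no --daemon flag, so systemd supervises the foreground process
ExecStart=/usr/sbin/lhsmtool_posix --hsm-root /mnt/archive --archive=1 /mnt/lustre
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```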
5) Is there any load-balancing capability with the HSM framework?
The Coordinator will load balance requests across the copytool instances.
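The number of requests the coordinator dispatches concurrently can also be tuned on the MDS; a hedged example (the MDT name "lustre-MDT0000" is illustrative, so check the parameter path on your system first):

```shell
# Show the available HSM tunables for your MDT(s)
lctl list_param mdt.*.hsm.*

# Raise the number of HSM requests the coordinator runs in parallel
# ("lustre-MDT0000" is an illustrative target name)
lctl set_param mdt.lustre-MDT0000.hsm.max_requests=10
```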
6) Is there HA capability within the HSM framework?
Lustre servers can be integrated into an HA cluster framework such as Pacemaker +
Corosync. The coordinator runs as a service thread on the metadata server and so will
failover if/when the MDT fails over. To provide HA in the HSM Agents, install multiple
agent servers, each with their own instance.
7) Is there a way to cluster copytool instances to get better
throughput?
Yes. Just launch multiple copytools, one per server. Each instance will automatically
register with the coordinator. The coordinator will automatically schedule work to the
pool of registered instances, similar to how a job scheduler operates.
8) Is there a later version of the HSM manual than
https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#lustrehsm
? If so, where would I find it?
I think that's the latest publicly available documentation.
Thank you for your prompt reply.
Regards,
-aamir
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss(a)lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss