Hi Aurelien,
Thank you for your response. Yes, I tried what you suggested below, but
the results are the same: all requests go to the first registered agent.
1) Is there any write-up on HSM load-balancing? I have not come across
any documentation, going as far back as 2009. Load-balancing is a great
feature to have, and products that have it tend to flaunt it, so I find
it unusual that no documentation mentions it.
2) What is the load-balancing algorithm? Is it simply round-robin
between agents, or is it something more clever?
3) When multiple agents register for the *same* HSM back-end, must they
provide an explicit --archive-id, or should they let it default to ANY
(as shown below in my output from the MDS)? BTW, I've tried it both ways
(with and without --archive-id) and the results are the same.
4) Are there any throttles that control load-balancing? For example,
after 1G of data on agent-1, switch to agent-2 for the next 1G, and keep
alternating every 1G? My test files are small (100M each), so maybe I'm
not hitting a threshold that kicks off load-balancing?
Here's some runtime data that may help you spot any pilot error on my
part. I issue two archive requests, one from each of my 2 clients, one
after the other. As the MDS output below shows, all requests go to the
first registered agent, even when the second request arrives while the
first agent is busy.
MDS
===
[root@hsm-mds1 lustre]# cat /proc/fs/lustre/mdt/lustrefs-MDT0000/hsm/agents
uuid=61f6bf53-2e22-d37b-45cd-fea234025701 archive_id=ANY requests=[current:0 ok:23 errors:0]
uuid=516a502f-60d7-fa4e-09cf-ffe1ec43db3a archive_id=ANY requests=[current:0 ok:0 errors:0]
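To compare the per-agent counters across repeated runs, the agents file
above could be tallied with a short script. This is only a sketch: it
assumes each entry looks exactly like the output above (on one line or
wrapped across two), and parse_agents and share_of_completed are
hypothetical helper names, not part of Lustre.

```python
import re

# One agent entry as it appears in the output above. \s+ also matches a
# newline, so an entry wrapped across two lines is still recognised.
AGENT_RE = re.compile(
    r"uuid=(?P<uuid>\S+)\s+archive_id=(?P<archive_id>\S+)\s+"
    r"requests=\[current:(?P<current>\d+)\s+ok:(?P<ok>\d+)\s+"
    r"errors:(?P<errors>\d+)\]"
)


def parse_agents(text):
    """Return one dict per agent entry, in the order they are listed."""
    agents = []
    for match in AGENT_RE.finditer(text):
        entry = match.groupdict()
        for key in ("current", "ok", "errors"):
            entry[key] = int(entry[key])
        agents.append(entry)
    return agents


def share_of_completed(agents):
    """Map each agent uuid to its fraction of all completed (ok) requests."""
    total = sum(agent["ok"] for agent in agents)
    return {
        agent["uuid"]: (agent["ok"] / total if total else 0.0)
        for agent in agents
    }


if __name__ == "__main__":
    # Sample text copied from the MDS output above.
    sample = """\
uuid=61f6bf53-2e22-d37b-45cd-fea234025701 archive_id=ANY
requests=[current:0 ok:23 errors:0]
uuid=516a502f-60d7-fa4e-09cf-ffe1ec43db3a archive_id=ANY
requests=[current:0 ok:0 errors:0]
"""
    for uuid, share in share_of_completed(parse_agents(sample)).items():
        print(f"{uuid}: {share:.0%} of completed requests")
```

On the MDS, the sample string would be replaced by the contents of the
agents file shown above, read before and after each test run.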
Agent-1
=====
[root@hsm-client2 lustre_hsm]# /root/lhsmtool_posix --daemon --no-attr --no-xattr --hsm-root /mnt/nfs/lustre_hsm1 --dry-run /mnt/lustre_hsm
[root@hsm-client gg]# lfs hsm_archive /mnt/lustre_hsm/gg/aa.5[1-9]
Agent-2
=====
[root@hsm-client lustre_hsm]# /root/lhsmtool_posix --daemon --no-attr --no-xattr --hsm-root /mnt/nfs/lustre_hsm1 --dry-run /mnt/lustre_hsm
[root@hsm-client2 lustre_hsm]# lfs hsm_archive /mnt/lustre_hsm/gg/aa.6[1-9]
Thanks,
-aamir
On 07/08/2014 03:24 AM, DEGREMONT Aurelien wrote:
On 07/07/2014 20:06, Aamir Rashid wrote:
>
> Whenever I make a request to archive, the request goes to the *first*
> registered one. As you can see from above, all 11 requests went to
> the first one - thus no load balancing between the 2 agents.
Did you try to send multiple requests at the same time? Sending new
requests while copytool #1 is busy?
Aurélien