The o2iblnd driver code forces peer_credits and concurrent_sends to be in a reasonable
range of each other:
        if (*kiblnd_tunables.kib_concurrent_sends > *kiblnd_tunables.kib_peertxcredits * 2)
                *kiblnd_tunables.kib_concurrent_sends = *kiblnd_tunables.kib_peertxcredits * 2;

        if (*kiblnd_tunables.kib_concurrent_sends < *kiblnd_tunables.kib_peertxcredits / 2)
                *kiblnd_tunables.kib_concurrent_sends = *kiblnd_tunables.kib_peertxcredits / 2;

The code above ensures that concurrent_sends cannot be larger than 2*peer_credits or
smaller than peer_credits/2. I’m not really sure why it allows concurrent_sends to be less
than peer_credits.
By changing the value of concurrent_sends after the module has loaded you’re circumventing
the above logic.
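
To make the effect of those two checks concrete, here is a minimal standalone sketch of the same clamping. The function name clamp_concurrent_sends and its plain int parameters are made up for illustration; the real driver operates on the kiblnd_tunables pointers shown above, and this is not the driver's actual code path.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of o2iblnd): applies the same two range
 * checks as the driver snippet quoted above, but on plain ints.
 * Returns the value concurrent_sends would end up with.
 */
int clamp_concurrent_sends(int peer_credits, int concurrent_sends)
{
    /* Assumption for this sketch: peer_credits is a positive value. */
    assert(peer_credits > 0);

    /* Upper bound: no more than 2 * peer_credits. */
    if (concurrent_sends > peer_credits * 2)
        concurrent_sends = peer_credits * 2;

    /* Lower bound: no less than peer_credits / 2 (integer division). */
    if (concurrent_sends < peer_credits / 2)
        concurrent_sends = peer_credits / 2;

    return concurrent_sends;
}
```

With peer_credits=63, a requested concurrent_sends=16 is below 63/2 = 31 (integer division), so this logic would raise it to 31. That would be consistent with the report below that 16 cannot be set through modprobe.d when peer_credits is 63. The defaults (peer_credits=8, concurrent_sends=8) pass through unchanged.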
Chris Horn
On Aug 19, 2015, at 8:01 AM, Ken Jeffries <jeffries@cray.com> wrote:
Hi Martin and Craig,
This seems to be a problem only on mlx5, not on mlx4. As Craig says, the default values
(peer_credits=8, concurrent_sends=8) do work. The values peer_credits=63,
concurrent_sends=16 also work, but concurrent_sends=16 cannot be set via the normal .conf
file in modprobe.d/. After the modprobe of ko2iblnd but before the module is used, it is
possible to chmod /sys/module/ko2iblnd/parameters/concurrent_sends to make it writable and
then echo 16 into the parameter.
These values are still well short of some generally recommended values and that is
concerning. As Martin says, it may be possible to increase other parameters to go beyond
these values.
Regards,
Ken
From: Martin Hecht <hecht@hlrs.de>
Date: Wednesday, August 19, 2015 at 6:52 AM
To: "Prescott,Craig P" <prescott@rc.ufl.edu>, Kenneth Jeffries <jeffries@cray.com>,
"hpdd-discuss@lists.01.org" <hpdd-discuss@ml01.01.org>
Subject: Re: [HPDD-discuss] o2iblnd peer_credits and concurrent_sends
Hi,
We stumbled over peer_credits as well. It must be set to the same value on all clients
and servers.
I also heard from Cray that 63 was the maximum that works. Maybe, apart from the limitation
of the LNet protocol, there are further restrictions, or you have to increase other
parameters as well in order to go beyond 63.
Martin
On 08/19/2015 03:14 AM, Prescott,Craig P wrote:
Hi Ken,
No, I never got any answers to that old post. We ended up going with the default values
back then - those have actually been ok for our scale/use case. FWIW, I have a hunch that
the problem may have been due to limitations of the Connect-IB driver we were using at the
time on the clients.
Kind of timely that you bring this issue up now, though, as we are bringing up a new file
system and already had it on our list to revisit.
Cheers,
Craig
________________________________
From: HPDD-discuss <hpdd-discuss-bounces@ml01.01.org> on behalf of Ken Jeffries
<jeffries@cray.com>
Sent: Monday, August 17, 2015 10:01 PM
To: hpdd-discuss@lists.01.org
Subject: Re: [HPDD-discuss] o2iblnd peer_credits and concurrent_sends
Craig,
Did you ever get an answer to your question? Or pick values that worked?
https://lists.01.org/pipermail/hpdd-discuss/2013-July/000358.html
Ken
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss@lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss