That is the paper I was referring to in my last comment.

Aug 18 13:53:29 prod-0064 kernel: [  151.261120] LNetError: 485:0:(o2iblnd.c:869:kiblnd_create_conn()) Can't create QP: -12, send_wr: 16191, recv_wr: 254

There’s some discussion of this error message in LU-5718 ( https://jira.hpdd.intel.com/browse/LU-5718 ).
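
As a rough illustration of where the numbers in that message may come from, here is a minimal sketch in C. The two formulas and the constants (256 RDMA fragments per message, 2 out-of-band messages) are assumptions that are not stated anywhere in this thread; they were chosen because they reproduce the logged send_wr and recv_wr values for peer_credits=126 and concurrent_sends=63. The function names are hypothetical. The -12 in the message is ENOMEM on Linux.

```c
/* Assumed constants, not confirmed in this thread. */
#define ASSUMED_RDMA_FRAGS 256  /* fragments per RDMA message (assumption) */
#define ASSUMED_OOB_MSGS   2    /* out-of-band messages per connection (assumption) */

/* Estimated send work requests: (fragments + 1) per concurrent send.
 * Assumed formula; it yields 16191 for concurrent_sends=63. */
static int est_send_wr(int concurrent_sends)
{
        return (ASSUMED_RDMA_FRAGS + 1) * concurrent_sends;
}

/* Estimated receive work requests: two per peer credit plus the
 * out-of-band messages. Assumed formula; it yields 254 for peer_credits=126. */
static int est_recv_wr(int peer_credits)
{
        return 2 * peer_credits + ASSUMED_OOB_MSGS;
}
```

Under these assumptions, 126/63 asks for 16191 send and 254 receive work requests per queue pair, 63/16 asks for 4112 and 128, and the 8/8 defaults ask for 2056 and 18. If the formulas hold, send_wr grows with concurrent_sends rather than with peer_credits, which would be consistent with a lower concurrent_sends value being what gets QP creation to succeed on mlx5.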

There is clearly a bug somewhere and anyone that googles recommended settings and then tries to apply those settings to their mlx5 network will encounter it. 

There may be a bug, or there may just be a lack of documentation or knowledge about the interaction between the o2iblnd driver parameters and the mlx5 drivers. I think it’s important to understand that the “recommended settings” laid out in that paper are recommended in the context of dealing with the ping storms. The scale (client:router:OST ratio) at which the ping storm becomes an issue is not clear. Thus, it may not be necessary for *every* system to use these settings. That isn’t to say that the default settings are the best either. But perhaps small to moderately sized systems can get away with 16/16, or 32/32, etc.

Anyone that googles recommended settings needs to understand the context in which those recommendations were made.

Chris Horn

On Aug 19, 2015, at 11:49 AM, Ken Jeffries <jeffries@cray.com> wrote:

Hi Chris,

AFAICT the generally recommended values are peer_credits=126 and concurrent_sends=63. See https://cug.org/proceedings/attendee_program_cug2012/includes/files/pap166.pdf
and others. When those values are set with mlx5, they produce a non-working network with errors in /var/log/messages like:

Aug 18 13:53:29 prod-0064 kernel: [  151.261120] LNetError: 485:0:(o2iblnd.c:869:kiblnd_create_conn()) Can't create QP: -12, send_wr: 16191, recv_wr: 254
Aug 18 13:54:05 prod-0064 kernel: [  187.241154] LNetError: 6:0:(o2iblnd.c:869:kiblnd_create_conn()) Can't create QP: -12, send_wr: 16191, recv_wr: 254
Aug 18 13:54:05 prod-0064 kernel: [  187.241161] LNetError: 6:0:(o2iblnd.c:869:kiblnd_create_conn()) Skipped 3 previous similar messages
Aug 18 13:54:41 prod-0064 kernel: [  223.220728] LNetError: 6:0:(o2iblnd.c:869:kiblnd_create_conn()) Can't create QP: -12, send_wr: 16191, recv_wr: 254

The 63/16 combination is the closest we could come to 126/63 and still have a working network. Because we cannot run any performance tests with 126/63, we cannot say directly whether we are leaving performance on the table.

There is clearly a bug somewhere and anyone that googles recommended settings and then tries to apply those settings to their mlx5 network will encounter it. 

Regards,
Ken

From: Chris Horn <hornc@cray.com>
Date: Wednesday, August 19, 2015 at 11:21 AM
To: Kenneth Jeffries <jeffries@cray.com>
Cc: Martin Hecht <hecht@hlrs.de>, "Prescott,Craig P" <prescott@rc.ufl.edu>, "hpdd-discuss@lists.01.org" <hpdd-discuss@ml01.01.org>
Subject: Re: [HPDD-discuss] o2iblnd peer_credits and concurrent_sends

I don’t know that there’s a good one-size-fits-all solution for how to configure the credits. The recommendations laid out in Cray’s paper address an acute problem seen at large scale: ping storms created by the Lustre pinger. If your system doesn’t experience that problem, then the default values may be sufficient. If you have evidence that you’re leaving performance on the table, and LNet is your bottleneck, then experimenting with credits may be worthwhile.

Chris Horn

On Aug 19, 2015, at 11:17 AM, Chris Horn <hornc@cray.com> wrote:

The o2iblnd driver code forces peer_credits and concurrent_sends to be in a reasonable range of each other:

        if (*kiblnd_tunables.kib_concurrent_sends > *kiblnd_tunables.kib_peertxcredits * 2)
                *kiblnd_tunables.kib_concurrent_sends = *kiblnd_tunables.kib_peertxcredits * 2;

        if (*kiblnd_tunables.kib_concurrent_sends < *kiblnd_tunables.kib_peertxcredits / 2)
                *kiblnd_tunables.kib_concurrent_sends = *kiblnd_tunables.kib_peertxcredits / 2;

The code above ensures that concurrent_sends cannot be larger than 2*peer_credits or smaller than peer_credits/2. I’m not really sure why it allows concurrent_sends to be less than peer_credits.

By changing the value of concurrent_sends after the module has loaded you’re circumventing the above logic.
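
To make the effect of that clamping concrete, here is a minimal standalone sketch in C that mirrors only the two checks quoted above. The function name is hypothetical, and any other adjustments the driver may make to these tunables are not modelled.

```c
/* Hypothetical helper that mirrors only the two range checks quoted
 * from the driver; it is not the driver's own function. */
static int clamp_concurrent_sends(int peer_credits, int concurrent_sends)
{
        /* Upper bound: at most twice peer_credits. */
        if (concurrent_sends > peer_credits * 2)
                concurrent_sends = peer_credits * 2;

        /* Lower bound: at least half of peer_credits (integer division). */
        if (concurrent_sends < peer_credits / 2)
                concurrent_sends = peer_credits / 2;

        return concurrent_sends;
}
```

With peer_credits=63 and concurrent_sends=16, these checks raise concurrent_sends to 31 (63/2 with integer division), which would explain why 16 cannot be set through modprobe.d. With 126/63 the value is left at 63, and with the 8/8 defaults it is left at 8.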

Chris Horn

On Aug 19, 2015, at 8:01 AM, Ken Jeffries <jeffries@cray.com> wrote:

Hi Martin and Craig,

This seems to be a problem only on mlx5 and not on mlx4. As Craig says, the default values (peer_credits=8 concurrent_sends=8) do work. The values peer_credits=63 concurrent_sends=16 also work, but concurrent_sends=16 cannot be set via the normal .conf file in modprobe.d/. After the modprobe of ko2iblnd but before the module is used, it is possible to chmod /sys/module/ko2iblnd/parameters/concurrent_sends to make it writable and then echo 16 into the parameter.

These values are still well short of some generally recommended values and that is concerning. As Martin says, it may be possible to increase other parameters to go beyond these values.

Regards,
Ken

From: Martin Hecht <hecht@hlrs.de>
Date: Wednesday, August 19, 2015 at 6:52 AM
To: "Prescott,Craig P" <prescott@rc.ufl.edu>, Kenneth Jeffries <jeffries@cray.com>, "hpdd-discuss@lists.01.org" <hpdd-discuss@ml01.01.org>
Subject: Re: [HPDD-discuss] o2iblnd peer_credits and concurrent_sends

Hi,

We stumbled over peer_credits as well. It must be set to the same value on all clients and servers.
I also heard from Cray that 63 was the maximum that works. Maybe apart from the limitation of the LNet protocol there are further restrictions, or you have to increase other parameters as well in order to go beyond 63.

Martin

On 08/19/2015 03:14 AM, Prescott,Craig P wrote:
Hi Ken,


No, I never got any answers to that old post.  We ended up going with the default values back then - those have actually been ok for our scale/use case.  FWIW, I have a hunch that the problem may have been due to limitations of the Connect-IB driver we were using at the time on the clients.


Kind of timely that you bring this issue up now, though, as we are bringing up a new file system and already had it on our list to revisit.


Cheers,

Craig



________________________________
From: HPDD-discuss <hpdd-discuss-bounces@ml01.01.org> on behalf of Ken Jeffries <jeffries@cray.com>
Sent: Monday, August 17, 2015 10:01 PM
To: hpdd-discuss@lists.01.org
Subject: Re: [HPDD-discuss] o2iblnd peer_credits and concurrent_sends

Craig,

Did you ever get an answer to your question? Or pick values that worked?

https://lists.01.org/pipermail/hpdd-discuss/2013-July/000358.html

Ken



_______________________________________________
HPDD-discuss mailing list
HPDD-discuss@lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss
