The reason I brought up this question was that we thought it would be
a good idea to bind the kiblnd_sd threads to the CPUs on the nodes
where the NICs are installed.
We tried to use 'taskset -cp [cpu list] <pid of kiblnd_sd_xx_yy>',
but it failed in set_cpus_allowed_ptr():
....
if (unlikely((p->flags & PF_THREAD_BOUND) && p != current)) {
ret = -EINVAL;
....
Surprisingly, PF_THREAD_BOUND was not set on the lustre clients
in my test cluster! I could not figure out why.
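(As an aside, for anyone who wants to check the flag directly: the
per-task flags are the 9th field of /proc/<pid>/stat. A rough sketch,
assuming bash and that PF_THREAD_BOUND is still 0x04000000 as in the
2.6/3.x include/linux/sched.h; pass the pid of whichever kiblnd_sd
thread you care about:)

```shell
#!/bin/bash
# Report whether PF_THREAD_BOUND appears to be set on a thread.
# Assumption: PF_THREAD_BOUND == 0x04000000 (2.6/3.x kernels; later
# renamed PF_NO_SETAFFINITY with the same value).
pid=${1:-$$}
# flags is the 9th field of /proc/<pid>/stat; whitespace splitting is
# safe here because kernel-thread names contain no spaces.
flags=$(awk '{print $9}' "/proc/$pid/stat")
if [ $(( flags & 0x04000000 )) -ne 0 ]; then
    echo "PF_THREAD_BOUND set on $pid"
else
    echo "PF_THREAD_BOUND not set on $pid"
fi
```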
Furthermore, when I use
options lnet networks=o2ib(ib1)[2]
to bind all kiblnd_sd_xx_yy threads to cpu partition 2, not only does
it seem to achieve what we want, but the PF_THREAD_BOUND flag is no
longer set! Yup! I can use the taskset command to move the kiblnd_sd
threads afterwards!
I think specifying cpu partitions is the right way to go, but I am
still bothered by the inconsistent use of PF_THREAD_BOUND :-(
^.^
Jay
On 02/12/2015 02:06 PM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
> On Feb 12, 2015, at 4:46 PM, Jay Lan <jay.j.lan(a)nasa.gov> wrote:
>
> On our production lustre client systems, the kiblnd_sd_xx_yy
> threads have PF_THREAD_BOUND set in the task_struct->flags field.
>
> On the client systems in my test environment, the PF_THREAD_BOUND
> was not set. I checked /etc/modprobe.d/lustre, and I checked
> /etc/init.d/lustre, but nothing rang a bell.
>
> Can someone shed light on how lustre decides whether to turn on
> PF_THREAD_BOUND or not? Thanks in advance!
There seem to be some heuristics in lustre that try to determine whether more than
one cpu partition is needed for the lustre threads. I haven’t verified this in the source
code, but I believe that in the name kiblnd_sd_xx_yy the “xx” is referring to the
partition number and the “yy” is referring to the thread number within that partition. On
one of our MDS servers, lustre by default created four partitions with threads named
kiblnd_sd_[00-03]_yy. We added the following module option:
options libcfs cpu_npartitions=1
After that, all of the threads are named kiblnd_sd_00_yy. If there is more than one
partition, the different thread pools will get bound to different cores. Maybe that same
logic is responsible for setting PF_THREAD_BOUND?
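(One quick way to see the binding, if you want to check on your own
systems, is to dump each thread's Cpus_allowed_list from /proc; a
rough sketch, assuming bash:)

```shell
#!/bin/bash
# List every kiblnd_sd thread along with the CPU set it is currently
# allowed to run on (Cpus_allowed_list from /proc/<pid>/status).
for pid in $(pgrep kiblnd_sd); do
    name=$(cat "/proc/$pid/comm")
    cpus=$(awk '/^Cpus_allowed_list/ {print $2}' "/proc/$pid/status")
    echo "$name: $cpus"
done
```

With multiple partitions you should see different CPU ranges for the
different xx pools.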
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu