On Feb 12, 2015, at 4:46 PM, Jay Lan <jay.j.lan(a)nasa.gov>
wrote:
On our production lustre client systems, the kiblnd_sd_xx_yy
threads have PF_THREAD_BOUND set in the task_struct->flags field.
On the client systems in my test environment, the PF_THREAD_BOUND
was not set. I checked /etc/modprobe.d/lustre, and I checked
/etc/init.d/lustre, but nothing rang a bell.
Can someone shed light on how lustre decides whether to turn on
PF_THREAD_BOUND or not? Thanks in advance!
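[Not from the original thread: a small sketch of one way to check for this flag yourself. It assumes a Linux /proc filesystem, and that PF_THREAD_BOUND is 0x04000000 as in kernel headers of that era (include/linux/sched.h; later kernels renamed the bit PF_NO_SETAFFINITY). The flags word is the 9th field of /proc/<pid>/stat.]

```python
# Sketch: test whether a task has PF_THREAD_BOUND set by parsing
# /proc/<pid>/stat. Assumption: PF_THREAD_BOUND == 0x04000000, the value
# used in pre-3.10 kernels (renamed PF_NO_SETAFFINITY in later kernels).

PF_THREAD_BOUND = 0x04000000  # assumed value from include/linux/sched.h

def task_flags(stat_line):
    """Return the kernel task flags (field 9) from a /proc/<pid>/stat line.

    The comm field (field 2) can contain spaces, so split on the
    closing paren that ends it rather than naively on whitespace."""
    rest = stat_line.rsplit(')', 1)[1].split()
    # rest[0] is state (field 3), so flags (field 9) is rest[6]
    return int(rest[6])

def is_thread_bound(pid):
    """True if the task's flags word has the PF_THREAD_BOUND bit set."""
    with open("/proc/%d/stat" % pid) as f:
        return bool(task_flags(f.read()) & PF_THREAD_BOUND)
```

Running is_thread_bound() against the PIDs of the kiblnd_sd_xx_yy threads on both the production and test clients would confirm the difference directly, without relying on any Lustre-specific tooling.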
There seem to be some heuristics in lustre that try to determine whether more than
one cpu partition is needed for the lustre threads. I haven’t verified this in the source
code, but I believe that in the name kiblnd_sd_xx_yy the “xx” refers to the
partition number and the “yy” refers to the thread number within that partition. On
one of our MDS servers, lustre by default created four partitions with threads named
kiblnd_sd_[00-03]_yy. We added the following module option:
options libcfs cpu_npartitions=1
After that, all of the threads are named kiblnd_sd_00_yy. If there is more than one
partition, the different thread pools will get bound to different cores. Maybe that same
logic is responsible for setting PF_THREAD_BOUND?
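[Not from the original thread: a sketch for checking the binding hypothesis above. It walks /proc looking for threads named kiblnd_sd_<part>_<idx> and reports each one's Cpus_allowed_list from /proc/<pid>/status; if partitions are bound to distinct core sets, that should show up directly. The kiblnd_sd naming pattern is an assumption based on this thread.]

```python
# Sketch: map each kiblnd_sd_* thread to the CPUs it is allowed to run on,
# using /proc/<pid>/comm and the Cpus_allowed_list line of /proc/<pid>/status.
# If the partition-binding theory is right, threads from different partitions
# (different "xx" values) should report disjoint CPU lists.

import os
import re

def kiblnd_affinities():
    result = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/comm" % pid) as f:
                comm = f.read().strip()
            # Assumed thread-name pattern from this thread: kiblnd_sd_<part>_<idx>
            if not re.match(r"kiblnd_sd_\d+_\d+$", comm):
                continue
            with open("/proc/%s/status" % pid) as f:
                for line in f:
                    if line.startswith("Cpus_allowed_list:"):
                        result[comm] = line.split(":", 1)[1].strip()
        except IOError:
            continue  # thread exited while we were scanning
    return result

if __name__ == "__main__":
    for name, cpus in sorted(kiblnd_affinities().items()):
        print("%s -> CPUs %s" % (name, cpus))
```

Comparing the output before and after setting cpu_npartitions=1 would show whether the partition count is what drives the per-core binding (and, presumably, the PF_THREAD_BOUND flag).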
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu