Re: [mOS-devel] [Question] How can I run MPI programs?
by Min-Woo Ahn
Hello, Rolf
Thank you for your reply!
----------------------------------------------
I attached a screenshot of the boot messages earlier, but I don't think the
attachment came through.
Here are the boot messages:
"
mOS: lwkcpus_mask: "2-15"
mOS-mem: There are 2 on-line NUMA domains.
mOS-mem: Designated 51539607552 bytes of LWK memory.
mOS: CPUs 0-1 will not move syscalls
mOS: CPUs 2-15 will move syscalls onto CPUs 1
mOS: These CPUs are isolated: 1-15
mOS: set unbound workqueue to 0-1 rc=0
mOS: Assigned LWK CPUs: 2-15
"
Did I configure this correctly? My goal was to map all LWK CPUs directly to a
single full-Linux core. I don't know why, but core 0 was automatically
isolated, so I mapped cores 2-15 to core 1 instead of core 0 (isolcpus=1,
lwkcpus=1.2-15).
Can I map cores 1-15 directly to core 0? And which configuration gives better
performance for compute-intensive workloads?
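To check a layout like this, one can compare the cpulist strings in the boot log with the lwkcpus= value. Below is a minimal Python sketch, assuming that "lwkcpus=T.L" means the LWK CPUs in L move their syscalls onto CPU(s) T (which is what the log above reports for 1.2-15) and that ':' would separate several such groups; the function names are hypothetical and not part of mOS.

```python
# Minimal sketch: cross-check mOS boot messages against an lwkcpus= value.
# ASSUMPTION: "lwkcpus=T.L" means LWK CPUs L move syscalls onto CPU(s) T,
# and ':' separates groups. Inferred from the boot log, not from mOS docs.


def parse_cpulist(text):
    """Parse a Linux-style cpulist such as "2-15" or "0,2-3" into a set."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def parse_lwkcpus(spec):
    """Split a spec like "1.2-15" into (syscall_targets, lwk_cpus)."""
    targets, lwk = set(), set()
    for group in spec.split(":"):
        target, _, cpus = group.partition(".")
        targets |= parse_cpulist(target)
        lwk |= parse_cpulist(cpus)
    return targets, lwk


if __name__ == "__main__":
    all_cpus = parse_cpulist("0-15")        # assumed: 16 logical CPUs per node
    targets, lwk = parse_lwkcpus("1.2-15")  # from the boot command line
    isolated = parse_cpulist("1-15")        # "These CPUs are isolated: 1-15"

    print("LWK CPUs match the log:", lwk == parse_cpulist("2-15"))
    print("isolated == targets | LWK:", isolated == targets | lwk)
    print("CPUs left to Linux:", sorted(all_cpus - lwk))
```

Run as a script, it reports that the LWK CPUs match the log, that the isolated set 1-15 is exactly the syscall target plus the LWK CPUs, and that CPUs 0 and 1 remain with Linux.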
I'll try running the CORAL benchmarks as you suggested. Thank you.
------------------------------------------------
Now, I have several additional questions.
1. I often see "rc={some value}" in the messages. What is rc? Does it mean
"return code"?
If so, is there a list that explains each return code?
2. You said that to fix the errors from my previous question, I may have to
reboot my machine. Do you mean I have to reboot every time that error message
appears? Is there any other way?
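Rolf's reply quoted below attributes these errors to a previous mOS process that has not completely finished. If that is the cause, one hedged workaround is to wait until the previous run's ranks have exited on each node before launching the next run. This is a generic Linux sketch that polls /proc/<pid>/comm; it is not an mOS facility, "xhpl" is a hypothetical executable name, and it cannot free a reservation that the kernel still holds after the processes are gone, in which case the reboot advice still applies.

```python
import os
import time


def find_processes(comm):
    """Return PIDs whose /proc/<pid>/comm equals comm (Linux only)."""
    if not os.path.isdir("/proc"):
        return []
    wanted = comm[:15]  # the kernel truncates comm to 15 characters
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == wanted:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited mid-scan, or we may not read it
    return pids


def wait_for_exit(comm, timeout=60.0, interval=1.0):
    """Poll until no process named comm is left; return False on timeout."""
    deadline = time.monotonic() + timeout
    while find_processes(comm):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


if __name__ == "__main__":
    # Hypothetical executable name. Poll for the benchmark rather than for
    # yod, assuming yod hands the process over to the application it launches.
    if wait_for_exit("xhpl", timeout=30.0):
        print("no leftover ranks on this node")
    else:
        print("ranks still running; LWK CPUs may still be reserved")
```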
---------------------------------------------------
Thank you,
Minwoo
On Mon, Aug 21, 2017 at 4:55 AM, Min-Woo Ahn <minwoo.ahn(a)csl.skku.edu>
wrote:
> On Mon, Aug 21, 2017 at 4:12 AM, Rolf Riesen <rolf.riesen(a)intel.com>
> wrote:
>
>> Hi Minwoo,
>>
>> thanks for trying out mOS. Unfortunately, the highly optimized,
>> pre-compiled Intel linpack binaries do not work on the mOS version
>> that is currently on github.
>>
>> These binaries try very hard to allocate just the right type of memory
>> in relation to very carefully laid out threads among the available
>> logical CPUs (hyperthreads). Some of that interferes with how mOS
>> wants to do things. For example, in your setup cores 0 and 1 are not
>> lightweight kernel cores, but linpack may still try to use them.
>>
>> You can compile linpack yourself, and that may work, but you will lose
>> the performance advantages of the optimized version. We are working on
>> a new mOS release that will let you run linpack, but it is not quite
>> ready yet to be pushed to github.
>>
>> Try some of the CORAL benchmarks
>> (https://asc.llnl.gov/CORAL-benchmarks/); we have run several of them
>> successfully on mOS.
>>
>> On Mon Aug 21, 2017 00:19:44, Min-Woo Ahn wrote:
>> > Min-Woo My server consists of 4 nodes (2 sockets, 8 cores/socket each;
>> > Min-Woo total 4 (nodes) * 8 (cores/socket) * 2 (sockets) = 64 cores).
>>
>> Could you send us the boot command line you used to configure mOS?
>> That would let us check whether your basic setup is correct.
>>
>> > Min-Woo ------Questions------
>> > Min-Woo 1. How can I run Linpack benchmark?
>>
>> When we release the next mOS version, I will try to include instructions
>> on how to run linpack on it. Stay tuned.
>>
>> > Min-Woo 2. What are those error messages? How can I solve it?
>> > Min-Woo -[yod:${pid}] No LWK CPUs are available. (rc=-16)
>> > Min-Woo -[yod:${pid}] No compute cores requested. (rc=-22)
>>
>> These sometimes happen when a previous mOS process has not completely
>> finished and the CPU resources are still reserved. You may have to
>> reboot.
>>
>> Thanks,
>>
>> Rolf
>>
>>
>>
>> +++-+--+----+-------+------------+--------------------+------------------------
>> Rolf Riesen, Ph.D.                            Email: rolf.riesen(a)intel.com
>> Software Architect                            Phone: +1 (503) 613-5514
>> Extreme-scale Software System Pathfinding     Mobile: +1 (505) 363-6871
>>
>> Outlook users: Turn off "extra line break removal" in File > Options >
>> Mail > Message Format
>>
>
>
[Question] How can I run MPI programs?
by Min-Woo Ahn
Hello,
I'm an MS student at the Computer Systems Laboratory (CSL) in South Korea.
I'm currently studying multi-kernels and trying to run some MPI benchmarks on
mOS, and I have run into some problems.
------Environment------
My server consists of 4 nodes (2 sockets per node, 8 cores per socket; total
4 (nodes) * 2 (sockets) * 8 (cores/socket) = 64 cores).
I configured each node with 14 LWK cores (cores 2-15), which move syscalls to
core 1 (see 1.JPG).
------Questions------
1. How can I run the Linpack benchmark?
Before using mOS, I ran linpack on CentOS 7 as
$ mpiexec -np 64 -f {hostfile} {executable}
to bind one process to each core.
With mOS, I tried to run linpack as
$ mpiexec -ppn N -np N -f {hostfile} yod -R 1/N {executable}
following the user's guide on GitHub.
When N is small, such as 4, it works well.
But when N is larger, such as 14, it sometimes works and sometimes fails with
the error messages below.
Do you have any comments or suggestions for running the linpack benchmark? (I
want to fully utilize the LWK cores on all 4 nodes.)
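Counting cores: with 14 LWK CPUs on each of the 4 nodes, one rank per LWK CPU gives 4 * 14 = 56 ranks, versus 64 under CentOS. In MPICH's Hydra launcher, -np is the total number of ranks across all hosts and -ppn is the number per host, so `-ppn N -np N` would likely place all N ranks on the first host. Here is a minimal Python sketch that builds the command line under that assumption; build_launch, hosts.txt and ./xhpl are hypothetical names, and the yod -R 1/N share is taken from the command above.

```python
import shlex

# Assumed from the boot log: LWK cores 2-15, i.e. 14 per node.
LWK_CPUS_PER_NODE = 14


def build_launch(nodes, ranks_per_node, hostfile, executable):
    """Build an mpiexec + yod argv giving each rank 1/ppn of a node's LWK share."""
    if not 1 <= ranks_per_node <= LWK_CPUS_PER_NODE:
        raise ValueError(f"ranks_per_node must be 1..{LWK_CPUS_PER_NODE}")
    total = nodes * ranks_per_node  # Hydra's -np counts ranks on all hosts
    return [
        "mpiexec",
        "-ppn", str(ranks_per_node),
        "-np", str(total),
        "-f", hostfile,
        "yod", "-R", f"1/{ranks_per_node}",
        executable,
    ]


if __name__ == "__main__":
    cmd = build_launch(4, 14, "hosts.txt", "./xhpl")
    print(shlex.join(cmd))
```

For 4 nodes and 14 ranks per node, this prints `mpiexec -ppn 14 -np 56 -f hosts.txt yod -R 1/14 ./xhpl`.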
2. What do these error messages mean, and how can I fix them?
-[yod:${pid}] No LWK CPUs are available. (rc=-16)
-[yod:${pid}] No compute cores requested. (rc=-22)
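Regarding the rc= values: assuming rc follows the Linux kernel convention of a negative errno value (the thread does not confirm this), the numbers decode with Python's standard errno module. -16 would be EBUSY, which fits the explanation that the CPUs are still reserved, and -22 would be EINVAL; rc=0, as in the "set unbound workqueue" boot message, would simply mean success.

```python
import errno
import os


def explain_rc(rc):
    """Interpret a negative rc as -errno (assumed kernel-style convention)."""
    if rc >= 0:
        return "success" if rc == 0 else f"non-error value {rc}"
    code = -rc
    name = errno.errorcode.get(code, "unknown")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    for rc in (-16, -22, 0):
        print(rc, "->", explain_rc(rc))
```

On Linux this reports EBUSY (Device or resource busy) for -16 and EINVAL (Invalid argument) for -22.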
Thank you,
Minwoo Ahn.