Understanding performance loss due to high number of nanosleep calls on mOS
by Avinash Maurya (RIT Student)
Hi team,
We were evaluating the performance of Apache Spark on mOS and were
experiencing a slowdown on the LWK. Since Spark runs on the JVM, I wrote
a small Java program to trace the reason behind this.
I used the strace utility to inspect the behavior of each thread of the
program and found a very large number of nanosleep calls on LWK. For
instance, in a particular execution, nanosleep({tv_sec=0,
tv_nsec=1000000}, NULL) was called 3694 times on Linux but 33124 times
on the LWK, and the LWK execution took 1m43.721s against Linux's
1m5.871s. Adding up the relative timestamps of all nanosleep calls,
they contributed around 30 additional seconds to the LWK execution,
making them the major cause of the performance slowdown. Please
find the logs and the program attached.
I am not sure what can be done to improve the performance of such
Java-based programs (I tried different Java versions and vendors, and
different programs). It would be very helpful if you could point me in
a direction for debugging this. Also, I would appreciate it if you
could share any Java-based multi-threaded experiments that have
performed better on LWK than on Linux.
Thanks,
Avinash
Understanding the __split_vma error
by Avinash Maurya (RIT Student)
Hi team,
A few of our workloads showed the following error and warning when running
on the mOS LWK:
mmap: ERROR: __split_vma() failed to split 2m TLB at 7ff6b5880000
mOS-ras: msg="unmap_lwkmem_range: Partial unmapping is not supported. VMA:
[00002aaab6b4c000,00002aaab6c4a000) region:
[00002aaab6b4c000,00002aaab6c44000)" id=mOSLwkmemProcessWarning location=
jobid=
Specifically, the workloads containing these errors are run on Apache
Spark, which includes reading the input data from HDFS. If the workloads do
not read any input, we do not see these errors and they show significant
speedup on the LWK.
We use the following commands to execute these workloads:
yod -o lwksched-stats=3 -o lwkmem-prot-none-delegation-enable java
spark-jar program-name
Adding the move-syscalls-disable flag shows some performance improvement,
but we still observe the above error and warning during execution,
although they are less frequent than without the flag.
I tried to figure this out from the kernel source code but couldn't
trace back to where and why exactly this error occurs.
We observe that the workloads containing these errors suffer significant
performance degradation and would like to know if there is some workaround
for this which could be used.
Please let me know if more debugging info is required.
Thanks,
Avinash Maurya.
[FW:] mos-devel@lists.01.org post from am6429@rit.edu requires approval
by Rolf Riesen
Hello,
Avinash responded to Sharath's request with more information.
Unfortunately, the logs were too large and the mailing list refused to
send them without my approval. I'm the list owner, but I can't figure out
how to log in to the dashboard, and my email approval did not work.
So, let me try to forward the compressed logs...
Thanks,
Rolf
+++-+--+----+-------+------------+--------------------+------------------------
Rolf Riesen, Ph.D. Email: rolf.riesen(a)intel.com
Software Architect Phone: +1 (503) 613-5514
Extreme-scale Software System Pathfinding Mobile: +1 (505) 363-6871
Outlook users: Turn off "extra line break removal" in File > Options > Mail > Message Format
----- Forwarded message from mos-devel-owner(a)lists.01.org -----
> From: "Avinash Maurya (RIT Student)" <am6429(a)rit.edu>
> Reply-To: am6429(a)rit.edu
> Subject: Re: [mOS-devel] Understanding the __split_vma error
> Date: Thu, 7 May 2020 19:19:33 -0400
> To: "Bhat, Sharath K" <sharath.k.bhat(a)intel.com>
> CC: "mos-devel(a)lists.01.org" <mos-devel(a)lists.01.org>
>
> Thanks a lot, Sharath. This information was very helpful.
>
> We see a lot of the unmap warnings during execution but only a few
> split_vma errors.
> The programs do not crash; they complete successfully even with these
> warnings and errors.
>
> For experimentation, we ran two different workloads, namely Pi and KMeans,
> and observed a significant speedup on the Pi workload but
> equally significant slowdown on the KMeans workload.
> Please find the dmesg and strace outputs of both these workloads attached
> in this email (both contain the unmap warning and the split_vma error,
> suggesting that the slowdown is probably not because of these errors).
> We are still investigating the reason behind this slowdown and would really
> appreciate it if you could point us to some utility that could help us with
> this investigation.
>
> Please let us know if these logs seem alright, or whether something
> can be revised for better performance.
>
>
> Thanks,
> Avinash Maurya
>
>
> On Thu, May 7, 2020 at 4:32 PM Bhat, Sharath K <sharath.k.bhat(a)intel.com>
> wrote:
>
> > Hi Avinash,
> >
> >
> >
> > Were there only two error/warning prints when you ran that workload? Maybe
> > a full dmesg would give a better picture. And we can get more details with
> > a dump_stack() placed at those points in the code, if there are not many
> > of these prints, e.g.:
> >
> > diff --git a/mOS/mmap.c b/mOS/mmap.c
> > index 7e500b44fb31..c82ee6669a49 100644
> > --- a/mOS/mmap.c
> > +++ b/mOS/mmap.c
> > @@ -116,6 +116,7 @@ void unmap_lwkmem_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >  		mos_ras(MOS_LWKMEM_PROCESS_WARNING,
> >  			"%s: Partial unmapping is not supported. VMA: [%016lx,%016lx) region: [%016lx,%016lx)",
> >  			__func__, vma->vm_start, vma->vm_end, start, end);
> > +		dump_stack();
> >  		return;
> >  	}
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 7bcf9b547ac3..ce982bb94203 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -2694,6 +2694,7 @@ int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
> >  	if (!IS_ALIGNED(addr, SZ_2M)) {
> >  		pr_err("ERROR: %s() failed to split 2m TLB at %lx\n",
> >  			__func__, addr);
> > +		dump_stack();
> >  		return -EINVAL;
> >  	}
> >
> > Regarding the unmap warning:
> >
> > It doesn't make sense to zap only a part of an LWK virtual memory area,
> > in contrast to Linux (which allows such actions). This is because mOS
> > maps physical memory at the time the virtual memory map is created
> > (e.g. via mmap()), and there is no further on-demand paging when the
> > user reads/writes that virtual address range. A partial unmap would
> > leave a virtual map that could page fault when the user touches the
> > unmapped part, which is against the intended design of zero page
> > faults. Linux allows partial zapping of pages of a VMA upon
> > madvise(,, MADV_DONTNEED); I wonder if these workloads are using
> > madvise to free up memory. A dump_stack() will help us find the
> > trigger of these unmap warnings. If you don't want to rebuild the
> > kernel, you can also capture strace with tracing only memory syscalls,
> > e.g.: strace -f -e trace=memory yod <yod options> <app> <app args>.
> > This could help us see whether an madvise is being issued in the same
> > range for which unmap warnings are being printed.
> >
> >
> >
> > Regarding split_vma():
> >
> > Currently mOS doesn't support splitting a virtual memory map in the
> > middle of a huge page, so the kernel returns an error when such an
> > attempt is made by the application. Does this cause an application
> > crash, or just a warning? If there are tons of those prints, you can
> > try commenting the print out and see if that helps.
> >
> >
> >
> > Sharath
> >
> >
> >
> > *From:* Avinash Maurya (RIT Student) <am6429(a)rit.edu>
> > *Sent:* Thursday, May 7, 2020 4:02 AM
> > *To:* mos-devel(a)lists.01.org
> > *Subject:* [mOS-devel] Understanding the __split_vma error
----- End forwarded message -----