Oh, right, I remember that work now that you mention it. It definitely seems to me like
there'll be some efficiency-performance trade-offs in how the system handles cores
that don't have any work to do. We'd probably need some benchmarks where the
amount of parallelism grows and shrinks with time in order to analyze those trade-offs,
though. Mergesort's parallelism is pretty much monotonically decreasing over time, so it
probably isn't a great test for that.
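To make the idea concrete, here is a minimal sketch of the kind of benchmark described above, where available parallelism rises and falls in phases rather than shrinking monotonically like a mergesort merge tree. All names here (busy_task, oscillating_benchmark, the phase widths) are illustrative assumptions, not part of OCR:

```python
# Hypothetical benchmark sketch: alternating wide and narrow
# fork/join phases, so the number of ready tasks grows and shrinks
# over time instead of only decreasing.
from concurrent.futures import ThreadPoolExecutor

def busy_task(n):
    # Small CPU-bound kernel standing in for real work.
    total = 0
    for i in range(n):
        total += i * i
    return total

def oscillating_benchmark(phases=(16, 2, 16, 2), work=50_000):
    """Each phase forks `width` tasks and joins them; narrow phases
    leave most cores with nothing to do, exercising the idle path."""
    results = []
    with ThreadPoolExecutor(max_workers=8) as pool:
        for width in phases:
            futures = [pool.submit(busy_task, work) for _ in range(width)]
            results.append(sum(f.result() for f in futures))
    return results

print(len(oscillating_benchmark()))  # one aggregate per phase -> 4
```

Measuring steal-loop cycles and power across the narrow phases of a workload like this would expose the efficiency-performance trade-offs in question.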
From: ocr-dev-bounces(a)lists.01.org [mailto:firstname.lastname@example.org] On Behalf Of
Sent: Thursday, January 31, 2013 2:50 PM
To: Technical discussion about OCR
Subject: Re: [OCR-dev] Fwd: Execution time breakdowns
On Jan 31, 2013, at 3:14 PM, "Carter, Nicholas P"
The OCR version has basically the same problem once you get high enough in the merge tree
that there are fewer ready tasks than processors. I could easily believe that the growth
in cycles in the steal code is due to cores spinning trying to find work. It might be
interesting to look into ways to eliminate/reduce that spinning because it has the
potential to waste a lot of energy. Some hardware support might be a big help there.
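One software-only way to reduce that spinning is to park idle workers on a condition variable so they sleep until work arrives. The sketch below is not OCR's actual deque code; the class and method names are invented for illustration:

```python
# Minimal sketch: idle workers block (park) on a condition variable
# instead of spinning in a steal loop, trading a little wakeup
# latency for energy saved while no work is available.
import threading
from collections import deque

class ParkingWorkQueue:
    def __init__(self):
        self._tasks = deque()
        self._cv = threading.Condition()
        self._closed = False

    def push(self, task):
        with self._cv:
            self._tasks.append(task)
            self._cv.notify()          # wake one parked worker

    def close(self):
        with self._cv:
            self._closed = True
            self._cv.notify_all()      # release all parked workers

    def pop(self):
        """Park instead of spinning when no work is ready; returns
        None once the queue is closed and drained."""
        with self._cv:
            while not self._tasks and not self._closed:
                self._cv.wait()        # sleeps; no busy spinning
            return self._tasks.popleft() if self._tasks else None
```

A worker thread would simply loop on `pop()` until it returns None. Hardware support (e.g., a monitor/wait-style primitive) could cut the wakeup latency that this scheme pays relative to spinning.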
Justin also raised this issue last spring. Angelina (with Justin) looked into power
savings by letting the user statically turn busy stealing on or off, and showed
power-usage improvements on an Ivy Bridge machine. Our goal was to integrate the power
savings with the hierarchical place tree policies, per place, to make the
power-efficiency policies more dynamic, if my memory serves me right. If there is
interest on that front, we can look into salvaging that work.
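In the spirit of the static switch described above, a sketch of the mechanism might look like the following; the flag name and function are assumptions for illustration, not the actual code from that work:

```python
# Hypothetical sketch of a user-settable static switch between a busy
# (spinning) steal loop and a blocking wait on a condition variable.
import threading

BUSY_STEAL = False  # imagined static configuration flag

def wait_for_work(has_work, cv, poll_interval=0.001):
    """Spin when BUSY_STEAL is set (lowest latency, burns cycles);
    otherwise park on the condition variable (saves power, pays a
    small wakeup latency)."""
    if BUSY_STEAL:
        while not has_work():
            pass
    else:
        with cv:
            while not has_work():
                cv.wait(timeout=poll_interval)
```

A per-place version of this flag, driven by the hierarchical place tree policies, is roughly the dynamic extension mentioned above.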
Additionally, I recall Vivek talking about a SISAL parallel-merge algorithm he worked on,
but this too depends on the reliability of my memory.
OCR-dev mailing list