On Jan 31, 2013, at 3:14 PM, "Carter, Nicholas P"
The OCR version has basically the same problem once you get high enough in the merge tree
that there are fewer ready tasks than processors. I could easily believe that the growth
in cycles in the steal code is due to cores spinning trying to find work. It might be
interesting to look into ways to eliminate/reduce that spinning because it has the
potential to waste a lot of energy. Some hardware support might be a big help there.
Justin also raised this issue last spring. Angelina (with Justin) looked into the power
savings from letting the user statically turn busy stealing on or off, and showed power
usage improvements on an Ivy Bridge machine. If my memory serves me right, our goal was
to integrate those power savings with the hierarchical place tree policies, per place, so
that the power efficiency policies could adapt dynamically. If there is interest on that
front, we can look into reviving that work.
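
To make the idea concrete, here is a rough sketch of what a static busy-stealing switch
could look like. It is not the OCR or hierarchical place tree scheduler code, and it is
not the implementation Angelina and Justin measured; the class name StealPool, the
busy_steal flag, and the failed_steals counter are all hypothetical names chosen for
illustration. With the flag on, an idle worker keeps retrying its steal sweep; with it
off, the worker parks on a condition variable until it is woken.

```python
import threading
from collections import deque


class StealPool:
    """Toy work-stealing pool with a static busy-stealing switch (illustrative only)."""

    def __init__(self, num_workers, busy_steal):
        self.busy_steal = busy_steal          # hypothetical static policy flag
        self.deques = [deque() for _ in range(num_workers)]
        self.lock = threading.Lock()          # one coarse lock keeps the sketch short
        self.work_ready = threading.Condition(self.lock)
        self.pending = 0                      # tasks submitted but not yet finished
        self.failed_steals = 0                # sweeps that found no work anywhere
        self.results = []

    def submit(self, worker_id, fn, *args):
        """Push a task onto one worker's deque and wake any parked workers."""
        with self.work_ready:
            self.deques[worker_id].append((fn, args))
            self.pending += 1
            self.work_ready.notify_all()

    def _take(self, me):
        """Pop local work, else steal from another deque. Caller holds the lock."""
        if self.deques[me]:
            return self.deques[me].pop()                  # own work, newest first
        for victim in range(len(self.deques)):
            if victim != me and self.deques[victim]:
                return self.deques[victim].popleft()      # stolen work, oldest first
        self.failed_steals += 1
        return None

    def _worker(self, me):
        while True:
            with self.work_ready:
                task = self._take(me)
                while task is None:
                    if self.pending == 0:
                        return                            # all work is finished
                    if self.busy_steal:
                        break                             # drop the lock and retry at once
                    self.work_ready.wait()                # park until notified
                    task = self._take(me)
            if task is None:
                continue                                  # busy stealing: spin again
            fn, args = task
            value = fn(*args)                             # run the task outside the lock
            with self.work_ready:
                self.results.append(value)
                self.pending -= 1
                if self.pending == 0:
                    self.work_ready.notify_all()          # release parked workers to exit

    def run(self):
        """Start the workers, wait for them to drain all tasks, return sorted results."""
        threads = [threading.Thread(target=self._worker, args=(i,))
                   for i in range(len(self.deques))]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return sorted(self.results)
```

Submitting fewer tasks than there are workers and comparing failed_steals between the
two settings gives a crude proxy for the wasted spinning; real power numbers would of
course have to come from measurements on the machine.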
Additionally, I recall Vivek talking about a SISAL parallel merge algorithm he worked on,
though this too depends on the reliability of my memory.