I'm forwarding Nicholas' email since I don't remember the admin password to
approve it myself, and my previous forward with the result has been blocked too. Can
someone approve it?
One concern we have is that although the timing gets bad, most of the time is spent trying
to steal, which would indicate there's not enough work. Could it be a granularity issue?
Another thing is that every run seems to break around 16 cores/workers, so we're
wondering if there could be something hard-coded in ocrInit or somewhere else.
Nicholas, can you send us the code you're using for both pthread and ocr so that we
can have a look?
Benoit’s question about execution time breakdowns got me thinking
about how to script VTune to generate the sort of data he was looking for, and it turned
out to not be too hard. (Meaning that it took a while to figure out, but it is pretty easy
once you know how.)
I wrote some scripts to sweep over the different array and chunk sizes, generating
execution times by function, and other scripts to process the data and plot the fraction
of execution time spent in each of the 10 functions that were the biggest contributors to
execution time across the sweep. Hopefully, they’ll provide some data about where to look
when the time comes for performance tuning. Also, these scripts should be pretty easily
portable to other programs.
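
For anyone who wants to try the same kind of breakdown on another program, here is a
minimal sketch of the post-processing step in Python. It is not the actual scripts: it
assumes the per-function times have already been collected into a dictionary keyed by
(array size, chunk size), and the function names and numbers in the example are made up
purely for illustration.

```python
from collections import defaultdict


def top_function_fractions(sweep, top_n=10):
    """Pick the top_n functions by total time across the sweep, then report
    the fraction of each run's time spent in each of them.

    `sweep` is an assumed layout: {(array_size, chunk_size): {function: seconds}}.
    """
    # Sum each function's time over every configuration in the sweep.
    totals = defaultdict(float)
    for per_function in sweep.values():
        for name, seconds in per_function.items():
            totals[name] += seconds

    # Biggest contributors across the whole sweep, largest first.
    top = sorted(totals, key=totals.get, reverse=True)[:top_n]

    # Per-configuration fraction of that run's total time for each top function.
    fractions = {}
    for config, per_function in sweep.items():
        run_total = sum(per_function.values())
        fractions[config] = {
            name: (per_function.get(name, 0.0) / run_total if run_total else 0.0)
            for name in top
        }
    return top, fractions


# Hypothetical data: two configurations, two made-up function names.
example_sweep = {
    (1024, 16): {"steal": 3.0, "compute": 1.0},
    (1024, 64): {"steal": 2.0, "compute": 2.0},
}
top, fractions = top_function_fractions(example_sweep, top_n=2)
```

The resulting `fractions` dictionary can be handed to any plotting tool to draw one
stacked bar per configuration.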