Fwd: Execution time breakdowns
by Vincent Cavé
Hi,
I'm forwarding Nicholas' email since I don't remember the admin password to approve it myself.
One concern we have is that, although the timing gets bad, most of the time is spent trying to steal, which would indicate there's not enough work. Could it be a granularity problem?
Another thing is that every run seems to break around 16 cores/workers; we're wondering if there could be something hard-coded in ocrInit or somewhere else.
Nicholas, can you send us the code you're using for both pthread and OCR so that we can have a look?
Best,
Vincent
Begin forwarded message:
> From: ocr-dev-owner(a)lists.01.org
> Subject: OCR-dev post from nicholas.p.carter(a)intel.com requires approval
> Date: January 29, 2013 9:45:27 AM CST
> To: ocr-dev-owner(a)lists.01.org
>
> As list administrator, your authorization is requested for the
> following mailing list posting:
>
> List: OCR-dev(a)lists.01.org
> From: nicholas.p.carter(a)intel.com
> Subject: Execution time breakdowns
> Reason: Message body is too big: 1549440 bytes with a limit of 1024 KB
>
> At your convenience, visit:
>
> https://lists.01.org/mailman/admindb/ocr-dev
>
> to approve or deny the request.
>
> From: "Carter, Nicholas P" <nicholas.p.carter(a)intel.com>
> Subject: Execution time breakdowns
> Date: January 29, 2013 4:03:52 PM CST
> To: "ocr-dev(a)lists.01.org" <ocr-dev(a)lists.01.org>
>
>
> Benoit’s question about execution time breakdowns got me thinking about how to script Vtune to generate the sort of data he was looking for, and it turned out to not be too hard. (Meaning that it took a while to figure out but is pretty easy once you know how.)
>
> I wrote some scripts to sweep over the different array and chunk sizes, generating execution times by function, and other scripts to process the data and plot the fraction of execution time spent in each of the 10 functions that were the biggest contributors to execution time across the sweep. Hopefully, they’ll provide some data about where to look when the time comes for performance tuning. Also, these scripts should be pretty easily portable to other programs.
>
> -Nick
>
>
>
> From: ocr-dev-request(a)lists.01.org
> Subject: confirm 8706aff9a3fdfd1a5cc5268df732bcc44da4b2c6
> Date: January 29, 2013 9:45:27 AM CST
>
>
> If you reply to this message, keeping the Subject: header intact,
> Mailman will discard the held message. Do this if the message is
> spam. If you reply to this message and include an Approved: header
> with the list password in it, the message will be approved for posting
> to the list. The Approved: header can also appear in the first line
> of the body of the reply.
>
9 years, 1 month
Limit on number of dependencies an EDT can have?
by Carter, Nicholas P
Hello,
In the current OCR code, is there a limit on the number of dependencies an EDT can have? I'm noticing Heisenbugs when I create tasks with more than 64 dependencies, and Ganesh mentioned that he thought he remembered Romain saying that the dependency list was kept as a bit-vector for speed.
Here's the situation I'm seeing: I'm trying to parallelize the merging of long subsequences in my merge sort, so, when the merger EDT sees that it's merging sequences that are longer than some chunk size, it creates N mergelet tasks that each perform part of the merge, and a merge_phi task that detects when the mergelets have finished and signals the task that was waiting on the original merger that it can proceed. All told, merge_phi has N+3 dependencies, including its data inputs. For N >= 64, the program occasionally fails, but the failures aren't reproducible. For N smaller than that, it seems to work, but I'm still running some tests.
If OCR only tracks 64 dependencies, that'd explain what I'm seeing, since the failures are very timing-sensitive, and could easily be due to the merge_phi task firing before all of the tasks it depends on have completed.
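To make that hypothesis concrete, here is a minimal sketch of how a tracker that keeps one bit per dependency slot in a single 64-bit word could report "ready" too early. Everything in it is an assumption for illustration: the names dep_tracker_t, tracker_init, tracker_satisfy and slots_satisfied_at_fire are hypothetical, the aliasing of slots 64 and up onto slot % 64 is an assumed failure mode, and none of it is taken from the OCR sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dependency tracker: one bit per input slot in a 64-bit word.
 * This is NOT OCR's implementation, only a model of the bit-vector hypothesis. */
typedef struct {
    uint64_t pending; /* bit i set => slot i has not been satisfied yet */
} dep_tracker_t;

/* Mark slots 0..n_slots-1 as pending. Assumed failure mode: slots >= 64
 * alias onto bit (slot % 64) because the word only has 64 bits. */
static void tracker_init(dep_tracker_t *t, unsigned n_slots) {
    t->pending = 0;
    for (unsigned s = 0; s < n_slots; ++s)
        t->pending |= UINT64_C(1) << (s % 64);
}

/* Satisfy one slot. Returns true once the tracker believes every input
 * has arrived, i.e. no pending bits remain. */
static bool tracker_satisfy(dep_tracker_t *t, unsigned slot) {
    t->pending &= ~(UINT64_C(1) << (slot % 64));
    return t->pending == 0;
}

/* Satisfy slots in order 0, 1, 2, ... and return how many had been
 * satisfied when the tracker first reported "ready". */
static unsigned slots_satisfied_at_fire(unsigned n_slots) {
    dep_tracker_t t;
    tracker_init(&t, n_slots);
    for (unsigned s = 0; s < n_slots; ++s)
        if (tracker_satisfy(&t, s))
            return s + 1;
    return n_slots;
}
```

Under this model, a task with 67 slots (N = 64 mergelets plus 3 other inputs) would be considered ready after only 64 satisfactions, while a task with fewer than 64 slots fires only after all of them. Whether the early firing is visible would then depend on how quickly the remaining producers finish, which would match the timing-sensitive failures.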
-Nick
9 years, 2 months
Seeing non-unique GUIDs
by Carter, Nicholas P
I'm seeing some cases where ocrEventCreate seems to be generating repeat GUID values, which then causes crashes when I attempt to satisfy the events that I'm creating.
The scenario is that I have an EDT (call it foo) that needs to create a variable number of EDTs (call them bar). It also creates a single EDT (baz) that is dependent on all of the bar EDTs completing. Initially, I tried something similar to:
ocrGuid_t bar_guid;
for ( i= ...) {
    ocrEventCreate(&(bar_guid), OCR_EVENT_STICKY_T, true);
    ocrAddDependency(bar_guid, baz_guid, i);
}
When I did this, each invocation of foo would create the same sequence of GUIDs for bar_guid. Within each foo EDT, the GUIDs would be unique, but the sequence would repeat the next time foo executed, which eventually causes the program to crash with assertion violations.
As a work-around, Rob suggested that I malloc a new variable to hold the guid on each iteration, something like:
ocrGuid_t *bar_guid_p;
for ( i= ...) {
    bar_guid_p = new ocrGuid_t;
    ocrEventCreate(bar_guid_p, OCR_EVENT_STICKY_T, true);
    ocrAddDependency(*bar_guid_p, baz_guid, i);
}
This seems to generate unique GUIDs every time, but adds the overhead of a malloc/free call pair to each of the bar EDTs.
I'm not sure why it's happening, but it looks like allocating the guid variable on the stack is somehow confusing the GUID generator into generating the same GUIDs on each call to foo.
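One way to confirm the repeats independently of the crashes would be to log every GUID value as it comes back from ocrEventCreate and flag any value seen before. The sketch below is only a debugging aid under stated assumptions: guid_value_t is a stand-in for ocrGuid_t (assumed here to be convertible to a 64-bit integer), guid_log_t and guid_log_record are hypothetical names, and the log is not thread-safe, so it would need a lock if EDTs on several workers record into the same log.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SEEN 4096 /* arbitrary capacity chosen for this sketch */

/* Assumption: stand-in for ocrGuid_t, treated as a plain 64-bit value. */
typedef uint64_t guid_value_t;

/* Hypothetical log of every GUID value observed so far. */
typedef struct {
    guid_value_t seen[MAX_SEEN];
    size_t count;
} guid_log_t;

/* Record g in the log. Returns true if g was already present (a repeat),
 * false if it is new. Linear scan; fine for debugging, not for production.
 * Not thread-safe: guard with a lock if several workers share one log.
 * Once the log is full, new values are no longer stored. */
static bool guid_log_record(guid_log_t *log, guid_value_t g) {
    for (size_t i = 0; i < log->count; ++i)
        if (log->seen[i] == g)
            return true;
    if (log->count < MAX_SEEN)
        log->seen[log->count++] = g;
    return false;
}
```

Calling guid_log_record right after each ocrEventCreate, with a zero-initialized guid_log_t shared across invocations of foo, would show whether the same values really come back on the second invocation of foo.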
-Nick
9 years, 2 months