https://bugs.freedesktop.org/show_bug.cgi?id=77272
Patrick Ohly <patrick.ohly(a)gmx.de> changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|medium |highest
--- Comment #2 from Patrick Ohly <patrick.ohly(a)gmx.de> ---
Here's a potential (and not 100% correct!) algorithm for transferring a
complete address book:
uint16 used = GetSize() # not the same as maximum offset!
uint16 start = choose_start()
uint16 chunksize = choose_chunk_size()
uint16 i
for (i = start; i < used; i += chunksize) {
    PullAll( Offset = i, MaxCount = chunksize )
}
for (i = 0; i < start; i += chunksize) {
    PullAll( Offset = i, MaxCount = min(chunksize, start - i) )
}
Note that GetSize() is specified as returning the number of entries in
the selected phonebook object that are actually used (i.e. indexes that
correspond to non-NULL entries). This is relevant if contacts get
deleted after starting the session. In that case, the algorithm above
will not necessarily read all contacts. Here's an example:
offsets #0 till #99, with contacts #10 till #19 deleted
chunksize = 10
GetSize() = 90
=> this will request offsets #0 till #89, missing contacts #90 till #99
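Sketching just the loop arithmetic in Python makes the gap visible; the numbers mirror the example above and nothing here is a real API:

    used = 90        # what GetSize() reports with ten deleted entries
    chunksize = 10
    start = 0        # assume the loop happens to start at offset #0

    requested = [(i, chunksize) for i in range(start, used, chunksize)]
    print(requested[-1])   # (80, 10): the highest offset ever requested is #89,
                           # so offsets #90 till #99 are never asked for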
I think this can be fixed with an additional PullAll, leading to:
for (i = start; i < used; i += chunksize) {
    PullAll( Offset = i, MaxCount = chunksize )
}
PullAll( Offset = i ) # no MaxCount: read everything remaining at the end
for (i = 0; i < start; i += chunksize) {
    PullAll( Offset = i, MaxCount = min(chunksize, start - i) )
}
The additional PullAll() is meant to read all contacts at the end which
would not be covered otherwise.
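To make the control flow concrete, here is a minimal Python sketch of the same loop. The get_size() and pull_all(offset, max_count=None) helpers are hypothetical stand-ins for the GetSize/PullAll requests, not part of any real API:

    import random

    def download_all(get_size, pull_all, chunksize=100):
        """Fetch the whole phonebook in chunks, wrapping around a random start.

        get_size() and pull_all(offset, max_count=None) are hypothetical
        callables standing in for the PBAP GetSize/PullAll requests.
        """
        used = get_size()      # number of non-NULL entries, not the maximum offset
        start = random.randrange(used) if used else 0
        contacts = []

        # First pass: from the chosen start up to the reported size.
        i = start
        while i < used:
            contacts.extend(pull_all(i, chunksize))
            i += chunksize

        # Extra PullAll without MaxCount: catches entries stored beyond 'used'
        # when earlier offsets are unused (deleted contacts).
        contacts.extend(pull_all(i))

        # Second pass: wrap around and read offsets 0 .. start-1.
        i = 0
        while i < start:
            contacts.extend(pull_all(i, min(chunksize, start - i)))
            i += chunksize

        return contacts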
Now the other problem: MaxCount means "return up to MaxCount existing contacts
starting at offset #i", not "return the contacts stored at offsets #i till
#i+chunksize-1". Therefore the algorithm above will occasionally end up reading
contacts more than once. Example:
offsets #0 till #99, with contact #0 deleted
chunksize = 10
GetSize() = 99
PullAll(Offset = 0, MaxCount = 10) => returns 10 contacts #1 till #10 (inclusive)
PullAll(Offset = 10, MaxCount = 10) => returns 10 contacts #10 till #19
=> contact #10 appears twice in the result
The duplicate cannot be filtered out easily because the UID is not
reliable. This could be addressed by keeping a hash of each contact and
discarding those that are exact matches for already seen contacts. It's easier
to accept the duplicate and remove it during the next sync.
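For completeness, if one did want to filter during the download, a content hash over each vCard would do; a minimal sketch, assuming the chunked PullAll results are available as an iterable of vCard strings:

    import hashlib

    def drop_exact_duplicates(vcards):
        """Yield vCards, skipping byte-for-byte repeats from overlapping chunks.

        'vcards' is assumed to be an iterable of vCard strings as returned by
        the chunked PullAll requests above (hypothetical input).
        """
        seen = set()
        for vcard in vcards:
            digest = hashlib.sha1(vcard.encode("utf-8")).digest()
            if digest in seen:
                continue        # exact duplicate, e.g. contact #10 in the example
            seen.add(digest)
            yield vcard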
There are two more aspects that I chose to ignore above: how to choose
the start offset and the chunk size.
The start offset could be random (no persistent state needed) or could
continue where the last sync left off. The latter would require writing
the current offset to persistent storage after each PullAll() (to survive
unexpected shutdowns), even if nothing ever changes. Is that acceptable?
Probably not. I prefer choosing randomly.
The chunk size depends on the average contact size. Make it too small,
and we end up generating lots of individual transfers. Make it too large
(say 1000), and we still end up with chunks that never transfer
completely. We could tune the chunk size so that, on average, each
transfer has a certain size in bytes. TODO: how large?
Once we have such a target size in bytes, perhaps we can let the
algorithm adjust the chunk size dynamically: start small (100?), then
increase or decrease depending on the observed size of the returned
contacts.
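A rough sketch of such an adjustment, assuming a made-up target_bytes and that we can measure the byte size and count of what each PullAll returned (all names and numbers here are illustrative, not part of any API):

    def adjust_chunksize(chunksize, returned_bytes, returned_count,
                         target_bytes=64 * 1024, min_chunk=10, max_chunk=1000):
        """Derive the next chunk size from the observed average contact size.

        target_bytes, min_chunk and max_chunk are hypothetical tuning knobs;
        returned_bytes/returned_count describe the previous PullAll result.
        """
        if returned_count == 0:
            return chunksize                 # nothing learned, keep the old value
        avg_contact = returned_bytes / returned_count
        wanted = int(target_bytes / avg_contact)
        return max(min_chunk, min(max_chunk, wanted))

After each PullAll, the caller would feed the size of the response back in to obtain the chunk size for the next request.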