On 2/9/10, Thomas Klausner <tk%giga.or.at@localhost> wrote:
> Hi Alan!
>
>> The profile of zip_open() is dominated by memchr() - searching for
>> the magic number of the central directory trailer at the end of the
>> file.
>>
>> The search is necessary due to the variable length comment field,
>> which can be up to 64k. However in most zip files, the comment
>> is empty (or at least much smaller than 64k). So it is more
>> efficient to search backwards for the magic number from the end of the
>> file, rather than searching forwards from EOF-64k.
>>
>> This is the same method as used in the "unzip" program.
>>
>> The optimisation reduces libzip overhead from 60% to 40% when
>> extracting metadata from an FB2 zipped e-book file using the FB2
>> plugin from libextractor-mini.
>
> That's interesting.
>
> My understanding of the code is that it searches the whole 64k either
> way (per default and with your patch), taking the best identified
> match.
>
> Could the speed you see somehow come from using the handcoded memrchr
> instead of the memchr from libc?
> Thomas

No. I think you're right in the first instance, and my "optimisation"
is the result of a bug.

Now that I see it, I like the paranoia in this approach, which can cope
with a comment (or trailing garbage?) that looks like an
end-of-central-directory record. But it seems undesirable for
libextractor-mini, which is being used on low-powered ebook readers to
show metadata for multiple epub files (which use zip compression).

Do you think the paranoia is necessary?

If so, maybe it would make sense to overload ZIP_CHECKCONS to mean
"there should be no trailing garbage, and the comment text shouldn't
contain control characters, so you can use the faster approach".

Alan
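
[Editorial sketch, not part of the original message.] The trade-off
discussed above can be illustrated with a small C example. This is not
libzip's code; the function names find_eocd_backward() and
count_eocd_candidates() are hypothetical. It assumes the whole file tail
is already in memory and relies only on the ZIP format itself: the
end-of-central-directory record starts with the bytes 'P' 'K' 0x05 0x06,
has a 22-byte fixed part, and stores a 16-bit little-endian comment
length at offset 20. The first function is the "fast" approach (stop at
the first plausible record when scanning backwards from the end); the
second counts every candidate in the last 64k, which is what a
"paranoid" scan has to consider.

```c
#include <stddef.h>
#include <string.h>

#define EOCD_MIN_LEN     22u      /* fixed part of the EOCD record */
#define EOCD_MAX_COMMENT 0xFFFFu  /* comment length is a 16-bit field */

static const unsigned char EOCD_MAGIC[4] = { 'P', 'K', 0x05, 0x06 };

/* Lowest offset at which an EOCD record could start. */
static size_t eocd_window_start(size_t len)
{
    size_t span = EOCD_MIN_LEN + EOCD_MAX_COMMENT;
    return len > span ? len - span : 0;
}

/*
 * Fast approach (hypothetical helper): scan backwards from the end and
 * return the offset of the first record whose comment length runs
 * exactly to the end of the buffer, i.e. no trailing garbage.
 * Returns -1 if no such record exists.
 */
long find_eocd_backward(const unsigned char *buf, size_t len)
{
    if (len < EOCD_MIN_LEN)
        return -1;

    size_t lowest = eocd_window_start(len);
    for (size_t pos = len - EOCD_MIN_LEN + 1; pos-- > lowest; ) {
        if (memcmp(buf + pos, EOCD_MAGIC, sizeof EOCD_MAGIC) != 0)
            continue;
        size_t comment_len = (size_t)buf[pos + 20]
                           | ((size_t)buf[pos + 21] << 8);
        if (pos + EOCD_MIN_LEN + comment_len == len)
            return (long)pos;
    }
    return -1;
}

/*
 * Paranoid approach (hypothetical helper): count every magic number in
 * the search window. More than one candidate means the caller would
 * have to validate each of them and pick the best match.
 */
size_t count_eocd_candidates(const unsigned char *buf, size_t len)
{
    if (len < EOCD_MIN_LEN)
        return 0;

    size_t count = 0;
    for (size_t pos = eocd_window_start(len); pos + EOCD_MIN_LEN <= len; pos++) {
        if (memcmp(buf + pos, EOCD_MAGIC, sizeof EOCD_MAGIC) == 0)
            count++;
    }
    return count;
}
```

With an empty comment the backward scan succeeds on its first
comparison, which is the common case for epub files. The weakness is
the one raised in the thread: if the archive comment itself contains
bytes that look like a complete record ending at EOF, the backward scan
returns that fake record, whereas a full scan sees two candidates and
can choose between them.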