libzip: libzip-discuss: Working with larger archives with filecount > 65535 [PATCH]

From: Konstantin Isakov <dragonroot%gmail.com@localhost>
To: libzip-discuss%nih.at@localhost
Subject: Working with larger archives with filecount > 65535 [PATCH]
Date: Fri, 1 May 2009 19:05:15 +0400

First of all, hello everybody!

I've been using libzip for some time, and I decided to write to you all on this nice day after my fruitless attempts today to work with some larger archives created by the usual "zip" program (Info-ZIP based).

It turns out there's a limit of 65535 files in the original ZIP spec. It also turns out that this doesn't stop the "zip" program from packing more, or the "unzip" program from extracting them all again. Why?

Well, at first I thought of the Zip64 extensions, but no, those programs don't support them. Instead, the "unzip" program simply ignores the 2-byte 'nentry' field in the end-of-central-dir record and reads the whole directory until it ends. The directory size field in that record is 4 bytes wide, you see, so the 65535 limit doesn't really exist in practice.

To make things even funkier, the same "zip" program doesn't saturate at 65535 when writing the 'nentry' value if there are more than 65535 files. It just wraps around. So in an archive with 66000 files, the record says there are fewer than 500, which makes the field even less useful.
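To illustrate the wrap-around, here is a minimal sketch of what I assume the writer effectively does: it keeps only the low 16 bits of the real count. The function name wrapped_entry_count is made up for this example and is not taken from zip or libzip.

```c
#include <stdint.h>

/* Hypothetical helper: the value that ends up in the 2-byte 'nentry'
   field if the writer truncates the real count to 16 bits instead of
   saturating at 65535. */
static uint16_t wrapped_entry_count(uint32_t real_count)
{
    return (uint16_t)(real_count & 0xFFFFu);
}
```

Under that assumption, wrapped_entry_count(66000) gives 464, since 66000 - 65536 = 464.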

After discovering all that, I went on to patch the libzip sources to make them behave more or less the same way. In the file zip_open.c, at the end of the function _zip_readcdir(),
I replaced the original 'for( i=0;i<nentry;++i)' dirent-read loop with the following code:

------
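    for (i=0; ; i++) {
        if (i >= nentry) {
            /* The original array won't hold -- grow it by one entry */
            void *tmp = realloc(cd->entry, (i+1) * sizeof(*(cd->entry)));
            if (tmp == NULL)
                break;  /* out of memory: keep the entries read so far */
            cd->entry = tmp;
        }

        /* Stop at the first record that doesn't parse as a dirent */
        if ((_zip_dirent_read(cd->entry+i, fp, bufp, eocd-cdp, 0,
                              error)) < 0) {
            break;
        }
    }

    cd->nentry = i;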
------

Now it just reads entries until it hits something that doesn't look like an entry anymore. Very simple, and all 66000 files are there now.
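The same idea can be shown outside libzip. The following is only a sketch, assuming the whole central directory is already in a memory buffer; count_cdir_entries and read_le16 are hypothetical names, not libzip functions. It walks the records using the layout from the ZIP spec (a 46-byte fixed header starting with the "PK\1\2" magic, followed by the variable-length name, extra field and comment) and stops at the first record that is truncated or has the wrong magic.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CDIR_HEADER_SIZE 46u  /* fixed part of a central directory header */

/* Read a little-endian 16-bit value. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: count the entries in a central directory held
   in memory, ignoring the 'nentry' field altogether.
   Stops at the first record that is truncated or lacks the magic. */
static size_t count_cdir_entries(const unsigned char *cdir, size_t size)
{
    static const unsigned char magic[4] = { 'P', 'K', 1, 2 };
    size_t pos = 0, count = 0;

    while (size - pos >= CDIR_HEADER_SIZE) {
        size_t record_len;

        if (memcmp(cdir + pos, magic, sizeof magic) != 0)
            break;
        record_len = CDIR_HEADER_SIZE
                   + read_le16(cdir + pos + 28)   /* file name length */
                   + read_le16(cdir + pos + 30)   /* extra field length */
                   + read_le16(cdir + pos + 32);  /* file comment length */
        if (record_len > size - pos)
            break;
        pos += record_len;
        count++;
    }
    return count;
}
```

Bounding the walk by the directory size rather than by a stored count is what lets it go past 65535 entries.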

So, if the authors of libzip are reading this, could you please consider adding something like that to stock libzip? The majority of zip archives are created by InfoZip clones, and that's the way they behave, allowing any number of files to be added without any Zip64 extensions. And I don't see a lot of room for regressions -- each direntry has a magic id, so there's no chance of an error.

Thanks for reading, and have a good time :)

--
King's Sorrel rocks!
