path: root/apt-pkg
Commit message | Author | Age | Files | Lines
* Switch performance critical code to use APT::StringView | Julian Andres Klode | 2016-01-07 | 14 | -111/+349
  This improves performance of the cache generation on my ARM platform (4x Cortex A15) by about 10% to 20% from 2.35-2.50 to 2.1 seconds.
* Introduce internal APT::StringView class | Julian Andres Klode | 2016-01-07 | 1 | -0/+112
  The class APT::StringView implements a drop-in replacement for a subset of C++17 std::string_view features. It will be dropped at a later point and may not be used in public interfaces.
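To illustrate what a subset of std::string_view can look like, here is a minimal non-owning view: a pointer plus a length, with substr() and find() that never copy the underlying characters. The class below is a hypothetical sketch, not APT's APT::StringView; its member set is an assumption.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical, minimal non-owning string view in the spirit of C++17
// std::string_view. Illustration only, not APT's APT::StringView.
class StringView {
    const char *data_;
    size_t size_;

public:
    static constexpr size_t npos = static_cast<size_t>(-1);

    StringView() : data_(""), size_(0) {}
    StringView(const char *data, size_t size) : data_(data), size_(size) {}
    StringView(const char *cstr) : data_(cstr), size_(std::strlen(cstr)) {}
    StringView(const std::string &s) : data_(s.data()), size_(s.size()) {}

    const char *data() const { return data_; }
    size_t size() const { return size_; }
    bool empty() const { return size_ == 0; }
    char operator[](size_t i) const { return data_[i]; }

    // View of [pos, pos + n), clamped to the end; no characters are copied.
    StringView substr(size_t pos, size_t n = npos) const {
        if (pos > size_) pos = size_;
        if (n > size_ - pos) n = size_ - pos;
        return StringView(data_ + pos, n);
    }

    // Index of the first occurrence of c at or after pos, or npos.
    size_t find(char c, size_t pos = 0) const {
        for (size_t i = pos; i < size_; ++i)
            if (data_[i] == c) return i;
        return npos;
    }

    // Copies into an owning std::string only when explicitly asked to.
    std::string to_string() const { return std::string(data_, size_); }

    bool operator==(StringView other) const {
        return size_ == other.size_ && std::memcmp(data_, other.data_, size_) == 0;
    }
};
```

Because substr() only adjusts a pointer and a length, splitting a field like "Package: apt" into name and value costs no allocations, which is the kind of saving a parser hot path benefits from.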
* acquire: Allow parallelizing methods without hosts | Julian Andres Klode | 2016-01-07 | 1 | -2/+22
  The maximum parallelization soft limit is the number of CPU cores * 2 on systems defining _SC_NPROCESSORS_ONLN. The hard limit in all cases is Acquire::QueueHost::Limit.
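The limit rule above can be written as a small function. `MaxParallelMethods` is a hypothetical name, and the hard limit is passed in as a plain number rather than read from the Acquire::QueueHost::Limit configuration option.

```cpp
#include <unistd.h>
#include <algorithm>

// Hypothetical helper: soft limit = online CPUs * 2 where
// _SC_NPROCESSORS_ONLN is available; the hard limit always caps it.
long MaxParallelMethods(long queueHostLimit) {
    long softLimit = queueHostLimit; // without a CPU count, only the hard limit applies
#ifdef _SC_NPROCESSORS_ONLN
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);
    if (cpus > 0)
        softLimit = cpus * 2;
#endif
    return std::min(softLimit, queueHostLimit);
}
```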
* CopyFile: Use 64 * 1024 instead of 64000 as buffer size | Julian Andres Klode | 2016-01-07 | 1 | -7/+9
  This is a multiple of the page size and thus results in fewer page faults, speeding up copying. Also, while we're at it, unify all uses of that size in a single constant, APT_BUFFER_SIZE.
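The page-size argument can be checked with arithmetic, assuming 4 KiB pages: 64 * 1024 = 65536 bytes is exactly 16 pages, while 64000 bytes is 15.625 pages. The sketch reuses the APT_BUFFER_SIZE name from the commit but copies between raw POSIX file descriptors; `CopyFd` is hypothetical and not APT's CopyFile().

```cpp
#include <unistd.h>
#include <cerrno>
#include <cstddef>

// 65536 bytes = 16 pages of 4 KiB; 64000 bytes would be 15.625 pages.
static constexpr size_t APT_BUFFER_SIZE = 64 * 1024;
static_assert(APT_BUFFER_SIZE % 4096 == 0, "whole number of 4 KiB pages");
static_assert(64000 % 4096 != 0, "the old size was not page-aligned");

// Hypothetical copy loop: read a chunk, then write all of it (handling
// short writes and EINTR) until read() reports end of file.
bool CopyFd(int from, int to) {
    char buffer[APT_BUFFER_SIZE];
    for (;;) {
        ssize_t got = read(from, buffer, sizeof(buffer));
        if (got < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        if (got == 0) return true; // end of file
        for (ssize_t done = 0; done < got;) {
            ssize_t put = write(to, buffer + done, static_cast<size_t>(got - done));
            if (put < 0) {
                if (errno == EINTR) continue;
                return false;
            }
            done += put;
        }
    }
}
```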
* FileFd: (native) LZ4 support | Julian Andres Klode | 2016-01-07 | 4 | -1/+171
  Implement native support for LZ4 compression, using the official lz4 library.
* Increase APT::Cache-HashTableSize default to 50503 | Julian Andres Klode | 2016-01-03 | 1 | -1/+1
  This drops the hash table utilization from a high 98% to an acceptable 74% on unstable, and the average bucket length from 4.6 to 1.8. This improves performance by about 5%, while increasing the size of the cache by 0.2 MB out of 38 MB, that is 0.5%.
  48481 is a nice number
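Utilization and average bucket length can be computed from the chain lengths of a chained hash table. This is a hypothetical sketch: it assumes utilization means the share of non-empty buckets and that the average is taken over non-empty buckets only, which may differ from how apt-cache stats counts them.

```cpp
#include <cstddef>
#include <vector>

// chainLengths[i] is the number of entries hanging off bucket i.
struct HashTableStats {
    double utilization;  // fraction of buckets that are non-empty (x100 for %)
    double averageChain; // mean length of the non-empty chains
};

HashTableStats ComputeStats(const std::vector<size_t> &chainLengths) {
    size_t used = 0, total = 0;
    for (size_t len : chainLengths) {
        if (len != 0) ++used;
        total += len;
    }
    HashTableStats s{0.0, 0.0};
    if (!chainLengths.empty())
        s.utilization = static_cast<double>(used) / chainLengths.size();
    if (used != 0)
        s.averageChain = static_cast<double>(total) / used;
    return s;
}
```

Under these definitions, a larger table with the same number of entries lowers both numbers, which matches the direction of the change described above.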
* apt-cache: stats: Show a table utilization as percentage | Julian Andres Klode | 2016-01-03 | 1 | -2/+2
  Gbp-Dch: ignore
* Change compressor costs to be 100 apart | Julian Andres Klode | 2016-01-03 | 1 | -9/+9
  This will give us the freedom to insert more compressors at positions in between. Also change the cost of uncompressed to 0, as that really has no overhead, and the values do not really mean much.
* simple_compressor: Provide some accessors for end and free | Julian Andres Klode | 2016-01-03 | 1 | -0/+3
  This makes code easier to read, and somewhat more correct.
  Gbp-Dch: ignore
* simple_buffer: Allow buffer size to change | Julian Andres Klode | 2016-01-03 | 1 | -2/+18
  Gbp-Dch: ignore
* properly parse comments in apt_preferences and deb822-style sources | David Kalnischkies | 2016-01-02 | 3 | -7/+5
  apt_preferences and deb822-style sources used the specialized class pkgUserTagSection to deal with comments before/after a given stanza, but it couldn't deal with comments inside the stanza at all. codesearch suggests that nobody else uses it, and a vastly superior way of working with potentially commented files is implemented now, so we can officially discourage the use of the old, incomplete hack class.
* support comments in debian/control parsing | David Kalnischkies | 2016-01-02 | 1 | -4/+10
  Now (55153bf94ff28a23318e79aa48242244c4d82b3c) that pkgTagFile can be told to deal with all sorts of comments, we can use this mode to parse dsc files (as a side benefit) and debian/control files properly, even in the presence of multiline fields spliced with comments, like Build-Depends.
  Closes: 806775
* add optional support for comments in pkgTagFile | David Kalnischkies | 2016-01-02 | 2 | -43/+229
  APT usually deals with perfectly formatted files generated automatically by other programs, and as it has to parse multiple MBs of such files, it tries to be fast rather than forgiving. This was always a problem when we reused this parser for files in deb822 syntax that are mostly written by hand, however, like apt_preferences or the deb822-style sources, as these can include stray newlines and, more importantly, comments all over the place.
  As a stopgap we had pkgUserTagSection, which deals at least with comments before and after a given stanza, but comments in between weren't really supported. Now that we support parsing debian/control for e.g. build-dep, we face the full comment problem, e.g. with comments in between multi-line fields (like Build-Depends).
  We can't easily deal with this on the pkgTagSection level, as the interface gives access to 'raw' char pointers for performance reasons, so we would need to optionally add a buffer here from which comments are removed and hand out pointers into this buffer instead. The interface is quite large already and supports writing stanzas as well, which does not support comments at all either. So while in the future it might make sense to have a parser setup which deals with and keeps comments, in this commit we opt for the simpler solution for now: we officially declare that pkgTagSection does not support comments and instead expect the caller to deal with them, which in our case is pkgTagFile.
  pkgTagFile is extended with an additional mode which can deal with comments by dropping them from the buffer which will later form the input of pkgTagSection. The actual implementation is slightly more complex than this sentence suggests at first, on the one hand to have good performance and on the other to allow jumping directly to stanzas with offsets collected in a previous run (like our cache generation does, for example).
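A much simplified version of the idea, dropping comment lines from the buffer before a stanza parser sees it, might look as follows. It assumes a comment is any line whose first character is '#', and it ignores both the performance work and the offset bookkeeping for jumping to stanzas that the commit describes; `StripDeb822Comments` is a hypothetical name.

```cpp
#include <string>

// Hypothetical sketch: remove comment lines so that the continuation lines
// of a multi-line field (e.g. Build-Depends) end up adjacent again.
std::string StripDeb822Comments(const std::string &input) {
    std::string out;
    out.reserve(input.size());
    size_t pos = 0;
    while (pos < input.size()) {
        size_t end = input.find('\n', pos);
        size_t next = (end == std::string::npos) ? input.size() : end + 1;
        if (input[pos] != '#')                 // keep every non-comment line,
            out.append(input, pos, next - pos); // including its newline
        pos = next;
    }
    return out;
}
```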
* Do not sync the cache file | Julian Andres Klode | 2015-12-29 | 1 | -2/+0
  Integrity is taken care of by the checksum now.
* Add support for calculating hashes over the entire cache | Julian Andres Klode | 2015-12-29 | 3 | -5/+43
* pkgCacheGenerator: Allow passing down an already created cache | Julian Andres Klode | 2015-12-29 | 3 | -5/+19
  If we already have opened a cache, there is no point in having to open it again.
* pkgTagSection::Scan: Fix read of uninitialized value | Julian Andres Klode | 2015-12-29 | 1 | -1/+1
  We ignored the boundary of the buffer we were reading in while scanning for spaces.
* strutl.cc: Add declarations for the compat _ascii() functions | Julian Andres Klode | 2015-12-29 | 1 | -0/+2
  This shuts up gcc.
  Gbp-Dch: ignore
* Turn tolower_ascii() and isspace_ascii() into inline functions | Julian Andres Klode | 2015-12-29 | 2 | -11/+21
  To preserve compatibility, the new inline functions have _inline as a suffix, and a macro defines the old names to refer to the inline variants. The old functions are still preserved for binary compatibility. Also simplify the implementation of both functions.
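The compatibility pattern described above (inline variants with an _inline suffix plus macros for the old names) can be sketched like this. The function bodies are illustrative locale-independent ASCII checks, not copied from APT, and the exported out-of-line functions kept for binary compatibility are not shown.

```cpp
// Hypothetical sketch of the pattern, not APT's strutl.h.
static inline int isspace_ascii_inline(int c) {
    // ' ' plus '\t', '\n', '\v', '\f', '\r' (9..13), independent of the locale
    return c == ' ' || (c >= '\t' && c <= '\r');
}

static inline int tolower_ascii_inline(int c) {
    // Only 'A'..'Z' are folded; everything else is returned unchanged.
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

// Existing callers keep using the old names but now get the inline versions.
#define isspace_ascii(c) isspace_ascii_inline(c)
#define tolower_ascii(c) tolower_ascii_inline(c)
```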
* Switch to DJB hashing and use prime number as table size | Julian Andres Klode | 2015-12-29 | 1 | -7/+7
  On my testing system, consisting of unstable and experimental, this reduces the average chain from 6.5 to 4.5, and the longest chain from 17 to 15.
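For reference, the classic DJB (djb2) string hash starts at 5381 and computes h = h * 33 + c for each byte; the bucket index is the hash modulo the table size. A prime size such as 50503 (the default mentioned above, which is prime) is a common choice because the result then depends on all bits of the hash rather than only the low ones, as it would with a power-of-two size. This is a sketch of the textbook function; any case folding or other details of APT's actual hash are not shown.

```cpp
#include <cstddef>
#include <cstdint>

// Textbook djb2: h = h * 33 + c, written as (h << 5) + h + c.
uint32_t DjbHash(const char *s, size_t len) {
    uint32_t h = 5381;
    for (size_t i = 0; i < len; ++i)
        h = (h << 5) + h + static_cast<unsigned char>(s[i]);
    return h;
}

// Bucket index for a table of tableSize buckets (ideally a prime).
size_t BucketFor(const char *s, size_t len, size_t tableSize) {
    return DjbHash(s, len) % tableSize;
}
```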
* BufferedFileFdPrivate: Make InternalFlush() safe against errors | Julian Andres Klode | 2015-12-28 | 1 | -8/+8
  Previously, if flush errored inside the loop, data could have already been written to the wrapped descriptor without having been removed from the buffer.
  Also try to work around EINTR here. A better solution might be to have the individual privates detect an interrupt and return 0 in such a case, instead of relying on errno being untouched in between the syscall and the return from InternalWrite.
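A flush that stays consistent on errors removes each chunk from the pending buffer as soon as write() reports it written, and retries when write() fails with EINTR. The sketch uses a std::string as the buffer and a raw file descriptor; `FlushPending` is a hypothetical helper, not BufferedFileFdPrivate::InternalFlush().

```cpp
#include <unistd.h>
#include <cerrno>
#include <string>

// Hypothetical error-safe flush: on failure, `pending` holds exactly the
// bytes that were not written yet, so nothing is ever written twice.
bool FlushPending(int fd, std::string &pending) {
    while (!pending.empty()) {
        ssize_t written = write(fd, pending.data(), pending.size());
        if (written < 0) {
            if (errno == EINTR)
                continue; // interrupted before writing anything: retry
            return false;
        }
        if (written == 0)
            return false; // nothing could be written right now; keep the data
        pending.erase(0, static_cast<size_t>(written));
    }
    return true;
}
```

Treating a 0-byte write as a failure that keeps the data buffered avoids spinning forever; APT's own handling of that case may differ.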
* aptconfiguration: Set default compression level to 6 | Julian Andres Klode | 2015-12-28 | 1 | -5/+5
  Since commit 7a68effcb904b4424b54a30e448b6f2560cd1078, the xz and lzma compressors read the compression level they shall use. A default of -9 is too much for them: according to the xz manual page, it uses 674 MB of memory, whereas level -6 only needs 94 MB for compression.
  The -9 default causes autopkgtest failures in the test-compressed-indexes test, as not enough memory exists to proceed.
  Change the other compression levels to 6 as well: the gzip and bzip2 FileFd backends do not read them and use their code's default level, which is 6, so do the same for external methods.
* BufferedWriter: flushing: Check for written < size instead of <= | Julian Andres Klode | 2015-12-28 | 1 | -3/+1
  This avoids some issues with InternalWrite returning 0 because it just cannot write stuff at the moment.
* deal with empty values properly in deb822 parser | David Kalnischkies | 2015-12-27 | 1 | -1/+3
  Regression introduced in 8710a36a01c0cb1648926792c2ad05185535558e, but such fields are unlikely in practice as it is just as simple to not have a field at all with the same result of not having a value.
  Closes: 808102
* allow repositories to forbid arch:all for specific index targets | David Kalnischkies | 2015-12-27 | 5 | -3/+30
  Debian has a Packages file for arch:all already, but the arch:any files contain arch:all packages as well, so downloading it would be a total waste of resources. Getting this solved is on the list of things to do, but it is also the hardest part; for index targets like Contents the situation is much easier and fewer server/client implementations are involved, so we might not want to stall them.
  A repository can now declare via
      No-Support-for-Architecture-all: Packages
  that even if an arch:all Packages file exists, it shouldn't be downloaded, so that support for Contents files can be added now.
  See also 1dd20368486820efb6ef4476ad739e967174bec4 for the implementation of downloading arch:all index targets, which this is limiting.
  The field uses the name of the target from the apt configuration for simplicity and is negative by design, as this field is intended to be supported/needed only for a "short" time (one or two Debian releases). While this commit theoretically supports any target, it's expected to only see "Packages" as a value in reality.
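On the client side, honoring the field boils down to checking whether the target's name is listed. The sketch assumes the value is a whitespace-separated list of target names (only "Packages" is expected in practice); `ArchAllAllowed` is hypothetical and not the function APT uses.

```cpp
#include <sstream>
#include <string>

// Hypothetical check against the value of the Release file field
// "No-Support-for-Architecture-all" (assumed whitespace-separated).
bool ArchAllAllowed(const std::string &fieldValue, const std::string &target) {
    std::istringstream names(fieldValue);
    std::string name;
    while (names >> name)
        if (name == target)
            return false; // the repository opted this target out of arch:all
    return true;          // absent or not listed: arch:all may be downloaded
}
```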
* pkgcachegen.h: Hack around unordered_map not existing before C++11 | Julian Andres Klode | 2015-12-27 | 1 | -0/+5
  This is for public users only, which cannot use the class at all, except for the static methods.
* FileFd: Add a buffered writing mode | Julian Andres Klode | 2015-12-27 | 2 | -0/+153
  This is somewhat experimental right now, and might not work for everyone, so it is on an opt-in basis.
* FileFd: Introduce a Flush() function and call it from Close() | Julian Andres Klode | 2015-12-27 | 2 | -0/+16
  The flush function can be used for buffered writers.
* FileFdPrivate: Add getter and setter for fields | Julian Andres Klode | 2015-12-27 | 1 | -9/+42
  We will soon implement a buffered writing decorator and we will need to forward attribute changes to those.
* fileutl: simple_buffer: Add write() and full() methods | Julian Andres Klode | 2015-12-27 | 1 | -0/+11
  These can be used to implement write buffering.
  Gbp-Dch: ignore
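A buffer with separate read and write offsets is enough to support both buffered reading and the write()/full() calls used for write buffering. The class and member names here are hypothetical and only loosely modeled on the commit messages; resetting both offsets once the buffer has been drained is one way to reuse space without memmove().

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical simple_buffer-like helper: bytes in [bufferstart, bufferend)
// are pending; space after bufferend is free for writing.
class SimpleBuffer {
    std::vector<char> buf;
    size_t bufferstart = 0; // first unread byte
    size_t bufferend = 0;   // one past the last written byte

public:
    explicit SimpleBuffer(size_t capacity) : buf(capacity) {}

    size_t size() const { return bufferend - bufferstart; } // pending bytes
    size_t free() const { return buf.size() - bufferend; }  // room left at the end
    bool empty() const { return size() == 0; }
    bool full() const { return free() == 0; }
    const char *get() const { return buf.data() + bufferstart; }

    // Appends up to len bytes; returns how many fitted.
    size_t write(const void *from, size_t len) {
        if (len > free()) len = free();
        std::memcpy(buf.data() + bufferend, from, len);
        bufferend += len;
        return len;
    }

    // Consumes up to len pending bytes; returns how many were copied out.
    size_t read(void *to, size_t len) {
        if (len > size()) len = size();
        std::memcpy(to, get(), len);
        bufferstart += len;
        if (empty()) bufferstart = bufferend = 0; // drained: start over, no memmove
        return len;
    }
};
```

Note that full() reports whether more data can be appended, so a partially read buffer whose write offset has reached the end still counts as full until it is drained.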
* fileutl: simple_buffer: Mark accessors as const | Julian Andres Klode | 2015-12-27 | 1 | -2/+3
  Suggested by David.
  Gbp-Dch: ignore
* FileFdPrivate: Extract SimpleBuffer and mark it as hidden | Julian Andres Klode | 2015-12-27 | 1 | -21/+24
  Gbp-Dch: ignore
* ParseDepends: Mark branches for build-dep parsing as unlikely | Julian Andres Klode | 2015-12-27 | 1 | -2/+2
  We do not see those branches at all during normal mode of operation (that is, during cache generation), so tell the compiler about it.
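Marking a branch as unlikely is typically done with the GCC/Clang builtin __builtin_expect. The macro names and the '[' check below are assumptions for illustration: architecture lists such as "foo [amd64]" are one piece of syntax that appears in build-dependencies but not in Packages files.

```cpp
// Hypothetical hint macros; compilers without __builtin_expect get the
// plain condition.
#if defined(__GNUC__)
#define SKETCH_LIKELY(x) __builtin_expect(!!(x), 1)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_LIKELY(x) (x)
#define SKETCH_UNLIKELY(x) (x)
#endif

// Returns true if the dependency carries an architecture list, which the
// compiler is told to treat as the rare case.
bool HasArchRestriction(const char *dep) {
    for (const char *p = dep; *p != '\0'; ++p)
        if (SKETCH_UNLIKELY(*p == '['))
            return true;
    return false;
}
```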
* debListParser: Do not validate Description-md5 for correctness twice | Julian Andres Klode | 2015-12-27 | 1 | -2/+4
  The Set() method returns false if the input is not a hex number, so simply use that.
* Hex2Digit: Do not use isxdigit() | Niels Thykier | 2015-12-27 | 1 | -4/+9
  We directly check if we are a hex digit in HexDigit, so use that information.
  [jak@debian.org: Commit message wording]
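Together with the Description-md5 change above, the idea is that one pass both validates and decodes: a hex-digit helper that returns -1 for anything else removes the need for a separate isxdigit() check. The functions below are hypothetical sketches, not APT's own code.

```cpp
#include <cstddef>

// Returns the value of a hex digit, or -1 if c is not one (no locale lookup).
static int HexDigit(int c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes 2 * len hex characters into len bytes; returns false on the first
// invalid (or missing) character, so validation and decoding share one loop.
bool Hex2Bytes(const char *hex, unsigned char *out, size_t len) {
    for (size_t i = 0; i < len; ++i) {
        int hi = HexDigit(hex[2 * i]);
        if (hi < 0) return false; // also stops at a terminating NUL
        int lo = HexDigit(hex[2 * i + 1]);
        if (lo < 0) return false;
        out[i] = static_cast<unsigned char>((hi << 4) | lo);
    }
    return true;
}
```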
* debListParser: ParseDepends: Only query native arch if needed | Julian Andres Klode | 2015-12-27 | 1 | -1/+2
  This makes the code parsing architecture lists slower, but on the other hand, improves the more generic case of reading dependencies from Packages files.
* pkgcachegen: Use std::unordered_map instead of std::map | Julian Andres Klode | 2015-12-27 | 2 | -7/+7
  std::unordered_map is faster than std::map in our use case, reducing cache generation time by about 10% in my benchmark.
* Convert most callers of isspace() to isspace_ascii() | Julian Andres Klode | 2015-12-27 | 5 | -32/+32
  This converts all callers that read machine-generated data; callers that might work with user input are not converted.
* Introduce isspace_ascii() for use by parsers | Julian Andres Klode | 2015-12-27 | 2 | -0/+19
  This is like isspace(), but ignores the current locale.
* Refactor InternalReadLine to not unroll Size == 0 case | Julian Andres Klode | 2015-12-26 | 1 | -5/+4
  There is not much point and this is more readable.
  Gbp-Dch: ignore
* Change InternalReadLine to always use buffer.read() return value | Julian Andres Klode | 2015-12-26 | 1 | -12/+8
  This is mostly a documentation issue, as the size we want to read is always less than or equal to the size of the buffer, so the return value will be the same as the size argument. Nonetheless, people wondered about it, and it seems clearer to just always use the return value.
* Get rid of memmove() in our read buffering | Julian Andres Klode | 2015-12-26 | 1 | -76/+57
  This further improves our performance, and rred on uncompressed files now spends 78% of its time in writing, which means that we should really look at buffering those.
* Use a hardcoded buffer size of 4096 to fix performance | Julian Andres Klode | 2015-12-26 | 1 | -4/+2
  The code uses memmove() to move parts of the buffer to the front when the buffer is only partially read. By simply reading one page at a time, the maximum number of bytes that must be moved has a hard limit, and performance improves: in one test case, consisting of a 430 MB Contents file and a 75K PDiff, applying the PDiff previously took about 48 seconds and now completes in 2 seconds.
  Further speed-up can be achieved by buffering writes, which account for about 60% of the run time now.
* Mark all FileFdPrivate classes as hidden (tag: 1.1.6) | Julian Andres Klode | 2015-12-24 | 1 | -6/+6
  Gbp-Dch: ignore
* fix new[] vs delete mismatch introduced by b3db9d81 | David Kalnischkies | 2015-12-23 | 1 | -7/+7
  And as we are at it, let's fix the 'style' issue I introduced with the filefd changes as well.
  Reported-By: gcc -fsanitize's & cppcheck
  Git-Dch: Ignore
* use a dynamic buffer for ReadLine | David Kalnischkies | 2015-12-23 | 1 | -15/+22
  We don't need the buffer that often (only for ReadLine), and as it is only occasionally used, it is actually more efficient to allocate it when needed instead of statically by default. It also allows the caller to influence the buffer size instead of hardcoding it.
  Git-Dch: Ignore
* implement a buffer system for FileFd::ReadLine | David Kalnischkies | 2015-12-23 | 1 | -27/+140
  The default implementation of ReadLine was very naive, reading each character one by one. That is kinda okay for libraries implementing compression, as they have internal buffers (but still not great), but when working with files directly or via a pipe there is no buffer, so all those reads are in fact system calls.
  This commit introduces an internal buffer in the FileFd implementation which is only used by ReadLine. The more low-level Read and all other actions remain unbuffered; they were just changed to deal with potential "left-overs" in the buffer correctly.
  Closes: 808579
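The following sketch shows buffered line reading over a raw file descriptor: data is pulled in with one read() per 4096-byte chunk (the size picked in a later commit above), and lines are cut out of the buffer by advancing an offset, so neither per-character system calls nor memmove() are needed. `LineReader` is a hypothetical class, not FileFd, and a read error is treated like end of file for brevity.

```cpp
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical buffered line reader; unread bytes are buf[start, end).
class LineReader {
    int fd;
    char buf[4096];
    size_t start = 0, end = 0;

public:
    explicit LineReader(int fd) : fd(fd) {}

    // Stores the next line (without '\n') in `line`. Returns false once no
    // data is left; a final line without '\n' is still returned.
    bool ReadLine(std::string &line) {
        line.clear();
        for (;;) {
            if (start == end) { // buffer drained: one system call per chunk
                ssize_t got = read(fd, buf, sizeof(buf));
                if (got < 0 && errno == EINTR) continue;
                if (got <= 0) return !line.empty(); // EOF (or error)
                start = 0;
                end = static_cast<size_t>(got);
            }
            const char *nl = static_cast<const char *>(
                std::memchr(buf + start, '\n', end - start));
            if (nl != nullptr) {
                size_t n = static_cast<size_t>(nl - (buf + start));
                line.append(buf + start, n);
                start += n + 1; // skip past the newline
                return true;
            }
            line.append(buf + start, end - start); // line continues in next chunk
            start = end;
        }
    }
};
```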
* parse xz-compression level from configuration | David Kalnischkies | 2015-12-22 | 1 | -2/+28
  If we use the library to compress xz, still try to understand and pick up the arguments we would have used to call xz to figure out which level the user wants us to use instead of defaulting to level 6 (which is the default level of xz).
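Picking a preset level out of xz-style arguments could look like the hypothetical helper below. It only recognizes a leading "-<digit>" (so "-9e" yields 9 and the extreme flag is ignored), lets later arguments override earlier ones, and falls back to 6, the xz default; how APT actually reads these arguments from its configuration is not shown.

```cpp
#include <string>
#include <vector>

// Hypothetical: extract a compression preset from xz-style arguments.
int CompressionLevelFromArgs(const std::vector<std::string> &args, int fallback = 6) {
    int level = fallback;
    for (const std::string &arg : args) {
        if (arg.size() >= 2 && arg[0] == '-' && arg[1] >= '0' && arg[1] <= '9')
            level = arg[1] - '0'; // later arguments override earlier ones here
    }
    return level;
}
```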
* follow dpkg and xz and use CRC64 for xz compression | David Kalnischkies | 2015-12-22 | 1 | -1/+1
  dpkg switched from CRC32 to CRC64 in 777915108d9d36d022dc4fc4151a615fc95e5032 with the message:
  | This is the default CRC used by the xz command-line tool, align with
  | it and switch from CRC32 to CRC64. It should provide slightly better
  | detection against damaged data, at a negligible speed difference.
* shuffle compressor-specific code into private subclasses | David Kalnischkies | 2015-12-22 | 2 | -635/+692
  This isn't implementing any new features, it is "just" moving code around from FileFd methods which decided on each call how to handle the request by including all logic for all possible compressor backends in the method body to a model in which backend-specifics are implemented in a FileFdPrivate subclass. This avoids a big chunk of #ifdef's and should make it a tiny bit more obvious which backend uses which code. The execution of the idea is slightly uglified by the need to preserve ABI and API which causes liberal befriending.
  Git-Dch: Ignore
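The refactoring replaces per-call branching over compressors with virtual dispatch: each backend gets a private subclass, and only a factory still chooses between them. The class names and the two virtual methods below are hypothetical and far smaller than the real FileFdPrivate interface, which also covers reading, writing and seeking.

```cpp
#include <memory>
#include <string>

// Hypothetical base class: one subclass per compressor backend.
class CompressorPrivate {
public:
    virtual ~CompressorPrivate() = default;
    virtual std::string Name() const = 0;
    virtual std::string Extension() const = 0;
};

class DirectPrivate final : public CompressorPrivate {
public:
    std::string Name() const override { return "uncompressed"; }
    std::string Extension() const override { return ""; }
};

class GzipPrivate final : public CompressorPrivate {
public:
    std::string Name() const override { return "gzip"; }
    std::string Extension() const override { return ".gz"; }
};

class XzPrivate final : public CompressorPrivate {
public:
    std::string Name() const override { return "xz"; }
    std::string Extension() const override { return ".xz"; }
};

// The only remaining switch over backends; callers just use virtual calls.
std::unique_ptr<CompressorPrivate> MakePrivate(const std::string &name) {
    if (name == "gzip") return std::unique_ptr<CompressorPrivate>(new GzipPrivate());
    if (name == "xz") return std::unique_ptr<CompressorPrivate>(new XzPrivate());
    return std::unique_ptr<CompressorPrivate>(new DirectPrivate());
}
```

With this shape, a backend that is not compiled in only needs its subclass and factory branch guarded, instead of an #ifdef inside every method.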