summaryrefslogtreecommitdiff
path: root/apt-pkg/pkgcachegen.h
Commit message (Collapse)AuthorAgeFilesLines
* Avoid undefined pointer arithmetic while growing mmapDavid Kalnischkies2021-02-041-1/+1
| | | | | | | | | The undefined behaviour sanitizer complains with: runtime error: addition of unsigned offset to 0x… overflowed to 0x… Compilers and runtime do the right thing in any case and it is a codepath that can (and ideally should) be avoided for speed reasons alone, but fixing it can't hurt (too much).
* Do not require libxxhash-dev for including pkgcachegen.hJulian Andres Klode2020-12-171-1/+3
|
* Use XXH3 for cache, hash table hashingJulian Andres Klode2020-12-151-5/+3
| | | | | | XXH3 is faster than both our CRC32c implementation as well as DJB hash for hash table hashing, so meh, let's switch to it.
* Make map_pointer<T> typesafeJulian Andres Klode2020-02-241-1/+1
| | | | | | | | | | | Instead of just using uint32_t, which would allow you to assign e.g. a map_pointer<Version> to a map_pointer<Package>, use our own smarter struct that has strict type checking. We allow creating a map_pointer from a nullptr, and we allow comparing map_pointer to nullptr, which also deals with comparisons against 0 which are often used, as 0 will be implictly converted to nullptr.
* Wrap AllocateInMap with a templated versionJulian Andres Klode2020-02-241-1/+4
|
* Replace map_pointer_t with map_pointer<T>Julian Andres Klode2020-02-241-8/+8
| | | | | | This is a first step to a type safe cache, adding typing information everywhere. Next, we'll replace map_pointer<T> implementation with a type safe one.
* Use a 32-bit djb VersionHash instead of CRC-16Julian Andres Klode2020-02-181-3/+3
|
* Remove includes of (md5|sha1|sha2).h headersJulian Andres Klode2020-01-141-1/+0
| | | | Remove it everywhere, except where it is still needed.
* Avoid extra out-of-cache hash table deduplication for package namesJulian Andres Klode2020-01-081-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We were de-duplicating package name strings in StoreString, but also deduplicating most of them by them being in groups, so we had extra hash table lookups that could be avoided in NewGroup(). To continue deduplicating names across binary packages and source packages, insert groups for source packages as well. This is also a good first step in allowing efficient lookup of packages by source package - we can extend Group later by a list of SourceVersion objects, or alternatively, simply add a by-source chain into pkgCache::Version. This change improves performance by about 10% (913 to 814 ms), while having no significant overhead on the cache size: --- before +++ after @@ -1,7 +1,7 @@ -Total package names: 109536 (2.191 k) -Total package structures: 118689 (4.748 k) +Total package names: 119642 (2.393 k) +Total package structures: 118687 (4.747 k) Normal packages: 83309 - Pure virtual packages: 3365 + Pure virtual packages: 3363 Single virtual packages: 17811 Mixed virtual packages: 1973 Missing: 12231 @@ -10,21 +10,21 @@ Total distinct descriptions: 149291 (3.583 k) Total dependencies: 484135/156650 (12,2 M) Total ver/file relations: 57421 (1.378 k) Total Desc/File relations: 18219 (437 k) -Total Provides mappings: 29963 (719 k) +Total Provides mappings: 29959 (719 k) Total globbed strings: 226993 (5.332 k) Total slack space: 26,8 k -Total space accounted for: 38,1 M +Total space accounted for: 38,3 M Total buckets in PkgHashTable: 50503 - Unused: 5727 - Used: 44776 - Utilization: 88.6601% - Average entries: 2.65073 + Unused: 5728 + Used: 44775 + Utilization: 88.6581% + Average entries: 2.65074 Longest: 60 Shortest: 1 Total buckets in GrpHashTable: 50503 - Unused: 5727 - Used: 44776 - Utilization: 88.6601% - Average entries: 2.44631 - Longest: 10 + Unused: 4649 + Used: 45854 + Utilization: 90.7946% + Average entries: 2.60919 + Longest: 11 Shortest: 1
* Make APT::StringView publicJulian Andres Klode2019-06-111-20/+0
|
* pkgcachegen: Remove deprecated functionsJulian Andres Klode2019-02-261-4/+0
|
* Reformat and sort all includes with clang-formatJulian Andres Klode2017-07-121-2/+2
| | | | | | | | | | | | | This makes it easier to see which headers includes what. The changes were done by running git grep -l '#\s*include' \ | grep -E '.(cc|h)$' \ | xargs sed -i -E 's/(^\s*)#(\s*)include/\1#\2 include/' To modify all include lines by adding a space, and then running ./git-clang-format.sh.
* Drop cacheiterators.h includeJulian Andres Klode2017-07-121-1/+0
| | | | | Including cacheiterators.h before pkgcache.h fails because pkgcache.h depends on cacheiterators.h.
* Do not use MD5SumValue for Description_md5()Julian Andres Klode2016-11-221-5/+11
| | | | | | | | | | | Our profile says we spend about 5% of the time transforming the hex digits into the binary format used by HashsumValue, all for comparing them against the other strings. That makes no sense at all. According to callgrind, this reduces the overall instruction count from 5,3 billion to 5 billion in my example, which roughly matches the 5%.
* deal better with (very) small apt::cache-start valuesDavid Kalnischkies2016-01-271-0/+1
| | | | | | | | | It is a bit academic to support values which aren't big enough to fit even the hashtables without resizing, but cleaning up ensures that we do the right thing (aka not segfaulting) even if something goes wrong in these deep layers. You still can't have very very small values through… Git-Dch: Ignore
* convert Version() and Architecture() to APT::StringViewDavid Kalnischkies2016-01-261-8/+10
| | | | | | Part of hidden classes, so conversion is abi-free. Git-Dch: Ignore
* remove unused Description methods in listparsersDavid Kalnischkies2016-01-261-1/+0
| | | | | | | These virtual methods are implemented in hidden classes, so we can drop them without breaking the ABI. Git-Dch: Ignore
* Delete copy constructor and operator= for DynamicJulian Andres Klode2016-01-261-0/+5
| | | | | | | | This would mess up reference counting and should not be allowed (it could be implemented correctly, but it would not be efficient and we do not need it). Gbp-Dch: ignore
* Pass the old map size to ReMap()Julian Andres Klode2016-01-231-1/+1
| | | | | | | This allows us to check if a value to be remapped was inside the cache or not, which will become useful at a later point. Gbp-Dch: ignore
* pkgCacheGenerator::hash: Do not call tolower_ascii()Julian Andres Klode2016-01-081-1/+1
| | | | Gbp-Dch: ignore
* pkgCacheGenerator::StoreString: Get rid of std::stringJulian Andres Klode2016-01-081-5/+29
| | | | | | | | | Instead of storing a string -> map_stringitem_t mapping, create our own data type that can point to either a normal string or a string inside the cache. This avoids the creation of any string and improves performance slightly (about 4%).
* pkgCacheGenerator: Use StringView for toStringJulian Andres Klode2016-01-081-2/+7
| | | | | | This removes some minor overhead. Gbp-Dch: ignore
* Switch performance critical code to use APT::StringViewJulian Andres Klode2016-01-071-9/+15
| | | | | | This improves performance of the cache generation on my ARM platform (4x Cortex A15) by about 10% to 20% from 2.35-2.50 to 2.1 seconds.
* pkgCacheGenerator: Allow passing down an already created cacheJulian Andres Klode2015-12-291-0/+2
| | | | | If we already have opened a cache, there is no point in having to open it again.
* pkgcachegen.h: Hack around unordered_map not existing before C++11Julian Andres Klode2015-12-271-0/+5
| | | | | This is for public users only, which cannot use the class at all, except for the static methods.
* pkgcachegen: Use std::unordered_map instead of std::mapJulian Andres Klode2015-12-271-5/+5
| | | | | std::unordered_map is faster than std::map in our use case, reducing cache generation time by about 10% in my benchmark.
* add messages to our deprecation warnings in libaptDavid Kalnischkies2015-11-271-2/+2
| | | | Git-Dch: Ignore
* Drop C++11 elements from headersJulian Andres Klode2015-08-111-1/+0
|
* eliminate dead file-provides code in cache generationDavid Kalnischkies2015-08-101-13/+2
| | | | | | | | | | The code was never active in production, it just sits there collecting dust and given that it is never tested probably doesn't even work anymore the way it was supposed to be (whatever that was exactly in the first place). So just remove it before I have to "fix" it again next time. Git-Dch: Ignore
* elimate duplicated code in pkgIndexFile subclassesDavid Kalnischkies2015-08-101-8/+9
| | | | | | | | Trade deduplication of code for a bunch of new virtuals, so it is actually visible how the different indexes behave cleaning up the interface at large in the process. Git-Dch: Ignore
* add volatile sources support in libapt-pkgDavid Kalnischkies2015-08-101-1/+0
| | | | | | | | | | | | | | | | | | | Sources are usually defined in sources.list (and co) and are pretty stable, but once in a while a frontend might want to add an additional "source" like a local .deb file to install this package (No support for 'real' sources being added this way as this is a multistep process). We had a hack in place to allow apt-get and apt to pull this of for a short while now, but other frontends are either left in the cold by this and/or the code for it looks dirty with FIXMEs plastering it and has on top of this also some problems (like including these 'volatile' sources in the srcpkgcache.bin file). So the biggest part in this commit is actually the rewrite of the cache generation as it is now potentially a three step process. The biggest problem with adding support now through is that this makes a bunch of previously mostly unusable by externs and therefore hidden classes public, so a bit of further tuneing on this now public API is in order…
* just-in-time creation for (explicit) negative depsDavid Kalnischkies2015-08-101-4/+1
| | | | | | | | | | | | | | Now that we deal with provides in a more dynamic fashion the last remaining problem is explicit dependencies like 'Conflicts: foo' which have to apply to all architectures, but creating them all at the same time requires us to know all architectures ending up in the cache which isn't needed to be the same set as all foreign architectures. The effect is visible already now through as this prevents the creation of a bunch of virtual packages for arch:all packages and as such also many dependencies, just not very visible if you don't look at the stats… Git-Dch Ignore
* just-in-time creation for (implicit) ProvidesDavid Kalnischkies2015-08-101-0/+4
| | | | | | | | | | | | | | | | Expecting the worst is easy to code, but has its disadvantages e.g. by creating package structures which otherwise would have never existed. By creating the provides instead at the time a package structure is added we are well prepared for the introduction of partial architectures, massive amounts of M-A:foreign (and :allowed) and co as far as provides are concerned at least. We have something relatively similar for dependencies already. Many tests are added for both M-A states and the code cleaned to properly support implicit provides for foreign architectures and architectures we 'just' happen to parse. Git-Dch: Ignore
* hide implicit deps in apt-cache again by defaultDavid Kalnischkies2015-08-101-7/+8
| | | | | | | | | | | | | | Before MultiArch implicits weren't a thing, so they were hidden by default by definition. Adding them for MultiArch solved many problems, but having no reliable way of detecting which dependency (and provides) is implicit or not causes problems everytime we want to output dependencies without confusing our observers with unneeded implementation details. The really notworthy point here is actually that we keep now a better record of how a dependency came to be so that we can later reason about it more easily, but that is hidden so deep down in the library internals that change is more the problems it solves than the change itself.
* make all d-pointer * const pointersDavid Kalnischkies2015-08-101-2/+2
| | | | | | | | | | | | | | Doing this disables the implicit copy assignment operator (among others) which would cause hovac if used on the classes as it would just copy the pointer, not the data the d-pointer points to. For most of the classes we don't need a copy assignment operator anyway and in many classes it was broken before as many contain a pointer of some sort. Only for our Cacheset Container interfaces we define an explicit copy assignment operator which could later be implemented to copy the data from one d-pointer to the other if we need it. Git-Dch: Ignore
* apply various style suggestions by cppcheckDavid Kalnischkies2015-08-101-1/+1
| | | | | | | Some of them modify the ABI, but given that we prepare a big one already, these few hardly count for much. Git-Dch: Ignore
* add d-pointer, virtual destructors and de-inline de/constructorsDavid Kalnischkies2015-06-161-9/+8
| | | | | | | | To have a chance to keep the ABI for a while we need all three to team up. One of them missing and we might loose, so ensuring that they are available is a very tedious but needed task once in a while. Git-Dch: Ignore
* populate the Architecture field for PackageFilesDavid Kalnischkies2015-06-151-1/+1
| | | | | | | | | | | This is mainly visible in the policy, so that you can now pin by b= and let it only effect Packages files of this architecture and hence the packages coming from it (which do not need to be from this architecture, but very likely are in a normal repository setup). If you should pin by architecture in this way is a different question… Closes: 687255
* store Release files data in the CacheDavid Kalnischkies2015-06-121-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | We used to read the Release file for each Packages file and store the data in the PackageFile struct even through potentially many Packages (and Translation-*) files could use the same data. The point of the exercise isn't the duplicated data through. Having the Release files as first-class citizens in the Cache allows us to properly track their state as well as allows us to use the information also for files which aren't in the cache, but where we know to which Release file they belong (Sources are an example for this). This modifies the pkgCache structs, especially the PackagesFile struct which depending on how libapt users access the data in these structs can mean huge breakage or no visible change. As a single data point: aptitude seems to be fine with this. Even if there is breakage it is trivial to fix in a backportable way while avoiding breakage for everyone would be a huge pain for us. Note that not all PackageFile structs have a corresponding ReleaseFile. In particular the dpkg/status file as well as *.deb files have not. As these have only a Archive property need, the Component property takes over this duty and the ReleaseFile remains zero. This is also the reason why it isn't needed nor particularily recommended to change from PackagesFile to ReleaseFile blindly. Sticking with the earlier is usually the better option.
* mark internal interfaces as hiddenDavid Kalnischkies2014-11-081-5/+5
| | | | | | | We have a bunch of classes which are of no use for the outside world, but were still exported and so needed to preserve ABI/API. Marking them as hidden to not export them any longer is a big API break in theory, but in practice nobody is using them – as if they would its a bug.
* use a abi version check similar to the gcc checkDavid Kalnischkies2014-11-081-1/+1
| | | | Git-Dch: Ignore
* rename StringType VERSION to VERSIONNUMBERDavid Kalnischkies2014-10-031-2/+2
| | | | | | | aptitude has a define for VERSION, so to not generate a FTBFS we just rename our enum element to a slightly less generic name. Git-Dch: Ignore
* fix: Member variable 'X' is not initialized in the constructor.David Kalnischkies2014-09-271-1/+1
| | | | | Reported-By: cppcheck Git-Dch: Ignore
* drop stored StringItems in favor of in-memory mappingsDavid Kalnischkies2014-09-271-6/+12
| | | | | | | | | | | | | | | Strings like Section names or architectures are needed vary often. Instead of writing them each time we need them, we deploy sharing for these special strings. Until now, this was done with a linked list of strings in which we would search, which was stored in the cache. It turns out we can do this just as well in memory as well with a bunch of std::map's. In memory means here that it isn't available anymore if we have a partly invalid cache, but that isn't much of a problem in practice as the status file is compared to the other files we parse very small and includes mostly duplicates, so the space we would gain by storing is more or less equal to the size of the stored linked list…
* cleanup datatypes mix used in binary cacheDavid Kalnischkies2014-06-181-22/+22
| | | | | | | | We had a wild mixture of (unsigned) int, long and long long here without much sense, so this commit adds a few typedefs to get some sense in the typesystem and ensures that a ID isn't sometimes computed as int, stored as long and compared with a long long… as this could potentially bite us later on as the size of the archive only increases over time.
* parse and retrieve multiple Descriptions in one recordDavid Kalnischkies2014-05-091-2/+5
| | | | | | | | It seems unlikely for now that proper archives will carry multiple Description-* stanzas in the Packages (or Translation-*) file, but sometimes apt eats its own output as shown by the usage of the CD team and it would be interesting to let apt output multiple translations e.g. in 'apt-cache show'.
* mark optional (private) symbols as hiddenDavid Kalnischkies2014-03-211-9/+9
| | | | | | | | This methods should not be used by anyone expect the library itself as they are helpers for the specific class and therefore perfect candidates for hidding. Git-Dch: Ignore
* abstract version hash comparison a bitDavid Kalnischkies2014-03-131-0/+9
| | | | | | | | In #737085 we see that apt can be confused if informations about versions only differ slightly. This commit adds a way of at least adding a few more data points with the next abi break to help a bit with it. Git-Dch: Ignore
* follow method attribute suggestions by gccDavid Kalnischkies2014-03-131-1/+1
| | | | | Git-Dch: Ignore Reported-By: gcc -Wsuggest-attribute={pure,const,noreturn}
* cleanup headers and especially #includes everywhereDavid Kalnischkies2014-03-131-5/+7
| | | | | | | | Beside being a bit cleaner it hopefully also resolves oddball problems I have with high levels of parallel jobs. Git-Dch: Ignore Reported-By: iwyu (include-what-you-use)