path: root/apt-pkg/pkgcachegen.cc
Commit message (Author, Age, Files, Lines)
* Use size of the old cache as APT::Cache-Start default (David Kalnischkies, 2021-02-04, 1 file, -1/+10)
  Depending on your configured sources, 25 MB is hardly enough, so the mmap housing the cache while it is built has to grow. Repeatedly. We can cut down on the repeats of this by keeping a record of the size of the old cache, assuming the sizes will remain roughly in the same ballpark.
* Avoid undefined pointer arithmetic while growing mmap (David Kalnischkies, 2021-02-04, 1 file, -40/+23)
  The undefined behaviour sanitizer complains with:
    runtime error: addition of unsigned offset to 0x… overflowed to 0x…
  Compilers and runtime do the right thing in any case and it is a codepath that can (and ideally should) be avoided for speed reasons alone, but fixing it can't hurt (too much).
* pkgcachegen: Avoid write to old cache for Version::Extra (Julian Andres Klode, 2021-01-13, 1 file, -1/+2)
  Assigning the result of AllocateInMap directly to Ver->d caused Ver->d to be resolved first, and hence if Ver was remapped during the AllocateInMap, we were trying to assign to the old value.
  Closes: #980037
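The hazard in this commit can be sketched in isolation. The `Arena` and `Record` types and the `AttachExtra` helper below are hypothetical stand-ins, not apt's classes; the sketch only assumes that allocating may move existing records, as growing the mmap can.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a growable cache: records are addressed by
// index, because growing the storage may move every record in memory.
struct Record {
    std::uint32_t extra = 0;
};

struct Arena {
    std::vector<Record> records;

    // Appends a record and returns its index; may reallocate `records`.
    std::uint32_t Allocate() {
        records.emplace_back();
        return static_cast<std::uint32_t>(records.size() - 1);
    }

    Record &At(std::uint32_t idx) { return records[idx]; }
};

// Safe pattern: finish the allocation first, then resolve the destination.
// Before C++17 a compiler was free to evaluate the left side of
// `arena.At(ver).extra = arena.Allocate();` first and then write through
// a reference into storage that the allocation had already moved.
inline std::uint32_t AttachExtra(Arena &arena, std::uint32_t ver) {
    std::uint32_t const extra = arena.Allocate(); // may move the records
    arena.At(ver).extra = extra;                  // resolved after the move
    return extra;
}
```

Storing the result in a local first makes the order explicit regardless of the language standard in use.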
* Add support for Phased-Update-Percentage (Julian Andres Klode, 2021-01-08, 1 file, -0/+4)
  This adds support for Phased-Update-Percentage by pinning upgrades that are not to be installed down to 1.

  The output of policy has been changed to add the level of phasing, and documentation has been improved to document how phased updates work.

  The patch detects if it is running in a chroot, and if so, always includes phased updates, restoring classic apt behavior to avoid behavioral changes on buildd chroots.

  Various options are added to control this all:

  * APT::Get::{Always,Never}-Include-Phased-Updates and their legacy update-manager equivalents to always or never include phased updates
  * APT::Machine-ID can be set to a UUID string to have all machines in a fleet phase the same
  * Dir::Etc::Machine-ID is weird in that its default is sort of like ../machine-id, but not really, as ../machine-id would look up $PWD/../machine-id and not relative to Dir::Etc; but it allows you to override the path to machine-id (as opposed to the value)
  * Dir::Bin::ischroot is the path to the ischroot(1) binary which is used to detect whether we are running in a chroot.
* apt-pkg: default visibility to hidden (Julian Andres Klode, 2020-02-26, 1 file, -1/+0)
* Silence narrow conversion warnings, add error checks (Julian Andres Klode, 2020-02-25, 1 file, -7/+20)
  When converting a long offset to a uint32_t to be stored in the map, check that this is safe to do. If the offset is negative, or we lose data in the conversion, we lost.
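A checked narrowing of this kind might look as follows. `ToMapOffset` is a hypothetical helper name, and the sketch takes `long long` rather than `long` so that the range check is meaningful on platforms where `long` is 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: converts a signed offset to the 32-bit form stored
// in the map. Returns false (leaving `out` untouched) if the offset is
// negative or too large to be represented without losing bits.
inline bool ToMapOffset(long long offset, std::uint32_t &out) {
    if (offset < 0)
        return false;
    if (static_cast<unsigned long long>(offset) >
        std::numeric_limits<std::uint32_t>::max())
        return false;
    out = static_cast<std::uint32_t>(offset);
    return true;
}
```

A caller would treat a `false` result as a hard error rather than storing a truncated offset.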
* Make map_pointer<T> typesafe (Julian Andres Klode, 2020-02-24, 1 file, -21/+22)
  Instead of just using uint32_t, which would allow you to assign e.g. a map_pointer<Version> to a map_pointer<Package>, use our own smarter struct that has strict type checking. We allow creating a map_pointer from a nullptr, and we allow comparing map_pointer to nullptr, which also deals with comparisons against 0 which are often used, as 0 will be implicitly converted to nullptr.
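The idea can be illustrated with a much simplified typed offset. `TypedOffset` is a hypothetical name chosen for this sketch, and the real map_pointer<T> in apt may well differ in its members and conversions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical simplified typed offset: the template parameter exists only
// to make offsets to different record types incompatible with each other.
template <typename T>
class TypedOffset {
    std::uint32_t val;

public:
    TypedOffset() noexcept : val(0) {}
    TypedOffset(std::nullptr_t) noexcept : val(0) {}
    explicit TypedOffset(std::uint32_t v) noexcept : val(v) {}

    std::uint32_t get() const noexcept { return val; }

    bool operator==(TypedOffset const &o) const noexcept { return val == o.val; }
    bool operator!=(TypedOffset const &o) const noexcept { return val != o.val; }
    // A literal 0 is a null pointer constant, so `p == 0` selects these.
    bool operator==(std::nullptr_t) const noexcept { return val == 0; }
    bool operator!=(std::nullptr_t) const noexcept { return val != 0; }
};

struct Package {};
struct Version {};

// Offsets to different record types no longer convert into each other.
static_assert(!std::is_convertible<TypedOffset<Version>, TypedOffset<Package>>::value,
              "offsets of unrelated types must not be interchangeable");
```

With a plain uint32_t the static_assert would have no equivalent, since every offset would share one type.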
* Wrap AllocateInMap with a templated version (Julian Andres Klode, 2020-02-24, 1 file, -14/+16)
* Replace map_pointer_t with map_pointer<T> (Julian Andres Klode, 2020-02-24, 1 file, -36/+36)
  This is a first step to a type safe cache, adding typing information everywhere. Next, we'll replace map_pointer<T> implementation with a type safe one.
* Use a 32-bit djb VersionHash instead of CRC-16 (Julian Andres Klode, 2020-02-18, 1 file, -3/+3)
* NewGroup: Create GrpIterator after allocation (fix segfault) (Julian Andres Klode, 2020-01-27, 1 file, -1/+2)
  NewGroup created a GrpIterator and then called WriteStringInMap() which might remap the cache, causing the iterator to go invalid. Avoid this simply by creating the iterator later on.
* NewProvidesAllArch: Check if group is empty before using it (Julian Andres Klode, 2020-01-16, 1 file, -1/+1)
  APT 1.9.6 introduced empty groups by making use of groups to deduplicate package names. This is not normally a problem, but here we assumed that every group has at least one package.

  This caused a problem because automake was providing automake-1.16 while having the source package automake-1.16. So we found the automake-1.16 group, iterated over its empty package list, trying to store the provides (which hence never happened).
  LP: #1859952
* Remove includes of (md5|sha1|sha2).h headers (Julian Andres Klode, 2020-01-14, 1 file, -1/+0)
  Remove it everywhere, except where it is still needed.
* Avoid extra out-of-cache hash table deduplication for package names (Julian Andres Klode, 2020-01-08, 1 file, -2/+1)
  We were de-duplicating package name strings in StoreString, but also deduplicating most of them by them being in groups, so we had extra hash table lookups that could be avoided in NewGroup().

  To continue deduplicating names across binary packages and source packages, insert groups for source packages as well. This is also a good first step in allowing efficient lookup of packages by source package - we can extend Group later by a list of SourceVersion objects, or alternatively, simply add a by-source chain into pkgCache::Version.

  This change improves performance by about 10% (913 to 814 ms), while having no significant overhead on the cache size:

    --- before
    +++ after
    @@ -1,7 +1,7 @@
    -Total package names: 109536 (2.191 k)
    -Total package structures: 118689 (4.748 k)
    +Total package names: 119642 (2.393 k)
    +Total package structures: 118687 (4.747 k)
       Normal packages: 83309
    -  Pure virtual packages: 3365
    +  Pure virtual packages: 3363
       Single virtual packages: 17811
       Mixed virtual packages: 1973
       Missing: 12231
    @@ -10,21 +10,21 @@
     Total distinct descriptions: 149291 (3.583 k)
     Total dependencies: 484135/156650 (12,2 M)
     Total ver/file relations: 57421 (1.378 k)
     Total Desc/File relations: 18219 (437 k)
    -Total Provides mappings: 29963 (719 k)
    +Total Provides mappings: 29959 (719 k)
     Total globbed strings: 226993 (5.332 k)
     Total slack space: 26,8 k
    -Total space accounted for: 38,1 M
    +Total space accounted for: 38,3 M
     Total buckets in PkgHashTable: 50503
    -  Unused: 5727
    -  Used: 44776
    -  Utilization: 88.6601%
    -  Average entries: 2.65073
    +  Unused: 5728
    +  Used: 44775
    +  Utilization: 88.6581%
    +  Average entries: 2.65074
       Longest: 60
       Shortest: 1
     Total buckets in GrpHashTable: 50503
    -  Unused: 5727
    -  Used: 44776
    -  Utilization: 88.6601%
    -  Average entries: 2.44631
    -  Longest: 10
    +  Unused: 4649
    +  Used: 45854
    +  Utilization: 90.7946%
    +  Average entries: 2.60919
    +  Longest: 11
       Shortest: 1
* pkgcachegen: Remove deprecated functions (Julian Andres Klode, 2019-02-26, 1 file, -5/+0)
* pkgcache: Remove deprecated bits (Julian Andres Klode, 2019-02-26, 1 file, -3/+0)
* Fix typo reported by codespell in code comments (David Kalnischkies, 2018-11-25, 1 file, -1/+1)
  No user visible change except for some years old changelog entries, so we don't really need to add a new one for this…
  Reported-By: codespell
  Gbp-Dch: Ignore
* Remove obsolete RCS keywords (Guillem Jover, 2018-05-07, 1 file, -1/+0)
  Prompted-by: Jakub Wilk <jwilk@debian.org>
* Fix various typos reported by spellcheckers (David Kalnischkies, 2018-05-05, 1 file, -1/+1)
  Reported-By: codespell & spellintian
  Gbp-Dch: Ignore
* do not remap current files if nullptrs in cache generation (David Kalnischkies, 2017-12-24, 1 file, -10/+11)
  If the cache needs to grow to make room to insert volatile files like deb files into the cache, we were remapping null-pointers, making them non-null-pointers in the process and causing trouble later on. Only the current Releasefile pointer can currently legally be a nullpointer as volatile files have no release file they belong to, but for safety the pointer to the current Packages file is equally guarded.

  The option APT::Cache-Start can be used to work around this problem.
  Reported-By: Mattia Rizzolo on IRC
* convert various c-style casts to C++-style (David Kalnischkies, 2017-12-13, 1 file, -4/+4)
  gcc was warning about ignored type qualifiers for all of them due to the last 'const', so dropping that and converting to static_cast in the process removes the warning, which is harmless here, so that it cannot hide real issues in them later on.
  Reported-By: gcc
  Gbp-Dch: Ignore
* deprecate the single-line deprecation ignoring macro (David Kalnischkies, 2017-12-13, 1 file, -1/+3)
  gcc has problems understanding this construct and additionally thinks it would produce multiple lines and stuff, so keeping it isn't really worth it for the few instances we have: we can just write the long form there, which works better.
  Reported-By: gcc
  Gbp-Dch: Ignore
* Reformat and sort all includes with clang-format (Julian Andres Klode, 2017-07-12, 1 file, -12/+12)
  This makes it easier to see which header includes what.

  The changes were done by running

    git grep -l '#\s*include' \
      | grep -E '.(cc|h)$' \
      | xargs sed -i -E 's/(^\s*)#(\s*)include/\1#\2 include/'

  to modify all include lines by adding a space, and then running ./git-clang-format.sh.
* Drop cacheiterators.h include (Julian Andres Klode, 2017-07-12, 1 file, -1/+0)
  Including cacheiterators.h before pkgcache.h fails because pkgcache.h depends on cacheiterators.h.
* avoid validate/delete/load race in cache generation (David Kalnischkies, 2017-01-19, 1 file, -28/+31)
  Keeping the Fd of the cache file we have validated around to later load it into the mmap ensures not only that we load the same file (which wouldn't really be a problem in practice), but that this file also still exists and wasn't deleted e.g. by an 'apt clean' call run in parallel.
* Do not use MD5SumValue for Description_md5() (Julian Andres Klode, 2016-11-22, 1 file, -13/+13)
  Our profile says we spend about 5% of the time transforming the hex digits into the binary format used by HashsumValue, all for comparing them against the other strings. That makes no sense at all.

  According to callgrind, this reduces the overall instruction count from 5,3 billion to 5 billion in my example, which roughly matches the 5%.
* Compare size before data when ordering cache bucket entries (Julian Andres Klode, 2016-11-22, 1 file, -2/+2)
  This has the effect of significantly reducing actual string comparisons, and should improve the performance of FindGrp a bit, although it's hardly measurable (callgrind says it uses 10% fewer instructions now).
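A size-first ordering can be sketched as below. `CompareSizeFirst` is a hypothetical function name; the sketch only assumes that each entry's length is already known, so the byte comparison is needed only when two lengths match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical three-way comparison for bucket entries: order by length
// first, and only compare the bytes when both lengths are equal.
inline int CompareSizeFirst(char const *a, std::size_t alen,
                            char const *b, std::size_t blen) {
    if (alen != blen)
        return alen < blen ? -1 : 1;
    return std::memcmp(a, b, alen);
}
```

The resulting order is not alphabetical, but it is consistent, which is all a sorted bucket chain needs for an early exit during lookup.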
* do not treat same-version local debs as downgrade (David Kalnischkies, 2016-07-01, 1 file, -2/+15)
  As the volatile sources are parsed last they were sorted behind the dpkg/status file and hence are treated as a downgrade, which isn't really what you want to happen as from a user POV it's an upgrade.
* Prevent double remapping of iterators and string views (Julian Andres Klode, 2016-03-06, 1 file, -8/+22)
  If an iterator or a stringview has multiple dynamic objects registered with it, it may be remapped twice. Prevent that by noting which iterators/views we have seen and not remapping one if we have already seen it.

  We most likely do not have any instance of multiple dynamics on a single object, but let's play safe - the overhead is not high.
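The "seen" bookkeeping can be sketched with a small registry of pointer slots. `RemapRegistry` and its members are hypothetical names for this illustration and do not mirror apt's Dynamic helper.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical registry: several registrations may refer to the same
// pointer variable, but each variable must be shifted once per remap.
struct RemapRegistry {
    std::vector<char const **> registered;

    void Remap(char const *oldBase, char const *newBase) {
        std::set<char const **> seen;
        for (char const **slot : registered) {
            if (!seen.insert(slot).second)
                continue; // already adjusted during this remap
            *slot = newBase + (*slot - oldBase);
        }
    }
};
```

Without the `seen` set, a slot registered twice would be shifted twice and end up pointing outside the new mapping.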
* deal better with (very) small apt::cache-start values (David Kalnischkies, 2016-01-27, 1 file, -16/+24)
  It is a bit academic to support values which aren't big enough to fit even the hashtables without resizing, but cleaning up ensures that we do the right thing (aka not segfaulting) even if something goes wrong in these deep layers.

  You still can't have very very small values though…
  Git-Dch: Ignore
* convert Version() and Architecture() to APT::StringView (David Kalnischkies, 2016-01-26, 1 file, -12/+15)
  Part of hidden classes, so conversion is abi-free.
  Git-Dch: Ignore
* reimplement build-dep via apts normal resolver (David Kalnischkies, 2016-01-25, 1 file, -1/+1)
  build-dep was implemented by parsing the build-dependencies of a package and figuring out which packages to install/remove based on this. That means that for the first level of dependencies build-dep was implementing its very own resolver with all the benefits (aka: bugs) this gives us for not using the existing resolver for all levels.

  Making this work involves generating a dummy binary package with fitting Depends and Conflicts and as we can't create them out of thin air the cache generation needs to be involved so we end up writing a Packages file which we want to parse – after we have parsed the other Packages files already. With .dsc/.deb files we could add them before we started parsing anything.

  With a bit of care we can avoid generating too much data we have to throw away again (as many parts assume that e.g. the count of packages doesn't change midair), so that on a speed front there shouldn't be much of a difference, but output can be slightly confusing as if we have a completely valid cache on disk the "Reading package lists... Done" is printed two times – but apt is pretty quick about it in that case.
  Closes: #137560, #444930, #489911, #583914, #728317, #812173
* use consistently the last : as name:arch separator (David Kalnischkies, 2016-01-25, 1 file, -1/+1)
  Proper debian packages do not contain ':' in the package name, so for real packages this is a non-issue, but apt itself frequently makes use of packages with such an illegal name for internal purposes.
  Git-Dch: Ignore
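Splitting at the last ':' can be sketched with std::string::rfind. `SplitNameArch` is a hypothetical helper and the package names in the usage note are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: splits "name:arch" at the last ':' so that names
// which themselves contain ':' keep everything before the final separator.
// A string without ':' yields the whole string as name and an empty arch.
inline std::pair<std::string, std::string> SplitNameArch(std::string const &full) {
    std::string::size_type const pos = full.rfind(':');
    if (pos == std::string::npos)
        return std::make_pair(full, std::string());
    return std::make_pair(full.substr(0, pos), full.substr(pos + 1));
}
```

For example, `SplitNameArch("a:b:amd64")` yields the name `a:b` and the architecture `amd64`, whereas splitting at the first ':' would have produced the name `a`.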
* always create pkg at the time pkg:arch is created (David Kalnischkies, 2016-01-25, 1 file, -16/+29)
  To resolve dependencies like "pkg:arch" we create a package with the name "pkg:arch" and the architecture "any". We create these packages only if a dependency needs it as these kinds of dependencies aren't that common. This commit ensures that in the event this architecture-specific dependency is the only relation this package has, we still create the underlying package to have it available in provides resolution.
* Remap another (non-parameter) StringView (Julian Andres Klode, 2016-01-23, 1 file, -1/+3)
  I only looked at parameters in the previous commit, which was not enough: One place also generated local string views. In this case, we only need to make ArchA dynamic, as NameA is not used after the FindPkg() call.
  Gbp-Dch: ignore
* Remap StringView instances pointing into the cache (Julian Andres Klode, 2016-01-23, 1 file, -0/+19)
  It turns out that StringViews might need to be remapped in some places because they come from the cache. For example, some sites pass a Ver.VerStr() to NewProvides(). Such a StringView would become invalid during the duration of the call if the cache is remapped, causing the program to die with a segmentation fault.

  We can take care of those issues by remapping string views in the same way we remap all the iterators. String views are only remapped if they point into the cache though, this allows us to write more generic code on the callee site without having to check whether the view points into the cache or not. That's not as efficient as possible, but the overhead does not appear to be measurable.
  Closes: #812251
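The conditional remap can be sketched as a range check against the old mapping. `View` and `RemapView` are hypothetical names standing in for APT::StringView and its remapping code; the sketch assumes the caller knows the old base address and size.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical view type standing in for APT::StringView.
struct View {
    char const *data;
    std::size_t size;
};

// Shifts `v` to the new mapping only if it pointed into the old one;
// views of unrelated memory (e.g. string literals) are left untouched.
inline void RemapView(View &v, char const *oldMap, std::size_t oldSize,
                      char const *newMap) {
    // std::less gives a total order even for pointers to unrelated objects.
    std::less<char const *> before;
    if (before(v.data, oldMap) || !before(v.data, oldMap + oldSize))
        return;
    v.data = newMap + (v.data - oldMap);
}
```

Because the check is done inside the helper, callers can register any view without knowing where its data lives.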
* Pass the old map size to ReMap() (Julian Andres Klode, 2016-01-23, 1 file, -4/+7)
  This allows us to check if a value to be remapped was inside the cache or not, which will become useful at a later point.
  Gbp-Dch: ignore
* fix M-A:foreign provides creation for unknown archs (David Kalnischkies, 2016-01-14, 1 file, -4/+12)
  Architectures for packages which do not belong to the native nor a foreign architecture (dubbed barbarian for now) which are marked M-A:foreign still provide in their own architecture even if not for others. Also, other M-A:foreign (and allowed) packages provide in these barbarian architectures.
* Store the size of strings in the cache (Julian Andres Klode, 2016-01-08, 1 file, -1/+2)
  By storing the size of the string in the cache, we can make use of it when comparing the names in the hashtable in pkgCache::FindGrp.
* pkgCacheGenerator: CurMd5.Value() cannot be empty (Julian Andres Klode, 2016-01-08, 1 file, -4/+0)
  It makes no sense to check if the value is empty, as it cannot be. It will always be a hexstring of exactly 32 bytes.
* pkgCacheGenerator::StoreString: Get rid of std::string (Julian Andres Klode, 2016-01-08, 1 file, -7/+5)
  Instead of storing a string -> map_stringitem_t mapping, create our own data type that can point to either a normal string or a string inside the cache.

  This avoids the creation of any string and improves performance slightly (about 4%).
* Replace compare() == 0 checks with this == other checks (Julian Andres Klode, 2016-01-08, 1 file, -1/+1)
  This improves performance, as we now can ignore unequal strings based on their length already.
  Gbp-Dch: ignore
* pkgCacheGenerator: Use StringView for toString (Julian Andres Klode, 2016-01-08, 1 file, -5/+5)
  This removes some minor overhead.
  Gbp-Dch: ignore
* pkgCacheGenerator::StoreString: Move the string into the map (Julian Andres Klode, 2016-01-08, 1 file, -2/+2)
  Moving the string is likely faster than copying it. We could probably avoid strings altogether in the future using some more crazy code, but I have not looked at that yet.
  Gbp-Dch: ignore
* Switch performance critical code to use APT::StringView (Julian Andres Klode, 2016-01-07, 1 file, -21/+22)
  This improves performance of the cache generation on my ARM platform (4x Cortex A15) by about 10% to 20% from 2.35-2.50 to 2.1 seconds.
* Do not sync the cache file (Julian Andres Klode, 2015-12-29, 1 file, -2/+0)
  Integrity is taken care of by the checksum now.
* Add support for calculating hashes over the entire cache (Julian Andres Klode, 2015-12-29, 1 file, -1/+7)
* pkgCacheGenerator: Allow passing down an already created cache (Julian Andres Klode, 2015-12-29, 1 file, -3/+13)
  If we already have opened a cache, there is no point in having to open it again.
* pkgcachegen: Use std::unordered_map instead of std::map (Julian Andres Klode, 2015-12-27, 1 file, -2/+2)
  std::unordered_map is faster than std::map in our use case, reducing cache generation time by about 10% in my benchmark.
* add messages to our deprecation warnings in libapt (David Kalnischkies, 2015-11-27, 1 file, -2/+2)
  Git-Dch: Ignore