summaryrefslogtreecommitdiff
path: root/apt-pkg/pkgcache.cc
Commit message (Collapse)AuthorAgeFilesLines
* Switch performance critical code to use APT::StringViewJulian Andres Klode2016-01-071-6/+32
| | | | | | This improves performance of the cache generation on my ARM platform (4x Cortex A15) by about 10% to 20% from 2.35-2.50 to 2.1 seconds.
* Increase APT::Cache-HashTableSize default to 50503Julian Andres Klode2016-01-031-1/+1
| | | | | | | | | | | This drop the hash table utilization from a high 98% to acceptable 74% on unstable, and the average bucket length from 4.6 to 1.8. This improves performance by about 5%, while increasing the size of the cache by 0.2 out of 38MB, that is 0.5%. 48481 is a nice number
* apt-cache: stats: Show a table utilization as percentageJulian Andres Klode2016-01-031-2/+2
| | | | Gbp-Dch: ignore
* Add support for calculating hashes over the entire cacheJulian Andres Klode2015-12-291-3/+33
|
* Switch to DJB hashing and use prime number as table sizeJulian Andres Klode2015-12-291-7/+7
| | | | | | On my testing system, consisting of unstable and experimental, this reduces the average chain from 6.5 to 4.5, and the longest chain from 17 to 15.
* pkgcache: Make hash arch-independent using fixed size integerJulian Andres Klode2015-12-141-2/+2
| | | | | | | | This helps writing test cases. Also adapt the test case that expected 64-bit. Nothing changes performance wise, the distribution of the hash values remains intact.
* Bump cache minor version to 2 to trigger rebuildsJulian Andres Klode2015-12-111-1/+1
| | | | | | | | With the package names now normalized to lower case, the caches of affected systems need to be rebuild. Adjust the minor version to trigger such a rebuild. Gbp-Dch: ignore
* Do not swap required and important in pkgCache::Priority()Julian Andres Klode2015-12-101-1/+1
| | | | | | | | required and important were swapped, leading to wrong output. Closes: #807523 Thanks: Manuel A. Fernandez Montecelo for discovering this
* remove incorrect optimization branchesDavid Kalnischkies2015-09-141-35/+6
| | | | | | | | | | | | These assumptions were once true, but they aren't anymore, so what is supposed to be a speed up is effectively a slowdown [not that it would be noticible]. Usage of SingleArchFindPkg was nuked in a stable update already as the included assumption was actually harmful btw, which is why we should get right of other 'non-harmful' but still untrue assumptions while we can. Git-Dch: Ignore
* implement dpkgs vision of interpreting pkg:<arch> dependenciesDavid Kalnischkies2015-09-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | How the Multi-Arch field and pkg:<arch> dependencies interact was discussed at DebConf15 in the "MultiArch BoF". dpkg and apt (among other tools like dose) had a different interpretation in certain scenarios which we resolved by agreeing on dpkg view – and this commit realizes this agreement in code. As was the case so far libapt sticks to the idea of trying to hide MultiArch as much as possible from individual frontends and instead translates it to good old SingleArch. There are certainly situations which can be improved in frontends if they know that MultiArch is upon them, but these are improvements – not necessary changes needed to unbreak a frontend. The implementation idea is simple: If we parse a dependency on foo:amd64 the dependency is formed on a package 'foo:amd64' of arch 'any'. This package is provided by package 'foo' of arch 'amd64', but not by 'foo' of arch 'i386'. Both of those foo packages provide each other through (assuming foo is M-A:foreign) to allow a dependency on 'foo' to be satisfied by either foo of amd64 or i386. Packages can also declare to provide 'foo:amd64' which is translated to providing 'foo:amd64:any' as well. This indirection over provides was chosen as the alternative would be to teach dependency resolvers how to deal with architecture specific dependencies – which violates the design idea of avoiding resolver changes, especially as architecture-specific dependencies are a cornercase with quite a few subtil rules. Handling it all over versioned provides as we already did for M-A in general seems much simpler as it just works for them. This switch to :any has actually a "surprising" benefit as well: Even frontends showing a package name via .Name() [which doesn't show the architecture] will display the "architecture" for dependencies in which it was explicitely requested, while we will not show the 'strange' :any arch in FullName(true) [= pretty-print] either. Before you had to specialcase these and by default you wouldn't get these details shown. The only identifiable disadvantage is that this complicates error reporting and handling. apt-get's ShowBroken has existing problems with virtual packages [it just shows the name without any reason], so that has to be worked on eventually. The other case is that detecting if a package is completely unknown or if it was at least referenced somewhere needs to acount for this "split" – not that it makes a practical difference which error is shown… but its one of the improvements possible.
* store ':any' pseudo-packages with 'any' as architectureDavid Kalnischkies2015-09-141-1/+1
| | | | | | | | | | Previously we had python:any:amd64, python:any:i386, … in the cache and the dependencies of an amd64 package would be on python:any:amd64, of an i386 on python:any:i386 and so on. That seems like a relatively pointless endeavor given that they will all be provided by the same packages and therefore also a waste of space. Git-Dch: Ignore
* Cleanup includes after running iwyuMichael Vogt2015-08-171-1/+0
|
* parse packages from all architectures into the cacheDavid Kalnischkies2015-08-101-50/+37
| | | | | | | | | | | | | | | | | | | | | | | Now that we can dynamically create dependencies and provides as needed rather than requiring to know with which architectures we will deal before running we can allow the listparser to parse all records rather than skipping records of "unknown" architectures. This can e.g. happen if a user has foreign architecture packages in his status file without dpkg knowing about this architecture (or apt configured in this way). A sideeffect is that now arch:all packages are (correctly) recorded as available from any Packages file, not just from the native one – which has its downsides for the resolver as mixed-arch source packages can appear in different architectures at different times, but that is the problem of the resolver and dealing with it in the parser is at best a hack (and also depends on a helpful repository). Another sideeffect is that his allows :none packages to appear in Packages files again as we don't do any kind of checks now, but given that they aren't really supported (anymore) by anyone we can live with that.
* hide implicit deps in apt-cache again by defaultDavid Kalnischkies2015-08-101-45/+16
| | | | | | | | | | | | | | Before MultiArch implicits weren't a thing, so they were hidden by default by definition. Adding them for MultiArch solved many problems, but having no reliable way of detecting which dependency (and provides) is implicit or not causes problems everytime we want to output dependencies without confusing our observers with unneeded implementation details. The really notworthy point here is actually that we keep now a better record of how a dependency came to be so that we can later reason about it more easily, but that is hidden so deep down in the library internals that change is more the problems it solves than the change itself.
* use a smaller type for flags storage in the cacheDavid Kalnischkies2015-08-101-20/+22
| | | | | | | | We store very few flags in the cache, so keeping storage space for 8 is enough for all of them and still leaves a few unused bits remaining for future extensions without wasting bytes for nothing. Git-Dch: Ignore
* remove the compatibility markers for 4.13 abiDavid Kalnischkies2015-08-101-3/+0
| | | | | | | | We aren't and we will not be really compatible again with the previous stable abi, so lets drop these markers (which never made it into a released version) for good as they have outlived their intend already. Git-Dch: Ignore
* split-up Dependency structDavid Kalnischkies2015-08-101-20/+36
| | | | | | | | | | | | | | | | | | | | | Having dependency data separated from the link between version/package and the dependency allows use to work on sharing the depdency data a bit as it turns out that many dependencies are in fact duplicates. How many are duplicates various heavily with the sources configured, but for a single Debian release the ballpark is 2 duplicates for each dependency already (e.g. libc6 counts 18410 dependencies, but only 45 unique). Add more releases and the duplicates count only rises to get ~6 for 3 releases. For each architecture a user has configured which given the shear number of dependencies amounts to MBs of duplication. We can cut down on this number, but pay a heavy price for it: In my many releases(3) + architectures(3) test we have a 10% (~ 0.5 sec) increase in cache creationtime, but also 10% less cachesize (~ 10 MB). Further work is needed to rip the whole benefits from this through, so this is just the start. Git-Dch: Ignore
* bunch of micro-optimizations for depcacheDavid Kalnischkies2015-08-101-8/+13
| | | | | | | | DepCache functions are called a lot, so if we can squeeze some drops out of them for free we should do so. Takes also the opportunity to remove some whitespace errors from these functions. Git-Dch: Ignore
* avoid virtual in the iteratorsDavid Kalnischkies2015-08-101-3/+5
| | | | | | | | With a bit of trickery and the Curiously recurring template pattern we can free us from our use of virtual in the iterators were it is unneeded bloat as we never deal with pointers to iterators and similar such. Git-Dch: Ignore
* make all d-pointer * const pointersDavid Kalnischkies2015-08-101-1/+1
| | | | | | | | | | | | | | Doing this disables the implicit copy assignment operator (among others) which would cause hovac if used on the classes as it would just copy the pointer, not the data the d-pointer points to. For most of the classes we don't need a copy assignment operator anyway and in many classes it was broken before as many contain a pointer of some sort. Only for our Cacheset Container interfaces we define an explicit copy assignment operator which could later be implemented to copy the data from one d-pointer to the other if we need it. Git-Dch: Ignore
* add d-pointer, virtual destructors and de-inline de/constructorsDavid Kalnischkies2015-06-161-0/+2
| | | | | | | | To have a chance to keep the ABI for a while we need all three to team up. One of them missing and we might loose, so ensuring that they are available is a very tedious but needed task once in a while. Git-Dch: Ignore
* store Release files data in the CacheDavid Kalnischkies2015-06-121-21/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | We used to read the Release file for each Packages file and store the data in the PackageFile struct even through potentially many Packages (and Translation-*) files could use the same data. The point of the exercise isn't the duplicated data through. Having the Release files as first-class citizens in the Cache allows us to properly track their state as well as allows us to use the information also for files which aren't in the cache, but where we know to which Release file they belong (Sources are an example for this). This modifies the pkgCache structs, especially the PackagesFile struct which depending on how libapt users access the data in these structs can mean huge breakage or no visible change. As a single data point: aptitude seems to be fine with this. Even if there is breakage it is trivial to fix in a backportable way while avoiding breakage for everyone would be a huge pain for us. Note that not all PackageFile structs have a corresponding ReleaseFile. In particular the dpkg/status file as well as *.deb files have not. As these have only a Archive property need, the Component property takes over this duty and the ReleaseFile remains zero. This is also the reason why it isn't needed nor particularily recommended to change from PackagesFile to ReleaseFile blindly. Sticking with the earlier is usually the better option.
* Merge branch 'debian/sid' into debian/experimentalMichael Vogt2015-05-221-13/+1
|\ | | | | | | | | | | | | | | | | Conflicts: apt-pkg/pkgcache.h debian/changelog methods/https.cc methods/server.cc test/integration/test-apt-download-progress
| * remove "first package seen is native package" assumptionDavid Kalnischkies2015-04-221-13/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fix for #777760 causes packages of foreign (and the native) architectures, to be created correctly, but invalidates (like the previously existing, but policy-forbidden architecture-less packages we had to support for some upgrade scenarios) the assumption that the first (and only) package in the cache for a single architecture system must be the package for the native architecture (as, where should the other architectures come from, right? Wrong.). Depending on the order of parsing sources more or less packages can be effected by this. The effects are strange (for apt it mostly effects simulation/debug output, but also apt-mark on these specific packages), which complicates debugging, but relatively harmless if understood as most actions do not need direct named access to packages. The problem is fixed by removing the single-arch special casing in the paths who had them (Cache.FindPkg), so they use the same code as multi-arch systems, which use them as a wrapper for Grp.FindPkg. Note that single-arch system code was using Grp.FindPkg before as well if a Grp structure was handily available, so we don't introduce new untested code here: We remove more brittle special cases which are less tested instead (this was planed to be done for Stretch anyhow). Note further that the method with the assumption itself isn't fixed. As it is a private method I opted for declaring it deprecated instead and remove all its call positions. As it is private no-one can call this method legally (thanks to how c++ works by default its still an exported symbol through) and fixing it basically means reimplementing code we already have in Grp.FindPkg. Removing rather than fixing seems hence like a good solution. Closes: 782777 Thanks: Axel Beckert for testing
* | merge debian/sid into debian/experimentalDavid Kalnischkies2015-03-161-6/+0
|\|
| * deprecate the Section member from package structDavid Kalnischkies2014-11-101-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A version belongs to a section and has hence a section member of its own. A package on the other hand can have multiple versions from different sections. This was "solved" by using the section which was parsed first as order of sources.list defines, but that is obviously a horribly unpredictable thing. Users are way better of with the Section() as returned by the version they are dealing with. It is likely the same for all versions of a package, but in the few cases it isn't, it is important (like packages moving from main/* to contrib/* or into oldlibs …). Backport of 7a66977 which actually instantly removes the member.
| * mark as Automatic/Downloadable pure as gcc suggestsDavid Kalnischkies2014-05-221-2/+2
| | | | | | | | | | Git-Dch: Ignore Reported-By: gcc
* | guard pkg/grp hashtable creation changesDavid Kalnischkies2014-11-081-17/+22
| | | | | | | | | | | | | | | | | | The change itself is no problem ABI wise, but the remove of the old undynamic hashtables is, so we bring it back for older abis and happily use the now available free space to backport more recent additions like the dynamic hashtable itself. Git-Dch: Ignore
* | fix: Member variable 'X' is not initialized in the constructor.David Kalnischkies2014-09-271-0/+1
| | | | | | | | | | Reported-By: cppcheck Git-Dch: Ignore
* | drop stored StringItems in favor of in-memory mappingsDavid Kalnischkies2014-09-271-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Strings like Section names or architectures are needed vary often. Instead of writing them each time we need them, we deploy sharing for these special strings. Until now, this was done with a linked list of strings in which we would search, which was stored in the cache. It turns out we can do this just as well in memory as well with a bunch of std::map's. In memory means here that it isn't available anymore if we have a partly invalid cache, but that isn't much of a problem in practice as the status file is compared to the other files we parse very small and includes mostly duplicates, so the space we would gain by storing is more or less equal to the size of the stored linked list…
* | packages in the cache are sorted by name so noise-freeDavid Kalnischkies2014-09-271-9/+2
| | | | | | | | | | | | | | | | | | | | Commit aa0fe657e46b87cc692895a36df12e8b74bb27bb sorts the package names in the hashtable. We make use of this already in these functions, but as a minor sideeffect it also means that we don't have 'noise' anymore between packages belonging to the same group. We therefore don't need to check for a matching name in Grp.FindPkg anymore. Git-Dch: Ignore
* | search for pkg names in the cache case-sensitiveDavid Kalnischkies2014-09-271-4/+4
| | | | | | | | | | | | | | | | | | | | | | Package names have to be lowercase (debian-policy §5.6.1) and in as lowlevel as these method are it would be quiet strange to treat an invalid package "suddently" as a valid one which other tools might or might not accept. If case-insensitivity is really needed the frontend should ensure this rather than these methods waste cpu cycles by default. Git-Dch: Ignore
* | deprecate Pkg->Name in favor of Grp->NameDavid Kalnischkies2014-09-271-6/+3
| | | | | | | | | | | | | | They both store the same information, so this field just takes up space in the Package struct for no good reason. We mark it "just" as deprecated instead of instantly removing it though as it isn't misleading like Section was and is potentially used in the wild more often.
* | remove the Section member from package structDavid Kalnischkies2014-06-181-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A version belongs to a section and has hence a section member of its own. A package on the other hand can have multiple versions from different sections. This was "solved" by using the section which was parsed first as order of sources.list defines, but that is obviously a horribly unpredictable thing. We therefore directly remove this struct member to free some space and mark the access method as deprecated, which is told to return the section of the 'newest' known version, which is at least predictable, but possible not what it returned before – but nobody knows. Users are way better of with the Section() as returned by the version they are dealing with. It is likely the same for all versions of a package, but in the few cases it isn't, it is important (like packages moving from main/* to contrib/* or into oldlibs …).
* | cleanup datatypes mix used in binary cacheDavid Kalnischkies2014-06-181-2/+2
| | | | | | | | | | | | | | | | We had a wild mixture of (unsigned) int, long and long long here without much sense, so this commit adds a few typedefs to get some sense in the typesystem and ensures that a ID isn't sometimes computed as int, stored as long and compared with a long long… as this could potentially bite us later on as the size of the archive only increases over time.
* | increase hashtable size for packages/groups by factor 5David Kalnischkies2014-06-181-12/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It also makes the size configureable, so it can be adapted in the future without the need for an abi break - and even by users… The increase was long overdue as it gives a >10% decrease in runtime of e.g. 'apt-get check -s'. Some (useless) benchmark with 69933 groups and 187796 packages without a pre-built cache: time apt-get check -so APT::Cache-HashTableSize=1 → 20m time apt-get check -so APT::Cache-HashTableSize=1000 → 6,41s time apt-get check -so APT::Cache-HashTableSize=2000 → 5,64s (old) time apt-get check -so APT::Cache-HashTableSize=3000 → 5,30s time apt-get check -so APT::Cache-HashTableSize=5000 → 5,08s time apt-get check -so APT::Cache-HashTableSize=6000 → 5,05s time apt-get check -so APT::Cache-HashTableSize=7000 → 5,02s time apt-get check -so APT::Cache-HashTableSize=8000 → 5,00s time apt-get check -so APT::Cache-HashTableSize=9000 → 4,98s time apt-get check -so APT::Cache-HashTableSize=10000 → 4,96s (new) time apt-get check -so APT::Cache-HashTableSize=15000 → 4,90s time apt-get check -so APT::Cache-HashTableSize=20000 → 4,86s time apt-get check -so APT::Cache-HashTableSize=30000 → 4,77s time apt-get check -so APT::Cache-HashTableSize=40000 → 4,74s time apt-get check -so APT::Cache-HashTableSize=50000 → 4,73s time apt-get check -so APT::Cache-HashTableSize=60000 → 4,71s The gap increases further for operations which have more package lookups. Factor 5 was chosen as higher values do not provide any really significant timing advantage anymore compared to the memory increase in my testing and there is always the possibility to increase it now if that changes. (also most users will not have 3 releases and 4 architectures in the cache, so theirs will be much smaller and faster).
* | Merge remote-tracking branch 'mvo/feature/hash-stats' into debian/experimentalMichael Vogt2014-06-181-8/+4
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: apt-pkg/acquire-item.cc apt-pkg/acquire-item.h apt-pkg/deb/debmetaindex.h apt-pkg/pkgcache.cc test/integration/test-apt-ftparchive-src-cachedb
| * | [API-Break] rename pkgCache::Package::NextPackage to pkgCache::Package::NextMichael Vogt2014-06-181-4/+4
| | | | | | | | | | | | | | | This is a internal struct not a external interface so the actual breakage should be small.
| * | increase Pkg/Grp hash table size from 2k to 64kMichael Vogt2014-05-291-5/+1
| |/
* | invalid cache if architecture set doesn't matchDavid Kalnischkies2014-05-101-6/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The cache heavily depends on the architecture(s) it is build for, especially if you move from single- to multiarch. Adding a new architecture to dpkg therefore has to be detected and must invalidate the cache so that we don't operate on incorrect data. The incorrect data will prevent us from doing otherwise sensible actions (it doesn't allow bad things to happen) and the recovery is simple and automatic in most cases, so this hides pretty well and is also not as serious as it might sound at first. Closes: 745036
* | mark as Automatic/Downloadable pure as gcc suggestsDavid Kalnischkies2014-05-081-2/+2
| | | | | | | | | | Git-Dch: Ignore Reported-By: gcc
* | Merge branch 'debian/sid' into debian/experimentalMichael Vogt2014-05-071-14/+21
|\| | | | | | | | | | | | | | | | | | | | | | | Conflicts: apt-pkg/cachefilter.h apt-pkg/contrib/fileutl.cc apt-pkg/contrib/netrc.h apt-pkg/deb/debsrcrecords.cc apt-pkg/init.h apt-pkg/pkgcache.cc debian/apt.install.in debian/changelog
| * abstract version hash comparison a bitDavid Kalnischkies2014-03-131-0/+4
| | | | | | | | | | | | | | | | In #737085 we see that apt can be confused if informations about versions only differ slightly. This commit adds a way of at least adding a few more data points with the next abi break to help a bit with it. Git-Dch: Ignore
| * cleanup headers and especially #includes everywhereDavid Kalnischkies2014-03-131-2/+5
| | | | | | | | | | | | | | | | Beside being a bit cleaner it hopefully also resolves oddball problems I have with high levels of parallel jobs. Git-Dch: Ignore Reported-By: iwyu (include-what-you-use)
| * warning: unused parameter ‘foo’ [-Wunused-parameter]David Kalnischkies2014-03-131-1/+1
| | | | | | | | | | Reported-By: gcc -Wunused-parameter Git-Dch: Ignore
| * warning: extra ‘;’ [-Wpedantic]David Kalnischkies2014-03-131-10/+10
| | | | | | | | | | Git-Dch: Ignore Reported-By: gcc -Wpedantic
* | Merge branch 'debian/sid' into debian/experimentalMichael Vogt2014-02-271-2/+2
|\| | | | | | | | | | | | | | | Conflicts: apt-private/private-list.cc configure.ac debian/apt.install.in debian/changelog
| * Fix typos in documentation (codespell)Michael Vogt2014-02-221-2/+2
| |
* | Merge branch 'debian/sid' into debian/experimentalMichael Vogt2013-08-151-0/+12
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: apt-pkg/contrib/strutl.cc apt-pkg/deb/dpkgpm.cc configure.ac debian/changelog doc/po/apt-doc.pot po/apt-all.pot po/ar.po po/ast.po po/bg.po po/bs.po po/ca.po po/cs.po po/cy.po po/da.po po/de.po po/dz.po po/el.po po/es.po po/eu.po po/fi.po po/fr.po po/gl.po po/hu.po po/it.po po/ja.po po/km.po po/ko.po po/ku.po po/lt.po po/mr.po po/nb.po po/ne.po po/nl.po po/nn.po po/pl.po po/pt.po po/pt_BR.po po/ro.po po/ru.po po/sk.po po/sl.po po/sv.po po/th.po po/tl.po po/uk.po po/vi.po po/zh_CN.po po/zh_TW.po test/integration/framework test/integration/test-bug-602412-dequote-redirect test/integration/test-ubuntu-bug-346386-apt-get-update-paywall test/interactive-helper/aptwebserver.cc test/interactive-helper/makefile
| * Version 3 for DPkg::Pre-Install-Pkgs with MultiArch infoDavid Kalnischkies2013-07-111-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds on top of Version 2 to all displayed version numbers the architecture as well as the MultiArch flag for consumption by the hooks. Most of the time the architecture will be the same for both versions displayed, but packages might change from "all" to "any" (or back) between versions so we can't display the architecture for packages. Pseudo-Format for Version 3: <name> <version> <arch> <m-a-flag> <compare> <version> <arch> <m-a-flag> Examples: stuff - - none < 1 amd64 none **CONFIGURE** libsame 1 i386 same < 2 i386 same **CONFIGURE** stuff 2 i386 none > 1 i386 none **CONFIGURE** libsame 2 i386 same > - - none **REMOVE** toolkit 1 all foreign > - - none **REMOVE** Closes: #712116