diff options
author | Arch Librarian <arch@canonical.com> | 2004-09-20 16:50:36 +0000 |
---|---|---|
committer | Arch Librarian <arch@canonical.com> | 2004-09-20 16:50:36 +0000 |
commit | 578bfd0aed2ec993f4ad85fa6a7094a852261422 (patch) | |
tree | 737f52267f6468a4b6754f8c6665824767352746 /doc/cache.sgml |
Base revisions
Author: jgg
Date: 1998-07-02 02:58:12 GMT
Base revisions
Diffstat (limited to 'doc/cache.sgml')
-rw-r--r-- | doc/cache.sgml | 761 |
1 files changed, 761 insertions, 0 deletions
diff --git a/doc/cache.sgml b/doc/cache.sgml new file mode 100644 index 000000000..43f8d4fff --- /dev/null +++ b/doc/cache.sgml @@ -0,0 +1,761 @@ +<!doctype debiandoc system> +<!-- -*- mode: sgml; mode: fold -*- --> +<book> +<title>APT Cache File Format</title> + +<author>Jason Gunthorpe <email>jgg@debian.org</email></author> +<version>$Id: cache.sgml,v 1.1 1998/07/02 02:58:12 jgg Exp $</version> + +<abstract> +This document describes the complete implementation and format of the APT +Cache file. The APT Cache file is a way for APT to parse and store a +large number of package files for display in the UI. It's primary design +goal is to make display of a single package in the tree very fast by +pre-linking important things like dependencies and provides. + +The specification doubles as documentation for one of the in-memory +structures used by the package library and the APT GUI. + +</abstract> + +<copyright> +Copyright © Jason Gunthorpe, 1997. +<p> +APT and this document are free software; you can redistribute them and/or +modify them under the terms of the GNU General Public License as published +by the Free Software Foundation; either version 2 of the License, or (at your +option) any later version. + +<p> +For more details, on Debian GNU/Linux systems, see the file +/usr/doc/copyright/GPL for the full license. +</copyright> + +<toc sect> + +<chapt>Introduction +<!-- Purpose {{{ --> +<!-- ===================================================================== --> +<sect>Purpose + +<p> +This document describes the implementation of an architecture +dependent binary cache file. The goal of this cache file is two fold, +firstly to speed loading and processing of the package file array and +secondly to reduce memory consumption of the package file array. + +<p> +The implementation is aimed at an environment with many primary package +files, for instance someone that has a Package file for their CD-ROM, a +Package file for the latest version of the distribution on the CD-ROM and a +package file for the development version. Always present is the information +contained in the status file which might be considered a separate package +file. + +<p> +Please understand, this is designed as a -CACHE FILE- it is not ment to be +used on any system other than the one it was created for. It is not ment to +be authoritative either, ie if a system crash or software failure occures it +must be perfectly acceptable for the cache file to be in an inconsistant +state. Furthermore at any time the cache file may be erased without losing +any information. + +<p> +Also the structures and storage layout is optimized for use by the APT +GUI and may not be suitable for all purposes. However it should be possible +to extend it with associate cache files that contain other information. + +<p> +To keep memory use down the cache file only contains often used fields and +fields that are inexepensive to store, the Package file has a full list of +fields. Also the client may assume that all items are perfectly valid and +need not perform checks against their correctness. Removal of information +from the cache is possible, but blanks will be left in the file, and +unused strings will also be present. The recommended implementation is to +simply rebuild the cache each time any of the data files change. It is +possible to add a new package file to the cache without any negative side +effects. + +<sect1>Note on Pointer access +<p> +Every item in every structure is stored as the index to that structure. +What this means is that once the files is mmaped every data access has to +go through a fixup stage to get a real memory pointer. This is done +by taking the tndex, multiplying it by the type size and then adding +it to the start address of the memory block. This sounds complex, but +in C it is a single array dereference. Because all items are aligned to +their size and indexs are stored as multiples of the size of the structure +the format is immediately portable to all possible architectures - BUT the +generated files are -NOT-. + +<p> +This scheme allows code like this to be written: +<example> + void *Map = mmap(...); + Package *PkgList = (Package *)Map; + Header *Head = (Header *)Map; + char *Strings = (char *)Map; + cout << (Strings + PkgList[Head->HashTable[0]]->Name) << endl; +</example> +<p> +Notice the lack of casting or multiplication. The net result is to return +the name of the first package in the first hash bucket, without error +checks. + +<p> +The generator uses allocation pools to group similarly sized structures in +large blocks to eliminate any alignment overhead. The generator also +assures that no structures overlap and all indexes are unique. Although +at first glance it may seem like there is the potential for two structures +to exist at the same point the generator never allows this to happen. +(See the discussion of free space pools) + <!-- }}} --> + +<chapt>Structures +<!-- Header {{{ --> +<!-- ===================================================================== --> +<sect>Header +<p> +This is the first item in the file. +<example> + struct Header + { + // Signature information + unsigned long Signature; + short MajorVersion; + short MinorVersion; + bool Dirty; + + // Size of structure values + unsigned short HeaderSz; + unsigned short PackageSz; + unsigned short PackageFileSz; + unsigned short VersionSz; + unsigned short DependencySz; + unsigned short ProvidesSz; + + // Structure counts + unsigned long PackageCount; + unsigned long VersionCount; + unsigned long DependsCount; + unsigned long PackageFileCount; + + // Offsets + unsigned long FileList; // PackageFile + unsigned long StringList; // StringItem + + // Pool structures + unsigned long PoolStart[6]; + unsigned long PoolSize[6]; + unsigned long PoolAln[6]; + + // Package name lookup + unsigned long HashTable[512]; // Package + }; +</example> +<taglist> +<tag>Signature<item> +This must contain the hex value 0x98FE76DC which is designed to verify +that the system loading the image has the same byte order and byte size as +the system saving the image + +<tag>MajorVersion +<tag>MinorVersion<item> +These contain the version of the cache file, currently 0.2. + +<tag>Dirty<item> +Dirty is true if the cache file was opened for reading, the client expects +to have written things to it and have not fully synced it. The file should +be erased and rebuilt if it is true. + +<tag>HeaderSz +<tag>PackageSz +<tag>PackageFileSz +<tag>VersionSz +<tag>DependencySz +<tag>ProvidesSz<item> +*Sz contains the sizeof() that particular structure. It is used as an +extra consistancy check on the structure of the file. + +If any of the size values do not exactly match what the client expects then +the client should refuse the load the file. + +<tag>PackageCount +<tag>VersionCount +<tag>DependsCount +<tag>PackageFileCount<item> +These indicate the number of each structure contianed in the cache. +PackageCount is especially usefull for generating user state structures. +See Package::Id for more info. + +<tag>FileList<item> +This contains the index of the first PackageFile structure. The PackageFile +structures are singely linked lists that represent all package files that +have been merged into the cache. + +<tag>StringList<item> +This contains a list of all the unique strings (string item type strings) in +the cache. The parser reads this list into memory so it can match strings +against it. + +<tag>PoolStart +<tag>PoolSize +<tag>PoolAln<item> +The Pool structures manage the allocation pools that the generator uses. +Start indicates the first byte of the pool, Size is the number of bytes +remaining in the pool and Aln (alignment) is the structure size of the pool. +An Aln of 0 indicates the slot is empty. There should be the same number of +slots as there are structure types. The generator stores this information +so future additions can make use of any unused pool blocks. + +<tag>HashTable<item> +HashTable is a hash table that provides indexing for all of the packages. +Each package name is inserted into the hash table using the following has +function: +<example> + unsigned long Hash(string Str) + { + unsigned long Hash = 0; + for (const char *I = Str.begin(); I != Str.end(); I++) + Hash += *I * ((Str.end() - I + 1)); + return Hash % _count(Head.HashTable); + } +</example> +<p> +By iterating over each entry in the hash table it is possible to iterate over +the entire list of packages. Hash Collisions are handled with a singely linked +list of packages based at the hash item. The linked list contains only +packages that macth the hashing function. + +</taglist> + <!-- }}} --> +<!-- Package {{{ --> +<!-- ===================================================================== --> +<sect>Package +<p> +This contians information for a single unique package. There can be any +number of versions of a given package. Package exists in a singly +linked list of package records starting at the hash index of the name in +the Header->HashTable. +<example> + struct Pacakge + { + // Pointers + unsigned long Name; // Stringtable + unsigned long VersionList; // Version + unsigned long TargetVer; // Version + unsigned long CurrentVer; // Version + unsigned long TargetDist; // StringTable (StringItem) + unsigned long Section; // StringTable (StringItem) + + // Linked lists + unsigned long NextPackage; // Package + unsigned long RevDepends; // Dependency + unsigned long ProvidesList; // Provides + + // Install/Remove/Purge etc + unsigned char SelectedState; // What + unsigned char InstState; // Flags + unsigned char CurrentState; // State + + // Unique ID for this pkg + unsigned short ID; + unsigned short Flags; + }; +</example> + +<taglist> +<tag>Name<item> +Name of the package. + +<tag>VersionList<item> +Base of a singely linked list of version structures. Each structure +represents a unique version of the package. The version structures +contain links into PackageFile and the original text file as well as +detailed infromation about the size and dependencies of the specific +package. In this way multiple versions of a package can be cleanly handled +by the system. Furthermore, this linked list is guarenteed to be sorted +from Highest version to lowest version with no duplicate entries. + +<tag>TargetVer +<tag>CurrentVer<item> +This is an index (pointer) to the sub version that is being targeted for +upgrading. CurrentVer is an index to the installed version, either can be +0. + +<tag>TargetDist<item> +This indicates the target distribution. Automatic upgrades should not go +outside of the specified dist. If it is 0 then the global target dist should +be used. The string should be contained in the StringItem list. + +<tag>Section<item> +This indicates the deduced section. It should be "Unknown" or the section +of the last parsed item. + +<tag>NextPackage<item> +Next link in this hash item. This linked list is based at Header.HashTable +and contains only packages with the same hash value. + +<tag>RevDepends<item> +Reverse Depends is a linked list of all dependencies linked to this package. + +<tag>ProvidesList<item> +This is a linked list of all provides for this package name. + +<tag>SelectedState +<tag>InstState +<tag>CurrentState<item> +These corrispond to the 3 items in the Status field found in the status +file. See the section on defines for the possible values. +<p> +SelectedState is the state that the user wishes the package to be +in. +<p> +InstState is the installation state of the package. This normally +should be Ok, but if the installation had an accident it may be otherwise. +<p> +CurrentState indicates if the package is installed, partially installed or +not installed. + +<tag>ID<item> +ID is a value from 0 to Header->PackageCount. It is a unique value assigned +by the generator. This allows clients to create an array of size PackageCount +and use it to store state information for the package map. For instance the +status file emitter uses this to track which packages have been emitted +already. + +<tag>Flags<item> +Flags are some usefull indicators of the package's state. + +</taglist> + + <!-- }}} --> +<!-- PackageFile {{{ --> +<!-- ===================================================================== --> +<sect>PackageFile +<p> +This contians information for a single package file. Package files are +referenced by Version structures. This is a singly linked list based from +Header.FileList +<example> + struct PackageFile + { + // Names + unsigned long FileName; // Stringtable + unsigned long Version; // Stringtable + unsigned long Distribution; // Stringtable + unsigned long Size; + + // Linked list + unsigned long NextFile; // PackageFile + unsigned short ID; + unsigned short Flags; + time_t mtime; // Modification time + }; +</example> +<taglist> + +<tag>FileName<item> +Refers the the physical disk file that this PacakgeFile represents. + +<tag>Version<item> +Version is the given version, ie 1.3.1, 2.4_revision_1 etc. + +<tag>Distribution<item> +Distribution is the symbolic name for this PackageFile, hamm,bo,rexx etc + +<tag>Size<item> +Size is provided as a simple check to ensure that the package file has not +been altered. + +<tag>ID<item> +See Package::ID. + +<tag>Flags<item> +Provides some flags for the PackageFile, see the section on defines. + +<tag>mtime<item> +Modification time for the file at time of cache generation. + +</taglist> + + <!-- }}} --> +<!-- Version {{{ --> +<!-- ===================================================================== --> +<sect>Version +<p> +This contians the information for a single version of a package. This is a +singley linked list based from Package.Versionlist. + +<p> +The version list is always sorted from highest version to lowest version by +the generator. Also there may not be any duplicate entries in the list (same +VerStr). + +<example> + struct Version + { + unsigned long VerStr; // Stringtable + unsigned long File; // PackageFile + unsigned long Section; // StringTable (StringItem) + + // Lists + unsigned long NextVer; // Version + unsigned long DependsList; // Dependency + unsigned long ParentPkg; // Package + unsigned long ProvidesList; // Provides + + unsigned long Offset; + unsigned long Size; + unsigned long InstalledSize; + unsigned short ID; + unsigned char Priority; + }; +</example> +<taglist> + +<tag>VerStr<item> +This is the complete version string. + +<tag>File<item> +References the PackageFile that this version came out of. File can be used +to determine what distribution the Version applies to. If File is 0 then +this is a blank version. The structure should also have a 0 in all other +fields excluding VerStr and Possibly NextVer. + +<tag>Section<item> +This string indicates which section it is part of. The string should be +contained in the StringItem list. + +<tag>NextVer<item> +Next step in the linked list. + +<tag>DependsList<item> +This is the base of the dependency list. + +<tag>ParentPkg<item> +This links the version to the owning package, allowing reverse dependencies +to determine the package. + +<tag>ProvidesList<item> +Head of the linked list of Provides::NextPkgProv, forward provides. + +<tag>Offset<item> +The byte offset of the first line of this item in the specified +PackageFile + +<tag>Size +<tag>InstalledSize<item> +The archive size for this version. For debian this is the size of the .deb +file. Installed size is the uncompressed size for this version + +<tag>ID<item> +See Package::ID. + +<tag>Priority<item> +This is the parsed priority value of the package. +</taglist> + + <!-- }}} --> +<!-- Dependency {{{ --> +<!-- ===================================================================== --> +<sect>Dependency +<p> +Dependency contains the information for a single dependency record. The records +are split up like this to ease processing by the client. The base of list +linked list is Version.DependsList. All forms of dependencies are recorded +here including Conflicts, Suggests and Recommends. + +<p> +Multiple depends on the same package must be grouped together in +the Dependency lists. Clients should assume this is always true. + +<example> + struct Dependency + { + unsigned long Version; // Stringtable + unsigned long Package; // Package + unsigned long NextDepends; // Dependency + unsigned long NextRevDepends; // Reverse dependency linking + unsigned long ParentVer; // Upwards parent version link + + // Specific types of depends + unsigned char Type; + unsigned char CompareOp; + unsigned short ID; + }; +</example> +<taglist> +<tag>Version<item> +The string form of the version that the dependency is applied against. + +<tag>Package<item> +The index of the package file this depends applies to. If the package file +does not already exist when the dependency is inserted a blank one (no +version records) should be created. + +<tag>NextDepends<item> +Linked list based off a Version structure of all the dependencies in that +version. + +<tag>NextRevDepends<item> +Reverse dependency linking, based off a Package structure. This linked list +is a list of all packages that have a depends line for a given package. + +<tag>ParentVer<item> +Parent version linking, allows the reverse dependency list to link +back to the version and package that the dependency are for. + +<tag>Type<item> +Describes weather it is depends, predepends, recommends, suggests, etc. + +<tag>CompareOp<item> +Describes the comparison operator specified on the depends line. If the high +bit is set then it is a logical or with the previous record. + +<tag>ID<item> +See Package::ID. + +</taglist> + + <!-- }}} --> +<!-- Provides {{{ --> +<!-- ===================================================================== --> +<sect>Provides +<p> +Provides handles virtual packages. When a Provides: line is encountered +a new provides record is added associating the package with a virtual +package name. The provides structures are linked off the package structures. +This simplifies the analysis of dependencies and other aspects A provides +refers to a specific version of a specific package, not all versions need to +provide that provides. + +<p> +There is a linked list of provided package names started from each +version that provides packages. This is the forwards provides mechanism. +<example> + struct Provides + { + unsigned long ParentPkg; // Package + unsigned long Version; // Version + unsigned long ProvideVersion; // Stringtable + unsigned long NextProvides; // Provides + unsigned long NextPkgProv; // Provides + }; +</example> +<taglist> +<tag>ParentPkg<item> +The index of the package that head of this linked list is in. ParentPkg->Name +is the name of the provides. + +<tag>Version<item> +The index of the version this provide line applies to. + +<tag>ProvideVersion<item> +Each provides can specify a version in the provides line. This version allows +dependencies to depend on specific versions of a Provides, as well as allowing +Provides to override existing packages. This is experimental. + +<tag>NextProvides<item> +Next link in the singly linked list of provides (based off package) + +<tag>NextPkgProv<item> +Next link in the singly linked list of provides for 'Version'. + +</taglist> + + <!-- }}} --> +<!-- StringItem {{{ --> +<!-- ===================================================================== --> +<sect>StringItem +<p> +StringItem is used for generating single instances of strings. Some things +like Section Name are are usefull to have as unique tags. It is part of +a linked list based at Header::StringList. +<example> + struct StringItem + { + unsigned long String; // Stringtable + unsigned long NextItem; // StringItem + }; +</example> +<taglist> +<tag>String<item> +The string this refers to. + +<tag>NextItem<item> +Next link in the chain. +</taglist> + <!-- }}} --> +<!-- StringTable {{{ --> +<!-- ===================================================================== --> +<sect>StringTable +<p> +All strings are simply inlined any place in the file that is natural for the +writer. The client should make no assumptions about the positioning of +strings. All stringtable values point to a byte offset from the start of the +file that a null terminated string will begin. + <!-- }}} --> +<!-- Defines {{{ --> +<!-- ===================================================================== --> +<sect>Defines +<p> +Several structures use variables to indicate things. Here is a list of all +of them. + +<sect1>Definitions for Dependency::Type +<p> +<example> +#define pkgDEP_Depends 1 +#define pkgDEP_PreDepends 2 +#define pkgDEP_Suggests 3 +#define pkgDEP_Recommends 4 +#define pkgDEP_Conflicts 5 +#define pkgDEP_Replaces 6 +</example> +</sect1> + +<sect1>Definitions for Dependency::CompareOp +<p> +<example> +#define pkgOP_OR 0x10 +#define pkgOP_LESSEQ 0x1 +#define pkgOP_GREATEREQ 0x2 +#define pkgOP_LESS 0x3 +#define pkgOP_GREATER 0x4 +#define pkgOP_EQUALS 0x5 +</example> +The lower 4 bits are used to indicate what operator is being specified and +the upper 4 bits are flags. pkgOP_OR indicates that the next package is +or'd with the current package. +</sect1> + +<sect1>Definitions for Package::SelectedState +<p> +<example> +#define pkgSTATE_Unkown 0 +#define pkgSTATE_Install 1 +#define pkgSTATE_Hold 2 +#define pkgSTATE_DeInstall 3 +#define pkgSTATE_Purge 4 +</example> +</sect1> + +<sect1>Definitions for Package::InstState +<p> +<example> +#define pkgSTATE_Ok 0 +#define pkgSTATE_ReInstReq 1 +#define pkgSTATE_Hold 2 +#define pkgSTATE_HoldReInstReq 3 +</example> +</sect1> + +<sect1>Definitions for Package::CurrentState +<p> +<example> +#define pkgSTATE_NotInstalled 0 +#define pkgSTATE_UnPacked 1 +#define pkgSTATE_HalfConfigured 2 +#define pkgSTATE_UnInstalled 3 +#define pkgSTATE_HalfInstalled 4 +#define pkgSTATE_ConfigFiles 5 +#define pkgSTATE_Installed 6 +</example> +</sect1> + +<sect1>Definitions for Package::Flags +<p> +<example> +#define pkgFLAG_Auto (1 << 0) +#define pkgFLAG_New (1 << 1) +#define pkgFLAG_Obsolete (1 << 2) +#define pkgFLAG_Essential (1 << 3) +#define pkgFLAG_ImmediateConf (1 << 4) +</example> +</sect1> + +<sect1>Definitions for Version::Priority +<p> +Zero is used for unparsable or absent Priority fields. +<example> +#define pkgPRIO_Important 1 +#define pkgPRIO_Required 2 +#define pkgPRIO_Standard 3 +#define pkgPRIO_Optional 4 +#define pkgPRIO_Extra 5 +</example> +</sect1> + +<sect1>Definitions for PackageFile::Flags +<p> +<example> +#define pkgFLAG_NotSource (1 << 0) +</example> +</sect1> + + <!-- }}} --> + +<chapt>Notes on the Generator +<!-- Notes on the Generator {{{ --> +<!-- ===================================================================== --> +<p> +The pkgCache::MergePackageFile function is currently the only generator of +the cache file. It implements a conversion from the normal textual package +file into the cache file. + +<p> +The generator assumes any package declaration with a +Status: line is a 'Status of the package' type of package declaration. +A Package with a Target-Version field should also really have a status field. +The processing of a Target-Version field can create a place-holder Version +structure that is empty to refer to the specified version (See Version +for info on what a empty Version looks like). The Target-Version syntax +allows the specification of a specific version and a target distribution. + +<p> +Different section names on different versions is supported, but I +do not expect to use it. To simplify the GUI it will mearly use the section +in the Package structure. This should be okay as I hope sections do not change +much. + +<p> +The generator goes through a number of post processing steps after producing +a disk file. It sorts all of the version lists to be in descending order +and then generates the reverse dependency lists for all of the packages. +ID numbers and count values are also generated in the post processing step. + +<p> +It is possible to extend many of the structures in the cache with extra data. +This is done by using the ID member. ID will be a unique number from 0 to +Header->??Count. For example +<example> +struct MyPkgData; +MyPkgData *Data = new MyPkgData[Header->PackageCount]; +Data[Package->ID]->Item = 0; +</example> +This provides a one way reference between package structures and user data. To +get a two way reference would require a member inside the MyPkgData structure. + +<p> +The generators use of free space pools tend to make the package file quite +large, and quite full of blank space. This could be fixed with sparse files. + + <!-- }}} --> + +<chapt>Future Directions +<!-- Future Directions {{{ --> +<!-- ===================================================================== --> +<p> +Some good directions to take the cache file is into a cache directory that +contains many associated caches that cache other important bits of +information. (/var/cache/apt, FHS2) + +<p> +Caching of the info/*.list is an excellent place to start, by generating all +the list files into a tree structure and reverse linking them to the package +structures in the main cache file major speed gains in dpkg might be achived. + + <!-- }}} --> + +</book> |