apt/test/integration/test-pdiff-usage, branch 1.3

apt/test/integration/test-pdiff-usage, branch 1.3_exp3
Debians commandline package manager
https://git.kalnischkies.de/apt/atom?h=1.3_exp3
2016-04-25T13:35:52Z

don't ask server if we have entire file in partial/
2016-04-25T13:35:52Z David Kalnischkies david@kalnischkies.de 2016-04-07T15:48:17Z urn:sha1:742f67eaede80d2f9b3631d8697ebd63b8f95427
We have this situation in cases where parts of the transaction are refused (e.g. in a hashsum mismatch) and the update is rerun (e.g. in the hope that we get a mirror which is synced this time). Previously we would ask the server with an If-Range and in the best case receive a 416 in response (a less featureful server might end up giving us the entire file again, or we get the wrong file this time, giving us a hashsum mismatch…), which is a waste of time if we already know by checking the hashsums that we got the complete and correct file.

make random acquire queues work less random
2016-04-25T13:35:52Z David Kalnischkies david@kalnischkies.de 2016-04-06T10:50:26Z urn:sha1:4aa6ebf6d78131416ef173b1ce472f014da25136
Queues feeding workers like rred are created in a random pattern to get a few of them to run in parallel – but if we already have an idling queue we don't need to assign the work to a (potentially new) random queue. Reusing the idle queue saves us the (arguably small) overhead of starting up a new queue, avoids adding jobs to an already busy queue while others idle and, as a bonus, reduces the size of debug logs a bit. We also keep starting new queues now until we reach our limit before we assign work at random to them, which should give us a more effective utilisation overall compared to potentially adding work to busy queues while we haven't reached our queue limit yet.
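The queue selection order described in the commit above (idle queue first, then a new queue while below the limit, and only then a random pick) can be sketched as follows. This is a minimal illustration in Python, not apt's C++ implementation; the names `Queue`, `pick_queue` and `limit` are hypothetical and chosen only for this example.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Queue:
    """Hypothetical stand-in for an acquire queue feeding one worker."""
    name: str
    jobs: list = field(default_factory=list)

    def is_idle(self) -> bool:
        return not self.jobs


def pick_queue(queues, limit, rng=random):
    """Return the queue a new job should be assigned to.

    `queues` is mutated when a new queue has to be started.
    """
    # 1. Prefer a queue which is already running but has nothing to do.
    for queue in queues:
        if queue.is_idle():
            return queue
    # 2. No idle queue: start a new one as long as we are below the limit.
    if len(queues) < limit:
        queue = Queue(name=f"queue-{len(queues)}")
        queues.append(queue)
        return queue
    # 3. All queues are busy and the limit is reached: pick one at random.
    return rng.choice(queues)
```

The caller is expected to append the job to the returned queue; the random choice is only reached once every existing queue is busy and no further queue may be started.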

stop handling items in doomed transactions
2016-04-07T11:48:31Z David Kalnischkies david@kalnischkies.de 2016-04-05T23:08:57Z urn:sha1:38f8704e419ed93f433129e20df5611df6652620
With the previous commit we track the state of transactions, so we can now use our knowledge to avoid processing data for a transaction which was already closed (via an abort in this case). This is needed as multiple independent processes are interacting here, so there isn't a simple immediate full-engine stop, and it would also be bad to teach each and every item how to check whether its manager has a failed subordinate and what to do in that case. In the pdiff case, which deals (potentially) with many items during its lifetime, a hashsum mismatch in another file can, for example, abort the transaction to which the file we try to patch via pdiff belongs. This causes some of the items (which are already done) to be aborted with it, but items still in the process of acquisition continue to be processed and will later try to use all the items together, failing in strange ways as cleanup has already happened. The chosen solution is to dry up the communication channels instead by ignoring new requests for data acquisition, canceling requests which are not assigned to a queue and not calling Done/Failed on items anymore. This means that e.g. already started or pending (e.g. pipelined) downloads aren't stopped and continue as normal for now, but they remain in partial/ and aren't processed further, so the next update command will pick them up and put them to good use while the current process fails updating (for this transaction group) in an orderly fashion.
Closes: 817240
Thanks: Barr Detwix & Vincent Lefevre for log files

don't use Desc.URI to calculate .diff/Index filenames
2016-03-14T10:47:19Z David Kalnischkies david@kalnischkies.de 2016-03-13T00:02:30Z urn:sha1:b7a1076f18022cbeb7baf4d82ab8bae0f725a573
The URI describing an item can change via mirrors/redirectors, which causes the .diff/Index files to get the wrong names in storage.
Git-Dch: Ignore

require $(HASH)-Download field in .diff/Index files
2016-03-14T10:47:19Z David Kalnischkies david@kalnischkies.de 2016-03-14T00:09:32Z urn:sha1:4a808deaac462e7714a345dac676c6da294a2ee0
Now that we ignore SHA1-only files it makes sense to also require the provision of hashes for the compressed patches. This was introduced in dak in the same patchset as support for non-SHA1 hashes in the file itself, and adding support in other archive creators (if they support pdiffs at all) will likely be in the same batch. The reason for the change itself is simple: If you are 'scared' enough about the security of SHA1, you shouldn't uncompress a file you haven't verified at all – after all, it could be exploiting a bug or be a zip bomb.

test: remove SHA1 support testing as unsupported
2016-03-14T10:47:18Z David Kalnischkies david@kalnischkies.de 2016-03-13T20:49:37Z urn:sha1:8d0d92558c00d1825e413ce67be51a46a5c18aea
Given that we refuse to use SHA1-only .diff/Indexes, there is no point in shipping and running code which pretends to check support for them. As all these tests are run 3 times, it eats a noticeable amount of time.
Git-Dch: Ignore

Test that SHA1-only .diff/Index files are not used
2016-03-13T12:05:30Z Julian Andres Klode jak@debian.org 2016-03-13T12:05:30Z urn:sha1:f345d0571d055c2cd5da3a9e423753f1ac21a9aa
Ensure that .diff/Index files that only contain SHA1 values and no SHA2 values are not used.
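Taken together, the three commits above amount to an acceptance rule for .diff/Index files: SHA1 alone is not enough, and a hash is only useful if the compressed patches are covered by a matching $(HASH)-Download field. The sketch below illustrates that rule in Python on a plain list of field names. It is not apt's parser; the function names are hypothetical, and pairing each hash with a `<HASH>-Patches` field is an assumption of this example.

```python
def usable_hashes(field_names):
    """Return the hash names under which a .diff/Index could be used.

    Assumption of this sketch: a hash counts only if it is not SHA1 and
    both "<HASH>-Patches" and "<HASH>-Download" are present.
    """
    names = set(field_names)
    # Collect every hash that describes the (uncompressed) patches.
    hashes = {n.rsplit("-", 1)[0] for n in names if n.endswith("-Patches")}
    # Drop SHA1 and every hash lacking sums for the compressed patches.
    return sorted(
        h for h in hashes if h != "SHA1" and f"{h}-Download" in names
    )


def index_is_usable(field_names):
    """True if at least one acceptable hash remains."""
    return bool(usable_hashes(field_names))
```

With this rule an index carrying only SHA1 fields is rejected, as is one that offers a stronger hash for the patches but no Download field for their compressed form.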

do not move not-failed pdiff-patches into CWD on failure
2016-03-06T11:57:38Z David Kalnischkies david@kalnischkies.de 2016-03-06T11:03:34Z urn:sha1:dfcf7f356b790338f0a3e9df3c5d6f159814fe53
If a single pdiff fails, we have to fail the entire patching endeavour and fall back to getting the complete file instead. That is easy in serverside merged pdiffs as we get them one by one. For clientside we get them all at once though, which means that a failure in one has to stop the entire pipeline, which works as expected (as proven by the bug reporters, who don't even notice it happening). The problem is just that the first failing pdiff will do the cleanup, so another pdiff which happens to be successfully acquired after we processed the failure doesn't find the file it is supposed to use as a basename anymore, so the patch is renamed to what should be the unique extension and moved into the current working directory. Processing is then stopped as the patch realizes that it isn't the last one which completed downloading. On the plus side this means this is neither us using a bad temporary location nor a security problem. It "just" unconditionally overwrites files in your current working directory (if you happen to have files named like a pdiff patch – a bit unlikely perhaps) and drops files there which are never used again. I guess this was introduced in 4e3c5633b1e74b4f58b95f339cfbbf4cbf21ab3e for real as I made the need for the existence of the base file rather explicit, but the potential has lingered in the code for far longer.
Closes: #816837

remove uncompressed leftover partial file before pdiff bootstrap
2016-01-08T16:51:23Z David Kalnischkies david@kalnischkies.de 2016-01-08T16:51:23Z urn:sha1:ef3c549e00b2a0487ddee0aeb70e3a29f76c2fbb
The code already deals with compressed leftovers, but forgot the uncompressed files.
The opportunity is taken to reorder this code and add debug messages about the actions taken, as well as to produce such a leftover file in the associated testcase.

use filesize of compressed pdiffs for the limit if possible
2016-01-08T14:40:01Z David Kalnischkies david@kalnischkies.de 2016-01-08T14:30:05Z urn:sha1:4e6219da0dd1e68fad7db972f7ddd76598645228
With the addition of the $HASH-Download field in the .diff/Index we got the size of the compressed patches for 'free', so if that information is available we can use it for a more fitting calculation of the size requirements of the patches vs. the complete file. Note that this predicts a size that is too small in the transition case in which the information isn't available for all patches, but figuring this out would be a lot of code for practically nothing as only one update can ever be in such a transition phase.
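The size comparison described in the last commit can be illustrated with a small sketch: sum what the patches would cost to download, preferring the compressed size where the index provides it, and compare that against the size of the complete file. This is a Python illustration under stated assumptions, not apt's code. The dictionary keys `size` and `download_size`, the function names and the `limit_percent` parameter are all hypothetical, and falling back to the uncompressed size for a patch without a compressed size is a choice of this sketch, not the behaviour the commit describes for the transition case.

```python
def patch_download_size(patches):
    """Sum the bytes needed to fetch all patches.

    Each patch is a dict with "size" (uncompressed) and optionally
    "download_size" (compressed, from a $HASH-Download style field).
    """
    total = 0
    for patch in patches:
        compressed = patch.get("download_size")
        # Sketch-only fallback: use the uncompressed size if unknown.
        total += compressed if compressed is not None else patch["size"]
    return total


def should_use_pdiffs(patches, full_file_size, limit_percent=100):
    """True if fetching the patches stays within the allowed share
    (in percent) of the size of the complete file."""
    return patch_download_size(patches) * 100 <= full_file_size * limit_percent
```

For example, two patches with compressed sizes of 200 and 100 bytes would be preferred over a complete file of 400 bytes, but not over one of 250 bytes.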