path: root/methods/http.cc
Commit message (Author, Date, Files, Lines)
* Fix typos reported by codespell in code comments (David Kalnischkies, 2019-07-10, 1 file, -3/+4)
  Also in old changelogs, but nothing really user-visible like error
  messages or similar, so barely noteworthy.
  Reported-By: codespell
  Gbp-Dch: Ignore
* Apply various suggestions by cppcheck (David Kalnischkies, 2019-07-08, 1 file, -3/+3)
  Reported-By: cppcheck
* http: Fix Host header in proxied https connections (Simon Körner, 2019-06-11, 1 file, -3/+3)
  Currently CONNECT requests use the name of the proxy as Host value
  instead of the origin server's name. According to RFC 2616, "The Host
  field value MUST represent the naming authority of the origin server
  or gateway given by the original URL." The current implementation
  causes problems with some proxy vendors. This commit fixes that.
  [jak: Adding a test case]
  See merge request apt-team/apt!66
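  [ed: a minimal C++ sketch of what a conforming CONNECT request looks
  like after this fix; the helper name and signature are illustrative,
  not taken from http.cc.]

      #include <string>

      std::string BuildConnectRequest(std::string const &OriginHost, unsigned int Port)
      {
         std::string const Authority = OriginHost + ":" + std::to_string(Port);
         std::string Req = "CONNECT " + Authority + " HTTP/1.1\r\n";
         Req += "Host: " + Authority + "\r\n"; // names the origin server, not the proxy
         Req += "\r\n";
         return Req;
      }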
* apt-pkg: URI: Add 'explicit' to single argument constructor (Julian Andres Klode, 2019-04-30, 1 file, -1/+1)
  This needs a fair amount of changes elsewhere in the code, hence this
  is separate from the previous commits.
* Allow setting Referer header for http method (David Kalnischkies, 2018-11-25, 1 file, -0/+4)
  Not needed for common interactions, but for some download-file
  interactions it could be useful to set a specific referer, as some
  servers do not serve requested files otherwise.
* Revert "http: Fix handling of server connection closure" (Julian Andres Klode, 2018-11-13, 1 file, -3/+4)
  This reverts commit fb3f36593563d09a8d1727cc7c6deb0b49823ca2. It
  caused downloads to hang on long-lived connections on certain
  servers.
  Gbp-Dch: full
* http: Fix handling of server connection closure (Julian Andres Klode, 2018-11-12, 1 file, -4/+3)
  If the server closed the connection while we were reading data and we
  ended up with no data left to write (for example, we received 0
  bytes), we did not exit before, as we only returned success if there
  was data to write. This is wrong: obviously, if we have reached our
  limit, we are done anyway. It is a bit unclear whether we actually
  ever reached this code, but it does make some sense with regard to
  the bug below.
  LP: #1801338
* Use steady clock source for bandwidth limitation (David Kalnischkies, 2018-05-29, 1 file, -9/+9)
  Using the time of day for this is slightly wrong, just like it is for
  progress, just less visible.
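  [ed: a hedged sketch of why a monotonic clock matters here: wall-clock
  time can jump (NTP, DST), while std::chrono::steady_clock cannot.
  Function and parameter names are invented for illustration.]

      #include <chrono>
      #include <thread>

      void LimitBandwidth(unsigned long long BytesSent,
                          unsigned long long BytesPerSecond,
                          std::chrono::steady_clock::time_point const Start)
      {
         using namespace std::chrono;
         if (BytesPerSecond == 0)
            return; // 0 means unlimited
         // earliest point at which BytesSent may have been sent at the limit
         auto const Earliest = duration<double>(static_cast<double>(BytesSent) / BytesPerSecond);
         auto const Elapsed = steady_clock::now() - Start;
         if (Elapsed < Earliest)
            std::this_thread::sleep_for(Earliest - Elapsed);
      }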
* Remove unused time-tracking from http method (David Kalnischkies, 2018-05-28, 1 file, -17/+0)
  The Stats method isn't called anywhere and was partly commented out
  before, but we keep updating the time for it – let's avoid this
  pointless busywork.
  Gbp-Dch: Ignore
* Lower default timeout from 120s to 30s (Julian Andres Klode, 2018-05-24, 1 file, -2/+2)
  120s is an insanely high default timeout; lower it to 30s to make
  things a bit nicer.
* Remove obsolete RCS keywords (Guillem Jover, 2018-05-07, 1 file, -1/+0)
  Prompted-by: Jakub Wilk <jwilk@debian.org>
* reimplement and simplify mirror:// method (David Kalnischkies, 2018-01-03, 1 file, -1/+13)
  Embedding an entire acquire stack and HTTP logic in the mirror method
  made it rather heavyweight and fragile. This reimplementation goes
  the other way by doing only the bare minimum in the method itself and
  instead redirecting the actual download of files to their proper
  methods. It drops the (in the real world) unused query-string
  feature, as it isn't really implementable in the new architecture.
* report transient errors as transient errors (David Kalnischkies, 2017-12-13, 1 file, -80/+123)
  The Fail method for acquire methods has a boolean parameter
  indicating the transient nature of a reported error. The problem with
  this is that Fail is called very late, at a point where it is no
  longer easy to identify whether an error is indeed transient, so some
  calls got the flag right and some didn't, and the acquire system
  would later mostly ignore the transient flag and guess by using the
  FailReason instead. By introducing a tri-state enum we can pass the
  information about fatal or transient errors through the call stack to
  generate the correct failures.
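  [ed: a sketch of the tri-state idea; the enum and function names are
  illustrative, not necessarily APT's actual identifiers.]

      enum class ResultState { Successful, TransientError, FatalError };

      // The state is decided where the error occurs and handed up the
      // stack, so the caller no longer guesses from a FailReason string.
      ResultState CheckAnswer(bool const TimedOut, bool const BadAnswer)
      {
         if (TimedOut)
            return ResultState::TransientError; // retrying may well succeed
         if (BadAnswer)
            return ResultState::FatalError;     // retrying is pointless
         return ResultState::Successful;
      }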
* Also look at https_proxy for https URLs (Julian Andres Klode, 2017-11-19, 1 file, -4/+13)
  We accidentally regressed here in 1.5 when replacing the https
  method.
* Sandbox methods with seccomp-BPF; except cdrom, gpgv, rsh (Julian Andres Klode, 2017-10-22, 1 file, -0/+2)
  This reduces the number of syscalls to about 140 from about 350 or
  so, significantly reducing security risks. Also change
  prepare-release to ignore the architecture lists in the build
  dependencies when generating the build-depends package for travis. We
  might want to clean up things a bit more and/or move it somewhere
  else.
* Run Proxy-Auto-Detect script from main process (Julian Andres Klode, 2017-10-22, 1 file, -1/+3)
  This avoids running the Proxy-Auto-Detect script inside the untrusted
  (well, less trusted for now) sandbox. This will allow us to restrict
  the http method from fork()ing or exec()ing via seccomp.
* allow the auth.conf to be root:root owned (David Kalnischkies, 2017-07-26, 1 file, -2/+1)
  Opening the file before we drop privileges in the methods allows us
  to avoid chowning in the acquire main process, which can apply to the
  wrong file (imagine Binary-scoped settings) and surprise users as
  their permission setup is overridden. There are no security benefits
  as the file is open anyway, so an evil method could read the contents
  of the file as before, but it isn't worse than before and we avoid
  permission problems in this setup.
* reimplement and document auth.conf (David Kalnischkies, 2017-07-26, 1 file, -3/+2)
  We have had support for a netrc-like auth.conf file since 0.7.25
  (closing 518473), but it was never documented in apt that it even
  exists, and netrc seems to have fallen out of usage, as a manpage for
  it no longer exists, making the feature even more arcane. On top of
  that, the code was a bit of a mess (written in C style), and as a
  result the matching of machine tokens to URIs was also a bit strange,
  checking for less specific matches (= without path) first. We now do
  a single pass over the stanzas. In practice, early adopters of the
  undocumented implementation will not really notice the differences,
  and the 'new' behaviour is simpler to document and more usual for an
  apt user.
  Closes: #811181
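  [ed: for orientation, a netrc-style stanza in this format looks
  roughly like the following; host, login and password are made up:]

      machine example.org/debian login apt password hunter2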
* fail early in http if server answer is too small as well (David Kalnischkies, 2017-07-26, 1 file, -2/+2)
  Failing on too much data is good, but we can do better by checking
  for exact filesizes: as we know from the hashsums how large a file
  should be, we can directly drop a file with a size we do not expect,
  regardless of whether it is larger or smaller than expected. This
  catches a lot sooner most cases which would otherwise end up as
  hashsum errors later.
* fail earlier if server answers with too much data (David Kalnischkies, 2017-07-26, 1 file, -3/+13)
  We tend to operate on rather large static files, which means we
  usually get Content-Length information from the server. If we combine
  this information with the filesize we are expecting (factoring in
  pipelining), we can avoid reading a bunch of data we would end up
  rejecting anyhow, by just closing the connection, saving bandwidth
  and time for both the server and the client.
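  [ed: combined with the previous commit, the check reduces to
  something like this illustrative helper; Expected stems from the
  repository metadata, ContentLength from the server's answer.]

      // A zero Expected means we have no size to check against.
      bool SizeIsPlausible(unsigned long long const Expected,
                           unsigned long long const ContentLength)
      {
         return Expected == 0 || ContentLength == Expected;
      }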
* Reformat and sort all includes with clang-format (Julian Andres Klode, 2017-07-12, 1 file, -8/+8)
  This makes it easier to see which header includes what. The changes
  were done by running

      git grep -l '#\s*include' \
        | grep -E '.(cc|h)$' \
        | xargs sed -i -E 's/(^\s*)#(\s*)include/\1#\2 include/'

  to modify all include lines by adding a space, and then running
  ./git-clang-format.sh.
* Stop bragging about old speeds in http.cc comments (Julian Andres Klode, 2017-07-03, 1 file, -8/+0)
  That's just ridiculous these days.
  Gbp-Dch: ignore
* http: Add support for https:// proxies (Julian Andres Klode, 2017-06-30, 1 file, -3/+8)
  HTTPS proxies just require unwrapping the TLS layer at the proxy
  connection, which is easy, and of course sending the proxy-specific
  headers that are also sent to "http" proxies.
* http: Add support for CONNECT proxying to HTTPS locations (Julian Andres Klode, 2017-06-30, 1 file, -1/+128)
  Proxying HTTPS traffic requires the proxy to provide the CONNECT
  method. This implements the client side of it, although it is a bit
  hacky. An HTTP CONNECT is a normal HTTP request followed by a normal
  HTTP response, just that the body of the response is the TCP stream
  of the target host. We use a special wrapper in case there are data
  bytes in the header packets: in that case, the bytes are stored in a
  buffer and the buffer is drained first; afterwards the connection
  continues directly with the TCP stream (with one more vcall). Also:
  do not send the full URI to https destinations when proxying, as we
  are directly interfacing with the destination data stream.
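  [ed: a simplified sketch of such a drain-the-buffer-first wrapper;
  names and structure are invented here, not APT's actual code.]

      #include <algorithm>
      #include <cstring>
      #include <string>
      #include <unistd.h>

      struct BufferedFd
      {
         int fd;              // underlying TCP connection to the proxy
         std::string pending; // body bytes read together with the CONNECT headers

         ssize_t Read(void *buf, size_t count)
         {
            if (!pending.empty())
            {
               size_t const n = std::min(count, pending.size());
               memcpy(buf, pending.data(), n);
               pending.erase(0, n);
               return n; // drain leftover bytes first
            }
            return read(fd, buf, count); // then continue with the raw TCP stream
         }
      };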
* support tor+https being handled by http (David Kalnischkies, 2017-06-28, 1 file, -3/+10)
  The apt-transport-tor package operates via simple symlinks, which can
  result in 'http' being called as 'tor+https', so it must pick up the
  right configuration pieces and trigger https support for such plus
  names as well.
* methods: http: Drain pending data before selecting (Julian Andres Klode, 2017-06-28, 1 file, -1/+9)
  GnuTLS can already have data pending in its buffers; we need to drain
  that first, otherwise select() might block indefinitely.
  Gbp-Dch: ignore
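  [ed: the test boils down to asking GnuTLS about its buffers before
  waiting on the socket; a minimal sketch using the real
  gnutls_record_check_pending() call, wrapper name invented.]

      #include <gnutls/gnutls.h>

      // select() only sees bytes on the socket; bytes already decrypted
      // and buffered inside the TLS session must be consumed first.
      bool HasPendingData(gnutls_session_t session)
      {
         return gnutls_record_check_pending(session) > 0;
      }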
* methods: Add HTTPS support to http method, using GnuTLS (Julian Andres Klode, 2017-06-28, 1 file, -177/+10)
  The http method will eventually replace the curl-based https method,
  but for now, this is an opt-in experiment that can be enabled by
  setting Dir::Bin::Methods::https to "http".
  Known issues:
  - We do not support HTTPS proxies yet
  - We do not support proxying HTTPS connections yet (CONNECT)
  - IssuerCert and SslForceVersion are unsupported
  Gbp-Dch: Full
* methods: connect: Switch from int fds to new MethodFd (Julian Andres Klode, 2017-06-28, 1 file, -37/+40)
  Use std::unique_ptr<MethodFd> everywhere we used an integer-based
  file descriptor before. This allows us to implement stuff like TLS
  support easily.
  Gbp-Dch: ignore
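  [ed: a rough sketch of what such an fd abstraction can look like –
  a TLS implementation can then be layered over a plain TCP fd without
  callers noticing; APT's real MethodFd interface differs in detail.]

      #include <memory>
      #include <unistd.h>

      struct MethodFd
      {
         virtual ~MethodFd() = default;
         virtual ssize_t Read(void *buf, size_t count) = 0;
         virtual ssize_t Write(void const *buf, size_t count) = 0;
      };

      // Plain wrapper around an integer file descriptor.
      struct FdFd : public MethodFd
      {
         int fd = -1;
         ssize_t Read(void *buf, size_t count) override { return read(fd, buf, count); }
         ssize_t Write(void const *buf, size_t count) override { return write(fd, buf, count); }
      };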
* rename ServerMethod to BaseHttpMethod (David Kalnischkies, 2016-12-31, 1 file, -4/+4)
  This 'method' is the abstract base for http and https and should as
  such be called out like this rather than using an easily confused
  name.
  Gbp-Dch: Ignore
* separating state variables regarding server/request (David Kalnischkies, 2016-12-31, 1 file, -53/+46)
  Having a Reset(bool) method to partially reset certain variables like
  the download size was always strange, so this commit splits the
  ServerState into an additional RequestState living on the stack for
  as long as we deal with the request, causing an automatic "reset".
  There is much left to do to make this code look better, but this is a
  good first step: it compiles cleanly and passes all tests, so keeping
  it as history might be beneficial, and by avoiding explicit memory
  allocations it ends up fixing a small memory leak in https, too.
  Closes: #440057
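  [ed: the shape of the split in a simplified sketch; member names are
  illustrative, the struct names follow the commit message.]

      struct ServerState
      {
         bool Pipeline = true; // persists across requests on this connection
      };

      void HandleRequest(ServerState &Server)
      {
         struct RequestState
         {
            unsigned long long DownloadSize = 0; // per-request only
            unsigned long long StartPos = 0;
         } Req; // constructed fresh per request: no Reset(bool) needed
         // ... perform the request using Server and Req ...
         (void)Server; (void)Req;
      }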
* http: skip connection cleanup if we close it anyhow (David Kalnischkies, 2016-11-11, 1 file, -0/+3)
  Suggested in #529794
* improve SOCKS error messages for http slightly (David Kalnischkies, 2016-11-10, 1 file, -14/+46)
  The 0.0.0.0:0 Tor reports is pretty useless by itself, but even if an
  IP were reported, it is better to show the user, in the same error
  message, the hostname we wanted the proxy to connect to. We improve
  upon it further by looking for this bind address in particular and
  remapping error messages slightly, giving users a better chance of
  figuring out what went wrong. Upstream Tor can't do that, as it would
  be technically wrong.
* don't send Range requests if we know they are not accepted (David Kalnischkies, 2016-08-16, 1 file, -1/+1)
  If the server told us in a previous request that it doesn't support
  Ranges with bytes, via an Accept-Ranges header not containing
  "bytes", we don't try to formulate requests using Ranges.
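  [ed: the resulting logic is roughly this; helper and parameter names
  are illustrative.]

      #include <string>

      // Resume with a Range header only if the server has not denied
      // byte ranges before and we actually have partial data.
      std::string MaybeRangeHeader(bool const RangesAllowed,
                                   unsigned long long const HavePos)
      {
         if (RangesAllowed == false || HavePos == 0)
            return "";
         return "Range: bytes=" + std::to_string(HavePos) + "-\r\n";
      }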
* reorganize server-states resetting in http/https (David Kalnischkies, 2016-08-16, 1 file, -0/+7)
  We keep various bits of information about the server around, some
  only affecting the currently handled file (like sizes) while others
  should be persistent (like pipeline detection). http used to reset
  all file-related state manually, which is a bit silly if we already
  have a Reset() method – which does reset everything – so we extend it
  with a parameter for reuse and call it from https too (as https
  previously reset state by just creating a new state struct – it uses
  no value of the persistent state-keeping yet, as it supports no
  pipelining).
  Gbp-Dch: Ignore
* http: auto-configure for local Tor proxy if called as 'tor' (David Kalnischkies, 2016-08-11, 1 file, -0/+3)
  With apt's http transport supporting socks5h proxies and all the work
  in terms of configuring methods based on the name they are called
  with, it becomes surprisingly easy to implement Tor support matching
  (and perhaps even slightly exceeding) what is currently available in
  apt-transport-tor. How this will be handled packaging-wise we will
  see in https://lists.debian.org/deity/2016/08/msg00012.html, but
  until this is resolved we can add the needed support without actively
  enabling it for now, so that it can be tested better.
* implement socks5h proxy support for http method (David Kalnischkies, 2016-08-10, 1 file, -22/+169)
  SOCKS support is a requested feature insofar as the internet actually
  believes an Acquire::socks::Proxy option exists. It doesn't, and this
  commit isn't adding it, as that isn't how our configuration works,
  but it allows Acquire::http::Proxy="socks5h://…". The HTTPS method
  was already changed to support socks proxies (all versions) via curl.
  This commit implements only SOCKS5 (RFC 1928) with no authentication
  or user/password authentication (RFC 1929), but not GSSAPI, which is
  required by the RFC. The 'h' in the protocol name further indicates
  that DNS resolution is delegated to the socks proxy rather than
  performed locally. The implementation works and was tested with Tor
  as socks proxy, for which implementing socks5h only can actually be
  considered a feature.
  Closes: 744934
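  [ed: a sketch of the RFC 1928 CONNECT message for the
  no-authentication case; the helper name is invented.]

      #include <cstdint>
      #include <string>
      #include <vector>

      // After the greeting {0x05, 0x01, 0x00} (version 5, one method,
      // "no authentication") has been accepted, the client sends a
      // CONNECT request; with address type 0x03 the hostname travels to
      // the proxy unresolved, which is exactly what "socks5h" means.
      std::vector<uint8_t> Socks5ConnectRequest(std::string const &Host, uint16_t Port)
      {
         std::vector<uint8_t> req = {
            0x05,                             // SOCKS version
            0x01,                             // command: CONNECT
            0x00,                             // reserved
            0x03,                             // address type: domain name
            static_cast<uint8_t>(Host.size()) // length (hostnames > 255 don't fit)
         };
         req.insert(req.end(), Host.begin(), Host.end());
         req.push_back(static_cast<uint8_t>(Port >> 8));   // port in network
         req.push_back(static_cast<uint8_t>(Port & 0xFF)); // byte order
         return req;
      }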
* implement generic config fallback for methods (David Kalnischkies, 2016-08-10, 1 file, -29/+30)
  The https method has for a long while now implemented a hardcoded
  fallback to the same options in http, which, while it works, is
  rather inflexible if we want to allow the methods to use another name
  to change their behaviour slightly, like apt-transport-tor does for
  https – most of its diff being s#https#tor#g, which then fails to do
  the full-circle fallthrough tor -> https -> http for https sources.
  With this config infrastructure this can now be implemented.
* use the same redirection handling for http and https (David Kalnischkies, 2016-08-10, 1 file, -0/+29)
  cURL, which backs our https implementation, can handle redirects on
  its own, but by dealing with them ourselves we gain finer control
  over which redirections will be performed (we don't like https →
  http) and by whom, so that redirections to other hosts correctly
  spawn a new https method dealing with them instead of letting the
  current one handle it.
* detect redirection loops in acquire instead of workers (David Kalnischkies, 2016-08-10, 1 file, -0/+6)
  Having the detection handled in specific (http) workers means that a
  redirection loop over different hostnames isn't detected. It's also
  not a good idea to have this implemented in each method
  independently, even if it would work.
* fail on unsupported http/https proxy settings (David Kalnischkies, 2016-08-10, 1 file, -0/+2)
  Closes: #623443
* prevent C++ locale number formatting in text APIs (try 2) (David Kalnischkies, 2016-07-30, 1 file, -1/+1)
  Follow-up of b58e2c7c56b1416a343e81f9f80cb1f02c128e25. Still a
  regression of sorts of 8b79c94af7f7cf2e5e5342294bc6e5a908cacabf.
  Closes: 832044
* avoid 416 response teardown binding to null pointer (David Kalnischkies, 2016-07-05, 1 file, -7/+9)
  methods/http.cc:640:13: runtime error: reference binding to null
  pointer of type 'struct FileFd'
  This reference is never used in the cases where it is a nullptr, so
  the practical difference is non-existent, but it's still a bug.
  Reported-By: gcc -fsanitize=undefined
* use +0000 instead of UTC by default as timezone in output (David Kalnischkies, 2016-07-02, 1 file, -2/+2)
  All apt versions support numeric as well as 3-character timezones
  just fine, and it's actually hard to write code which doesn't
  "accidentally" accept them. So why change? Documenting the
  Date/Valid-Until fields in the Release file is easy to do in terms of
  referencing the datetime format used e.g. in Debian changelogs
  (policy §4.4). This format specifies only the numeric timezones,
  though, not the nowadays obsolete 3-character ones, so in the
  interest of least surprise we should use the same format, even though
  it carries a small risk of regression in other clients (which
  encounter repositories created with apt-ftparchive). In case it
  really regresses in practice, the hidden option
  -o APT::FTPArchive::Release::NumericTimezone=0 can be used to go back
  to good old UTC as timezone. The EDSP and EIPP protocols use this
  'new' format; the text interface used to communicate with the acquire
  methods does not, for compatibility reasons, even if none of our
  methods would be affected, and I doubt any other would (in these
  instances the timezone is 'GMT', as that is what HTTP/1.1 requires).
  Note that this is only true for apt talking to methods;
  (libapt-based) methods talking to apt will respond with the 'new'
  format. It is therefore strongly advised to support both in method
  input as well.
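  [ed: a small standalone example of the two formats; on glibc, %z on a
  gmtime() result yields "+0000" where %Z would yield "GMT".]

      #include <cstdio>
      #include <ctime>

      int main()
      {
         time_t now = time(nullptr);
         char buf[64];
         // numeric timezone, matching the Debian changelog date format
         strftime(buf, sizeof(buf), "%a, %d %b %Y %H:%M:%S %z", gmtime(&now));
         printf("%s\n", buf); // e.g. "Sat, 02 Jul 2016 12:00:00 +0000"
         return 0;
      }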
* http: don't hang on redirect with length + connection close (David Kalnischkies, 2016-06-15, 1 file, -4/+4)
  Most servers which close the connection do not send a Content-Length,
  as this is usually redundant information, but some might. While
  testing with our server with
  'aptwebserver::response-header::Connection' set to 'close', I noticed
  that http hangs after a redirect in such cases, so if we have the
  information, just use it instead of discarding it.
* use std::locale::global instead of setlocale (David Kalnischkies, 2016-05-28, 1 file, -1/+0)
  We use a wild mixture of C and C++ ways of generating output, so
  having a consistent world-view in both styles sounds like a good idea
  and should help in preventing regressions.
* prevent C++ locale number formatting in text APIs (David Kalnischkies, 2016-05-27, 1 file, -2/+2)
  Setting the C++ locale via std::locale::global(std::locale("")) –
  which would otherwise default to the default C locale (aka unaffected
  by setlocale) – affects the formatting of numeric types in IO
  streams, which for output meant for humans is perfectly sensible, but
  it breaks our many text interfaces used and parsed by us and others
  without expecting the numbers to be formatted.
  Closes: #825396
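  [ed: a minimal demonstration of the effect; with a grouping locale
  such as en_US.UTF-8 set in the environment, iostream number output
  changes format.]

      #include <iostream>
      #include <locale>

      int main()
      {
         // With e.g. LANG=en_US.UTF-8 the global C++ locale now has
         // digit grouping, unlike the default "C" locale:
         std::locale::global(std::locale(""));
         std::cout.imbue(std::locale());
         std::cout << 1234567 << "\n"; // "1,234,567" rather than "1234567"
         return 0;
      }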
* fix two memory leaks reported by gcc (David Kalnischkies, 2015-09-14, 1 file, -2/+2)
  Reported-By: gcc -fsanitize=address -fno-sanitize=vptr
  Git-Dch: Ignore
* Merge branch 'debian/sid' into debian/experimental (Michael Vogt, 2015-05-22, 1 file, -1/+1)
  Conflicts:
    apt-pkg/pkgcache.h
    debian/changelog
    methods/https.cc
    methods/server.cc
    test/integration/test-apt-download-progress
* Fix endless loop in apt-get update that can cause disk fillup (Michael Vogt, 2015-05-22, 1 file, -1/+1)
  The apt http code parses Content-Length and Content-Range. For both
  the variable "Size" is used, and the semantics of this Size is the
  total file size. However, Content-Length is not the entire file size
  for partial file requests. For servers that send the Content-Range
  header first and then the Content-Length header, this can lead to
  clobbering of Size so that it is less than the real file size. This
  may lead to a subsequent passing of a negative number into the
  CircleBuf, which leads to an endless loop that writes data.
  Thanks to Anton Blanchard for the analysis and initial patch.
  LP: #1445239
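  [ed: the crucial distinction, as an illustrative parser for the total
  size; this is not the actual CircleBuf or header-parsing code.]

      #include <cstdio>

      // For "Content-Range: bytes 1000-9999/10000" the total size is
      // 10000, while Content-Length for the same reply is only 9000.
      bool TotalFileSize(char const *ContentRangeValue, unsigned long long &Total)
      {
         unsigned long long start, end;
         return sscanf(ContentRangeValue, "bytes %llu-%llu/%llu",
                       &start, &end, &Total) == 3;
      }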
* dispose http(s) 416 error page as non-content (David Kalnischkies, 2014-12-22, 1 file, -0/+2)
  Real webservers (like apache) actually send an error page with a 416
  response, but our client didn't expect it, leaving the page on the
  socket to be parsed as the response for the next request (http) or as
  file content (https), which isn't what we want at all… The symptom is
  a "Bad header line", as html usually doesn't parse that well to an
  http header. This manifests itself e.g. if we have a complete file
  (or larger) in partial/ which isn't discarded by If-Range, as the
  server doesn't support it (or the file is just newer, think: mirror
  rotation). It is a sort-of regression of
  78c72d0ce22e00b194251445aae306df357d5c1a, which removed the
  "filesize - 1" trick, but that had its own problems… To properly test
  this, our webserver gains the ability to reply with
  Transfer-Encoding: chunked, as most real webservers will use it to
  send dynamically generated error pages. (The tests and their binary
  helpers had to be slightly modified to apply, but the patch fixing
  the issue itself is unchanged.)
  Closes: 768797