From: Lucian Mogosanu Date: Fri, 23 Aug 2019 13:58:08 +0000 (+0300) Subject: posts: 09c X-Git-Url: https://git.mogosanu.ro/?a=commitdiff_plain;h=5601b0ebeecd56a8d7bf6e7b9cdafbf32b4e7992;p=thetarpit.git posts: 09c --- diff --git a/drafts/000-hunchentoot-vi.markdown b/drafts/000-hunchentoot-vi.markdown deleted file mode 100644 index 2f474cf..0000000 --- a/drafts/000-hunchentoot-vi.markdown +++ /dev/null @@ -1,382 +0,0 @@ ---- -postid: 000 -title: Hunchentoot: requests and replies [a] -date: August 23, 2019 -author: Lucian Mogoșanu -tags: tech, tmsr ---- - -This post is part of a series on [Common Lisp WWWism][cl-www], more -specifically dissecting the Common Lisp web server known as -Hunchentoot. Other posts in the series include: - -* a walk through the project's [history][hunchentoot-i]; -* a set of [architectural notes][hunchentoot-ii] and - [follow-up][hunchentoot-iii] on the same; -* a review of [acceptors][hunchentoot-iv]; and -* a review of [taskmasters][hunchentoot-v]. - -This post is a two-parter (see below why) that will discuss the -objects known as "requests" and "replies", as they are part of the -very same fundamental mechanism. - -The reader has probably noticed that [little][chunking] to nothing has -been discussed about the core of this whole orchestra, the core being -the *HTTP* piece -- yes, we're what looks like more than halfway -through this series and most of what we've discussed comprises -[TCPisms][tcp] and [CL-on-PC][alf-cl-on-pc]. So let's begin our -incursion into the subject with a [likbez][likbez]: - -The idea behind HTTP is simple: let there be a network of nodes N; let -the nodes be divided into client nodes C and server nodes S[^1]; let -every node s in S be associated with a set of resources Rs. In this -framework, HTTP specifies a means for a client c in C to access a -resource r in Rs, knowing that s in S is a server. Furthermore, it -allows one c to interact with an r owned by s in other ways, such as -by "posting" data to that r. 
Additionally, newer specifications of the -protocol have introduced other so-called "methods" of interaction; I -will deliberately omit them, both for the sake of brevity and because -all or most of the "additional" stuff is to be burned down and left -out of any such future "HTTP" protocol[^2]. - -So let's say that a c in C wants to interact with a s in S as per -above. The premise being that c and s communicate using HTTP, then c -will send s a message called a request, which contains the resource to -be accessed, the method and other such information, as specified in -the [RFC][rfc-1945-5]. Upon receiving this request message, s will -respond to c with a message called a response, which contains a status -code, the message size, the data and so on, again, as per the -[RFC][rfc-1945-6]. This is then (as viewed from the airplane) the -whole mechanism that our Hunchentoot needs to implement in order to be -able to communicate with curls, web clients, proxies and so on: -receiving and processing requests; and baking and sending -replies. Note that Hunchentoot merely provides *the mechanism* for -this; the actual policy (i.e. whether a resource is to be associated -with a file on the disk, or a set of database entries or whatever) is -implemented by the user. - -In what is becoming a traditional tarpitian style of documenting code, -we will move directly to: - -**[[]]** [**request**][ht-c-req]: Object holding everything pertaining -to a HTTP request: headers, a method, local/remote address/ports, the -protocol version, a socket stream, GET/POST parameters, the resource -being requested; additionally: the raw POST data, a reference to the -acceptor on which the request was made, "auxiliary data" to be -employed by the user however he or she wishes. 
- -Note that (somewhat counter-intuitively) request parsing and object -creation doesn't occur in request.lisp, but upstream in -[process-connection][ht-pc]; more specifically, headers are parsed in -headers.lisp, in [get-request-data][ht-iv-grd][^3]. So then what does -request.lisp do? Well, it: defines the data structure; implements a -lot of scaffolding for GET and POST parameter parsing; implements a -user interface for request handlers; and finally, it creates a context -in which request handlers can be safely executed, i.e. if something -fails, execution can be unwound to the place where -[handle-request][ht-hr] was called and an error response can be logged -and returned. Let's look at each of these pieces. - -The first set of functions deals with parameter parsing. In -particular, GET parameter parsing is performed when a request object -is instantiated, while POST parameters are parsed on request, -i.e. when the accessor method is called. Let's see: - -[ii2] [**initialize-instance**][ht-ii2]: -Similarly to other pieces of code [under review][ht-otpct-ii], this is -an ":after" method which gets called immediately after an object -instantiation. a. an error handling context is defined; in which -b. [script-name][ht-sn] and [query-string][ht-qs] are set based on the -request URI[^4]; c. [get-parameters][ht-gp] are set[^5]; d. -[cookies-in][ht-ci-acc] are set[^6]; e. [session][ht-sess] is set[^7]; -and finally, if everything fails, f. [log-message\*][ht-lms] is called -to log the error and [return-code\*][ht-rcs] is set to -http-bad-request. - -By the way, since HTTP hasn't escaped Unicode, URL decoding needs a -character format, which is determined based on the content-type field -in the header, which is determined using the -[external-format-from-content-type][ht-effct] function. 
- -[effct] -[**external-format-from-content-type**][ht-effct]: Takes a -content-type string as an argument; if this argument is non-nil, then -take the charset from [parse-content-type][ht-pct][^8]; and try to -convert the result into a flexi-streams "external-format" via -[make-external-format][flex-mef]. If this fails, send a -[warning][ht-hw]. - -[mrpp] -[**maybe-read-post-parameters**][ht-mrpp]: This does quite a bit of -checking on the parameters it receives, namely it only does something -when: the content-type header is set; and the request method is POST; -and [the "force" parameter is set; or the [raw-post-data][ht-rpd-slot] -slot is not set]; and the [raw-post-data][ht-rpd-slot] slot is not set -to t -- to quote from a comment, "can't reparse multipart posts, even -when FORCEd". Furthermore, for the function to do anything, the -content-length header must be set or [input chunking][ht-icp] must be -enabled; otherwise, a warning is logged and the function returns. - -If all checks pass, then wrapped in a condition handler: -a. [parse-content-type][ht-pct] (see [footnote #8](#fn8) for details), -yielding a type, subtype and charset; b. try [making][flex-mef] an -external-format based b1. on the external-format parameter, and b2. on -the charset found at (a), and b3. if all fails, fall back to -[\*hunchentoot-default-external-format\*][ht-shdefs]. - -Once we have an external-format, c. populate the -[post-parameters][ht-pp-slot] slot: c1. if content-type is -"application/x-www-form-urlencoded", then use -form-url-encoded-list-to-alist (see [footnote #5](#fn5)); otherwise -c2. the content-type is "multipart/form-data", which is parsed using -[parse-multipart-form-data](#pmfd). - -Finally, d. if something fails in one of the previous steps, then -d1. an error is logged; d2. the return-code is set to -http-bad-request; and d3. the request is [aborted][ht-arh]. - -[pmfd] -[**parse-multipart-form-data**][ht-pmfd]: a. in a condition-handling -context; b. 
make a new content-stream with the external-format set to -latin-1[^9]; then on that content-stream, -c. [parse-rfc2388-form-data](#prfd); then d. [get-post-data](#gpd); -and e. if the result from (d) is a non-empty string, it's considered -"stray data" and reported; finally, f. if an error occurs, it's logged -and nothing is returned. - -Otherwise, the result from (c) is returned, as per -[prog1][clhs-prog1]. - -[prfd] -[**parse-rfc2388-form-data**][ht-prfd]: Fortunately for us, parsing -multipart-blah-blah is encapsulated in yet another [RFC][rfc-2388] of -its own, for which there already exists a CL -"library"[^10]. Unfortunately, the coad written around said "library" -is still kludgy. Let's see. - -a\. parse the content-type header; then b. look for a "boundary" -content-type parameter, and return empty-handed if that doesn't exist; -otherwise c. for each MIME part; d. get the MIME headers; and -e\. particularly, the content-disposition header; and f. particularly, -the "name" field of that header. - -g\. when the item at (f) exists, append the following to the result: -g1. the item at (f), converted using [convert-hack](#ch); and g2. the -contents, converted using the same [convert-hack](#ch). However, -mime-part-contents can return either[^11] g2i. a path to a local file, -in which case the coad stores the path, the (converted) filename and -its content-type; or g2ii. a string, in which case the (converted) -string is stored. - -[ch] [**convert-hack**][ht-ch]: You might -wonder what this does and why it exists in the first place. Let's -quote from the documentation itself: - -> The rfc2388 package is buggy in that it operates on a character -> stream and thus only accepts encodings which are 8 bit -> transparent. In order to support different encodings for parameter -> values submitted, we post process whatever string values the rfc2388 -> package has returned. 
- -I don't know what the fuck "8 bit transparent" means, but the function -does exactly this: it converts the input string to a raw vector of -octets, then converts said vector (using [octets-to-string][flex-ots]) -to a string of the encoding given by the external-format parameter. So -this is just dancing around the previous [latin1](#pmfd) game -- yes, -if you send a UTF-8-encoded file wrapped in a (ISO-8859-1-encoded) -POST request, the result will be mixed-encoding data, and whoever gets -said data will have to make heads and tails of the resulting pile of -shit. - -I *can't wait* for the moment when the ban on this multipart fungus -comes into effect, it'll be a joyous day. - -[gpd] [**get-post-data**][ht-gpd]: Reads -data from the request stream and sets the [raw-post-data][ht-rpd-slot] -slot: - -a\. if the want-stream argument is set, then the stream is converted to -latin-1-encoded (as per above) and the slot is set to this stream, -bound by the content-length (if this field exists). - -b\. if content-length is set and it's greater than the already-read -argument -- i.e. there is still data to be read from the stream, -assuming the user has already read some of it -- then check whether -[chunking][ht-icp] is enabled and, if so, log a warning; either way, -read the content and let it be assigned to raw-post-data. - -c\. if [chunking][ht-icp] is enabled, then c1. setup two arrays: an -adjustable "content" array and a buffer; c2. setup a position marker -for the content array; c3. read into the buffer; then c4. adjust the -content array to the new size; then c5. copy data from the buffer into -the content array at the current position; and finally, c6. stop when -there's no more content to be read. - -As you can well see, I am running out of space, so contrary to [the -schedule][tmsr-work-iv] I'm going to split this into two pieces, the -second part to be published next week. 
Annoyingly enough, this is also -delaying [other work][logs-ttp-comments], including the fabled -tarpitian-comment-server, so for now the venue for comments remains -[#spyked][contact]. - -[^1]: In practice some c in C can also be a s in S and vice-versa, why - not? In a sane world C and S would be the same set, and thus our - client-server-herpderp model would become that of a peer-to-peer - network, that is, one in which all nodes would both host resources - and ask for them. Again, I ask: why not? And if not, then pray - tell, why does the [Incan][inca] star topology appeal so much to - you? - -[^2]: Witness, just as an example, the difference between RFCs - [1945][rfc-1945] and [2616][rfc-2616]: the former specifies - precisely three methods -- of which the second is, for purely - practical reasons, a subset of the first -- while the - latter... well! - - By the way, do you think this is all? Nope: current specifications - split HTTP into no less than [seven parts][http-rfcs], which makes - this a tome of its own. And as if this wasn't enough, as of 2019 - RFC 7230 has been obsoleted by [8615][rfc-8615], and if this keeps - up at the current pace God help us, by 2029 we'll probably get to - RFCs numbered in the hundred thousands. - - In other words, fuck these sons of bitches and all their cancerous - "improvements". - -[^3]: And here's where I find out I've actually been reading all this - in the correct order, given that *I know* where this particular - bit occurs and I don't need to spend hours digging into finding - out. Pretty neat, huh? - - Now how 'bout *you* get a blog and start doing this for the coad - that you're using? Wouldn't that be neat? - -[^4]: Given an URI of the general form - http://my-resource.tld?p1=v1&p2=v2..., script-name denotes the - part before the question mark, while query-string denotes the part - after it. 
- -[^5]: The query-string is split by ampersands and passed to - [form-url-encoded-list-to-alist][ht-fuelta], which takes this list - and splits each element by the equals sign. Thus the string - p1=v1&p2=v2... ends up being represented as the association list: - - ~~~~ {.commonlisp} - (("p1" . "v1") ("p2" . "v2") ...) - ~~~~ - -[^6]: The process is similar to the previous footnote. The cookie - string is split and passed to [cookies-to-alist][ht-cta] which - does pretty much the same thing as the previously-described - function, only there's no URL decoding going on. - -[^7]: I've set to deliberately omit this part since the beginning, so - I won't go into details here. - -[^8]: Did I by any chance mention HTTP has grown into a tumour-like - structure? As but one of many examples of malignant cells: the - "content-type" field contains a type/subtype part - (e.g. "text/plain", or "application/octet-stream" or whatever); - but it also contains *parameters*, such as a charset, or - "multipart" markers, into which I won't get just yet, because my - blood pressure is already going up. - - Anyway, [parse-content-type][ht-pct] reads all these and returns a - type, a subtype and a charset, if it exists. - - By the by, unlike the previous footnotes, the "parameter" alist is - constructed using Chunga's [read-name-value-pairs][chunga-rnvp], - which seems to do more or less the same thing as those - functions. So where the fuck does all this duplication come from? - -[^9]: This looks confusing after all the previous "external-format" - fudge. Note that *the content* has a user-provided format, while - the multipart-blah-blah syntax is [by default][rfc-1945-3-6-1] - Latin1. This is not fucking amusing, I know. - -[^10]: Written by one Janis Dzerins. There's also a - variation/"improvement" on this coad, rfc-2388-binary, which is - used by another Lisp piece that I have on my disk, I forgot which - one. 
- - Speaking of which, I expect I'll run into more of these "duplicate - libraries" problems in the future, which will require - porting/adaptation work. No, I'm not gonna use a thousand - libraries for XML parsing, there's [one][s-xml], and that's it. If - you're gonna complain, then you'd better have a good reason, and - be ready to explain where the fuck you were when I needed that - code. - -[^11]: Y'know, I didn't set out to review *this* piece of code back - when I started this, but it can't be helped. mime-part-contents is - the "-contents" part of a [defstruct][r2388-mp] built using - [parse-mime][r2388-pm]. Now, this code behaves in (what I believe - to be) a very odd manner, namely: when it encounters arbitrary - string input, it returns it as-is; the alternative is to encounter - a MIME part that contains a filename field, in which case this - coad creates a new temporary file and stores the content there; in - this case, the mime-part structure will contain a bunch of headers - and a *path* to the contents, which is an unexpected extra layer - of indirection, because why the fuck not. - - This behaviour can be overriden by setting the - "write-content-to-file" parameter to nil. However, this is the - *default* behaviour, and what Hunchentoot expects. Fuck my life. 
- -[ht-c-req]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L31 -[ht-ci-acc]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L68 -[ht-gp]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L71 -[ht-pp-slot]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L75 -[ht-sn]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L79 -[ht-qs]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L83 -[ht-sess]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L86 -[ht-rpd-slot]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L94 -[ht-ch]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L118 -[ht-prfd]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L127 -[ht-gpd]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L150 -[ht-ii2]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L185 -[ht-pmfd]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L266 -[ht-mrpp]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L282 -[ht-effct]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L495 -[ht-pc]: /posts/y06/098-hunchentoot-iv.html#pc -[ht-iv-grd]: /posts/y06/098-hunchentoot-iv.html#fn3 -[ht-hr]: /posts/y06/098-hunchentoot-iv.html#hr -[ht-otpct-ii]: /posts/y06/09b-hunchentoot-v.html#otpct-ii -[ht-fuelta]: http://coad.thetarpit.org/hunchentoot/c-util.lisp.html#L244 -[ht-cta]: http://coad.thetarpit.org/hunchentoot/c-util.lisp.html#L255 -[ht-lms]: /posts/y06/098-hunchentoot-iv.html#lms -[ht-rcs]: http://coad.thetarpit.org/hunchentoot/c-reply.lisp.html#L103 -[ht-pct]: http://coad.thetarpit.org/hunchentoot/c-util.lisp.html#L283 -[ht-hw]: http://coad.thetarpit.org/hunchentoot/c-conditions.lisp.html#L58 -[ht-icp]: http://coad.thetarpit.org/hunchentoot/c-util.lisp.html#L342 -[ht-shdefs]: http://coad.thetarpit.org/hunchentoot/c-specials.lisp.html#L275 -[ht-arh]: /posts/y06/098-hunchentoot-iv.html#selection-762.0-762.5 -[cl-www]: 
/posts/y05/090-tmsr-work-ii.html#selection-108.0-108.17 -[hunchentoot-i]: /posts/y05/093-hunchentoot-i.html -[hunchentoot-ii]: /posts/y05/096-hunchentoot-ii.html -[hunchentoot-iii]: /posts/y06/097-hunchentoot-iii.html -[hunchentoot-iv]: /posts/y06/098-hunchentoot-iv.html -[hunchentoot-v]: /posts/y06/09b-hunchentoot-v.html -[chunking]: /posts/y06/098-hunchentoot-iv.html#fn2 -[tcp]: /posts/y05/096-hunchentoot-ii.html#fn1 -[alf-cl-on-pc]: http://trilema.com/2019/trilema-goes-dark/#comment-130686 -[likbez]: http://logs.nosuchlabs.com/log-search?q=likbez&chan=trilema -[inca]: http://trilema.com/republican-thesaurus/?b=(cca%202016&e=femstate.#select -[rfc-1945]: https://tools.ietf.org/html/rfc1945#section-5.1.1 -[rfc-2616]: https://tools.ietf.org/html/rfc2616#section-5.1.1 -[http-rfcs]: https://www.w3.org/Protocols/ -[rfc-8615]: https://tools.ietf.org/html/rfc8615 -[rfc-1945-5]: https://tools.ietf.org/html/rfc1945#section-5 -[rfc-1945-6]: https://tools.ietf.org/html/rfc1945#section-6 -[flex-mef]: http://edicl.github.io/flexi-streams/#make-external-format -[chunga-rnvp]: https://edicl.github.io/chunga/#read-name-value-pairs -[rfc-1945-3-6-1]: https://tools.ietf.org/html/rfc1945#section-3.6.1 -[clhs-prog1]: http://clhs.lisp.se/Body/m_prog1c.htm -[rfc-2388]: https://tools.ietf.org/html/rfc2388 -[s-xml]: /posts/y05/086-s-xml.html -[r2388-mp]: http://coad.thetarpit.org/rfc2388/c-rfc2388.lisp.html#L404 -[r2388-pm]: http://coad.thetarpit.org/rfc2388/c-rfc2388.lisp.html#L415 -[flex-ots]: http://edicl.github.io/flexi-streams/#octets-to-string -[tmsr-work-iv]: /posts/y06/099-tmsr-work-iv.html#selection-120.0-120.3 -[logs-ttp-comments]: http://logs.nosuchlabs.com/log/trilema/2019-08-16#1929229 -[contact]: http://webchat.freenode.net/?channels=#spyked&nick=f_thetarpit_09c diff --git a/posts/y06/09c-hunchentoot-via.markdown b/posts/y06/09c-hunchentoot-via.markdown new file mode 100644 index 0000000..f2457ab --- /dev/null +++ b/posts/y06/09c-hunchentoot-via.markdown @@ -0,0 +1,389 @@ 
+--- +postid: 09c +title: Hunchentoot: requests and replies [a] +date: August 23, 2019 +author: Lucian Mogoșanu +tags: tech, tmsr +--- + +This post is part of a series on [Common Lisp WWWism][cl-www], more +specifically dissecting the Common Lisp web server known as +Hunchentoot. Other posts in the series include: + +* a walk through the project's [history][hunchentoot-i]; +* a set of [architectural notes][hunchentoot-ii] and + [follow-up][hunchentoot-iii] on the same; +* a review of [acceptors][hunchentoot-iv]; and +* a review of [taskmasters][hunchentoot-v]. + +This post is a two-parter (see below for why) that will discuss the +objects known as "requests" and "replies", as they are part of the +very same fundamental mechanism. + +The reader has probably noticed that [little][chunking] to nothing has +been discussed about the core of this whole orchestra, the core being +the *HTTP* piece -- yes, we're what looks like more than halfway +through this series and most of what we've discussed comprises +[TCPisms][tcp] and [CL-on-PC][alf-cl-on-pc]. So let's begin our +incursion into the subject with a [likbez][likbez]: + +The idea behind HTTP is simple: let there be a network of nodes N; let +the nodes be divided into client nodes C and server nodes S[^1]; let +every node s in S be associated with a set of resources Rs. In this +framework, HTTP specifies a means for a client c in C to access a +resource r in Rs, knowing that s in S is a server. Furthermore, it +allows one c to interact with an r owned by s in other ways, such as +by "posting" data to that r. Additionally, newer specifications of the +protocol have introduced other so-called "methods" of interaction; I +will deliberately omit them, both for the sake of brevity and because +all or most of the "additional" stuff is to be burned down and left +out of any such future "HTTP" protocol[^2]. + +So let's say that a c in C wants to interact with an s in S as per +above.
Given that c and s communicate using HTTP, c +will send s a message called a request, which contains the resource to +be accessed, the method and other such information, as specified in +the [RFC][rfc-1945-5]. Upon receiving this request message, s will +respond to c with a message called a response, which contains a status +code, the message size, the data and so on, again, as per the +[RFC][rfc-1945-6]. This is then (as viewed from the airplane) the +whole mechanism that our Hunchentoot needs to implement in order to be +able to communicate with curls, web clients, proxies and so on: +receiving and processing requests; and baking and sending +replies. Note that Hunchentoot merely provides *the mechanism* for +this; the actual policy (i.e. whether a resource is to be associated +with a file on the disk, or a set of database entries or whatever) is +implemented by the user. + +In what is becoming a traditional tarpitian style of documenting code, +we will move directly to: + +**[[]]** [**request**][ht-c-req]: Object holding everything pertaining +to an HTTP request: headers, a method, local/remote address/ports, the +protocol version, a socket stream, GET/POST parameters, the resource +being requested; additionally: the raw POST data, a reference to the +acceptor on which the request was made, "auxiliary data" to be +employed by the user however he or she wishes. + +Note that (somewhat counter-intuitively) request parsing and object +creation don't occur in request.lisp, but upstream in +[process-connection][ht-pc]; more specifically, headers are parsed in +headers.lisp, in [get-request-data][ht-iv-grd][^3]. So then what does +request.lisp do? Well, it: defines the data structure; implements a +lot of scaffolding for GET and POST parameter parsing; implements a +user interface for request handlers; and finally, it creates a context +in which request handlers can be safely executed, i.e.
if something +fails, execution can be unwound to the place where +[handle-request][ht-hr] was called and an error response can be logged +and returned. Let's look at each of these pieces. + +The first set of functions deals with parameter parsing. In +particular, GET parameter parsing is performed when a request object +is instantiated, while POST parameters are parsed on demand, +i.e. when the accessor method is called. Let's see: + +[ii2] [**initialize-instance**][ht-ii2]: +Like other pieces of code [under review][ht-otpct-ii], this is +an ":after" method, which gets called immediately after an object is +instantiated: a. an error-handling context is defined, in which +b. [script-name][ht-sn] and [query-string][ht-qs] are set based on the +request URI[^4]; c. [get-parameters][ht-gp] are set[^5]; d. +[cookies-in][ht-ci-acc] are set[^6]; e. [session][ht-sess] is set[^7]; +and finally, if any of this fails, f. [log-message\*][ht-lms] is called +to log the error and [return-code\*][ht-rcs] is set to +http-bad-request. + +By the way, since HTTP hasn't escaped Unicode, URL decoding needs a +character encoding, which is derived from the content-type header +field using the [external-format-from-content-type][ht-effct] +function. + +[effct] +[**external-format-from-content-type**][ht-effct]: Takes a +content-type string as an argument; if this argument is non-nil, then +take the charset from [parse-content-type][ht-pct][^8]; and try to +convert the result into a flexi-streams "external-format" via +[make-external-format][flex-mef]. If this fails, signal a +[warning][ht-hw].
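The content-type business is easier to follow with a concrete example. Below is a minimal sketch that splits such a header into type, subtype and charset. The name sketch-parse-content-type is made up, and this is *not* Hunchentoot's parse-content-type, which (as per footnote 8) leans on Chunga's read-name-value-pairs. The sketch also ignores quoted-string escapes. It leaves out the last step as well, the one that turns the charset into a flexi-streams external-format via make-external-format:

~~~~ {.commonlisp}
;;; Minimal sketch, NOT Hunchentoot's parse-content-type: split a
;;; Content-Type value into type, subtype and (optional) charset.
;;; Quoted-string escapes and other RFC corner cases are ignored.
(defun sketch-parse-content-type (value)
  "Return TYPE, SUBTYPE and CHARSET (or NIL) parsed from VALUE,
e.g. \"text/html; charset=UTF-8\"."
  (let* ((semi (position #\; value))
         (media (string-trim " " (subseq value 0 semi)))
         (slash (position #\/ media))
         (charset nil))
    ;; Walk the ;-separated parameters, looking for charset=...
    (loop with start = semi
          while start
          do (let* ((end (position #\; value :start (1+ start)))
                    (param (string-trim " " (subseq value (1+ start) end)))
                    (equals (position #\= param)))
               (when (and equals
                          (string-equal "charset"
                                        (string-trim " " (subseq param 0 equals))))
                 ;; Drop optional surrounding double quotes.
                 (setf charset (string-trim " \"" (subseq param (1+ equals)))))
               (setf start end)))
    ;; Type and subtype are case-insensitive, so normalize them.
    (values (string-downcase (subseq media 0 slash))
            (and slash (string-downcase (subseq media (1+ slash))))
            charset)))
~~~~

With this, (sketch-parse-content-type "text/html; charset=UTF-8") yields "text", "html" and "UTF-8". Without a charset parameter the third value is NIL, which is where a caller would fall back to some default format.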
+ +[mrpp] +[**maybe-read-post-parameters**][ht-mrpp]: This does quite a bit of +checking before it reads anything, namely it only does something +when: the content-type header is set; and the request method is POST; +and [the "force" parameter is set; or the [raw-post-data][ht-rpd-slot] +slot is not set]; and the [raw-post-data][ht-rpd-slot] slot is not set +to t -- to quote from a comment, "can't reparse multipart posts, even +when FORCEd". Furthermore, for the function to do anything, the +content-length header must be set or [input chunking][ht-icp] must be +enabled; otherwise, a warning is logged and the function returns. + +If all checks pass, then, wrapped in a condition handler: +a. [parse-content-type][ht-pct] (see [footnote #8](#fn8) for details), +yielding a type, subtype and charset; b. pick an external-format: +b1. the external-format parameter, if given; otherwise b2. one +[made][flex-mef] from the charset found at (a); and b3. if all else +fails, fall back to +[\*hunchentoot-default-external-format\*][ht-shdefs]. + +Once we have an external-format, c. populate the +[post-parameters][ht-pp-slot] slot: c1. if content-type is +"application/x-www-form-urlencoded", then use +form-url-encoded-list-to-alist (see [footnote #5](#fn5)); otherwise +c2. if the content-type is "multipart/form-data", it is parsed using +[parse-multipart-form-data](#pmfd). + +Finally, d. if something fails in one of the previous steps, then +d1. an error is logged; d2. the return-code is set to +http-bad-request; and d3. the request is [aborted][ht-arh]. + +[pmfd] +[**parse-multipart-form-data**][ht-pmfd]: a. in a condition-handling +context; b. make a new content-stream with the external-format set to +latin-1[^9]; then on that content-stream, +c. [parse-rfc2388-form-data](#prfd); then d. [get-post-data](#gpd); +and e. if the result from (d) is a non-empty string, it's considered +"stray data" and reported; finally, f. if an error occurs, it's logged +and nothing is returned.
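The guard at the top of maybe-read-post-parameters reads more easily as a standalone predicate. Below is a minimal sketch of one. It assumes the request's state arrives as (hypothetical) keyword arguments instead of being read from a request object. It only mirrors the conditions listed above and is not Hunchentoot's code:

~~~~ {.commonlisp}
;;; Minimal sketch, NOT Hunchentoot's code: the checks that
;;; maybe-read-post-parameters performs before touching the body.
;;; All keyword arguments are hypothetical stand-ins for request state.
(defun sketch-should-read-post-p (&key content-type method force
                                       raw-post-data content-length
                                       chunked-input-p)
  (when (and content-type                    ; Content-Type header present
             (eq method :post)               ; request method is POST
             (or force (not raw-post-data))  ; forced, or body not read yet
             (not (eq raw-post-data t)))     ; multipart posts can't be reparsed
    (cond ((or content-length chunked-input-p) t)
          ;; Neither a length nor chunking: nothing safe to read.
          (t (warn "No Content-Length and no chunked input; ~
                    not reading the request body.")
             nil))))
~~~~

So a FORCEd re-read of an already-read body goes through. Nothing gets reparsed, forced or not, once raw-post-data is t.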
+ +If no error occurs, the result from (c) is returned, as per +[prog1][clhs-prog1]. + +[prfd] +[**parse-rfc2388-form-data**][ht-prfd]: Fortunately for us, parsing +multipart-blah-blah is specified in yet another [RFC][rfc-2388] of +its own, for which there already exists a CL +"library"[^10]. Unfortunately, the coad written around said "library" +is still kludgy. Let's see. + +a\. parse the content-type header; then b. look for a "boundary" +content-type parameter, and return empty-handed if that doesn't exist; +otherwise c. for each MIME part; d. get the MIME headers; and +e\. particularly, the content-disposition header; and f. particularly, +the "name" field of that header. + +g\. when the item at (f) exists, append the following to the result: +g1. the item at (f), converted using [convert-hack](#ch); and g2. the +contents, converted using the same [convert-hack](#ch). However, +mime-part-contents can return either[^11] g2i. a path to a local file, +in which case the coad stores the path, the (converted) filename and +its content-type; or g2ii. a string, in which case the (converted) +string is stored. + +[ch] [**convert-hack**][ht-ch]: You might +wonder what this does and why it exists in the first place. Let's +quote from the documentation itself: + +> The rfc2388 package is buggy in that it operates on a character +> stream and thus only accepts encodings which are 8 bit +> transparent. In order to support different encodings for parameter +> values submitted, we post process whatever string values the rfc2388 +> package has returned. + +I don't know what the fuck "8 bit transparent" means, but the function +does exactly this: it converts the input string to a raw vector of +octets, then decodes said vector (using [octets-to-string][flex-ots]) +into a string, according to the external-format parameter.
So +this is just dancing around the previous [latin1](#pmfd) game -- yes, +if you send a UTF-8-encoded file wrapped in an (ISO-8859-1-encoded) +POST request, the result will be mixed-encoding data, and whoever gets +said data will have to make heads or tails of the resulting pile of +shit. + +I *can't wait* for the moment when the ban on this multipart fungus +comes into effect; it'll be a joyous day. + +[gpd] [**get-post-data**][ht-gpd]: Reads +data from the request stream and sets the [raw-post-data][ht-rpd-slot] +slot: + +a\. if the want-stream argument is set, then the stream is converted to a +latin-1-encoded one (as per above) and the slot is set to this stream, +bounded by the content-length (if this field exists). + +b\. if content-length is set and it's greater than the already-read +argument -- i.e. there is still data to be read from the stream, +assuming the user has already read some of it -- then check whether +[chunking][ht-icp] is enabled and, if so, log a warning; either way, +read the content and let it be assigned to raw-post-data. + +c\. if [chunking][ht-icp] is enabled, then c1. set up two arrays: an +adjustable "content" array and a buffer; c2. set up a position marker +for the content array; c3. read into the buffer; then c4. adjust the +content array to the new size; then c5. copy data from the buffer into +the content array at the current position; and finally, c6. repeat +from (c3) until there's no more content to be read. + +As you can well see, I am running out of space, so contrary to [the +schedule][tmsr-work-iv] I'm going to split this into two pieces, the +second part to be published next week. Annoyingly enough, this is also +delaying [other work][logs-ttp-comments], including the fabled +tarpitian-comment-server, so for now the venues for comments remain +[#spyked][contact] and (if you know where you're heading) +[#trilema][contact2].
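Back to the chunked branch of get-post-data, i.e. its steps c1 to c6. The grow-and-copy loop can be sketched in standard Common Lisp as follows. This is a minimal sketch, not Hunchentoot's code. The function name, the keyword arguments and the 8192-element default buffer are assumptions. element-type is a parameter only so the sketch can be tried on a string stream:

~~~~ {.commonlisp}
;;; Minimal sketch, NOT Hunchentoot's get-post-data: read STREAM to
;;; EOF when no content-length is known, growing a content array.
;;; Buffer size and element type are arbitrary, illustrative choices.
(defun sketch-read-to-eof (stream &key (buffer-size 8192)
                                      (element-type '(unsigned-byte 8)))
  (let ((content (make-array 0 :element-type element-type
                               :adjustable t))                       ; c1
        (buffer (make-array buffer-size :element-type element-type)) ; c1
        (pos 0))                                                     ; c2
    (loop for n-read = (read-sequence buffer stream)                 ; c3
          while (plusp n-read)                                       ; c6
          do (setf content (adjust-array content (+ pos n-read)))    ; c4
             (replace content buffer :start1 pos :end2 n-read)       ; c5
             (incf pos n-read))
    content))
~~~~

This is the usual approach for bodies of unknown length. With no content-length to size the array up front, each iteration grows the content array by exactly as many elements as read-sequence just delivered. A return value of zero means end-of-file.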
+ +P.S.: As per discussion in [the forum][logs-1930325], the next item in +the Hunchentoot series, following this "requests and replies" +miniseries, will be a genesis for the whole thing. + +[^1]: In practice some c in C can also be an s in S and vice-versa, why + not? In a sane world C and S would be the same set, and thus our + client-server-herpderp model would become that of a peer-to-peer + network, that is, one in which all nodes would both host resources + and ask for them. Again, I ask: why not? And if not, then pray + tell, why does the [Incan][inca] star topology appeal so much to + you? + +[^2]: Witness, just as an example, the difference between RFCs + [1945][rfc-1945] and [2616][rfc-2616]: the former specifies + precisely three methods -- of which the second is, for purely + practical reasons, a subset of the first -- while the + latter... well! + + By the way, do you think this is all? Nope: current specifications + split HTTP into no less than [seven parts][http-rfcs], which makes + this a tome of its own. And as if this wasn't enough, as of 2019 + RFC 7230 has been updated by [8615][rfc-8615], and if this keeps + up at the current pace God help us, by 2029 we'll probably get to + RFCs numbered in the hundred thousands. + + Long story short, fuck these sons of bitches and all their + cancerous "improvements". + +[^3]: And here's where I find out I've actually been reading all this + in the correct order, given that *I know* where this particular + bit occurs and I don't need to spend hours digging to find + out. Pretty neat, huh? + + Now how 'bout *you* get a blog and start doing this for the coad + that you're using? Wouldn't that be neat? + +[^4]: Given a URI of the general form + http://my-resource.tld?p1=v1&p2=v2..., script-name denotes the + part before the question mark, while query-string denotes the part + after it.
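A few lines of code illustrate footnotes 4 and 5. This is a minimal sketch with made-up names, not Hunchentoot's implementation. It skips URL decoding and external formats altogether. It also gives a parameter without an equals sign an empty value, which is an assumption:

~~~~ {.commonlisp}
;;; Minimal sketch of footnotes 4 and 5, NOT Hunchentoot's code:
;;; URL decoding and external formats are deliberately left out.
(defun sketch-split-uri (uri)
  "Return the script-name and the query-string (or NIL) of URI."
  (let ((question (position #\? uri)))
    (values (subseq uri 0 question)
            (and question (subseq uri (1+ question))))))

(defun sketch-split (string separator)
  "Split STRING at every occurrence of the character SEPARATOR."
  (loop for start = 0 then (1+ stop)
        for stop = (position separator string :start start)
        collect (subseq string start stop)
        while stop))

(defun sketch-query-alist (query-string)
  "Turn \"p1=v1&p2=v2\" into ((\"p1\" . \"v1\") (\"p2\" . \"v2\"))."
  (loop for pair in (sketch-split query-string #\&)
        for equals = (position #\= pair)
        unless (zerop (length pair))       ; skip empty pieces like "a=1&&b=2"
          collect (cons (subseq pair 0 equals)
                        (if equals (subseq pair (1+ equals)) ""))))
~~~~

For example, (sketch-split-uri "/index.html?p1=v1&p2=v2") returns "/index.html" and "p1=v1&p2=v2". Feeding the latter to sketch-query-alist gives the association list from footnote 5.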
+
+[^5]: The query-string is split by ampersands and passed to
+    [form-url-encoded-list-to-alist][ht-fuelta], which takes this list,
+    splits each element by the equals sign and URL-decodes the resulting
+    names and values. Thus the string
+    p1=v1&p2=v2... ends up being represented as the association list:
+
+    ~~~~ {.commonlisp}
+    (("p1" . "v1") ("p2" . "v2") ...)
+    ~~~~
+
+[^6]: The process is similar to the previous footnote. The cookie
+    string is split and passed to [cookies-to-alist][ht-cta], which
+    does pretty much the same thing as the previously-described
+    function, only there's no URL decoding going on.
+
+[^7]: I set out to deliberately omit this part from the beginning, so
+    I won't go into details here.
+
+[^8]: Did I by any chance mention HTTP has grown into a tumour-like
+    structure? As but one of many examples of malignant cells: the
+    "content-type" field contains a type/subtype part
+    (e.g. "text/plain", or "application/octet-stream" or whatever);
+    but it also contains *parameters*, such as a charset, or
+    "multipart" markers, into which I won't get just yet, because my
+    blood pressure is already going up.
+
+    Anyway, [parse-content-type][ht-pct] reads all these and returns a
+    type, a subtype and, if one exists, a charset.
+
+    By the by, unlike the previous footnotes, the "parameter" alist is
+    constructed using Chunga's [read-name-value-pairs][chunga-rnvp],
+    which seems to do more or less the same thing as those
+    functions. So where the fuck does all this duplication come from?
+
+[^9]: This looks confusing after all the previous "external-format"
+    fudge. Note that *the content* has a user-provided format, while
+    the multipart-blah-blah syntax is [by default][rfc-1945-3-6-1]
+    Latin1. This is not fucking amusing, I know.
+
+[^10]: Written by one Janis Dzerins. There's also a
+    variation/"improvement" on this coad, rfc-2388-binary, which is
+    used by another Lisp piece that I have on my disk, I forgot which
+    one.
+
+    Speaking of which, I expect I'll run into more of these "duplicate
+    libraries" problems in the future, which will require
+    porting/adaptation work. No, I'm not gonna use a thousand
+    libraries for XML parsing; there's [one][s-xml], and that's it. If
+    you're gonna complain, then you'd better have a good reason, and
+    be ready to explain where the fuck you were when I needed that
+    code.
+
+[^11]: Y'know, I didn't set out to review *this* piece of code back
+    when I started this, but it can't be helped. mime-part-contents is
+    the accessor for the "contents" slot of a [defstruct][r2388-mp]
+    whose instances are built by [parse-mime][r2388-pm]. Now, this code
+    behaves in (what I believe to be) a very odd manner, namely: when
+    it encounters arbitrary string input, it returns it as-is; the
+    alternative is to encounter a MIME part that contains a filename
+    field, in which case this coad creates a new temporary file and
+    stores the content there; in this case, the mime-part structure
+    will contain a bunch of headers and a *path* to the contents, which
+    is an unexpected extra layer of indirection, because why the fuck
+    not.
+
+    This behaviour can be overridden by setting the
+    "write-content-to-file" parameter to nil. However, this is the
+    *default* behaviour, and what Hunchentoot expects. Fuck my life.
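Since footnote 8 brought up parse-content-type, here's a rough idea of what pulling a type, a subtype and an optional charset out of a content-type value involves. This is a naive sketch under my own assumptions, not Hunchentoot's implementation (which, as said, leans on Chunga's read-name-value-pairs): parse-content-type-sketch is a hypothetical name, it lowercases the type and subtype (media types being case-insensitive), strips double quotes around the charset value, and does not cope with quoted parameter values that themselves contain semicolons.

```lisp
(defun parse-content-type-sketch (header)
  "Hypothetical, naive content-type parser.  Returns three values: the
type, the subtype (both lowercased) and the charset parameter, or NIL if
there is none.  Quoted parameter values containing semicolons are not
handled."
  (flet ((trim (s) (string-trim '(#\Space #\Tab) s)))
    (let* ((semicolon (position #\; header))
           ;; "type/subtype" is everything before the first semicolon
           (media (trim (subseq header 0 semicolon)))
           (slash (position #\/ media))
           (media-type (string-downcase (trim (subseq media 0 slash))))
           (media-subtype (and slash
                               (string-downcase
                                (trim (subseq media (1+ slash))))))
           (charset nil))
      ;; walk the ";"-separated name=value parameters, keeping charset
      (loop while semicolon
            do (let* ((start (1+ semicolon))
                      (next (position #\; header :start start))
                      (parameter (trim (subseq header start next)))
                      (equals (position #\= parameter)))
                 (when (and equals
                            (string-equal (trim (subseq parameter 0 equals))
                                          "charset"))
                   (setf charset
                         (string-trim "\"" (trim (subseq parameter (1+ equals))))))
                 (setf semicolon next)))
      (values media-type media-subtype charset))))
```

With this, `(parse-content-type-sketch "text/plain; charset=UTF-8")` returns "text", "plain" and "UTF-8" as three values, while a "multipart/form-data; boundary=..." value yields NIL for the charset, the boundary parameter simply being skipped.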
+
+[ht-c-req]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L31
+[ht-ci-acc]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L68
+[ht-gp]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L71
+[ht-pp-slot]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L75
+[ht-sn]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L79
+[ht-qs]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L83
+[ht-sess]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L86
+[ht-rpd-slot]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L94
+[ht-ch]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L118
+[ht-prfd]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L127
+[ht-gpd]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L150
+[ht-ii2]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L185
+[ht-pmfd]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L266
+[ht-mrpp]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L282
+[ht-effct]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L495
+[ht-pc]: /posts/y06/098-hunchentoot-iv.html#pc
+[ht-iv-grd]: /posts/y06/098-hunchentoot-iv.html#fn3
+[ht-hr]: /posts/y06/098-hunchentoot-iv.html#hr
+[ht-otpct-ii]: /posts/y06/09b-hunchentoot-v.html#otpct-ii
+[ht-fuelta]: http://coad.thetarpit.org/hunchentoot/c-util.lisp.html#L244
+[ht-cta]: http://coad.thetarpit.org/hunchentoot/c-util.lisp.html#L255
+[ht-lms]: /posts/y06/098-hunchentoot-iv.html#lms
+[ht-rcs]: http://coad.thetarpit.org/hunchentoot/c-reply.lisp.html#L103
+[ht-pct]: http://coad.thetarpit.org/hunchentoot/c-util.lisp.html#L283
+[ht-hw]: http://coad.thetarpit.org/hunchentoot/c-conditions.lisp.html#L58
+[ht-icp]: http://coad.thetarpit.org/hunchentoot/c-util.lisp.html#L342
+[ht-shdefs]: http://coad.thetarpit.org/hunchentoot/c-specials.lisp.html#L275
+[ht-arh]: /posts/y06/098-hunchentoot-iv.html#selection-762.0-762.5
+[cl-www]: /posts/y05/090-tmsr-work-ii.html#selection-108.0-108.17
+[hunchentoot-i]: /posts/y05/093-hunchentoot-i.html
+[hunchentoot-ii]: /posts/y05/096-hunchentoot-ii.html
+[hunchentoot-iii]: /posts/y06/097-hunchentoot-iii.html
+[hunchentoot-iv]: /posts/y06/098-hunchentoot-iv.html
+[hunchentoot-v]: /posts/y06/09b-hunchentoot-v.html
+[chunking]: /posts/y06/098-hunchentoot-iv.html#fn2
+[tcp]: /posts/y05/096-hunchentoot-ii.html#fn1
+[alf-cl-on-pc]: http://trilema.com/2019/trilema-goes-dark/#comment-130686
+[likbez]: http://logs.nosuchlabs.com/log-search?q=likbez&chan=trilema
+[inca]: http://trilema.com/republican-thesaurus/?b=(cca%202016&e=femstate.#select
+[rfc-1945]: https://tools.ietf.org/html/rfc1945#section-5.1.1
+[rfc-2616]: https://tools.ietf.org/html/rfc2616#section-5.1.1
+[http-rfcs]: https://www.w3.org/Protocols/
+[rfc-8615]: https://tools.ietf.org/html/rfc8615
+[rfc-1945-5]: https://tools.ietf.org/html/rfc1945#section-5
+[rfc-1945-6]: https://tools.ietf.org/html/rfc1945#section-6
+[flex-mef]: http://edicl.github.io/flexi-streams/#make-external-format
+[chunga-rnvp]: https://edicl.github.io/chunga/#read-name-value-pairs
+[rfc-1945-3-6-1]: https://tools.ietf.org/html/rfc1945#section-3.6.1
+[clhs-prog1]: http://clhs.lisp.se/Body/m_prog1c.htm
+[rfc-2388]: https://tools.ietf.org/html/rfc2388
+[s-xml]: /posts/y05/086-s-xml.html
+[r2388-mp]: http://coad.thetarpit.org/rfc2388/c-rfc2388.lisp.html#L404
+[r2388-pm]: http://coad.thetarpit.org/rfc2388/c-rfc2388.lisp.html#L415
+[flex-ots]: http://edicl.github.io/flexi-streams/#octets-to-string
+[tmsr-work-iv]: /posts/y06/099-tmsr-work-iv.html#selection-120.0-120.3
+[logs-ttp-comments]: http://logs.nosuchlabs.com/log/trilema/2019-08-16#1929229
+[contact]: http://webchat.freenode.net/?channels=#spyked&nick=f_thetarpit_09c
+[contact2]: http://webchat.freenode.net/?channels=#trilema&nick=f_thetarpit_09c
+[logs-1930325]: http://logs.nosuchlabs.com/log/trilema/2019-08-23#1930325