From: Lucian Mogosanu Date: Tue, 23 Jul 2019 17:26:21 +0000 (+0300) Subject: posts: 096 X-Git-Tag: v0.11~26 X-Git-Url: https://git.mogosanu.ro/?a=commitdiff_plain;h=65abe93ed41ff2c96d74190f7e76ccd58ec03d23;p=thetarpit.git posts: 096 --- diff --git a/drafts/000-hunchentoot-ii.markdown b/drafts/000-hunchentoot-ii.markdown deleted file mode 100644 index 94964c7..0000000 --- a/drafts/000-hunchentoot-ii.markdown +++ /dev/null @@ -1,183 +0,0 @@ ---- -postid: 000 -title: Notes on Hunchentoot architecture -date: July 18, 2019 -author: Lucian Mogoșanu -tags: tech, tmsr ---- - -This post is part of a series on [Common Lisp WWWism][cl-www]. Before -continuing, I heartily recommend a review of [prior][cl-www] -[discussions][m7-work] on the subject. - -As [previously mentioned][hunchentoot-i], Hunchentoot is *big*, that -is, circa 6000-7000 LoC. However, despite its weight, our CL web -server of choice is not just an amorphous pile of code dropped from -the thought-prepuce of its original author; for one, it carries along -with it a well-written [documentation page][edicl-hunch] which -describes each of the pieces; and for the other, judging by the -technical documentation, we can at least hope that the code is to some -degree written by a sane mind. - -Thusly proceeding from these two artifacts (documentation and code), -the first step towards producing a genesis of Hunchentoot will be to -write down my own notes describing its organization -- I will point at -some coad, but I'm not diving into the functional details just yet; -instead, I am looking at the abstractions and the interaction between -components. - -First, some HTTP basics: let's say we have an ideal item H -representing a HTTP server. Our server H is a program that serves -pages to us; more concretely, it a. binds to a port P; b. waits for -incoming requests Rq on P; c. processes each Rq; and d. responds with -a reply Rp for each Rq. - -This model is all nice and dandy, isn't it? 
Except it doesn't lead us -anywhere by itself. The problem is that HTTP imports the notion of -"connection", and implicitly TCP, into its spec. This makes it very -inconvenient for us, because now H: a. binds to a (TCP) port P; -b. listens on P for incoming connections C; then c. each C needs to be -confirmed on H's end, i.e. it must be accepted by the server; then -d. for each C, H waits for one or more incoming Rqs; then it -e. processes each Rq; and finally, it f. responds with a Rp[^1]. In -addition to the increased complexity of this model, one other problem -is that the server has no way to tell when the requesting entity has -finished sending Rqs, so who's going to end C then? And either way, H -will now find itself in the position of having to manage Cs, which -have nothing to do whatsoever with the notion of a HTTP request. - -Now that we have the basics in place, let's take a look at the -abstractions exposed by our particular H, which for historical reasons -we've decided to christen Hunchentoot. - -In Hunchentoot, the entity that makes a HTTP server (bound to a port, -etc.) come to life is called an **[acceptor][ht-acceptor]**. Acceptors -encapsulate the port, IP address, listening socket, etc. plus some -state and some basic server configuration data, such as the document -root for serving static files and paths to logfiles -- in other words, -all the data needed to perform at least (a), (b) and (c) -above. Moreover, the user can extend acceptor functionality to define -custom handlers for URLs, as illustrated by the -[easy-acceptor][ht-easy-acceptor] subclass. - -However, acceptors don't have any say in *when* connections (the Cs -above) and requests (the Rqs above) are handled, i.e. how tasks are -distributed among workers, and if there are any dedicated worker -threads at all. Work management is done through the -**[taskmaster][ht-taskmaster]** abstraction. A very broad sketch of -how this works: after listening (i.e. 
(b)) is complete, the acceptor -calls the taskmaster via [execute-acceptor][ht-execute-acceptor], in -order to establish when connections are accepted (i.e. (c)) and where -and when requests are handled (i.e. (d) to (f)). When the taskmaster -is ready for (c), it calls the acceptor's -[accept-connections][ht-accept-connections], which performs the accept -and gives back control to the taskmaster -([handle-incoming-connection][ht-handle-incoming-connection]), which -at some point calls back into the acceptor -([process-connection][ht-process-connection]) to let it perform (d), -(e) and (f). - -The keen reader will by now wonder what's the point to all this -dancing around between taskmasters and acceptors. For one, each -acceptor has a taskmaster, and the other way around; for another, all -this "execute, then accept, then handle, then process" seems -arbitrarily assigned to either the acceptor or the taskmaster, so -really, what the fuck? - -The main reasoning behind this acceptor-taskmaster separation is the -following: acceptors do useful work, which is mainly accepting -connections and handling the requests sent via the former; meanwhile, -taskmasters are hooked immediately before this useful work occurs, so -that they obey a decision made apriori by the user whether said work -will be scheduled on a new thread or performed on the same one. In -other words, we're given [flexibility][s-xml] at the cost of extra -lines of code. Given my lack of direct experience with Hunchentoot, -I'm not sure yet whether this cost is worth it or not, but if it -proves to be more trouble than it's worth, I will personally carve the -thing out. - -Moving on to other abstractions, the next on the list are -**[request][ht-request]** and **[reply][ht-reply]** objects. These, as -the name suggests, encapsulate HTTP request/reply data, such as the -URL, headers, cookies, return codes and so on. 
To continue on the -previous thread: once the acceptor starts [processing -connections][ht-process-connection] (i.e. (d)), it will create request -objects and process each of them -- -[process-request][ht-process-request] will call -[handle-request][ht-handle-request] (i.e. (e)), which will call -[acceptor-dispatch-request][ht-acceptor-dispatch-request], which can -be customized by the user via e.g. subclassing, to perform request -processing and, finally, step (f). - -I will gloss over **[session][ht-session]** objects for the moment, as -they are less relevant to the overall architecture. It's sufficient to -say that they serve as an abstraction for "stateful shit over this -stateless protocol", which is something I'd be happy to see die a -gruesome death. - -I was going to make a diagram and show some examples of Hunchentoot at -work, but I am well over the one thousand word limit, so I will stop -this episode here. We can now put the next [couple of -weeks][w30-31-work] in perspective, though: - -* I believe there is some merit to making a visual representation of - what I've just written. Procedural programming was designed to - provide separation between logical units of work, but so far - everything here is looking like a tangled mess[^2]. -* All this discussion isn't of much value without some examples of - Hunchentoot in action. Speaking of which: this is perhaps a small - project in itself, but it would be fun to find out and document this - proggy's breaking points. -* I need to dig deeper into coad and start owning it, and the first - victim will be our star child, the acceptor implementation. At the - moment it's hard to give an estimate of how long this will take, but - I'll make sure to look at this before I set out to climb the - mountain. - -[^1]: Quick likbez on how Unix and TCP make the whole thing work: - - In (a), the operating system binds a socket S owned by H to P, - i.e. 
it keeps a note somewhere that subsequent connections to P - will be assigned to S and that H *may* accept said connections. In - (b), H signals its availability to receive connections by - performing the listen system call; this puts S into the - "listening" state, as specified in [RFC 793][rfc-793]. - - At this point, whenever a TCP client wants to initiate a - connection, it needs to go through the SYN-SYN+ACK-ACK three-way - handshake hula hoop; so the client sends his SYN, and then in - order for communication to move forward, the server must send his - SYN+ACK, which only happens in the accept phase, i.e. (c). Then - the accept function returns a new socket S' for the new - connection, and only then can actual HTTP communication start. - - I won't even go into the details of why this is retarded, it's - been beaten to death in [the logs][tcp]. Either way, there's no - way around this pile of shit for nodes talking to the heathen WWW. - -[^2]: I remember reading the same words somewhere else, and I even - know [where][eulora]. I'm not the first, nor even the second in - line to look at large open sores coads, you see. 
- -[cl-www]: /posts/y05/090-tmsr-work-ii.html#selection-108.0-108.17 -[m7-work]: /posts/y05/094-tmsr-work-iii.html -[hunchentoot-i]: /posts/y05/093-hunchentoot-i.html -[edicl-hunch]: http://archive.is/MP2bT -[rfc-793]: http://archive.is/hFcq3 -[tcp]: http://btcbase.org/log-search?q=tcp -[ht-acceptor]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L42 -[ht-easy-acceptor]: http://coad.thetarpit.org/hunchentoot/c-easy-handlers.lisp.html#L330 -[ht-taskmaster]: http://coad.thetarpit.org/hunchentoot/c-taskmaster.lisp.html#L31 -[ht-execute-acceptor]: http://coad.thetarpit.org/hunchentoot/c-taskmaster.lisp.html#L40 -[ht-accept-connections]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L251 -[ht-handle-incoming-connection]: http://coad.thetarpit.org/hunchentoot/c-taskmaster.lisp.html#L47 -[ht-process-connection]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L272 -[s-xml]: /posts/y05/086-s-xml.html#selection-109.0-109.109 -[ht-request]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L31 -[ht-reply]: http://coad.thetarpit.org/hunchentoot/c-reply.lisp.html#L31 -[ht-process-request]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L105 -[ht-handle-request]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L284 -[ht-acceptor-dispatch-request]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L293 -[ht-session]: http://coad.thetarpit.org/hunchentoot/c-session.lisp.html#L83 -[w30-31-work]: /posts/y05/094-tmsr-work-iii.html#selection-150.0-150.5 -[eulora]: http://ossasepia.com/2019/06/18/euloran-blus-and-boos/#selection-25.1-35.40 diff --git a/posts/y05/096-hunchentoot-ii.markdown b/posts/y05/096-hunchentoot-ii.markdown new file mode 100644 index 0000000..d82ad35 --- /dev/null +++ b/posts/y05/096-hunchentoot-ii.markdown @@ -0,0 +1,199 @@ +--- +postid: 096 +title: Notes on Hunchentoot architecture +date: July 20, 2019 +author: Lucian Mogoșanu +tags: tech, tmsr +--- + +This post is part of a series on 
[Common Lisp WWWism][cl-www]. Before +continuing, I heartily recommend a review of [prior][cl-www] +[communications][m7-work] on the subject. + +As [previously mentioned][hunchentoot-i], Hunchentoot is *quite big*, +that is, circa 6000-7000 LoC. However, despite its weight, our CL web +server of choice is not just an amorphous pile of code dropped from +the thought-prepuce of its original author; for one, it carries along +with it a well-written [documentation page][edicl-hunch] which +describes each of the pieces; and for the other, judging by the +technical documentation, we can at least hope that the code is to some +degree written by a sane mind. + +Thusly proceeding from these two artifacts (documentation and code), +the first step towards producing a genesis of Hunchentoot will be to +write down my own notes describing its organization -- I will point at +some coad, but I'm not diving into the functional details just yet; +instead, I am looking at the abstractions and the interaction between +components. + +First, some HTTP basics: let's say we have an ideal item H +representing a HTTP server. Our server H is a program that serves +pages to us; more concretely, it a. binds to an address A; b. waits +for incoming requests Rq on A; c. processes each Rq; and d. responds +with a reply Rp for each Rq. + +This model is all nice and dandy, isn't it? Except it doesn't lead us +anywhere by itself. The problem is that HTTP imports the notion of +"connection", and implicitly TCP, into its spec. This makes it very +inconvenient for us, because in our new model H: a. binds to a (TCP) +port P; b. listens on P for incoming connections C; then c. each C +needs to be confirmed on H's end, i.e. it must be accepted by the +server; then d. for each C, H waits for one or more incoming Rqs; then +it e. processes each Rq; and finally, it f. responds to each Rq with a +Rp[^1]. 
In addition to the increased complexity of this model, one +other problem is that the server has no way to tell when the +requesting entity has finished sending Rqs, so who's going to end C +then? And either way, H will now find itself in the position of having +to manage Cs, which have nothing to do whatsoever with the notion of a +HTTP request. + +Now that we have the basics in place, let's take a look at the +abstractions exposed by our particular H, which for historical reasons +we've decided to christen Hunchentoot. + +In Hunchentoot, the entity that makes a HTTP server (bound to a port, +etc.) come to life is called an **[acceptor][ht-acceptor]**. Acceptors +encapsulate the port, IP address, listening socket, etc. plus some +state and some basic server configuration data, such as the document +root for serving static files and paths to logfiles -- in other words, +all the data needed to perform at least (a), (b) and (c) +above. Moreover, the user can extend acceptor functionality to define +custom handlers for URLs, as illustrated by the +[easy-acceptor][ht-easy-acceptor] subclass. + +However, acceptors don't have any say in *when* connections (the Cs +above) and requests (the Rqs above) are handled, i.e. how tasks are +distributed among workers, and if there are any dedicated worker +threads at all. Work management is done through the +**[taskmaster][ht-taskmaster]** abstraction. A very broad sketch of +how this works: after listening (i.e. (b)) is complete, the acceptor +calls the taskmaster via [execute-acceptor][ht-execute-acceptor], in +order to establish on what thread are connections accepted (i.e. (c)) +and where and when requests are handled (i.e. (d) to (f)). 
When the +taskmaster is ready for (c), it calls the acceptor's +[accept-connections][ht-accept-connections], which performs the accept +and gives back control to the taskmaster +([handle-incoming-connection][ht-handle-incoming-connection]), which +at some point calls back into the acceptor +([process-connection][ht-process-connection]) to let it perform (d), +(e) and (f). + +The keen reader will by now wonder what's the point to all this +dancing around between taskmasters and acceptors. For one, each +acceptor has a taskmaster *and* the other way around; for another, all +this "execute, then accept, then handle, then process" seems +arbitrarily assigned to either the acceptor or the taskmaster, so +really, what the fuck? + +The main reasoning behind this acceptor-taskmaster separation is the +following: acceptors do useful work, which is mainly accepting +connections and handling the requests sent via the former; meanwhile, +taskmasters are hooked immediately before this useful work occurs, so +that they obey a decision made apriori by the user whether said work +will be scheduled on a new thread or performed on the same one. In +other words, we're given [flexibility][s-xml] at the cost of extra +lines of code. Given my lack of direct experience with Hunchentoot, +I'm not sure yet whether this cost is worth it or not, but if it +proves to be more trouble than it's worth, I will personally carve the +thing out. + +Moving on to other abstractions, the next on the list are +**[request][ht-request]** and **[reply][ht-reply]** objects. These, as +the name suggests, encapsulate HTTP request/reply data, such as the +URL, headers, cookies, return codes and so on. To continue on the +previous thread: once the acceptor starts [processing +connections][ht-process-connection] (i.e. (d)), it will create request +objects and process each of them -- +[process-request][ht-process-request] will call +[handle-request][ht-handle-request] (i.e. 
(e)), which will call +[acceptor-dispatch-request][ht-acceptor-dispatch-request], which can +be customized by the user via defmethod for the job of processing +requests and, finally, step (f). + +I will gloss over **[session][ht-session]** objects for the moment, as +they are less relevant to the overall architecture. It's sufficient to +say that they serve as an abstraction for "stateful shit over this +stateless protocol", which is something I'd be happy to see die a +gruesome death. + +I was going to make a diagram and show some examples of Hunchentoot at +work, but I am well over the one thousand word limit, so I will stop +this episode here. We can now put the next [couple of +weeks][w30-31-work] in perspective, though: + +* I believe there is some merit to making a visual representation of + what I've just written. Procedural programming was designed to + provide separation between logical units of work, but so far + everything here is looking like a tangled mess[^2]. +* All this discussion isn't of much value without some examples of + Hunchentoot in action. Speaking of which: this is perhaps a small + project in itself, but it would be fun to find out and document this + proggy's breaking points. +* I need to dig deeper into coad and start owning it, and the first + victim will be our child star, the acceptor implementation. At the + moment it's hard to give an estimate of how long this will take, + I'll have to make another initial investigation and estimation + before I set out to climb the whole mountain. + +[^1]: Quick likbez on how Unix and TCP make the whole thing work: + + First off, readers will notice that the "A" in the simplified + model turns into a "P" in the second one, and for a good reason: + while the IP protocol specifies addresses for hosts, transport + layer protocols (e.g. 
TCP and UDP) usually specify addresses for + *applications* running on a given host, and that's precisely what + our P is: an address used by a client (say, a web browser) to + identify a server application (e.g. a web server) running on some + host. And it's not only the server that binds to a port, the + client also gets one, only this client-side port allocation is + usually performed by the operating system. + + Second, from the Unix side, both the server's (passive) connection + and each individual connection with a client get an object called + a socket. That is, in (a), the operating system binds a socket S + owned by H to P, i.e. it keeps a note somewhere that subsequent + connections to P will be assigned to S and that H *may* accept + said connections. In (b), H signals its availability to receive + connections by performing the listen system call; this puts S into + the "listening" state, as specified in [RFC 793][rfc-793]. + + At this point, whenever a TCP client wants to initiate a + connection, it needs to go through the SYN-SYN+ACK-ACK three-way + handshake hula hoop; so the client sends his SYN, and then in + order for communication to move forward, the server must send his + SYN+ACK, which only happens in the accept phase, i.e. (c). Then + the accept function returns a new socket S' for the new + connection, and only then can actual HTTP communication start. + + I won't even go into the details of why this is retarded, it's + been beaten to death in [the logs][tcp]. Either way, there's no + way around this pile of shit for computers talking to the heathen + WWW... actually, if there *is* one, I'd very much like to hear + about it. + +[^2]: I remember reading the same words somewhere else, and I even + know [where][eulora]. I'm not the first, nor even the second in + line to look at large open sores coads, you see. 
+ +[cl-www]: /posts/y05/090-tmsr-work-ii.html#selection-108.0-108.17 +[m7-work]: /posts/y05/094-tmsr-work-iii.html +[hunchentoot-i]: /posts/y05/093-hunchentoot-i.html +[edicl-hunch]: http://archive.is/MP2bT +[rfc-793]: http://archive.is/hFcq3 +[tcp]: http://btcbase.org/log-search?q=tcp +[ht-acceptor]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L42 +[ht-easy-acceptor]: http://coad.thetarpit.org/hunchentoot/c-easy-handlers.lisp.html#L330 +[ht-taskmaster]: http://coad.thetarpit.org/hunchentoot/c-taskmaster.lisp.html#L31 +[ht-execute-acceptor]: http://coad.thetarpit.org/hunchentoot/c-taskmaster.lisp.html#L40 +[ht-accept-connections]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L251 +[ht-handle-incoming-connection]: http://coad.thetarpit.org/hunchentoot/c-taskmaster.lisp.html#L47 +[ht-process-connection]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L272 +[s-xml]: /posts/y05/086-s-xml.html#selection-109.0-109.109 +[ht-request]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L31 +[ht-reply]: http://coad.thetarpit.org/hunchentoot/c-reply.lisp.html#L31 +[ht-process-request]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L105 +[ht-handle-request]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L284 +[ht-acceptor-dispatch-request]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L293 +[ht-session]: http://coad.thetarpit.org/hunchentoot/c-session.lisp.html#L83 +[w30-31-work]: /posts/y05/094-tmsr-work-iii.html#selection-150.0-150.5 +[eulora]: http://ossasepia.com/2019/06/18/euloran-blus-and-boos/#selection-25.1-35.40
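
As a rough illustration of steps (a) to (f) and of the acceptor/taskmaster back-and-forth described in the post, here is a minimal Python sketch. It is not Hunchentoot code: the classes `Acceptor` and `SingleThreadedTaskmaster`, the `run_demo` helper and the `/hello` path are all made up for this example, and the method names only mirror the generic functions named in the post (execute-acceptor, accept-connections, handle-incoming-connection, process-connection, acceptor-dispatch-request). Unlike the real thing, it accepts exactly one connection, reads exactly one request, and ends the connection from the server side.

```python
import socket


class SingleThreadedTaskmaster:
    """Toy taskmaster: decides where the acceptor's work runs (here: the caller's thread)."""

    def execute_acceptor(self, acceptor):
        # Analogue of execute-acceptor: run the accept step right here.
        acceptor.accept_connections(self)

    def handle_incoming_connection(self, acceptor, conn):
        # Analogue of handle-incoming-connection: a threaded taskmaster
        # would pass conn to a worker thread instead of calling directly.
        acceptor.process_connection(conn)


class Acceptor:
    """Toy acceptor: owns the listening socket and does the useful work."""

    def __init__(self, host="127.0.0.1", port=0):
        self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.listener.bind((host, port))        # (a) bind; port 0 lets the OS pick one
        self.listener.listen(1)                 # (b) listen
        self.port = self.listener.getsockname()[1]

    def accept_connections(self, taskmaster):
        conn, _peer = self.listener.accept()    # (c) accept returns the new socket S'
        taskmaster.handle_incoming_connection(self, conn)

    def process_connection(self, conn):
        with conn:                              # the server ends C after one Rq
            data = b""
            while b"\r\n\r\n" not in data:      # (d) wait for one complete Rq
                chunk = conn.recv(4096)
                if not chunk:
                    break
                data += chunk
            parts = data.split(b" ")
            path = parts[1].decode("ascii") if len(parts) > 1 else "/"
            body = self.dispatch_request(path)  # (e) process the Rq
            conn.sendall(b"HTTP/1.0 200 OK\r\n\r\n" + body)  # (f) send the Rp

    def dispatch_request(self, path):
        # User-customizable hook, loosely analogous to acceptor-dispatch-request.
        return ("hello from " + path + "\n").encode("ascii")

    def close(self):
        self.listener.close()


def run_demo():
    acceptor = Acceptor()
    # connect() returns before accept() is called: on mainstream kernels the OS
    # typically completes the handshake for a listening socket and queues the
    # connection until the application accepts it.
    client = socket.create_connection(("127.0.0.1", acceptor.port))
    client.sendall(b"GET /hello HTTP/1.0\r\n\r\n")
    SingleThreadedTaskmaster().execute_acceptor(acceptor)
    reply = b""
    while True:                                 # read until the server closes C
        chunk = client.recv(4096)
        if not chunk:
            break
        reply += chunk
    client.close()
    acceptor.close()
    return reply


reply = run_demo()
print(reply.decode("ascii"))
```

Swapping in a taskmaster whose `handle_incoming_connection` starts a thread would move steps (d) to (f) off the accepting thread without touching `Acceptor`, which is roughly the flexibility the post attributes to the acceptor-taskmaster split.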