From: Lucian Mogosanu <lucian@mogosanu.ro>
Date: Tue, 23 Jul 2019 17:26:21 +0000 (+0300)
Subject: posts: 096
X-Git-Tag: v0.11~26
X-Git-Url: https://git.mogosanu.ro/?a=commitdiff_plain;h=65abe93ed41ff2c96d74190f7e76ccd58ec03d23;p=thetarpit.git

posts: 096
---

diff --git a/drafts/000-hunchentoot-ii.markdown b/drafts/000-hunchentoot-ii.markdown
deleted file mode 100644
index 94964c7..0000000
--- a/drafts/000-hunchentoot-ii.markdown
+++ /dev/null
@@ -1,183 +0,0 @@
----
-postid: 000
-title: Notes on Hunchentoot architecture
-date: July 18, 2019
-author: Lucian Mogoșanu
-tags: tech, tmsr
----
-
-This post is part of a series on [Common Lisp WWWism][cl-www]. Before
-continuing, I heartily recommend a review of [prior][cl-www]
-[discussions][m7-work] on the subject.
-
-As [previously mentioned][hunchentoot-i], Hunchentoot is *big*, that
-is, circa 6000-7000 LoC. However, despite its weight, our CL web
-server of choice is not just an amorphous pile of code dropped from
-the thought-prepuce of its original author; for one, it carries along
-with it a well-written [documentation page][edicl-hunch] which
-describes each of the pieces; and for the other, judging by the
-technical documentation, we can at least hope that the code is to some
-degree written by a sane mind.
-
-Thusly proceeding from these two artifacts (documentation and code),
-the first step towards producing a genesis of Hunchentoot will be to
-write down my own notes describing its organization -- I will point at
-some coad, but I'm not diving into the functional details just yet;
-instead, I am looking at the abstractions and the interaction between
-components.
-
-First, some HTTP basics: let's say we have an ideal item H
-representing a HTTP server. Our server H is a program that serves
-pages to us; more concretely, it a. binds to a port P; b. waits for
-incoming requests Rq on P; c. processes each Rq; and d. responds with
-a reply Rp for each Rq.
-
-This model is all nice and dandy, isn't it? Except it doesn't lead us
-anywhere by itself. The problem is that HTTP imports the notion of
-"connection", and implicitly TCP, into its spec. This makes it very
-inconvenient for us, because now H: a. binds to a (TCP) port P;
-b. listens on P for incoming connections C; then c. each C needs to be
-confirmed on H's end, i.e. it must be accepted by the server; then
-d. for each C, H waits for one or more incoming Rqs; then it
-e. processes each Rq; and finally, it f. responds with a Rp[^1]. In
-addition to the increased complexity of this model, one other problem
-is that the server has no way to tell when the requesting entity has
-finished sending Rqs, so who's going to end C then? And either way, H
-will now find itself in the position of having to manage Cs, which
-have nothing to do whatsoever with the notion of a HTTP request.
-
-Now that we have the basics in place, let's take a look at the
-abstractions exposed by our particular H, which for historical reasons
-we've decided to christen Hunchentoot.
-
-In Hunchentoot, the entity that makes a HTTP server (bound to a port,
-etc.) come to life is called an **[acceptor][ht-acceptor]**. Acceptors
-encapsulate the port, IP address, listening socket, etc. plus some
-state and some basic server configuration data, such as the document
-root for serving static files and paths to logfiles -- in other words,
-all the data needed to perform at least (a), (b) and (c)
-above. Moreover, the user can extend acceptor functionality to define
-custom handlers for URLs, as illustrated by the
-[easy-acceptor][ht-easy-acceptor] subclass.
-
-However, acceptors don't have any say in *when* connections (the Cs
-above) and requests (the Rqs above) are handled, i.e. how tasks are
-distributed among workers, and if there are any dedicated worker
-threads at all. Work management is done through the
-**[taskmaster][ht-taskmaster]** abstraction. A very broad sketch of
-how this works: after listening (i.e. (b)) is complete, the acceptor
-calls the taskmaster via [execute-acceptor][ht-execute-acceptor], in
-order to establish when connections are accepted (i.e. (c)) and where
-and when requests are handled (i.e. (d) to (f)). When the taskmaster
-is ready for (c), it calls the acceptor's
-[accept-connections][ht-accept-connections], which performs the accept
-and gives back control to the taskmaster
-([handle-incoming-connection][ht-handle-incoming-connection]), which
-at some point calls back into the acceptor
-([process-connection][ht-process-connection]) to let it perform (d),
-(e) and (f).
-
-The keen reader will by now wonder what's the point to all this
-dancing around between taskmasters and acceptors. For one, each
-acceptor has a taskmaster, and the other way around; for another, all
-this "execute, then accept, then handle, then process" seems
-arbitrarily assigned to either the acceptor or the taskmaster, so
-really, what the fuck?
-
-The main reasoning behind this acceptor-taskmaster separation is the
-following: acceptors do useful work, which is mainly accepting
-connections and handling the requests sent via the former; meanwhile,
-taskmasters are hooked immediately before this useful work occurs, so
-that they obey a decision made apriori by the user whether said work
-will be scheduled on a new thread or performed on the same one. In
-other words, we're given [flexibility][s-xml] at the cost of extra
-lines of code. Given my lack of direct experience with Hunchentoot,
-I'm not sure yet whether this cost is worth it or not, but if it
-proves to be more trouble than it's worth, I will personally carve the
-thing out.
-
-Moving on to other abstractions, the next on the list are
-**[request][ht-request]** and **[reply][ht-reply]** objects. These, as
-the name suggests, encapsulate HTTP request/reply data, such as the
-URL, headers, cookies, return codes and so on. To continue on the
-previous thread: once the acceptor starts [processing
-connections][ht-process-connection] (i.e. (d)), it will create request
-objects and process each of them --
-[process-request][ht-process-request] will call
-[handle-request][ht-handle-request] (i.e. (e)), which will call
-[acceptor-dispatch-request][ht-acceptor-dispatch-request], which can
-be customized by the user via e.g. subclassing, to perform request
-processing and, finally, step (f).
-
-I will gloss over **[session][ht-session]** objects for the moment, as
-they are less relevant to the overall architecture. It's sufficient to
-say that they serve as an abstraction for "stateful shit over this
-stateless protocol", which is something I'd be happy to see die a
-gruesome death.
-
-I was going to make a diagram and show some examples of Hunchentoot at
-work, but I am well over the one thousand word limit, so I will stop
-this episode here. We can now put the next [couple of
-weeks][w30-31-work] in perspective, though:
-
-* I believe there is some merit to making a visual representation of
-  what I've just written. Procedural programming was designed to
-  provide separation between logical units of work, but so far
-  everything here is looking like a tangled mess[^2].
-* All this discussion isn't of much value without some examples of
-  Hunchentoot in action. Speaking of which: this is perhaps a small
-  project in itself, but it would be fun to find out and document this
-  proggy's breaking points.
-* I need to dig deeper into coad and start owning it, and the first
-  victim will be our star child, the acceptor implementation. At the
-  moment it's hard to give an estimate of how long this will take, but
-  I'll make sure to look at this before I set out to climb the
-  mountain.
-
-[^1]: Quick likbez on how Unix and TCP make the whole thing work:
-
-	In (a), the operating system binds a socket S owned by H to P,
-    i.e. it keeps a note somewhere that subsequent connections to P
-    will be assigned to S and that H *may* accept said connections. In
-    (b), H signals its availability to receive connections by
-    performing the listen system call; this puts S into the
-    "listening" state, as specified in [RFC 793][rfc-793].
-
-	At this point, whenever a TCP client wants to initiate a
-    connection, it needs to go through the SYN-SYN+ACK-ACK three-way
-    handshake hula hoop; so the client sends his SYN, and then in
-    order for communication to move forward, the server must send his
-    SYN+ACK, which only happens in the accept phase, i.e. (c). Then
-    the accept function returns a new socket S' for the new
-    connection, and only then can actual HTTP communication start.
-
-	I won't even go into the details of why this is retarded, it's
-    been beaten to death in [the logs][tcp]. Either way, there's no
-    way around this pile of shit for nodes talking to the heathen WWW.
-
-[^2]: I remember reading the same words somewhere else, and I even
-    know [where][eulora]. I'm not the first, nor even the second in
-    line to look at large open sores coads, you see.
-
-[cl-www]: /posts/y05/090-tmsr-work-ii.html#selection-108.0-108.17
-[m7-work]: /posts/y05/094-tmsr-work-iii.html
-[hunchentoot-i]: /posts/y05/093-hunchentoot-i.html
-[edicl-hunch]: http://archive.is/MP2bT
-[rfc-793]: http://archive.is/hFcq3
-[tcp]: http://btcbase.org/log-search?q=tcp
-[ht-acceptor]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L42
-[ht-easy-acceptor]: http://coad.thetarpit.org/hunchentoot/c-easy-handlers.lisp.html#L330
-[ht-taskmaster]: http://coad.thetarpit.org/hunchentoot/c-taskmaster.lisp.html#L31
-[ht-execute-acceptor]: http://coad.thetarpit.org/hunchentoot/c-taskmaster.lisp.html#L40
-[ht-accept-connections]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L251
-[ht-handle-incoming-connection]: http://coad.thetarpit.org/hunchentoot/c-taskmaster.lisp.html#L47
-[ht-process-connection]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L272
-[s-xml]: /posts/y05/086-s-xml.html#selection-109.0-109.109
-[ht-request]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L31
-[ht-reply]: http://coad.thetarpit.org/hunchentoot/c-reply.lisp.html#L31
-[ht-process-request]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L105
-[ht-handle-request]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L284
-[ht-acceptor-dispatch-request]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L293
-[ht-session]: http://coad.thetarpit.org/hunchentoot/c-session.lisp.html#L83
-[w30-31-work]: /posts/y05/094-tmsr-work-iii.html#selection-150.0-150.5
-[eulora]: http://ossasepia.com/2019/06/18/euloran-blus-and-boos/#selection-25.1-35.40
diff --git a/posts/y05/096-hunchentoot-ii.markdown b/posts/y05/096-hunchentoot-ii.markdown
new file mode 100644
index 0000000..d82ad35
--- /dev/null
+++ b/posts/y05/096-hunchentoot-ii.markdown
@@ -0,0 +1,199 @@
+---
+postid: 096
+title: Notes on Hunchentoot architecture
+date: July 20, 2019
+author: Lucian Mogoșanu
+tags: tech, tmsr
+---
+
+This post is part of a series on [Common Lisp WWWism][cl-www]. Before
+continuing, I heartily recommend a review of [prior][cl-www]
+[communications][m7-work] on the subject.
+
+As [previously mentioned][hunchentoot-i], Hunchentoot is *quite big*,
+that is, circa 6000-7000 LoC. However, despite its weight, our CL web
+server of choice is not just an amorphous pile of code dropped from
+the thought-prepuce of its original author; for one, it carries along
+with it a well-written [documentation page][edicl-hunch] which
+describes each of the pieces; and for the other, judging by the
+technical documentation, we can at least hope that the code is to some
+degree written by a sane mind.
+
+Thusly proceeding from these two artifacts (documentation and code),
+the first step towards producing a genesis of Hunchentoot will be to
+write down my own notes describing its organization -- I will point at
+some coad, but I'm not diving into the functional details just yet;
+instead, I am looking at the abstractions and the interaction between
+components.
+
+First, some HTTP basics: let's say we have an ideal item H
+representing a HTTP server. Our server H is a program that serves
+pages to us; more concretely, it a. binds to an address A; b. waits
+for incoming requests Rq on A; c. processes each Rq; and d. responds
+with a reply Rp for each Rq.
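+
+To make the idealized model a bit more concrete, here's a toy sketch
+of it in Python (not Lisp, and not HTTP). It assumes a connectionless
+datagram (UDP) socket, simply because UDP has no notion of connection
+and thus maps one-to-one onto steps (a) to (d); the `handle` and
+`serve` functions and the upper-casing "processing" are made up for
+illustration.
+
+```python
+import socket
+import threading
+
+
+def handle(rq: bytes) -> bytes:
+    # c. process the request (made-up logic: upper-case the payload)
+    return rq.upper()
+
+
+def serve(sock: socket.socket, count: int) -> None:
+    for _ in range(count):
+        # b. wait for an incoming request Rq on A; no connection involved
+        rq, sender = sock.recvfrom(4096)
+        # d. respond with a reply Rp to whoever sent Rq
+        sock.sendto(handle(rq), sender)
+
+
+# a. bind to an address A (port 0 asks the OS for any free port)
+server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+server.bind(("127.0.0.1", 0))
+address = server.getsockname()
+
+worker = threading.Thread(target=serve, args=(server, 1))
+worker.start()
+
+client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+client.settimeout(5)
+client.sendto(b"get /index.html", address)
+reply, _ = client.recvfrom(4096)
+worker.join()
+client.close()
+server.close()
+print(reply)  # b'GET /INDEX.HTML'
+```
+
+Note that nothing here needs to be opened, accepted or closed per
+client: recvfrom hands over the sender's address along with each Rq,
+and Rp goes straight back to it.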
+
+This model is all nice and dandy, isn't it? Except it doesn't lead us
+anywhere by itself. The problem is that HTTP imports the notion of
+"connection", and implicitly TCP, into its spec. This makes it very
+inconvenient for us, because in our new model H: a. binds to a (TCP)
+port P; b. listens on P for incoming connections C; then c. each C
+needs to be confirmed on H's end, i.e. it must be accepted by the
+server; then d. for each C, H waits for one or more incoming Rqs; then
+it e. processes each Rq; and finally, it f. responds to each Rq with a
+Rp[^1]. In addition to the increased complexity of this model, one
+other problem is that the server has no way to tell when the
+requesting entity has finished sending Rqs, so who's going to end C
+then? And either way, H will now find itself in the position of having
+to manage Cs, which have nothing to do whatsoever with the notion of a
+HTTP request.
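+
+The connection-oriented model can be sketched in the same toy fashion,
+again in Python and again not HTTP: here each "request" is assumed to
+be one line of text, and `process`, `serve_one_connection` and the
+reply format are made up. The comments map each call onto steps (a)
+to (f) above; note how the server's request loop only ends when the
+client closes its end of C.
+
+```python
+import socket
+import threading
+
+
+def process(rq: bytes) -> bytes:
+    # e. process one request (made-up logic)
+    return b"reply to " + rq
+
+
+def serve_one_connection(listener: socket.socket) -> int:
+    # c. accept C; this hands back a brand new socket dedicated to C
+    conn, _peer = listener.accept()
+    served = 0
+    with conn, conn.makefile("rb") as rfile:
+        # d. wait for one or more Rqs on C (one request per line)
+        for line in rfile:
+            # f. respond to each Rq with an Rp
+            conn.sendall(process(line.rstrip(b"\n")) + b"\n")
+            served += 1
+    # We get here only after the client has closed its side of C: the
+    # server has no other way of knowing that no more Rqs are coming.
+    return served
+
+
+# a. bind to a (TCP) port P, then b. listen on P for connections
+listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+listener.bind(("127.0.0.1", 0))
+listener.listen()
+port = listener.getsockname()[1]
+
+result = []
+worker = threading.Thread(
+    target=lambda: result.append(serve_one_connection(listener)))
+worker.start()
+
+client = socket.create_connection(("127.0.0.1", port), timeout=5)
+client_port = client.getsockname()[1]  # assigned by the OS, not by us
+answers = []
+with client.makefile("rb") as replies:
+    for rq in (b"GET /a", b"GET /b"):
+        client.sendall(rq + b"\n")
+        answers.append(replies.readline())
+client.close()  # this is what ends the request loop on the server
+worker.join()
+listener.close()
+print(answers, result)
+```
+
+The client never calls bind, yet `client_port` still holds a port:
+that's the OS-side allocation of client ports mentioned in the first
+footnote.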
+
+Now that we have the basics in place, let's take a look at the
+abstractions exposed by our particular H, which for historical reasons
+we've decided to christen Hunchentoot.
+
+In Hunchentoot, the entity that makes a HTTP server (bound to a port,
+etc.) come to life is called an **[acceptor][ht-acceptor]**. Acceptors
+encapsulate the port, IP address, listening socket, etc., plus some
+state and some basic server configuration data, such as the document
+root for serving static files and paths to logfiles -- in other words,
+all the data needed to perform at least (a), (b) and (c)
+above. Moreover, the user can extend acceptor functionality to define
+custom handlers for URLs, as illustrated by the
+[easy-acceptor][ht-easy-acceptor] subclass.
+
+However, acceptors don't have any say in *when* connections (the Cs
+above) and requests (the Rqs above) are handled, i.e. how tasks are
+distributed among workers, and whether there are any dedicated worker
+threads at all. Work management is done through the
+**[taskmaster][ht-taskmaster]** abstraction. A very broad sketch of
+how this works: after listening (i.e. (b)) is complete, the acceptor
+calls the taskmaster via [execute-acceptor][ht-execute-acceptor], in
+order to establish on which thread connections are accepted (i.e. (c))
+and where and when requests are handled (i.e. (d) to (f)). When the
+taskmaster is ready for (c), it calls the acceptor's
+[accept-connections][ht-accept-connections], which performs the accept
+and gives back control to the taskmaster
+([handle-incoming-connection][ht-handle-incoming-connection]), which
+at some point calls back into the acceptor
+([process-connection][ht-process-connection]) to let it perform (d),
+(e) and (f).
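+
+The back-and-forth above is easier to follow as a toy model. What
+follows is a minimal Python sketch, not Hunchentoot's code: the
+classes are made up, "connections" are just strings, and the method
+names merely mirror Hunchentoot's generic functions, with underscores
+instead of dashes. The taskmaster here does everything inline on the
+calling thread, which is the simplest possible policy.
+
+```python
+class Taskmaster:
+    """Toy taskmaster that runs all work inline, on the calling thread."""
+
+    def __init__(self):
+        self.acceptor = None
+
+    def execute_acceptor(self):
+        # Decide on which thread (c) runs: here, right away, inline.
+        self.acceptor.accept_connections()
+
+    def handle_incoming_connection(self, conn):
+        # Decide where and when (d) to (f) run: again, inline.
+        self.acceptor.process_connection(conn)
+
+
+class Acceptor:
+    """Toy acceptor; pending "connections" come from a plain list."""
+
+    def __init__(self, taskmaster, pending):
+        self.taskmaster = taskmaster
+        taskmaster.acceptor = self  # each one points at the other
+        self.pending = list(pending)
+        self.trace = []
+
+    def start(self):
+        self.trace.append("listen")  # (a) and (b)
+        self.taskmaster.execute_acceptor()
+
+    def accept_connections(self):
+        while self.pending:
+            conn = self.pending.pop(0)  # (c), the accept proper
+            self.trace.append(f"accept {conn}")
+            self.taskmaster.handle_incoming_connection(conn)
+
+    def process_connection(self, conn):
+        self.trace.append(f"process {conn}")  # (d), (e) and (f)
+
+
+acceptor = Acceptor(Taskmaster(), ["C1", "C2"])
+acceptor.start()
+print(acceptor.trace)
+# ['listen', 'accept C1', 'process C1', 'accept C2', 'process C2']
+```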
+
+The keen reader will by now wonder what's the point of all this
+dancing around between taskmasters and acceptors. For one, each
+acceptor has a taskmaster *and* the other way around; for another, all
+this "execute, then accept, then handle, then process" seems
+arbitrarily assigned to either the acceptor or the taskmaster, so
+really, what the fuck?
+
+The main reasoning behind this acceptor-taskmaster separation is the
+following: acceptors do useful work, which is mainly accepting
+connections and handling the requests sent over those connections;
+meanwhile, taskmasters are hooked immediately before this useful work
+occurs, so that they obey a decision made a priori by the user as to
+whether said work will be scheduled on a new thread or performed on
+the same one. In other words, we're given [flexibility][s-xml] at the
+cost of extra lines of code. Given my lack of direct experience with
+Hunchentoot, I'm not sure yet whether this cost is worth it or not,
+but if it proves to be more trouble than it's worth, I will personally
+carve the thing out.
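+
+To see what the flexibility buys, here's another toy Python sketch,
+loosely modeled on Hunchentoot's single-threaded-taskmaster and
+one-thread-per-connection-taskmaster classes, but with made-up code:
+the acceptor below only knows how to do the work, while the
+taskmaster picked by the user decides which thread does it. For
+brevity the taskmaster receives the acceptor as an argument instead
+of the two holding references to each other.
+
+```python
+import threading
+
+
+class Acceptor:
+    """Toy acceptor: knows how to do the work, not where it runs."""
+
+    def __init__(self, taskmaster):
+        self.taskmaster = taskmaster
+        self.lock = threading.Lock()
+        self.served_on = {}
+
+    def accept_connections(self, conns):
+        for conn in conns:
+            self.taskmaster.handle_incoming_connection(self, conn)
+
+    def process_connection(self, conn):
+        # The useful work; record which thread ended up doing it.
+        with self.lock:
+            self.served_on[conn] = threading.current_thread().name
+
+
+class SingleThreadedTaskmaster:
+    def handle_incoming_connection(self, acceptor, conn):
+        acceptor.process_connection(conn)  # same thread as the accepts
+
+    def shutdown(self):
+        pass
+
+
+class ThreadPerConnectionTaskmaster:
+    def __init__(self):
+        self.workers = []
+
+    def handle_incoming_connection(self, acceptor, conn):
+        worker = threading.Thread(target=acceptor.process_connection,
+                                  args=(conn,), name=f"worker-{conn}")
+        self.workers.append(worker)
+        worker.start()
+
+    def shutdown(self):
+        for worker in self.workers:
+            worker.join()
+
+
+results = {}
+for taskmaster in (SingleThreadedTaskmaster(),
+                   ThreadPerConnectionTaskmaster()):
+    acceptor = Acceptor(taskmaster)
+    acceptor.accept_connections(["C1", "C2"])
+    taskmaster.shutdown()
+    results[type(taskmaster).__name__] = acceptor.served_on
+print(results)
+```
+
+With the first taskmaster both connections are served on the thread
+running the loop; with the second, each one gets its own
+`worker-...` thread, while the acceptor code stays the same in both
+cases.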
+
+Moving on to other abstractions, the next on the list are
+**[request][ht-request]** and **[reply][ht-reply]** objects. These, as
+the names suggest, encapsulate HTTP request/reply data, such as the
+URL, headers, cookies, return codes and so on. To continue on the
+previous thread: once the acceptor starts [processing
+connections][ht-process-connection] (i.e. (d)), it will create request
+objects and process each of them --
+[process-request][ht-process-request] will call
+[handle-request][ht-handle-request] (i.e. (e)), which will call
+[acceptor-dispatch-request][ht-acceptor-dispatch-request], which can
+be specialized by the user via defmethod in order to process requests
+and, finally, to perform step (f).
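+
+The dispatch chain can also be sketched as a toy. In the Python below,
+which is not Hunchentoot's code, a subclass overriding
+`acceptor_dispatch_request` stands in for a Lisp defmethod
+specializing acceptor-dispatch-request, and the reply object is passed
+around explicitly, whereas Hunchentoot, as far as I can tell, keeps
+the current request and reply in the special variables `*request*` and
+`*reply*`. The `Request`/`Reply` fields, `HelloAcceptor` and the
+"one request per string" input are assumptions made for illustration.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class Request:
+    method: str
+    uri: str
+    headers: dict = field(default_factory=dict)
+
+
+@dataclass
+class Reply:
+    status: int = 200
+    headers: dict = field(default_factory=dict)
+
+
+class Acceptor:
+    def process_connection(self, raw_requests):
+        # (d): build a request object for each request read off C
+        return [self.process_request(Request(*raw.split(" ", 1)))
+                for raw in raw_requests]
+
+    def process_request(self, request):
+        reply = Reply()
+        body = self.handle_request(request, reply)
+        return reply.status, body  # (f): what would be sent back
+
+    def handle_request(self, request, reply):
+        # (e): hand over to the user-customizable dispatch step
+        return self.acceptor_dispatch_request(request, reply)
+
+    def acceptor_dispatch_request(self, request, reply):
+        # Default behaviour of the toy: nothing to serve.
+        reply.status = 404
+        return "not found"
+
+
+class HelloAcceptor(Acceptor):
+    # User customization, playing the role of a defmethod.
+    def acceptor_dispatch_request(self, request, reply):
+        if request.uri == "/hello":
+            return "hello, world"
+        return super().acceptor_dispatch_request(request, reply)
+
+
+print(HelloAcceptor().process_connection(["GET /hello", "GET /nope"]))
+# [(200, 'hello, world'), (404, 'not found')]
+```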
+
+I will gloss over **[session][ht-session]** objects for the moment, as
+they are less relevant to the overall architecture. It's sufficient to
+say that they serve as an abstraction for "stateful shit over this
+stateless protocol", which is something I'd be happy to see die a
+gruesome death.
+
+I was going to make a diagram and show some examples of Hunchentoot at
+work, but I am well over the one thousand word limit, so I will stop
+this episode here. We can now put the next [couple of
+weeks][w30-31-work] in perspective, though:
+
+* I believe there is some merit to making a visual representation of
+  what I've just written. Procedural programming was designed to
+  provide separation between logical units of work, but so far
+  everything here is looking like a tangled mess[^2].
+* All this discussion isn't of much value without some examples of
+  Hunchentoot in action. Speaking of which: this is perhaps a small
+  project in itself, but it would be fun to find out and document this
+  proggy's breaking points.
+* I need to dig deeper into coad and start owning it, and the first
+  victim will be our child star, the acceptor implementation. At the
+  moment it's hard to estimate how long this will take; I'll have to
+  do another round of initial investigation and estimation before I
+  set out to climb the whole mountain.
+
+[^1]: Quick likbez (i.e. crash course) on how Unix and TCP make the whole thing work:
+
+	First off, readers will notice that the "A" in the simplified
+    model turns into a "P" in the second one, and for a good reason:
+    while the IP protocol specifies addresses for hosts, transport
+    layer protocols (e.g. TCP and UDP) usually specify addresses for
+    *applications* running on a given host, and that's precisely what
+    our P is: an address used by a client (say, a web browser) to
+    identify a server application (e.g. a web server) running on some
+    host. And it's not only the server that binds to a port: the
+    client also gets one, only this client-side port allocation is
+    usually performed implicitly by the operating system.
+
+	Second, from the Unix side, both the server's passive (listening)
+	endpoint and each individual connection with a client get an object
+	called a socket. That is, in (a), the operating system binds a
+	socket S owned by H to P, i.e. it keeps a note somewhere that
+	subsequent connections to P will be assigned to S and that H *may*
+	accept said connections. In (b), H signals its availability to
+	receive connections by performing the listen system call; this puts
+	S into the "listening" state, as specified in [RFC 793][rfc-793].
+
+	At this point, whenever a TCP client wants to initiate a
+    connection, it needs to go through the SYN-SYN+ACK-ACK three-way
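+    handshake hula hoop: the client sends his SYN, the server answers
+    with his SYN+ACK and the client confirms with a final ACK. In the
+    usual socket implementations (e.g. Linux and the BSDs), this
+    exchange is carried out by the kernel on H's behalf as soon as S
+    is listening, and the resulting connection then waits in S's
+    backlog queue until H picks it up in the accept phase, i.e. (c).
+    At that point the accept function returns a new socket S' for the
+    new connection, and only then can actual HTTP communication start.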
+
+	I won't even go into the details of why this is retarded; it's
+    been beaten to death in [the logs][tcp]. Either way, there's no
+    way around this pile of shit for computers talking to the heathen
+    WWW... actually, if there *is* one, I'd very much like to hear
+    about it.
+
+[^2]: I remember reading the same words somewhere else, and I even
+    know [where][eulora]. I'm not the first, nor even the second in
+    line to look at large open sores coads, you see.
+
+[cl-www]: /posts/y05/090-tmsr-work-ii.html#selection-108.0-108.17
+[m7-work]: /posts/y05/094-tmsr-work-iii.html
+[hunchentoot-i]: /posts/y05/093-hunchentoot-i.html
+[edicl-hunch]: http://archive.is/MP2bT
+[rfc-793]: http://archive.is/hFcq3
+[tcp]: http://btcbase.org/log-search?q=tcp
+[ht-acceptor]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L42
+[ht-easy-acceptor]: http://coad.thetarpit.org/hunchentoot/c-easy-handlers.lisp.html#L330
+[ht-taskmaster]: http://coad.thetarpit.org/hunchentoot/c-taskmaster.lisp.html#L31
+[ht-execute-acceptor]: http://coad.thetarpit.org/hunchentoot/c-taskmaster.lisp.html#L40
+[ht-accept-connections]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L251
+[ht-handle-incoming-connection]: http://coad.thetarpit.org/hunchentoot/c-taskmaster.lisp.html#L47
+[ht-process-connection]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L272
+[s-xml]: /posts/y05/086-s-xml.html#selection-109.0-109.109
+[ht-request]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L31
+[ht-reply]: http://coad.thetarpit.org/hunchentoot/c-reply.lisp.html#L31
+[ht-process-request]: http://coad.thetarpit.org/hunchentoot/c-request.lisp.html#L105
+[ht-handle-request]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L284
+[ht-acceptor-dispatch-request]: http://coad.thetarpit.org/hunchentoot/c-acceptor.lisp.html#L293
+[ht-session]: http://coad.thetarpit.org/hunchentoot/c-session.lisp.html#L83
+[w30-31-work]: /posts/y05/094-tmsr-work-iii.html#selection-150.0-150.5
+[eulora]: http://ossasepia.com/2019/06/18/euloran-blus-and-boos/#selection-25.1-35.40