From: Lucian Mogosanu
Date: Sun, 17 Feb 2019 18:18:48 +0000 (+0200)
Subject: posts: 085, draft
X-Git-Tag: v0.11~101
X-Git-Url: https://git.mogosanu.ro/?a=commitdiff_plain;h=79829ae20e9c4d1ef8c482a25147f95c237aafdd;p=thetarpit.git

posts: 085, draft
---

diff --git a/posts/y05/085-s-xml.markdown b/posts/y05/085-s-xml.markdown
new file mode 100644
index 0000000..1b4c475
--- /dev/null
+++ b/posts/y05/085-s-xml.markdown
@@ -0,0 +1,86 @@
+---
+postid: 085
+title: An XML parser for Common Lisp programs
+date: February 18, 2019
+author: Lucian Mogoșanu
+tags: tech, tmsr
+---
+
+This post is part of a [series][tmsr-schedule-i] of published
+[artifacts][intellectual-ownership] that (will) represent components for
+The Republic's RSS bot, [Feedbot][feedbot]. The idea behind this series
+is to grow Feedbot piece by piece[^1], starting from the smallest
+elements that [fit in head][fits-in-head], then using them as building
+blocks for the actual product, which will flow downstream from the
+[botworks][botworks] V tree.
+
+The first item to be published is S-XML, an XML parser written in Common
+Lisp. Both the name and the code have been lifted from files
+[published][s-xml] by one [nonperson][nonperson] known as Sven Van
+Caekenberghe, who, fortunately, wrote a library that is relatively small
+(around a thousand LoC), is organized so that it can be grasped in one
+sitting, and is known to work[^2]. Unfortunately though, as with most
+(all?) heathen programs encountered, this one isn't without warts. Thus,
+in addition to providing a patch, this article discusses the structure
+of S-XML and its current problems.
+
+The patch for S-XML is available in my
+[V source repository][s-xml-v]. Now, as to the library itself, it is
+structured as follows.
+
+S-XML contains two layers of abstraction and an interface between them:
+a. the core parsing code, which reads characters from a stream and
+returns XML elements, stored in `xml.lisp`; b. a series of wrappers over
+the parser that take the parser results and give them a particular
+structure; and c. the interface between (a) and (b), stored in
+`dom.lisp`. In fact, one could say that the layering goes exactly the
+other way around: the code in (b) provides a set of functions for the
+parser (a), while the parser takes a string/stream, reads it and calls
+the functions provided by (b) so it can decide what to do once it has
+all the required data, e.g. tags, attributes, etc. The wrappers in (b)
+are stored in `xml-struct-dom.lisp`, `lxml-dom.lisp` and `sxml-dom.lisp`[^3].
+
+The advantage of this design is that it doesn't constrain the user to
+any particular DOM tree representation. Personally, I find this
+so-called feature to be entirely useless and not much of an advantage
+after all, as I don't have any plans to parse XML files into multiple
+tree formats, not even mentioning the fact that "doesn't constrain the
+user" is the perfect recipe for
+[hallucinated freedom][hallucinated-freedom], and not even discussing
+the extra lines of (mostly dead) code this adds. This
+[cleverness][clever] is more for its own sake than anything of
+substance, and thus eventually some hero or another will surgically
+excise this particular tumour.
+
+Until then, however, the thing works, so it's all the better to publish
+it than to wait for the moment when said hero gets off his or her
+ass. Meanwhile, the more pressing matter for yours truly, and the next
+episode of this series, will involve publishing a small RSS/Atom parser
+based on S-XML.
+
+[^1]: This style, nowadays immediately recognizable by TMSR citizens as
+    the "FFA style", draws from [Asciilifeform][alf]'s
+    [Finite Field Arithmetic][loper-os] series.
+
+[^2]: It's been powering Feedbot for some months now.
+
+[^3]: Corresponding, respectively, to a defstruct-based format, Franz's
+    [LXML][lxml] format and the so-called [SXML][sxml]. The latter two
+    structure XML markup as S-expressions. Currently, Feedbot's RSS
+    parser uses LXML, for no reason in particular other than it being
+    the implicit option.
+
+[tmsr-schedule-i]: /posts/y05/082-tmsr-schedule-i.html#selection-63.114-63.159
+[intellectual-ownership]: /posts/y04/069-on-intellectual-ownership.html
+[feedbot]: http://btcbase.org/log-search?q=feedbot
+[alf]: http://wot.deedbot.org/17215D118B7239507FAFED98B98228A001ABFFC7.html
+[loper-os]: http://www.loper-os.org/?cat=49
+[fits-in-head]: http://btcbase.org/log-search?q=fits-in-head
+[botworks]: /posts/y05/080-botworks-regrind.html
+[s-xml]: https://archive.is/Ra6bA
+[nonperson]: http://btcbase.org/log-search?q=nonperson
+[s-xml-v]: http://lucian.mogosanu.ro/src/ TODO update this with actual dir
+[lxml]: https://web.archive.org/web/20080108082030/http://opensource.franz.com/xmlutils/xmlutils-dist/pxml.htm
+[sxml]: https://archive.is/Sl4Ev
+[hallucinated-freedom]: http://trilema.com/2017/the-practical-costs-of-hallucinated-freedom/
+[clever]: http://btcbase.org/log-search?q=clever