From d33fa3e81573e23c1b5cdbc097a686f003bddfc3 Mon Sep 17 00:00:00 2001
From: Lucian Mogosanu
Date: Fri, 1 Mar 2019 21:57:48 +0200
Subject: [PATCH] posts: 086

---
 drafts/086-s-xml.markdown | 86 ------------------------------------------
 posts/y05/086-s-xml.markdown | 84 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+), 86 deletions(-)
 delete mode 100644 drafts/086-s-xml.markdown
 create mode 100644 posts/y05/086-s-xml.markdown

diff --git a/drafts/086-s-xml.markdown b/drafts/086-s-xml.markdown
deleted file mode 100644
index 1af3895..0000000
--- a/drafts/086-s-xml.markdown
+++ /dev/null
@@ -1,86 +0,0 @@
----
-postid: 086
-title: An XML parser for Common Lisp programs
-date: February 18, 2019
-author: Lucian Mogoșanu
-tags: tech, tmsr
----
-
-This post is part of a [series][tmsr-schedule-i] of published
-[artifacts][intellectual-ownership] that (will) represent components for
-The Republic's RSS bot, [Feedbot][feedbot]. The idea behind this series
-is to grow Feedbot piece by piece[^1], starting from the smallest
-elements that [fit in head][fits-in-head], then using them as building
-blocks for the actual product, which will flow downstream from the
-[botworks][botworks] V tree.
-
-The first item to be published is S-XML, an XML parser written in Common
-Lisp. Both the name and the code have been lifted from files
-[published][s-xml] by one [nonperson][nonperson] known as Sven Van
-Caekenberghe, who, fortunately, wrote a library that is relatively small
-(around a thousand LoC), is organized so that it can be grasped in a
-relatively short time, and is known to work[^2]. Unfortunately though,
-as with most (all?) heathen programs encountered, this one isn't without
-warts. Thus, in addition to providing a patch, this article discusses
-the structure of S-XML and its current problems.
-
-The patch for S-XML is available in my
-[V source repository][s-xml-v]. Now, as to the library itself, it is
-structured as follows.
-
-S-XML contains three layers of abstraction: a. the core parsing code,
-that reads characters from a stream and returns XML elements, stored in
-`xml.lisp`; b. a series of so-called "wrappers" over the parser that
-take its results and give them a particular structure; and c. an
-interface between (a) and (b), stored in `dom.lisp`. In fact, one could
-say that the layering goes exactly the other way around: the code in (b)
-provides a set of functions for the parser (a), while the parser takes a
-string/stream, processes it and calls the functions provided by (b) so
-that it can decide what to do once it has all the required data,
-e.g. tags, attributes etc. The wrappers in (b) are stored in
-`xml-struct-dom.lisp`, `lxml-dom.lisp` and `sxml-dom.lisp`[^3].
-
-The advantage of this design is that it doesn't constrain the user to
-any particular DOM tree representation. Personally, I find this
-so-called feature to be entirely useless and not much of an advantage
-after all, as I don't have any plans to parse XML files into multiple
-tree formats, not even mentioning the fact that "doesn't constrain the
-user" is the perfect recipe for
-[hallucinated freedom][hallucinated-freedom], and not even discussing
-the extra lines of (mostly dead) code this adds. This
-[cleverness][clever] is more for its own sake than anything of
-substance, and thus eventually some hero or another will surgically
-excise this particular tumour.
-
-Until then, however, the thing works, so it's all the better to publish
-it than to wait for the moment when said hero gets off his or her ass
-and makes the thing shine. Meanwhile, the more pressing matter for yours
-truly, and the next episode of this series, will involve publishing a
-small RSS/Atom parser based on S-XML DOM trees.
-
-[^1]: This style, nowadays immediately recognizable by TMSR citizens as
- the "FFA style", draws from [Asciilifeform][alf]'s
- [Finite Field Arithmetic][loper-os] series.
-
-[^2]: It's been powering Feedbot for some months now.
-
-[^3]: Corresponding, respectively, to a defstruct-based format, Franz's
- [LXML][lxml] format and the so-called [SXML][sxml]. The latter two
- structure XML markup as S-expressions. Currently, Feedbot's RSS
- parser uses LXML, for no reason in particular other than it being
- the implicit option.
-
-[tmsr-schedule-i]: /posts/y05/082-tmsr-schedule-i.html#selection-63.114-63.159
-[intellectual-ownership]: /posts/y04/069-on-intellectual-ownership.html
-[feedbot]: http://btcbase.org/log-search?q=feedbot
-[alf]: http://wot.deedbot.org/17215D118B7239507FAFED98B98228A001ABFFC7.html
-[loper-os]: http://www.loper-os.org/?cat=49
-[fits-in-head]: http://btcbase.org/log-search?q=fits-in-head
-[botworks]: /posts/y05/080-botworks-regrind.html
-[s-xml]: https://archive.is/Ra6bA
-[nonperson]: http://btcbase.org/log-search?q=nonperson
-[s-xml-v]: http://lucian.mogosanu.ro/src/s-xml
-[lxml]: https://archive.is/DpZnN
-[sxml]: https://archive.is/Sl4Ev
-[hallucinated-freedom]: http://trilema.com/2017/the-practical-costs-of-hallucinated-freedom/
-[clever]: http://btcbase.org/log-search?q=clever
diff --git a/posts/y05/086-s-xml.markdown b/posts/y05/086-s-xml.markdown
new file mode 100644
index 0000000..831359d
--- /dev/null
+++ b/posts/y05/086-s-xml.markdown
@@ -0,0 +1,84 @@
+---
+postid: 086
+title: An XML parser for Common Lisp programs
+date: March 1, 2019
+author: Lucian Mogoșanu
+tags: tech, tmsr
+---
+
+This post is part of a [series][tmsr-schedule-i] of published
+[artifacts][intellectual-ownership] that (will) represent components for
+The Republic's RSS bot, [Feedbot][feedbot]. The idea behind this series
+is to grow Feedbot piece by piece[^1], starting from the smallest
+elements that [fit in head][fits-in-head], then using them as building
+blocks for the actual product, which will flow downstream from the
+[botworks][botworks] V tree.
+
+The first item to be published is S-XML, an XML parser written in Common
+Lisp. Both the name and the code have been lifted from files
+[published][s-xml] by one [nonperson][nonperson] known as Sven Van
+Caekenberghe, who, fortunately, wrote a library that is relatively small
+(around a thousand LoC), is organized so that it can be grasped in a
+relatively short time, and is known to work[^2]. Unfortunately though,
+as with most (all?) heathen programs encountered, this one isn't without
+warts. Thus, in addition to providing a patch, this article discusses
+the structure of S-XML and its current problems.
+
+The patch for S-XML is available in my
+[V source repository][s-xml-v]. Now, as to the library itself, it is
+structured as follows.
+
+S-XML contains three layers of abstraction: a. the core parsing code,
+that reads characters from a stream and returns XML elements, stored in
+`xml.lisp`; b. a series of so-called "wrappers" over the parser that
+take its results and give them a particular structure; and c. an
+interface between (a) and (b), stored in `dom.lisp`. In fact, one could
+say that the layering goes exactly the other way around: the code in (b)
+provides a set of functions for the parser (a), while the parser takes a
+string/stream, processes it and calls the functions provided by (b) so
+that it can decide what to do once it has all the required data,
+e.g. tags, attributes etc. The wrappers in (b) are stored in
+`xml-struct-dom.lisp`, `lxml-dom.lisp` and `sxml-dom.lisp`[^3].
+
+The advantage of this design is that it doesn't constrain the user to
+any particular DOM tree representation. Personally, I find this
+so-called feature to be entirely useless, as the need to parse XML files
+into multiple tree formats using a *single* library looks like the
+perfect recipe for [hallucinated freedom][hallucinated-freedom], not to
+mention the extra lines of (mostly dead) code added. This
+[cleverness][clever] is more for its own sake than anything of
+substance, and thus eventually some hero or another will surgically
+excise this particular tumour.
+
+Until then, however, the thing works, so it's all the better to publish
+it than to wait for the moment when said hero gets off his or her ass
+and makes the thing shine. Meanwhile, the more pressing matter for yours
+truly, and the next episode of this series, will involve publishing a
+small RSS/Atom parser based on S-XML DOM trees.
+
+[^1]: This style, nowadays immediately recognizable by TMSR citizens as
+ the "FFA style", draws from [Asciilifeform][alf]'s
+ [Finite Field Arithmetic][loper-os] series.
+
+[^2]: It's been powering Feedbot for some months now.
+
+[^3]: Corresponding, respectively, to a defstruct-based format, Franz's
+ [LXML][lxml] format and the so-called [SXML][sxml]. The latter two
+ structure XML markup as S-expressions. Currently, Feedbot's RSS
+ parser uses LXML, for no reason in particular other than it being
+ the implicit option.
+
+[tmsr-schedule-i]: /posts/y05/082-tmsr-schedule-i.html#selection-63.114-63.159
+[intellectual-ownership]: /posts/y04/069-on-intellectual-ownership.html
+[feedbot]: http://btcbase.org/log-search?q=feedbot
+[alf]: http://wot.deedbot.org/17215D118B7239507FAFED98B98228A001ABFFC7.html
+[loper-os]: http://www.loper-os.org/?cat=49
+[fits-in-head]: http://btcbase.org/log-search?q=fits-in-head
+[botworks]: /posts/y05/080-botworks-regrind.html
+[s-xml]: https://archive.is/Ra6bA
+[nonperson]: http://btcbase.org/log-search?q=nonperson
+[s-xml-v]: http://lucian.mogosanu.ro/src/s-xml
+[lxml]: https://archive.is/DpZnN
+[sxml]: https://archive.is/Sl4Ev
+[hallucinated-freedom]: http://trilema.com/2017/the-practical-costs-of-hallucinated-freedom/
+[clever]: http://btcbase.org/log-search?q=clever
-- 
1.7.10.4