From: Lucian Mogosanu
Date: Wed, 6 Mar 2019 18:19:23 +0000 (+0200)
Subject: posts: 087
X-Git-Tag: v0.11~85
X-Git-Url: https://git.mogosanu.ro/?a=commitdiff_plain;h=8d22f2d4a66aefbaa5ee3be1d1b5c504b31eb299;p=thetarpit.git

posts: 087
---

diff --git a/drafts/087-feedparse.markdown b/drafts/087-feedparse.markdown
deleted file mode 100644
index 4a22e85..0000000
--- a/drafts/087-feedparse.markdown
+++ /dev/null
@@ -1,132 +0,0 @@
----
-postid: 087
-title: A feed parser for Common Lisp programs
-date: March 6, 2019
-author: Lucian Mogoșanu
-tags: tech, tmsr
----
-
-This is the second part of [a series][tmsr-schedule-i] on building
-blocks for [Feedbot][feedbot].
-
-Now that we have an [XML parser][s-xml] on hand, we can use that to
-obtain [RSS][rss] and [Atom][rss] feeds in a structured format. Once
-again, as far as heathendom goes we are lucky -- one Kyle Isom has
-provided us with a feed parser, cl-feedparse, that is under three
-hundred lines of neat Lisp code, and quite importantly, it meets the
-spec, as can be seen below.
-
-The code is available as a patch on the [S-XML V tree][s-xml-v].
-
-The remainder of this post a. describes the structure and functionality
-of feedparse; and b. it provides some usage examples.
-
-(cl-)feedparse provides the following simple structure for (RSS/Atom)
-feeds. A feed can be decomposed into its title, kind (RSS or Atom), URL
-and list of feed items. A feed item has the following elements: an ID, a
-title, a (publication) date, a link and a body/description.
-
-In order to obtain a feed, feedparse:
-
-* performs an HTTP request, obtaining the raw feed data;
-* parses the feed, using S-XML's `parse-xml-string`;
-* identifies the kind of feed (`parser-dispatch`); and
-* calls the appropriate parser, `parse-rss` or `parse-atom`.
-
-To this, I have added bolt-on functionality
-(`http-request-with-timeout`)[^1] which performs the HTTP request on a
-separate thread, so that, for all the obvious reasons, the operation can
-be aborted after a user-configured timeout period expires.
-
-The code, in all its depth, is very easy to understand; the reader is
-encouraged to peruse `feedparse.lisp`.
-
-Now, for the usage bit. After we've pressed to `s-xml-feedparse`:
-
-~~~~
-$ vk.pl p s-xml s-xml-feedparse.vpatch
-$ cd s-xml
-~~~~
-
-and imported all the dependencies[^2], we can now run our parser:
-
-~~~~ {.commonlisp}
-... fire up CL-tron, load dependencies, feedparse, e.g.
-> (asdf:load-system :feedparse)
-> (defvar *ttp-feed*
-    (feedparse:parse-feed "http://thetarpit.org/rss.xml"))
-*TTP-FEED*
-> (feedparse:feed-title *ttp-feed*)
-"The Tar Pit
- "
-> (defvar *ttp-latest*
-    (car (feedparse:feed-items *ttp-feed*)))
-*TTP-LATEST*
-> (feedparse:item-title *ttp-latest*)
-"An XML parser for Common Lisp programs
- "
-> (feedparse:item-link *ttp-latest*)
-"http://thetarpit.org/posts/y05/086-s-xml.html
- "
-~~~~
-
-In the next episode of this series, we will use feedparse to write a
-program that automatically checks for new feeds and populates a
-user-defined feed database.
-
-[^1]: In a normal world, where everything is genesized, properly seated
-    in its own place and performing its own duties, this function would
-    be part of the [Drakma][drakma] curl library. As things stand,
-    however, I've just implemented it directly in feedparse -- this
-    *will* have to eventually be addressed, so that feed parsing code
-    will lie with the feed parser, HTTP request code with the curl code
-    and so on and so forth.
-
-[^2]: Unfortunately, feedparse depends on Lisp code which is not yet
-    V-ified, namely [Drakma][drakma] and flexi-streams, which on their
-    own depend on other packages. The full set of dependencies is:
-    usocket(1), chipz(2), flexi-streams(3), trivial-gray-streams(4),
-    chunga(5), cl-base64(6), cl-puri(7) and drakma(8).
-
-    1. A so-called "portability layer" for TCP and UDP sockets, over
-       various OS and CL implementations. Required by Drakma.
-    2. A gzip library for Common Lisp. Required by Drakma, because
-       apparently you can't have HTTP without compression nowadays.
-    3. The implementation of a binary "stream" data structure. Required
-       by both Drakma and cl-feedparse, because apparently you can send
-       arbitrary binary data over HTTP nowadays.
-    4. An, I quote, "thin compatibility layer" for
-       [gray streams][gray-streams]. Required by flexi-streams, because
-       compatibility layer on top of compatibility layer.
-    5. "Chunked streams". Streams on top of streams on top of streams --
-       it's streams all the way down! Required by Drakma,
-       because... well, some HTTP fuckery or another, I won't bother the
-       reader with details.
-    6. Base64 implementation for Common Lisp.
-    7. URL parser for Common Lisp.
-    8. Curl implementation for Common Lisp. Full of usefuls, but bloated
-       with crap such as SSLisms, which, by the by, can be "disabled"
-       (see how SSL fortunately is not part of this dependency
-       list). All that mess is still there though.
-
-    All this just to grab a piece of XML serialized into text, transform
-    it into an S-expression and further parse that S-expression into the
-    structure described in this article. Now, far be it from me to
-    debate the usefulness of all this stuff, but remember, whenever you
-    run, e.g.:
-
-    ~~~~ {.commonlisp}
-    > (ql:quickload :feedparse)
-    ~~~~
-
-    you *are* importing it, whether you see it or not and whether you
-    like it or not. All this burden, you can't just wave it away.
-
-[tmsr-schedule-i]: /posts/y05/082-tmsr-schedule-i.html#selection-63.114-63.159
-[feedbot]: http://btcbase.org/log-search?q=feedbot
-[s-xml]: /posts/y05/086-s-xml.html
-[rss]: https://archive.is/3rHl
-[atom]: https://archive.is/rdl5b
-[s-xml-v]: http://lucian.mogosanu.ro/src/s-xml
-[drakma]: https://archive.is/Ms1EQ
-[gray-streams]: https://archive.is/jndFX
diff --git a/posts/y05/087-feedparse.markdown b/posts/y05/087-feedparse.markdown
new file mode 100644
index 0000000..3dc7be4
--- /dev/null
+++ b/posts/y05/087-feedparse.markdown
@@ -0,0 +1,135 @@
+---
+postid: 087
+title: A feed parser for Common Lisp programs
+date: March 6, 2019
+author: Lucian Mogoșanu
+tags: tech, tmsr
+---
+
+This is the second part of [a series][tmsr-schedule-i] on building
+blocks for [Feedbot][feedbot].
+
+Now that we have an [XML parser][s-xml] on hand, we can use it to obtain
+[RSS][rss] and [Atom][atom] feeds in a structured format. Once again, as
+far as heathendom goes we are lucky -- one Kyle Isom has provided us
+with a feed parser, cl-feedparse, that is under three hundred lines of
+neat Lisp code, and quite importantly, it meets the spec, as can be seen
+below.
+
+The code is available as a patch on the [S-XML V tree][s-xml-v].
+
+The remainder of this post a. describes the structure and functionality
+of feedparse; and b. it provides some usage examples.
+
+(cl-)feedparse provides the following simple structure for (RSS/Atom)
+feeds. A feed can be decomposed into: its title, kind (RSS or Atom), URL
+and list of feed items. A feed item has the following elements: an ID, a
+title, a (publication) date, a link to the actual item and a
+body/description.
+
+In order to obtain a feed, feedparse:
+
+* performs an HTTP request, obtaining raw feed data;
+* parses the feed, using S-XML's `parse-xml-string`;
+* identifies the kind of feed (`parser-dispatch`); and
+* calls the appropriate parser, `parse-rss` or `parse-atom`.
+
+To this, I have added bolt-on functionality
+(`http-request-with-timeout`)[^1] which performs the HTTP request on a
+separate thread, so that, for all the obvious reasons, the operation can
+be aborted after a user-configured timeout period expires.
+
+The code, in all its depth, is very easy to understand; the reader is
+encouraged to peruse the linked [V items][s-xml-v], in particular
+s-xml-feedparse.vpatch, and in particular feedparse.lisp.
+
+Now, for the usage bit. After we've pressed to s-xml-feedparse:
+
+~~~~
+$ vk.pl p s-xml s-xml-feedparse.vpatch
+$ cd s-xml
+~~~~
+
+and imported all its dependencies[^2], we can now run our parser:
+
+~~~~ {.commonlisp}
+... fire up CL-tron, load dependencies, feedparse, e.g.
+> (asdf:load-system :feedparse)
+> (defvar *ttp-feed*
+    (feedparse:parse-feed "http://thetarpit.org/rss.xml"))
+*TTP-FEED*
+> (feedparse:feed-title *ttp-feed*)
+"The Tar Pit
+ "
+> (defvar *ttp-latest*
+    (car (feedparse:feed-items *ttp-feed*)))
+*TTP-LATEST*
+> (feedparse:item-title *ttp-latest*)
+"A feed parser for Common Lisp programs
+ "
+> (feedparse:item-link *ttp-latest*)
+"http://thetarpit.org/posts/y05/087-feedparse.html
+ "
+~~~~
+
+In the next episode of this series, we will use feedparse to write a
+program that automatically checks for new feeds and populates a
+user-defined feed database.
+
+[^1]: In a normal world, where everything is genesized, properly seated
+    in its own place and performing its own duties, this function would
+    be part of the [Drakma][drakma] curl library. As things stand,
+    however, I've just implemented it directly in feedparse -- this
+    *will* have to eventually be addressed, so that feed parsing code
+    will lie with the feed parser, HTTP request code with the curl code
+    and so on and so forth. See the following footnote for a few gory
+    details.
+
+[^2]: Unfortunately, feedparse depends on Lisp code which is not yet
+    V-ified, namely [Drakma][drakma] and flexi-streams, which on their
+    own depend on other packages. The full set of dependencies is:
+    usocket(1), chipz(2), flexi-streams(3), trivial-gray-streams(4),
+    chunga(5), cl-base64(6), cl-puri(7) and drakma(8):
+
+    1. A so-called "portability layer" for TCP and UDP sockets, over
+       various OS and CL implementations. Required by Drakma.
+    2. A gzip library for Common Lisp. Required by Drakma, because
+       apparently you can't have HTTP without compression nowadays.
+    3. The implementation of a binary "stream" data structure. Required
+       by both Drakma and cl-feedparse, because HTTP and arbitrary
+       binary data.
+    4. A so-called "thin compatibility layer" for
+       [gray streams][gray-streams]. Required by flexi-streams, because
+       compatibility layer on top of compatibility layer.
+    5. "Chunked streams". Streams on top of streams on top of streams --
+       it's streams all the way down! Required by Drakma,
+       because... well, some HTTP fuckery or another, I won't bother the
+       reader with details.
+    6. Base64 implementation for Common Lisp.
+    7. URL parser for Common Lisp.
+    8. Curl implementation for Common Lisp. Full of usefuls, but bloated
+       with crap such as SSLisms, which, by the by, can be "disabled"
+       (see how SSL is -- fortunately! -- not part of this dependency
+       list). All that mess is still there though.
+
+    All this just to grab a piece of XML serialized into text, transform
+    it into an S-expression and further parse that S-expression into the
+    structure described in this article. Now, far be it from me to
+    debate the usefulness of all this stuff, but remember, whenever you
+    run, e.g.:
+
+    ~~~~ {.commonlisp}
+    > (ql:quickload :feedparse)
+    ~~~~
+
+    you *are* importing it, whether you see it or not and whether you
+    like it or not. All this burden, you can't just wave it away.
+
+[tmsr-schedule-i]: /posts/y05/082-tmsr-schedule-i.html#selection-63.114-63.159
+[feedbot]: http://btcbase.org/log-search?q=feedbot
+[s-xml]: /posts/y05/086-s-xml.html
+[rss]: https://archive.is/3rHl
+[atom]: https://archive.is/rdl5b
+[s-xml-v]: http://lucian.mogosanu.ro/src/s-xml
+[drakma]: https://archive.is/Ms1EQ
+[gray-streams]: https://archive.is/jndFX
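
The feed and item structure that the post describes (a feed with title, kind, URL and items; an item with ID, title, date, link and body) can be sketched in plain Common Lisp roughly as follows. This is only an illustration under stated assumptions: the package name `feed-sketch`, the use of `defstruct` and the helper `kind-from-root` are hypothetical and are not taken from feedparse.lisp, which may well represent feeds differently. The only outside fact relied upon is that RSS 2.0 documents are rooted at an `rss` element and Atom documents at a `feed` element.

~~~~ {.commonlisp}
;; Sketch only: hypothetical definitions, not copied from feedparse.lisp.
(defpackage :feed-sketch
  (:use :common-lisp))
(in-package :feed-sketch)

;; A feed: title, kind (:rss or :atom), URL and a list of items.
(defstruct feed title kind url items)

;; A feed item: ID, title, publication date, link and body/description.
(defstruct item id title date link body)

(defun kind-from-root (root-name)
  "Map the name of the XML root element to a feed kind.
RSS 2.0 documents are rooted at <rss>, Atom documents at <feed>."
  (cond ((string= root-name "rss") :rss)
        ((string= root-name "feed") :atom)
        (t (error "Unrecognized feed root element: ~a" root-name))))

;; Build a feed by hand and read back the title of its first item.
(let ((feed (make-feed
             :title "The Tar Pit"
             :kind (kind-from-root "rss")
             :url "http://thetarpit.org/rss.xml"
             :items (list (make-item
                           :title "A feed parser for Common Lisp programs"
                           :link "http://thetarpit.org/posts/y05/087-feedparse.html")))))
  (item-title (car (feed-items feed))))
~~~~

With `defstruct`, the accessors `feed-title`, `feed-items`, `item-title` and `item-link` come for free, which happens to match the accessor names used in the REPL session in the post; whether the real code obtains them this way is an assumption.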