From 61943a5a82db5ab28921707627d3d36eeac97ef9 Mon Sep 17 00:00:00 2001
From: Lucian Mogosanu
Date: Tue, 5 Mar 2019 22:20:31 +0200
Subject: [PATCH] drafts, 087: Add full draft

---
 drafts/087-feedparse.markdown | 99 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 83 insertions(+), 16 deletions(-)

diff --git a/drafts/087-feedparse.markdown b/drafts/087-feedparse.markdown
index f6d00c7..4a22e85 100644
--- a/drafts/087-feedparse.markdown
+++ b/drafts/087-feedparse.markdown
@@ -1,7 +1,7 @@
 ---
 postid: 087
 title: A feed parser for Common Lisp programs
-date: February 18, 2019
+date: March 6, 2019
 author: Lucian Mogoșanu
 tags: tech, tmsr
 ---
@@ -33,33 +33,100 @@ In order to obtain a feed, feedparse:
 * identifies the kind of feed (`parser-dispatch`); and
 * calls the appropriate parser, `parse-rss` or `parse-atom`.
 
-Both parsers are very easy to understand; the reader is encouraged to
-peruse `feedparse.lisp`.
+To this, I have added bolt-on functionality
+(`http-request-with-timeout`)[^1] which performs the HTTP request on a
+separate thread, so that, for all the obvious reasons, the operation can
+be aborted after a user-configured timeout period expires.
 
-After we've pressed to `s-xml-feedparse`:
+The code, in all its depth, is very easy to understand; the reader is
+encouraged to peruse `feedparse.lisp`.
+
+Now, for the usage bit. After we've pressed to `s-xml-feedparse`:
 
 ~~~~
-TODO
+$ vk.pl p s-xml s-xml-feedparse.vpatch
+$ cd s-xml
 ~~~~
 
-and imported all the dependencies[^1], we can now run our parser:
+and imported all the dependencies[^2], we can now run our parser:
 
+~~~~ {.commonlisp}
+... fire up CL-tron, load dependencies, feedparse, e.g.
+> (asdf:load-system :feedparse)
+> (defvar *ttp-feed*
+  (feedparse:parse-feed "http://thetarpit.org/rss.xml"))
+*TTP-FEED*
+> (feedparse:feed-title *ttp-feed*)
+"The Tar Pit
+ "
+> (defvar *ttp-latest*
+  (car (feedparse:feed-items *ttp-feed*)))
+*TTP-LATEST*
+> (feedparse:item-title *ttp-latest*)
+"An XML parser for Common Lisp programs
+ "
+> (feedparse:item-link *ttp-latest*)
+"http://thetarpit.org/posts/y05/086-s-xml.html
+ "
 ~~~~
-TODO
-~~~~
 
-[^1]: Unfortunately, both feedparse depends on Lisp code which is not
-    yet V-ified, namely [Drakma][drakma] and flexi-streams, which on
-    their own depend on other packages. The full set of dependencies is:
-    usocket, chipz, flexi-streams, trivial-gray-streams, chunga,
-    cl-base64, cl-puri and drakma.
+In the next episode of this series, we will use feedparse to write a
+program that automatically checks for new feeds and populates a
+user-defined feed database.
+
+[^1]: In a normal world, where everything is genesized, properly seated
+    in its own place and performing its own duties, this function would
+    be part of the [Drakma][drakma] curl library. As things stand,
+    however, I've just implemented it directly in feedparse -- this
+    *will* have to eventually be addressed, so that feed parsing code
+    will lie with the feed parser, HTTP request code with the curl code
+    and so on and so forth.
+
+[^2]: Unfortunately, feedparse depends on Lisp code which is not yet
+    V-ified, namely [Drakma][drakma] and flexi-streams, which on their
+    own depend on other packages. The full set of dependencies is:
+    usocket(1), chipz(2), flexi-streams(3), trivial-gray-streams(4),
+    chunga(5), cl-base64(6), cl-puri(7) and drakma(8).
+
+    1. A so-called "portability layer" for TCP and UDP sockets, over
+       various OS and CL implementations. Required by Drakma.
+    2. A gzip library for Common Lisp. Required by Drakma, because
+       apparently you can't have HTTP without compression nowadays.
+    3. The implementation of a binary "stream" data structure. Required
+       by both Drakma and cl-feedparse, because apparently you can send
+       arbitrary binary data over HTTP nowadays.
+    4. An, I quote, "thin compatibility layer" for
+       [gray streams][gray-streams]. Required by flexi-streams, because
+       compatibility layer on top of compatibility layer.
+    5. "Chunked streams". Streams on top of streams on top of streams --
+       it's streams all the way down! Required by Drakma,
+       because... well, some HTTP fuckery or another, I won't bother the
+       reader with details.
+    6. Base64 implementation for Common Lisp.
+    7. URL parser for Common Lisp.
+    8. Curl implementation for Common Lisp. Full of usefuls, but bloated
+       with crap such as SSLisms, which, by the by, can be "disabled"
+       (see how SSL fortunately is not part of this dependency
+       list). All that mess is still there though.
+
+    All this just to grab a piece of XML serialized into text, transform
+    it into an S-expression and further parse that S-expression into the
+    structure described in this article. Now, far be it from me to
+    debate the usefulness of all this stuff, but remember, whenever you
+    run, e.g.:
+
+    ~~~~ {.commonlisp}
+    > (ql:quickload :feedparse)
+    ~~~~
 
-    TODO: describe these.
+    you *are* importing it, whether you see it or not and whether you
+    like it or not. All this burden, you can't just wave it away.
 
 [tmsr-schedule-i]: /posts/y05/082-tmsr-schedule-i.html#selection-63.114-63.159
 [feedbot]: http://btcbase.org/log-search?q=feedbot
-[s-xml]: TODO
+[s-xml]: /posts/y05/086-s-xml.html
 [rss]: https://archive.is/3rHl
 [atom]: https://archive.is/rdl5b
 [s-xml-v]: http://lucian.mogosanu.ro/src/s-xml
-[drakma]: TODO
+[drakma]: https://archive.is/Ms1EQ
+[gray-streams]: https://archive.is/jndFX
-- 
1.7.10.4