+++ /dev/null
----
-postid: 08a
-title: Feedbot [i]: the feed db; its manipulation; and the feed checker
-date: March 22, 2019
-author: Lucian Mogoșanu
-tags: tech, tmsr
----
-
-This post introduces the first major component of [Feedbot][feedbot]:
-the feed checker, accompanied by its building blocks. Although this part
-contains no IRC code[^1], Feedbot is part of the [Botworks][botworks] V
-tree, resting on top of [Trilemabot][trilemabot]:
-
-* the [V patch][feedbot-checker.vpatch]; and
-* my [seal][feedbot-checker.vpatch.spyked.sig].
-
-The rest of this post discusses design and implementation details,
-although most of the content here is already contained in the patch[^2].
-
-**I. The feed db**
-
-Feedbot operation is centered around a data structure that I've so
-pompously called a "feed database" (or feed db). In more detail:
-
-**\*** A feed db is a list of feeds.
-
-**\*** A feed is a list of the form:
-
-~~~~ {.commonlisp}
-(url :title title :entries entries :rcpts rcpts)
-~~~~
-
-where `url` and `title` are strings, `entries` is a list of entries,
-`rcpts` is a list of recipients[^3].
-
-**\*** An entry is a list of the form:
-
-~~~~ {.commonlisp}
-(entry :id id :title title :link url)
-~~~~
-
-where `entry` is the CL symbol ENTRY; `id`, `title` and `url` are
-strings. Note that in practice, `id` and `url` often (but not always)
-match.
-
-**\*** A recipient is a string denoting a nick who will receive new
-entries when they are added to the database.
-
-**II. Feed db manipulation**
-
-Functionality pertaining to the feed db is split into the following
-categories:
-
-**a\.** "Low-level" functions operating on the feed db, feeds, entries
-and recipients; examples include setting the title of a feed, adding
-entries or searching for a recipient.
-
-For example, the functions below:
-
-~~~~ {.commonlisp}
-(defun lookup-feed! (feed-id feed-db)
- (assoc feed-id feed-db :test #'string=))
-
-(defun find-entry-in-feed! (feed entry-id)
- (find entry-id (get-feed-entries! feed)
- :key #'get-entry-id
- :test #'string=))
-~~~~
-
-lookup a feed in a given feed db and, respectively, an entry within a
-feed.
-
-**b\.** A macro, `with-feed-db`, providing a thread-safe context for
-feed db processing; see notes below for further details. The
-implementation is reproduced below:
-
-~~~~ {.commonlisp}
-(defmacro with-feed-db ((feed-db) bot &body body)
- "Execute code within the thread-safe `feed-db' scope of `bot'."
- (with-gensyms (db-mutex)
- `(with-slots ((,feed-db feed-db) (,db-mutex db-mutex))
- ,bot
- (with-mutex (,db-mutex)
- ,@body))))
-~~~~
-
-**c\.** Interface, or "high-level" methods to be called by e.g. the bot
-operator or by the IRC-facing code. These typically bear the following
-form:
-
-~~~~ {.commonlisp}
-(defmethod feedbot-... ((bot feedbot) arg1 arg2 ...)
- ...)
-~~~~
-
-For example:
-
-~~~~ {.commonlisp}
-(defmethod feedbot-get-or-create-feed ((bot feedbot) feed-id)
- "Get feed with id `feed-id' from the feed db of `bot'.
-
-If `feed-id' doesn't point to a feed, a new feed with that id is created
-and inserted into the feed db."
- (with-feed-db (feed-db) bot
- (let ((feed (lookup-feed! feed-id feed-db)))
- (when (not feed)
- (setq feed (list feed-id :title "" :entries nil :rcpts nil))
- (push feed feed-db))
- feed)))
-~~~~
-
-is self-explanatory.
-
-*Note*: feedbot operates in a concurrent environment, where multiple
-threads may access the feed db at a given time; for example, the feed
-checker and SBCL's shell. Thus, all (c) functions are implemented in
-terms of (a), and furthermore, they use (b) in order to ensure
-thread-safety. We distinguish between thread-safe and unsafe functions
-by employing the following convention:
-
-> Feed db functions whose name end in ! (also named below !-functions)
-> are thread unsafe and should be used *only* in conjunction with the
-> db-mutex or `with-feed-db`.
-
-*Note*: the feed db should also reside on a persistent medium. This
-functionality will be implemented later.
-
-**III. The feed checker**
-
-The feed checker runs on a so-called "checker thread", that periodically
-(see `*check-freq*`) runs the feed db update code. Additionally, the
-feed checker delegates new (previously unseen) entries to a so-called
-"announcer"[^4].
-
-To test feedbot feed checker functionality, simply run:
-
-~~~~ {.commonlisp}
-> (defvar *feedbot*
- (make-instance 'feedbot:feedbot))
-> (feedbot:feedbot-start-checker-thread *feedbot*)
-> (feedbot:feedbot-get-or-create-feed
- *feedbot* "http://thetarpit.org/rss.xml")
-~~~~
-
-then sit back and relax.
-
-[^1]: Achtung! Spoilers below:
-
- At the moment of writing, Feedbot is supposed to comprise three
- parts, in this order: one. a feed checker; two. a feed announcer;
- and three. an IRC-based interface.
-
- The disadvantage of doing it this way is that for two of the three
- parts, I'm pushing patches downstream of [ircbot][ircbot] that don't
- use any ircbot\*. But let's imagine for a moment that I did it the
- other way around -- now the reader can stand up a nice IRC bot
- implementing some commands that do what, more precisely? They call
- empty functions? They mock [the IRC interface][manual]?
-
- So that's how things are: at the end of part one, you have a working
- bit that checks for new feeds and looks at them and so on and so
- forth; at the end of part two, you also have a (small) bit that
- looks at new content and consumes it; and finally, after part three
- you have the whole thing.
-
- \-\-\-
- \* As an aside: notice how Feedbot imports the
- [Feedparse][feedparse] code *ad litteram*? The separate V tree still
- exists if you want to use it in your own thing, but otherwise that
- item's been completely glued to [Feedbot][feedbot], and thus to
- [Botworks][botworks]. This is no news, the same happened before with
- [Eucrypt and MPI][eucrypt-mpi].
-
-[^2]: For the record:
-
- ~~~~
- $ wc -l *.lisp
- 307 feedbot.lisp
- 28 feedbot-utils.lisp
- 26 package.lisp
- 361 total
- $ grep '^ *;' *.lisp | wc -l
- 104
- ~~~~
-
- That is, comments comprise about one third of the code. This is
- roughly similar to [other republican code][ffa-comments]
-
-[^3]: These so-called "recipients" are for now completely meaningless
- and thus useless. So why add them? Let's explore the possibilities;
- we could have the feed db organized as: a. a list of feeds, each
- containing a list of entries and recipients; b. a list of
- recipients, each with its own list of feeds, each with a list of
- entries; and c. two separate lists, one with recipients, each
- recipient with a list of subscriptions (feed IDs), and one with
- feeds, each feed with a list of entries.
-
- First, we observe that in all cases, entries are subordinated to
- feeds -- this sounds like the correct relation, I hope that we're on
- the same page here. Second, we observe that each of a, b and c has
- its own trade-offs in terms of space and time: for example,
- "recipients" are useless if for some reason there's only one feedbot
- user; also, subordinating feeds to recipients leads to duplicated
- feed checking, at least unless we do some tricks with references to
- lists; also, separating feeds and recipients requires an efficient
- lookup algorithm for feeds and recipients, in turn requiring a data
- structure that's "better" than S-expressions (at least in this
- respect), which then later will require a (more complex)
- serialization/deserialization piece when we're saving things to the
- disk; also, feel free to add any pros and cons to this list.
-
- So stemming from these trade-offs, I made the decision to use this
- particular structure, from which stems this particular feed checker
- implementation. Maybe a "better" one will arise at some point,
- although experience shows "better" tends to arrive at the problem of
- "worse" [after a while][worse-is-better].
-
-[^4]: This "so-called" announcer is supposed to notify recipients of new
- content whenever it's posted. For now, the announcer is a stub that
- prints new entries to standard output.
-
-[feedbot]: http://btcbase.org/log-search?q=feedbot
-[botworks]: /posts/y05/080-botworks-regrind.html
-[trilemabot]: /posts/y05/078-trilemabot-ii.html
-[ircbot]: http://trinque.org/2016/08/10/ircbot-genesis/
-[manual]: /posts/y05/081-feedbot-manual.html
-[feedparse]: /posts/y05/087-feedparse.html
-[eucrypt-mpi]: http://btcbase.org/log/2017-12-14#1751589
-[feedbot-checker.vpatch]: TODO
-[feedbot-checker.vpatch.spyked.sig]: TODO
-[ffa-comments]: http://btcbase.org/log/2019-01-29#1890573
-[worse-is-better]: http://btcbase.org/log-search?q=%22worse+is+better%22
--- /dev/null
+---
+postid: 08a
+title: Feedbot [i]: the feed db; its manipulation; and the feed checker
+date: March 23, 2019
+author: Lucian Mogoșanu
+tags: tech, tmsr
+---
+
+This post introduces the first major component of [Feedbot][feedbot]:
+the feed checker, accompanied by its building blocks. Although this part
+contains no actual IRC code[^1], it rests on top of the
+[Botworks][botworks] V tree, more precisely [Trilemabot][trilemabot]:
+
+* the [V patch][feedbot-checker.vpatch]; and
+* my [seal][feedbot-checker.vpatch.spyked.sig].
+
+The rest of this post discusses design and implementation details,
+although most of the content here is already contained in the patch[^2].
+
+**I. The feed db**
+
+Feedbot operation is centered around a data structure that I've so
+pompously called a "feed database" (or feed db). In more detail:
+
+**\*** A feed db is a list of feeds.
+
+**\*** A feed is a list of the form:
+
+~~~~ {.commonlisp}
+(url :title title :entries entries :rcpts rcpts)
+~~~~
+
+where `url` and `title` are strings, `entries` is a list of entries,
+`rcpts` is a list of recipients[^3].
+
+**\*** An entry is a list of the form:
+
+~~~~ {.commonlisp}
+(entry :id id :title title :link url)
+~~~~
+
+where `entry` is the CL symbol ENTRY; `id`, `title` and `url` are
+strings. Note that in practice, `id` and `url` often (but not always)
+match.
+
+**\*** A recipient is a string denoting a nick who will receive new
+entries when they are added to the database.
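+
+As a minimal sketch of the layout described above, the snippet below
+builds a feed db holding a single feed and reads a few fields back using
+only `assoc` and `getf`. The URLs, the nick and the `*example-feed-db*`
+variable are made up for illustration and are not part of the patch:
+
+~~~~ {.commonlisp}
+;; Illustration only: all names and values here are hypothetical.
+(defparameter *example-feed-db*
+  (list
+   ;; A feed: (url :title title :entries entries :rcpts rcpts)
+   (list "http://example.org/rss.xml"
+         :title "Example feed"
+         ;; An entry: (entry :id id :title title :link url)
+         :entries (list (list 'entry
+                              :id "http://example.org/posts/1"
+                              :title "First post"
+                              :link "http://example.org/posts/1"))
+         ;; Recipients are plain strings (nicks).
+         :rcpts (list "somenick"))))
+
+;; The url is the car of a feed, so ASSOC finds the feed by url, while
+;; the remaining fields form a property list starting at the cdr.
+(let* ((feed (assoc "http://example.org/rss.xml" *example-feed-db*
+                    :test #'string=))
+       (entry (first (getf (cdr feed) :entries))))
+  (list (getf (cdr feed) :title)
+        (getf (cdr entry) :id)
+        (getf (cdr feed) :rcpts)))
+~~~~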
+
+**II. Feed db manipulation**
+
+Functionality pertaining to the feed db is split into the following
+categories:
+
+**a\.** "Low-level" functions operating on the feed db, feeds, entries
+and recipients; examples include setting the title of a feed, adding
+entries or searching for a recipient.
+
+For example, the functions below:
+
+~~~~ {.commonlisp}
+(defun lookup-feed! (feed-id feed-db)
+ (assoc feed-id feed-db :test #'string=))
+
+(defun find-entry-in-feed! (feed entry-id)
+ (find entry-id (get-feed-entries! feed)
+ :key #'get-entry-id
+ :test #'string=))
+~~~~
+
+look up a feed in a given feed db and, respectively, an entry within a
+feed.
+
+**b\.** A macro, `with-feed-db`, providing a thread-safe context for
+feed db processing; see notes below for further details. The
+implementation is reproduced below:
+
+~~~~ {.commonlisp}
+(defmacro with-feed-db ((feed-db) bot &body body)
+ "Execute code within the thread-safe `feed-db' scope of `bot'."
+ (with-gensyms (db-mutex)
+ `(with-slots ((,feed-db feed-db) (,db-mutex db-mutex))
+ ,bot
+ (with-mutex (,db-mutex)
+ ,@body))))
+~~~~
+
+**c\.** Interface, or "high-level" methods to be called by e.g. the bot
+operator or by the IRC-facing code. These typically bear the following
+form:
+
+~~~~ {.commonlisp}
+(defmethod feedbot-... ((bot feedbot) arg1 arg2 ...)
+ ...)
+~~~~
+
+For example:
+
+~~~~ {.commonlisp}
+(defmethod feedbot-get-or-create-feed ((bot feedbot) feed-id)
+ "Get feed with id `feed-id' from the feed db of `bot'.
+
+If `feed-id' doesn't point to a feed, a new feed with that id is created
+and inserted into the feed db."
+ (with-feed-db (feed-db) bot
+ (let ((feed (lookup-feed! feed-id feed-db)))
+ (when (not feed)
+ (setq feed (list feed-id :title "" :entries nil :rcpts nil))
+ (push feed feed-db))
+ feed)))
+~~~~
+
+is self-explanatory.
+
+*Note*: feedbot operates in a concurrent environment, where multiple
+threads may access the feed db at a given time; for example, the feed
+checker and SBCL's shell. Thus, all (c) functions are implemented in
+terms of (a), and furthermore, they use (b) in order to ensure
+thread-safety. We distinguish between thread-safe and unsafe functions
+by employing the following convention:
+
+> Feed db functions whose names end in ! (also named below !-functions)
+> are thread unsafe and should be used *only* in conjunction with the
+> db-mutex or `with-feed-db`.
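+
+To illustrate the convention, here is a hedged, self-contained sketch of
+how a (c)-style method might wrap an (a)-style !-function. The names
+`toy-bot`, `with-toy-feed-db` and `toy-bot-feed-title` are hypothetical
+stand-ins rather than code from the patch, `lookup-feed!` is repeated
+from above so that the snippet runs on its own, and SBCL's `sb-thread`
+is assumed as the mutex provider:
+
+~~~~ {.commonlisp}
+;; Sketch only: TOY-BOT stands in for the real feedbot class.
+;; Assumes SBCL, since it relies on the SB-THREAD package.
+(defclass toy-bot ()
+  ((feed-db :initform nil)
+   (db-mutex :initform (sb-thread:make-mutex :name "feed-db"))))
+
+(defmacro with-toy-feed-db ((feed-db) bot &body body)
+  "Run BODY with FEED-DB bound to the db slot of BOT, holding its mutex."
+  (let ((db-mutex (gensym "DB-MUTEX")))
+    `(with-slots ((,feed-db feed-db) (,db-mutex db-mutex)) ,bot
+       (sb-thread:with-mutex (,db-mutex)
+         ,@body))))
+
+;; (a) Thread-unsafe: must only be called while the mutex is held.
+(defun lookup-feed! (feed-id feed-db)
+  (assoc feed-id feed-db :test #'string=))
+
+;; (c) Thread-safe: takes the mutex, then delegates to the !-function.
+(defmethod toy-bot-feed-title ((bot toy-bot) feed-id)
+  "Return the title of the feed with id FEED-ID, or NIL if there is none."
+  (with-toy-feed-db (feed-db) bot
+    (let ((feed (lookup-feed! feed-id feed-db)))
+      (when feed
+        (getf (cdr feed) :title)))))
+~~~~
+
+The caller of `toy-bot-feed-title` never touches the mutex directly,
+which is the whole point of keeping !-functions behind the macro.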
+
+*Note*: the feed db should also reside on a persistent medium. This
+functionality will be implemented later.
+
+**III. The feed checker**
+
+The feed checker runs on a so-called "checker thread" that periodically
+(see `*check-freq*`) runs the feed db update code. Additionally, the
+feed checker delegates new (previously unseen) entries to a so-called
+"announcer"[^4].
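+
+The "previously unseen" part can be sketched as a filter over entry ids.
+This is an assumption about the general approach, not the patch's actual
+code; `entry-id`, `unseen-entries` and `announce-stub` are hypothetical
+names, and the stub merely mimics an announcer that prints to standard
+output:
+
+~~~~ {.commonlisp}
+;; Sketch only: the names below are hypothetical, not taken from the patch.
+(defun entry-id (entry)
+  "Return the :id of ENTRY, a list of the form (entry :id id ...)."
+  (getf (cdr entry) :id))
+
+(defun unseen-entries (fetched known)
+  "Return the entries in FETCHED whose id matches no entry in KNOWN."
+  (remove-if (lambda (entry)
+               (find (entry-id entry) known
+                     :key #'entry-id :test #'string=))
+             fetched))
+
+(defun announce-stub (entries)
+  "Stand-in announcer: print each entry in ENTRIES to standard output."
+  (dolist (entry entries)
+    (format t "new entry: ~a <~a>~%"
+            (getf (cdr entry) :title)
+            (getf (cdr entry) :link))))
+~~~~
+
+A checker iteration would then amount to something like
+`(announce-stub (unseen-entries fetched known))`, followed by adding the
+new entries to the feed.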
+
+To test feedbot feed checker functionality, simply run:
+
+~~~~ {.commonlisp}
+> (defvar *feedbot*
+ (make-instance 'feedbot:feedbot))
+> (feedbot:feedbot-start-checker-thread *feedbot*)
+> (feedbot:feedbot-get-or-create-feed
+ *feedbot* "http://thetarpit.org/rss.xml")
+~~~~
+
+then sit back and enjoy the feeds as they come.
+
+[^1]: Achtung! Spoilers below:
+
+ At the moment of writing, Feedbot is supposed to comprise three
+ parts, in this order: one. a feed checker; two. a feed announcer;
+ and three. an IRC-based interface.
+
+ The disadvantage of doing it this way is that for two of the three
+ parts, I'm pushing patches downstream of [ircbot][ircbot] that don't
+ use any ircbot\*. But let's imagine for a moment that I did it the
+ other way around -- now the reader can stand up a nice IRC bot
+ implementing some commands that do what, more precisely? They call
+ empty functions? They mock [the IRC interface][manual]?
+
+ So that's how things are: at the end of part one, you have a working
+ bit that checks for new feeds and looks at them and so on and so
+ forth; at the end of part two, you also have a (small) bit that
+ looks at new content and consumes it; and finally, after part three
+ you have the whole thing.
+
+ \-\-\-
+ \* As an aside: notice how Feedbot imports the
+ [Feedparse][feedparse] code *ad litteram*. The separate V tree still
+ exists if you want to use it in your own thing, but otherwise that
+ item's been completely glued to [Feedbot][feedbot], and thus to
+ [Botworks][botworks]. This is nothing new; the same happened before with
+ [Eucrypt and MPI][eucrypt-mpi].
+
+[^2]: For the record:
+
+ ~~~~
+ $ wc -l *.lisp
+ 307 feedbot.lisp
+ 28 feedbot-utils.lisp
+ 26 package.lisp
+ 361 total
+ $ grep '^ *;' *.lisp | wc -l
+ 104
+ ~~~~
+
+ That is, comments comprise about one third of the code. This is
+ roughly similar to [other republican code][ffa-comments].
+
+[^3]: These so-called "recipients" are for now completely meaningless
+ and thus useless. So why add them? Let's explore the possibilities;
+ we could have the feed db organized as: a. a list of feeds, each
+ containing a list of entries and recipients; b. a list of
+ recipients, each with its own list of feeds, each with a list of
+ entries; and c. two separate lists, one with recipients, each
+ recipient with a list of subscriptions (feed IDs), and one with
+ feeds, each feed with a list of entries.
+
+ First, we observe that in all cases, entries are subordinated to
+ feeds -- this sounds like the correct relation, I hope that we're on
+ the same page here. Second, we observe that each of a, b and c has
+ its own trade-offs in terms of space and time usage. For example,
+ "recipients" are useless if for some reason there's only one feedbot
+ user; also, subordinating feeds to recipients leads to duplicated
+ feed checking, at least unless we do some tricks with references to
+ lists; also, separating feeds and recipients requires an efficient
+ lookup algorithm for feeds and recipients, in turn requiring a data
+ structure that's "better" than S-expressions (at least in this
+ respect), which then later will require a (more complex)
+ serialization/deserialization piece when we're saving things to the
+ disk; also, feel free to add any pros and cons to this list.
+
+ Weighing these trade-offs, I decided to use this particular structure
+ (option a above), from which this particular feed checker
+ implementation follows. Maybe a "better" one will arise at some point,
+ although experience shows "better" tends to arrive at the problem of
+ "worse" [after a while][worse-is-better].
+
+[^4]: This "so-called" announcer is supposed to notify recipients of new
+ content whenever it's posted. For now, the announcer is a stub that
+ prints new entries to standard output.
+
+[feedbot]: http://btcbase.org/log-search?q=feedbot
+[botworks]: /posts/y05/080-botworks-regrind.html
+[trilemabot]: /posts/y05/078-trilemabot-ii.html
+[ircbot]: http://trinque.org/2016/08/10/ircbot-genesis/
+[manual]: /posts/y05/081-feedbot-manual.html
+[feedparse]: /posts/y05/087-feedparse.html
+[eucrypt-mpi]: http://btcbase.org/log/2017-12-14#1751589
+[feedbot-checker.vpatch]: http://lucian.mogosanu.ro/src/botworks/v/patches/feedbot-checker.vpatch
+[feedbot-checker.vpatch.spyked.sig]: http://lucian.mogosanu.ro/src/botworks/v/seals/feedbot-checker.vpatch.spyked.sig
+[ffa-comments]: http://btcbase.org/log/2019-01-29#1890573
+[worse-is-better]: http://btcbase.org/log-search?q=%22worse+is+better%22