From: Lucian Mogosanu
Date: Tue, 19 Mar 2019 19:48:24 +0000 (+0200)
Subject: drafts: 08a
X-Git-Tag: v0.11~78
X-Git-Url: https://git.mogosanu.ro/?a=commitdiff_plain;h=58af9f185f376286b675482e3e502dd6c261b704;p=thetarpit.git

drafts: 08a
---

diff --git a/drafts/08a-feedbot-i.markdown b/drafts/08a-feedbot-i.markdown
new file mode 100644
index 0000000..822faeb
--- /dev/null
+++ b/drafts/08a-feedbot-i.markdown
@@ -0,0 +1,216 @@
+---
+postid: 08a
+title: Feedbot [i]: the feed db; its manipulation; and the feed checker
+date: March 22, 2019
+author: Lucian Mogoșanu
+tags: tech, tmsr
+---
+
+This post introduces the first major component of [Feedbot][feedbot]:
+the feed checker, accompanied by its building blocks. Although this part
+contains no IRC code[^1], Feedbot is part of the [Botworks][botworks] V
+tree, resting on top of [Trilemabot][trilemabot]:
+
+* the [V patch][feedbot-checker.vpatch]; and
+* my [seal][feedbot-checker.vpatch.spyked.sig].
+
+The rest of this post discusses design and implementation details,
+although most of the content here is already contained in the patch[^2].
+
+**I. The feed db**
+
+Feedbot operation is centered around a data structure that I've so
+pompously called a "feed database" (or feed db). In more detail:
+
+**\*** A feed db is a list of feeds.
+
+**\*** A feed is a list of the form:
+
+~~~~ {.commonlisp}
+(url :title title :entries entries :rcpts rcpts)
+~~~~
+
+where `url` and `title` are strings, `entries` is a list of entries,
+and `rcpts` is a list of recipients[^3].
+
+**\*** An entry is a list of the form:
+
+~~~~ {.commonlisp}
+(entry :id id :title title :link url)
+~~~~
+
+where `entry` is the CL symbol ENTRY; `id`, `title` and `url` are
+strings. Note that in practice, `id` and `url` often (but not always)
+match.
+
+**\*** A recipient is a string denoting a nick who will receive new
+entries when they are added to the database.
+
+**II. Feed db manipulation**
+
+Functionality pertaining to the feed db is split into the following
+categories:
+
+**a\.** "Low-level" functions operating on the feed db, feeds, entries
+and recipients; examples include setting the title of a feed, adding
+entries or searching for a recipient.
+
+For example, the functions below:
+
+~~~~ {.commonlisp}
+(defun lookup-feed! (feed-id feed-db)
+  (assoc feed-id feed-db :test #'string=))
+
+(defun find-entry-in-feed! (feed entry-id)
+  (find entry-id (get-feed-entries! feed)
+        :key #'get-entry-id
+        :test #'string=))
+~~~~
+
+look up a feed in a given feed db and, respectively, an entry within a
+feed.
+
+**b\.** A macro, `with-feed-db`, providing a thread-safe context for
+feed db processing; see notes below for further details. The
+implementation is reproduced below:
+
+~~~~ {.commonlisp}
+(defmacro with-feed-db ((feed-db) bot &body body)
+  "Execute code within the thread-safe `feed-db' scope of `bot'."
+  (with-gensyms (db-mutex)
+    `(with-slots ((,feed-db feed-db) (,db-mutex db-mutex))
+         ,bot
+       (with-mutex (,db-mutex)
+         ,@body))))
+~~~~
+
+**c\.** Interface, or "high-level" methods to be called by e.g. the bot
+operator or by the IRC-facing code. These typically bear the following
+form:
+
+~~~~ {.commonlisp}
+(defmethod feedbot-... ((bot feedbot) arg1 arg2 ...)
+  ...)
+~~~~
+
+For example:
+
+~~~~ {.commonlisp}
+(defmethod feedbot-get-or-create-feed ((bot feedbot) feed-id)
+  "Get feed with id `feed-id' from the feed db of `bot'.
+
+If `feed-id' doesn't point to a feed, a new feed with that id is created
+and inserted into the feed db."
+  (with-feed-db (feed-db) bot
+    (let ((feed (lookup-feed! feed-id feed-db)))
+      (when (not feed)
+        (setq feed (list feed-id :title "" :entries nil :rcpts nil))
+        (push feed feed-db))
+      feed)))
+~~~~
+
+is self-explanatory.
+
+*Note*: feedbot operates in a concurrent environment, where multiple
+threads may access the feed db at a given time; for example, the feed
+checker and SBCL's shell. Thus, all (c) functions are implemented in
+terms of (a), and furthermore, they use (b) in order to ensure
+thread-safety. We distinguish between thread-safe and unsafe functions
+by employing the following convention:
+
+> Feed db functions whose names end in ! (also named below !-functions)
+> are thread-unsafe and should be used *only* in conjunction with the
+> db-mutex or `with-feed-db`.
+
+*Note*: the feed db should also reside on a persistent medium. This
+functionality will be implemented later.
+
+**III. The feed checker**
+
+The feed checker runs on a so-called "checker thread", which periodically
+(see `*check-freq*`) runs the feed db update code. Additionally, the
+feed checker delegates new (previously unseen) entries to a so-called
+"announcer"[^4].
+
+To test feedbot feed checker functionality, simply run:
+
+~~~~ {.commonlisp}
+> (defvar *feedbot*
+    (make-instance 'feedbot:feedbot))
+> (feedbot:feedbot-start-checker-thread *feedbot*)
+~~~~
+
+[^1]: Achtung! Spoilers below:
+
+    At the moment of writing, Feedbot is supposed to comprise three
+    parts, in this order: one. a feed checker; two. a feed announcer;
+    and three. an IRC-based interface.
+
+    The disadvantage of doing it this way is that for two of the three
+    parts, I'm pushing patches downstream of [ircbot][ircbot] that don't
+    use any ircbot. But let's imagine for a moment that I did it the
+    other way around -- now the reader can stand up a nice IRC bot
+    implementing some commands that do what, more precisely? They call
+    empty functions? They mock [the IRC interface][manual]?
+
+    So that's how things are: at the end of part one, you have a working
+    bit that checks for new feeds and looks at them and so on and so
+    forth; at the end of part two, you also have a (small) bit that
+    looks at new content and consumes it; and finally, after part three
+    you have the whole thing.
+
+[^2]: For the record:
+
+    ~~~~
+    $ wc -l *.lisp
+      294 feedbot.lisp
+       28 feedbot-utils.lisp
+       26 package.lisp
+      348 total
+    $ grep '^ *;' *.lisp | wc -l
+    101
+    ~~~~
+
+    Translating this: about one third of the code is made up of
+    comments.
+
+[^3]: These so-called "recipients" are for now completely meaningless
+    and thus useless. So why add them? Let's explore the possibilities;
+    we could have the feed db organized as: a. a list of feeds, each
+    containing a list of entries and recipients; b. a list of
+    recipients, each with its own list of feeds, each with a list of
+    entries; and c. two separate lists, one with recipients, each
+    recipient with a list of subscriptions (feed IDs), and one with
+    feeds, each feed with a list of entries.
+
+    First, we observe that in all cases, entries are subordinated to
+    feeds -- this sounds like the correct relation, I hope that we're on
+    the same page here. Second, we observe that each of a, b and c has
+    its own trade-offs in terms of space and time: for example,
+    "recipients" are useless if for some reason there's only one feedbot
+    user; also, subordinating feeds to recipients leads to duplicated
+    feed checking, at least unless we do some tricks with references to
+    lists; also, separating feeds and recipients requires an efficient
+    lookup algorithm for feeds and recipients, in turn requiring a data
+    structure that's "better" than S-expressions (at least in this
+    respect), which then later will require a (more complex)
+    serialization/deserialization piece when we're saving things to the
+    disk; also, feel free to add any pros and cons to this list.
+
+    So stemming from these trade-offs, I made the decision to use this
+    particular structure, from which stems this particular feed checker
+    implementation. Maybe a "better" one will arise at some point,
+    although experience shows "better" tends to arrive at the problem of
+    "worse" [after a while][worse-is-better].
+
+[^4]: For now, the announcer is a stub that prints to standard
+    output whenever new entries are posted.
+
+[feedbot]: http://btcbase.org/log-search?q=feedbot
+[botworks]: /posts/y05/080-botworks-regrind.html
+[trilemabot]: /posts/y05/078-trilemabot-ii.html
+[ircbot]: http://trinque.org/2016/08/10/ircbot-genesis/
+[manual]: /posts/y05/081-feedbot-manual.html
+[feedbot-checker.vpatch]: TODO
+[feedbot-checker.vpatch.spyked.sig]: TODO
+[worse-is-better]: http://btcbase.org/log-search?q=%22worse+is+better%22