From: Lucian Mogosanu Date: Sat, 23 Mar 2019 12:29:02 +0000 (+0200) Subject: posts: 08a X-Git-Tag: v0.11~74 X-Git-Url: https://git.mogosanu.ro/?a=commitdiff_plain;h=a437c866e4295a4995b492b5c7888f638aa7dd69;p=thetarpit.git posts: 08a --- diff --git a/drafts/08a-feedbot-i.markdown b/drafts/08a-feedbot-i.markdown deleted file mode 100644 index a956ffe..0000000 --- a/drafts/08a-feedbot-i.markdown +++ /dev/null @@ -1,232 +0,0 @@ ---- -postid: 08a -title: Feedbot [i]: the feed db; its manipulation; and the feed checker -date: March 22, 2019 -author: Lucian Mogoșanu -tags: tech, tmsr ---- - -This post introduces the first major component of [Feedbot][feedbot]: -the feed checker, accompanied by its building blocks. Although this part -contains no IRC code[^1], Feedbot is part of the [Botworks][botworks] V -tree, resting on top of [Trilemabot][trilemabot]: - -* the [V patch][feedbot-checker.vpatch]; and -* my [seal][feedbot-checker.vpatch.spyked.sig]. - -The rest of this post discusses design and implementation details, -although most of the content here is already contained in the patch[^2]. - -**I. The feed db** - -Feedbot operation is centered around a data structure that I've so -pompously called a "feed database" (or feed db). In more detail: - -**\*** A feed db is a list of feeds. - -**\*** A feed is a list of the form: - -~~~~ {.commonlisp} -(url :title title :entries entries :rcpts rcpts) -~~~~ - -where `url` and `title` are strings, `entries` is a list of entries, -`rcpts` is a list of recipients[^3]. - -**\*** An entry is a list of the form: - -~~~~ {.commonlisp} -(entry :id id :title title :link url) -~~~~ - -where `entry` is the CL symbol ENTRY; `id`, `title` and `url` are -strings. Note that in practice, `id` and `url` often (but not always) -match. - -**\*** A recipient is a string denoting a nick who will receive new -entries when they are added to the database. - -**II. 
Feed db manipulation** - -Functionality pertaining to the feed db is split into the following -categories: - -**a\.** "Low-level" functions operating on the feed db, feeds, entries -and recipients; examples include setting the title of a feed, adding -entries or searching for a recipient. - -For example, the functions below: - -~~~~ {.commonlisp} -(defun lookup-feed! (feed-id feed-db) - (assoc feed-id feed-db :test #'string=)) - -(defun find-entry-in-feed! (feed entry-id) - (find entry-id (get-feed-entries! feed) - :key #'get-entry-id - :test #'string=)) -~~~~ - -lookup a feed in a given feed db and, respectively, an entry within a -feed. - -**b\.** A macro, `with-feed-db`, providing a thread-safe context for -feed db processing; see notes below for further details. The -implementation is reproduced below: - -~~~~ {.commonlisp} -(defmacro with-feed-db ((feed-db) bot &body body) - "Execute code within the thread-safe `feed-db' scope of `bot'." - (with-gensyms (db-mutex) - `(with-slots ((,feed-db feed-db) (,db-mutex db-mutex)) - ,bot - (with-mutex (,db-mutex) - ,@body)))) -~~~~ - -**c\.** Interface, or "high-level" methods to be called by e.g. the bot -operator or by the IRC-facing code. These typically bear the following -form: - -~~~~ {.commonlisp} -(defmethod feedbot-... ((bot feedbot) arg1 arg2 ...) - ...) -~~~~ - -For example: - -~~~~ {.commonlisp} -(defmethod feedbot-get-or-create-feed ((bot feedbot) feed-id) - "Get feed with id `feed-id' from the feed db of `bot'. - -If `feed-id' doesn't point to a feed, a new feed with that id is created -and inserted into the feed db." - (with-feed-db (feed-db) bot - (let ((feed (lookup-feed! feed-id feed-db))) - (when (not feed) - (setq feed (list feed-id :title "" :entries nil :rcpts nil)) - (push feed feed-db)) - feed))) -~~~~ - -is self-explanatory. - -*Note*: feedbot operates in a concurrent environment, where multiple -threads may access the feed db at a given time; for example, the feed -checker and SBCL's shell. 
Thus, all (c) functions are implemented in -terms of (a), and furthermore, they use (b) in order to ensure -thread-safety. We distinguish between thread-safe and unsafe functions -by employing the following convention: - -> Feed db functions whose name end in ! (also named below !-functions) -> are thread unsafe and should be used *only* in conjunction with the -> db-mutex or `with-feed-db`. - -*Note*: the feed db should also reside on a persistent medium. This -functionality will be implemented later. - -**III. The feed checker** - -The feed checker runs on a so-called "checker thread", that periodically -(see `*check-freq*`) runs the feed db update code. Additionally, the -feed checker delegates new (previously unseen) entries to a so-called -"announcer"[^4]. - -To test feedbot feed checker functionality, simply run: - -~~~~ {.commonlisp} -> (defvar *feedbot* - (make-instance 'feedbot:feedbot)) -> (feedbot:feedbot-start-checker-thread *feedbot*) -> (feedbot:feedbot-get-or-create-feed - *feedbot* "http://thetarpit.org/rss.xml") -~~~~ - -then sit back and relax. - -[^1]: Achtung! Spoilers below: - - At the moment of writing, Feedbot is supposed to comprise three - parts, in this order: one. a feed checker; two. a feed announcer; - and three. an IRC-based interface. - - The disadvantage of doing it this way is that for two of the three - parts, I'm pushing patches downstream of [ircbot][ircbot] that don't - use any ircbot\*. But let's imagine for a moment that I did it the - other way around -- now the reader can stand up a nice IRC bot - implementing some commands that do what, more precisely? They call - empty functions? They mock [the IRC interface][manual]? - - So that's how things are: at the end of part one, you have a working - bit that checks for new feeds and looks at them and so on and so - forth; at the end of part two, you also have a (small) bit that - looks at new content and consumes it; and finally, after part three - you have the whole thing. 
- - \-\-\- - \* As an aside: notice how Feedbot imports the - [Feedparse][feedparse] code *ad litteram*? The separate V tree still - exists if you want to use it in your own thing, but otherwise that - item's been completely glued to [Feedbot][feedbot], and thus to - [Botworks][botworks]. This is no news, the same happened before with - [Eucrypt and MPI][eucrypt-mpi]. - -[^2]: For the record: - - ~~~~ - $ wc -l *.lisp - 307 feedbot.lisp - 28 feedbot-utils.lisp - 26 package.lisp - 361 total - $ grep '^ *;' *.lisp | wc -l - 104 - ~~~~ - - That is, comments comprise about one third of the code. This is - roughly similar to [other republican code][ffa-comments] - -[^3]: These so-called "recipients" are for now completely meaningless - and thus useless. So why add them? Let's explore the possibilities; - we could have the feed db organized as: a. a list of feeds, each - containing a list of entries and recipients; b. a list of - recipients, each with its own list of feeds, each with a list of - entries; and c. two separate lists, one with recipients, each - recipient with a list of subscriptions (feed IDs), and one with - feeds, each feed with a list of entries. - - First, we observe that in all cases, entries are subordinated to - feeds -- this sounds like the correct relation, I hope that we're on - the same page here. 
Second, we observe that each of a, b and c has - its own trade-offs in terms of space and time: for example, - "recipients" are useless if for some reason there's only one feedbot - user; also, subordinating feeds to recipients leads to duplicated - feed checking, at least unless we do some tricks with references to - lists; also, separating feeds and recipients requires an efficient - lookup algorithm for feeds and recipients, in turn requiring a data - structure that's "better" than S-expressions (at least in this - respect), which then later will require a (more complex) - serialization/deserialization piece when we're saving things to the - disk; also, feel free to add any pros and cons to this list. - - So stemming from these trade-offs, I made the decision to use this - particular structure, from which stems this particular feed checker - implementation. Maybe a "better" one will arise at some point, - although experience shows "better" tends to arrive at the problem of - "worse" [after a while][worse-is-better]. - -[^4]: This "so-called" announcer is supposed to notify recipients of new - content whenever it's posted. For now, the announcer is a stub that - prints new entries to standard output. 
- -[feedbot]: http://btcbase.org/log-search?q=feedbot -[botworks]: /posts/y05/080-botworks-regrind.html -[trilemabot]: /posts/y05/078-trilemabot-ii.html -[ircbot]: http://trinque.org/2016/08/10/ircbot-genesis/ -[manual]: /posts/y05/081-feedbot-manual.html -[feedparse]: /posts/y05/087-feedparse.html -[eucrypt-mpi]: http://btcbase.org/log/2017-12-14#1751589 -[feedbot-checker.vpatch]: TODO -[feedbot-checker.vpatch.spyked.sig]: TODO -[ffa-comments]: http://btcbase.org/log/2019-01-29#1890573 -[worse-is-better]: http://btcbase.org/log-search?q=%22worse+is+better%22 diff --git a/posts/y05/08a-feedbot-i.markdown b/posts/y05/08a-feedbot-i.markdown new file mode 100644 index 0000000..e66fc03 --- /dev/null +++ b/posts/y05/08a-feedbot-i.markdown @@ -0,0 +1,232 @@ +--- +postid: 08a +title: Feedbot [i]: the feed db; its manipulation; and the feed checker +date: March 23, 2019 +author: Lucian Mogoșanu +tags: tech, tmsr +--- + +This post introduces the first major component of [Feedbot][feedbot]: +the feed checker, accompanied by its building blocks. Although this part +contains no actual IRC code[^1], it rests on top of the +[Botworks][botworks] V tree, more precisely [Trilemabot][trilemabot]: + +* the [V patch][feedbot-checker.vpatch]; and +* my [seal][feedbot-checker.vpatch.spyked.sig]. + +The rest of this post discusses design and implementation details, +although most of the content here is already contained in the patch[^2]. + +**I. The feed db** + +Feedbot operation is centered around a data structure that I've so +pompously called a "feed database" (or feed db). In more detail: + +**\*** A feed db is a list of feeds. + +**\*** A feed is a list of the form: + +~~~~ {.commonlisp} +(url :title title :entries entries :rcpts rcpts) +~~~~ + +where `url` and `title` are strings, `entries` is a list of entries, +`rcpts` is a list of recipients[^3]. 
+ +**\*** An entry is a list of the form: + +~~~~ {.commonlisp} +(entry :id id :title title :link url) +~~~~ + +where `entry` is the CL symbol ENTRY; `id`, `title` and `url` are +strings. Note that in practice, `id` and `url` often (but not always) +match. + +**\*** A recipient is a string denoting a nick who will receive new +entries when they are added to the database. + +**II. Feed db manipulation** + +Functionality pertaining to the feed db is split into the following +categories: + +**a\.** "Low-level" functions operating on the feed db, feeds, entries +and recipients; examples include setting the title of a feed, adding +entries or searching for a recipient. + +For example, the functions below: + +~~~~ {.commonlisp} +(defun lookup-feed! (feed-id feed-db) + (assoc feed-id feed-db :test #'string=)) + +(defun find-entry-in-feed! (feed entry-id) + (find entry-id (get-feed-entries! feed) + :key #'get-entry-id + :test #'string=)) +~~~~ + +lookup a feed in a given feed db and, respectively, an entry within a +feed. + +**b\.** A macro, `with-feed-db`, providing a thread-safe context for +feed db processing; see notes below for further details. The +implementation is reproduced below: + +~~~~ {.commonlisp} +(defmacro with-feed-db ((feed-db) bot &body body) + "Execute code within the thread-safe `feed-db' scope of `bot'." + (with-gensyms (db-mutex) + `(with-slots ((,feed-db feed-db) (,db-mutex db-mutex)) + ,bot + (with-mutex (,db-mutex) + ,@body)))) +~~~~ + +**c\.** Interface, or "high-level" methods to be called by e.g. the bot +operator or by the IRC-facing code. These typically bear the following +form: + +~~~~ {.commonlisp} +(defmethod feedbot-... ((bot feedbot) arg1 arg2 ...) + ...) +~~~~ + +For example: + +~~~~ {.commonlisp} +(defmethod feedbot-get-or-create-feed ((bot feedbot) feed-id) + "Get feed with id `feed-id' from the feed db of `bot'. + +If `feed-id' doesn't point to a feed, a new feed with that id is created +and inserted into the feed db." 
+ (with-feed-db (feed-db) bot + (let ((feed (lookup-feed! feed-id feed-db))) + (when (not feed) + (setq feed (list feed-id :title "" :entries nil :rcpts nil)) + (push feed feed-db)) + feed))) +~~~~ + +is self-explanatory. + +*Note*: feedbot operates in a concurrent environment, where multiple +threads may access the feed db at a given time; for example, the feed +checker and SBCL's shell. Thus, all (c) functions are implemented in +terms of (a), and furthermore, they use (b) in order to ensure +thread-safety. We distinguish between thread-safe and unsafe functions +by employing the following convention: + +> Feed db functions whose name end in ! (also named below !-functions) +> are thread unsafe and should be used *only* in conjunction with the +> db-mutex or `with-feed-db`. + +*Note*: the feed db should also reside on a persistent medium. This +functionality will be implemented later. + +**III. The feed checker** + +The feed checker runs on a so-called "checker thread", that periodically +(see `*check-freq*`) runs the feed db update code. Additionally, the +feed checker delegates new (previously unseen) entries to a so-called +"announcer"[^4]. + +To test feedbot feed checker functionality, simply run: + +~~~~ {.commonlisp} +> (defvar *feedbot* + (make-instance 'feedbot:feedbot)) +> (feedbot:feedbot-start-checker-thread *feedbot*) +> (feedbot:feedbot-get-or-create-feed + *feedbot* "http://thetarpit.org/rss.xml") +~~~~ + +then sit back and enjoy the feeds as they come. + +[^1]: Achtung! Spoilers below: + + At the moment of writing, Feedbot is supposed to comprise three + parts, in this order: one. a feed checker; two. a feed announcer; + and three. an IRC-based interface. + + The disadvantage of doing it this way is that for two of the three + parts, I'm pushing patches downstream of [ircbot][ircbot] that don't + use any ircbot\*. 
But let's imagine for a moment that I did it the + other way around -- now the reader can stand up a nice IRC bot + implementing some commands that do what, more precisely? They call + empty functions? They mock [the IRC interface][manual]? + + So that's how things are: at the end of part one, you have a working + bit that checks for new feeds and looks at them and so on and so + forth; at the end of part two, you also have a (small) bit that + looks at new content and consumes it; and finally, after part three + you have the whole thing. + + \-\-\- + \* As an aside: notice how Feedbot imports the + [Feedparse][feedparse] code *ad litteram*. The separate V tree still + exists if you want to use it in your own thing, but otherwise that + item's been completely glued to [Feedbot][feedbot], and thus to + [Botworks][botworks]. This is no news, the same happened before with + [Eucrypt and MPI][eucrypt-mpi]. + +[^2]: For the record: + + ~~~~ + $ wc -l *.lisp + 307 feedbot.lisp + 28 feedbot-utils.lisp + 26 package.lisp + 361 total + $ grep '^ *;' *.lisp | wc -l + 104 + ~~~~ + + That is, comments comprise about one third of the code. This is + roughly similar to [other republican code][ffa-comments] + +[^3]: These so-called "recipients" are for now completely meaningless + and thus useless. So why add them? Let's explore the possibilities; + we could have the feed db organized as: a. a list of feeds, each + containing a list of entries and recipients; b. a list of + recipients, each with its own list of feeds, each with a list of + entries; and c. two separate lists, one with recipients, each + recipient with a list of subscriptions (feed IDs), and one with + feeds, each feed with a list of entries. + + First, we observe that in all cases, entries are subordinated to + feeds -- this sounds like the correct relation, I hope that we're on + the same page here. Second, we observe that each of a, b and c has + its own trade-offs in terms of space and time usage. 
For example, + "recipients" are useless if for some reason there's only one feedbot + user; also, subordinating feeds to recipients leads to duplicated + feed checking, at least unless we do some tricks with references to + lists; also, separating feeds and recipients requires an efficient + lookup algorithm for feeds and recipients, in turn requiring a data + structure that's "better" than S-expressions (at least in this + respect), which then later will require a (more complex) + serialization/deserialization piece when we're saving things to the + disk; also, feel free to add any pros and cons to this list. + + So stemming from these trade-offs, I made the decision to use this + particular structure, from which stems this particular feed checker + implementation. Maybe a "better" one will arise at some point, + although experience shows "better" tends to arrive at the problem of + "worse" [after a while][worse-is-better]. + +[^4]: This "so-called" announcer is supposed to notify recipients of new + content whenever it's posted. For now, the announcer is a stub that + prints new entries to standard output. + +[feedbot]: http://btcbase.org/log-search?q=feedbot +[botworks]: /posts/y05/080-botworks-regrind.html +[trilemabot]: /posts/y05/078-trilemabot-ii.html +[ircbot]: http://trinque.org/2016/08/10/ircbot-genesis/ +[manual]: /posts/y05/081-feedbot-manual.html +[feedparse]: /posts/y05/087-feedparse.html +[eucrypt-mpi]: http://btcbase.org/log/2017-12-14#1751589 +[feedbot-checker.vpatch]: http://lucian.mogosanu.ro/src/botworks/v/patches/feedbot-checker.vpatch +[feedbot-checker.vpatch.spyked.sig]: http://lucian.mogosanu.ro/src/botworks/v/seals/feedbot-checker.vpatch.spyked.sig +[ffa-comments]: http://btcbase.org/log/2019-01-29#1890573 +[worse-is-better]: http://btcbase.org/log-search?q=%22worse+is+better%22
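
Reviewer's note on section II (a) of the post in this patch: the two low-level functions quoted there depend on `get-feed-entries!` and `get-entry-id`, which the post does not reproduce. Below is a minimal sketch of how they might look, assuming only the list layouts given in section I, that is, a feed is `(url . plist)` and an entry is `(entry . plist)`. The two accessor bodies, the `*example-db*` variable and the example.org data are my assumptions for illustration; the actual definitions live in feedbot.lisp and may differ.

~~~~ {.commonlisp}
;; ASSUMPTION: accessor bodies are guessed from the documented list
;; layout; they are not copied from feedbot.lisp.
(defun get-feed-entries! (feed)
  "Return the entries of `feed', a list (url :title .. :entries .. :rcpts ..)."
  (getf (cdr feed) :entries))

(defun get-entry-id (entry)
  "Return the id of `entry', a list (entry :id .. :title .. :link ..)."
  (getf (cdr entry) :id))

;; The two functions below are reproduced from the post.
(defun lookup-feed! (feed-id feed-db)
  (assoc feed-id feed-db :test #'string=))

(defun find-entry-in-feed! (feed entry-id)
  (find entry-id (get-feed-entries! feed)
        :key #'get-entry-id
        :test #'string=))

;; Hypothetical hand-built feed db: one feed, one entry, one recipient.
(defparameter *example-db*
  (list (list "http://example.org/rss.xml"
              :title "Example"
              :entries (list (list 'entry
                                   :id "http://example.org/1"
                                   :title "First post"
                                   :link "http://example.org/1"))
              :rcpts (list "somenick"))))
~~~~

With these in place, `(lookup-feed! "http://example.org/rss.xml" *example-db*)` returns the whole feed list, and passing that feed to `find-entry-in-feed!` together with `"http://example.org/1"` returns the matching entry list. Per the post's convention, the !-functions take no lock, so in the real bot calls like these would have to run inside `with-feed-db`.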