From: Lucian Mogosanu
Date: Sun, 14 Feb 2016 17:17:17 +0000 (+0200)
Subject: posts: 042, 043
X-Git-Tag: v0.5~7
X-Git-Url: https://git.mogosanu.ro/?a=commitdiff_plain;h=7c26ce4aa05935b872195883b4063b140eac8f23;p=thetarpit.git

posts: 042, 043
---

diff --git a/posts/y02/042-category-theory-software-engineering.markdown b/posts/y02/042-category-theory-software-engineering.markdown
new file mode 100644
index 0000000..0f475a4
--- /dev/null
+++ b/posts/y02/042-category-theory-software-engineering.markdown
@@ -0,0 +1,310 @@
+---
+postid: 042
+title: Category theory and its application in software engineering
+date: January 30, 2016
+author: Lucian Mogoșanu
+tags: math, tech
+---
+
+I have touched on the subject of category theory in the past, motivated
+partly by my enthusiasm for working with a mathematical framework that is
+so simple yet so powerful, and partly by the usefulness of categorical
+models in software. This essay draws from previous posts
+[on the old blog][bricks1] and from my previous experience with the
+subject, and I am posting it hoping that it will represent a starting
+point for other interesting writings.
+
+I am fairly sure that most of the work in this post is in no way
+original; that is, there are other publications where categorical
+approaches to software modeling, and matters pertaining to category
+theory in general, are already explained, and most probably better than
+they are here. For example, Steve Awodey has an excellent book providing
+an in-depth mathematical exploration of category theory[^1]; Robert
+Harper discusses the (major) impact of categories on type theory[^2],
+computation and computer programs; Brent Yorgey gives a really good
+overview of the relation between categories and Haskell type
+classes[^3]. There is much more material on the web and in books, and
+while you're not required to peruse it in order to read this, I
+certainly encourage you to have a look.
+
+## Category theory: introduction, definitions
+
+While mathematics is an exact "science"[^4], its methodology differs
+from that of, say, physics or biology, which have fundamentally
+different objectives, although the latter very often make use of
+mathematical means to make sense of the world. Instead, it'd be fairer
+to find the origins of mathematics in philosophy, which discusses
+concepts, or ideas, or essences, rather than objective experience.
+
+For the last century or so all mathematicians and philosophers have been
+in agreement on the fact that mathematics must have a philosophical and
+logical basis. For quite a long time, that basis was, and to some degree
+still is, set theory; the limitations of naïve set theory[^5] have been
+thoroughly explored in the 20th century and the need for a "more
+complete" theory of mathematics was and is still felt by
+mathematicians. Even though nowadays we prefer using computers to solve
+problems requiring mathematics, this has nothing to do with computers
+themselves, although it has everything to do with the theory of
+computation.
+
+Category theory was for a while believed to be this new, previously
+missing, foundation of mathematics. This doesn't seem to be the
+consensus among mathematicians anymore, but despite that, categories
+still play an important role in defining the new framework[^6]. Also
+note that in Harper's Holy Trinity, the categorical approach defines the
+so-called "universe of reasoning" in terms of mappings and structures, a
+view that is very much in sync with that of software architecture and
+software engineering.
+
+What is then a category? According to the definition, any category
+necessarily consists of the following three: *objects*, *morphisms* (or
+*arrows*) and a *composition law* bearing well-defined properties.
+
+Intuitively, any mathematical object could constitute an **object** in a
+category. Category theory classes often provide sets as the most
+intuitive example of objects; that is, any set is an object in the
+category of sets. Note that the categorical view doesn't necessarily
+care about how an object is *defined*, but rather about its properties
+in relation to the given category's arrows and the overall category's
+structure. Formally, given a category $\mathcal{C}$, we can denote its
+set of objects as $\text{Ob}(\mathcal{C})$.
+
+Also intuitively, any mapping between two objects could constitute an
+**arrow** in a category. The canonical example here is represented by
+functions, i.e. mappings between sets, but many other binary relations
+fit this description. An interesting example is that of
+[partially-ordered sets][posets]. Formally, for a given category
+$\mathcal{C}$ and two objects $A, B \in \text{Ob}({\mathcal{C}})$,
+$\text{Hom}_{\mathcal{C}}(A, B)$ denotes the set of arrows from $A$ to
+$B$; however, the function notation $f : A \rightarrow B$ is
+also often used.
+
+Finally, **composition** is denoted using the "$\circ$" operator or
+juxtaposition, and it represents a binary operation on two arrows in a
+category. Intuitively, one may see composition similarly to function
+composition: given a category $\mathcal{C}$, three arbitrary objects $A,
+B, C \in \text{Ob}(\mathcal{C})$ and two arrows $f \in
+\text{Hom}_{\mathcal{C}}(A, B)$ and $g \in \text{Hom}_{\mathcal{C}}(B,
+C)$, there exists an arrow $h \in \text{Hom}_{\mathcal{C}}(A, C)$,
+where $h \equiv g \circ f$. A good intuition is that the "path" from $A$
+to $C$ could be represented as another arrow in $\mathcal{C}$.
+
+Composition is *associative*; that is, given $f : A \rightarrow B$, $g :
+B \rightarrow C$ and $h : C \rightarrow D$, then:
+
+$(h \circ g) \circ f \equiv h \circ (g \circ f)$
+
+Intuitively, this tells us that composition "paths" are unique and that
+the way in which successive compositions are grouped doesn't matter.
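
As a rough illustration of composition and associativity in **Set**, here is a minimal Python sketch. The helper `compose` and the three arrows `f`, `g` and `h` are hypothetical names invented for this example, not part of any library, and a spot check on a finite sample only illustrates the law; it does not prove it.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(g: Callable[[B], C], f: Callable[[A], B]) -> Callable[[A], C]:
    """Return the arrow g ∘ f: apply f first, then g."""
    return lambda x: g(f(x))


# Three hypothetical arrows in Set, chosen only for illustration:
# f : int -> int, g : int -> str, h : str -> int
def f(n: int) -> int:
    return n + 1


def g(n: int) -> str:
    return str(n) * 2


def h(s: str) -> int:
    return len(s)


left = compose(compose(h, g), f)   # (h ∘ g) ∘ f
right = compose(h, compose(g, f))  # h ∘ (g ∘ f)

# Spot check on a finite sample: this illustrates the law, it does not prove it.
assert all(left(n) == right(n) for n in range(-50, 50))
```

Note that only the grouping is interchangeable here: the arrows still have to be chained so that the target of one matches the source of the next.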
+
+Additionally, every object has an associated *identity* arrow; $\forall
+A \in \text{Ob}(\mathcal{C})$, then:
+
+$\exists 1_{A} \in \text{Hom}_{\mathcal{C}}(A, A)$
+
+which acts as a neutral element for composition. That is, $\forall A, B \in
+\text{Ob}(\mathcal{C})$, $\forall f : A \rightarrow B$,
+
+$1_{B} \circ f = f \circ 1_{A} = f$.
+
+These are all the elements defining a category. Intuitively, they
+naturally apply to sets and functions, giving rise to the category of
+sets, denoted **Set**: all sets are objects and all functions are
+arrows; functions may be composed associatively and every set has an
+identity function.
+
+There are other examples of categories in the world of mathematics and
+computer science, which I advise you to explore on your own. The
+concepts of *functors* and *natural transformations* are also
+fundamental to category theory, but I will skip them for now due to lack
+of space. I will instead leave the remainder of this essay to a more
+interesting example and attempt to model version control systems as
+categories. This, to my knowledge, provides a new perspective on the
+subject, so I'm hoping it will prove to be interesting and maybe even a
+bit challenging.
+
+## Example: The Git category
+
+Those of you who are coming from software engineering should be familiar
+with version control systems (VCS). VCS have been devised as
+collaborative tools between programmers who want to share code and have
+a means to keep track of changes in the code base of some particular
+piece of software. They remain crucial to software development, although
+nowadays technical people are using them to maintain all sorts of other,
+usually text-based projects such as papers or web sites. The popularity
+of [GitHub][github] has also drawn less technical people to this world
+of programming, so everyone and their dog can keep a public project
+nowadays.
+
+One particular kind of version control system is the distributed version
+control system (DVCS). All VCS maintain a *repository* where code is
+stored and where the entire history of a project is maintained as a set
+of *commits*. In particular, DVCS assume that every contributor to a
+project has their own copy of the repository offline, and they can keep
+their changes in sync with a remote repository by *pushing* their local
+copy. We're not particularly interested in this aspect at the moment,
+but it's interesting to note that our categorical model should also
+apply to distributed systems.
+
+Let's take the Linux kernel as an example: Linux is kept under version
+control using [Git][git]. It has multiple branches and forks (remote
+copies of a repository) and the code base of the kernel changes as new
+commits are added to the remote repository. The code is therefore in a
+particular **state** at a given point in time and its state changes with
+each commit, usually by applying a patch, or a **diff**, which holds as
+information the "difference", in lines of code (LOC) added or deleted,
+between the old state and the new one. So far, so good.
+
+Given that there are many possible modifications that could arise from a
+given state, the code might diverge into multiple **branches** which
+will later need to be **merged** or **rebased**. I won't go into detail
+regarding these concepts, but they should nevertheless prove to be
+interesting from a categorical point of view. For now we assume that the
+repository goes through a list (as opposed to a graph) of states as it
+changes, each change, or set of changes, being marked by a diff.
+
+Intuitively, it should be fairly obvious that repository states can be
+viewed as objects in a category: assuming for example that the commit
+hashes in a Git repository are unique[^7], each hash marks the
+identifier of a "version" of the code in that repository. If we wanted
+to prove an isomorphism between code revisions and mathematical sets, we
+would intuitively see each revision as a set comprising arbitrary
+strings, i.e. the actual code.
+
+Also intuitively enough, we could look at commit diffs in the same way
+we look at a categorical arrow, each diff providing a mapping between
+two states in the same way a function provides a mapping between two
+sets. For example, in Git, this difference is provided in terms of lines
+added and removed from a certain code base[^8].
+
+This representation gives rise to a small complication. In practice
+there is usually more than one way to go from one revision to
+another. Given for example a certain code base upon which various
+modifications have been made, the developer may choose to either create
+a big commit containing all the changes, or various smaller commits,
+each comprising a unit of their work[^9]. For the sake of making our
+model simpler, we can define a "minimal commit" unit, represented by the
+removal or addition of a certain line in a code base.
+
+We also note that commit diffs are composable most of the
+time[^10]. Given two successive commits, one may represent them as a
+single commit, e.g. by [squashing][git-squash] them in Git, or by simply
+applying git-diff between two commit hashes. This is fortunate for us,
+as it allows us to represent a possible commit as a chain of
+compositions of multiple "minimal commits". The possible compositions
+are conceptually very similar to a [Hasse diagram][hasse-diagram],
+which, interestingly enough, provides an analogy between commits and
+posets.
+
+Finally, we can look at the empty diff, i.e. the diff with no additions
+and no removals, as the canonical representation of an identity
+arrow. Git doesn't allow empty commits by default, given that the newly
+generated repository state would be (needlessly) identical to the old
+one, but we can model them anyway, as we know for sure that a git-diff
+between an arbitrary commit hash and itself will always be empty.
+
+From all the above emerges the Git category. The usefulness of this
+representation is a whole different problem, but I am guessing that
+various operations, e.g. merging, rebasing, defining submodules or other
+useful operations that haven't yet been designed into state of the art
+DVCS, can be represented as monadic actions. This of course would
+involve answering deeper questions, such as what is an endofunctor in
+the Git category, but for the sake of brevity we will stop this train of
+thought here.
+
+## Exercise: The Blockchain category, analogy with DVCS
+
+The [blockchain][blockchain] is a database design coming from
+Bitcoin[^11]. Although the idea was conceived specifically for implementing
+a new form of [representing money][infrastructure-iii], its uses may
+theoretically go [beyond that][infrastructure-iv], into other
+distributed systems and applications.
+
+Simply put, the blockchain is a distributed chain of transactions. It is
+distributed in the sense that all the participants, e.g. in the Bitcoin
+system, should hold a copy of it. It contains transactions, that is,
+statements that a certain piece of information, e.g. money in Bitcoin's
+case, is transferred from one participant to the other, in the broad
+sense that a "participant" is the same thing as an
+account. Transactions, and more specifically parent transactions, are
+identified by their hashes.
+
+There is an immediate analogy between VCS and blockchains. The
+categorical likeness of the two follows from that directly: in both
+cases, system states are objects and transitions between states are
+arrows; in both cases, arrow composition is representable and both allow
+the existence of a conceptual identity transaction. This shows that the
+architectural differences between the two are very few.
+
+The design and implementation differences are in the
+details. Transactions are inserted in the blockchain by a consensus
+protocol; in Git, the policy for insertion is determined by the
+computing systems where the bare repositories are stored. Git
+transactions are independent of their content, containing anything from
+source code to binary data; blockchain transactions have a more
+restrictive format, depending on their application.
+
+In theory one could generalize databases[^12] using categories. These
+examples show that category theory is or could be, among other
+mathematical abstractions, very useful for defining software both
+architecturally and at the implementation level. Given that software
+developers are faced with the pain of building robust and/or resilient
+systems in a context where software verification and specification
+don't scale, such abstractions are (arguably) needed now more than
+ever.
+
+[^1]: Awodey, Steve. Category theory. Vol. 49. Oxford University Press,
+    2006.
+
+[^2]: [The Holy Trinity][trinitarianism]
+
+[^3]: [Typeclassopedia][typeclassopedia]
+
+[^4]: In the broadest sense of the word "science", that coming from its
+Latin root, where its meaning overlaps with that of "knowledge".
+
+[^5]: [Russell's paradox][russell], for example.
+
+[^6]: Univalent Foundations Program. [Homotopy Type Theory: Univalent
+Foundations of Mathematics][hott]. Univalent Foundations, 2013.
+
+[^7]: Which, by the way, they aren't. Fortunately the basic properties
+of the SHA-1 hash make collisions [highly improbable][sha-1-git], and in
+theory one could devise a (D)VCS commit addressing scheme that
+completely avoids this problem.
+
+[^8]: I am deliberately avoiding a view of repositories as collections of
+files, as this would make our definition a lot more complicated.
+
+[^9]: This is not an easy problem, as seen in [Commit Often, Perfect Later,
+Publish Once][git-best-practices].
+
+[^10]: There is an interesting mention to be made here regarding merge
+conflicts. In mathematical terms, this only tells us that the "minimal
+diff" doesn't provide a full mesh of mappings between repository states.
+
+[^11]: Nakamoto, Satoshi. "[Bitcoin: A peer-to-peer electronic cash
+system.][bitcoin]" Consulted 1.2012 (2008): 28.
+
+[^12]: Transactions are of particular interest to us in this post, but other
+aspects such as relational algebra could be seen as a particular case of
+categories. See "[Category Theory as a Unifying Database Formalism][database]"
+for more details.
+
+[bricks1]: http://lucian.mogosanu.ro/bricks/o-introducere-usor-neobisnuita-in-domeniul-arhitecturii-software/
+[trinitarianism]: http://existentialtype.wordpress.com/2011/03/27/the-holy-trinity/
+[typeclassopedia]: https://www.haskell.org/haskellwiki/Typeclassopedia
+[russell]: http://en.wikipedia.org/wiki/Russell%27s_paradox
+[hott]: http://homotopytypetheory.org/book/
+[posets]: http://en.wikipedia.org/wiki/Partially_ordered_set
+[github]: https://github.com/
+[git]: http://git-scm.com/
+[sha-1-git]: http://git-scm.com/book/es/v2/Git-Tools-Revision-Selection
+[git-best-practices]: https://sethrobertson.github.io/GitBestPractices/
+[git-squash]: http://git-scm.com/book/en/v2/Git-Tools-Rewriting-History#Squashing-Commits
+[hasse-diagram]: http://mathworld.wolfram.com/HasseDiagram.html
+[blockchain]: https://en.bitcoin.it/wiki/Block_chain
+[bitcoin]: https://bitcoin.org/bitcoin.pdf
+[infrastructure-iii]: /posts/y01/027-bitcoin-as-infrastructure-iii.html
+[infrastructure-iv]: /posts/y01/031-bitcoin-as-infrastructure-iv.html
+[database]: http://math.mit.edu/~dspivak/informatics/notes/unorganized/PODS.pdf
diff --git a/posts/y02/043-on-the-failure-of-marketing.markdown b/posts/y02/043-on-the-failure-of-marketing.markdown
new file mode 100644
index 0000000..630bc27
--- /dev/null
+++ b/posts/y02/043-on-the-failure-of-marketing.markdown
@@ -0,0 +1,159 @@
+---
+postid: 043
+title: On the failure of marketing (and civilization in general)
+date: February 14, 2016
+author: Lucian Mogoșanu
+tags: asphalt
+---
+
+"Marketing" is, or should be, in fact a bit of an umbrella term for at
+least two or three things.
+
+Firstly, marketing is, or should be, the science that studies the needs
+of the market, or more exactly the needs of the people that make up the
+market. This is the so-called "market research": what products do people
+*need* and what *can* they (afford to) buy?
+
+Secondly, marketing is, or should be, a set of techniques for making a market,
+or rather the people that make up a market, aware of the existence of some
+product, no more, no less. This is roughly the same as what people nowadays
+call "public relations".
+
+It happens, as history goes, that in the past few decades[^1] a slow
+but sure rupture between the term and its meaning occurred, among others,
+in marketing, and this phenomenon will, I am assuming, continue along
+its path towards a slow and painful death. The meaning of marketing has
+already inflated, or rather, it has become more and more diluted, as
+definitions such as the couple above have shifted more from is to
+should-be.
+
+To illustrate this, we will take a very simple example, that of the
+mobile phone[^2]. A first observation would be that today's mobile
+phones are no longer phones in the classic sense of the word, such that
+their stupid creators[^3] had to pollute the space of ideas with the new
+concept of smartphone. And this has been going on and on with the
+tablet, phablet and who knows what's next.
+
+Notice how these new products are not really innovative. Smartphones are
+in fact mobile phones with an integrated camera of lower quality than
+previous dedicated cameras[^4] and an integrated computer of much less
+power[^5] than the average desktop computer, among the other integrated
+products, usually of lower quality than their predecessors. Tablets are
+bigger smartphones that can pack a bit more hardware, while phablets
+are, I don't know, FSM[^6] knows what. The next thing they'll do is try
+to put the same thing on the head unit of your car and in your fridge,
+in a desperate attempt to mix stuff together in the other new
+meta-buzzword called "the Internet of Things".
+
+What's more outrageous is that the mobile phone has an artificially
+induced lifespan of about one to two years[^7]. That is because most
+modern organizations impose on themselves this magical thing called
+time-to-market, which means that a given product must imperatively be
+released by some given date. It doesn't matter that it's unusable,
+that it has bugs or that
+[software engineering is a myth][software-engineering], they'll want it
+out by then and the armies of employees will have to work their asses
+off for that. That is, until the next iteration, when they'll ship with
+some other useless "features" and a set of new, shiny handicaps that'll
+make your life a nightmare. And I thought it was now long established
+that the only product worth buying once a year was the calendar, as per
+the ol' communist centralized planned economy model.
+

⁂

+
+Although it doesn't look like it at first glance, marketing is failing
+because it doesn't inform people of the existence of things that they
+need to buy. What it does instead is to aggressively lure them into
+wanting, that is, into believing that they need to buy a certain
+product, regardless of whether they actually do; or, more importantly,
+not.
+
+How product owners do that is a whole different story. Branding is in
+fact not so harmful as one might believe. The introduction of jargon up
+to saturation is however a great source of confusion for clients, who
+don't feel safe delving into technical details, and thus they're given
+some weird term to cater to their naïveté. Returning to the smartphone
+example, tell me what Corning Gorilla Glass *actually* means and you win
+a prize. No, you don't know, you just trust[^8] what you're told, and
+the producers could give you a piece of post-processed horse manure as
+far as they're concerned, you'll still buy it.
+
+This, combined with 24/7 mass propaganda are *the* things that make the
+market go round, "tech start-ups" gain billions of fake dollars[^9] and
+pop stars chill with their homies in their cribs.
+
+Now, why they do that is yet another different story. They do it because
+it's easy, first and foremost. It didn't use to be easy back in the day,
+but it's gotten progressively so as the generations got dumber[^10] and
+the dumb taught their children to be even dumber, so that they just
+returned to shopping shortly after the airplanes took down at least a
+part of the non-dumbness that was left in this otherwise dumb
+"civilized" world.
+

⁂

+
+Of course, "it's not marketing's fault"[^11] that marketing is failing,
+or has failed. The fault -- not a moral fault, but a deep, technical
+fault, in the sense of "failure" -- lies in a culture that found it
+easier to manipulate adults than to educate their children properly,
+where memes, tropes and quotes taken out of context hold more value than
+a book and where one must "do what they enjoy"[^12].
+
+The net effect of this marketing that is not a marketing is
+[post-religion][post-religion], transitioning into full-blown
+[fundamentalism][religiousness].
+
+Still think I'm full of shit? Here's what: take a popular video on
+YouTube, preferably one that you also like; look in the comments
+section, but promise you're going to read it in its entirety. If you
+don't see anything wrong with what's going on there, then there's the
+door, have fun with your Bieber and your tablet and stop wasting your
+time and my bandwidth.
+
+[^1]: About roughly the same time as my age. Is this a coincidence? I
+have no idea.
+
+[^2]: Although any product would do. Really. Go ahead, choose
+one. You'll be surprised by how most things have been twisted into
+useless junk by today's "marketing".
+
+[^3]: Yes, I am looking at you, rotten Steve Jobs.
+
+[^4]: Although the gap between the two has narrowed and it continues to
+    do so.
+
+[^5]: Not in terms of *raw* computing power, but in terms of what -- and
+this is a very broad "what" -- its master can do with it. You don't even
+own your smartphone, so you can only use it for whatever "apps" your
+master has designed for you. Oh, and the gap between *these* two will
+only continue to widen. Just look at your
+[average mobile operating system][android].
+
+[^6]: Flying Spaghetti Monster.
+
+[^7]: Nobody cares about the poor hardware. Most sane people can and will
+still make use of that old Nokia 3310, *and* break someone's head with
+it in self-defense. There, integration!
+
+[^8]: The takeaway message here is that trust does indeed mean
+something, only not on the mass market. No, not when you're one of the
+billion clueless consumers. So whatever you'd say, they tricked you into
+buying their latest and greatest.
+
+[^9]: Don't tell me you thought WhatsApp are *really* worth that
+much. Well, you'll be surprised, sooner rather than later.
+
+[^10]: Or maybe "the generations got dumber" is just bias? It might be,
+but this is a story for another time.
+
+[^11]: On the same note as "information wants to be free".
+
+[^12]: I'm probably a hedonist at least as much as anybody else, but the
+question is: if you look around you, can you easily spot the things that
+you don't enjoy? And moreover, what are you going to do to purge them out
+of your life? Starting, say, yesterday.
+
+[android]: /posts/y02/03f-android-the-bad-and-the-ugly.html
+[software-engineering]: /posts/y02/03c-the-myth-of-software-engineering.html
+[post-religion]: /posts/y00/018-on-post-religion.html
+[religiousness]: /posts/y01/034-the-transition-back-into-religiousness.html