From: Lucian Mogosanu <lucian.mogosanu@gmail.com>
Date: Sat, 31 Aug 2013 07:38:01 +0000 (+0300)
Subject: posts: 00d
X-Git-Tag: v0.2~2
X-Git-Url: https://git.mogosanu.ro/?a=commitdiff_plain;h=b04ade0c69bf7289c64dcd5e13d6928b8a23b96b;p=thetarpit.git

posts: 00d
---

diff --git a/posts/y00/00d-composition-operators-pipes.markdown b/posts/y00/00d-composition-operators-pipes.markdown
new file mode 100644
index 0000000..3ba374e
--- /dev/null
+++ b/posts/y00/00d-composition-operators-pipes.markdown
@@ -0,0 +1,193 @@
+---
+postid: 00d
+title: Composition operators: pipes
+excerpt: In which we describe Unix pipes using application and composition.
+author: Lucian MogoÈanu
+date: August 31, 2013
+tags: math, tech
+---
+
+While this post will seem trivial to the average mathematician, it is
+well-aimed at beginning functional programmers. To calm down the former, we
+note that pipes aren't equivalent to (function) composition in the mathematical
+sense.
+
+First, let us see what's composition and what are pipes, and then we'll bring
+them together. I'll illustrate the concepts using mathematical language and
+Bash/Unix Shell where necessary, and Haskell most of the time.
+
+<!--more-->
+
+## Function composition
+
+Assuming we know what a function is, we can define a binary algebraic operator
+$\circ$ that operates on two arbitrary functions $f$ and $g$, yielding a third
+function, $h \equiv f \circ g$. Now, there aren't many ways we can combine
+functions, and it doesn't make sense for us to try doing this without knowing
+the signatures of $f$, $g$ and $h$, so we'll assume the following:
+
+$f : C \rightarrow D$
+
+$g : A \rightarrow B$
+
+where $A$, $B$, $C$ and $D$ are arbitrary sets. However, you have to agree with
+me that we cannot combine functions defined in terms of *purely* arbitrary
+sets. For example, addition is always between well-defined numbers: the
+addition of a natural number and a real number will yield a real number only
+due to the fact that $\mathbb{N} \subset \mathbb{R}$ and will not, generally
+speaking, yield a natural number. There must therefore exist a link between
+some of the sets $A$, $B$, $C$ and $D$. We'll choose $B = C$; we don't know why
+we did this yet, there's no clear intuition at this point, but we do know that
+the result $h$ must also be a function, say $h : E \rightarrow F$.
+
+Let's suppose for a moment that we're really dumb and we read the signature of
+a function as "$A$ goes to $B$" and "$C$ goes to $D$" and so on[^1]. When
+combining two functions, it's pretty natural to apply our $B = C$ restriction,
+i.e. it's pretty ok to say "$A$ goes to ($B = C$) goes to $D$". We have some
+kind of pipeline there, but in fact our pipeline goes from $A$ to $D$, so it's
+once again natural to think of our pipeline as a new function $h : A
+\rightarrow D$. So there it is, the algebraic definition of function
+composition; sort of.
+
+But what's the meaning of "$A$ goes to $B$"? Well, it actually means "you give
+$f$ an element of $A$ and it gives back an element of $B$". The pipeline would
+thus go from $x \in A$ to $y \in B$ (or $y \in C$, whichever you prefer) and
+then $g$ would take $y$ and turn it into a $z \in D$. Formally, we would say
+that:
+
+$z = h(x) = f(g(x))$[^2]
+
+Haskell defines function composition as a (Haskell) function[^3]:
+
+~~~~ {.haskell}
+(.) :: (b -> c) -> (a -> b) -> a -> c
+(.) f g = \ x -> f (g x)
+~~~~
+
+## Unix pipes
+
+Pipes are a lot simpler to define if we rely on shell scripting background.
+They are grounded in the Unix philosophy, from which a lot of simple utilities
+(such as `ls`, `cat` or `grep`) sprung out. To do useful, automatized stuff
+with them, administrators had to have a way to somehow combine them in a
+meaningful manner. For example `tail` prints out only the last few lines of
+some output, making it essentially a filter. We might, for some arbitrary
+reason, only want to see the last few processes outputted by `ps aux`, in which
+case we'll write `ps aux | tail`.
+
+Indeed, we can do stuff that's a lot more complicated. For example, if we want
+to print only the second column of the output, we'll squeeze the repeated
+spaces with `tr -s ' '` and use the space separator to select the second
+column, with `cut -d ' ' -f2`. The final one-liner looks as follows:
+
+~~~~ {.bash}
+$ ps aux | tail | tr -s ' ' | cut -d ' ' -f2
+~~~~
+
+We notice how the pipe (`|`) operator looks very similar to the function
+composition described earlier. A programmer might thus wonder, "how can I use
+the same model in Haskell?". Fortunately, the implementation is only a few
+steps away.
+
+As I mentioned before, pipes and function composition are not equivalent. The
+first obvious difference is of syntactic nature: pipes run backwards, or rather
+composed functions do. A rough approximation of how the above shell command
+would look in Haskell is the following snippet:
+
+~~~~ {.haskell}
+cut ' ' 2 . tr ' ' . tail $ ps "aux"
+~~~~
+
+We observe that while pipes "flow" from left to right[^4], function composition
+runs from right to left. Besides, in Haskell we had to use the application
+(`$`) function on the first argument. That is because `(.)` composes functions,
+while `($)` composes a function with an argument:
+
+~~~~ {.haskell}
+($) :: (a -> b) -> a -> b
+~~~~
+
+Thus we can define a "pipe" function in the following way:
+
+~~~~ {.haskell}
+-- act the same as ($), only left-associative
+set infixl 0 -|
+(-|) :: a -> (a -> b) -> b
+(-|) = flip ($)
+~~~~
+
+and apply it on Haskell statements and functions:
+
+~~~~ {.haskell}
+*Main> 1 + 1 -| (+ 2) -| (* 2)
+8
+*Main> iterate (++ "a") "" -| take 5 -| tail -| filter (\ s -> length s < 4)
+["a","aa","aaa"]
+*Main> let tt = [(False, False), (False, True), (True, False), (True, True)]
+*Main> tt -| map (uncurry (&&))
+[False,False,False,True]
+*Main> tt -| map (uncurry (||))
+[False,True,True,True]
+~~~~
+
+## Taking it further
+
+The keener eyes might have quickly observed that our `(-|)` function is in fact
+very similar to the monadic bind (`>>=`) operator:
+
+~~~~ {.haskell}
+(>>=) :: Monad m => m a -> (a -> m b) -> m b
+~~~~
+
+In fact we can rewrite our previous examples using `(>>=)` and `return`. I will
+include the first example, the rest will be left as an exercise to the reader:
+
+~~~~ {.haskell}
+*Main> return (1 + 1) >>= return . (+ 2) >>= return . (* 2)
+8
+~~~~
+
+or, to illustrate that we run in the [identity][1] functor/monad:
+
+~~~~ {.haskell}
+*Main Data.Functor.Identity> :m +Data.Functor.Identity
+*Main Data.Functor.Identity> let r = return :: a -> Identity a
+*Main Data.Functor.Identity> runIdentity $ r (1 + 1) >>= r . (+ 2) >>= r . (* 2)
+8
+~~~~
+
+Of course, it doesn't stop here. Pipes are implemented as a design pattern in
+the [Pipes][2] Haskell library, where the `(>+>)` operator is used to compose
+pipe objects (see [Control.Pipe][3] for more details). What's interesting is
+that `(>+>)` is implemented as the Arrow `(>>>)`, or a flip of `(<+<)`/`(<<<)`,
+which satisfies the [category laws][4]. Thus `(<+<)` and `(>+>)` can also be
+considered composition operators in the mathematical (algebraic/categorial)
+sense[^5].
+
+A paradigm which focuses entirely on piping (and stacking) is [concatenative
+programming][5]. As a programming style, it is reminiscent of old Reverse
+Polish Notation calculators such as the ones produced by HP in the 1980s. As an
+abstraction tool, it's more well-suited for some cases, while being less
+well-suited for others. We won't go into the details here.
+
+[^1]: Well, we actually kinda do that anyway.
+
+[^2]: From what I remember, mathematicians prefer saying that
+$f \circ g \equiv g(f(x))$. I've modified it for the sake of consistency, but
+it's really the same thing.
+
+[^3]: Things get a wee bit more complicated here. Haskell conventions are
+actually well-grounded in mathematical formalisms. If you're interested in the
+whys and hows and don't know where to start, look up natural transformations.
+Function composition can be regarded as one.
+
+[^4]: This looks more natural to people living in the western world.
+
+[^5]: Unfortunately, I won't expand further on this subject right here, since
+it belongs to Category Theory 101.
+
+[1]: http://hackage.haskell.org/packages/archive/transformers/0.2.0.0/doc/html/Data-Functor-Identity.html
+[2]: http://www.haskell.org/haskellwiki/Pipes
+[3]: http://hackage.haskell.org/packages/archive/pipes/3.3.0/doc/html/Control-Pipe.html
+[4]: http://en.wikipedia.org/wiki/Category_(mathematics)#Definition
+[5]: http://evincarofautumn.blogspot.ro/2012/02/why-concatenative-programming-matters.html