Scrap Your Constructors: Church Encoding Algebraic Types
Algebraic data types are one of Haskell’s best known and most useful features. ADTs serve as a powerful data modeling tool, and along with pattern matching and totality checking, they enable concise, readable, and checkable code. However, there’s another way to represent data in Haskell (or indeed anything else that looks a bit like the lambda calculus) that is equally powerful and in some cases preferable to ADTs.
Church Encoded Booleans in the Lambda Calculus
Church encoding uses functions (and only functions) to represent data. Since functions are all we’ll need to represent data, we’ll use the untyped lambda calculus for our first example. Before we jump in, a brief note on notation. Informally, imagine writing Haskell with nothing but lambdas… Now you’re writing in the lambda calculus, although we’ll use the conventional mathematical notation in our examples. The lambda calculus syntax for what one would write as \f x y -> f y x
in Haskell is \(\lambda f . \lambda x . \lambda y . f\ y\ x\). Those dots are merely delimiters, not function composition. We’ll allow top-level assignment for convenience, flip = \f x y -> f y x
is morally equivalent to \(\operatorname{flip} = \lambda f . \lambda x . \lambda y . f\ y\ x\). Most presentations of the lambda calculus only allow for functions of a single argument. Just think of this as making currying explicit.
Informally, Church encoding works by representing a value with a function that can handle any possible case that could occur for a value of that type. In the simple untyped lambda calculus, it doesn’t make much any sense to speak of a value’s “type,” but the informal metaphor is useful. These functions work a bit like a weird case
expression; a value is just a function that takes as arguments the right-hand-sides of the patterns in the case, and the function simply “chooses” a pattern by reducing to one of its arguments.
There are two choices for Boolean values, so our Church encoded Booleans will be functions of two arguments. Let’s (arbitrarily, but conventionally) use the first argument to represent the true case and the second argument to represent the false case:
\[ \begin{align} \operatorname{true} &= \lambda t . \lambda f . t \\ \operatorname{false} &= \lambda t . \lambda f . f \end{align} \]
With these definitions of true and false, it’s easy to implement a combinator for if-then-else:
\[ \operatorname{if} = \lambda c . \lambda t . \lambda f . c\ t\ f \]
Here \(c\) is our discriminating Boolean value, \(t\) is the expression \(\operatorname{if}\) reduces to if \(c = \operatorname{true}\), and \(f\) is the expression \(\operatorname{if}\) reduces to if \(c = \operatorname{false}\). Working with Church encoded Booleans, conditionals are just function application. This isn’t terribly useful on its own; let’s implement some more more Boolean operators as combinators:
\[ \begin{align} \operatorname{and }&= \lambda x . \lambda y . \lambda t . \lambda f . x\ (y\ t\ f)\ f \\ \operatorname{or} &= \lambda x . \lambda y . \lambda t . \lambda f . x\ t\ (y\ t\ f) \\ \operatorname{not} &= \lambda x . \lambda t . \lambda f . x\ f\ t \end{align} \]
To keep things straight, think of the first and second arguments to \(x\) and \(y\) in these definitions as pattern matching on true and false respectively. In order for \(\operatorname{and}\) to reduce to \(\operatorname{true}\), \(x\) must be equal to \(\operatorname{true}\) and \(y\) must be equal to \(\operatorname{true}\). The expression for the first argument of \(x\) is the \(\operatorname{true}\) case for \(x\), so all we have to do is check that \(y\) is also equal to \(\operatorname{true}\). To do that, we simply apply \(y\). Reason through \(\operatorname{or}\) yourself to check that it indeed implements logical conjunction over our definitions of \(\operatorname{true}\) and \(\operatorname{false}\). Notice that \(\operatorname{not}\) is equal to \(\operatorname{flip}\) from the notation example.
Since a Church encoded Boolean is a function of two arguments, \(\operatorname{and}\), \(\operatorname{or}\), \(\operatorname{not}\) as we’ve written them above are meant to be partially applied. Since \(\operatorname{true}\) always returns its first argument and \(\operatorname{false}\) always returns its second argument, we can write these more succinctly as:
\[ \begin{align} and &= \lambda x . \lambda y . x\ y\ x \\ or &= \lambda x . \lambda y . x\ x\ y \end{align} \]
Let’s check these definitions with GHCi. If you run ghci -XNoImplicitPrelude
we can get away with using the same names (except for \(\operatorname{if}\)).
import Prelude
= \t f -> t
true = \t f -> f
false
= b True False toBool b
Since there’s no meaningful Show
instance for functions, we’ll use toBool
to check our answers1.
= \b t f -> b t f
iff
-- Equivalently:
= id
iff
and = \x y t f -> x (y t f) f
or = \x y t f -> x t (y t f)
not = \x t f -> x f t
-- Equivalently:
not = flip
true
and false
should behave exactly as one would expect:
>>> toBool (true `and` false)
False
>>> toBool (true `and` true)
True
>>> toBool (not (false `or` true))
False
Note that whether or not a Church encoded Boolean represents true or false is simply whether or not the representative function returns its first or second argument. Compare the first definitions of and
and or
above with these definitions using plain Haskell Bool
s and case
expressions:
and x y = case x of True -> (case y of True -> True
False -> False)
False -> False
or x y = case x of True -> True
False -> (case y of True -> True
False -> False)
Church Encoded Maybe
data Maybe a = Nothing
| Just a
A Maybe
value has two “choices,” just like Booleans, so our Church encoded maybe values will also be functions of two arguments:
\[ \begin{align} \operatorname{nothing} &= \lambda n . \lambda j . n \\ \operatorname{just} &= \lambda x . \lambda n . \lambda j . j\ x \end{align} \]
\(\operatorname{nothing}\) works just like \(\operatorname{true}\) or \(\operatorname{false}\) in the Boolean example, but \(\operatorname{just}\) assumes that the term passed in for the \(j\) case is a function. This is analogous to pattern matching on Just
bringing a variable into scope. Let’s implement some functions from the Data.Maybe
module:
\[ \begin{align} \operatorname{isNothing} &= \lambda m . m\ \operatorname{true}\ (\lambda x . \operatorname{false}) \\ \operatorname{isJust} &= \lambda m . m\ \operatorname{false}\ (\lambda x. \operatorname{true}) \\ \operatorname{fromJust} &= \lambda m . m\ \bot\ (\lambda x . x) \\ \operatorname{maybe} &= \lambda d . \lambda f . \lambda m . m\ d\ f \end{align} \]
\(\bot\) (pronounced “bottom”) is morally equivalent to undefined
in Haskell, representing a divergent computation. Notice that \(\operatorname{maybe}\) is simply function application, just like \(\operatorname{if}\) from the Boolean example. Both of these operations are nothing more than total pattern matches on their respective types. Let’s drop back into Haskell and implement these combinators. We’ll define a type for Church encoded maybes:
{-# LANGUAGE RankNTypes #-}
module ChurchMaybe where
newtype ChurchMaybe a =
ChurchMaybe { runChurchMaybe :: forall r. r -> (a -> r) -> r }
The forall
bit is important. We’re explicitly stating that our functions representing maybe values have to work for any possible result type r
. When pattern matching on ADTs, the type being matched on has no bearing on the type of the right hand side of the =
or ->
2. Equipped with forall
, we can implement the simplified Boolean combinator definitions:
and :: (forall a. a -> a -> a)
-> (forall a. a -> a -> a)
-> (forall a. a -> a -> a)
and = \x y -> x y x
or :: (forall a. a -> a -> a)
-> (forall a. a -> a -> a)
-> (forall a. a -> a -> a)
or = \x y -> x x y
Without the explicit forall
, GHC would try (and fail) to unify a ~ a -> a -> a
. On with the maybe combinators:
nothing :: ChurchMaybe a
= ChurchMaybe $ \n _ -> n
nothing
just :: a -> ChurchMaybe a
= ChurchMaybe $ \_ j -> j x
just x
isNothing :: ChurchMaybe a -> Bool
ChurchMaybe m) = m True (const False)
isNothing (
isJust :: ChurchMaybe a -> Bool
ChurchMaybe m) = m False (const True)
isJust (
fromJust :: ChurchMaybe a -> a
ChurchMaybe m) = m (error "fromJust") id
fromJust (
maybe :: b -> (a -> b) -> ChurchMaybe a -> b
maybe d f (ChurchMaybe m) = m d f
These functions should behave just like their Data.Maybe
counterparts, without a single pattern match in sight! (well, except for newtype
unwrapping, but that doesn’t count). Let’s move on to the Maybe
class instances. We’ll define Eq
first:
instance Eq a => Eq (ChurchMaybe a) where
ChurchMaybe x) == (ChurchMaybe y) =
(True (const False)) (y False . (==)) x (y
Here there are four branches in our “pattern matches.” The first argument to x
is the nothing
case for the first (==)
argument. In this case, if y
is also nothing
then (==)
reduces to True
, otherwise it reduces to False
. The second argument to x
is the just
case for the first (==)
argument. In this case, if y
is nothing
then (==)
simply reduces to False
. y
in this context has the type forall r a. Eq a => r -> (a -> r) -> r
, so y False
has type forall a. Eq a => (a -> Bool) -> Bool
. The second argument of x
needs to have the type forall a. Eq a => a -> Bool
, so we compose (==) :: forall a. Eq a => a -> a -> Bool
with y False
. The equivalent eta abstracted definition:
ChurchMaybe x) == (ChurchMaybe y) = x (y True (\y' -> False))
(-> y False (\y' -> x' == y')) (\x'
The Ord
instance has a similar structure. Show
follows straightforwardly from the corresponding Maybe
definition (we’ll omit the Read
instance for now):
instance Ord a => Ord (ChurchMaybe a) where
compare (ChurchMaybe x) (ChurchMaybe y) =
EQ (const LT)) (y GT . compare)
x (y
instance Show a => Ord (ChurchMaybe a) where
show (ChurchMaybe m) = m "Nothing" (("Just " ++) . show)
fmap
simply composes a new function with the just
branch, pure
simply selects the just
case, and (<*>)
has similar structure to (==)
and compare
:
instance Functor ChurchMaybe where
fmap f (ChurchMaybe x) = ChurchMaybe $ \n j -> x n (j . f)
instance Applicative ChurchMaybe where
pure = ChurchMaybe $ \_ j -> j x
ChurchMaybe f) <*> (ChurchMaybe x) = f nothing (x nothing . (just .)) (
(>>=)
simply substitutes the just
branch with a new function that must return a new ChurchMaybe
:
instance Monad ChurchMaybe where
pure = return
ChurchMaybe x) >>= f = x nothing f (
foldMap
from the Foldable
class has a similar structure, instead substituting the just
branch with a new function returning any monoid:
instance Foldable ChurchMaybe where
foldMap f (ChurchMaybe x) = x mempty f
traverse
from Traversable
has the same structure, but also threads ChurchMaybe
through an applicative functor:
instance Traversable ChurchMaybe where
traverse f (ChurchMaybe x) = x (pure nothing) ((just <$>) . f)
Finally, there’s the Monoid a => Monoid (Maybe a)
instance. This instance simply provides an “extra” mempty
value to an existing Monoid
instance, for example:
-- Data.Monoid provides (<>) = mappend
>>> (Just [1]) <> (Just [2])
Just [1, 2]
>>> (Just [1]) <> (Just [])
Just [1]
>>> (Just [1]) <> Nothing
Just [1]
>>> mempty :: Maybe [Int]
Nothing
Here’s the instance for ChurchMaybe
implementing this behavior:
instance Monoid a => Monoid (ChurchMaybe a) where
mempty = nothing
mappend (ChurchMaybe x) (ChurchMaybe y) =
mempty just)
x (y <$> just <*> ((just .) . mappend)) (y
If x
is nothing
, then mappend
reduces to mempty
if y
is nothing, and wraps the underlying monoid in just
if y
is just
. Look carefully at the second case. It’s defined applicatively, but which Applicative
instance are we using? No peeking…
That’s right, it’s the instance for ((->) r)
. To see how this works, let’s completely eta abstract that expression:
-> y (just x') (\y' -> just (mappend x' y'))) (\x'
Both arguments to y
are functions of x'
, so we can use the (->)
instances to eliminate the outer lambda:
<$> just <*> (\x' -> (\y' -> just (mappend x' y'))) y
Now we can eta reduce the second y
argument, which has the type forall a. Monoid a => a -> a -> ChurchMaybe a
. mappend
has the type forall a. Monoid a => a -> a -> a
, and in this branch of the enclosing definition we know we’ll be returning just
something. Eliminating y'
first:
<$> just <*> (\x' -> just . mappend x') y
(just .)
has the type (a -> b) -> a -> ChurchMaybe b
. If we let b = a
and treat mappend
as having the type forall a. Monoid a => a -> (a -> a)
, then we can compose (just .)
with mappend
, yielding:
<$> just <*> ((just .) . mappend) y
In practice one might not go to such trouble to fully eta reduce an expression, but it’s a great way to practice reasoning about types and flex your lambda calculus muscles. If you just want the pointfree street cred, you can get that automatically. Now try working with Church encoded lists. You could start with this type, but there are actually other ways3.
newtype ChurchList a = ChurchList {
runChurchList :: forall r. r -> (a -> ChurchList a -> r) -> r
}
Church encoded data is more than just a theoretical curiosity; there are situations in which Church encoded types are more convenient or even more performant than the equivalent ADT. Church encoded free monads are an example of the latter. If you’re unfamiliar with free monads and their applications I’d highly recommend Gabriel Gonzalez’s introduction. The type providing free monads is typically defined this way:
data Free f a = Pure a
| Free (f (Free f a))
liftF :: Functor f => f a -> Free f a
= Free . fmap pure liftF
A Functor
instance for this type must traverse the entire tree:
instance Functor f => Functor (Free f) where
fmap f (Pure x) = Pure (f x)
fmap f (Free f) = Free ((fmap f) <$> f)
And so must (<*>)
and (>>=)
:
instance Functor f => Applicative (Free f) where
pure = Pure
Pure f <*> Pure x = Pure (f x)
Pure f <*> Free x = Free ((fmap f) <$> x)
Free f <*> x = Free ((<*> x) <$> f)
instance Functor f => Monad (Free f) where
return = Pure
Pure x >>= f = f x
Free x >>= f = Free ((>>= f) <$> x)
This is especially troublesome for (>>=)
, since it is typical for code using do notation to deeply left-associate the bind operator, which is \(O(n^2)\). Suppose we have something like this:
data FExp a = GetSomething (Int -> a)
| WithSomething Int (Int -> a)
| DoSomething Int a
deriving (Functor)
getSomething :: Free FExp Int
= liftF $ GetSomething id
getSomething
withSomething :: Int -> Free FExp Int
= liftF . flip WithSomething id
withSomething
doSomething :: Int -> Free FExp ()
= liftF . flip DoSomething ()
doSomething
ohNo :: Free FExp ()
= do
ohNo <- getSomething
x <- withSomething x
y <- withSomething y
z doSomething z
Here the withSomething x
binding must traverse Free GetSomething (...)
, the withSomething y
binding must traverse Free GetSomething (\x -> Free WithSomething x (...))
, the withSomething y
binding must traverse Free GetSomething (\x -> Free WithSomething x (\y -> Free WithSomething y (...)))
… This can become problematic for large monadic “blocks.”
The Control.Monad.Free.Church
module in Edward Kmett’s free
provides an alternative Church encoded variant:
newtype F f a = F { runF :: forall r. (a -> r) -> (f r -> r) -> r }
This type lets us take advantage of fmap
fusion:
fmap f . fmap g == fmap (f . g)
and the associativity of (>>=)
:
>>= f) >>= g == x >>= (\x' -> f x' >>= g) (x
to ensure that transformations on F
trees take place in a single pass. Edward Kmett’s Free Monads for Less part 1 and part 2 go into formal detail. Bryan O’Sullivan’s attoparsec library also takes advantage of Church encoding with its core Parser
type. Church encoded data plays nicely with continuation passing style and can be used in certain performance-critical situations to short-circuit explicit pattern matching.
Module containing this post’s code.
With
-XFlexibleInstances
one could write an instance forShow (a -> a -> a)
.↩︎Ertugrul Söylemez wrote a good introduction to the topic of higher-ranked polymorphism on Oliver Charles’ series 24 Days of GHC Extensions. The respective Haskell Wikibook page also provides some motivating examples.↩︎
I have since learned that I was mistaken in calling these “Church encoded lists.” This is actually Morgensen-Scott encoding. Kwang Yul Seo has a good exposition of Scott encoding in Haskell. A true Church encoded list is the list’s foldr.↩︎