Scrap Your Constructors: Church Encoding Algebraic Types

Travis Whitaker - November 7, 2016

Algebraic data types are one of Haskell’s best known and most useful features. ADTs serve as a powerful data modeling tool, and along with pattern matching and totality checking, they enable concise, readable, and checkable code. However, there’s another way to represent data in Haskell (or indeed any other superset of the lambda calculus) that is equally powerful and in some cases preferable to ADTs.

Church Encoded Booleans in the Lambda Calculus

Church encoding uses functions (and only functions) to represent data. Since functions are all we’ll need to represent data, we’ll use the untyped lambda calculus for our first example. Before we jump in, a brief note on notation. Informally, imagine writing Haskell with nothing but lambdas… Now you’re writing in the lambda calculus, although we’ll use the conventional mathematical notation in our examples. The lambda calculus syntax for what one would write as \f x y -> f y x in Haskell is \(\lambda f . \lambda x . \lambda y . f\ y\ x\). Those dots are merely delimiters, not function composition. We’ll allow top-level assignment for convenience, flip = \f x y -> f y x is morally equivalent to \(\operatorname{flip} = \lambda f . \lambda x . \lambda y . f\ y\ x\). Most presentations of the lambda calculus only allow for functions of a single argument. Just think of this as making currying explicit.

Informally, Church encoding works by representing a value with a function that can handle any possible case that could occur for a value of that type. In the simple untyped lambda calculus, it doesn’t make much any sense to speak of a value’s “type,” but the informal metaphor is useful. These functions work a bit like a weird case expression; a value is just a function that takes as arguments the right-hand-sides of the patterns in the case, and the function simply “chooses” a pattern by reducing to one of its arguments.

There are two choices for Boolean values, so our Church encoded Booleans will be functions of two arguments. Let’s (arbitrarily, but conventionally) use the first argument to represent the true case and the second argument to represent the false case:

\[ \begin{align} \operatorname{true} &= \lambda t . \lambda f . t \\ \operatorname{false} &= \lambda t . \lambda f . f \end{align} \]

With these definitions of true and false, it’s easy to implement a combinator for if-then-else:

\[ \operatorname{if} = \lambda c . \lambda t . \lambda f . c\ t\ f \]

Here \(c\) is our discriminating Boolean value, \(t\) is the expression \(\operatorname{if}\) reduces to if \(c = \operatorname{true}\), and \(f\) is the expression \(\operatorname{if}\) reduces to if \(c = \operatorname{false}\). Working with Church encoded Booleans, conditionals are just function application. This isn’t terribly useful on its own; let’s implement some more more Boolean operators as combinators:

\[ \begin{align} \operatorname{and }&= \lambda x . \lambda y . \lambda t . \lambda f . x\ (y\ t\ f)\ f \\ \operatorname{or} &= \lambda x . \lambda y . \lambda t . \lambda f . x\ t\ (y\ t\ f) \\ \operatorname{not} &= \lambda x . \lambda t . \lambda f . x\ f\ t \end{align} \]

To keep things straight, think of the first and second arguments to \(x\) and \(y\) in these definitions as pattern matching on true and false respectively. In order for \(\operatorname{and}\) to reduce to \(\operatorname{true}\), \(x\) must be equal to \(\operatorname{true}\) and \(y\) must be equal to \(\operatorname{true}\). The expression for the first argument of \(x\) is the \(\operatorname{true}\) case for \(x\), so all we have to do is check that \(y\) is also equal to \(\operatorname{true}\). To do that, we simply apply \(y\). Reason through \(\operatorname{or}\) yourself to check that it indeed implements logical conjunction over our definitions of \(\operatorname{true}\) and \(\operatorname{false}\). Notice that \(\operatorname{not}\) is equal to \(\operatorname{flip}\) from the notation example.

Since a Church encoded Boolean is a function of two arguments, \(\operatorname{and}\), \(\operatorname{or}\), \(\operatorname{not}\) as we’ve written them above are meant to be partially applied. Since \(\operatorname{true}\) always returns its first argument and \(\operatorname{false}\) always returns its second argument, we can write these more succinctly as:

\[ \begin{align} and &= \lambda x . \lambda y . x\ y\ x \\ or &= \lambda x . \lambda y . x\ x\ y \end{align} \]

Let’s check these definitions with GHCi. If you run ghci -XNoImplicitPrelude we can get away with using the same names (except for \(\operatorname{if}\)).

import Prelude

true = \t f -> t
false = \t f -> f

toBool b = b True False

Since there’s no meaningful Show instance for functions, we’ll use toBool to check our answers1.

iff = \b t f -> b t f

-- Equivalently:
iff = id

and = \x y t f -> x (y t f) f

or = \x y t f -> x t (y t f)

not = \x t f -> x f t

-- Equivalently:
not = flip

true and false should behave exactly as one would expect:

>>> toBool (true `and` false)
>>> toBool (true `and` true)
>>> toBool (not (false `or` true))

Note that whether or not a Church encoded Boolean represents true or false is simply whether or not the representative function returns its first or second argument. Compare the first definitions of and and or above with these definitions using plain Haskell Bools and case expressions:

and x y = case x of True  -> (case y of True  -> True
                                        False -> False)
                    False -> False

or x y = case x of True  -> True
                   False -> (case y of True  -> True
                                       False -> False)

Church Encoded Maybe

data Maybe a = Nothing
             | Just a

A Maybe value has two “choices,” just like Booleans, so our Church encoded maybe values will also be functions of two arguments:

\[ \begin{align} \operatorname{nothing} &= \lambda n . \lambda j . n \\ \operatorname{just} &= \lambda x . \lambda n . \lambda j . j\ x \end{align} \]

\(\operatorname{nothing}\) works just like \(\operatorname{true}\) or \(\operatorname{false}\) in the Boolean example, but \(\operatorname{just}\) assumes that the term passed in for the \(j\) case is a function. This is analogous to pattern matching on Just bringing a variable into scope. Let’s implement some functions from the Data.Maybe module:

\[ \begin{align} \operatorname{isNothing} &= \lambda m . m\ \operatorname{true}\ (\lambda x . \operatorname{false}) \\ \operatorname{isJust} &= \lambda m . m\ \operatorname{false}\ (\lambda x. \operatorname{true}) \\ \operatorname{fromJust} &= \lambda m . m\ \bot\ (\lambda x . x) \\ \operatorname{maybe} &= \lambda d . \lambda f . \lambda m . m\ d\ f \end{align} \]

\(\bot\) (pronounced “bottom”) is morally equivalent to undefined in Haskell, representing a divergent computation. Notice that \(\operatorname{maybe}\) is simply function application, just like \(\operatorname{if}\) from the Boolean example. Both of these operations are nothing more than total pattern matches on their respective types. Let’s drop back into Haskell and implement these combinators. We’ll define a type for Church encoded maybes:

{-# LANGUAGE RankNTypes #-}

module ChurchMaybe where

newtype ChurchMaybe a = ChurchMaybe { runChurchMaybe :: forall r. r -> (a -> r) -> r }

The forall bit is important. We’re explicitly stating that our functions representing maybe values have to work for any possible result type. When pattern matching on ADTs, the type being matched on has no bearing on the type of the right hand side of the = or ->2. Equipped with forall, we can implement the simplified Boolean combinator definitions:

and :: (forall a. a -> a -> a)
    -> (forall a. a -> a -> a)
    -> (forall a. a -> a -> a)
and = \x y -> x y x

or :: (forall a. a -> a -> a)
   -> (forall a. a -> a -> a)
   -> (forall a. a -> a -> a)
or = \x y -> x x y

Without the explicit forall, GHC would try (and fail) to unify a ~ a -> a -> a. On with the maybe combinators:

nothing :: ChurchMaybe a
nothing = ChurchMaybe $ \n _ -> n

just :: a -> ChurchMaybe a
just x = ChurchMaybe $ \_ j -> j x

isNothing :: ChurchMaybe a -> Bool
isNothing (ChurchMaybe m) = m True (const False)

isJust :: ChurchMaybe a -> Bool
isJust (ChurchMaybe m) = m False (const True)

fromJust :: ChurchMaybe a -> a
fromJust (ChurchMaybe m) = m (error "fromJust") id

maybe :: b -> (a -> b) -> ChurchMaybe a -> b
maybe d f (ChurchMaybe m) = m d f

These functions should behave just like their Data.Maybe counterparts, without a single pattern match in sight! (well, except for newtype unwrapping, but that doesn’t count) Let’s move on to the Maybe class instances. We’ll define Eq first:

instance Eq a => Eq (ChurchMaybe a) where
    (ChurchMaybe x) == (ChurchMaybe y) = x (y True (const False)) (y False . (==))

Here there are four branches in our “pattern matches.” The first argument to x is the nothing case for the first (==) argument. In this case, if y is also nothing then (==) reduces to True, otherwise it reduces to False. The second argument to x is the just case for the first (==) argument. In this case, if y is nothing then (==) simply reduces to False. y in this context has the type forall r a. Eq a => r -> (a -> r) -> r, so y False has type forall a. Eq a => (a -> Bool) -> Bool. The second argument of x needs to have the type forall a. Eq a => a -> Bool, so we compose (==) :: forall a. Eq a => a -> a -> Bool with y False. The equivalent eta abstracted definition:

(ChurchMaybe x) == (ChurchMaybe y) = x (y True (\y' -> False)) (\x' -> y False (\y' -> x' == y'))

The Ord instance has a similar structure. Show follows straightforwardly from the corresponding Maybe definition (we’ll omit the Read instance for now):

instance Ord a => Ord (ChurchMaybe a) where
    compare (ChurchMaybe x) (ChurchMaybe y) = x (y EQ (const LT)) (y GT . compare)

instance Show a => Ord (ChurchMaybe a) where
    show (ChurchMaybe m) = m "Nothing" (("Just " ++) . show)

fmap simply composes a new function with the just branch, pure simply selects the just case, and (<*>) has similar structure to (==) and compare:

instance Functor ChurchMaybe where
    fmap f (ChurchMaybe x) = ChurchMaybe $ \n j -> x n (j . f)

instance Applicative ChurchMaybe where
    pure = ChurchMaybe $ \_ j -> j x
    (ChurchMaybe f) <*> (ChurchMaybe x) = f nothing (x nothing . (just .))

(>>=) simply substitutes the just branch with a new function that must return a new ChurchMaybe:

instance Monad ChurchMaybe where
    pure = return
    (ChurchMaybe x) >>= f = x nothing f

foldMap from the Foldable class has a similar structure, instead substituting the just branch with a new function returning any monoid:

instance Foldable ChurchMaybe where
    foldMap f (ChurchMaybe x) = x mempty f

traverse from Traversable has the same structure, but also threads ChurchMaybe through an applicative functor:

instance Traversable ChurchMaybe where
    traverse f (ChurchMaybe x) = x (pure nothing) ((just <$>) . f)

Finally, there’s the Monoid a => Monoid (Maybe a) instance. This instance simply provides an “extra” mempty value to an existing Monoid instance, for example:

-- Data.Monoid provides (<>) = mappend
>>> (Just [1]) <> (Just [2])
Just [1, 2]
>>> (Just [1]) <> (Just [])
Just [1]
>>> (Just [1]) <> Nothing
Just [1]
>>> mempty :: Maybe [Int]

Here’s the instance for ChurchMaybe implementing this behavior:

instance Monoid a => Monoid (ChurchMaybe a) where
    mempty = nothing
    mappend (ChurchMaybe x) (ChurchMaybe y) = x (y mempty just)
                                                (y <$> just <*> ((just .) . mappend))

If x is nothing, then mappend reduces to mempty if y is nothing, and wraps the underlying monoid in just if y is just. Look carefully at the second case. It’s defined applicatively, but which Applicative instance are we using? No peeking…

That’s right, it’s the instance for ((->) r). To see how this works, let’s fully eta abstract that expression:

(\x' -> y (just x') (\y' -> just (mappend x' y')))

Both arguments to y are functions of x', so we can use the (->) instances to eliminate the outer lambda:

y <$> just <*> (\x' -> (\y' -> just (mappend x' y')))

Now we can eta reduce the second y argument, which has the type forall a. Monoid a => a -> a -> ChurchMaybe a. mappend has the type forall a. Monoid a => a -> a -> a, and in this branch of the enclosing definition we know we’ll be returining just something. Eliminating y' first:

y <$> just <*> (\x' -> just . mappend x')

(just .) has the type (a -> b) -> a -> ChurchMaybe b. If we let b = a and treat mappend as having the type forall a. Monoid a => a -> (a -> a), then we can compose (just .) with mappend, yielding:

y <$> just <*> ((just .) . mappend)

In practice one might not go to such trouble to fully eta reduce an expression, but it’s a great way to practice reasoning about types and flex your lambda calculus muscles. If you just want the pointfree street cred, you can get that automatically. Now try working with Church encoded lists. You could start with this type, but there are actually other ways3.

newtype ChurchList a = ChurchList {
    runChurchList :: forall r. r -> (a -> ChurchList a -> r) -> r

Church encoded data is more than just a theoretical curiosity; there are situations in which Church encoded types are more conventient or even more performant than the equivalent ADT. Church encoded free monads are an example of the latter. If you’re unfamiliar with free monads and their applications I’d highly recommend Gabriel Gonzalez’s intruduction. The type providing free monads is typically defined this way:

data Free f a = Pure a
              | Free (f (Free f a))

liftF :: Functor f => f a -> Free f a
liftF = Free . fmap pure

A Functor instance for this type must traverse the entire tree:

instance Functor f => Functor (Free f) where
    fmap f (Pure x) = Pure (f x)
    fmap f (Free f) = Free ((fmap f) <$> f)

And so must (<*>) and (>>=):

instance Functor f => Applicative (Free f) where
    pure = Pure
    Pure f <*> Pure x = Pure (f x)
    Pure f <*> Free x = Free ((fmap f) <$> x)
    Free f <*> x      = Free ((<*> x) <$> f)

instance Functor f => Monad (Free f) where
    return = Pure
    Pure x >>= f = f x
    Free x >>= f = Free ((>>= f) <$> x)

This is especially troublesome for (>>=), since it is typical for code using do notation to deeply left-associate the bind operator, which is \(O(n^2)\). Suppose we have something like this:

data FExp a = GetSomething (Int -> a)
            | WithSomething Int (Int -> a)
            | DoSomething Int a
            deriving (Functor)

getSomething :: Free FExp Int
getSomething = liftF $ GetSomething id

withSomething :: Int -> Free FExp Int
withSomething = liftF . flip WithSomething id

doSomething :: Int -> Free FExp ()
doSomething = liftF . flip DoSomething ()

ohNo :: Free FExp ()
ohNo = do
    x <- getSomething
    y <- withSomething x
    z <- withSomething y
    doSomething z

Here the withSomething x binding must traverse Free GetSomething (...), the withSomething y binding must traverse Free GetSomething (\x -> Free WithSomething x (...)), the withSomething y binding must traverse Free GetSomething (\x -> Free WithSomething x (\y -> Free WithSomething y (...)))… This can become problematic for large monadic “blocks.”

The Control.Monad.Free.Church module in Edward Kmett’s free provides an alternative Church encoded variant:

newtype F f a = F { runF :: forall r. (a -> r) -> (f r -> r) -> r }

This type lets us take advantage of fmap fusion:

fmap f . fmap g == fmap (f . g)

and the associativity of (>>=):

(x >>= f) >>= g == x >>= (\x' -> f x' >>= g)

to ensure that transformations on F trees take place in a single pass. Edward Kmett’s Free Monads for Less part 1 and part 2 go into formal detail. Bryan O’Sullivan’s attoparsec library also takes advantage of Church encoding with its core Parser type. Church encoded data plays nicely with continuation passing style and can be used in certain performance-critical situations to short-circuit explicit pattern matching.

Module containing this post’s code.

  1. With -XFlexibleInstances one could write an instance for Show (a -> a -> a).

  2. Ertugrul Söylemez wrote a good introduction to the topic of higher-ranked polymorphism on Oliver Charles’ series 24 Days of GHC Extensions. The respective Haskell Wikibook page also provides some motivating examples.

  3. Morgensen-Scott encoding is another technique for embedding ADTs into the lambda calculus. A Haskell list is also equivalent to its foldr.