Scrap Your Constructors: Church Encoding Algebraic Types
Algebraic data types are one of Haskell’s best known and most useful features. ADTs serve as a powerful data modeling tool, and along with pattern matching and totality checking, they enable concise, readable, and checkable code. However, there’s another way to represent data in Haskell (or indeed anything else that looks a bit like the lambda calculus) that is equally powerful and in some cases preferable to ADTs.
Church Encoded Booleans in the Lambda Calculus
Church encoding uses functions (and only functions) to represent data. Since
functions are all we’ll need to represent data, we’ll use the untyped lambda
calculus for our first example. Before we jump in, a brief note on notation.
Informally, imagine writing Haskell with nothing but lambdas… Now you’re
writing in the lambda calculus, although we’ll use the conventional mathematical
notation in our examples. The lambda calculus syntax for what one would write as
\f x y -> f y x
in Haskell is \(\lambda f . \lambda x . \lambda y . f\ y\ x\).
Those dots are merely delimiters, not function composition. We’ll allow
top-level assignment for convenience, flip = \f x y -> f y x
is morally
equivalent to
\(\operatorname{flip} = \lambda f . \lambda x . \lambda y . f\ y\ x\). Most
presentations of the lambda calculus only allow for functions of a single
argument. Just think of this as making currying explicit.
Informally, Church encoding works by representing a value with a function that
can handle any possible case that could occur for a value of that type. In the
simple untyped lambda calculus, it doesn’t make much any sense to speak of a
value’s “type,” but the informal metaphor is useful. These functions work a bit
like a weird case
expression; a value is just a function that takes as
arguments the right-hand-sides of the patterns in the case, and the function
simply “chooses” a pattern by reducing to one of its arguments.
There are two choices for Boolean values, so our Church encoded Booleans will be functions of two arguments. Let’s (arbitrarily, but conventionally) use the first argument to represent the true case and the second argument to represent the false case:
\[ \begin{align} \operatorname{true} &= \lambda t . \lambda f . t \\ \operatorname{false} &= \lambda t . \lambda f . f \end{align} \]
With these definitions of true and false, it’s easy to implement a combinator for if-then-else:
\[ \operatorname{if} = \lambda c . \lambda t . \lambda f . c\ t\ f \]
Here \(c\) is our discriminating Boolean value, \(t\) is the expression \(\operatorname{if}\) reduces to if \(c = \operatorname{true}\), and \(f\) is the expression \(\operatorname{if}\) reduces to if \(c = \operatorname{false}\). Working with Church encoded Booleans, conditionals are just function application. This isn’t terribly useful on its own; let’s implement some more more Boolean operators as combinators:
\[ \begin{align} \operatorname{and }&= \lambda x . \lambda y . \lambda t . \lambda f . x\ (y\ t\ f)\ f \\ \operatorname{or} &= \lambda x . \lambda y . \lambda t . \lambda f . x\ t\ (y\ t\ f) \\ \operatorname{not} &= \lambda x . \lambda t . \lambda f . x\ f\ t \end{align} \]
To keep things straight, think of the first and second arguments to \(x\) and \(y\) in these definitions as pattern matching on true and false respectively. In order for \(\operatorname{and}\) to reduce to \(\operatorname{true}\), \(x\) must be equal to \(\operatorname{true}\) and \(y\) must be equal to \(\operatorname{true}\). The expression for the first argument of \(x\) is the \(\operatorname{true}\) case for \(x\), so all we have to do is check that \(y\) is also equal to \(\operatorname{true}\). To do that, we simply apply \(y\). Reason through \(\operatorname{or}\) yourself to check that it indeed implements logical conjunction over our definitions of \(\operatorname{true}\) and \(\operatorname{false}\). Notice that \(\operatorname{not}\) is equal to \(\operatorname{flip}\) from the notation example.
Since a Church encoded Boolean is a function of two arguments, \(\operatorname{and}\), \(\operatorname{or}\), \(\operatorname{not}\) as we’ve written them above are meant to be partially applied. Since \(\operatorname{true}\) always returns its first argument and \(\operatorname{false}\) always returns its second argument, we can write these more succinctly as:
\[ \begin{align} and &= \lambda x . \lambda y . x\ y\ x \\ or &= \lambda x . \lambda y . x\ x\ y \end{align} \]
Let’s check these definitions with GHCi. If you run ghci -XNoImplicitPrelude
we can get away with using the same names (except for \(\operatorname{if}\)).
import Prelude
= \t f -> t
true = \t f -> f
false
= b True False toBool b
Since there’s no meaningful Show
instance for functions, we’ll use toBool
to
check our answers1.
= \b t f -> b t f
iff
-- Equivalently:
= id
iff
and = \x y t f -> x (y t f) f
or = \x y t f -> x t (y t f)
not = \x t f -> x f t
-- Equivalently:
not = flip
true
and false
should behave exactly as one would expect:
>>> toBool (true `and` false)
False
>>> toBool (true `and` true)
True
>>> toBool (not (false `or` true))
False
Note that whether or not a Church encoded Boolean represents true or false is
simply whether or not the representative function returns its first or second
argument. Compare the first definitions of and
and or
above with these
definitions using plain Haskell Bool
s and case
expressions:
and x y = case x of True -> (case y of True -> True
False -> False)
False -> False
or x y = case x of True -> True
False -> (case y of True -> True
False -> False)
Church Encoded Maybe
data Maybe a = Nothing
| Just a
A Maybe
value has two “choices,” just like Booleans, so our Church encoded
maybe values will also be functions of two arguments:
\[ \begin{align} \operatorname{nothing} &= \lambda n . \lambda j . n \\ \operatorname{just} &= \lambda x . \lambda n . \lambda j . j\ x \end{align} \]
\(\operatorname{nothing}\) works just like \(\operatorname{true}\) or
\(\operatorname{false}\) in the Boolean example, but \(\operatorname{just}\) assumes
that the term passed in for the \(j\) case is a function. This is analogous to
pattern matching on Just
bringing a variable into scope. Let’s implement some
functions from the Data.Maybe
module:
\[ \begin{align} \operatorname{isNothing} &= \lambda m . m\ \operatorname{true}\ (\lambda x . \operatorname{false}) \\ \operatorname{isJust} &= \lambda m . m\ \operatorname{false}\ (\lambda x. \operatorname{true}) \\ \operatorname{fromJust} &= \lambda m . m\ \bot\ (\lambda x . x) \\ \operatorname{maybe} &= \lambda d . \lambda f . \lambda m . m\ d\ f \end{align} \]
\(\bot\) (pronounced “bottom”) is morally equivalent to undefined
in Haskell,
representing a divergent computation. Notice that \(\operatorname{maybe}\) is
simply function application, just like \(\operatorname{if}\) from the Boolean
example. Both of these operations are nothing more than total pattern matches on
their respective types. Let’s drop back into Haskell and implement these
combinators. We’ll define a type for Church encoded maybes:
{-# LANGUAGE RankNTypes #-}
module ChurchMaybe where
newtype ChurchMaybe a =
ChurchMaybe { runChurchMaybe :: forall r. r -> (a -> r) -> r }
The forall
bit is important. We’re explicitly stating that our functions
representing maybe values have to work for any possible result type r
. When
pattern matching on ADTs, the type being matched on has no bearing on the type
of the right hand side of the =
or ->
2. Equipped with forall
, we
can implement the simplified Boolean combinator definitions:
and :: (forall a. a -> a -> a)
-> (forall a. a -> a -> a)
-> (forall a. a -> a -> a)
and = \x y -> x y x
or :: (forall a. a -> a -> a)
-> (forall a. a -> a -> a)
-> (forall a. a -> a -> a)
or = \x y -> x x y
Without the explicit forall
, GHC would try (and fail) to unify
a ~ a -> a -> a
. On with the maybe combinators:
nothing :: ChurchMaybe a
= ChurchMaybe $ \n _ -> n
nothing
just :: a -> ChurchMaybe a
= ChurchMaybe $ \_ j -> j x
just x
isNothing :: ChurchMaybe a -> Bool
ChurchMaybe m) = m True (const False)
isNothing (
isJust :: ChurchMaybe a -> Bool
ChurchMaybe m) = m False (const True)
isJust (
fromJust :: ChurchMaybe a -> a
ChurchMaybe m) = m (error "fromJust") id
fromJust (
maybe :: b -> (a -> b) -> ChurchMaybe a -> b
maybe d f (ChurchMaybe m) = m d f
These functions should behave just like their Data.Maybe
counterparts, without
a single pattern match in sight! (well, except for newtype
unwrapping, but
that doesn’t count). Let’s move on to the Maybe
class instances. We’ll define
Eq
first:
instance Eq a => Eq (ChurchMaybe a) where
ChurchMaybe x) == (ChurchMaybe y) =
(True (const False)) (y False . (==)) x (y
Here there are four branches in our “pattern matches.” The first argument to x
is the nothing
case for the first (==)
argument. In this case, if y
is
also nothing
then (==)
reduces to True
, otherwise it reduces to False
.
The second argument to x
is the just
case for the first (==)
argument. In
this case, if y
is nothing
then (==)
simply reduces to False
. y
in
this context has the type forall r a. Eq a => r -> (a -> r) -> r
, so y False
has type forall a. Eq a => (a -> Bool) -> Bool
. The second argument of x
needs to have the type forall a. Eq a => a -> Bool
, so we compose
(==) :: forall a. Eq a => a -> a -> Bool
with y False
. The equivalent
eta abstracted definition:
ChurchMaybe x) == (ChurchMaybe y) = x (y True (\y' -> False))
(-> y False (\y' -> x' == y')) (\x'
The Ord
instance has a similar structure. Show
follows straightforwardly
from the corresponding Maybe
definition (we’ll omit the Read
instance for
now):
instance Ord a => Ord (ChurchMaybe a) where
compare (ChurchMaybe x) (ChurchMaybe y) =
EQ (const LT)) (y GT . compare)
x (y
instance Show a => Ord (ChurchMaybe a) where
show (ChurchMaybe m) = m "Nothing" (("Just " ++) . show)
fmap
simply composes a new function with the just
branch, pure
simply
selects the just
case, and (<*>)
has similar structure to (==)
and
compare
:
instance Functor ChurchMaybe where
fmap f (ChurchMaybe x) = ChurchMaybe $ \n j -> x n (j . f)
instance Applicative ChurchMaybe where
pure = ChurchMaybe $ \_ j -> j x
ChurchMaybe f) <*> (ChurchMaybe x) = f nothing (x nothing . (just .)) (
(>>=)
simply substitutes the just
branch with a new function that must
return a new ChurchMaybe
:
instance Monad ChurchMaybe where
pure = return
ChurchMaybe x) >>= f = x nothing f (
foldMap
from the Foldable
class has a similar structure, instead
substituting the just
branch with a new function returning any monoid:
instance Foldable ChurchMaybe where
foldMap f (ChurchMaybe x) = x mempty f
traverse
from Traversable
has the same structure, but also threads
ChurchMaybe
through an applicative functor:
instance Traversable ChurchMaybe where
traverse f (ChurchMaybe x) = x (pure nothing) ((just <$>) . f)
Finally, there’s the Monoid a => Monoid (Maybe a)
instance. This instance
simply provides an “extra” mempty
value to an existing Monoid
instance, for
example:
-- Data.Monoid provides (<>) = mappend
>>> (Just [1]) <> (Just [2])
Just [1, 2]
>>> (Just [1]) <> (Just [])
Just [1]
>>> (Just [1]) <> Nothing
Just [1]
>>> mempty :: Maybe [Int]
Nothing
Here’s the instance for ChurchMaybe
implementing this behavior:
instance Monoid a => Monoid (ChurchMaybe a) where
mempty = nothing
mappend (ChurchMaybe x) (ChurchMaybe y) =
mempty just)
x (y <$> just <*> ((just .) . mappend)) (y
If x
is nothing
, then mappend
reduces to mempty
if y
is nothing, and
wraps the underlying monoid in just
if y
is just
. Look carefully at the
second case. It’s defined applicatively, but which Applicative
instance are we
using? No peeking…
That’s right, it’s the instance for ((->) r)
. To see how this works, let’s
completely eta abstract that expression:
-> y (just x') (\y' -> just (mappend x' y'))) (\x'
Both arguments to y
are functions of x'
, so we can use the (->)
instances
to eliminate the outer lambda:
<$> just <*> (\x' -> (\y' -> just (mappend x' y'))) y
Now we can eta reduce the second y
argument, which has the type
forall a. Monoid a => a -> a -> ChurchMaybe a
. mappend
has the type
forall a. Monoid a => a -> a -> a
, and in this branch of the enclosing
definition we know we’ll be returning just
something. Eliminating y'
first:
<$> just <*> (\x' -> just . mappend x') y
(just .)
has the type (a -> b) -> a -> ChurchMaybe b
. If we let b = a
and
treat mappend
as having the type forall a. Monoid a => a -> (a -> a)
, then
we can compose (just .)
with mappend
, yielding:
<$> just <*> ((just .) . mappend) y
In practice one might not go to such trouble to fully eta reduce an expression, but it’s a great way to practice reasoning about types and flex your lambda calculus muscles. If you just want the pointfree street cred, you can get that automatically. Now try working with Church encoded lists. You could start with this type, but there are actually other ways3.
newtype ChurchList a = ChurchList {
runChurchList :: forall r. r -> (a -> ChurchList a -> r) -> r
}
Church encoded data is more than just a theoretical curiosity; there are situations in which Church encoded types are more convenient or even more performant than the equivalent ADT. Church encoded free monads are an example of the latter. If you’re unfamiliar with free monads and their applications I’d highly recommend Gabriel Gonzalez’s introduction. The type providing free monads is typically defined this way:
data Free f a = Pure a
| Free (f (Free f a))
liftF :: Functor f => f a -> Free f a
= Free . fmap pure liftF
A Functor
instance for this type must traverse the entire tree:
instance Functor f => Functor (Free f) where
fmap f (Pure x) = Pure (f x)
fmap f (Free f) = Free ((fmap f) <$> f)
And so must (<*>)
and (>>=)
:
instance Functor f => Applicative (Free f) where
pure = Pure
Pure f <*> Pure x = Pure (f x)
Pure f <*> Free x = Free ((fmap f) <$> x)
Free f <*> x = Free ((<*> x) <$> f)
instance Functor f => Monad (Free f) where
return = Pure
Pure x >>= f = f x
Free x >>= f = Free ((>>= f) <$> x)
This is especially troublesome for (>>=)
, since it is typical for code using
do notation to deeply left-associate the bind operator, which is \(O(n^2)\).
Suppose we have something like this:
data FExp a = GetSomething (Int -> a)
| WithSomething Int (Int -> a)
| DoSomething Int a
deriving (Functor)
getSomething :: Free FExp Int
= liftF $ GetSomething id
getSomething
withSomething :: Int -> Free FExp Int
= liftF . flip WithSomething id
withSomething
doSomething :: Int -> Free FExp ()
= liftF . flip DoSomething ()
doSomething
ohNo :: Free FExp ()
= do
ohNo <- getSomething
x <- withSomething x
y <- withSomething y
z doSomething z
Here the withSomething x
binding must traverse Free GetSomething (...)
, the
withSomething y
binding must traverse
Free GetSomething (\x -> Free WithSomething x (...))
, the withSomething y
binding must traverse
Free GetSomething (\x -> Free WithSomething x (\y -> Free WithSomething y (...)))
…
This can become problematic for large monadic “blocks.”
The Control.Monad.Free.Church
module in Edward Kmett’s
free
provides an alternative Church
encoded variant:
newtype F f a = F { runF :: forall r. (a -> r) -> (f r -> r) -> r }
This type lets us take advantage of fmap
fusion:
fmap f . fmap g == fmap (f . g)
and the associativity of (>>=)
:
>>= f) >>= g == x >>= (\x' -> f x' >>= g) (x
to ensure that transformations on F
trees take place in a single pass. Edward
Kmett’s Free Monads for Less
part 1 and
part 2 go into formal
detail. Bryan O’Sullivan’s
attoparsec library also takes
advantage of Church encoding with its core
Parser
type. Church encoded data plays nicely with continuation passing style and can
be used in certain performance-critical situations to short-circuit explicit
pattern matching.
Module containing this post’s code.
With
-XFlexibleInstances
one could write an instance forShow (a -> a -> a)
.↩︎Ertugrul Söylemez wrote a good introduction to the topic of higher-ranked polymorphism on Oliver Charles’ series 24 Days of GHC Extensions. The respective Haskell Wikibook page also provides some motivating examples.↩︎
I have since learned that I was mistaken in calling these “Church encoded lists.” This is actually Morgensen-Scott encoding. Kwang Yul Seo has a good exposition of Scott encoding in Haskell. A true Church encoded list is the list’s foldr.↩︎