Normal Forms
Haskell uses call-by-need (i.e. “laziness”) as its evaluation strategy. There’s no shortage of voices bellowing out “use strict data types, use lazy functions (or lazy ‘spines’);” “laziness by default was a mistake;” “laziness makes it hard to reason about performance.” Meanwhile, some Haskellers have shown concretely that strictness isn’t always the right default and that laziness is necessary to achieve good performance in some cases.
Laziness frees the programmer from ugly operational semantics hacks like “iterators,” allows for user-defined short-circuiting operations (versus strict languages, where boolean operators must use built-in magic1), and enables higher-order functions to be used ergonomically without performance sacrifices. However, there is one pitfall that is the source of all complaints about lazy evaluation:
Lazy programs will have live heap objects that are not in normal form.
What is “normal form?”
Normalization is Evaluation
GHCi has a command called :sprint
(that’s read like “s-print” and not
“sprint”2) that allows one to observe values as they are, without
performing any evaluation. We can define a list that has all of the natural
numbers:
>>> nats = [0..] :: [Integer]
But no evaluation has happened yet. :sprint
will show a
not-yet-evaluated-thing (a “thunk”) with _
:
>>> :sprint nats
nats = _
We can cause part of the nats
value to be evaluated by asking GHCi to print
some:
>>> take 3 nats
[0,1,2]
And :sprint
shows that just enough of the structure of nats
was evaluated
for GHCi to show what we asked for:
>>> :sprint nats
nats = 0 : 1 : 2 : _
Here we see that a constructor with multiple arguments can be in a state on the
heap where some of the constructor’s arguments have been fully evaluated, and
some have not. In the case of 2 : _
, we have a list cons cell whose contained
element is fully evaluated, but whose next part of the list is not. It’s
possible to end up with :
values in the opposite situation, where the next
part of the list is known but the contained element is not. Suppose we have a
list of all of the Fibonacci numbers:
>>> fibs = 0 : 1 : zipWith (+) fibs (tail fibs) :: [Integer]
Along with a predicate that checks whether or not a list of any type of element has a length of at least three; note that we don’t care about what the list elements are at all:
>>> :{
... atLeastThree :: [a] -> Bool
... atLeastThree (_:_:_:_) = True
... atLeastThree _ = False
... :}
>>> atLeastThree [1, 2]
False
>>> atLeastThree [1, 2, 3]
True
Now let’s ask GHCi if \(\infty \ge 3\):
>>> atLeastThree fibs
True
Let’s see what evaluation had to take place to evaluate atLeastThree fibs
:
>>> :sprint fibs
fibs = 0 : 1 : _ : _
Here we have a value with a mix of “fully evaluated” constructors (e.g. the
literals 0
and 1
we provided), constructors in which some of the arguments
are known (1 : _
), constructors in which no arguments are known (_ : _
), and
values about which nothing at all is known, even the “top-level” constructor
(_
). Note that GHCi does not parenthesize this value very clearly;
0 : (1 : (_ : _))
is more clear.3
“Normalization” refers to the process of applying an abstract rewriting system’s
rules (e.g. the evaluation rules of the lambda calculus) to transform terms. In
this setting, a term in “normal form” is a term which cannot be transformed by
the rules any further; it is, intuitively speaking, “fully evaluated.” When it
comes to Haskell, we can define “normalization” by analogy as “turning thunks
(e.g. _
) into concrete values by doing evaluation”. By extension, we can say
that a Haskell value is in “normal form.4” if there are no thunks left in
its structure, meaning that there’s no more evaluation that can be done.
What precisely is a “concrete value?” Rather than simply saying that it’s a value on which no more evaluation can be done, we can use the following set of rules for determining whether or not a value is in normal form:
- The value’s constructor is determined, e.g.
:sprint
shows us something other than a single_
, like it did in our very first:sprint nats
invocation above. - The constructor’s arguments, if any, are not thunks. For example,
_ : _
doesn’t obey this rule, since both its arguments are thunks. - The constructor’s arguments are in normal form. In other words, these rules must apply to all constructor arguments recursively.
In the case of the infinite values nats
and fibs
used in the definition
above, we can actually never reach normal form, because we’d have to evaluate
forever to do so. In this example, we’ll use a value that we can evaluate to
normal form:
>>> x = (1 + 1, 2 * 2) :: (Integer, Integer)
>>> :sprint x
x = (_,_)
>>> -- Not in normal form yet.
>>> fst x
2
>>> :sprint x
x = (2,_)
>>> -- Still not in normal form yet.
>>> snd x
4
>>> :sprint x
x = (2,4)
>>> -- How it's in normal form.
Weak Head Normal Form
Rather than causing evaluation to occur by having GHCi print values, base
provides a function called seq :: a -> b -> b
that can be used “force”
evaluation. This is a useful intuition, but “forcing” evaluation isn’t exactly
what seq
does, as the documentation explains:
The value of
seq a b
is bottom ifa
is bottom, and otherwise equal tob
. In other words, it evaluates the first argumenta
to weak head normal form (WHNF).seq
is usually introduced to improve performance by avoiding unneeded laziness.
Here, “bottom” just means some “divergent” computation, e.g. something that
would take infinite time to evaluate (f x = f x
) or would throw an exception
(error "slain!"
). This definition is a bit hand-wavy, but is sufficient to
explain what seq
does. In order for seq
to evaluate to “bottom” if its first
argument is bottom, it must evaluate its first argument. However, even though
seq
is introduced to control laziness, it is itself lazy, in that it does not
evaluate the first argument to “normal form” as we defined above. Instead, it
stops once the first of the normal form rules given above is satisfied,
i.e. when it has determined the first argument’s constructor. If that
constructor takes arguments and those arguments happen to be thunks, seq
doesn’t care, and the whole expression evaluates to the second argument.
In the nats
example from before, seq
stops evaluating as soon as it uncovers
the first list cons cell:
>>> nats = [0..] :: [Integer]
>>> :sprint nats
>>> nats = _
>>> seq nats ()
()
>>> :sprint nats
nats = 0 : _
While we’re playing around in GHCi, the second argument to seq
doesn’t matter,
since GHCi will evaluate whatever we ask so that it can print it. When using
seq
in a real program, typically the first argument will be a subexpression in
the second argument, so that the first argument will be evaluated before
evaluation of the containing expression proceeds. A classic example is the
strict left fold, foldl'
foldl' :: (b -> a -> b) -> b -> [a] -> b
= z
foldl' _ z [] :xs) = let z' = f z x
foldl' f z (xin seq z' (foldl' f z' xs)
Plenty of good writing already exists on how to use seq
, namely Simon Marlow’s
Parallel and Concurrent Programming in Haskell (this
chapter
in particular) and HaskellWiki’s article on
folds. Another note: in
practice it tends to be quite tedious to use seq
directly, so instead the
handy bang
patterns
language extension is typically used.
The seq
documentation uses the term “weak head normal form” to describe a
value in a state where its constructor is known, but the constructors’ arguments
may not be known. In The Implementation of Functional Programming Languages,
Simon Peyton Jones gives this more formal definition of weak head normal form:
A lambda expression is in weak head normal form (WHNF) if and only if it is of the form:
\(F \, E_1 \, E_2 \, \ldots \, E_n\)
where \(n \ge 0\)
and either \(F\) is a variable or data object, or \(F\) is a lambda abstraction or built-in function and \((F \, E_1 \, E_2 \, \ldots \, E_n)\) is not a redex for any \(m \le n\).
An expression has no top-level redex if and only if it is weak head normal form.
This extends the definition of weak head normal form to any sort of function
application in general, not simply application of data constructors. You can
think of a “top-level redex” as roughly equivalent to the lone _
that GHCi
would :sprint
when examining totally unevaluated values. The following bit is
quite important:
\((F \, E_1 \, E_2 \, \ldots \, E_n)\) is not a redex for any \(m \le n\)
This means that if \(F \, E_1\), or \(F \, E_1 \, E_2\), or so on, could be reduced, then the whole expression is not in weak head normal form. This is a formal statement of the intuitive notion from before that one must “evaluate until a constructor is determined” to reach WHNF.
For the remainder of this discussion, we will focus on the application of data
constructors, setting aside the application of arbitrary functions. Although we
can use seq
to control when function applications are reduced, we lack any
control over whether or not arbitrary lambda abstractions are in normal
form. The (->)
type is built-in magic in GHC, and it doesn’t make sense to
talk about what its “constructors” are.
deepseq
and NFData
Equipped with workable definitions of normal form and weak head normal form we
can examine (and indeed critique) a popular Haskell package called
deepseq
. This package provides a class called NFData
and a few associated
utility functions. NFData
has a single method, rnf :: a -> ()
that evaluates
its argument to normal form, then returns ()
. The documentation for the
Control.DeepSeq
module notes three useful applications of the class:
- Avoiding resource leaks in lazy IO programs.5
- Ensuring that lazy values containing divergent values do not “leak” outside the scope of their exception handler.6
- Forcing evaluation to occur on one thread that uses a synchronization primitive to send data to another thread.7
In practice, solving these sorts of issues is exactly what deepseq
is used
for. However, in the package documentation there is no discussion of what laws
NFData
instances should obey, nor is there any concrete definition of what
“normal form” means. The closest the documentation gets is mentioning that
“normal form data” is “data structures with no unevaluated components.”
Despite the lack of a solid definition of normal form, we find that almost all
of the NFData
instances provided in the package satisfy the three normal form
rules given earlier. A majority of the instances are for data types with no type
arguments, and with constructors that accept monomorphic arguments. In the case
of types like Bool
, Int
, Double
, etc., weak head normal form implies
normal form, since the data constructors take no arguments. WHNF implies that
the constructor is known, and only the first rule for normal forms is relevant
in the case of constructors with no arguments, so for such types a value in weak
head normal form is in normal form. In the case of monomorphic newtypes like
CInt
, the procedure for evaluating to normal form follows trivially from the
wrapped type.
A handful of the provided instances are structurally recursive. What should the
NFData
instance for lists look like? We can derive the implementation by
examining the normal form rules. By the first rule, we must determine if the
list value’s constructor is :
or []
. If it’s []
we’re done, since that
constructor takes no arguments. By the second rule, if the value’s constructor
is determined to be :
, we must evaluate its arguments. By the third rule, we
must evaluate those arguments to normal form; this means that the type of the
list elements must have an NFData
instance as well. As a corollary to the
third rule, in order to evaluate the list value to normal form, we must walk
down the entire length of the list. We can follow this same derivation procedure
to determine the correct instances for Maybe
, Either
, tuples, Ratio
,
Complex
, and so on.
Where normal forms can become quite confusing and where NFData
instances can
become quite controversial, is the case of types with phantom type variables. By
“phantom type variables,” I mean type constructors that take arguments that do
not appear in any of the data constructors’ arguments. This can lead to some
confusing or counterintuitive NFData
instances, but with the normal form
rules to guide us we can determine what the correct instances should be. Let’s
look at an uncontroversial instance first, the Proxy
type.
Proxy
values are meant for passing type information around at the value level,
and the type definition is quite simple:
data Proxy a = Proxy
Here we have a single constructor with no arguments, and a type constructor with
a single phantom argument. Since the type constructor takes no arguments, by the
first normal form rule, all we need to do is determine the constructor. Here’s
the NFData
instance:
instance NFData (Proxy a) where
Proxy = () rnf
Although I have never required this instance myself, it obeys the normal form
rules, and I’ve never encountered an argument that this instance should be
removed from the deepseq
package. The key difference between this instance and
the others we’ve discussed so far is that the type variable a
does not appear
in the constructor. A critical corollary: Whether or not a value of type Proxy a
is in normal form has nothing to do with any values of type a
. This
statement follows from the normal form rules, and it is absolutely critical to
bear this principle in mind when reasoning about NFData
instances for types
with phantom type variables.
Here’s another uncontroversial instance, the Const
functor. Here’s its
definition:
newtype Const a b = Const { getConst :: a }
In this case, we have one type variable that appears in a data constructor, and
one phantom type variable. From the normal form rules, this means that a
must
have an NFData
instance if we are to write an NFData
instance for Const
,
but b
does not require one, because whether or not a value of type Const a b
is in normal form has nothing to do with any values of type b
. Here is the
NFData
instance:
instance NFData a => NFData (Const a b) where
= rnf . getConst rnf
Let’s consider a trickier case, the Ptr
type. base
provides a class called
Storable
for types that can be read from and written to plain memory
addresses. A value of type Ptr a
is an address that points to some data that
may be read to produce a value of type a
, or that a value of type a
may be
written to. However, the data constructor for Ptr
only contains an Addr#
,
which is GHC’s primitive type for memory addresses, and a
is a phantom type
variable. Indeed, a function castPtr :: Ptr a -> Ptr b
is provided, which
allows us to change the pointer’s type to whatever we wish, just like a C style
cast. Here’s the type definition:
data Ptr a = Ptr Addr#
Here we have one constructor with a single argument. Addr#
is a unique in that
it is an “unboxed” type, for our purposes this just means that there are never
any thunks that can be evaluated to a lone Addr#
. If we have an Addr#
value
at all, then we know it’s evaluated. This means that Ptr a
is a type for which
WHNF implies normal form, and deepseq
provides a utility function for writing
NFData
instances for such types. Its definition is provided here, along with
the NFData
instance:
rwhnf :: a -> ()
= seq x ()
rwhnf x
instance NFData (Ptr a) where
= rwhnf rnf
I’ve noticed an extremely common misunderstanding surrounding NFData
instances
like the one for Ptr
. Because pointers can reference values, it is claimed
that some programmers might assume that using rnf
on a pointer will evaluate
the referred-to data. In the case of Ptr
, we cannot provide an rnf
that has
this functionality without using unsafePerformIO
, since the only way to read
or write the referred-to value is by doing IO. This same argument applies to
FunPtr
and ForeignPtr
.
A yet more controversial case is IORef
. IORef
is a bit like Ptr
, however,
the value referred to by an IORef
is a typical lifted heap object; it works
with any type and there is no need for a Storable
-like class for reading and
writing values. Furthermore, IORef
supports useful synchronization operations
like atomicModifyIORef
. Here’s its definition:
data IORef a = IORef (STRef RealWorld a)
The IO
type is built upon the ST
type, and an IORef
is simply a wrapper
around an STRef
with the right kind of state token. I won’t discuss the guts
of IO
and ST
further here, but this Haskell WikiBook chapter on mutable
objects provides a good
primer. Here’s the definition of STRef
:
data STRef a = STRef (MutVar# s a)
Like the Addr#
in Ptr
, a MutVar#
is always in normal form. MutVar#
is
interesting in that its first argument, the state token, behaves like a phantom
type variable (values of type
RealWorld
in the implementation of IO
behave similarly). With this information we can
determine the correct NFData
instances for STRef
and IORef
, according to
the normal form rules:
instance NFData (STRef s a) where
= rwhnf
rnf
instance NFData (IORef a) where
= rwhnf rnf
The same argument for these STRef
and IORef
instances applies to the
instances for MVar
(deepseq
currently has such an instance) and TVar
(deepseq
is, as of this writing, missing an instance for TVar
).
So far we’ve discussed arguments for the presence of certain (perhaps
unintuitive at first) NFData
instances. However, it should be noted that
deepseq
contains an instance that, according to the normal form rules,
cannot have an NFData
instance. This instance is provided for values of type
a -> b
:
instance NFData (a -> b) where
= rwhnf rnf
It is not meaningful to apply the normal form rules as they’re presented here to
values of function type. The first rule falls down: The ->
type has no
constructors available to the programmer, so there is no way to satisfy it.
A Proposal for NFData
Based on the definitions and arguments presented here, I offer the following
proposal to amend the deepseq
package as it currently exists:
- Add the normal form rules to the
deepseq
documentation as laws for theNFData
class, restating them more formally if necessary. - Add missing law-abiding instances for types in
base
, likeTVar
,ForeignPtr
, and others. - Remove law-breaking
NFData
instances, likea -> b
. - Reword package documentation to avoid any suggestion that
rnf
is a shotgun that can remove any and all thunks in its path; it’s inability to perform IO prohibits such functionality, and I believe that the perception that it provides such functionality is the root of widespread misunderstanding about what it means for value references to be in normal form.
If you ever find that you need missing NFData
instances for types in base
or
other boot packages, please add them to my
deepseq-instances
package. If you agree with the definitions I’ve formulated here, please consider
participating in discussions surrounding deepseq
on its GitHub
page and the Haskell libraries mailing
list.
Consider the C expression
a ? b : c
.b
will only be evaluated ifa
is not zero, andc
will only be evaluated ifa
is zero. It is not possible to define your own control structure in C that works with arbitrary values and has this lazy short-circuiting behavor. Go ahead and try to define your own function in C that has the exact same semantics as the ternary operator; without tricks like promoting the arguments to function types, you won’t be able to do it.↩︎There is also a
:print
GHCi command that will assign fresh names to all of the found thunks. We don’t need that feature to demonstrate laziness, though.↩︎Thanks to Lalaithion42 on Reddit for pointing this out!↩︎
In the book The Implementation of Functional Programming Languages, Simon Peyton Jones gives this definition of normal form: “If an expression contains no redexes then evaluation is complete, and the expression is said to be in normal form.” Thunks are how the RTS represents not-yet-reduced redexes at program runtume.↩︎
I wasn’t able to find a good example of this issue in the wild, but here’s a short example:
returnException :: IO (Int, Int) = do returnException let e = error "x" <- try (pure (e, e)) :: IO (Either SomeException (Int, Int)) r case r of Left e -> throwIO e Right r -> pure r
You might expect
returnException
to always throw, but instead it never throws!>>> r <- returnException >>> r (*** Exception: x CallStack (from HasCallStack): error, called at <interactive>:47:13 in interactive:Ghci11
This is because
try
is lazy. See theevaluate
function for a similar use case.↩︎See the section MVar as a Container for Shared State in Simon Marlow’s Parallel and Concurrent Programming in Haskell for a good example of this problem.↩︎