Discovering DataKinds at Runtime
This post assumes a cursory knowledge of DataKinds
.
Something almost universal to introductions to “type-level programming” in Haskell is an example involving sized vectors. Something like this:
import qualified Data.Vector as V
data SizedVector (n :: Nat) a = SizedVector (V.Vector a)
deriving ( Eq
Show
, )
This definition carries a lot of promise. It can be used to show1 that appending two vectors yields one as long as the sum of its parts:
(++) :: SizedVector n a -> SizedVector m a -> SizedVector (n + m) a
++) (SizedVector x) (SizedVector y) = SizedVector (x V.++ y) (
Mapping over a vector doesn’t change its length:
map :: (a -> b) -> SizedVector n a -> SizedVector n b
map f (SizedVector v) = SizedVector (V.map f v)
Nor does zipping:
zipWith :: (a -> b -> c)
-> SizedVector n a
-> SizedVector n b
-> SizedVector n c
zipWith f (Vector x) (Vector y) = Vector (V.zipWith f x y)
Folding doesn’t care about the length:
foldl :: (a -> b -> a) -> a -> SizedVector n b -> a
foldl f i (SizedVector x) = V.foldl f i x
We even get type-safe head
, tail
, init
, and last
:
head :: SizedVector (1 + n) a -> a
head (SizedVector x) = V.head x
tail :: SizedVector (1 + n) a -> SizedVector n a
tail (SizedVector x) = V.tail x
init :: SizedVector (n + 1) a -> SizedVector n a
init (SizedVector x) = V.init x
last :: SizedVector (n + 1) a -> a
last (SizedVector x) = V.last x
This type, these operations, and many more are available in the very nice vector-sized package.
Where the water gets murky (and where most of the type-level programming
tutorials I’ve found go silent) is actually constructing a
SizedVector
. natVal
from the KnownNat
class allows us to get ahold of the
Nat
vector length as a value. We could compare that value with the length of
some V.Vector
and return a SizedVector
if they match:
tryMakeSizedVector :: forall n a.
KnownNat n
=> V.Vector a
-> Maybe (SizedVector n a)
=
tryMakeSizedVector x let lenProxy = Proxy :: Proxy n
= fromIntegral (natVal lenProxy)
retLen in if V.length x == retLen
then Just (SizedVector x)
else Nothing
This function looks like it will work just fine with map
:
>>= :t map (+1) <$> (tryMakeSizedVector (V.fromList [1, 2, 3]))
map (+1) <$> (tryMakeSizedVector (V.fromList [1, 2, 3]))
:: (Num b, KnownNat n) => Maybe (SizedVector n b)
But GHC complains when we try to evaluate it:
>>= map (+1) <$> (tryMakeSizedVector (V.fromList [1, 2, 3]))
<interactive> error:
• No instance for (KnownNat n0) arising from a use of ‘it’
• In the first argument of ‘System.IO.print’, namely ‘it’
In a stmt of an interactive GHCi command: System.IO.print it
The issue is that we don’t really know which n
we’re going to get back from
tryMakeSizedVector
. In this simple case we know what it should be, and if we
use a type annotation it works:
>>= map (+1) <$>
(tryMakeSizedVector (V.fromList [1, 2, 3]) :: Maybe (SizedVector 3 Int))
Just (SizedVector [2,3,4])
But this isn’t that useful. If the vector comes from outside of our program (command line argument, read from a file, received from a socket, randomly generated, etc.) we won’t be able to provide the length in a type annotation.
Unification
By encoding the length of vectors in their types, we’ve lost something. It
might seem like we’ve lost the ability to mention vectors whose size we don’t
know, but that’s not quite what’s going on. Consider the type of map
from
above, with explicit forall
:
map :: forall n a b. (a -> b) -> SizedVector n a -> SizedVector n b
In the definition of map
we’re clearly working with vectors whose size we
don’t “know.” map
works the same way for any n
, a
, and b
, no matter what
they are. And yet, the n
in the signature for map
is somehow different than
the n
in the signature for tryMakeSizedVector
:
tryMakeSizedVector :: forall n a.
KnownNat n
=> V.Vector a
-> Maybe (SizedVector n a)
The n
in the signature for map
expresses a relationship between the type of
its second argument and the type of its result. When we gain information about
either the second argument (by applying map
to something) or the result (by
applying something to map
), we gain information about the other. Consider
these expression fragments:
let x :: SizedVector 5 Int
= map (+1) x
y -- map :: forall n a b. (a -> b) -> SizedVector n a -> SizedVector n b
-- n, a, and b are the free type variables.
-- We learn from the type of x that:
-- n = 5
-- a = Int
-- And we learn from the type of (+1) that:
-- b = Int
-- Therefore:
-- y :: SizedVector 5 Int
-> let i :: Num c => SizedVector 10 c
\j = map fromIntegral j
i -- map :: forall n a b. (a -> b) -> SizedVector n a -> SizedVector n b
-- n, a, and b are the free type variables.
-- We learn from the type of i that:
-- n = 10
-- a = forall c. Num c => c
-- And we learn from the type of fromIntegral that:
-- b = forall d. Integral d => d
-- Therefore:
-- j :: forall d. Integral d => SizedVector 10 d
Even if the argument and/or result of map
aren’t explicitly annotated with
types, the type of map
can be used to gain information about the relationship
between the types of the arguments and the result. This process of discovering
information about terms by applying relationships is called
unification. There’s even an entire
programming language
whose evaluation model is based on unification.
Contrast this with the information we gain by applying tryMakeSizedVector
to
its argument:
let x :: V.Vector Int
= tryMakeSizedVector x
y -- tryMakeSizedVector :: forall n a.
-- KnownNat n
-- => V.Vector a
-- -> Maybe (SizedVector n a)
-- n and a are the free type variables.
-- We learn from the type of x that:
-- a = Int
-- Therefore:
-- y :: forall n. KnownNat n => Maybe (SizedVector n Int)
In the case of tryMakeSizedVector
, we can only gain information about a
from
the argument. n
is a new universally quantified type variable introduced by
tryMakeSizedVector
without any relationship to other terms. We can concretize
n
by annotating the result of tryMakeSizedVector
or assigning it to a term
with a more concrete type:
>>= map (+1) <$>
(tryMakeSizedVector (V.fromList [1, 2, 3]) :: Maybe (SizedVector 3 Int))
Just (SizedVector [2,3,4])
This reveals the source of the problem: in order to do this in a useful way, we
need access to the length of the argument vector at the type level, the opposite
of what natVal
does.
Existential Quantification
GHC.TypeNats
contains this definition:
-- | This type represents unknown type-level natural numbers.
data SomeNat = forall n. KnownNat n => SomeNat (Proxy n)
By introducing a type variable on the right hand side of the definition (instead
of introducing it in the type constructor on the left hand side), we can “hide”
the type. The type is “hidden” in the sense that it does not appear in the type
constructor, and a function passed a value of type SomeNat
cannot learn the
specific n
that was hidden inside it, nor express any relationship between n
an another type. If SomeNat
is defined with GADT syntax, the analogy with
something like tryMakeSizedVector
becomes a bit clearer:
data SomeNat where
SomeNat :: forall n. KnownNat n => Proxy n -> SomeNat
tryMakeSizedVector :: forall n a.
KnownNat n
=> V.Vector a
-> Maybe (SizedVector n a)
tryMakeSizedVector
allows a new universally quantified type variable to be
introduced, while the SomeNat
constructor allows a universally quantified type
variable to be consumed or “forgotten.”
GHC.TypeLits
contains this handy definition:
someNatVal :: Integer -> Maybe SomeNat
This looks sort of like what we’re after. someNatVal
allows us to hoist an
arbitrary Integer
up to the type level, hidden inside of SomeNat
. Maybe
is
there simply because Nat
s must be positive, so someNatVal
will return
Nothing
if the argument is negative. The guts of someNatVal
aren’t too
interesting since Nat
is a bit magical in GHC. However, all the machinery is
available to define our own SomeSizedVector
and someSizedVectorVal
:
-- Definition with forall:
data SomeSizedVector a =
forall n. KnownNat n => SomeSizedVector (SizedVector n a)
-- Definition with GADT syntax:
data SomeSizedVector a where
SomeSizedVector :: KnownNat n => SizedVector n a -> SomeSizedVector a
someSizedVectorVal :: V.Vector a -> SomeSizedVector a
=
someSizedVectorVal x case someNatVal (fromIntegral (V.length x)) of
Nothing -> error "negative length?!"
Just (SomeNat (_ :: Proxy n)) ->
SomeSizedVector (SizedVector x :: SizedVector n a)
Note that we don’t need to hide the a
(the type of vector elements), just the
n
(the type-level vector length). Let’s see if map
works:
>>= case someSizedVectorVal (V.fromList [1, 2, 3]) of
SomeSizedVector sv -> map (+1) sv
<interactive> error:
• Couldn't match expected type ‘p’
with actual type ‘SizedVector
n Integer’
because type variable ‘n’ would escape its scope
This (rigid, skolem) type variable is bound by
a pattern with constructor:
SomeSizedVector :: forall a (n :: Nat).
KnownNat n =>
SizedVector n a -> SomeSizedVector a,
in a case alternative
at <interactive>:13:51-68
• In the expression: map (+ 1) sv
In a case alternative: SomeSizedVector sv -> map (+ 1) sv
In the expression:
case someSizedVectorVal (V.fromList [1, 2, 3]) of {
SomeSizedVector sv -> map (+ 1) sv }
• Relevant bindings include
sv :: SizedVector n integer-gmp-1.0.2.0:GHC.Integer.Type.Integer
(bound at <interactive>:13:67)
it :: p (bound at <interactive>:13:1)
What would the type of this expression look like if it somehow did typecheck?
The type of someSizedVectorVal (V.fromList [1, 2, 3])
is Num a => SomeSizedVector a
. The pattern match SomeSizedVector sv
yields an sv
with
the type forall n. KnownNat n => SizedVector n a
. a
is the same type from
Num a => SomeSizedVector a
, but n
is the type hidden by the
SomeSizedVector
constructor, and map
will leak it: the type of map (+1) sv
would be something like exists n. KnownNat n => SizedVector n a
. All we know
is that some n
existed when SomeSizedVector
was constructed. If this type
leaked out into the rest of the program, there’d be no way for it to unify with
anything. There’s no way to recover the type-level information; using the
SomeSizedVector
destroys the type.
The type n
needs to be “consumed” by something with a type like forall n. KnownNat n => (forall r. SizedVector n a -> r)
where r
is not allowed to
reference n
. map
clearly doesn’t fit this criteria: the returned vector is
known to have the same length as the argument, so n
will leak to the
result. However, foldl
can be used in a way that meets these requirements:
>>= case someSizedVectorVal (V.fromList [1, 2, 3]) of
SomeSizedVector sv -> foldl (+) 0 sv
6
In fact, map
can be used as part of the expression on the right hand side of
the pattern match, as long as the type of the whole expression doesn’t reference
the forbidden n
:
>>= case someSizedVectorVal (V.fromList [1, 2, 3]) of SomeSizedVector sv ->
foldl (+) 0 (map (*5) sv)
30
Quantified Continuations
Every expression that uses a SizedVector
whose size isn’t known up front will
need to have this same structure:
-> case someSizedVectorVal x of SomeSizedVector sv -> f sv \f x
If the type of x
is V.Vector a
, then by applying the type of the
SomeSizedVector
constructor we know that the type of f
must be something
like forall r. (forall n. KnownNat n => SizedVector n a) -> r
. With the
RankNTypes
language extension, we can write a function that takes a function
like f
as an argument, short-circuiting the SomeSizedVector
type:
withSizedVector :: forall a r.
V.Vector a
-> (forall n. KnownNat n => SizedVector n a -> r)
-> r
=
withSizedVector x f case someNatVal (fromIntegral (V.length x)) of
Nothing -> error "negative length?!"
Just (SomeNat (Proxy :: Proxy n)) ->
SizedVector x :: SizedVector n a) f (
This is a bit easier to use than pattern matching on SomeSizedVector
:
>>= withSizedVector (V.fromList [1, 2, 3]) (foldl (+) 0 . map (*5))
30
The type of withSizedVector
provides a nice intuition too: it allows an
arbitrary vector to be injected into a type-safe bubble, as long as the
expression in the bubble can handle a sized vector of any length, and as long as
the type of the result doesn’t leak the length of the vector in its type. The
length can appear at the term-level, of course:
>>= withSizedVector (V.fromList [1, 2, 3, 4, 5]) (foldl (\i _ -> i+1) 0)
5
Programming with this pattern makes explicit the interface between a safe model with checked constraints and arbitrary values from the cold, harsh, outside world.
Enforcing Model Invariants with DataKinds
I recently worked on a project involving software components that handle streams of video. Each component is modeled as a video transformer: it consumes video of some resolution, colorspace, framerate, etc. and produces a new video stream with possibly different properties. Some components can only produce video (e.g. reading a video stream from a file or capturing it from a camera), while others only consume video (e.g. encoding the video into a bytestream).
One obvious invariant for such an API is that two components can only be connected if the upstream component produces video that the downstream component knows how to consume. Let’s make a tiny version of this API that only checks if the components have compatible resolutions:
data VideoRes = VideoRes {
width :: Nat
height :: Nat
,
}| None -- ^ No video consumed/produced at this end.
data VideoStream (inVideo :: VideoRes) (outVideo :: VideoRes) where
VideoStream :: VideoStream inVideo outVideo
In reality the VideoStream
constructor would contain a record of functions
providing the stream component’s implementation. Type families can be used to
encode how a component modifies the resolution:
type family ResMult (n :: Nat) (res :: VideoRes) :: VideoRes
type instance ResMult n ('VideoRes w h) = 'VideoRes (n * w) (n * h)
type instance ResMult n 'None = 'None
Now we can mock a camera, video upscaler, and encoder:
camera :: Proxy ('VideoRes w h)
-- ^ Configure the camera to output this resolution.
-> VideoStream 'None ('VideoRes w h)
= undefined
camera
upscaler :: Proxy n
-- ^ Scale up by a factor of n.
-> VideoStream inVideo (ResMult n inVideo)
= undefined
upscaler
-- | An encoder should be able to accept video of any resolution.
encoder :: VideoStream ('VideoRes w h) 'None
= undefined encoder
Stream composition looks a bit like function composition:
(|>) :: VideoStream a b
-> VideoStream b c
-> VideoStream a c
|>) = undefined (
In fact, it looks so much like function composition that we can write a
Category
instance:
instance Category VideoStream where
id = undefined :: VideoStream a a
.) = flip (|>) (
Here’s a video processing pipeline that grabs video from a camera, doubles the
frame, then encodes it. The type is VideoStream 'None 'None
because this
composition of streams is closed; a camera
only produces video, and an
encoder
only consumes it.
doubleScaler :: Proxy ('VideoRes w h) -> VideoStream 'None 'None
= camera proxyRes
doubleScaler proxyRes |> upscaler (Proxy :: Proxy 2)
|> encoder
It’d be nice to be able to configure the camera
dynamically at runtime, so
we’ll need a function like someNatVal
and someSizedVectorVal
to hoist a term
level resolution up to the type level:
dynRes :: Integer -- ^ Width.
-> Integer -- ^ Height.
-> (forall w h. (KnownNat w, KnownNat h) =>
Maybe (Proxy ('VideoRes w h)) -> a)
-> a
=
dynRes w h f case someNatVal w of
Just (SomeNat (_ :: Proxy wn)) ->
case someNatVal h of
Just (SomeNat (_ :: Proxy hn)) ->
Just (Proxy :: Proxy ('VideoRes wn hn)))
f (Nothing -> f (Nothing :: Maybe (Proxy ('VideoRes 0 0)))
Nothing -> f (Nothing :: Maybe (Proxy ('VideoRes 0 0)))
Now we can provide a double scaler that has its model checked by the typechecker and is configurable at runtime:
dynDoubleScaler :: Integer -> Integer -> Maybe (VideoStream 'None 'None)
= dynRes w h (fmap doubleScaler) dynDoubleScaler w h
I hope that these examples have improved your intuition for how to use
DataKinds
in a dynamic way. When I first began using types like this I
struggled to find a good explanation of how to actually use them in a real
program.
I have a lot more to say about using quantification in Haskell
programs. Understanding how to use forall
to solve problems is the gateway to
many modern type system features, and the attendant intuitions offer guidance
even when working in less-mathematically-inspired languages; I’ve found it helps
to light the way when designing C++ templates.
Module containing this post’s code.
By “show” I really mean “promise.” These functions don’t constitute a proof that e.g. the length of
x ++ y
is the length ofx
plus the length ofy
. Instead, their types simply carry a promise that this is true, and indeed we could misbehave with theSizedVector
constructor and make a function with a bogus promise. We could use our own inductive natural type instead ofNat
to really prove that this is true, but something like Agda, Idris, or Coq would be better suited for this purpose, and I’ve found very little Haskell code in the wild that contains such proofs. The promise is a useful compromise.↩︎