Discovering DataKinds at Runtime

Travis Whitaker - May 4, 2019

This post assumes a cursory knowledge of DataKinds.

Something almost universal to introductions to “type-level programming” in Haskell is an example involving sized vectors. Something like this:

import qualified Data.Vector as V

data SizedVector (n :: Nat) a = SizedVector (V.Vector a)
                              deriving ( Eq
                                       , Show
                                       )

This definition carries a lot of promise. It can be used to show¹ that appending two vectors yields one as long as the sum of its parts:

(++) :: SizedVector n a -> SizedVector m a -> SizedVector (n + m) a
(++) (SizedVector x) (SizedVector y) = SizedVector (x V.++ y)

Mapping over a vector doesn’t change its length:

map :: (a -> b) -> SizedVector n a -> SizedVector n b
map f (SizedVector v) = SizedVector (V.map f v)

Nor does zipping:

zipWith :: (a -> b -> c)
        -> SizedVector n a
        -> SizedVector n b
        -> SizedVector n c
zipWith f (Vector x) (Vector y) = Vector (V.zipWith f x y)

Folding doesn’t care about the length:

foldl :: (a -> b -> a) -> a -> SizedVector n b -> a
foldl f i (SizedVector x) = V.foldl f i x

We even get type-safe head, tail, init, and last:

head :: SizedVector (1 + n) a -> a
head (SizedVector x) = V.head x

tail :: SizedVector (1 + n) a -> SizedVector n a
tail (SizedVector x) = V.tail x

init :: SizedVector (n + 1) a -> SizedVector n a
init (SizedVector x) = V.init x

last :: SizedVector (n + 1) a -> a
last (SizedVector x) = V.last x

This type, these operations, and many more are available in the very nice vector-sized package.

Where the water gets murky (and where most of the type-level programming tutorials I’ve found go silent) is actually constructing a SizedVector. natVal from the KnownNat class allows us to get ahold of the Nat vector length as a value. We could compare that value with the length of some V.Vector and return a SizedVector if they match:

tryMakeSizedVector :: forall n a.
                      KnownNat n
                   => V.Vector a
                   -> Maybe (SizedVector n a)
tryMakeSizedVector x =
    let lenProxy = Proxy :: Proxy n
        retLen   = fromIntegral (natVal lenProxy)
    in if V.length x == retLen
       then Just (SizedVector x)
       else Nothing

This function looks like it will work just fine with map:

>>= :t map (+1) <$> (tryMakeSizedVector (V.fromList [1, 2, 3]))
map (+1) <$> (tryMakeSizedVector (V.fromList [1, 2, 3]))
  :: (Num b, KnownNat n) => Maybe (SizedVector n b)

But GHC complains when we try to evaluate it:

>>= map (+1) <$> (tryMakeSizedVector (V.fromList [1, 2, 3]))

<interactive> error:
    • No instance for (KnownNat n0) arising from a use of ‘it’
    • In the first argument of ‘System.IO.print’, namely ‘it’
      In a stmt of an interactive GHCi command: System.IO.print it

The issue is that we don’t really know which n we’re going to get back from tryMakeSizedVector. In this simple case we know what it should be, and if we use a type annotation it works:

>>= map (+1) <$>
        (tryMakeSizedVector (V.fromList [1, 2, 3]) :: Maybe (SizedVector 3 Int))
Just (SizedVector [2,3,4])

But this isn’t that useful. If the vector comes from outside of our program (command line argument, read from a file, received from a socket, randomly generated, etc.) we won’t be able to provide the length in a type annotation.

Unification

By encoding the length of vectors in their types, we’ve lost something. It might seem like we’ve lost the ability to mention vectors whose size we don’t know, but that’s not quite what’s going on. Consider the type of map from above, with explicit forall:

map :: forall n a b. (a -> b) -> SizedVector n a -> SizedVector n b

In the definition of map we’re clearly working with vectors whose size we don’t “know.” map works the same way for any n, a, and b, no matter what they are. And yet, the n in the signature for map is somehow different than the n in the signature for tryMakeSizedVector:

tryMakeSizedVector :: forall n a.
                      KnownNat n
                   => V.Vector a
                   -> Maybe (SizedVector n a)

The n in the signature for map expresses a relationship between the type of its second argument and the type of its result. When we gain information about either the second argument (by applying map to something) or the result (by applying something to map), we gain information about the other. Consider these expression fragments:

let x :: SizedVector 5 Int
    y = map (+1) x
    -- map :: forall n a b. (a -> b) -> SizedVector n a -> SizedVector n b
    -- n, a, and b are the free type variables.
    -- We learn from the type of x that:
    --    n = 5
    --    a = Int
    -- And we learn from the type of (+1) that:
    --    b = Int
    -- Therefore:
    -- y :: SizedVector 5 Int

\j -> let i :: Num c => SizedVector 10 c
          i = map fromIntegral j
          -- map :: forall n a b. (a -> b) -> SizedVector n a -> SizedVector n b
          -- n, a, and b are the free type variables.
          -- We learn from the type of i that:
          --    n = 10
          --    a = forall c. Num c => c
          -- And we learn from the type of fromIntegral that:
          --    b = forall d. Integral d => d
          -- Therefore:
          -- j :: forall d. Integral d => SizedVector 10 d

Even if the argument and/or result of map aren’t explicitly annotated with types, the type of map can be used to gain information about the relationship between the types of the arguments and the result. This process of discovering information about terms by applying relationships is called unification. There’s even an entire programming language whose evaluation model is based on unification. Contrast this with the information we gain by applying tryMakeSizedVector to its argument:

let x :: V.Vector Int
    y = tryMakeSizedVector x
    -- tryMakeSizedVector :: forall n a.
    --                       KnownNat n
    --                    => V.Vector a
    --                    -> Maybe (SizedVector n a)
    -- n and a are the free type variables.
    -- We learn from the type of x that:
    --    a = Int
    -- Therefore:
    -- y :: forall n. KnownNat n => Maybe (SizedVector n Int)

In the case of tryMakeSizedVector, we can only gain information about a from the argument. n is a new universally quantified type variable introduced by tryMakeSizedVector without any relationship to other terms. We can concretize n by annotating the result of tryMakeSizedVector or assigning it to a term with a more concrete type:

>>= map (+1) <$>
        (tryMakeSizedVector (V.fromList [1, 2, 3]) :: Maybe (SizedVector 3 Int))
Just (SizedVector [2,3,4])

This reveals the source of the problem: in order to do this in a useful way, we need access to the length of the argument vector at the type level, the opposite of what natVal does.

Existential Quantification

GHC.TypeNats contains this definition:

-- | This type represents unknown type-level natural numbers.
data SomeNat = forall n. KnownNat n => SomeNat (Proxy n)

By introducing a type variable on the right hand side of the definition (instead of introducing it in the type constructor on the left hand side), we can “hide” the type. The type is “hidden” in the sense that it does not appear in the type constructor, and a function passed a value of type SomeNat cannot learn the specific n that was hidden inside it, nor express any relationship between n an another type. If SomeNat is defined with GADT syntax, the analogy with something like tryMakeSizedVector becomes a bit clearer:

data SomeNat where
    SomeNat :: forall n. KnownNat n => Proxy n -> SomeNat

tryMakeSizedVector :: forall n a.
                      KnownNat n
                   => V.Vector a
                   -> Maybe (SizedVector n a)

tryMakeSizedVector allows a new universally quantified type variable to be introduced, while the SomeNat constructor allows a universally quantified type variable to be consumed or “forgotten.”

GHC.TypeLits contains this handy definition:

someNatVal :: Integer -> Maybe SomeNat

This looks sort of like what we’re after. someNatVal allows us to hoist an arbitrary Integer up to the type level, hidden inside of SomeNat. Maybe is there simply because Nats must be positive, so someNatVal will return Nothing if the argument is negative. The guts of someNatVal aren’t too interesting since Nat is a bit magical in GHC. However, all the machinery is available to define our own SomeSizedVector and someSizedVectorVal:

-- Definition with forall:
data SomeSizedVector a =
    forall n. KnownNat n => SomeSizedVector (SizedVector n a)

-- Definition with GADT syntax:
data SomeSizedVector a where
    SomeSizedVector :: KnownNat n => SizedVector n a -> SomeSizedVector a

someSizedVectorVal :: V.Vector a -> SomeSizedVector a
someSizedVectorVal x =
    case someNatVal (fromIntegral (V.length x)) of
        Nothing -> error "negative length?!"
        Just (SomeNat (_ :: Proxy n)) ->
            SomeSizedVector (SizedVector x :: SizedVector n a)

Note that we don’t need to hide the a (the type of vector elements), just the n (the type-level vector length). Let’s see if map works:

>>= case someSizedVectorVal (V.fromList [1, 2, 3]) of
            SomeSizedVector sv -> map (+1) sv

<interactive> error:
    • Couldn't match expected type ‘p’
                  with actual type ‘SizedVector
                                      n Integer’
        because type variable ‘n’ would escape its scope
      This (rigid, skolem) type variable is bound by
        a pattern with constructor:
          SomeSizedVector :: forall a (n :: Nat).
                             KnownNat n =>
                             SizedVector n a -> SomeSizedVector a,
        in a case alternative
        at <interactive>:13:51-68
    • In the expression: map (+ 1) sv
      In a case alternative: SomeSizedVector sv -> map (+ 1) sv
      In the expression:
        case someSizedVectorVal (V.fromList [1, 2, 3]) of {
          SomeSizedVector sv -> map (+ 1) sv }
    • Relevant bindings include
        sv :: SizedVector n integer-gmp-1.0.2.0:GHC.Integer.Type.Integer
          (bound at <interactive>:13:67)
        it :: p (bound at <interactive>:13:1)

What would the type of this expression look like if it somehow did typecheck? The type of someSizedVectorVal (V.fromList [1, 2, 3]) is Num a => SomeSizedVector a. The pattern match SomeSizedVector sv yields an sv with the type forall n. KnownNat n => SizedVector n a. a is the same type from Num a => SomeSizedVector a, but n is the type hidden by the SomeSizedVector constructor, and map will leak it: the type of map (+1) sv would be something like exists n. KnownNat n => SizedVector n a. All we know is that some n existed when SomeSizedVector was constructed. If this type leaked out into the rest of the program, there’d be no way for it to unify with anything. There’s no way to recover the type-level information; using the SomeSizedVector destroys the type.

The type n needs to be “consumed” by something with a type like forall n. KnownNat n => (forall r. SizedVector n a -> r) where r is not allowed to reference n. map clearly doesn’t fit this criteria: the returned vector is known to have the same length as the argument, so n will leak to the result. However, foldl can be used in a way that meets these requirements:

>>= case someSizedVectorVal (V.fromList [1, 2, 3]) of
            SomeSizedVector sv -> foldl (+) 0 sv
6

In fact, map can be used as part of the expression on the right hand side of the pattern match, as long as the type of the whole expression doesn’t reference the forbidden n:

>>= case someSizedVectorVal (V.fromList [1, 2, 3]) of SomeSizedVector sv ->
            foldl (+) 0 (map (*5) sv)
30

Quantified Continuations

Every expression that uses a SizedVector whose size isn’t known up front will need to have this same structure:

\f x -> case someSizedVectorVal x of SomeSizedVector sv -> f sv

If the type of x is V.Vector a, then by applying the type of the SomeSizedVector constructor we know that the type of f must be something like forall r. (forall n. KnownNat n => SizedVector n a) -> r. With the RankNTypes language extension, we can write a function that takes a function like f as an argument, short-circuiting the SomeSizedVector type:

withSizedVector :: forall a r.
                   V.Vector a
                -> (forall n. KnownNat n => SizedVector n a -> r)
                -> r
withSizedVector x f =
    case someNatVal (fromIntegral (V.length x)) of
        Nothing -> error "negative length?!"
        Just (SomeNat (Proxy :: Proxy n)) ->
            f (SizedVector x :: SizedVector n a)

This is a bit easier to use than pattern matching on SomeSizedVector:

>>= withSizedVector (V.fromList [1, 2, 3]) (foldl (+) 0 . map (*5))
30

The type of withSizedVector provides a nice intuition too: it allows an arbitrary vector to be injected into a type-safe bubble, as long as the expression in the bubble can handle a sized vector of any length, and as long as the type of the result doesn’t leak the length of the vector in its type. The length can appear at the term-level, of course:

>>= withSizedVector (V.fromList [1, 2, 3, 4, 5]) (foldl (\i _ -> i+1) 0)
5

Programming with this pattern makes explicit the interface between a safe model with checked constraints and arbitrary values from the cold, harsh, outside world.

Enforcing Model Invariants with DataKinds

I recently worked on a project involving software components that handle streams of video. Each component is modeled as a video transformer: it consumes video of some resolution, colorspace, framerate, etc. and produces a new video stream with possibly different properties. Some components can only produce video (e.g. reading a video stream from a file or capturing it from a camera), while others only consume video (e.g. encoding the video into a bytestream).

One obvious invariant for such an API is that two components can only be connected if the upstream component produces video that the downstream component knows how to consume. Let’s make a tiny version of this API that only checks if the components have compatible resolutions:

data VideoRes = VideoRes {
                    width :: Nat
                  , height :: Nat
                  }
              | None -- ^ No video consumed/produced at this end.

data VideoStream (inVideo :: VideoRes) (outVideo :: VideoRes) where
    VideoStream :: VideoStream inVideo outVideo

In reality the VideoStream constructor would contain a record of functions providing the stream component’s implementation. Type families can be used to encode how a component modifies the resolution:

type family ResMult (n :: Nat) (res :: VideoRes) :: VideoRes
type instance ResMult n ('VideoRes w h) = 'VideoRes (n * w) (n * h)
type instance ResMult n 'None = 'None

Now we can mock a camera, video upscaler, and encoder:

camera :: Proxy ('VideoRes w h)
          -- ^ Configure the camera to output this resolution.
       -> VideoStream 'None ('VideoRes w h)
camera = undefined

upscaler :: Proxy n
            -- ^ Scale up by a factor of n.
          -> VideoStream inVideo (ResMult n inVideo)
upscaler = undefined

-- | An encoder should be able to accept video of any resolution.
encoder :: VideoStream ('VideoRes w h) 'None
encoder = undefined

Stream composition looks a bit like function composition:

(|>) :: VideoStream a b
     -> VideoStream b c
     -> VideoStream a c
(|>) = undefined

In fact, it looks so much like function composition that we can write a Category instance:

instance Category VideoStream where
    id  = undefined :: VideoStream a a
    (.) = flip (|>)

Here’s a video processing pipeline that grabs video from a camera, doubles the frame, then encodes it. The type is VideoStream 'None 'None because this composition of streams is closed; a camera only produces video, and an encoder only consumes it.

doubleScaler :: Proxy ('VideoRes w h) -> VideoStream 'None 'None
doubleScaler proxyRes = camera proxyRes
                     |> upscaler (Proxy :: Proxy 2)
                     |> encoder

It’d be nice to be able to configure the camera dynamically at runtime, so we’ll need a function like someNatVal and someSizedVectorVal to hoist a term level resolution up to the type level:

dynRes :: Integer -- ^ Width.
       -> Integer -- ^ Height.
       -> (forall w h. (KnownNat w, KnownNat h) =>
               Maybe (Proxy ('VideoRes w h)) -> a)
       -> a
dynRes w h f =
    case someNatVal w of
        Just (SomeNat (_ :: Proxy wn)) ->
            case someNatVal h of
                Just (SomeNat (_ :: Proxy hn)) ->
                    f (Just (Proxy :: Proxy ('VideoRes wn hn)))
                Nothing -> f (Nothing :: Maybe (Proxy ('VideoRes 0 0)))
        Nothing -> f (Nothing :: Maybe (Proxy ('VideoRes 0 0)))

Now we can provide a double scaler that has its model checked by the typechecker and is configurable at runtime:

dynDoubleScaler :: Integer -> Integer -> Maybe (VideoStream 'None 'None)
dynDoubleScaler w h = dynRes w h (fmap doubleScaler)

I hope that these examples have improved your intuition for how to use DataKinds in a dynamic way. When I first began using types like this I struggled to find a good explanation of how to actually use them in a real program.

I have a lot more to say about using quantification in Haskell programs. Understanding how to use forall to solve problems is the gateway to many modern type system features, and the attendant intuitions offer guidance even when working in less-mathematically-inspired languages; I’ve found it helps to light the way when designing C++ templates.

Module containing this post’s code.

By “show” I really mean “promise.” These functions don’t constitute a proof that e.g. the length of x ++ y is the length of x plus the length of y. Instead, their types simply carry a promise that this is true, and indeed we could misbehave with the SizedVector constructor and make a function with a bogus promise. We could use our own inductive natural type instead of Nat to really prove that this is true, but something like Agda, Idris, or Coq would be better suited for this purpose, and I’ve found very little Haskell code in the wild that contains such proofs. The promise is a useful compromise.↩︎