# Backwards Compatible Configuration with Constraint Kinds

Travis Whitaker - April 16, 2019

Programs must often handle data whose specification or schema evolves over time. A changing schema presents challenges at large scales in the form of database migrations or ETL, and anyone who has worked in data science has an idea of how much time this can consume. Schema changes have impacts on smaller scales as well, e.g. an application’s configuration file.

Consider an application configuration like the following. Dhall is a great solution for configuration:

data Config1 = Config1 {
conf1FilePath :: FilePath
, conf1Workers  :: Natural
} deriving (Show, Generic, Interpret)

parseConfig1 :: Text -> IO (Maybe Config1)
parseConfig1 t = extract auto <$> inputExpr t Here Interpret is a class from the Dhall package that we may derive given Generic. It provides us a way to marshal a Dhall expression into a Config1; check out Dhall’s excellent tutorial for more details. For now, all that matters is that we’ve got a function that will attempt to parse our configuration given the text of our configuration file. Later on the configuration schema might need to be modified. Suppose the conf1UserName field is now optional, and we need a host name as well: data Config2 = Config2 { conf2FilePath :: FilePath , conf2UserName :: Maybe Text , conf2Workers :: Natural , conf2HostName :: Text } deriving (Show, Generic, Interpret) Users will already have configuration files that obey the Config1 schema, but they clearly won’t obey the Config2 schema. It’d be nice to make use of the user’s existing configuration file by migrating it to the new schema if possible. Better still would be offering to write the migrated configuration back to disk. There are several syntax-oriented ways of dealing with this problem. If we were using JSON, YAML, or even an IDL like Google’s ProtocolBuffers, we could follow syntactic prescriptions such as “make all fields optional” and get backwards compatibility “for free.” However, there are numerous consequences to this approach. Firstly, the syntactic prescriptions (which, in the case of ProtocolBuffers, are numerous) require the semantics of the schema to be arbitrarily obscured. If we follow the “all fields are optional” rule, there are likely to be fields in the schema that are syntactically optional but practically required. Handling the absence of these specified-optional-but-actually-required fields is kicked down the road. Secondly, the changes to the semantics between schemas won’t be clear from the difference in syntax alone. Again picking on the “all fields are optional rule,” if this is obeyed, a field’s transition from required to optional or vice versa doesn’t show up in the syntax at all! Dhall itself presents a syntax-oriented way of handling such schema changes: the combine operator (//) may be used to combine two records, with a bias towards the right-hand record when fields collide. This would allow the use of default // userConfig, where default is some default configuration value obeying the most recent schema, and userConfig contains only updates to this default value. This is robust to some schema changes (e.g. adding new fields) but not others. The most natural thing to do would be to state the changes to the schema semantics explicitly. Consider a class for this: class Compat a where type Pred a :: * migrate :: Pred a -> a The Pred associated type indicates a schema we can migrate from. If we build a linear chain of Compat instances bridging each schema version, we can recursively migrate our way up to the latest schema version. Here’s an instance bridging Config2 back to Config1: instance Compat Config2 where type Pred Config2 = Config1 migrate c1 = Config2 { conf2FilePath = conf1FilePath c1 , conf2UserName = Just (conf1UserName c1) , conf2Workers = conf1Workers c1 , conf2HostName = "http://example.com" } Compat allows us to write down exactly the change in semantics from one schema version to another. Some readers may be keen to point out that there exist all manner of record-oriented language extensions and lens functions that could be employed to shorten the definition of migrate. That’s true, and they’d be useful in practice, but for clarity’s sake they will not be used here. Now we can write a parseConfig2 that can handle encountering a configuration file meant for Config1: parseConfig2 :: Text -> IO (Maybe Config2) parseConfig2 t = do dhallExpr <- inputExpr t let maybeConfig1 = extract auto dhallExpr :: Maybe Config1 maybeConfig2 = extract auto dhallExpr :: Maybe Config2 pure$ case maybeConfig2 of
Nothing -> migrate <$> maybeConfig1 config2 -> config2 This simple definition of Compat illustrates this idea nicely, but has a few practical problems. Consider a future in which we’ve made it all the way to configuration schema version ConfigN. Our definition of parseConfigN would be something like: parseConfigN :: Text -> IO (Maybe ConfigN) parseConfigN t = do dhallExpr <- inputExpr t let maybeConfig1 = extract auto dhallExpr :: Maybe Config1 maybeConfig2 = extract auto dhallExpr :: Maybe Config2 .... maybeConfigN = extract auto dhallExpr :: Maybe ConfigN pure$ case maybeConfigN of
Nothing -> case  ...
...
configN -> configN

Ideally we wouldn’t need to change the definition of our parseConfig function when the schema evolves. A new type for the new schema and a Compat instance should be all that’s required. Additionally, it’d be handy to have a less restrictive type for the migrate method. For some schemas, the migration might be partial and need to fail with an error, or the result might be paired with a warning message for the user. Let’s tackle this second point first, and generalize migrate:

class Compat a where
type Pred a    :: *
type CompatF a :: * -> *
migrate :: Pred a -> (CompatF a) a

The CompatF associated type allows an instance to say which type the migration result will be wrapped up in. Recall that the kind * includes all inhabited types; things like Int, ByteString, and saturated type constructors like Maybe Int, [Int], and Either String Int. The kind * -> * includes unsaturated type constructors with one free argument, like Maybe, (->) Int, Either String, and IO. Here’s our new instance:

instance Compat Config2 where
type Pred Config2 = Config1
type CompatF Config2 = Maybe
migrate c1 = Just $Config2 { conf2FilePath = conf1FilePath c1 , conf2UserName = Just (conf1UserName c1) , conf2Workers = conf1Workers c1 , conf2HostName = "http://example.com" } Let’s use Maybe for CompatF Config2 to allow migrate to signal failure by returning Nothing. Now that migrate returns a Maybe Config2, we can fix up our previous definition of parseConfig2 by swapping out migrate <$> maybeConfig1 for maybeConfig1 >>= migrate:

parseConfig2 :: Text -> IO (Maybe Config2)
parseConfig2 t = do
dhallExpr <- inputExpr t
let maybeConfig1 = extract auto dhallExpr :: Maybe Config1
maybeConfig2 = extract auto dhallExpr :: Maybe Config2
pure $case maybeConfig2 of Nothing -> maybeConfig1 >>= migrate config2 -> config2 Let’s further extend the schema. Suppose that we’ve changed our minds about the conf2HostName field. "http://example.com" is now the only choice users have, so we’ll remove the field and fail to migrate any user configurations that used a hostname other than "http://example.com": data Config3 = Config3 { conf3FilePath :: FilePath , conf3UserName :: Maybe Text , conf3Workers :: Natural } deriving (Show, Generic, Interpret) instance Compat Config3 where type Pred Config3 = Config2 type CompatF Config3 = Maybe migrate c2 | conf2HostName c2 == "http://example.com" = Just$ Config3 {
conf3FilePath = conf2FilePath c2
, conf3Workers  = conf2Workers c2
}
| otherwise = Nothing

Here’s parseConfig3, but it’s beginning to become repetitive:

parseConfig3 :: Text -> IO (Maybe Config3)
parseConfig3 t = do
dhallExpr <- inputExpr t
let maybeConfig1 = extract auto dhallExpr :: Maybe Config1
maybeConfig2 = extract auto dhallExpr :: Maybe Config2
maybeConfig3 = extract auto dhallExpr :: Maybe Config3
pure (   maybeConfig3
<|> (maybeConfig2 >>= migrate)
<|> (maybeConfig1 >>= migrate >>= migrate)
)

The handy <|> operator from the Alternative class helps a bit, but it’d be nicer if this function was generic. How about something like this:

parseCompatConfig :: ( Interpret a
, Compat a
, Maybe ~ (CompatF a)
)
=> Text
-> IO ((CompatF a) a)
parseCompatConfig t = do
dhallExpr <- inputExpr t
let migrateConfig expr =
extract auto expr <|> (migrateConfig expr >>= migrate)
pure (migrateConfig dhallExpr)

The type has changed, but the definition has similar structure: the Monad instance for Maybe is used to pass a Maybe (Pred a) through to Migrate, and the Alternative instance for Maybe is used to stop recursion once we successfully get a configuration value from extract auto expr. This new type signature warrants some explanation.

First, the Interpret constraint comes from Dhall. This is what gives us access to extract auto expr :: Interpret a => Maybe a. The Compat constraint gives us Pred a, CompatF a, and migrate. The Maybe ~ (CompatF a) bit is called an equality constraint. Maybe provides the Alternative and Monad instances we want to use with extract auto expr, so this constraint checks that the Compat instance for the type we’re parsing uses Maybe for CompatF. The only problem with this definition is that it doesn’t typecheck. Specifically, GHC 8.6.4 says:

• Could not deduce: Pred a ~ a
from the context: (Interpret a, Compat a, Maybe ~ CompatF a)
bound by the type signature for:
parseCompatConfig :: forall a.
(Interpret a, Compat a, Maybe ~ CompatF a) =>
Text -> IO (CompatF a a)

Indeed, Pred a ~ a isn’t what we want at all. The inferred type of migrateConfig is the likely culprit. Using typed holes we can query GHC for the inferred type. Feeding it this:

parseCompatConfig :: ( Interpret a
, Compat a
, Maybe ~ (CompatF a)
)
=> Text
-> IO ((CompatF a) a)
parseCompatConfig t = do
dhallExpr <- inputExpr t
let migrateConfig :: _
migrateConfig expr =
extract auto expr <|> (migrateConfig expr >>= migrate)
pure (migrateConfig dhallExpr)

Yields the inferred type:

• Found type wildcard ‘_’ standing for ‘Expr Src X -> Maybe a’
Where: ‘a’ is a rigid type variable bound by
the type signature for:
parseCompatConfig :: forall a.
(Interpret a, Compat a, Maybe ~ CompatF a) =>
Text -> IO (CompatF a a)

The issue is that we can only get back a Maybe a from migrateConfig, where a is the type of the config we’re returning from parseCompatConfig. However, the recursive call to migrateConfig needs to return a Maybe (Pred a), so GHC figures that a ~ Pred a must be true. What we really need is something like this:

parseCompatConfig :: ( Interpret a
, Compat a
, Maybe ~ (CompatF a)
)
=> Text
-> IO ((CompatF a) a)
parseCompatConfig t = do
dhallExpr <- inputExpr t
let migrateConfig :: ( Interpret b
, Compat b
, Maybe ~ CompatF b
)
=> Expr Src X -> Maybe b
migrateConfig expr =
extract auto expr <|> (migrateConfig expr >>= migrate)
pure (migrateConfig dhallExpr)

The Expr Src X type comes from Dhall, and is just the type of parsed, typechecked, and normalized Dhall expressions. Now GHC says:

• Could not deduce: CompatF (Pred b) ~ Maybe
arising from a use of ‘migrateConfig’
from the context: (Interpret a, Compat a, Maybe ~ CompatF a)
bound by the type signature for:
parseCompatConfig :: forall a.
(Interpret a, Compat a, Maybe ~ CompatF a) =>
Text -> IO (CompatF a a)

CompatF (Pred b) ~ Maybe means that we need to know that the Compat instance for Pred a must also use Maybe for CompatF. No problem, let’s just add that constraint:

parseCompatConfig :: ( Interpret a
, Compat a
, Maybe ~ (CompatF a)
)
=> Text
-> IO ((CompatF a) a)
parseCompatConfig t = do
dhallExpr <- inputExpr t
let migrateConfig :: ( Interpret b
, Compat b
, Maybe ~ CompatF b
, Maybe ~ CompatF (Pred b)
)
=> Expr Src X -> Maybe b
migrateConfig expr =
extract auto expr <|> (migrateConfig expr >>= migrate)
pure (migrateConfig dhallExpr)

Now GHC says:

• Could not deduce: CompatF (Pred (Pred b)) ~ Maybe
arising from a use of ‘migrateConfig’
from the context: (Interpret a, Compat a, Maybe ~ CompatF a)
bound by the type signature for:
parseCompatConfig :: forall a.
(Interpret a, Compat a, Maybe ~ CompatF a) =>
Text -> IO (CompatF a a)

No problem, let’s just add that constraint:

parseCompatConfig :: ( Interpret a
, Compat a
, Maybe ~ (CompatF a)
)
=> Text
-> IO ((CompatF a) a)
parseCompatConfig t = do
dhallExpr <- inputExpr t
let migrateConfig :: ( Interpret b
, Compat b
, Maybe ~ CompatF b
, Maybe ~ CompatF (Pred b)
, Maybe ~ CompatF (Pred (Pred b))
)
=> Expr Src X -> Maybe b
migrateConfig expr =
extract auto expr <|> (migrateConfig expr >>= migrate)
pure (migrateConfig dhallExpr)

Now GHC says:

• Could not deduce: CompatF (Pred (Pred (Pred b))) ~ Maybe
arising from a use of ‘migrateConfig’
from the context: (Interpret a, Compat a, Maybe ~ CompatF a)
bound by the type signature for:
parseCompatConfig :: forall a.
(Interpret a, Compat a, Maybe ~ CompatF a) =>
Text -> IO (CompatF a a)

If we continue we’ll have an infinitely large constraint, which isn’t that useful. Let’s try adding the constraint to Compat instead:

class (Compat (Pred a), CompatF a ~ CompatF (Pred a)) => Compat a where
type Pred a :: *
type CompatF a :: * -> *
migrate :: Pred a -> (CompatF a) a

But this is just a shorter way of writing the same infinite constraint from earlier, and GHC isn’t fooled:

• Superclass cycle for ‘Compat’
one of whose superclasses is ‘Compat’
Use UndecidableSuperClasses to accept this

Even if we allowed such an infinite constraint, that’s not quite what we want. Eventually the chain of Compat instances will end with our original schema type. What’s needed is a way to enforce the Compat (Pred a) constraint only when a is not the final type in the Compat chain. This is where the ConstraintKinds extension comes in.

Anything that is allowed to appear to the left of => can be manipulated like any other type when ConstraintKinds is enabled. These new types have the kind Constraint. They can be aliased with type synonyms, used in type families, and used as arguments to type constructors that have the appropriate kind signature. It’s this last bit that will give us the behavior we need for Compat.

For a given constraint, we can build a value that represents a proof of that constraint. Consider Functor as an example:

data FunctorProof f where
FunctorProof :: ((a -> b) -> f a -> f b) -> FunctorProof f

listFunctorProof :: FunctorProof []
listFunctorProof = FunctorProof map

maybeFunctorProof :: FunctorProof Maybe
maybeFunctorProof = FunctorProof (\f m -> maybe Nothing (Just . f) m)

functorFunctorProof :: Functor f => FunctorProof f
functorFunctorProof = FunctorProof fmap

A value of type FunctorProof f is as good as the constraint Functor f =>, and indeed this FunctorProof type contains exactly the function that would be inside a Functor dictionary after GHC lowers Haskell to Core. With ConstraintKinds, we can make a general proof type that works for any constraint:

data Proof :: Constraint -> * where
Proof :: a => Proof a

type FunctorProof f = Proof (Functor f)

functorFunctorProof :: Functor f => FunctorProof f
functorFunctorProof = Proof

This simple data type let’s us do something interesting: we can build a Proof in a scope where we know the constraint is true, pass the value elsewhere, and then unpack the value to take advantage of the constraint. Edward Kmett’s constraints package calls this type Dict instead of Proof (we’ll switch to Dict from here on out), and includes lots of utilities for doing useful things with these proofs.

Back to the Compat class. The behavior we want is the enforcement of the constraint (Compat (Pred a), CompatF a ~ CompatF (Pred a)) when the Compat instance is not for the original schema type. This constraint ensures the connectedness of of the chain of Compat instances, and also guarantees that the same type is used for CompatF for each instance in the chain. The final link in the chain will be a Compat instance for the original schema type. There can be no meaningful Compat (Pred a) instance for this final type, so the chain continuity constraint must be elided for the final instance.

One way to achieve this is to add an optional Dict to the Compat class:

class Compat a where
type Pred a ::
type CompatF a :: * -> *
migrate :: Pred a -> (CompatF a) a
continue :: Proxy a
-> Maybe (Dict ( Compat (Pred a)
, CompatF a ~ CompatF (Pred a)
)
)

The continue method now allows each instance to provide its own proof of the continuity constraint if it’s available. The instances for Config2 and Config3 now look like this:

instance Compat Config2 where
type Pred Config2 = Config1
type CompatF Config2 = Maybe
migrate c1 = Just $Config2 { conf2FilePath = conf1FilePath c1 , conf2UserName = Just (conf1UserName c1) , conf2Workers = conf1Workers c1 , conf2HostName = "http://example.com" } continue _ = Just Dict instance Compat Config3 where type Pred Config3 = Config2 type CompatF Config3 = Maybe migrate c2 | conf2HostName c2 == "http://example.com" = Just$ Config3 {
conf3FilePath = conf2FilePath c2
, conf3Workers  = conf2Workers c2
}
| otherwise = Nothing
continue _ = Just Dict

We’ll now need an instance for Config1 as well, although it doesn’t do much:

instance Compat Config1 where
type Pred Config1 = Void
type CompatF Config1 = Maybe
migrate _ = Nothing
continue _ = Nothing

Since Config1 is our original schema, there’s no meaningful values of type Pred Config1, so the uninhabited type Void is suitable. CompatF Config1 needs to match our other instances. There are no preceding values, so Migrate can only return Nothing. And because this is the final link in the Compat chain, we have no proof of the continuity constraint, so there’s no Dict we can return from continue.

In the definition of parseCompatConfig, we need to use continue to check if the constraints for migrateConfig’s recursive case can be proven. If the current Compat instance provides a proof, we can recurse, otherwise we’re at the end of the compatibility chain.

parseCompatConfig :: ( Interpret a
, Compat a
, Maybe ~ (CompatF a)
)
=> Text
-> IO ((CompatF a) a)
parseCompatConfig t = do
dhallExpr <- inputExpr t
let migrateConfig :: forall b.
( Interpret b
, Compat b
, Maybe ~ CompatF b
)
=> Expr Src X -> Maybe b
migrateConfig expr =
let next = case continue (Proxy :: Proxy b) of
Just Dict -> migrateConfig expr >>= migrate
Nothing   -> Nothing
in extract auto expr <|> next
pure (migrateConfig dhallExpr)

The forall in the type signature comes from the ScopedTypeVariables language extension. This extension is needed because the b in the Proxy :: Proxy b signature is the same b from the migrateConfig type signature, not a new universally quantified b. Unfortunately, GHC still doesn’t like this:

• Could not deduce (Interpret (Pred b))
arising from a use of ‘migrateConfig’
from the context: (Interpret a, Compat a, Maybe ~ CompatF a)
bound by the type signature for:
parseCompatConfig :: forall a.
(Interpret a, Compat a, Maybe ~ CompatF a) =>
Text -> IO (CompatF a a)

The Dict that may be provided by the continue method does not provide a proof of Interpret (Pred b). In general, there may be arbitrary additional constraints that are necessary to obtain a value of each schema type. When using Dhall to parse the configurations, Interpret is a necessary constraint, so in addition to a proof of Compat (Pred a), a proof of Interpret (Pred a) is needed. There’s no need to settle on Interpret in particular; the Compat class can check that any constraint holds for the entire compatibility chain:

class Compat a where
type Pred a             :: *
type CompatConstraint a :: * -> Constraint
type CompatF a          :: * -> *
migrate  :: Pred a -> (CompatF a) a
continue :: Proxy a
-> Maybe (Dict ( Compat (Pred a)
, (CompatConstraint a) (Pred a)
, CompatConstraint a ~ CompatConstraint (Pred a)
, CompatF a ~ CompatF (Pred a)
)
)

CompatConstraint is a new associated type that provides any additional constraints needed to obtain the values of each schema. continue now also provides a proof of (CompatConstraint a) (Pred a) (which provides whatever is needed to obtain configuration values), and CompatConstraint a ~ CompatConstraint (Pred a), which ensures that the same CompatConstraint is used for each instance in the Compat chain. Here are the updated instances:

instance Compat Config1 where
type Pred Config1 = Void
type CompatConstraint Config1 = Interpret
type CompatF Config1 = Maybe
migrate _ = Nothing
continue _ = Nothing

instance Compat Config2 where
type Pred Config2 = Config1
type CompatConstraint Config2 = Interpret
type CompatF Config2 = Maybe
migrate c1 = Just $Config2 { conf2FilePath = conf1FilePath c1 , conf2UserName = Just (conf1UserName c1) , conf2Workers = conf1Workers c1 , conf2HostName = "http://example.com" } continue _ = Just Dict instance Compat Config3 where type Pred Config3 = Config2 type CompatConstraint Config3 = Interpret type CompatF Config3 = Maybe migrate c2 | conf2HostName c2 == "http://example.com" = Just$ Config3 {
conf3FilePath = conf2FilePath c2
, conf3Workers  = conf2Workers c2
}
| otherwise = Nothing
continue _ = Just Dict

Here’s the updated definition of parseCompatConfig:

parseCompatConfig :: ( Interpret a
, Compat a
, Maybe ~ (CompatF a)
, Interpret ~ (CompatConstraint a)
)
=> Text
-> IO ((CompatF a) a)
parseCompatConfig t = do
dhallExpr <- inputExpr t
let migrateConfig :: forall b.
( Interpret b
, Compat b
, Maybe ~ CompatF b
, Interpret ~ (CompatConstraint b)
)
=> Expr Src X -> Maybe b
migrateConfig expr =
let next = case continue (Proxy :: Proxy b) of
Just Dict -> migrateConfig expr >>= migrate
Nothing   -> Nothing
in extract auto expr <|> next
pure (migrateConfig dhallExpr)

Now we have a working parseCompatConfig. This function may be passed a configuration file encoding a Config1, a Config2, a Config3, or any future schema, and will (if possible) return a value of the latest schema with a Compat instance.

Consider the migrateConfig definition. The only Dhall-specific parts of the definition are the Interpret constraints, the expr argument, and what we do with it. extract auto expr returns something of type forall a. Interpret a => Maybe a. Both the Interpret constraint and the Maybe type constructor are also stipulated by our Compat instances. We can generalize the Interpret constraint by simply using the CompatConstraint associated type, and we can generalize Maybe by using the CompatF associated type, with the addition of Alternative and Monad constraints, since we want to use <|> and >>= with the CompatF type. migrateConfig can now be replaced with the much more general getCompatible:

getCompatible
:: forall a.
( Compat a
, (CompatConstraint a) a
, Alternative (CompatF a)
)
=> (forall c. (Compat c, (CompatConstraint a) c) => (CompatF a) c)
-> (CompatF a) a
getCompatible f =
let f' = case continue (Proxy :: Proxy a) of
Nothing   -> empty
Just Dict -> getCompatible f >>= migrate
in f <|> f'

This will work with any chain of Compat instances where CompatF has Alternative and Monad instances. Defining parseCompatConfig in terms of getCompatible:

parseCompatConfig :: ( Interpret a
, Compat a
, Maybe ~ (CompatF a)
, Interpret ~ (CompatConstraint a)
)
=> Text
-> IO ((CompatF a) a)
parseCompatConfig t = do
expr <- inputExpr t
pure (getCompatible (extract auto expr))

I’ve found this Compat class works well for managing changing configuration schemas. This particular implementation has a few practical drawbacks. In the worst case, parsing values from older schemas has a cost linear in the length of the Compat chain. This cost will likely be unnoticeable for parsing configuration files, but isn’t ideal for anything larger-scale. In the latter case, incorporating a version tag in the data schema that tells the parser which schema revision to try should yield better performance. The version tags could be computed from the Compat chain automatically with type-level naturals, and type families could be used to give migrate a function type that accepts any predecessor value, instead of the most recent predecessor, allowing the migration chain to be short-circuited when possible for the sake of performance.

A potential semantic pitfall with this approach is that the parser used to yield configuration values to getCompatible (which is extract auto in the examples in this post) might return early if it is unable to distinguish between an older and a newer schema. Consider these two problematic schemas:

data Bad1 = Bad1 {
}
Suppose that our configuration has an IP address field. In the first version of the schema, the IP address must be an IPv4 address. In the (very stingy) second schema revision, the IP address must be an IPv6 address. Since both schema types simply use a String to represent IP addresses, a naive parser that can’t tell the difference between an IPv4 and IPv6 address will always return a Bad2 value, even if the configuration file contains an IPv4 address. The Compat class as written here offers no help in this situation, since it relies on parse failure to walk up the migration chain. This situation may be avoided by always using the richest, most specific schema types possible.
A tiny Haskell package providing Compat is available on GitHub