YOCaml

From data to values

Now that we can create data, we can focus on data validation, which will allow us to read the metadata attached to a document!

let pp_errors f ppf k = Yocaml.Nel.pp f ppf k
# open Yocaml ;;
# #install_printer pp_errors ;;

As with creation, data validation consists of composing a set of combinators found in the Yocaml.Data.Validation module. We previously saw that a projection for values of type t has the type t Yocaml.Data.converter. In the same way, YOCaml exposes two types to describe validations:

  • ('a, 'b) Yocaml.Data.validator, which describes a function from 'a to a validated 'b. (The notion of validation is expressed using the well-known Result type from the OCaml standard library.)

  • 'a Yocaml.Data.validable, which is a specialized version of validator that takes as an argument a value of type Yocaml.Data.t and attempts to convert it into an 'a.
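To make the relationship between these two types concrete, here is a simplified, self-contained model. It is only a sketch: the names error and data, the reduced set of constructors and the two example functions are stand-ins chosen for illustration, not YOCaml's actual definitions.

```ocaml
(* A toy model of the two validation types. Everything here is a
   stand-in: YOCaml's real error type and data representation are richer. *)
type error = string

(* Stand-in for Yocaml.Data.t, reduced to three cases. *)
type data = Null | Bool of bool | Int of int

(* A validator is a function whose outcome is a Result. *)
type ('a, 'b) validator = 'a -> ('b, error) result

(* A validable is a validator whose input is always [data]. *)
type 'a validable = (data, 'a) validator

(* A validator from [string] to [string]. *)
let non_empty : (string, string) validator =
 fun s -> if String.equal s "" then Error "empty string" else Ok s

(* A validable: it starts from the untyped representation. *)
let as_bool : bool validable = function
  | Bool b -> Ok b
  | Null | Int _ -> Error "expected a bool"
```

In this model, as_bool is usable wherever a (data, bool) validator is expected, which is the sense in which validable is a specialization of validator.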

Usually, YOCaml handles the conversion of metadata from a document into values of type Yocaml.Data.t. This is generally the role of the DATA_PROVIDER module, which is passed to functions whose purpose is to extract data associated with a document. So our task is to transform values of type Yocaml.Data.t (which is, broadly speaking, an untyped representation) into the representation of our choice!

There is an interesting duality between conversion to Yocaml.Data.t and validation from Yocaml.Data.t. Indeed, conversion is a total function: for any value that can be serialized, there exists a representation, so conversion should never fail.
Validation, however, starts from untyped information and may potentially fail, which makes validation a partial function. In a way, converting amounts to packing type information, while validating amounts to unpacking, or restoring, that type information.
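The duality can be sketched with a toy untyped representation. The data type and the two functions below are hypothetical and only stand in for Yocaml.Data.t and its combinators: conversion has a plain return type, while validation has to return a Result.

```ocaml
(* Toy untyped representation, standing in for Yocaml.Data.t. *)
type data = Bool of bool | Int of int | String of string

(* Conversion is total: every int has a representation. *)
let int_to_data (n : int) : data = Int n

(* Validation is partial: only some [data] values are ints,
   so the failure case must appear in the return type. *)
let int_from_data : data -> (int, string) result = function
  | Int n -> Ok n
  | Bool _ | String _ -> Error "expected an int"
```

Packing then unpacking always succeeds (int_from_data (int_to_data n) is Ok n), whereas unpacking an arbitrary value may not.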

Simple values

Before validating complex and structured data, which will often be the case when our data comes from documents, we will first see how to validate simple data. The goal of these validations will be to attempt to convert a Yocaml.Data.t value into a concrete OCaml value.

A first example, boolean validation

First, we will create a value and then observe it with its associated validator: Yocaml.Data.Validation.bool.

# Data.bool true |> Data.Validation.bool ;;
- : bool Data.Validation.validated_value = Ok true

You can do the same with false:

# Data.bool false |> Data.Validation.bool ;;
- : bool Data.Validation.validated_value = Ok false

We can see that validated values are wrapped in a Result and return their original types. But what happens if we try to validate an unlikely or incorrect piece of data?

When validations fail

There are multiple reasons why a validation can fail, which are captured by the errors exposed by the Yocaml.Data.Validation module.
A common error is simply that the value being validated does not have the correct shape, for example when we try to validate an integer as a boolean:

# Data.int 42 |> Data.Validation.bool ;;
- : bool Data.Validation.validated_value =
Error
 (Data.Validation.Invalid_shape
   {Yocaml.Data.Validation.expected = "bool"; given = Data.Int 42})

One might imagine that any non-zero integer would return true; however, YOCaml tries, as much as possible, to avoid implicit conversions. That said, we will see later that it is possible to compose validators to handle multiple cases.
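Since a validated value is an ordinary Result, the outcome can be handled with a plain pattern match. The sketch below assumes the yocaml library is available; describe is a hypothetical helper written for this example, and it only relies on Data.bool, Data.int and Data.Validation.bool as used above.

```ocaml
(* Assumes the yocaml library is linked.
   [describe] is a hypothetical helper, not part of YOCaml. *)
let describe (data : Yocaml.Data.t) : string =
  match Yocaml.Data.Validation.bool data with
  | Ok b -> Printf.sprintf "a boolean: %b" b
  | Error _ -> "not a boolean"

let on_bool = describe (Yocaml.Data.bool true)
let on_int = describe (Yocaml.Data.int 42)
```

Given the toplevel results shown earlier, on_bool takes the Ok branch and on_int takes the Error branch: the integer 42 is not silently treated as true.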

A bit more about implicit conversions

In principle, the fact that YOCaml tries to avoid implicit conversions as much as possible does not mean that they do not exist at all. Indeed, the different languages used to describe data (Sexp, JSON, TOML, YAML, etc.) have varying levels of expressiveness (for example, Sexp notably has fewer data types than TOML). The DATA_PROVIDER modules, whose role is to convert data from these different languages into Yocaml.Data.t, sometimes take liberties in how they interpret a value as Yocaml.Data.t. This is why some validation functions can be a bit lax at times.

Number validation

Unsurprisingly, number validation is very similar to boolean validation.
We can use the two observation functions int and float! As with booleans, we will create data of the correct shape and pass it to our observation functions:

# Data.int 42 |> Data.Validation.int ;;
- : int Data.Validation.validated_value = Ok 42

You can do the same with float:

# Data.float 42.3 |> Data.Validation.float ;;
- : float Data.Validation.validated_value = Ok 42.3

And just like with booleans, we can quickly convince ourselves of the reliability of the validation functions by trying to validate data that is objectively irrelevant:

# Data.bool true |> Data.Validation.int ;;
- : int Data.Validation.validated_value =
Error
 (Data.Validation.Invalid_shape
   {Yocaml.Data.Validation.expected = "int"; given = Data.Bool true})

Lax validation of numbers

Since JSON (through JavaScript's somewhat questionable primitive types) does not specify a distinction between floats and integers, YOCaml validation is a bit lax, allowing integers to be treated as floats and vice versa:

# Data.float 42.14 |> Data.Validation.int ;;
- : int Data.Validation.validated_value = Ok 42
# Data.int 42 |> Data.Validation.float ;;
- : float Data.Validation.validated_value = Ok 42.

In practice, YOCaml generally does a good job of distinguishing integers from floats during the DATA_PROVIDER conversion phase. However, we are not completely immune to dubious inference, hence this caution.

OCaml has other integer representations: int64, int32, nativeint, etc.
We will see later how to build validators for them (they are not supported by default in YOCaml, as their use seemed marginal for document metadata).

String validation

As with the previous types, string comes with its observation function Yocaml.Data.Validation.string. The only difference compared to the previous validators is the presence of the strict flag, which controls whether other data types can be considered as strings.

# Data.string "Hello World" |> Data.Validation.string ;;
- : string Data.Validation.validated_value = Ok "Hello World"

If we try to observe a value of a type that does not match, just like before, the validation will return an error:

# Data.int 34 |> Data.Validation.string ;;
- : string Data.Validation.validated_value =
Error
 (Data.Validation.Invalid_shape
   {Yocaml.Data.Validation.expected = "strict-string"; given = Data.Int 34})

However, it is possible to relax this constraint for cases where, for example, the string "true" might have been converted to a boolean, and we want to treat it as a string. To do this, we can set the strict flag to false:

# Data.bool true |> Data.Validation.string ~strict:false ;;
- : string Data.Validation.validated_value = Ok "true"
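Because strict is an optional argument, the relaxed behaviour can be fixed once and reused like any other validator. This is a small sketch assuming the yocaml library is available; lenient_string is a name chosen for this example.

```ocaml
(* Assumes the yocaml library is linked.
   [lenient_string] is a hypothetical name: it is the string validator
   with the strict flag disabled once and for all. *)
let lenient_string :
    Yocaml.Data.t -> string Yocaml.Data.Validation.validated_value =
  Yocaml.Data.Validation.string ~strict:false

(* The same boolean as in the example above. *)
let from_bool = lenient_string (Yocaml.Data.bool true)
```

The resulting function has the usual validator shape (Data.t to a validated value), so it can be passed wherever a validator is expected.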

Now that we have seen the basics of validation for simple data, just like with projections, let's look at richer cases by composing validators for types such as lists or options.

Composing with validation

As with projections, validating simple data is a good first step, but now we want to be able to describe the validation of more complex data!

Option validation

As with projections, we can easily validate an option (in other words, conditionally run a validator if a value exists or not). We can use the function Yocaml.Data.Validation.option:

# Data.Validation.option ;;
- : (Data.t -> 'a Data.Validation.validated_value) ->
    Data.t -> 'a option Data.Validation.validated_value
= <fun>

This allows us to lift a standard validator into an option validator. For example, with no value:

# Data.option (Data.int) None |> Data.Validation.(option int) ;;
- : int option Data.Validation.validated_value = Ok None

And with a value:

# Data.option (Data.int) (Some 10) |> Data.Validation.(option int) ;;
- : int option Data.Validation.validated_value = Ok (Some 10)

Warning: if a value exists but does not satisfy the validator, the validation function will fail! The purpose of the option validator is not to short-circuit a validation pipeline. For example, this function will return an error:

# Data.option (Data.int) (Some 10) |> Data.Validation.(option string) ;;
- : string option Data.Validation.validated_value =
Error
 (Data.Validation.Invalid_shape
   {Yocaml.Data.Validation.expected = "strict-string"; given = Data.Int 10})
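A common follow-up is to replace a missing value with a default once validation has succeeded. The sketch below assumes the yocaml library is available and uses only the standard Result and Option modules on top of option int; int_or is a hypothetical helper. Note that it does not hide errors: an ill-shaped value still fails.

```ocaml
(* Assumes the yocaml library is linked.
   [int_or] is a hypothetical helper, not part of YOCaml. *)
let int_or ~default (data : Yocaml.Data.t) =
  Yocaml.Data.Validation.(option int) data
  (* Only a successful validation is mapped; errors pass through untouched. *)
  |> Result.map (Option.value ~default)

let absent = int_or ~default:0 (Yocaml.Data.option Yocaml.Data.int None)
let present = int_or ~default:0 (Yocaml.Data.option Yocaml.Data.int (Some 10))
let ill_shaped = int_or ~default:0 (Yocaml.Data.string "foo")
```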

List validation

As with option, we have a validator that allows us to lift a standard validator into one that operates on lists, and unsurprisingly, this is Yocaml.Data.Validation.list_of.

# Data.Validation.list_of ;;
- : (Data.t -> 'a Data.Validation.validated_value) ->
    Data.t -> 'a list Data.Validation.validated_value
= <fun>

For example, we are going to validate a list of optional integers (and yes, we are anticipating a bit the composition of validators):

# Data.(list_of (option int)) 
     [None; Some 10; Some 12; None; Some 43]
  |> Data.Validation.(list_of (option int)) ;;
- : int option list Data.Validation.validated_value =
Ok [None; Some 10; Some 12; None; Some 43]

If, on the other hand, one or more elements do not satisfy the validator, all errors are reported, each with the index of the offending element:

# Data.(list [bool true; null; string "foo"; int 14]) 
  |> Data.Validation.(list_of (option int)) ;; 
- : int option list Data.Validation.validated_value =
Error
 (Data.Validation.Invalid_list
   {Yocaml.Data.Validation.errors =
     (2,
      Data.Validation.Invalid_shape
       {Yocaml.Data.Validation.expected = "int"; given = Data.String "foo"})
     (0,
      Data.Validation.Invalid_shape
       {Yocaml.Data.Validation.expected = "int"; given = Data.Bool true});
    given = [Data.Bool true; Data.Null; Data.String "foo"; Data.Int 14]})

The error is a bit verbose and hard to read, but don't worry: YOCaml reports it in a more readable way!
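To illustrate the idea of reporting every failing element rather than stopping at the first one, here is a standalone toy model using only the standard library. It is not YOCaml's implementation: validate_all and parse_int are hypothetical names, and errors are plain strings.

```ocaml
(* Toy model: validate every element and keep every failure,
   paired with the index of the element that caused it. *)
let validate_all (validate : 'a -> ('b, 'e) result) (xs : 'a list) :
    ('b list, (int * 'e) list) result =
  let step (i, oks, errs) x =
    match validate x with
    | Ok v -> (i + 1, v :: oks, errs)
    | Error e -> (i + 1, oks, (i, e) :: errs)
  in
  let _, oks, errs = List.fold_left step (0, [], []) xs in
  match errs with
  | [] -> Ok (List.rev oks)
  | _ -> Error (List.rev errs)

(* An element validator used for the demonstration. *)
let parse_int s =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error ("not an int: " ^ s)
```

With this model, validate_all parse_int ["a"; "2"; "b"] reports the failures at indices 0 and 2 together, while the valid element in the middle does not mask them.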

Record Validation

As usual, all metadata attached to a document is structured as a record. This is probably the most important section of this tutorial. As with other validations, record analysis is associated with an observation function: Yocaml.Data.Validation.record.

# Data.Validation.record ;;
- : ((string * Data.t) list -> 'a Data.Validation.validated_record) ->
    Data.t -> 'a Data.Validation.validated_value
= <fun>

Its signature is a bit different from those we have seen previously. Indeed, the function takes as an argument another function, which will be responsible for validating the fields of the record:

# Data.Validation.record 
   (fun _ -> failwith "To be done")
   (Data.int 10) ;;
- : 'a Data.Validation.validated_value =
Error
 (Data.Validation.Invalid_shape
   {Yocaml.Data.Validation.expected = "record"; given = Data.Int 10})

The function passed as an argument receives the list of fields extracted from the record. Inside it, we can use the operators let+ and and+ to validate each field of our record. This is called applicative validation: every field is checked, and all errors are collected rather than stopping at the first one. To analyze the fields of a record, YOCaml provides three essential functions, including required and optional, which are used in the examples below.

All three functions are called in the same way: required fields key validator:

# Data.Validation.required ;;
- : (string * Data.t) list ->
    string ->
    (Data.t -> 'a Data.Validation.validated_value) ->
    'a Data.Validation.validated_record
= <fun>

  • fields is the list of fields of the record (which is the argument passed to the function given to record)
  • key is the field we want to observe
  • validator is a validator (like those we have seen previously) used to validate the field.
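The way let+ and and+ collect errors can be sketched with a standalone toy model. This is an assumption-laden illustration, not YOCaml's code: errors are a list of strings, fields are string pairs, and the operators and the required function are redefined locally with the same names.

```ocaml
(* Toy applicative validation: errors are accumulated in a list. *)
type 'a validated = ('a, string list) result

let ( let+ ) (x : 'a validated) (f : 'a -> 'b) : 'b validated =
  Result.map f x

(* [and+] pairs two results and, crucially, keeps both error lists. *)
let ( and+ ) (a : 'a validated) (b : 'b validated) : ('a * 'b) validated =
  match (a, b) with
  | Ok x, Ok y -> Ok (x, y)
  | Error e1, Error e2 -> Error (e1 @ e2)
  | Error e, Ok _ | Ok _, Error e -> Error e

(* Toy counterpart of [required fields key validator]. *)
let required fields key validate =
  match List.assoc_opt key fields with
  | Some value -> validate value
  | None -> Error [ "missing field: " ^ key ]

let int_of s =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error [ "not an int: " ^ s ]

let point fields =
  let+ x = required fields "x" int_of
  and+ y = required fields "y" int_of in
  (x, y)
```

Because and+ evaluates both sides before combining them, point [] reports the two missing fields at once, which a monadic chain stopping at the first error would not do.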

A first example

Let's imagine the following type:

type my_point = {
  label: string option
; x: int
; y: int
}

We could imagine the following validation function:

let validate_point data = 
  let open Yocaml.Data.Validation in 
  record (fun fields -> 
    let+ label = optional fields "label" string 
    and+ x = required fields "x" int 
    and+ y = required fields "y" int in 
    (* Here you can do whatever you want. *)
    { label; x; y} ) data

Since the data argument is simply passed to the record function, we can even omit it:

- let validate_point data = 
+ let validate_point =
  let open Yocaml.Data.Validation in 
  record (fun fields -> 
    let+ label = optional fields "label" string 
    and+ x = required fields "x" int 
    and+ y = required fields "y" int in 
    (* Here you can do whatever you want. *)
-   { label; x; y} ) data
+   { label; x; y} )

We can verify that our validation function works correctly by trying to validate multiple pieces of data (and observe that all errors are properly captured):

When the value is not a record:

# validate_point Data.(string "a point") ;;
- : my_point Data.Validation.validated_value =
Error
 (Data.Validation.Invalid_shape
   {Yocaml.Data.Validation.expected = "record";
    given = Data.String "a point"})

When all fields are missing:

# validate_point Data.(record []) ;;
- : my_point Data.Validation.validated_value =
Error
 (Data.Validation.Invalid_record
   {Yocaml.Data.Validation.errors =
     Data.Validation.Missing_field {Yocaml.Data.Validation.field = "x"}
     Data.Validation.Missing_field {Yocaml.Data.Validation.field = "y"};
    given = []})

Since the label field is optional, it is normal that it is not listed as a missing field.

When all required fields are present:

# validate_point Data.(record [
    "x", int 10
  ; "y", int 23
  ; "label", string "my first point"
  ]) ;;
- : my_point Data.Validation.validated_value =
Ok {label = Some "my first point"; x = 10; y = 23}

Record validation is at the core of metadata validation in YOCaml, and as we can see, it allows us to apply validation steps to each field of a record to ensure that every field is independently valid before constructing arbitrary data.
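A projection can be paired with validate_point to check that a point survives a round-trip. The sketch assumes the yocaml library is available; point_to_data and round_trips are hypothetical helpers written for this example, built from the Data.record, Data.option, Data.string and Data.int combinators seen in the projection tutorial.

```ocaml
(* Assumes the yocaml library is linked. *)
type my_point = { label : string option; x : int; y : int }

(* Hypothetical projection, the dual of [validate_point]. *)
let point_to_data { label; x; y } =
  let open Yocaml.Data in
  record [ ("label", option string label); ("x", int x); ("y", int y) ]

(* Same validator as above. *)
let validate_point =
  let open Yocaml.Data.Validation in
  record (fun fields ->
      let+ label = optional fields "label" string
      and+ x = required fields "x" int
      and+ y = required fields "y" int in
      { label; x; y })

(* A point round-trips if validating its projection gives it back. *)
let round_trips point = validate_point (point_to_data point) = Ok point
```

A check such as round_trips { label = None; x = 10; y = 23 } is a cheap way to keep a projection and its validator in sync when a field is added or renamed.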

Just like with data projection, the ability to validate records allows us to build more specific validators, such as for sums, pairs, etc.

Tuple validation

As we saw in projections, having record validation unlocks the possibility to describe more complex data types. For example:

  • pair to logically validate pairs
  • triple to logically validate triples
  • quad to validate, unsurprisingly, 4-element tuples

Their operation is identical to other validators. For example, to construct a validator for triples of type bool * int * string:

# let my_triple_validator = 
     Data.Validation.(triple bool int string) ;;
val my_triple_validator :
  Data.t -> (bool * int * string) Data.Validation.validated_value = <fun>

Which we can directly test like this:

#  (true, 42, "Hello World")
   |> Data.(triple bool int string)
   |> my_triple_validator ;;
- : (bool * int * string) Data.Validation.validated_value =
Ok (true, 42, "Hello World")

The mechanism is, once again, dual to projection: we validate a record as a pair, and we describe triples as a pair of pairs, etc.
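Tuple validators compose with the other combinators. The sketch below validates a list of (string * int) pairs; it assumes the yocaml library is available and that Yocaml.Data.pair exists as the projection counterpart of the pair validator, by analogy with the triple projection used above. The two function names are chosen for this example.

```ocaml
(* Assumes the yocaml library is linked, and that [Yocaml.Data.pair]
   mirrors [Yocaml.Data.triple]. Both names below are hypothetical. *)
let entries_to_data : (string * int) list -> Yocaml.Data.t =
  Yocaml.Data.(list_of (pair string int))

let validate_entries :
    Yocaml.Data.t ->
    (string * int) list Yocaml.Data.Validation.validated_value =
  Yocaml.Data.Validation.(list_of (pair string int))

let entries = [ ("x", 10); ("y", 23) ]
let round_trip = entries |> entries_to_data |> validate_entries
```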

Sum Validation

As with products (pairs, triples, etc.), we can use record validation to describe the validation of sums. More generally, we can use the either validator:

# Data.Validation.either ;;
- : (Data.t -> 'a Data.Validation.validated_value) ->
    (Data.t -> 'b Data.Validation.validated_value) ->
    Data.t -> ('a, 'b) Either.t Data.Validation.validated_value
= <fun>

And just like with projections, there is a more generic validator, sum, which allows enumerating validators for the constructors of an arbitrary sum. For example:

# let my_either valid_left valid_right = 
    Data.Validation.sum [
      "left", valid_left
    ; "right", valid_right
    ] ;;
val my_either :
  (Data.t -> 'a Data.Validation.validated_value) ->
  (Data.t -> 'a Data.Validation.validated_value) ->
  Data.t -> 'a Data.Validation.validated_value = <fun>

Mapping

We can see that the type of my_either is noticeably different from that of either. Indeed, either is a ('a, 'b) Either.t Data.validable, while my_either is a 'a Data.validable. The two validators provided to it must return data of the same type. To obtain a validator of type ('a, 'b) Either.t Data.validable, we want to wrap the results of the two validators in Left and Right, respectively.

This operation is called map and can be invoked using the infix operator $. Its type is ('a -> ('b, 'c) Result.t) -> ('b -> 'd) -> 'a -> ('d, 'c) Result.t, in other words: v $ f constructs a validator (i.e., a function) that validates using v and, if the validation succeeds, applies f to the validated result. We can therefore rewrite my_either to have the same behavior as the either validator like this:

# let my_either valid_left valid_right = 
    let open Yocaml.Data.Validation in
    sum [
      "left", valid_left $ (fun x -> Either.Left x)
    ; "right", valid_right $ (fun x -> Either.Right x)
    ] ;;
val my_either :
  (Data.t -> ('a, Data.Validation.value_error) result) ->
  (Data.t -> ('b, Data.Validation.value_error) result) ->
  Data.t -> ('a, 'b) Either.t Data.Validation.validated_value = <fun>

The module Yocaml.Data.Validation exposes several small utility operators like $, which we will see later.
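As a smaller illustration of $, the sketch below validates an integer and then renders it as a string. It assumes the yocaml library is available; int_as_string is a hypothetical name. Following the description of $ above, v $ f should behave like applying Result.map f to the result of v.

```ocaml
(* Assumes the yocaml library is linked.
   [int_as_string] is a hypothetical validator built with [$]. *)
let int_as_string data =
  let open Yocaml.Data.Validation in
  (int $ string_of_int) data

(* The same computation written with the standard library's Result.map. *)
let int_as_string' data =
  Result.map string_of_int (Yocaml.Data.Validation.int data)
```

The mapped function only runs on success: when the underlying validator fails, its error is returned unchanged.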

There are still many things to cover before we are done with validation; however, we have seen enough to implement the validation part of the two modules we looked at in the section dedicated to data projection.

A Real World Example

Let's go back to our two modules, Gender and User, and add the validation part. You’ll see, the YOCaml API should be intuitive.

Gender Validation

As a reminder, here is what our Gender module looked like:

module Gender = struct 
  type t =
    | Male 
    | Female 
    | Other of string
    
  let to_data = 
    let open Yocaml.Data in 
    sum (function 
      | Male    -> "male",   null
      | Female  -> "female", null
      | Other s -> "other", string s 
    )
end

Now, we are going to add a from_data function, whose purpose will logically be to validate a sum:

 module Gender = struct 
   type t =
     | Male 
     | Female 
     | Other of string
    
   let to_data = 
     let open Yocaml.Data in 
     sum (function 
       | Male    -> "male",   null
       | Female  -> "female", null
       | Other s -> "other", string s 
     )
+
+  let from_data = 
+    let open Yocaml.Data.Validation in 
+    sum [
+      "male", null $ (fun () -> Male)
+    ; "female", null $ (fun () -> Female)
+    ; "other", string $ (fun g -> Other g)
+    ]
 end

As we can see, the validation function (from_data) is analogous to the projection function (to_data).

module Gender = struct 
  type t =
    | Male 
    | Female 
    | Other of string
    
  let to_data = 
    let open Yocaml.Data in 
    sum (function 
      | Male    -> "male",   null
      | Female  -> "female", null
      | Other s -> "other", string s 
    )
    
  let from_data = 
    let open Yocaml.Data.Validation in 
    sum [
      "male", null $ (fun () -> Male)
    ; "female", null $ (fun () -> Female)
    ; "other", string $ (fun g -> Other g)
    ]
end

We can now test the validation round-trip:

# Gender.from_data (Gender.(to_data Female)) ;;
- : Gender.t Data.Validation.validated_value = Ok Gender.Female
# Gender.from_data (Gender.(to_data (Other "an other gender"))) ;;
- : Gender.t Data.Validation.validated_value =
Ok (Gender.Other "an other gender")
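The same round-trip can be turned into a small check covering every constructor. The sketch assumes the yocaml library is available and repeats the Gender module so that it is self-contained; round_trips and all_round_trip are hypothetical names.

```ocaml
(* Assumes the yocaml library is linked. Same module as above. *)
module Gender = struct
  type t = Male | Female | Other of string

  let to_data =
    let open Yocaml.Data in
    sum (function
      | Male -> ("male", null)
      | Female -> ("female", null)
      | Other s -> ("other", string s))

  let from_data =
    let open Yocaml.Data.Validation in
    sum
      [
        ("male", null $ fun () -> Male);
        ("female", null $ fun () -> Female);
        ("other", string $ fun g -> Other g);
      ]
end

(* Validating a projection should give the original value back. *)
let round_trips gender = Gender.from_data (Gender.to_data gender) = Ok gender

let all_round_trip =
  List.for_all round_trips Gender.[ Male; Female; Other "an other gender" ]
```

Such a check catches a mismatch between the tags used in to_data and those listed in from_data, for example a constructor added on one side only.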

Now that we can validate genders, we can move on to validating a user!

User Validation

As a reminder, here is what our User module looked like:

module User = struct 
  type t = {
    username: string
  ; firstname: string option
  ; lastname: string option
  ; age: int
  ; gender: Gender.t
  ; identities: t list
  }
  
  let make 
    ?firstname 
    ?lastname 
    ?(identities = []) ~age ~gender username = {
      username
    ; firstname
    ; lastname
    ; age
    ; gender
    ; identities
    }
    
  let rec to_data 
    { username; firstname; lastname; 
      age; gender; identities } 
  = 
    let open Yocaml.Data in
    record [
      "username",   string username
    ; "firstname",  option string firstname
    ; "lastname",   option string lastname
    ; "age",        int age
    ; "gender",     into (module Gender) gender
    ; "identities", list_of to_data identities
    ] 
end

As with the Gender module, we will add a from_data function, which will need to validate a record. We saw the into function, which allows using a module as a projection tool. There is also the from function, which allows using a module to validate a field.

 module User = struct 
   type t = {
     username: string
   ; firstname: string option
   ; lastname: string option
   ; age: int
   ; gender: Gender.t
   ; identities: t list
   }
   
   let make 
     ?firstname 
     ?lastname 
     ?(identities = []) ~age ~gender username = {
       username
     ; firstname
     ; lastname
     ; age
     ; gender
     ; identities
     }
    
   let rec to_data 
     { username; firstname; lastname; 
       age; gender; identities } 
   = 
     let open Yocaml.Data in
     record [
       "username",   string username
     ; "firstname",  option string firstname
     ; "lastname",   option string lastname
     ; "age",        int age
     ; "gender",     into (module Gender) gender
     ; "identities", list_of to_data identities
     ]
    
+  let rec from_data data = 
+    let open Yocaml.Data.Validation in 
+    record (fun fields -> 
+      let+ username = required fields "username" string 
+      and+ firstname = optional fields "firstname" string 
+      and+ lastname = optional fields "lastname" string 
+      and+ age = required fields "age" int 
+      and+ gender = 
+        required fields "gender" (from (module Gender))
+      and+ identities = 
+        optional fields "identities" (list_of from_data) in 
+      make username ?firstname ?lastname 
+           ~age ~gender ?identities
+    ) data
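 end

Once the change is applied, the complete module reads as follows. The two users defined in the data projection tutorial, xvw1 and xvw2, are repeated after it:

module User = struct 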
  type t = {
    username: string
  ; firstname: string option
  ; lastname: string option
  ; age: int
  ; gender: Gender.t
  ; identities: t list
  }
  
  let make 
    ?firstname 
    ?lastname 
    ?(identities = []) ~age ~gender username = {
      username
    ; firstname
    ; lastname
    ; age
    ; gender
    ; identities
    }
    
  let rec to_data 
    { username; firstname; lastname; 
      age; gender; identities } 
  = 
    let open Yocaml.Data in
    record [
      "username",   string username
    ; "firstname",  option string firstname
    ; "lastname",   option string lastname
    ; "age",        int age
    ; "gender",     into (module Gender) gender
    ; "identities", list_of to_data identities
    ]
    
  let rec from_data data = 
    let open Yocaml.Data.Validation in 
    record (fun fields -> 
      let+ username = required fields "username" string 
      and+ firstname = optional fields "firstname" string 
      and+ lastname = optional fields "lastname" string 
      and+ age = required fields "age" int 
      and+ gender = required fields "gender" (from (module Gender))
      and+ identities = optional fields "identities" (list_of from_data) in 
      make username ?firstname ?lastname ~age ~gender ?identities
    ) data
end

let xvw1 = User.make ~age:36 ~gender:Gender.Male "xvw"
let xvw2 = 
  User.make
    ~identities:[xvw1; xvw1]
    ~firstname:"Xavier"
    ~lastname:"Van de Woestyne"
    ~age:36
    ~gender:(Gender.Other "male")
    "xvw2"

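Note a small subtlety: here the data argument is passed explicitly rather than eliminated, because from_data is recursive. OCaml does not accept a let rec whose right-hand side is a function application such as record (...), so the function must take its argument explicitly.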

And just like with genders, we can attempt a round-trip with xvw2, which we created during the data projections tutorial:

# xvw2 |> User.to_data |> User.from_data ;;
- : User.t Data.Validation.validated_value =
Ok
 {User.username = "xvw2"; firstname = Some "Xavier";
  lastname = Some "Van de Woestyne"; age = 36; gender = Gender.Other "male";
  identities =
   [{User.username = "xvw"; firstname = None; lastname = None; age = 36;
     gender = Gender.Male; identities = []};
    {User.username = "xvw"; firstname = None; lastname = None; age = 36;
     gender = Gender.Male; identities = []}]}

At this point, we have the opportunity to explore how to build complex validation schemas. However, a few minor frustrations become apparent! Indeed, our examples so far seem to only allow validation of a limited set of primitive types. For instance, how can we ensure that age is always positive?

In the next section, we will see how to compose and build validators to capture as many validation rules as possible.