
OCaml is (Almost) Awesome!

I wonder why OCaml isn’t at least as popular as Haskell.
5th July, 2020

For the last few days, I have been writing a small utility program called Panther. It’s a program that wraps your existing text editor into a tool for encrypting and decrypting files, thus serving as a password manager, a journaling system, and an inter-device clipboard. The relevant part, however, is that Panther is written in OCaml, and I’ve learnt a few things about OCaml now that I’d like to document for posterity.

My broad impression about OCaml is that it seems to be a robust language, save for some small parts that are clunky. The language feels a little like Fortran, in that it is small but still expressive. In many ways, the language features are intuitive and the error messages are concise and direct, which is hugely helpful in fixing problems quickly. I (mostly) like it so much that I wish I could use it at work, where I (occasionally) use Haskell.

But the language also has a few pitfalls, and after many days of writing code and tests at breakneck speed, my work has come to a screeching halt because of one such pitfall. (TLDR: I naively mixed Lwt I/O functions with non-Lwt I/O functions, and it’s causing some bizarre problems.) Specifically, concurrency (let alone parallelism) is somewhat tricky to implement, and there’s not enough consistency in how errors are handled: some implementors prefer exceptions, others prefer the result type (which is similar to Haskell’s Either type). Similarly, there is sometimes more than one way to do the same thing, such as file I/O.

Nevertheless, writing code in OCaml is generally fun, and I’d like to record a few of the things that made it fun so that I can come back to them later.

Code Structure Using Dune

Dune is a build system for OCaml, and given its (seemingly) widespread use in the OCaml community, it is a breeze to set up and use with many libraries and preprocessors. I’ve settled on the following code structure, which is similar to how Dune itself is organized.

bin/
|-- dune      <-- rules for building the application
|-- myapp.ml  <-- entry point to the application

lib/
|-- dune      <-- build rules for library, useful for writing tests
|-- module1.ml
|-- module2.ml

test/
|-- dune      <-- rules for building and executing tests
|-- test1.ml  <-- each test file is a standalone executable
|-- test2.ml  <-- each test file is a standalone executable
|-- test3.ml  <-- each test file is a standalone executable

The dune files are key, and thankfully they’re concise enough to show them here.

$ cat bin/dune
(executable
 (name myapp)
 (libraries lib))

$ cat lib/dune
(library
 (name lib)
 (libraries names-of-external-libraries-separated-by-space))

$ cat test/dune
(tests
 (names test1 test2 test3)
 (libraries base lib))

Dune offers a few handy built-in commands. dune build builds all targets, dune exec ./bin/myapp.exe -- arg1 arg2 runs the binary with the two arguments (the shorter dune exec myapp works only when the executable stanza declares a public_name), and dune runtest executes all test binaries; if any test binary exits with a non-zero exit code, the test is considered to have failed.
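To make the exit-code convention concrete, here is a minimal sketch of what a file like test/test1.ml could look like. The add function and the check helper are hypothetical; in the layout above, the function under test would live in lib/module1.ml and be called as Lib.Module1.add. It is defined inline here only so that the example stands on its own.

```ocaml
(* Hypothetical function under test; in the layout above it would live in
   lib/module1.ml and be called as Lib.Module1.add. *)
let add (x : int) (y : int) : int = x + y

(* Run one named check, print a message if it fails, and return the outcome. *)
let check (name : string) (condition : bool) : bool =
  if not condition then Printf.eprintf "FAILED: %s\n" name;
  condition

(* Evaluate every check so that all failures are reported, not just the first. *)
let run_tests () : bool =
  let results =
    [ check "add small numbers" (add 1 2 = 3);
      check "add is commutative" (add 4 5 = add 5 4) ]
  in
  List.for_all (fun passed -> passed) results

(* A non-zero exit code is what makes dune runtest report a failure. *)
let () = if not (run_tests ()) then exit 1
```

No test framework is involved in this sketch; a plain executable that exits with 1 on failure is enough for dune runtest to flag it.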

Dune also works well with ocamlformat to auto-format your code and test files. Create an empty .ocamlformat file at the root, after which dune build @fmt shows the suggested changes and dune promote applies them.

Error Handling and ppx_let

Since OCaml supports sum types, the language offers a result type that can be used to encode whether a function succeeded or failed, and the resulting status can be pattern-matched using match expressions. Here is an example.

let div (x : int) (y : int) : (int, string) result =
  match y = 0 with
  | true -> Error "division by zero"
  | false -> Ok (x / y)

let consumer (a : int) (b : int) : unit =
  match div a b with
  | Ok result -> Printf.printf "Your answer is %d.\n" result
  | Error message -> Printf.printf "Oops! We have a problem: %s.\n" message

Compared to many other languages that use return codes for signifying errors, the result type in OCaml not only makes code more concise and easier to reason about, but also more robust because the programmer is forced to deal with each potential outcome while also cleanly separating those outcomes.
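The separation of outcomes becomes even clearer when the error side is a variant rather than a string. The sketch below is an illustration under assumptions: the div_error type, safe_div, and describe are hypothetical names that do not come from Panther. If a case were missing from the match in describe, the compiler would emit a non-exhaustive pattern-matching warning.

```ocaml
(* Hypothetical error type: each failure mode gets its own constructor. *)
type div_error =
  | Zero_divisor
  | Negative_operand of int

(* Divide two non-negative integers, reporting why the division was refused. *)
let safe_div (x : int) (y : int) : (int, div_error) result =
  if y = 0 then Error Zero_divisor
  else if x < 0 then Error (Negative_operand x)
  else if y < 0 then Error (Negative_operand y)
  else Ok (x / y)

(* Every constructor must be handled; dropping a case triggers a warning. *)
let describe (outcome : (int, div_error) result) : string =
  match outcome with
  | Ok value -> Printf.sprintf "answer: %d" value
  | Error Zero_divisor -> "division by zero"
  | Error (Negative_operand n) -> Printf.sprintf "negative operand: %d" n
```

A caller can then react differently to each failure instead of parsing an error message.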

However, code can quickly become verbose as you chain multiple result-producing functions, like below.

let foo (x : int) : (int, string) result =
  (* Call `bar` and check whether there was an error. *)
  match bar x with
  | Error message -> Error message
  | Ok bar_out ->
    begin

      (* Call `hoo` and check whether there was an error. *)
      match hoo bar_out with
      | Error message -> Error message
      | Ok hoo_out ->
        begin

          (* Call `goo` and check whether there was an error. *)
          match goo hoo_out with
          | Error message -> Error message
          | Ok goo_out -> Ok (goo_out + 1)
        end
    end

To eliminate this verbosity, the ppx_let preprocessor / extension provides monadic and applicative let bindings. To set it up, install the base library and the preprocessor using opam install base ppx_let, add (preprocess (pps ppx_let)) to the appropriate target and base as a library dependency inside the dune file, and rewrite the above code as follows.

let foo (x : int) : (int, string) result =
  let open Base.Result.Let_syntax in

  (* Call `bar` and check whether there was an error. *)
  let%bind bar_out = bar x in

  (* Call `hoo` and check whether there was an error. *)
  let%bind hoo_out = hoo bar_out in

  (* Call `goo` and check whether there was an error. *)
  let%bind goo_out = goo hoo_out in

  Ok (goo_out + 1)
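For comparison, the same short-circuiting chain can be sketched without any preprocessor, using binding operators (available since OCaml 4.08) and Result.bind from the standard library. This is a different technique from ppx_let, not what the code above expands to. The bodies of bar, hoo, and goo below are hypothetical stand-ins, since the post does not define them.

```ocaml
(* Hypothetical stand-ins for the bar, hoo and goo functions above. *)
let bar (x : int) : (int, string) result =
  if x < 0 then Error "bar: negative input" else Ok (x * 2)

let hoo (x : int) : (int, string) result =
  if x > 100 then Error "hoo: input too large" else Ok (x + 3)

let goo (x : int) : (int, string) result = Ok (x - 1)

(* Binding operator: stop at the first Error, otherwise pass the value on. *)
let ( let* ) = Result.bind

let foo (x : int) : (int, string) result =
  let* bar_out = bar x in
  let* hoo_out = hoo bar_out in
  let* goo_out = goo hoo_out in
  Ok (goo_out + 1)
```

With these stand-ins, foo 5 evaluates to Ok 13, while foo (-1) stops at the first step and returns Error "bar: negative input" without calling hoo or goo.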

Concurrency

I really, really wish I knew this before I started writing OCaml code, because it seems like I am going to have to rewrite my entire application to fix the problem I never knew I had.

Perhaps I’m being too harsh, but OCaml isn’t the best language for writing concurrent programs because of the design of its runtime. To work around this, the Lwt library provides a so-called ‘promise’ type, akin to the promises behind JavaScript’s async and await keywords. Lwt also provides its own I/O functions to use in place of those from the OCaml standard library. See the Lwt manual for details.

More importantly, try not to mix Lwt I/O code with non-Lwt I/O code. I am in the midst of debugging some bizarre behaviour, wherein the contents of the file vanish after I open the input channel for that file. It's likely some race condition in my code, but I imagine that if I had used Lwt consistently throughout my code, I would not have landed in this hairy situation.
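As a rough illustration of keeping everything on the Lwt side, the sketch below writes a file and reads it back using only Lwt_io channels, sequencing the two steps with the >>= operator. It assumes the lwt and lwt.unix libraries are listed in the dune file; write_file, read_file, and round_trip are hypothetical helper names, and this is not Panther's actual code.

```ocaml
(* Assumes the lwt and lwt.unix libraries are available to the build. *)
open Lwt.Infix

(* Write [contents] to [path] through an Lwt output channel. *)
let write_file (path : string) (contents : string) : unit Lwt.t =
  Lwt_io.with_file ~mode:Lwt_io.Output path (fun oc -> Lwt_io.write oc contents)

(* Read the whole file back, again through Lwt_io rather than Stdlib. *)
let read_file (path : string) : string Lwt.t =
  Lwt_io.with_file ~mode:Lwt_io.Input path (fun ic -> Lwt_io.read ic)

(* The read starts only after the promise returned by the write resolves. *)
let round_trip (path : string) (contents : string) : string Lwt.t =
  write_file path contents >>= fun () ->
  read_file path

let () =
  let path = Filename.temp_file "lwt_sketch" ".txt" in
  let text = Lwt_main.run (round_trip path "hello") in
  print_endline text;
  Sys.remove path
```

Because both steps return promises and are chained explicitly, the order of the write and the read is fixed, which is the property that is easy to lose when Stdlib channels and Lwt channels touch the same file.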

Finally, you can use the ppx_let extension from earlier to simplify the use of Lwt code too! First, add the following code (copied from here) as a module; it is named Lwt_ppx_let in the usage further down, so the file would be lwt_ppx_let.ml.

module Let_syntax = struct
  let return = Lwt.return
  let ( >>= ) = Lwt.Infix.( >>= )
  let ( >>| ) = Lwt.Infix.( >|= )

  module Let_syntax = struct
    let bind m ~f = Lwt.bind m f
  end
end

And use it in your code as follows.

let open Lwt_ppx_let.Let_syntax in
let%bind _ = Lwt_io.eprintf "eventually ..\n" in
...

Final Thoughts

Barring a few hiccups, writing code in OCaml is a lot of fun! As with all functional programming languages, my code is concise, but I am especially glad that I could turn some run-time checks into compile-time checks, thus increasing my confidence in the code.

I wonder what factors affect the popularity of a language. It surprises me that Haskell is more popular than OCaml, especially given the difficulty of reasoning about Haskell’s lazy-by-default code and the scary error messages produced by GHC, but there's a lot that I don’t know about these languages. I wish more people would try OCaml.
