OCaml is (Almost) Awesome!
For the last few days, I have been writing a small utility program called Panther. It’s a program that wraps your existing text editor into a tool for encrypting and decrypting files, thus serving as a password manager, a journaling system, and an inter-device clipboard. The relevant part, however, is that Panther is written in OCaml, and I’ve learnt a few things about OCaml now that I’d like to document for posterity.
My broad impression about OCaml is that it seems to be a robust language, save for some small parts that are clunky. The language feels a little like Fortran, in that it is small but still expressive. In many ways, the language features are intuitive and the error messages are concise and direct, which is hugely helpful in fixing problems quickly. I (mostly) like it so much that I wish I could use it at work, where I (occasionally) use Haskell.
But the language also has a few pitfalls, and after many days of writing code and tests at breakneck speed, my work has come to a screeching halt because of one such pitfall. (TLDR: I naively mixed Lwt I/O functions with non-Lwt I/O functions, and it’s causing some bizarre problems.) Specifically, concurrency (let alone parallelism) is somewhat tricky to implement, and there’s not enough consistency in how errors are handled: some implementors prefer exceptions, others prefer the result type (which is similar to Haskell’s Either type). Similarly, sometimes there is more than one way to perform the same thing, such as file I/O.
Nevertheless, writing code in OCaml is generally fun, and I’d like to record a few of the things that made it fun so that I can come back to them later.
Code Structure Using Dune
Dune is a build system for OCaml, and given its (seemingly) widespread use in the OCaml community, it is a breeze to set up and use with many libraries and preprocessors. I’ve settled on the following code structure, which is similar to how Dune itself is organized.
    bin/
    |-- dune        <-- rules for building the application
    |-- myapp.ml    <-- entry point to the application
    lib/
    |-- dune        <-- build rules for library, useful for writing tests
    |-- module1.ml
    |-- module2.ml
    test/
    |-- dune        <-- rules for building and executing tests
    |-- test1.ml    <-- each test file is a standalone executable
    |-- test2.ml    <-- each test file is a standalone executable
    |-- test3.ml    <-- each test file is a standalone executable
The dune files are key, and thankfully they’re concise enough to show here.
    $ cat bin/dune
    (executable
     (name myapp)
     (libraries lib))

    $ cat lib/dune
    (library
     (name lib)
     (libraries names-of-external-libraries-separated-by-space))

    $ cat test/dune
    (tests
     (names test1 test2 test3)
     (libraries base lib))
Dune offers a few handy built-in commands. dune build builds the default targets, dune exec ./bin/myapp.exe -- arg1 arg2 runs the binary with the two arguments (the shorter dune exec myapp arg1 arg2 works only if the executable also declares a public_name), and dune runtest executes all test binaries; if any test binary exits with a non-zero exit code, the test is considered to have failed.
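As a rough sketch of what one of these standalone test executables might look like: the file name test1.ml comes from the layout above, but the add function and the two checks are made up for illustration, and a real test would call into lib instead.

```ocaml
(* test/test1.ml: a hypothetical standalone test executable.
   [add] stands in for a function that would normally live in lib/. *)
let add (x : int) (y : int) : int = x + y

(* Run one named check, printing a message to stderr if it fails. *)
let check (name : string) (ok : bool) : bool =
  if not ok then Printf.eprintf "FAILED: %s\n" name;
  ok

let run_tests () : bool =
  (* Every check is evaluated, so all failures get reported. *)
  let results =
    [ check "add small numbers" (add 1 2 = 3);
      check "add with zero" (add 0 5 = 5) ]
  in
  List.for_all (fun ok -> ok) results

(* A non-zero exit code is what tells `dune runtest` that the test failed. *)
let () = if not (run_tests ()) then exit 1
```

No test framework is involved here; the exit code alone carries the verdict, which is all that the tests stanza needs.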
Dune also works well with ocamlformat to auto-format your code and test files. Create an empty .ocamlformat file at the root, after which dune build @fmt shows the suggested changes and dune promote applies the suggested changes.
Error Handling and ppx_let
Since OCaml supports sum types, the language offers a result type that can be used to encode whether a function succeeded or failed, and this resulting status can be pattern-matched using match expressions. Here is an example.
    let div (x : int) (y : int) : (int, string) result =
      match y = 0 with
      | true -> Error "division by zero"
      | false -> Ok (x / y)

    let consumer (a : int) (b : int) : unit =
      match div a b with
      | Ok result -> Printf.printf "Your answer is %d.\n" result
      | Error message -> Printf.printf "Oops! We have a problem: %s.\n" message
Compared to many other languages that use return codes for signifying errors, the result type in OCaml not only makes code more concise and easier to reason about, but also more robust because the programmer is forced to deal with each potential outcome while also cleanly separating those outcomes.
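Because some libraries raise exceptions while others return result, it can help to convert between the two styles at the boundary. The following is a minimal sketch using only the standard library; parse_int and get_exn are hypothetical helper names, not part of any library mentioned in this post.

```ocaml
(* Wrap the exception-raising [int_of_string] from the standard library so
   that failure shows up in the return type instead. *)
let parse_int (s : string) : (int, string) result =
  match int_of_string s with
  | n -> Ok n
  | exception Failure _ -> Error (Printf.sprintf "not an integer: %S" s)

(* Going the other way: unwrap a result, raising Failure on Error. *)
let get_exn (r : ('a, string) result) : 'a =
  match r with
  | Ok v -> v
  | Error message -> failwith message

let () =
  match parse_int "42x" with
  | Ok n -> Printf.printf "Parsed %d.\n" n
  | Error message -> Printf.printf "Oops: %s.\n" message
```

Doing the conversion in one small wrapper keeps the rest of the code in a single error-handling style.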
However, code can quickly become verbose as you chain multiple result-producing functions, like below.
    let foo (x : int) : (int, string) result =
      (* Call `bar` and check whether there was an error. *)
      match bar x with
      | Error message -> Error message
      | Ok bar_out -> begin
          (* Call `hoo` and check whether there was an error. *)
          match hoo bar_out with
          | Error message -> Error message
          | Ok hoo_out -> begin
              (* Call `goo` and check whether there was an error. *)
              match goo hoo_out with
              | Error message -> Error message
              | Ok goo_out -> Ok (goo_out + 1)
            end
        end
To eliminate this verbosity, the ppx_let preprocessor / extension provides monadic and applicative let bindings. To set it up, install the base library and the ppx_let preprocessor using opam install base ppx_let, add preprocess (pps ppx_let) to the appropriate target and add base as a library dependency inside the dune file, and rewrite the above code as follows.
    let foo (x : int) : (int, string) result =
      let open Base.Result.Let_syntax in
      (* Call `bar` and check whether there was an error. *)
      let%bind bar_out = bar x in
      (* Call `hoo` and check whether there was an error. *)
      let%bind hoo_out = hoo bar_out in
      (* Call `goo` and check whether there was an error. *)
      let%bind goo_out = goo hoo_out in
      Ok (goo_out + 1)
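For comparison, a similar shape is possible without a preprocessor by using binding operators, which are available in OCaml 4.08 and later. This is only a sketch of an alternative, not what Panther uses: the bodies of bar, hoo, and goo below are placeholders invented so that the example runs on its own.

```ocaml
(* Placeholder implementations: the post never defines bar, hoo, or goo. *)
let bar (x : int) : (int, string) result =
  if x < 0 then Error "bar: negative input" else Ok (x * 2)

let hoo (x : int) : (int, string) result =
  if x > 100 then Error "hoo: input too large" else Ok (x + 3)

let goo (x : int) : (int, string) result = Ok (x * x)

(* [let*] is an ordinary operator definition. Result.bind short-circuits on
   the first Error, just like the nested matches. *)
let ( let* ) = Result.bind

let foo (x : int) : (int, string) result =
  let* bar_out = bar x in
  let* hoo_out = hoo bar_out in
  let* goo_out = goo hoo_out in
  Ok (goo_out + 1)
```

With these placeholder definitions, foo 1 evaluates to Ok 26, while foo (-1) stops at the first step and returns the error from bar.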
Concurrency
I really, really wish I knew this before I started writing OCaml code, because it seems like I am going to have to rewrite my entire application to fix the problem I never knew I had.
Perhaps I’m being too harsh, but OCaml isn’t the best language for implementing concurrent programs because of the design of the runtime (before OCaml 5, a global runtime lock meant that only one thread could run OCaml code at a time). Instead, the Lwt library provides a so-called ‘promise’ type, akin to the functionality offered by JavaScript’s async and await keywords. Lwt also provides a library of functions to use in place of functions from the OCaml standard library. See the Lwt manual for details.
More importantly, try not to mix Lwt I/O code with non-Lwt I/O code. I am in the midst of debugging some bizarre behaviour, wherein the contents of the file vanish after I open the input channel for that file. It's likely some race condition in my code, but I imagine that if I had used Lwt consistently throughout my code, I would not have landed in this hairy situation.
Finally, you can use the ppx_let extension from earlier to simplify the use of Lwt code too! First, add the following code (copied from here) as a module named Lwt_ppx_let, for example in a file called lwt_ppx_let.ml.
    module Let_syntax = struct
      let return = Lwt.return
      let ( >>= ) = Lwt.Infix.( >>= )
      let ( >>| ) = Lwt.Infix.( >|= )

      module Let_syntax = struct
        let bind m ~f = Lwt.bind m f
      end
    end
And use it in your code as follows.
    let open Lwt_ppx_let.Let_syntax in
    let%bind _ = Lwt_io.eprintf "eventually ..\n" in
    ...
Final Thoughts
Barring a few hiccups, writing code in OCaml is a lot of fun! As with all functional programming languages, my code is concise, but I am especially glad that I could turn some run-time checks into compile-time checks, thus increasing my confidence in the code.
I wonder what factors affect the popularity of a language. It surprises me that Haskell is more popular than OCaml, especially given the difficulty of reasoning about Haskell’s lazy-by-default code and the scary error messages produced by GHC, but there's a lot that I don’t know about these languages. I wish more people would try OCaml.