Thursday, 26 July 2018

Enforcing rules at compile-time: an example

I recently solved a problem I had in Scala. I was able to solve the problem quickly and easily, where as I remember when I was less experienced with Scala, solving a problem like this was difficult. It would take a lot of thought and effort, and even if I came away with a solution that technically worked, it often felt off and wasn't pleasant to use.

I thought it'd be nice to share this example of my current approach to writing good Scala. I hope you find it useful.

Premise

I'm the author and maintainer of scalajs-react which is a Scala.JS library that provides a type-safe interface to React, a JavaScript UI library.

In scalajs-react, the primary way to create a UI component is via "the builder pattern". The builder API is separated into 4 steps so that you first specify the prerequisites in a deliberate order, then at the final step you can specify a bunch of optional lifecycle methods. Usage looks like this:

val MyComponent =
  ScalaComponent.builder[Props]("MyComponent")
    .stateless               // step 1
    .noBackend               // step 2
    .render(...)             // step 3
    .componentWillMount(...) // step 4 (optional)
    .componentDidMount(...)  // step 4 (optional)
    .build                   // step 4

Steps 1 and 2 are optional and are made so in the API via implicits. A minimal example looks like this:

val MyComponent =
  ScalaComponent.builder[Props]("MyComponent")
    .render(...)             // step 3
    .build                   // step 4

Multiple specifications of the same lifecycle method compose. For example, this is valid and will result in all three procedures executing at the .componentDidMount lifecycle event.

val MyComponent =
  ScalaComponent.builder[Props]("MyComponent")
    .render(...)             // step 3
    .componentDidMount(...)  // step 4 (optional)
    .componentDidMount(...)  // step 4 (optional)
    .componentDidMount(...)  // step 4 (optional)
    .build                   // step 4

Problem

A recent version of React introduced some changes to lifecycle methods and so I was updating scalajs-react the other day.

The two changes relevant to this article are:

A new lifecycle method getSnapshotBeforeUpdate is added, from which you return any arbitrary value called a snapshot.
The lifecycle method componentDidUpdate gets a new parameter which is the value from getSnapshotBeforeUpdate above.

Goals

Let's break down the new React changes into a few rules:

The return type of getSnapshotBeforeUpdate needs to match the type of the new componentDidUpdate param.
In the case that getSnapshotBeforeUpdate isn't specified, the type of the new componentDidUpdate param will be Unit (which is undefined in JS).
In order to support composition of multiple getSnapshotBeforeUpdate functions, a means of return value composition (most naturally a Semigroup typeclass) is required.

Regarding (3),

Semigroup would only be required for subsequent calls, not the first, which adds a little complication for values for which Semigroup isn't defined.
scalajs-react doesn't have external dependencies and I don't want to add a Semigroup typeclass to the public API.
I'd be surprised if anyone ever wanted to supply multiple getSnapshotBeforeUpdate functions anyway; if so, one can do it oneself.

Therefore I've decided to just not support multiple getSnapshotBeforeUpdate functions. We don't lose parity with React JS anyway.

Let's break down (1) and (2) into more concrete rules:

getSnapshotBeforeUpdate can only be called 0 or 1 times
getSnapshotBeforeUpdate sets the Snapshot type
componentDidUpdate receives the Snapshot type
When componentDidUpdate is called and the Snapshot type is undefined, it's set to Unit
getSnapshotBeforeUpdate cannot occur after componentDidUpdate because it will change the Snapshot type which would invalidate the previous componentDidUpdate where the Snapshot was Unit (and a fn to Unit would be pointless here).

Before we continue it's time to emphasise: type-safety is very important to me. One of the biggest features of scalajs-react is its strong type-safety. (As much is reasonable in Scala) if it compiles, I want confidence that it works and is correct.

I want to encode the above rules into the types so that end-users don't have to read any documentation, have any internal knowledge of these rules, or experience any runtime exceptions; the compiler should just enforce everything we discussed such that violations wont even compile.

Rejected solution

Probably the first solution that earlier-me would've reached for, is to create a new step in the builder API like this:

  [STEP 3]                                          [STEP 5]

  .render   ----(implicit with Snapshot=Unit)--->   last step
     \                                              /
      \                                            /
       --------> .getSnapshotBeforeUpdate ------> /

                         [STEP 4]

There are problems with such a solution:

It doesn't scale. If React adds more constraints in future it will become harder to keep a fluent API without introducing unnecessary usage constraints.
External component config fns (LastStep => LastStep) need to be able to configure any part of the lifecycle.
ScalaComponent.builder.static is an example where it returns a half-built component allowing further configuration. It needs to set the shouldComponentUpdate method which would skip step 4 in this approach, or else require that we add nearly everything to both steps (yuk).

Basic solution

Consider this pseudo-code:

var snapshotType = None

def getSnapshotBeforeUpdate[A](f: X => A) = {
  snapshotType match {
    case None    => snapshotType = Some(A)
    case Some(_) => error("SnapshotType already defined!")
  }
  getSnapshotBeforeUpdate = f
}

def componentDidUpdate(f) = {
  snapshotType = Some(snapshotType.getOrElse(Unit))
  componentDidUpdate.append(f)
}

We could track this at the term-level at runtime using Option (and typetags). It's not very type-safe though. We can still keep the same approach and logic, we just need to lift it up into the type-level so that it runs at compile-time instead of at runtime. To do so we'll use a type-level encoding of Option.

This is how you encode Option at the type-level in Scala; first you'll see a term-level equivalent for contrast:

object TermLevel {

  sealed trait Option[+A] {
    def getOrElse[B >: A](default: => B): B
  }
  final case class Some[+A](value: A) extends Option[A] {
    override def getOrElse[B >: A](default: => B) = value
  }
  case object None extends Option[Nothing] {
    override def getOrElse[B >: Nothing](default: => B) = default
  }

  // Example usage
  def value: Option[Any] => Any = _.getOrElse(())
}

// ===============================================================

object TypeLevel {

  sealed trait TOption {
    type GetOrElse[B]
  }
  sealed trait TSome[A] extends TOption {
    override final type GetOrElse[B] = A
  }
  sealed trait TNone extends TOption {
    override final type GetOrElse[B] = B
  }

  // Example usage
  type Value[T <: TOption] = T#GetOrElse[Unit]
}

Ok, now let's code up a skeleton that will enforce our rules at compile-time:

final class Builder[SnapshotType <: TOption] {
  type SnapshotValue = SnapshotType#GetOrElse[Unit]

  def getSnapshotBeforeUpdate[A](f: ... => A)
                                (implicit ev: SnapshotType =:= TNone)
                                : Builder[TSome[A]]

  def componentDidUpdate(f: SnapshotValue => ...)
                        : Builder[TSome[SnapshotValue]]
}

Let's compare this to our pseudo-code:

The snapshotType var is now a type parameter of Builder.
getSnapshotBeforeUpdate would check snapshotType is None and set it to Some(A). Now we ask for implicit proof that SnapshotType =:= TNone, set in the return type we can see return a new Builder with SnapshotType set to TSome
getSnapshotBeforeUpdate throw an error when snapshotType is Some(_). Now the compiler will throw an implicit not found error at compile-time when SnapshotType =:= TSome[_].
Where as before in componentDidUpdate we had snapshotType = Some(snapshotType.getOrElse(Unit)), we now have the equivalent in that the return type is Builder[TSome[SnapshotValue]] where SnapshotValue = SnapshotType#GetOrElse[Unit].

Nice solution

A while back I would've been satisfied with the above solution; it works right? If you know all of the rules, sure, but the error message to users is going to be pretty confusing and probably even lead them to think there's some kind of bug in the library. This is what an error looks like at the moment:

[error] ScalaComponentTest.scala:189: Cannot prove that japgolly.scalajs.react.example.TSome[A] =:= japgolly.scalajs.react.example.TNone.
[error]         .getSnapshotBeforeUpdate(???)
[error]                                 ^
[error] one error found

Nice UX in Scala is a bit of an art; in this case we'll do away with the generic TOption and create a custom construct for this one specific problem.

First, the new shape. Because this isn't generic anymore we no longer need the inner type member to be a type constructor which makes usage nicer too (i.e. T#Value instead of T#GetOrElse[Unit]):

sealed trait UpdateSnapshot {
  type Value
}

object UpdateSnapshot {
  sealed trait None extends UpdateSnapshot {
    override final type Value = Unit
  }

  sealed trait Some[A] extends UpdateSnapshot {
    override final type Value = A
  }
}

Easy enough. Now to improve the UX on failure. First we change the (implicit ev: SnapshotType =:= TNone) to (implicit ev: UpdateSnapshot.SafetyProof[U]) and create:

object UpdateSnapshot {

  @implicitNotFound("You can only specify getSnapshotBeforeUpdate once, and it has to be before " +
    "you specify componentDidUpdate, otherwise the snapshot type could become inconsistent.")
  sealed trait SafetyProof[U <: UpdateSnapshot]

  implicit def safetyProof[U <: UpdateSnapshot](implicit ev: U =:= UpdateSnapshot.None): SafetyProof[U] =
    null.asInstanceOf[SafetyProof[U]]
}

The (implicit ev: U =:= UpdateSnapshot.None) is still part of the solution, but this time it's indirect. It's a dependency on the availability of implicit SafetyProof. Thus the logic is still the same, just users will never see it as an error message.

The @implicitNotFound annotation on SafetyProof is the pudding. It will cause our custom error message to be displayed as a compilation error when someone breaks the rules.

Using null.asInstanceOf[SafetyProof[U]] is a performance optimisation; new SafetyProof[U]{} is fine too but I'd prefer to avoid the allocation and more importantly, by never actually creating or using SafetyProof it can be completely elided from Scala.JS output which means a smaller download for your webapp's end-users.

Finally, our new builder excerpt looks like this:

final class Builder[U <: UpdateSnapshot] {
  type SnapshotValue = U#Value

  def getSnapshotBeforeUpdate[A](f: ... => A)
                                (implicit ev: UpdateSnapshot.SafetyProof[U])
                                : Builder[UpdateSnapshot.Some[A]]

  def componentDidUpdate(f: SnapshotValue => ...)
                        : Builder[UpdateSnapshot.Some[SnapshotValue]]
}

And let's look at what errors look like now:

[error] ScalaComponentTest.scala:189: You can only specify getSnapshotBeforeUpdate once, and it has to be before you specify componentDidUpdate, otherwise the snapshot type could become inconsistent.
[error]         .getSnapshotBeforeUpdate(???)
[error]                                 ^
[error] one error found

Done

That's all. I hope you've enjoyed. If you're interested, the full patch that went into scalajs-react is here:

https://github.com/japgolly/scalajs-react/commit/ee81acf12c1039997460a7cac3d759fda6577533

Wednesday, 13 December 2017

Practical Awesome Recursion - Ch 02: Catamorphisms

Recursion schemes are awesome, and practical. This is chapter 2 in a series of blog posts about recursion schemes. This series uses Scala, and will focus on usage and applicability. It will be scarce in theory, and abundant in examples. The theory is valuable and fascinating, but I often find that knowing the theory alone is only half understanding. The other half of understanding comes from, and enables, practical application. I recommend you bounce back and forth between this series and theory. The internet is rich with blogs and videos on theory that explain it better than I would, so use those awesome resources.

Before you start...

If you don't know what I mean by any of the following, read or skim chapter 1 of this series.

Fix[_]
IntList / IntListF[_]
BinaryTree / BinaryTreeF[_]

Also, this series doesn't depend on, or emphasise, Matryoshka or any other library. If you'd like to understand why, I've explained here in the FAQ.

The Catamorphism

What is a catamorphism? It can be answered from a few perspectives.

What

It's often referred to as a fold over your data structure. Examples that sum or count values are very common. Conceptually speaking, an example using Scala stdlib would be List(1, 3, 7).foldLeft(0)(_ + _) == 11. As you'll see, folds and catamorphisms are capable of much more than calculating numbers.

The definition of catamorphism is:

def cata[F[_]: Functor, A, B](fAlgebra: F[A] => A)(f: Fix[F]): A =
  fAlgebra(f.unfix.map(cata(fAlgebra)))

The first argument fAlgebra is so-called because F[A] => A is known as an F-algebra. What it is, is your folding logic. You implement your folding logic as a function that processes a single level/layer of your structure without recursion. When I say level/layer, I mean in terms of recursive depth, example:

IntList
=======

This is equivalent to List(1, 2, 3, 4, 5).

IntCons(1, _) ← Level 1
IntCons(2, _) ← Level 2
IntCons(3, _) ← Level 3
IntCons(4, _) ← Level 4
IntCons(5, _) ← Level 5
IntNil        ← Level 6

BinaryTree
==========

      Branch(_, "root", _)          ← Level 1
              |
  +-----------+-----------+
  |                       |
Leaf      Branch(_, "right", _)     ← Level 2
                 |           |
               Leaf         Leaf    ← Level 3

How

Look at the definition of cata and at the shape/type of the f-algebra. There's an interesting note about parametricity. The only means cata has of producing an A is to call fAlgebra... which requires an F[A]... so how does it put an A in the F[_] to call the function, if it can't produce As otherwise? Remember that the hole in F[_] represents the recursive case? In non-recursive cases (eg. Nil in a cons list) the type is a phantom-type, completely unused, or covariant and Nothing. For example:

case class Eg[T](int: Int)

def changeIt[X, Y](eg: Eg[X]): Eg[Y] =
  Eg(eg.int)

When a type variable is unused you can replace it with anything you want, which is exactly what happens in Functor[F].

It's also important to understand the order in which things happen. Catamorphisms:

start at the root (their input, the f: Fix[F])
(computationally) move to the leaves
calculate their way back to the root

Which means your folding logic is going to start executing against all the leaves first, then their parents, then their parents, etc.

Optimisation

I did say that this is a practical series. This is a good a place as any to mention that this cata definition, while correct, is inefficient.

Every time it recurses it has to create the same functions with the same logic over and over again. We can make it more efficient by creating what we need once and reusing it.

def cata2[F[_], A, B](algebra: F[A] => A)(f: Fix[F])(implicit F: Functor[F]): A = {
  var self: Fix[F] => A = null
  self = f => algebra(F.map(f.unfix)(self))
  self(f)
}

In this new definition, we create the recursive function once and reuse it. Despite the null, it's 100% safe because we (provably) set it before it's used.

Let's measure it. How fast does it perform in comparison to the original? 105%? 110%?

[info] Benchmark         (size)  Mode  Cnt    Score   Error  Units
[info] RecursionBM.cata      10  avgt   10    0.274 ± 0.003  us/op
[info] RecursionBM.cata2     10  avgt   10    0.146 ± 0.001  us/op
[info] RecursionBM.cata     100  avgt   10    2.323 ± 0.020  us/op
[info] RecursionBM.cata2    100  avgt   10    1.555 ± 0.006  us/op
[info] RecursionBM.cata    1000  avgt   10   31.111 ± 0.720  us/op
[info] RecursionBM.cata2   1000  avgt   10   16.067 ± 0.187  us/op
[info] RecursionBM.cata   10000  avgt   10  326.443 ± 9.054  us/op
[info] RecursionBM.cata2  10000  avgt   10  165.470 ± 1.617  us/op

200%, it's twice as fast! Not bad for a tiny bit of one-off, hidden boilerplate.

Side-note: I ran the benchmark on a i7-6700HQ and all results are under 1ms, even at structure size of 1x10000 (length x depth), which means that either implementation is going to be fine on a fast CPU in a non-high-throughput solution. It'd be interesting to know what the measurements would be in prod, on real GCP/AWS VMs; the savings of the optimisation would be more significant because the VCPUs are slower.

F-Algebras

This isn't necessary but I'm also going to add a type alias.

type FAlgebra[F[_], A] = F[A] => A

and tweak the catamorphism definition to

def cata2[F[_], A, B](algebra: FAlgebra[F, A])(f: Fix[F])(implicit F: Functor[F]): A = {
  var self: Fix[F] => A = null
  self = f => algebra(F.map(f.unfix)(self))
  self(f)
}

Type aliases don't exist at runtime, the compilation process dealiases them completely. You can also use either definition interchangably.

def useAlias[F[_], A](f: F[A] => A): FAlgebra[F, A] =
  f

def removeAlias[F[_], A](f: FAlgebra[F, A]): F[A] => A =
  f

There are two advantages to having and using a type alias.

Readability. The A and the F are separated; the A doesn't repeat; it clarifies intent the more you get used to recursion schemes.
There are cases in which it helps type inference.

Simple Examples

Let's start with some basic examples: the usual blah -> Int stuff:

List Sum

Let's sum a list:

val listSum: FAlgebra[IntListF, Int] = {
  case IntListF.Cons(h, t) => h + t
  case IntListF.Nil        => 0
}

How does it work?

val listSumVerbose: IntListF[Int] => Int = {
  case IntListF.Cons(h, t) => h + t
  //                 |  |
  // Int by definition  |
  //                    Sum of tail (Int)
  case IntListF.Nil => 0
}

Notice this is an algebra, to actually use it in you need to call cata:

def sumThisListPlease(list: IntList): Int =
  cata(listSum)(list)

For reasons that will become obvious later, when using this stuff in your own project or libraries, the algebra itself is the unit that you'll be exposing most often. Instead of creating functions that take fixed-point data (or codata) structures and return a result, you create algebras and leave it to users to call cata themselves. More on this later but the point is, from the next example onwards, I'll show just the algebras.

List Length

Counting elements in a list is similar:

val listLength: FAlgebra[IntListF, Int] = {
  case IntListF.Cons(_, t) => 1 + t
  case IntListF.Nil        => 0
}

How does it work?

val listLengthVerbose: IntListF[Int] => Int = {
  case IntListF.Cons(_, t) => 1 + t
  //                    |     |
  //                    |     Add 1 for this element
  // Length of tail (Int)
  case IntListF.Nil => 0
}

BinaryTree Algebras

Here's a few algebras for BinaryTree:

val binaryTreeNodeCount: FAlgebra[BinaryTreeF[Any, ?], Int] = {
  case BinaryTreeF.Node(left, _, right) => left + 1 + right
  case BinaryTreeF.Leaf                 => 0
}

val binaryTreeMaxDepth: FAlgebra[BinaryTreeF[Any, ?], Int] = {
  case BinaryTreeF.Node(left, _, right) => left.max(right) + 1
  case BinaryTreeF.Leaf                 => 0
}

def binaryTreeSum[N](implicit N: Numeric[N]): FAlgebra[BinaryTreeF[N, ?], N] = {
  case BinaryTreeF.Node(left, n, right) => N.plus(left, N.plus(n, right))
  case BinaryTreeF.Leaf                 => N.zero
}

Pretty straight-forward. Each recursive slot (left & right) already has the computed value for that subtree.

JSON

Let's take a JSON value in our JSON ADT, and turn it into a JSON string that we can send out the door.

val jsonToString: FAlgebra[JsonF, String] = {
  case JsonF.Null        => "null"
  case JsonF.Bool(b)     => b.toString
  case JsonF.Num(n)      => n.toString
  case JsonF.Str(s)      => escapeString(s)
  case JsonF.Arr(values) => values.mkString("[", ",", "]")
  case JsonF.Obj(fields) => fields.iterator.map { case (k, v) => k + ":" + v }.mkString("{", ",", "}")
}

Is that easy or what? The array values and object fields are already all strings, we just mindlessly combine them using array/object notation.

What if, instead of the slower String concatenation, we wanted to use StringBuilder? Don't let the mutability discourage you; it's the same concept, we'll just replace String in the algebra type signature with StringBuilder => Unit. Executing a StringBuilder => Unit is mutable but the function itself is immutable, referentially transparent and pure. Descriptions of side-effects are safe.

val jsonToStringSB: FAlgebra[JsonF, StringBuilder => Unit] = {
  case JsonF.Null        => _ append "null"
  case JsonF.Bool(b)     => _ append b.toString
  case JsonF.Num(n)      => _ append n.toString
  case JsonF.Str(s)      => _ append escapeString(s)
  case JsonF.Arr(values) => sb => {
    sb append '['
    for (v <- values) v(sb)
    sb append ']'
  }
  case JsonF.Obj(fields) => sb => {
    sb append '{'
    for ((k, v) <- fields) {
      sb append k
      sb append ':'
      v(sb)
    }
    sb append '}'
  }
}

To be clear, usage would look like this:

def jsonToStringBuilderUsage(json: Json): String = {
  val sbToUnit = cata(jsonToStringSB)(json)
  val sb = new StringBuilder
  sbToUnit(sb)
  sb.toString()
}

A File System

Let's look at a more interesting example: a file system.

We'll start with a typical representation with hard-coded recursion.

sealed trait Entry
final case class Dir(files: Map[String, Entry]) extends Entry
final case class File(size: Long) extends Entry

This is an example inhabitant.

// Example of 3 files:
// 1. /usr/bin/find
// 2. /usr/bin/ls
// 3. /tmp/example.tmp
val example =
  Dir(Map(
    "usr" -> Dir(Map(
      "bin" -> Dir(Map(
        "find" -> File(197360),
        "ls" -> File(133688))))),
    "tmp" -> Dir(Map(
      "example.tmp" -> File(12)))))

Now let's create an API:

def totalFileSize(e: Entry): Long = e match {
  case File(s) => s
  case Dir(fs) => fs.values.foldLeft(0L)(_ + totalFileSize(_))
}

def countFiles(e: Entry): Int = e match {
  case File(_) => 1
  case Dir(fs) => fs.values.foldLeft(0)(_ + countFiles(_))
}

def countDirs(e: Entry): Int = e match {
  case File(_) => 0
  case Dir(fs) => fs.values.foldLeft(1)(_ + countDirs(_))
}

Looks great! SHIP IT! Ok so now people are using our super-awesome file system and associated API. One day a user wants to collect a bunch of stats and writes the following code:

final case class Stats(totalSize: Long, files: Int, dirs: Int)

def stats(e: Entry): Stats =
  Stats(totalFileSize(e), countFiles(e), countDirs(e))

The user then complains that their stats method takes 3 times as long as other operations. This is because each stat is produced by traversing the entire file system. 3 stats = 3 traversals. Now obviously, with the pure definitions given above, the extra time is going to be negligible but imagine it's a real file system here, maybe even one distributed over the network, all that drive/network/hardware cost to traverse the file system is likely to be very noticable and very significant when it's repeated 3 times.

What can the user do? Well... nothing. The only recourse they have is to raise an issue and complain. They have no control or power in this situation. They're at the mercy of the decisions made by the library authors.

After a few years of complaints, the authors of the super-awesome file system library do a big rewrite and create a new API that looks a little something like this...

final case class Stats(totalSize: Long, files: Int, dirs: Int)

def stats(e: Entry): Stats = e match {

  case File(fileSize) =>
    Stats(fileSize, 1, 0)

  case Dir(fs) =>
    fs.values.foldLeft(Stats(0, 0, 0)) { (statsAcc, entry) =>
      val b = stats(entry)
      Stats(
        statsAcc.totalSize + b.totalSize,
        statsAcc.files + b.files,
        statsAcc.dirs + b.dirs)
    }
}

def totalFileSize(e: Entry): Long =
  stats(e).totalSize

def countFiles(e: Entry): Int =
  stats(e).files

def countDirs(e: Entry): Int =
  stats(e).dirs

KICK-ASS! Now all stats are gathered in a single traversal; issue: resolved. Except there are two problems now where there once was one. First, over the next few versions of the library more and more stats (permissions, ownership, etc.) get added, and the stats function gets bigger, more complex and more wild. Second, now new complaints start coming in, complaints like "totalFileSize() significantly slower since new update", "dirCount() 4x slower than in v1.8.3". What's going on? Well, now in every traversal we're spending more time fetching more data that we don't need and end up throwing away. If a user only wants a directory count, the library spends oodles of time looking up attributes, group names, etc. only to discard them all at the end.

What can the user do? Well... again: nothing. The only recourse they have is to raise an issue and complain. They have no control or power in this situation. They're yet again at the mercy of the decisions made by the library authors.

Fixing our FileSystem

Start with the usual changes: type param, Functor, Fix:

sealed trait EntryF[+F]
final case class Dir[F](files: Map[String, F]) extends EntryF[F]
final case class File(size: Long) extends EntryF[Nothing]

object EntryF {
  implicit val functor: Functor[EntryF] = new Functor[EntryF] {
    override def map[A, B](fa: EntryF[A])(f: A => B): EntryF[B] = fa match {
      case f: File => f
      case Dir(fs) => Dir(fs.map { case (k, v) => (k, f(v)) })
    }
  }
}

type Entry = Fix[EntryF]

Now let's add a little DSL and re-create our sample value:

object Entry {
  def apply(f: EntryF[Entry]): Entry = Fix(f)
  def file(s: Long): Entry = apply(File(s))
  def dir(es: (String, Entry)*): Entry = apply(Dir(es.toMap))
}

// Example of 3 files:
// 1. /usr/bin/find
// 2. /usr/bin/ls
// 3. /tmp/example.tmp
val example =
  Entry.dir(
    "usr" -> Entry.dir(
      "bin" -> Entry.dir(
        "find" -> Entry.file(197360),
        "ls" -> Entry.file(133688))),
    "tmp" -> Entry.dir(
      "example.tmp" -> Entry.file(12)))

Next up are the queries. Small, reusable, independent functions are good practice for good reason; let's recreate what we had in the initial version:

val totalFileSize: FAlgebra[EntryF, Long] = {
  case File(s) => s
  case Dir(fs) => fs.values.sum
}

val countFiles: FAlgebra[EntryF, Int] = {
  case File(_) => 1
  case Dir(fs) => fs.values.sum
}

val countDirs: FAlgebra[EntryF, Int] = {
  case File(_) => 0
  case Dir(fs) => fs.values.sum + 1
}

If you compare this to the first attempt, it's just as simple from the outside, and even simpler in its internal implementation!

Now what happens when a user wants to get all three stats? F-Algebras compose. Here's a simple, reusable snippet to combine two F-algebras that share the same F:

def algebraZip[F[_], A, B](fa: FAlgebra[F, A],
                           fb: FAlgebra[F, B])
                          (implicit F: Functor[F]): FAlgebra[F, (A, B)] =
  fab => {
    val a = fa(fab.map(_._1))
    val b = fb(fab.map(_._2))
    (a, b)
  }

This is all the user needs to create their own stats gathering algebra:

val statsAlg: FAlgebra[EntryF, (Long, (Int, Int))] =
  algebraZip(totalFileSize, algebraZip(countFiles, countDirs))

The (Long, (Int, Int)) is a little ugly, let's clean it up a bit:

final case class Stats(totalSize: Long, files: Int, dirs: Int)

def stats(e: Entry): Stats = {
  val (totalSize, (files, dirs)) = cata(statsAlg)(e)
  Stats(totalSize, files, dirs)
}

Hoorah. Without any dependencies on the library authors, our user was able to choose the 3 stats they were interested in, and retrieve them all in a single file system traversal. Whilst not parallelism (wait for the next episode in this series), they've created their own concurrency!

Now contrast this with the previous incarnation. This time around, the user has the control, the user has the power. Library authors don't need to decide tradeoffs for everyone.

These are the advantages of providing algebras instead of higher-purpose functions. F-algebras (like val statsAlg) allow you to accrue power; that power is spent when you apply it (like def stats).

Personally speaking, when using all this stuff in my own work projects, I often find it nice to provide two sets of interfaces:

A low-level ecosystem of algebras, raw data types, logic, etc. Usually an entire package.
A high-level DSL that very cleanly exposes common high-level functions and hides all the algebra composition, calls to cata and other morphisms, etc. Usually a single object.

This approach allows all devs to get the most common types of work done without any knowledge of recursion schemes or theory. They just see a small, high-level DSL that's easy to skim, understand and use. It also serves as a good means of teaching recursion schemes because devs unfamiliar with it can then click through into the lower-level code and see real-world examples in a domain they're familiar with. This makes for a nice experience on teams with differing skill levels, both in terms of code quality and a path to gradually up-skill the team.

To be continued...

That's plenty for now.

If you like what I do and you'd like to support me, this series or any of my other projects, become a patron! It will make a big difference and help further more content. You can also follow me on Twitter for updates about the next post the series.

All source code available here.

Looking for a Scala recursion scheme library? There's a list in the FAQ.