Wednesday 13 December 2017

Practical Awesome Recursion - Ch 02: Catamorphisms

Recursion schemes are awesome, and practical. This is chapter 2 in a series of blog posts about recursion schemes. This series uses Scala, and will focus on usage and applicability. It will be scarce in theory, and abundant in examples. The theory is valuable and fascinating, but I often find that knowing the theory alone is only half understanding. The other half of understanding comes from, and enables, practical application. I recommend you bounce back and forth between this series and theory. The internet is rich with blogs and videos on theory that explain it better than I would, so use those awesome resources.

Before you start...

If you don't know what I mean by any of the following, read or skim chapter 1 of this series.

  • Fix[_]
  • IntList / IntListF[_]
  • BinaryTree / BinaryTreeF[_]

Also, this series doesn't depend on, or emphasise, Matryoshka or any other library. If you'd like to understand why, I've explained here in the FAQ.

The Catamorphism

What is a catamorphism? It can be answered from a few perspectives.

What

It's often referred to as a fold over your data structure. Examples that sum or count values are very common. Conceptually speaking, an example using Scala stdlib would be List(1, 3, 7).foldLeft(0)(_ + _) == 11. As you'll see, folds and catamorphisms are capable of much more than calculating numbers.

The definition of catamorphism is:

def cata[F[_]: Functor, A, B](fAlgebra: F[A] => A)(f: Fix[F]): A =
  fAlgebra(f.unfix.map(cata(fAlgebra)))

The first argument fAlgebra is so-called because F[A] => A is known as an F-algebra. What it is, is your folding logic. You implement your folding logic as a function that processes a single level/layer of your structure without recursion. When I say level/layer, I mean in terms of recursive depth, example:

IntList
=======

This is equivalent to List(1, 2, 3, 4, 5).

IntCons(1, _) ← Level 1
IntCons(2, _) ← Level 2
IntCons(3, _) ← Level 3
IntCons(4, _) ← Level 4
IntCons(5, _) ← Level 5
IntNil        ← Level 6
BinaryTree
==========

      Branch(_, "root", _)          ← Level 1
              |
  +-----------+-----------+
  |                       |
Leaf      Branch(_, "right", _)     ← Level 2
                 |           |
               Leaf         Leaf    ← Level 3

How

Look at the definition of cata and at the shape/type of the f-algebra. There's an interesting note about parametricity. The only means cata has of producing an A is to call fAlgebra... which requires an F[A]... so how does it put an A in the F[_] to call the function, if it can't produce As otherwise? Remember that the hole in F[_] represents the recursive case? In non-recursive cases (eg. Nil in a cons list) the type is a phantom-type, completely unused, or covariant and Nothing. For example:

case class Eg[T](int: Int)

def changeIt[X, Y](eg: Eg[X]): Eg[Y] =
  Eg(eg.int)

When a type variable is unused you can replace it with anything you want, which is exactly what happens in Functor[F].

It's also important to understand the order in which things happen. Catamorphisms:

  1. start at the root (their input, the f: Fix[F])
  2. (computationally) move to the leaves
  3. calculate their way back to the root

Which means your folding logic is going to start executing against all the leaves first, then their parents, then their parents, etc.

Optimisation

I did say that this is a practical series. This is a good a place as any to mention that this cata definition, while correct, is inefficient.

Every time it recurses it has to create the same functions with the same logic over and over again. We can make it more efficient by creating what we need once and reusing it.

def cata2[F[_], A, B](algebra: F[A] => A)(f: Fix[F])(implicit F: Functor[F]): A = {
  var self: Fix[F] => A = null
  self = f => algebra(F.map(f.unfix)(self))
  self(f)
}

In this new definition, we create the recursive function once and reuse it. Despite the null, it's 100% safe because we (provably) set it before it's used.

Let's measure it. How fast does it perform in comparison to the original? 105%? 110%?

[info] Benchmark         (size)  Mode  Cnt    Score   Error  Units
[info] RecursionBM.cata      10  avgt   10    0.274 ± 0.003  us/op
[info] RecursionBM.cata2     10  avgt   10    0.146 ± 0.001  us/op
[info] RecursionBM.cata     100  avgt   10    2.323 ± 0.020  us/op
[info] RecursionBM.cata2    100  avgt   10    1.555 ± 0.006  us/op
[info] RecursionBM.cata    1000  avgt   10   31.111 ± 0.720  us/op
[info] RecursionBM.cata2   1000  avgt   10   16.067 ± 0.187  us/op
[info] RecursionBM.cata   10000  avgt   10  326.443 ± 9.054  us/op
[info] RecursionBM.cata2  10000  avgt   10  165.470 ± 1.617  us/op

200%, it's twice as fast! Not bad for a tiny bit of one-off, hidden boilerplate.

Side-note: I ran the benchmark on a i7-6700HQ and all results are under 1ms, even at structure size of 1x10000 (length x depth), which means that either implementation is going to be fine on a fast CPU in a non-high-throughput solution. It'd be interesting to know what the measurements would be in prod, on real GCP/AWS VMs; the savings of the optimisation would be more significant because the VCPUs are slower.

F-Algebras

This isn't necessary but I'm also going to add a type alias.

type FAlgebra[F[_], A] = F[A] => A

and tweak the catamorphism definition to

def cata2[F[_], A, B](algebra: FAlgebra[F, A])(f: Fix[F])(implicit F: Functor[F]): A = {
  var self: Fix[F] => A = null
  self = f => algebra(F.map(f.unfix)(self))
  self(f)
}

Type aliases don't exist at runtime, the compilation process dealiases them completely. You can also use either definition interchangably.

def useAlias[F[_], A](f: F[A] => A): FAlgebra[F, A] =
  f

def removeAlias[F[_], A](f: FAlgebra[F, A]): F[A] => A =
  f

There are two advantages to having and using a type alias.

  1. Readability. The A and the F are separated; the A doesn't repeat; it clarifies intent the more you get used to recursion schemes.
  2. There are cases in which it helps type inference.

Simple Examples

Let's start with some basic examples: the usual blah -> Int stuff:

List Sum

Let's sum a list:

val listSum: FAlgebra[IntListF, Int] = {
  case IntListF.Cons(h, t) => h + t
  case IntListF.Nil        => 0
}

How does it work?

val listSumVerbose: IntListF[Int] => Int = {
  case IntListF.Cons(h, t) => h + t
  //                 |  |
  // Int by definition  |
  //                    Sum of tail (Int)
  case IntListF.Nil => 0
}

Notice this is an algebra, to actually use it in you need to call cata:

def sumThisListPlease(list: IntList): Int =
  cata(listSum)(list)

For reasons that will become obvious later, when using this stuff in your own project or libraries, the algebra itself is the unit that you'll be exposing most often. Instead of creating functions that take fixed-point data (or codata) structures and return a result, you create algebras and leave it to users to call cata themselves. More on this later but the point is, from the next example onwards, I'll show just the algebras.

List Length

Counting elements in a list is similar:

val listLength: FAlgebra[IntListF, Int] = {
  case IntListF.Cons(_, t) => 1 + t
  case IntListF.Nil        => 0
}

How does it work?

val listLengthVerbose: IntListF[Int] => Int = {
  case IntListF.Cons(_, t) => 1 + t
  //                    |     |
  //                    |     Add 1 for this element
  // Length of tail (Int)
  case IntListF.Nil => 0
}

BinaryTree Algebras

Here's a few algebras for BinaryTree:

val binaryTreeNodeCount: FAlgebra[BinaryTreeF[Any, ?], Int] = {
  case BinaryTreeF.Node(left, _, right) => left + 1 + right
  case BinaryTreeF.Leaf                 => 0
}

val binaryTreeMaxDepth: FAlgebra[BinaryTreeF[Any, ?], Int] = {
  case BinaryTreeF.Node(left, _, right) => left.max(right) + 1
  case BinaryTreeF.Leaf                 => 0
}

def binaryTreeSum[N](implicit N: Numeric[N]): FAlgebra[BinaryTreeF[N, ?], N] = {
  case BinaryTreeF.Node(left, n, right) => N.plus(left, N.plus(n, right))
  case BinaryTreeF.Leaf                 => N.zero
}

Pretty straight-forward. Each recursive slot (left & right) already has the computed value for that subtree.

JSON

Let's take a JSON value in our JSON ADT, and turn it into a JSON string that we can send out the door.

val jsonToString: FAlgebra[JsonF, String] = {
  case JsonF.Null        => "null"
  case JsonF.Bool(b)     => b.toString
  case JsonF.Num(n)      => n.toString
  case JsonF.Str(s)      => escapeString(s)
  case JsonF.Arr(values) => values.mkString("[", ",", "]")
  case JsonF.Obj(fields) => fields.iterator.map { case (k, v) => k + ":" + v }.mkString("{", ",", "}")
}

Is that easy or what? The array values and object fields are already all strings, we just mindlessly combine them using array/object notation.

What if, instead of the slower String concatenation, we wanted to use StringBuilder? Don't let the mutability discourage you; it's the same concept, we'll just replace String in the algebra type signature with StringBuilder => Unit. Executing a StringBuilder => Unit is mutable but the function itself is immutable, referentially transparent and pure. Descriptions of side-effects are safe.

val jsonToStringSB: FAlgebra[JsonF, StringBuilder => Unit] = {
  case JsonF.Null        => _ append "null"
  case JsonF.Bool(b)     => _ append b.toString
  case JsonF.Num(n)      => _ append n.toString
  case JsonF.Str(s)      => _ append escapeString(s)
  case JsonF.Arr(values) => sb => {
    sb append '['
    for (v <- values) v(sb)
    sb append ']'
  }
  case JsonF.Obj(fields) => sb => {
    sb append '{'
    for ((k, v) <- fields) {
      sb append k
      sb append ':'
      v(sb)
    }
    sb append '}'
  }
}

To be clear, usage would look like this:

def jsonToStringBuilderUsage(json: Json): String = {
  val sbToUnit = cata(jsonToStringSB)(json)
  val sb = new StringBuilder
  sbToUnit(sb)
  sb.toString()
}

A File System

Let's look at a more interesting example: a file system.

We'll start with a typical representation with hard-coded recursion.

sealed trait Entry
final case class Dir(files: Map[String, Entry]) extends Entry
final case class File(size: Long) extends Entry

This is an example inhabitant.

// Example of 3 files:
// 1. /usr/bin/find
// 2. /usr/bin/ls
// 3. /tmp/example.tmp
val example =
  Dir(Map(
    "usr" -> Dir(Map(
      "bin" -> Dir(Map(
        "find" -> File(197360),
        "ls" -> File(133688))))),
    "tmp" -> Dir(Map(
      "example.tmp" -> File(12)))))

Now let's create an API:

def totalFileSize(e: Entry): Long = e match {
  case File(s) => s
  case Dir(fs) => fs.values.foldLeft(0L)(_ + totalFileSize(_))
}

def countFiles(e: Entry): Int = e match {
  case File(_) => 1
  case Dir(fs) => fs.values.foldLeft(0)(_ + countFiles(_))
}

def countDirs(e: Entry): Int = e match {
  case File(_) => 0
  case Dir(fs) => fs.values.foldLeft(1)(_ + countDirs(_))
}

Looks great! SHIP IT! Ok so now people are using our super-awesome file system and associated API. One day a user wants to collect a bunch of stats and writes the following code:

final case class Stats(totalSize: Long, files: Int, dirs: Int)

def stats(e: Entry): Stats =
  Stats(totalFileSize(e), countFiles(e), countDirs(e))

The user then complains that their stats method takes 3 times as long as other operations. This is because each stat is produced by traversing the entire file system. 3 stats = 3 traversals. Now obviously, with the pure definitions given above, the extra time is going to be negligible but imagine it's a real file system here, maybe even one distributed over the network, all that drive/network/hardware cost to traverse the file system is likely to be very noticable and very significant when it's repeated 3 times.

What can the user do? Well... nothing. The only recourse they have is to raise an issue and complain. They have no control or power in this situation. They're at the mercy of the decisions made by the library authors.

After a few years of complaints, the authors of the super-awesome file system library do a big rewrite and create a new API that looks a little something like this...

final case class Stats(totalSize: Long, files: Int, dirs: Int)

def stats(e: Entry): Stats = e match {

  case File(fileSize) =>
    Stats(fileSize, 1, 0)

  case Dir(fs) =>
    fs.values.foldLeft(Stats(0, 0, 0)) { (statsAcc, entry) =>
      val b = stats(entry)
      Stats(
        statsAcc.totalSize + b.totalSize,
        statsAcc.files + b.files,
        statsAcc.dirs + b.dirs)
    }
}

def totalFileSize(e: Entry): Long =
  stats(e).totalSize

def countFiles(e: Entry): Int =
  stats(e).files

def countDirs(e: Entry): Int =
  stats(e).dirs

KICK-ASS! Now all stats are gathered in a single traversal; issue: resolved. Except there are two problems now where there once was one. First, over the next few versions of the library more and more stats (permissions, ownership, etc.) get added, and the stats function gets bigger, more complex and more wild. Second, now new complaints start coming in, complaints like "totalFileSize() significantly slower since new update", "dirCount() 4x slower than in v1.8.3". What's going on? Well, now in every traversal we're spending more time fetching more data that we don't need and end up throwing away. If a user only wants a directory count, the library spends oodles of time looking up attributes, group names, etc. only to discard them all at the end.

What can the user do? Well... again: nothing. The only recourse they have is to raise an issue and complain. They have no control or power in this situation. They're yet again at the mercy of the decisions made by the library authors.

Fixing our FileSystem

Start with the usual changes: type param, Functor, Fix:

sealed trait EntryF[+F]
final case class Dir[F](files: Map[String, F]) extends EntryF[F]
final case class File(size: Long) extends EntryF[Nothing]

object EntryF {
  implicit val functor: Functor[EntryF] = new Functor[EntryF] {
    override def map[A, B](fa: EntryF[A])(f: A => B): EntryF[B] = fa match {
      case f: File => f
      case Dir(fs) => Dir(fs.map { case (k, v) => (k, f(v)) })
    }
  }
}

type Entry = Fix[EntryF]

Now let's add a little DSL and re-create our sample value:

object Entry {
  def apply(f: EntryF[Entry]): Entry = Fix(f)
  def file(s: Long): Entry = apply(File(s))
  def dir(es: (String, Entry)*): Entry = apply(Dir(es.toMap))
}

// Example of 3 files:
// 1. /usr/bin/find
// 2. /usr/bin/ls
// 3. /tmp/example.tmp
val example =
  Entry.dir(
    "usr" -> Entry.dir(
      "bin" -> Entry.dir(
        "find" -> Entry.file(197360),
        "ls" -> Entry.file(133688))),
    "tmp" -> Entry.dir(
      "example.tmp" -> Entry.file(12)))

Next up are the queries. Small, reusable, independent functions are good practice for good reason; let's recreate what we had in the initial version:

val totalFileSize: FAlgebra[EntryF, Long] = {
  case File(s) => s
  case Dir(fs) => fs.values.sum
}

val countFiles: FAlgebra[EntryF, Int] = {
  case File(_) => 1
  case Dir(fs) => fs.values.sum
}

val countDirs: FAlgebra[EntryF, Int] = {
  case File(_) => 0
  case Dir(fs) => fs.values.sum + 1
}

If you compare this to the first attempt, it's just as simple from the outside, and even simpler in its internal implementation!

Now what happens when a user wants to get all three stats? F-Algebras compose. Here's a simple, reusable snippet to combine two F-algebras that share the same F:

def algebraZip[F[_], A, B](fa: FAlgebra[F, A],
                           fb: FAlgebra[F, B])
                          (implicit F: Functor[F]): FAlgebra[F, (A, B)] =
  fab => {
    val a = fa(fab.map(_._1))
    val b = fb(fab.map(_._2))
    (a, b)
  }

This is all the user needs to create their own stats gathering algebra:

val statsAlg: FAlgebra[EntryF, (Long, (Int, Int))] =
  algebraZip(totalFileSize, algebraZip(countFiles, countDirs))

The (Long, (Int, Int)) is a little ugly, let's clean it up a bit:

final case class Stats(totalSize: Long, files: Int, dirs: Int)

def stats(e: Entry): Stats = {
  val (totalSize, (files, dirs)) = cata(statsAlg)(e)
  Stats(totalSize, files, dirs)
}

Hoorah. Without any dependencies on the library authors, our user was able to choose the 3 stats they were interested in, and retrieve them all in a single file system traversal. Whilst not parallelism (wait for the next episode in this series), they've created their own concurrency!

Now contrast this with the previous incarnation. This time around, the user has the control, the user has the power. Library authors don't need to decide tradeoffs for everyone.

These are the advantages of providing algebras instead of higher-purpose functions. F-algebras (like val statsAlg) allow you to accrue power; that power is spent when you apply it (like def stats).

Personally speaking, when using all this stuff in my own work projects, I often find it nice to provide two sets of interfaces:

  1. A low-level ecosystem of algebras, raw data types, logic, etc. Usually an entire package.
  2. A high-level DSL that very cleanly exposes common high-level functions and hides all the algebra composition, calls to cata and other morphisms, etc. Usually a single object.

This approach allows all devs to get the most common types of work done without any knowledge of recursion schemes or theory. They just see a small, high-level DSL that's easy to skim, understand and use. It also serves as a good means of teaching recursion schemes because devs unfamiliar with it can then click through into the lower-level code and see real-world examples in a domain they're familiar with. This makes for a nice experience on teams with differing skill levels, both in terms of code quality and a path to gradually up-skill the team.

To be continued...

That's plenty for now.

If you like what I do and you'd like to support me, this series or any of my other projects, become a patron! It will make a big difference and help further more content. You can also follow me on Twitter for updates about the next post the series.

All source code available here.

Looking for a Scala recursion scheme library? There's a list in the FAQ.

Monday 20 November 2017

Practical Awesome Recursion - Ch 01: Fixpoints

This is chapter 1 in a series of blog posts about recursion schemes. This series uses Scala, and will focus on usage and applicability. It will be scarce in theory, and abundant in examples. The theory is valuable and fascinating, but I often find that knowing the theory alone is only half understanding. The other half of understanding comes from, and enables, practical application. I recommend you bounce back and forth between this series and theory. The internet is rich with blogs and videos on theory that explain it better than I would, so use those awesome resources.

Overview

The goal of this post is to prepare your data types so that you can abstrct over/away recursion, and be able to use all the generalisations that we'll explore in all future chapters in this series.

In order to prepare your data, there are three things to do:

  1. Make the recursive positions in your data type abstract
  2. Create a Functor for your data type
  3. Wraps your data type in Fix[_]

Step 1. Remove recursion from the data type

Add a type parameter to your data type that will be used to represent recursion (and other things as you'll see in future chapters). Everywhere that your type references itself, replace the type with the new abstract type parameter.

IntList

Example #1: let's say you have your own hand-written cons-list, specialised to Int; maybe you want to avoid boxing with List[Int]. It might look like this:

// Before
sealed trait IntList
final case class IntCons(head: Int, tail: IntList) extends IntList
case object IntNil extends IntList

This type references itself in IntCons#tail. Let's fix that...

//  After (v1)
sealed trait IntList[F]
final case class IntCons[F](head: Int, tail: F) extends IntList[F]
final case class IntNil[F]() extends IntList[F]

Ok so now it never references itself, but we've had to modify the non-recursive case, IntNil, from an object to a case class with zero params. I'd prefer IntNil to remain an object for better ergonomics and efficiency so let's use variance.

(Note: Many Scala FP'ers avoid type variance entirely because it's kind of broken in theory, can lead to bugs in higher-kinded contexts, screws up implicits sometimes and breaks type inference sometimes too. I advise: learn about it, understand the tradeoffs and decide on a case-by-case basis.)

//  After (v2)
sealed trait IntList[+F]
final case class IntCons[+F](head: Int, tail: F) extends IntList[F]
case object IntNil extends IntList[Nothing]

BinaryTree

How about a binary tree with an abstract type A in the nodes:

// Before
sealed trait BinaryTree[+A]
final case class Node[+A](left: BinaryTree[A], value: A, right: BinaryTree[A]) extends BinaryTree[A]
case object Leaf extends BinaryTree[Nothing]

Here we replace the left and right branches.

// After
sealed trait BinaryTree[+A, +F]
final case class Node[+A, +F](left: F, value: A, right: F) extends BinaryTree[A, F]
case object Leaf extends BinaryTree[Nothing, Nothing]

Don't worry about preserving the A in the branches -- don't try to make it an F[A] -- just a plain old *-kinded F is all you need. Step 3 will ensure that As persist in both branches' children.

JSON

JSON is recursive too.

JSON has arrays of JSON, it has objects with JSON values, those values can be arrays that contains even more objects with nested arrays and... you get the picture.

// Before
sealed trait Json
object Json {
  case object      Null                               extends Json
  final case class Bool(value: Boolean)               extends Json
  final case class Str (value: String)                extends Json
  final case class Num (value: Double)                extends Json
  final case class Arr (values: List[Json])           extends Json
  final case class Obj (fields: List[(String, Json)]) extends Json
}

Only replace the self references; preserve the outer type. List[Json] should be List[F], not F.

// After
sealed trait Json[+F]
object Json {
  case object      Null                              extends Json
  final case class Bool  (value: Boolean)            extends Json[Nothing]
  final case class Str   (value: String)             extends Json[Nothing]
  final case class Num   (value: Double)             extends Json[Nothing]
  final case class Arr[F](values: List[F])           extends Json[F]
  final case class Obj[F](fields: List[(String, F)]) extends Json[F]
}

Step 2. Create a Functor

In this step we create a Functor for our data types. Functor is a type class that exists in the FP lib of your choice, namely Scalaz or Cats. It looks like this:

trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

It empowers the world outside your data structure to change one of its abstract types, and by necessity, all values of that type. It's just like calling .map on a list:

// Change the values
List(1, 2, 3).map(_ * 10)
  // yields
  List[Int](10, 20, 30)

// Change the type too
List(1, 2, 3).map(i => s"[$i]")
  // yields
  List[String]("[1]", "[2]", "[3]")

This functor is how the generic recursion abstractions can have access to, and control over the recursive spots in your data.

(Note: If you want to use monadic variants of morphisms later, you'll need to upgrade your Functor to a Traverse. I'll show that in a future post but just keep it in mind.)

Let's create instances for our examples above. It doesn't matter if you use Scalaz or Cats, only the imports need to change. The code itself is identical.

IntList

Let the compiler guide you, it will only accept one implementation:

implicit val functor: Functor[IntList] = new Functor[IntList] {
  override def map[A, B](fa: IntList[A])(f: A => B): IntList[B] = fa match {
    case IntCons(head, tail) => IntCons(head, f(tail))
    case IntNil              => IntNil
  }
}

Note: You could also use an implicit object but it can cause implicit resolution problems later. I can't remember why/when anymore, it's just become habit.

// Also possible but leads to implicit resolution problems
implicit object IntListFunctor extends Functor[IntList] {
  override def map[A, B](fa: IntList[A])(f: A => B): IntList[B] = fa match {
    case IntCons(head, tail) => IntCons(head, f(tail))
    case IntNil              => IntNil
  }
}

Another side-note, do explicitly annotate the type.

  1. If you don't the type will be a structural type which will slow down the compiler and mess with implicit resolution.
  2. Explicit annotation will be mandatory in Scala v3 anyway.
// Don't do this
implicit val functor = new Functor[IntList] {

BinaryTree

As intended, this next case introduces a bit more complexity. We want to provide the ability to transform the recursive positions which means we need to keep the value: A position abstract and stable.

You'll need to use kind-projector to get the nice BinaryTree[A, ?] syntax, instead of the monstrous, out-of-the-box ({ type L[X] = BinaryTree[A, X] })#L syntax. If types and terms weren't so different, it'd be BinaryTree[A, _] just like the underscore in List(1,2,3).map(_ * 100).

implicit def functor[A]: Functor[BinaryTree[A, ?]] = new Functor[BinaryTree[A, ?]] {
  override def map[B, C](fa: BinaryTree[A, B])(f: B => C): BinaryTree[A, C] = fa match {
    case Node(left, value, right) => Node(f(left), value, f(right))
    case Leaf                     => Leaf
  }
}

There's nothing special about the F type over the A. You could also write a functor over the A type, i.e. a Functor[BinaryTree[?, F]] instance.

JSON

Here our Fs are embedded in other values so we have to work a little harder to transform them, but not much harder. Lists have functors too so it's easy peasy: just call .map.

implicit val functor: Functor[Json] = new Functor[Json] {
  override def map[A, B](fa: Json[A])(f: A => B): Json[B] = fa match {
    case Null        => Null
    case j: Bool     => j
    case j: Str      => j
    case j: Num      => j
    case Arr(values) => Arr(values.map(f))
    case Obj(fields) => Obj(fields.map { case (k, v) => (k, f(v)) })
  }
}

Step 3. The Fixpoint

In step 1, we went from a recursive data structure to a non-recursive data structure. Our goal is to be able to abstract over/away recursion, not to vanquish it. How do we regain our recursion? After all, we still want the users of our amazing IntList library to be able to store more than one element!

We've going to wrap our types in a magical fixpoint type. Doing so will give us back our recursion.

Here's a definition of Fix:

case class Fix[F[_]](unfix: F[Fix[F]])

Confused? This isn't a theory series but get a pen and paper, plug in IntList and expand the alias step-by-step; it'll clear it up real quick.

Exciting side note: Stephen Compall and Tomas Mikula found a way to define Fix without boxing! See how, here. I measured a 20-30% improvement and copied the approach in my own recursion library. Very awesome stuff.

Anyway, wrap your data types in Fix[_] and you get your recursion back.

type RecursiveIntList       = Fix[IntList]
type RecursiveBinaryTree[A] = Fix[BinaryTree[A, ?]]
type RecursiveJson          = Fix[Json]

...which is a bit long-winded and unpleasant. Let's rename things.

The most-common convention I've seen is to append an F for "functor" to the data type and use the proper name in the alias.

sealed trait IntListF[+F]
type IntList = Fix[IntListF]

sealed trait BinaryTreeF[+A, +F]
type BinaryTree[A] = Fix[BinaryTreeF[A, ?]]

sealed trait JsonF[+F]
type Json = Fix[JsonF]

This actually works out great because then, in practice, I often create a new object with helpers that improve ergonomics and avoid Fix boilerplate when you need to manually create a structure, for example, unit test expectations, or using a parser that doesn't play nice with recursion schemes.

For example:

type IntList = Fix[IntListF]

object IntList {

  // Helpful cos Scala's type inference fails so often
  def apply(f: IntListF[IntList]): IntList =
    Fix(f)

  def nil: IntList =
    apply(IntNil)

  def cons(head: Int, tail: IntList): IntList  =
    apply(IntCons(head, tail))

  def fromList(is: Int*): IntList  =
    is.foldRight(nil)(cons)
}

Done!

That's it! We're done. It may seem like a lot of work but there's benefit coming that greatly outweighs the cost. The cost isn't huge anyway, just a bit of one-time-only boilerplate.

All source code available here: https://github.com/japgolly/learning/tree/public/recursion-blog/example/src/main/scala/japgolly/blog/recursion

If you like what I do and you'd like to support me, this series or any of my other projects, become a patron! It will make a big difference to me and help further more content. You can also follow me on Twitter for updates about the next post the series.