Preposterous! Egregious!

Enforcing rules at compile-time: an example

2018-07-26T09:39:00.000+10:00

I recently solved a problem I had in Scala. I was able to solve it quickly and easily, whereas I remember that when I was less experienced with Scala, solving a problem like this was difficult. It would take a lot of thought and effort, and even if I came away with a solution that technically worked, it often felt off and wasn't pleasant to use.

I thought it'd be nice to share this example of my current approach to writing good Scala. I hope you find it useful.

Premise

I'm the author and maintainer of scalajs-react which is a Scala.JS library that provides a type-safe interface to React, a JavaScript UI library.

In scalajs-react, the primary way to create a UI component is via "the builder pattern". The builder API is separated into 4 steps so that you first specify the prerequisites in a deliberate order, then at the final step you can specify a bunch of optional lifecycle methods. Usage looks like this:

val MyComponent =
  ScalaComponent.builder[Props]("MyComponent")
    .stateless               // step 1
    .noBackend               // step 2
    .render(...)             // step 3
    .componentWillMount(...) // step 4 (optional)
    .componentDidMount(...)  // step 4 (optional)
    .build                   // step 4

Steps 1 and 2 are optional, which the API achieves via implicits. A minimal example looks like this:

val MyComponent =
  ScalaComponent.builder[Props]("MyComponent")
    .render(...)             // step 3
    .build                   // step 4
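I won't reproduce scalajs-react's real builder internals here, but the following hypothetical sketch shows one common way implicits can make builder steps optional: an implicit conversion, defined in each step's companion object, that fills in the default choices when you call a later step's method directly. Every name in it (OptionalSteps, Step1 to Step4, skipToStep3) is made up for illustration, and the state/backend choices are reduced to plain strings.

```scala
import scala.language.implicitConversions

// Hypothetical, heavily simplified builder; NOT scalajs-react's real types.
object OptionalSteps {
  final case class Step1(name: String) {
    def stateless: Step2 = Step2(name, state = "Unit")          // step 1
  }
  object Step1 {
    // Lets callers skip steps 1 and 2 by applying both defaults
    implicit def skipToStep3(s: Step1): Step3 = s.stateless.noBackend
  }

  final case class Step2(name: String, state: String) {
    def noBackend: Step3 = Step3(name, state, backend = "Unit") // step 2
  }
  object Step2 {
    // Lets callers skip step 2 by applying its default
    implicit def skipToStep3(s: Step2): Step3 = s.noBackend
  }

  final case class Step3(name: String, state: String, backend: String) {
    def render(r: () => String): Step4 = Step4(this, r)         // step 3
  }

  final case class Step4(step3: Step3, r: () => String) {
    def build: String =                                         // step 4
      s"${step3.name}[state=${step3.state}, backend=${step3.backend}]: ${r()}"
  }

  def builder(name: String): Step1 = Step1(name)
}
```

With this, builder("B").render(...) compiles because Step1 has no render method, so the compiler looks in Step1's companion and finds a conversion to Step3.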

Multiple specifications of the same lifecycle method compose. For example, this is valid and will result in all three procedures executing at the .componentDidMount lifecycle event.

val MyComponent =
  ScalaComponent.builder[Props]("MyComponent")
    .render(...)             // step 3
    .componentDidMount(...)  // step 4 (optional)
    .componentDidMount(...)  // step 4 (optional)
    .componentDidMount(...)  // step 4 (optional)
    .build                   // step 4
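As a rough illustration of what "compose" means here, this hypothetical sketch (not scalajs-react's implementation, which has its own effect types) models each lifecycle hook as a plain () => Unit and sequences every new registration after the existing one.

```scala
// Hypothetical sketch of composing repeated lifecycle registrations.
object ComposedHooks {
  final case class Lifecycle(didMount: Option[() => Unit] = None) {
    // Registering again runs the new hook after any existing one
    def componentDidMount(f: () => Unit): Lifecycle = {
      val composed: () => Unit = didMount match {
        case None       => f
        case Some(prev) => () => { prev(); f() }
      }
      copy(didMount = Some(composed))
    }

    // Simulates React firing the componentDidMount event
    def fireDidMount(): Unit = didMount.foreach(hook => hook())
  }
}
```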

Problem

A recent version of React introduced some changes to lifecycle methods and so I was updating scalajs-react the other day.

The two changes relevant to this article are:

A new lifecycle method getSnapshotBeforeUpdate is added, from which you return any arbitrary value called a snapshot.
The lifecycle method componentDidUpdate gets a new parameter which is the value from getSnapshotBeforeUpdate above.

Goals

Let's break down the new React changes into a few rules:

1. The return type of getSnapshotBeforeUpdate needs to match the type of the new componentDidUpdate param.
2. In the case that getSnapshotBeforeUpdate isn't specified, the type of the new componentDidUpdate param will be Unit (which is undefined in JS).
3. In order to support composition of multiple getSnapshotBeforeUpdate functions, a means of return value composition (most naturally a Semigroup typeclass) is required.

Regarding (3),

Semigroup would only be required for subsequent calls, not the first, which adds a little complication for values for which Semigroup isn't defined.
scalajs-react doesn't have external dependencies and I don't want to add a Semigroup typeclass to the public API.
I'd be surprised if anyone ever wanted to supply multiple getSnapshotBeforeUpdate functions anyway; if so, one can do it oneself.
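To make (3) a little more concrete, here's a sketch of what composing two snapshot functions would involve: two snapshot values have to be merged into one, which is exactly what a Semigroup provides. The hand-rolled Semigroup and the compose helper are hypothetical and exist only for this illustration.

```scala
object SnapshotComposition {
  // Minimal hand-rolled Semigroup (hypothetical; not a scalajs-react type)
  trait Semigroup[A] {
    def combine(x: A, y: A): A
  }

  object Semigroup {
    implicit val intSum: Semigroup[Int] = (x, y) => x + y
    implicit def listConcat[A]: Semigroup[List[A]] = (x, y) => x ++ y
  }

  // Two snapshot functions can only become one if their results can be merged.
  // For a result type with no Semigroup instance, this simply won't compile.
  def compose[P, A](f: P => A, g: P => A)(implicit S: Semigroup[A]): P => A =
    p => S.combine(f(p), g(p))
}
```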

Therefore I've decided to just not support multiple getSnapshotBeforeUpdate functions. We don't lose parity with React JS anyway, since a React class component can only define one getSnapshotBeforeUpdate method.

Let's break down (1) and (2) into more concrete rules:

getSnapshotBeforeUpdate can only be called 0 or 1 times
getSnapshotBeforeUpdate sets the Snapshot type
componentDidUpdate receives the Snapshot type
When componentDidUpdate is called and the Snapshot type is undefined, it's set to Unit
getSnapshotBeforeUpdate cannot occur after componentDidUpdate, because it would change the Snapshot type, invalidating the earlier componentDidUpdate that was specified while the Snapshot type was Unit (and a getSnapshotBeforeUpdate that returns Unit would be pointless here).

Before we continue, it's time to emphasise: type-safety is very important to me. One of the biggest features of scalajs-react is its strong type-safety. As much as is reasonable in Scala, if it compiles, I want confidence that it works and is correct.

I want to encode the above rules into the types so that end-users don't have to read any documentation, have any internal knowledge of these rules, or experience any runtime exceptions; the compiler should just enforce everything we discussed such that violations won't even compile.

Rejected solution

Probably the first solution that earlier-me would've reached for is to create a new step in the builder API, like this:

  [STEP 3]                                          [STEP 5]

  .render   ----(implicit with Snapshot=Unit)--->   last step
     \                                              /
      \                                            /
       --------> .getSnapshotBeforeUpdate ------> /

                         [STEP 4]

There are problems with such a solution:

It doesn't scale. If React adds more constraints in future it will become harder to keep a fluent API without introducing unnecessary usage constraints.
External component config fns (LastStep => LastStep) need to be able to configure any part of the lifecycle.
ScalaComponent.builder.static is an example where it returns a half-built component allowing further configuration. It needs to set the shouldComponentUpdate method which would skip step 4 in this approach, or else require that we add nearly everything to both steps (yuk).

Basic solution

Consider this pseudo-code:

var snapshotType = None

def getSnapshotBeforeUpdate[A](f: X => A) = {
  snapshotType match {
    case None    => snapshotType = Some(A)
    case Some(_) => error("SnapshotType already defined!")
  }
  getSnapshotBeforeUpdate = f
}

def componentDidUpdate(f) = {
  snapshotType = Some(snapshotType.getOrElse(Unit))
  componentDidUpdate.append(f)
}

We could track this at the term-level at runtime using Option (and typetags). It's not very type-safe though. We can keep the same approach and logic; we just need to lift it up into the type-level so that it runs at compile-time instead of at runtime. To do so we'll use a type-level encoding of Option.
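For contrast with the type-level version, here's a runnable sketch of that runtime approach. It assumes ClassTag as a stand-in for the typetags mentioned above, and simplifies the function shapes to () => A and Any => Unit, since the real parameters are elided in the pseudo-code. Note that misuse is only caught when the code runs, as an exception.

```scala
import scala.reflect.ClassTag

// Runtime version of the pseudo-code: the snapshot type lives in a var and
// is checked when methods are called, not when the code is compiled.
final class RuntimeBuilder {
  private var snapshotType: Option[ClassTag[_]] = None
  private var snapshotFn: Option[() => Any] = None
  private var didUpdateFns: List[Any => Unit] = Nil

  def getSnapshotBeforeUpdate[A](f: () => A)(implicit tag: ClassTag[A]): this.type = {
    snapshotType match {
      case None    => snapshotType = Some(tag)
      case Some(_) => throw new IllegalStateException("SnapshotType already defined!")
    }
    snapshotFn = Some(f)
    this
  }

  def componentDidUpdate(f: Any => Unit): this.type = {
    // Defaults the snapshot type to Unit if nothing has set it yet
    snapshotType = Some(snapshotType.getOrElse(ClassTag.Unit))
    didUpdateFns = didUpdateFns :+ f
    this
  }

  def currentSnapshotType: Option[ClassTag[_]] = snapshotType
  def hookCount: Int = (if (snapshotFn.isDefined) 1 else 0) + didUpdateFns.size
}
```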

This is how you encode Option at the type-level in Scala; first you'll see a term-level equivalent for contrast:

object TermLevel {

  sealed trait Option[+A] {
    def getOrElse[B >: A](default: => B): B
  }
  final case class Some[+A](value: A) extends Option[A] {
    override def getOrElse[B >: A](default: => B) = value
  }
  case object None extends Option[Nothing] {
    override def getOrElse[B >: Nothing](default: => B) = default
  }

  // Example usage
  def value: Option[Any] => Any = _.getOrElse(())
}

// ===============================================================

object TypeLevel {

  sealed trait TOption {
    type GetOrElse[B]
  }
  sealed trait TSome[A] extends TOption {
    override final type GetOrElse[B] = A
  }
  sealed trait TNone extends TOption {
    override final type GetOrElse[B] = B
  }

  // Example usage
  type Value[T <: TOption] = T#GetOrElse[Unit]
}
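One way to convince yourself that the type-level encoding behaves like Option is to ask the compiler for =:= evidence. This is a sketch assuming Scala 2.x (projections on abstract types, such as T#GetOrElse[Unit], aren't supported in Scala 3); if any of the implicitly calls can't be satisfied, the file won't compile.

```scala
object TypeLevelChecks {
  sealed trait TOption {
    type GetOrElse[B]
  }
  sealed trait TSome[A] extends TOption {
    override final type GetOrElse[B] = A
  }
  sealed trait TNone extends TOption {
    override final type GetOrElse[B] = B
  }

  type Value[T <: TOption] = T#GetOrElse[Unit]

  // TSome[Int]#GetOrElse[Unit] reduces to Int...
  val someIsInt = implicitly[Value[TSome[Int]] =:= Int]
  // ...while TNone#GetOrElse[Unit] falls back to Unit
  val noneIsUnit = implicitly[Value[TNone] =:= Unit]

  // Values of the computed types can be used directly
  val x: Value[TSome[Int]] = 42
  val y: Value[TNone] = ()
}
```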

Ok, now let's code up a skeleton that will enforce our rules at compile-time:

final class Builder[SnapshotType <: TOption] {
  type SnapshotValue = SnapshotType#GetOrElse[Unit]

  def getSnapshotBeforeUpdate[A](f: ... => A)
                                (implicit ev: SnapshotType =:= TNone)
                                : Builder[TSome[A]]

  def componentDidUpdate(f: SnapshotValue => ...)
                        : Builder[TSome[SnapshotValue]]
}

Let's compare this to our pseudo-code:

The snapshotType var is now a type parameter of Builder.
Where getSnapshotBeforeUpdate used to check that snapshotType is None and then set it to Some(A), we now ask for implicit proof that SnapshotType =:= TNone, and the return type shows that we return a new Builder with SnapshotType set to TSome[A].
Where getSnapshotBeforeUpdate used to throw an error when snapshotType was Some(_), the compiler now fails to find that implicit proof when SnapshotType is TSome[_], and reports an error at compile-time.
Where componentDidUpdate used to do snapshotType = Some(snapshotType.getOrElse(Unit)), we now have the equivalent in the return type Builder[TSome[SnapshotValue]], where SnapshotValue = SnapshotType#GetOrElse[Unit].
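Here's the skeleton fleshed out into something runnable, assuming Scala 2.x. The elided parts are replaced with simplified, hypothetical shapes (a snapshot function of type () => A, and a componentDidUpdate callback that only receives the snapshot), and each Builder just records the hooks registered so far.

```scala
object BasicSolution {
  sealed trait TOption { type GetOrElse[B] }
  sealed trait TSome[A] extends TOption { override final type GetOrElse[B] = A }
  sealed trait TNone extends TOption { override final type GetOrElse[B] = B }

  final class Builder[SnapshotType <: TOption] private (val hooks: List[String]) {
    type SnapshotValue = SnapshotType#GetOrElse[Unit]

    // Only callable while no snapshot type has been set
    def getSnapshotBeforeUpdate[A](f: () => A)
                                  (implicit ev: SnapshotType =:= TNone): Builder[TSome[A]] =
      new Builder[TSome[A]](hooks :+ "getSnapshotBeforeUpdate")

    // Fixes the snapshot type: whatever it already was, or Unit if unset
    def componentDidUpdate(f: SnapshotValue => Unit): Builder[TSome[SnapshotValue]] =
      new Builder[TSome[SnapshotValue]](hooks :+ "componentDidUpdate")
  }

  object Builder {
    def apply(): Builder[TNone] = new Builder[TNone](Nil)
  }
}
```

Swapping the order, e.g. Builder().componentDidUpdate(_ => ()).getSnapshotBeforeUpdate(() => 1), fails to compile because no TSome[Unit] =:= TNone evidence exists.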

Nice solution

A while back I would've been satisfied with the above solution; it works, right? If you know all of the rules, sure, but the error message to users is going to be pretty confusing and probably even lead them to think there's some kind of bug in the library. This is what an error looks like at the moment:

[error] ScalaComponentTest.scala:189: Cannot prove that japgolly.scalajs.react.example.TSome[A] =:= japgolly.scalajs.react.example.TNone.
[error]         .getSnapshotBeforeUpdate(???)
[error]                                 ^
[error] one error found

Nice UX in Scala is a bit of an art; in this case we'll do away with the generic TOption and create a custom construct for this one specific problem.

First, the new shape. Because this isn't generic anymore, we no longer need the inner type member to be a type constructor, which makes usage nicer too (i.e. T#Value instead of T#GetOrElse[Unit]):

sealed trait UpdateSnapshot {
  type Value
}

object UpdateSnapshot {
  sealed trait None extends UpdateSnapshot {
    override final type Value = Unit
  }

  sealed trait Some[A] extends UpdateSnapshot {
    override final type Value = A
  }
}

Easy enough. Now to improve the UX on failure. First we change the (implicit ev: SnapshotType =:= TNone) to (implicit ev: UpdateSnapshot.SafetyProof[U]) and create:

import scala.annotation.implicitNotFound

object UpdateSnapshot {

  @implicitNotFound("You can only specify getSnapshotBeforeUpdate once, and it has to be before " +
    "you specify componentDidUpdate, otherwise the snapshot type could become inconsistent.")
  sealed trait SafetyProof[U <: UpdateSnapshot]

  implicit def safetyProof[U <: UpdateSnapshot](implicit ev: U =:= UpdateSnapshot.None): SafetyProof[U] =
    null.asInstanceOf[SafetyProof[U]]
}

The (implicit ev: U =:= UpdateSnapshot.None) is still part of the solution, but this time it's indirect: the availability of an implicit SafetyProof now depends on it. The logic is still the same, but users will never see it in an error message.

The @implicitNotFound annotation on SafetyProof is the pudding. It will cause our custom error message to be displayed as a compilation error when someone breaks the rules.

Using null.asInstanceOf[SafetyProof[U]] is a performance optimisation. new SafetyProof[U]{} would work too, but I'd prefer to avoid the allocation. More importantly, because SafetyProof is never actually created or used, it can be completely elided from the Scala.JS output, which means a smaller download for your webapp's end-users.

Finally, our new builder excerpt looks like this:

final class Builder[U <: UpdateSnapshot] {
  type SnapshotValue = U#Value

  def getSnapshotBeforeUpdate[A](f: ... => A)
                                (implicit ev: UpdateSnapshot.SafetyProof[U])
                                : Builder[UpdateSnapshot.Some[A]]

  def componentDidUpdate(f: SnapshotValue => ...)
                        : Builder[UpdateSnapshot.Some[SnapshotValue]]
}
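Putting the pieces together, here's a runnable sketch of the final design, again assuming Scala 2.x. As before, the function shapes and the hooks list are simplified stand-ins; only the UpdateSnapshot/SafetyProof bookkeeping follows the code above.

```scala
import scala.annotation.implicitNotFound

object NiceSolution {
  sealed trait UpdateSnapshot {
    type Value
  }

  object UpdateSnapshot {
    sealed trait None extends UpdateSnapshot {
      override final type Value = Unit
    }

    sealed trait Some[A] extends UpdateSnapshot {
      override final type Value = A
    }

    @implicitNotFound("You can only specify getSnapshotBeforeUpdate once, and it has to be before " +
      "you specify componentDidUpdate, otherwise the snapshot type could become inconsistent.")
    sealed trait SafetyProof[U <: UpdateSnapshot]

    // Only resolvable when U is UpdateSnapshot.None; never allocated at runtime
    implicit def safetyProof[U <: UpdateSnapshot](implicit ev: U =:= UpdateSnapshot.None): SafetyProof[U] =
      null.asInstanceOf[SafetyProof[U]]
  }

  // Simplified builder: hooks just records what has been registered so far
  final class Builder[U <: UpdateSnapshot] private (val hooks: List[String]) {
    type SnapshotValue = U#Value

    def getSnapshotBeforeUpdate[A](f: () => A)
                                  (implicit ev: UpdateSnapshot.SafetyProof[U]): Builder[UpdateSnapshot.Some[A]] =
      new Builder[UpdateSnapshot.Some[A]](hooks :+ "getSnapshotBeforeUpdate")

    def componentDidUpdate(f: SnapshotValue => Unit): Builder[UpdateSnapshot.Some[SnapshotValue]] =
      new Builder[UpdateSnapshot.Some[SnapshotValue]](hooks :+ "componentDidUpdate")
  }

  object Builder {
    def apply(): Builder[UpdateSnapshot.None] = new Builder[UpdateSnapshot.None](Nil)
  }
}
```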

And let's look at what errors look like now:

[error] ScalaComponentTest.scala:189: You can only specify getSnapshotBeforeUpdate once, and it has to be before you specify componentDidUpdate, otherwise the snapshot type could become inconsistent.
[error]         .getSnapshotBeforeUpdate(???)
[error]                                 ^
[error] one error found

Done

That's all. I hope you've enjoyed it. If you're interested, the full patch that went into scalajs-react is here:

https://github.com/japgolly/scalajs-react/commit/ee81acf12c1039997460a7cac3d759fda6577533

Practical Awesome Recursion - Ch 02: Catamorphisms

2017-12-13T12:09:00.000+11:00

Recursion schemes are awesome, and practical. This is chapter 2 in a series of blog posts about recursion schemes. This series uses Scala, and will focus on usage and applicability. It will be scarce in theory, and abundant in examples. The theory is valuable and fascinating, but I often find that knowing the theory alone is only half understanding. The other half of understanding comes from, and enables, practical application. I recommend you bounce back and forth between this series and theory. The internet is rich with blogs and videos on theory that explain it better than I would, so use those awesome resources.

Before you start...

If you don't know what I mean by any of the following, read or skim chapter 1 of this series.

Fix[_]
IntList / IntListF[_]
BinaryTree / BinaryTreeF[_]

Also, this series doesn't depend on, or emphasise, Matryoshka or any other library. If you'd like to understand why, I've explained here in the FAQ.

The Catamorphism

What is a catamorphism? It can be answered from a few perspectives.

What

It's often referred to as a fold over your data structure. Examples that sum or count values are very common. Conceptually speaking, an example using Scala stdlib would be List(1, 3, 7).foldLeft(0)(_ + _) == 11. As you'll see, folds and catamorphisms are capable of much more than calculating numbers.

The definition of catamorphism is:

def cata[F[_]: Functor, A](fAlgebra: F[A] => A)(f: Fix[F]): A =
  fAlgebra(f.unfix.map(cata(fAlgebra)))

The first argument, fAlgebra, is so-called because F[A] => A is known as an F-algebra. It's your folding logic. You implement your folding logic as a function that processes a single level/layer of your structure without recursion. By level/layer, I mean in terms of recursive depth. For example:

IntList
=======

This is equivalent to List(1, 2, 3, 4, 5).

IntCons(1, _) ← Level 1
IntCons(2, _) ← Level 2
IntCons(3, _) ← Level 3
IntCons(4, _) ← Level 4
IntCons(5, _) ← Level 5
IntNil        ← Level 6

BinaryTree
==========

       Node(_, "root", _)           ← Level 1
              |
  +-----------+-----------+
  |                       |
Leaf        Node(_, "right", _)     ← Level 2
                 |           |
               Leaf         Leaf    ← Level 3

How

Look at the definition of cata and at the shape/type of the f-algebra. There's an interesting note about parametricity. The only means cata has of producing an A is to call fAlgebra... which requires an F[A]... so how does it put an A in the F[_] to call the function, if it can't produce As otherwise? Remember that the hole in F[_] represents the recursive case? In non-recursive cases (e.g. Nil in a cons list) the type parameter is a phantom type: either completely unused, or covariant and set to Nothing. For example:

case class Eg[T](int: Int)

def changeIt[X, Y](eg: Eg[X]): Eg[Y] =
  Eg(eg.int)

When a type variable is unused you can replace it with anything you want, which is exactly what happens in Functor[F].

It's also important to understand the order in which things happen. Catamorphisms:

start at the root (their input, the f: Fix[F])
(computationally) move to the leaves
calculate their way back to the root

Which means your folding logic is going to start executing against all the leaves first, then their parents, then their parents, etc.
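Here's a small self-contained sketch that makes this order visible. It assumes minimal local definitions of Functor, Fix and IntListF in the style of chapter 1 (no library), and uses a deliberately impure algebra that records each layer it's applied to.

```scala
object CataOrder {
  trait Functor[F[_]] {
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  final case class Fix[F[_]](unfix: F[Fix[F]])

  sealed trait IntListF[+F]
  object IntListF {
    final case class Cons[+F](head: Int, tail: F) extends IntListF[F]
    case object Nil extends IntListF[Nothing]

    implicit val functor: Functor[IntListF] = new Functor[IntListF] {
      def map[A, B](fa: IntListF[A])(f: A => B): IntListF[B] = fa match {
        case Cons(h, t) => Cons(h, f(t))
        case Nil        => Nil // contains no A, so it's reused as-is
      }
    }
  }

  // Same recursion as cata above, using F.map instead of .map syntax
  def cata[F[_], A](algebra: F[A] => A)(f: Fix[F])(implicit F: Functor[F]): A =
    algebra(F.map(f.unfix)(x => cata(algebra)(x)))

  def intList(is: Int*): Fix[IntListF] =
    is.foldRight(Fix[IntListF](IntListF.Nil))((h, t) => Fix[IntListF](IntListF.Cons(h, t)))

  // Impure on purpose: records each layer the algebra is applied to, in order
  def traceOrder(list: Fix[IntListF]): List[String] = {
    val visited = scala.collection.mutable.ListBuffer.empty[String]
    val algebra: IntListF[Unit] => Unit = {
      case IntListF.Cons(h, _) => visited += s"Cons($h)"
      case IntListF.Nil        => visited += "Nil"
    }
    cata(algebra)(list)
    visited.toList
  }
}
```

For the list 1, 2, 3 the trace is Nil, Cons(3), Cons(2), Cons(1): the innermost layer is folded first and the root last.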

Optimisation

I did say that this is a practical series. This is as good a place as any to mention that this cata definition, while correct, is inefficient.

Every time it recurses it has to create the same functions with the same logic over and over again. We can make it more efficient by creating what we need once and reusing it.

def cata2[F[_], A](algebra: F[A] => A)(f: Fix[F])(implicit F: Functor[F]): A = {
  var self: Fix[F] => A = null
  self = f => algebra(F.map(f.unfix)(self))
  self(f)
}

In this new definition, we create the recursive function once and reuse it. Despite the null, it's 100% safe because we (provably) set it before it's used.
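A self-contained sketch to check that both definitions agree, again assuming minimal local Functor, Fix and IntListF definitions (no library). It doesn't reproduce the benchmark; it only shows that cata2 is a drop-in replacement.

```scala
object Cata2Demo {
  trait Functor[F[_]] { def map[A, B](fa: F[A])(f: A => B): F[B] }
  final case class Fix[F[_]](unfix: F[Fix[F]])

  sealed trait IntListF[+F]
  object IntListF {
    final case class Cons[+F](head: Int, tail: F) extends IntListF[F]
    case object Nil extends IntListF[Nothing]

    implicit val functor: Functor[IntListF] = new Functor[IntListF] {
      def map[A, B](fa: IntListF[A])(f: A => B): IntListF[B] = fa match {
        case Cons(h, t) => Cons(h, f(t))
        case Nil        => Nil
      }
    }
  }

  // Original: a new closure is created at every level of recursion
  def cata[F[_], A](algebra: F[A] => A)(f: Fix[F])(implicit F: Functor[F]): A =
    algebra(F.map(f.unfix)(x => cata(algebra)(x)))

  // Optimised: the recursive function is created once and reused
  def cata2[F[_], A](algebra: F[A] => A)(f: Fix[F])(implicit F: Functor[F]): A = {
    var self: Fix[F] => A = null
    self = fix => algebra(F.map(fix.unfix)(self))
    self(f)
  }

  val listSum: IntListF[Int] => Int = {
    case IntListF.Cons(h, t) => h + t
    case IntListF.Nil        => 0
  }

  def intList(is: Int*): Fix[IntListF] =
    is.foldRight(Fix[IntListF](IntListF.Nil))((h, t) => Fix[IntListF](IntListF.Cons(h, t)))
}
```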

Let's measure it. How fast does it perform in comparison to the original? 105%? 110%?

[info] Benchmark         (size)  Mode  Cnt    Score   Error  Units
[info] RecursionBM.cata      10  avgt   10    0.274 ± 0.003  us/op
[info] RecursionBM.cata2     10  avgt   10    0.146 ± 0.001  us/op
[info] RecursionBM.cata     100  avgt   10    2.323 ± 0.020  us/op
[info] RecursionBM.cata2    100  avgt   10    1.555 ± 0.006  us/op
[info] RecursionBM.cata    1000  avgt   10   31.111 ± 0.720  us/op
[info] RecursionBM.cata2   1000  avgt   10   16.067 ± 0.187  us/op
[info] RecursionBM.cata   10000  avgt   10  326.443 ± 9.054  us/op
[info] RecursionBM.cata2  10000  avgt   10  165.470 ± 1.617  us/op

Nearly 200%: it's almost twice as fast at sizes 10, 1000 and 10000, and about 1.5x as fast at size 100. Not bad for a tiny bit of one-off, hidden boilerplate.

Side-note: I ran the benchmark on an i7-6700HQ and all results are under 1ms, even at a structure size of 1x10000 (length x depth), which means that either implementation is going to be fine on a fast CPU in a non-high-throughput solution. It'd be interesting to know what the measurements would be in prod, on real GCP/AWS VMs; the savings of the optimisation would likely be more significant there because vCPUs are slower.

F-Algebras

This isn't necessary but I'm also going to add a type alias.

type FAlgebra[F[_], A] = F[A] => A

and tweak the catamorphism definition to

def cata2[F[_], A](algebra: FAlgebra[F, A])(f: Fix[F])(implicit F: Functor[F]): A = {
  var self: Fix[F] => A = null
  self = f => algebra(F.map(f.unfix)(self))
  self(f)
}

Type aliases don't exist at runtime; the compiler dealiases them completely. You can also use either definition interchangeably:

def useAlias[F[_], A](f: F[A] => A): FAlgebra[F, A] =
  f

def removeAlias[F[_], A](f: FAlgebra[F, A]): F[A] => A =
  f

There are two advantages to having and using a type alias.

Readability. The A and the F are separated; the A doesn't repeat; it clarifies intent the more you get used to recursion schemes.
There are cases in which it helps type inference.

Simple Examples

Let's start with some basic examples: the usual blah -> Int stuff:

List Sum

Let's sum a list:

val listSum: FAlgebra[IntListF, Int] = {
  case IntListF.Cons(h, t) => h + t
  case IntListF.Nil        => 0
}

How does it work?

val listSumVerbose: IntListF[Int] => Int = {
  case IntListF.Cons(h, t) => h + t
  //                 |  |
  // Int by definition  |
  //                    Sum of tail (Int)
  case IntListF.Nil => 0
}

Notice this is just an algebra; to actually use it, you need to call cata:

def sumThisListPlease(list: IntList): Int =
  cata(listSum)(list)

For reasons that will become obvious later, when using this stuff in your own project or libraries, the algebra itself is the unit that you'll be exposing most often. Instead of creating functions that take fixed-point data (or codata) structures and return a result, you create algebras and leave it to users to call cata themselves. More on this later but the point is, from the next example onwards, I'll show just the algebras.

List Length

Counting elements in a list is similar:

val listLength: FAlgebra[IntListF, Int] = {
  case IntListF.Cons(_, t) => 1 + t
  case IntListF.Nil        => 0
}

How does it work?

val listLengthVerbose: IntListF[Int] => Int = {
  case IntListF.Cons(_, t) => 1 + t
  //                    |     |
  //                    |     Add 1 for this element
  // Length of tail (Int)
  case IntListF.Nil => 0
}

BinaryTree Algebras

Here are a few algebras for BinaryTree:

val binaryTreeNodeCount: FAlgebra[BinaryTreeF[Any, ?], Int] = {
  case BinaryTreeF.Node(left, _, right) => left + 1 + right
  case BinaryTreeF.Leaf                 => 0
}

val binaryTreeMaxDepth: FAlgebra[BinaryTreeF[Any, ?], Int] = {
  case BinaryTreeF.Node(left, _, right) => left.max(right) + 1
  case BinaryTreeF.Leaf                 => 0
}

def binaryTreeSum[N](implicit N: Numeric[N]): FAlgebra[BinaryTreeF[N, ?], N] = {
  case BinaryTreeF.Node(left, n, right) => N.plus(left, N.plus(n, right))
  case BinaryTreeF.Leaf                 => N.zero
}

Pretty straight-forward. Each recursive slot (left & right) already has the computed value for that subtree.
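To try these out, here's a self-contained sketch assuming minimal local Fix, Functor and cata definitions. Two deliberate simplifications: BinaryTreeF[Int, ?] (kind-projector syntax) is replaced by a plain type alias, IntTreeF, and the sum is specialised to Int instead of taking a Numeric.

```scala
object BinaryTreeDemo {
  trait Functor[F[_]] { def map[A, B](fa: F[A])(f: A => B): F[B] }
  final case class Fix[F[_]](unfix: F[Fix[F]])
  type FAlgebra[F[_], A] = F[A] => A

  def cata[F[_], A](algebra: FAlgebra[F, A])(f: Fix[F])(implicit F: Functor[F]): A =
    algebra(F.map(f.unfix)(x => cata(algebra)(x)))

  sealed trait BinaryTreeF[+A, +F]
  object BinaryTreeF {
    final case class Node[+A, +F](left: F, value: A, right: F) extends BinaryTreeF[A, F]
    case object Leaf extends BinaryTreeF[Nothing, Nothing]
  }

  // Stand-in for BinaryTreeF[Int, ?]: node values fixed to Int, one hole left
  type IntTreeF[F] = BinaryTreeF[Int, F]

  implicit val intTreeFunctor: Functor[IntTreeF] = new Functor[IntTreeF] {
    def map[A, B](fa: IntTreeF[A])(f: A => B): IntTreeF[B] = fa match {
      case BinaryTreeF.Node(l, v, r) => BinaryTreeF.Node(f(l), v, f(r))
      case BinaryTreeF.Leaf          => BinaryTreeF.Leaf
    }
  }

  val binaryTreeNodeCount: FAlgebra[IntTreeF, Int] = {
    case BinaryTreeF.Node(left, _, right) => left + 1 + right
    case BinaryTreeF.Leaf                 => 0
  }

  val binaryTreeMaxDepth: FAlgebra[IntTreeF, Int] = {
    case BinaryTreeF.Node(left, _, right) => left.max(right) + 1
    case BinaryTreeF.Leaf                 => 0
  }

  val binaryTreeSum: FAlgebra[IntTreeF, Int] = {
    case BinaryTreeF.Node(left, n, right) => left + n + right
    case BinaryTreeF.Leaf                 => 0
  }

  def leaf: Fix[IntTreeF] = Fix[IntTreeF](BinaryTreeF.Leaf)
  def node(l: Fix[IntTreeF], v: Int, r: Fix[IntTreeF]): Fix[IntTreeF] =
    Fix[IntTreeF](BinaryTreeF.Node(l, v, r))

  //      1
  //    /   \
  //   2     4
  //        /
  //       3
  val example: Fix[IntTreeF] = node(node(leaf, 2, leaf), 1, node(node(leaf, 3, leaf), 4, leaf))

  // Explicit type args avoid relying on higher-kinded inference through the alias
  def run(alg: FAlgebra[IntTreeF, Int]): Int = cata[IntTreeF, Int](alg)(example)
}
```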

JSON

Let's take a JSON value in our JSON ADT, and turn it into a JSON string that we can send out the door.

val jsonToString: FAlgebra[JsonF, String] = {
  case JsonF.Null        => "null"
  case JsonF.Bool(b)     => b.toString
  case JsonF.Num(n)      => n.toString
  case JsonF.Str(s)      => escapeString(s)
  case JsonF.Arr(values) => values.mkString("[", ",", "]")
  case JsonF.Obj(fields) => fields.iterator.map { case (k, v) => escapeString(k) + ":" + v }.mkString("{", ",", "}")
}

Is that easy or what? The array values and object fields are already all strings; we just mindlessly combine them using array/object notation.

What if, instead of the slower String concatenation, we wanted to use StringBuilder? Don't let the mutability discourage you; it's the same concept, we'll just replace String in the algebra type signature with StringBuilder => Unit. Executing a StringBuilder => Unit is mutable but the function itself is immutable, referentially transparent and pure. Descriptions of side-effects are safe.

val jsonToStringSB: FAlgebra[JsonF, StringBuilder => Unit] = {
  case JsonF.Null        => _ append "null"
  case JsonF.Bool(b)     => _ append b.toString
  case JsonF.Num(n)      => _ append n.toString
  case JsonF.Str(s)      => _ append escapeString(s)
  case JsonF.Arr(values) => sb => {
    sb append '['
    for ((v, i) <- values.zipWithIndex) {
      if (i > 0) sb append ','
      v(sb)
    }
    sb append ']'
  }
  case JsonF.Obj(fields) => sb => {
    sb append '{'
    for (((k, v), i) <- fields.zipWithIndex) {
      if (i > 0) sb append ','
      sb append escapeString(k)
      sb append ':'
      v(sb)
    }
    sb append '}'
  }
}

To be clear, usage would look like this:

def jsonToStringBuilderUsage(json: Json): String = {
  val sbToUnit = cata(jsonToStringSB)(json)
  val sb = new StringBuilder
  sbToUnit(sb)
  sb.toString()
}
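Here's a self-contained sketch running both algebras on the same value. JsonF, the small constructor helpers and escapeString are assumptions: escapeString isn't defined in this post, so the version below is a hypothetical stand-in that quotes the string and escapes only backslashes and double quotes (a real JSON encoder must also escape control characters).

```scala
object JsonDemo {
  trait Functor[F[_]] { def map[A, B](fa: F[A])(f: A => B): F[B] }
  final case class Fix[F[_]](unfix: F[Fix[F]])
  type FAlgebra[F[_], A] = F[A] => A

  def cata[F[_], A](algebra: FAlgebra[F, A])(f: Fix[F])(implicit F: Functor[F]): A =
    algebra(F.map(f.unfix)(x => cata(algebra)(x)))

  sealed trait JsonF[+F]
  object JsonF {
    case object Null extends JsonF[Nothing]
    final case class Bool(value: Boolean) extends JsonF[Nothing]
    final case class Str(value: String) extends JsonF[Nothing]
    final case class Num(value: Double) extends JsonF[Nothing]
    final case class Arr[+F](values: List[F]) extends JsonF[F]
    final case class Obj[+F](fields: List[(String, F)]) extends JsonF[F]

    implicit val functor: Functor[JsonF] = new Functor[JsonF] {
      def map[A, B](fa: JsonF[A])(f: A => B): JsonF[B] = fa match {
        case Arr(vs) => Arr(vs.map(f))
        case Obj(fs) => Obj(fs.map { case (k, v) => (k, f(v)) })
        case Null    => Null
        case b: Bool => b
        case s: Str  => s
        case n: Num  => n
      }
    }
  }

  type Json = Fix[JsonF]
  def num(d: Double): Json = Fix[JsonF](JsonF.Num(d))
  def str(s: String): Json = Fix[JsonF](JsonF.Str(s))
  def bool(b: Boolean): Json = Fix[JsonF](JsonF.Bool(b))
  val jnull: Json = Fix[JsonF](JsonF.Null)
  def arr(vs: Json*): Json = Fix[JsonF](JsonF.Arr(vs.toList))
  def obj(fs: (String, Json)*): Json = Fix[JsonF](JsonF.Obj(fs.toList))

  // Hypothetical stand-in: quotes the string, escaping only \ and "
  def escapeString(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  val jsonToString: FAlgebra[JsonF, String] = {
    case JsonF.Null        => "null"
    case JsonF.Bool(b)     => b.toString
    case JsonF.Num(n)      => n.toString
    case JsonF.Str(s)      => escapeString(s)
    case JsonF.Arr(values) => values.mkString("[", ",", "]")
    case JsonF.Obj(fields) => fields.iterator.map { case (k, v) => escapeString(k) + ":" + v }.mkString("{", ",", "}")
  }

  val jsonToStringSB: FAlgebra[JsonF, StringBuilder => Unit] = {
    case JsonF.Null        => _ append "null"
    case JsonF.Bool(b)     => _ append b.toString
    case JsonF.Num(n)      => _ append n.toString
    case JsonF.Str(s)      => _ append escapeString(s)
    case JsonF.Arr(values) => sb => {
      sb append '['
      for ((v, i) <- values.zipWithIndex) { if (i > 0) sb append ','; v(sb) }
      sb append ']'
    }
    case JsonF.Obj(fields) => sb => {
      sb append '{'
      for (((k, v), i) <- fields.zipWithIndex) {
        if (i > 0) sb append ','
        sb append escapeString(k)
        sb append ':'
        v(sb)
      }
      sb append '}'
    }
  }

  def renderSB(json: Json): String = {
    val sbToUnit = cata(jsonToStringSB)(json)
    val sb = new StringBuilder
    sbToUnit(sb)
    sb.toString()
  }
}
```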

A File System

Let's look at a more interesting example: a file system.

We'll start with a typical representation with hard-coded recursion.

sealed trait Entry
final case class Dir(files: Map[String, Entry]) extends Entry
final case class File(size: Long) extends Entry

This is an example inhabitant.

// Example of 3 files:
// 1. /usr/bin/find
// 2. /usr/bin/ls
// 3. /tmp/example.tmp
val example =
  Dir(Map(
    "usr" -> Dir(Map(
      "bin" -> Dir(Map(
        "find" -> File(197360),
        "ls" -> File(133688))))),
    "tmp" -> Dir(Map(
      "example.tmp" -> File(12)))))

Now let's create an API:

def totalFileSize(e: Entry): Long = e match {
  case File(s) => s
  case Dir(fs) => fs.values.foldLeft(0L)(_ + totalFileSize(_))
}

def countFiles(e: Entry): Int = e match {
  case File(_) => 1
  case Dir(fs) => fs.values.foldLeft(0)(_ + countFiles(_))
}

def countDirs(e: Entry): Int = e match {
  case File(_) => 0
  case Dir(fs) => fs.values.foldLeft(1)(_ + countDirs(_))
}

Looks great! SHIP IT! Ok so now people are using our super-awesome file system and associated API. One day a user wants to collect a bunch of stats and writes the following code:

final case class Stats(totalSize: Long, files: Int, dirs: Int)

def stats(e: Entry): Stats =
  Stats(totalFileSize(e), countFiles(e), countDirs(e))

The user then complains that their stats method takes 3 times as long as other operations. This is because each stat is produced by traversing the entire file system: 3 stats = 3 traversals. Now obviously, with the pure definitions given above, the extra time is going to be negligible, but imagine it's a real file system, maybe even one distributed over the network; all that drive/network/hardware cost to traverse the file system is likely to be very noticeable and very significant when it's repeated 3 times.

What can the user do? Well... nothing. The only recourse they have is to raise an issue and complain. They have no control or power in this situation. They're at the mercy of the decisions made by the library authors.

After a few years of complaints, the authors of the super-awesome file system library do a big rewrite and create a new API that looks a little something like this...

final case class Stats(totalSize: Long, files: Int, dirs: Int)

def stats(e: Entry): Stats = e match {

  case File(fileSize) =>
    Stats(fileSize, 1, 0)

  case Dir(fs) =>
    fs.values.foldLeft(Stats(0, 0, 0)) { (statsAcc, entry) =>
      val b = stats(entry)
      Stats(
        statsAcc.totalSize + b.totalSize,
        statsAcc.files + b.files,
        statsAcc.dirs + b.dirs)
    }
}

def totalFileSize(e: Entry): Long =
  stats(e).totalSize

def countFiles(e: Entry): Int =
  stats(e).files

def countDirs(e: Entry): Int =
  stats(e).dirs

KICK-ASS! Now all stats are gathered in a single traversal; issue: resolved. Except there are two problems now where there once was one. First, over the next few versions of the library more and more stats (permissions, ownership, etc.) get added, and the stats function gets bigger, more complex and more wild. Second, now new complaints start coming in, complaints like "totalFileSize() significantly slower since new update", "dirCount() 4x slower than in v1.8.3". What's going on? Well, now in every traversal we're spending more time fetching more data that we don't need and end up throwing away. If a user only wants a directory count, the library spends oodles of time looking up attributes, group names, etc. only to discard them all at the end.

What can the user do? Well... again: nothing. The only recourse they have is to raise an issue and complain. They have no control or power in this situation. They're yet again at the mercy of the decisions made by the library authors.

Fixing our FileSystem

Start with the usual changes: type param, Functor, Fix:

sealed trait EntryF[+F]
final case class Dir[F](files: Map[String, F]) extends EntryF[F]
final case class File(size: Long) extends EntryF[Nothing]

object EntryF {
  implicit val functor: Functor[EntryF] = new Functor[EntryF] {
    override def map[A, B](fa: EntryF[A])(f: A => B): EntryF[B] = fa match {
      case file: File => file
      case Dir(fs) => Dir(fs.map { case (k, v) => (k, f(v)) })
    }
  }
}

type Entry = Fix[EntryF]

Now let's add a little DSL and re-create our sample value:

object Entry {
  def apply(f: EntryF[Entry]): Entry = Fix(f)
  def file(s: Long): Entry = apply(File(s))
  def dir(es: (String, Entry)*): Entry = apply(Dir(es.toMap))
}

// Example of 3 files:
// 1. /usr/bin/find
// 2. /usr/bin/ls
// 3. /tmp/example.tmp
val example =
  Entry.dir(
    "usr" -> Entry.dir(
      "bin" -> Entry.dir(
        "find" -> Entry.file(197360),
        "ls" -> Entry.file(133688))),
    "tmp" -> Entry.dir(
      "example.tmp" -> Entry.file(12)))

Next up are the queries. Small, reusable, independent functions are good practice for good reason; let's recreate what we had in the initial version:

val totalFileSize: FAlgebra[EntryF, Long] = {
  case File(s) => s
  case Dir(fs) => fs.values.sum
}

val countFiles: FAlgebra[EntryF, Int] = {
  case File(_) => 1
  case Dir(fs) => fs.values.sum
}

val countDirs: FAlgebra[EntryF, Int] = {
  case File(_) => 0
  case Dir(fs) => fs.values.sum + 1
}

If you compare this to the first attempt, it's just as simple from the outside, and even simpler in its internal implementation!

Now what happens when a user wants to get all three stats? F-Algebras compose. Here's a simple, reusable snippet to combine two F-algebras that share the same F:

def algebraZip[F[_], A, B](fa: FAlgebra[F, A],
                           fb: FAlgebra[F, B])
                          (implicit F: Functor[F]): FAlgebra[F, (A, B)] =
  fab => {
    val a = fa(fab.map(_._1))
    val b = fb(fab.map(_._2))
    (a, b)
  }

This is all the user needs to create their own stats gathering algebra:

val statsAlg: FAlgebra[EntryF, (Long, (Int, Int))] =
  algebraZip(totalFileSize, algebraZip(countFiles, countDirs))

The (Long, (Int, Int)) is a little ugly, let's clean it up a bit:

final case class Stats(totalSize: Long, files: Int, dirs: Int)

def stats(e: Entry): Stats = {
  val (totalSize, (files, dirs)) = cata(statsAlg)(e)
  Stats(totalSize, files, dirs)
}

Hoorah. Without any dependencies on the library authors, our user was able to choose the 3 stats they were interested in, and retrieve them all in a single file system traversal. Whilst not parallelism (wait for the next episode in this series), they've created their own concurrency!
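To check the single-traversal claim, here's a self-contained sketch of the EntryF algebras and algebraZip. Fix, Functor and cata are minimal local definitions, and the visits counter inside cata is hypothetical instrumentation (a mutable global, for demonstration only) that counts how many nodes get processed.

```scala
object FileSystemDemo {
  trait Functor[F[_]] { def map[A, B](fa: F[A])(f: A => B): F[B] }
  final case class Fix[F[_]](unfix: F[Fix[F]])
  type FAlgebra[F[_], A] = F[A] => A

  var visits = 0 // incremented once per node that cata processes

  def cata[F[_], A](algebra: FAlgebra[F, A])(f: Fix[F])(implicit F: Functor[F]): A = {
    visits += 1
    algebra(F.map(f.unfix)(x => cata(algebra)(x)))
  }

  sealed trait EntryF[+F]
  object EntryF {
    final case class Dir[+F](files: Map[String, F]) extends EntryF[F]
    final case class File(size: Long) extends EntryF[Nothing]

    implicit val functor: Functor[EntryF] = new Functor[EntryF] {
      def map[A, B](fa: EntryF[A])(f: A => B): EntryF[B] = fa match {
        case file: File => file
        case Dir(fs)    => Dir(fs.map { case (k, v) => (k, f(v)) })
      }
    }
  }
  import EntryF.{Dir, File}

  type Entry = Fix[EntryF]
  def file(s: Long): Entry = Fix[EntryF](File(s))
  def dir(es: (String, Entry)*): Entry = Fix[EntryF](Dir(es.toMap))

  val example: Entry =
    dir(
      "usr" -> dir("bin" -> dir("find" -> file(197360), "ls" -> file(133688))),
      "tmp" -> dir("example.tmp" -> file(12)))

  val totalFileSize: FAlgebra[EntryF, Long] = {
    case File(s) => s
    case Dir(fs) => fs.values.sum
  }
  val countFiles: FAlgebra[EntryF, Int] = {
    case File(_) => 1
    case Dir(fs) => fs.values.sum
  }
  val countDirs: FAlgebra[EntryF, Int] = {
    case File(_) => 0
    case Dir(fs) => fs.values.sum + 1
  }

  def algebraZip[F[_], A, B](fa: FAlgebra[F, A], fb: FAlgebra[F, B])
                            (implicit F: Functor[F]): FAlgebra[F, (A, B)] =
    fab => (fa(F.map(fab)(_._1)), fb(F.map(fab)(_._2)))

  final case class Stats(totalSize: Long, files: Int, dirs: Int)

  def stats(e: Entry): Stats = {
    val (totalSize, (files, dirs)) = cata(algebraZip(totalFileSize, algebraZip(countFiles, countDirs)))(e)
    Stats(totalSize, files, dirs)
  }
}
```

With the 7-entry example, the zipped algebra processes 7 nodes, whereas running the three algebras separately processes 21.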

Now contrast this with the previous incarnation. This time around, the user has the control, the user has the power. Library authors don't need to decide tradeoffs for everyone.

These are the advantages of providing algebras instead of higher-purpose functions. F-algebras (like val statsAlg) allow you to accrue power; that power is spent when you apply it (like def stats).

Personally speaking, when using all this stuff in my own work projects, I often find it nice to provide two sets of interfaces:

A low-level ecosystem of algebras, raw data types, logic, etc. Usually an entire package.
A high-level DSL that very cleanly exposes common high-level functions and hides all the algebra composition, calls to cata and other morphisms, etc. Usually a single object.

This approach allows all devs to get the most common types of work done without any knowledge of recursion schemes or theory. They just see a small, high-level DSL that's easy to skim, understand and use. It also serves as a good means of teaching recursion schemes because devs unfamiliar with it can then click through into the lower-level code and see real-world examples in a domain they're familiar with. This makes for a nice experience on teams with differing skill levels, both in terms of code quality and a path to gradually up-skill the team.

To be continued...

That's plenty for now.

If you like what I do and you'd like to support me, this series or any of my other projects, become a patron! It will make a big difference and help me produce more content. You can also follow me on Twitter for updates about the next post in the series.

All source code available here.

Looking for a Scala recursion scheme library? There's a list in the FAQ.

Practical Awesome Recursion - Ch 01: Fixpoints

2017-11-20T18:15:00.001+11:00

This is chapter 1 in a series of blog posts about recursion schemes. This series uses Scala, and will focus on usage and applicability. It will be scarce in theory, and abundant in examples. The theory is valuable and fascinating, but I often find that knowing the theory alone is only half understanding. The other half of understanding comes from, and enables, practical application. I recommend you bounce back and forth between this series and theory. The internet is rich with blogs and videos on theory that explain it better than I would, so use those awesome resources.

Overview

The goal of this post is to prepare your data types so that you can abstract over/away recursion, and use all the generalisations that we'll explore in future chapters of this series.

In order to prepare your data, there are three things to do:

1. Make the recursive positions in your data type abstract
2. Create a Functor for your data type
3. Wrap your data type in Fix[_]

Step 1. Remove recursion from the data type

Add a type parameter to your data type that will be used to represent recursion (and other things as you'll see in future chapters). Everywhere that your type references itself, replace the type with the new abstract type parameter.

`IntList`

Example #1: let's say you have your own hand-written cons-list, specialised to Int; maybe you want to avoid boxing with List[Int]. It might look like this:

// Before
sealed trait IntList
final case class IntCons(head: Int, tail: IntList) extends IntList
case object IntNil extends IntList

This type references itself in IntCons#tail. Let's fix that...

//  After (v1)
sealed trait IntList[F]
final case class IntCons[F](head: Int, tail: F) extends IntList[F]
final case class IntNil[F]() extends IntList[F]

Ok so now it never references itself, but we've had to modify the non-recursive case, IntNil, from an object to a case class with zero params. I'd prefer IntNil to remain an object for better ergonomics and efficiency so let's use variance.

(Note: Many Scala FP'ers avoid type variance entirely because it's kind of broken in theory, can lead to bugs in higher-kinded contexts, screws up implicits sometimes and breaks type inference sometimes too. I advise: learn about it, understand the tradeoffs and decide on a case-by-case basis.)

//  After (v2)
sealed trait IntList[+F]
final case class IntCons[+F](head: Int, tail: F) extends IntList[F]
case object IntNil extends IntList[Nothing]
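To see why this step alone isn't the end of the story, here's a small sketch. It assumes Scala 2.13 and wraps the v2 definitions in a hypothetical IntListLayers object so the snippet stands alone. Nesting values by hand works, but each extra element adds another IntList layer to the type:

```scala
// Assumes Scala 2.13. IntListLayers is a hypothetical wrapper for the v2 definitions.
object IntListLayers {
  sealed trait IntList[+F]
  final case class IntCons[+F](head: Int, tail: F) extends IntList[F]
  case object IntNil extends IntList[Nothing]

  // Covariance: IntNil (an IntList[Nothing]) is an IntList[F] for any F.
  val empty: IntList[String] = IntNil

  // One element: the tail's type is itself an IntList.
  val one: IntList[IntList[Nothing]] = IntCons(1, IntNil)

  // Two elements: one more IntList layer appears in the type.
  val two: IntList[IntList[IntList[Nothing]]] = IntCons(1, IntCons(2, IntNil))
}
```

The type now tracks the list's length, which is exactly the problem Step 3 solves.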

`BinaryTree`

How about a binary tree with an abstract type A in the nodes:

// Before
sealed trait BinaryTree[+A]
final case class Node[+A](left: BinaryTree[A], value: A, right: BinaryTree[A]) extends BinaryTree[A]
case object Leaf extends BinaryTree[Nothing]

Here we replace the left and right branches.

// After
sealed trait BinaryTree[+A, +F]
final case class Node[+A, +F](left: F, value: A, right: F) extends BinaryTree[A, F]
case object Leaf extends BinaryTree[Nothing, Nothing]

Don't worry about preserving the A in the branches -- don't try to make it an F[A] -- just a plain old *-kinded F is all you need. Step 3 will ensure that As persist in both branches' children.

JSON

JSON is recursive too.

JSON has arrays of JSON, it has objects with JSON values, those values can be arrays that contain even more objects with nested arrays and... you get the picture.

// Before
sealed trait Json
object Json {
  case object      Null                               extends Json
  final case class Bool(value: Boolean)               extends Json
  final case class Str (value: String)                extends Json
  final case class Num (value: Double)                extends Json
  final case class Arr (values: List[Json])           extends Json
  final case class Obj (fields: List[(String, Json)]) extends Json
}

Only replace the self references; preserve the outer type. List[Json] should be List[F], not F.

// After
sealed trait Json[+F]
object Json {
  case object      Null                              extends Json[Nothing]
  final case class Bool  (value: Boolean)            extends Json[Nothing]
  final case class Str   (value: String)             extends Json[Nothing]
  final case class Num   (value: Double)             extends Json[Nothing]
  final case class Arr[F](values: List[F])           extends Json[F]
  final case class Obj[F](fields: List[(String, F)]) extends Json[F]
}

Step 2. Create a Functor

In this step we create a Functor for our data types. Functor is a type class that exists in the FP lib of your choice, namely Scalaz or Cats. It looks like this:

trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

It empowers the world outside your data structure to change one of its abstract types, and by necessity, all values of that type. It's just like calling .map on a list:

// Change the values
List(1, 2, 3).map(_ * 10)
  // yields
  List[Int](10, 20, 30)

// Change the type too
List(1, 2, 3).map(i => s"[$i]")
  // yields
  List[String]("[1]", "[2]", "[3]")
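Here's a minimal, self-contained sketch of the type class in action. It assumes Scala 2.13, defines Functor locally rather than importing it from Scalaz or Cats, and uses illustrative names (FunctorDemo, describe). The instance lives in Functor's companion object so implicit search finds it without an import:

```scala
// Assumes Scala 2.13; Functor is defined locally instead of imported from Scalaz/Cats.
object FunctorDemo {
  trait Functor[F[_]] {
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  object Functor {
    // Instances in the companion are found by implicit search without an import.
    implicit val listFunctor: Functor[List] = new Functor[List] {
      override def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
    }
  }

  // Generic over any F that has a Functor: the caller decides how the values change.
  def describe[F[_], A](fa: F[A])(implicit F: Functor[F]): F[String] =
    F.map(fa)(a => s"[$a]")
}
```

describe knows nothing about List; it only relies on the Functor instance, which is the same kind of access the recursion abstractions will rely on.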

This functor is how the generic recursion abstractions gain access to, and control over, the recursive spots in your data.

(Note: If you want to use monadic variants of morphisms later, you'll need to upgrade your Functor to a Traverse. I'll show that in a future post but just keep it in mind.)

Let's create instances for our examples above. It doesn't matter if you use Scalaz or Cats, only the imports need to change. The code itself is identical.

`IntList`

Let the compiler guide you; the types leave room for essentially one sensible implementation:

implicit val functor: Functor[IntList] = new Functor[IntList] {
  override def map[A, B](fa: IntList[A])(f: A => B): IntList[B] = fa match {
    case IntCons(head, tail) => IntCons(head, f(tail))
    case IntNil              => IntNil
  }
}
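A runnable version of the instance above, under the same assumptions: Scala 2.13, a local Functor standing in for the Scalaz/Cats one, and everything wrapped in a hypothetical IntListFunctorDemo object. Putting the implicit in the IntList companion places it in implicit scope automatically:

```scala
// Assumes Scala 2.13; in real code, import scalaz.Functor or cats.Functor instead.
object IntListFunctorDemo {
  trait Functor[F[_]] {
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  sealed trait IntList[+F]
  final case class IntCons[+F](head: Int, tail: F) extends IntList[F]
  case object IntNil extends IntList[Nothing]

  object IntList {
    implicit val functor: Functor[IntList] = new Functor[IntList] {
      override def map[A, B](fa: IntList[A])(f: A => B): IntList[B] = fa match {
        case IntCons(head, tail) => IntCons(head, f(tail)) // only the recursive slot changes
        case IntNil              => IntNil                 // nothing to transform
      }
    }
  }
}
```

Mapping only ever touches tail; head is data, not a recursive position.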

Note: You could also use an implicit object but it can cause implicit resolution problems later. I can't remember why/when anymore, it's just become habit.

// Also possible but leads to implicit resolution problems
implicit object IntListFunctor extends Functor[IntList] {
  override def map[A, B](fa: IntList[A])(f: A => B): IntList[B] = fa match {
    case IntCons(head, tail) => IntCons(head, f(tail))
    case IntNil              => IntNil
  }
}

Another side-note, do explicitly annotate the type.

If you don't, the type may be inferred as a structural type, which slows down the compiler and messes with implicit resolution.
Explicit result types on implicit definitions will be mandatory in Scala 3 anyway.

// Don't do this
implicit val functor = new Functor[IntList] {

`BinaryTree`

As intended, this next case introduces a bit more complexity. We want to provide the ability to transform the recursive positions, which means we need to keep the value: A position abstract and stable.

You'll need to use kind-projector to get the nice BinaryTree[A, ?] syntax, instead of the monstrous, out-of-the-box ({ type L[X] = BinaryTree[A, X] })#L syntax. If types and terms weren't so different, it'd be BinaryTree[A, _] just like the underscore in List(1,2,3).map(_ * 100).

implicit def functor[A]: Functor[BinaryTree[A, ?]] = new Functor[BinaryTree[A, ?]] {
  override def map[B, C](fa: BinaryTree[A, B])(f: B => C): BinaryTree[A, C] = fa match {
    case Node(left, value, right) => Node(f(left), value, f(right))
    case Leaf                     => Leaf
  }
}
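Without kind-projector, the same instance can be written with the raw type lambda. This sketch assumes Scala 2.13 and a local Functor; BinaryTreeFunctorDemo is a hypothetical wrapper:

```scala
// Assumes Scala 2.13, no kind-projector; Functor is defined locally.
object BinaryTreeFunctorDemo {
  trait Functor[F[_]] {
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  sealed trait BinaryTree[+A, +F]
  final case class Node[+A, +F](left: F, value: A, right: F) extends BinaryTree[A, F]
  case object Leaf extends BinaryTree[Nothing, Nothing]

  object BinaryTree {
    // ({ type L[X] = BinaryTree[A, X] })#L is what BinaryTree[A, ?] expands to.
    implicit def functor[A]: Functor[({ type L[X] = BinaryTree[A, X] })#L] =
      new Functor[({ type L[X] = BinaryTree[A, X] })#L] {
        override def map[B, C](fa: BinaryTree[A, B])(f: B => C): BinaryTree[A, C] = fa match {
          case Node(left, value, right) => Node(f(left), value, f(right)) // value is untouched
          case Leaf                     => Leaf
        }
      }
  }
}
```

For example, functor[String].map(Node(1, "root", 2))(_ * 10) yields Node(10, "root", 20): both branches change and the value stays put.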

There's nothing special about the F type over the A. You could also write a functor over the A type, i.e. a Functor[BinaryTree[?, F]] instance.

JSON

Here our Fs are embedded in other values so we have to work a little harder to transform them, but not much harder. Lists have functors too so it's easy peasy: just call .map.

implicit val functor: Functor[Json] = new Functor[Json] {
  override def map[A, B](fa: Json[A])(f: A => B): Json[B] = fa match {
    case Null        => Null
    case j: Bool     => j
    case j: Str      => j
    case j: Num      => j
    case Arr(values) => Arr(values.map(f))
    case Obj(fields) => Obj(fields.map { case (k, v) => (k, f(v)) })
  }
}
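The JSON instance made self-contained, under the same assumptions (Scala 2.13, a local Functor, a hypothetical JsonFunctorDemo wrapper). Because every leaf case extends Json[Nothing], covariance lets the instance return leaves unchanged, while Arr and Obj delegate to List's own map:

```scala
// Assumes Scala 2.13; Functor is defined locally instead of imported from Scalaz/Cats.
object JsonFunctorDemo {
  trait Functor[F[_]] {
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  sealed trait Json[+F]
  object Json {
    case object      Null                              extends Json[Nothing]
    final case class Bool  (value: Boolean)            extends Json[Nothing]
    final case class Str   (value: String)             extends Json[Nothing]
    final case class Num   (value: Double)             extends Json[Nothing]
    final case class Arr[F](values: List[F])           extends Json[F]
    final case class Obj[F](fields: List[(String, F)]) extends Json[F]

    implicit val functor: Functor[Json] = new Functor[Json] {
      override def map[A, B](fa: Json[A])(f: A => B): Json[B] = fa match {
        case Null        => Null
        case j: Bool     => j // a Json[Nothing] is a Json[B] thanks to +F
        case j: Str      => j
        case j: Num      => j
        case Arr(values) => Arr(values.map(f))                          // List's map
        case Obj(fields) => Obj(fields.map { case (k, v) => (k, f(v)) }) // keys untouched
      }
    }
  }
}
```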

Step 3. The Fixpoint

In step 1, we went from a recursive data structure to a non-recursive data structure. Our goal is to be able to abstract over/away recursion, not to vanquish it. How do we regain our recursion? After all, we still want the users of our amazing IntList library to be able to store more than one element!

We're going to wrap our types in a magical fixpoint type. Doing so will give us back our recursion.

Here's a definition of Fix:

case class Fix[F[_]](unfix: F[Fix[F]])

Confused? This isn't a theory series, but get a pen and paper, plug in IntList and expand the type step by step; it'll clear things up real quick.
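If you'd rather let the compiler do the pen-and-paper work, here's a small sketch assuming Scala 2.13 (FixDemo, oneTwo and sum are illustrative names). Every tail of a Fix[IntList] is again a Fix[IntList], so an ordinary recursive function can walk it. The explicit Fix[IntList](...) type arguments sidestep Scala's weak inference for higher-kinded type parameters:

```scala
// Assumes Scala 2.13. FixDemo, oneTwo and sum are illustrative names.
object FixDemo {
  sealed trait IntList[+F]
  final case class IntCons[+F](head: Int, tail: F) extends IntList[F]
  case object IntNil extends IntList[Nothing]

  case class Fix[F[_]](unfix: F[Fix[F]])

  // Expanding by hand:
  //   Fix[IntList]           wraps  IntList[Fix[IntList]]
  //   IntList[Fix[IntList]]  is     IntNil  or  IntCons(head: Int, tail: Fix[IntList])
  // so every tail is again a Fix[IntList]: the recursion is back, and the
  // type no longer grows with the length of the list.
  val oneTwo: Fix[IntList] =
    Fix[IntList](IntCons(1, Fix[IntList](IntCons(2, Fix[IntList](IntNil)))))

  // An ordinary recursive function, peeling off one Fix layer per call.
  def sum(l: Fix[IntList]): Int = l.unfix match {
    case IntCons(head, tail) => head + sum(tail)
    case IntNil              => 0
  }
}
```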

Exciting side note: Stephen Compall and Tomas Mikula found a way to define Fix without boxing! See how, here. I measured a 20-30% improvement and copied the approach in my own recursion library. Very awesome stuff.

Anyway, wrap your data types in Fix[_] and you get your recursion back.

type RecursiveIntList       = Fix[IntList]
type RecursiveBinaryTree[A] = Fix[BinaryTree[A, ?]]
type RecursiveJson          = Fix[Json]

...which is a bit long-winded and unpleasant. Let's rename things.

The most common convention I've seen is to append an F for "functor" to the data type and use the proper name in the alias.

sealed trait IntListF[+F]
type IntList = Fix[IntListF]

sealed trait BinaryTreeF[+A, +F]
type BinaryTree[A] = Fix[BinaryTreeF[A, ?]]

sealed trait JsonF[+F]
type Json = Fix[JsonF]

This works out well in practice, because I often create an object with the same name as the alias, holding helpers that improve ergonomics and avoid Fix boilerplate when you need to create a structure manually: for example, unit-test expectations, or the output of a parser that doesn't play nicely with recursion schemes.

For example:

type IntList = Fix[IntListF]

object IntList {

  // Helpful cos Scala's type inference fails so often
  def apply(f: IntListF[IntList]): IntList =
    Fix(f)

  def nil: IntList =
    apply(IntNil)

  def cons(head: Int, tail: IntList): IntList  =
    apply(IntCons(head, tail))

  def fromList(is: Int*): IntList  =
    is.foldRight(nil)(cons)
}
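A self-contained version of those helpers, assuming Scala 2.13 and wrapped in a hypothetical IntListHelpersDemo object. toList is an extra, illustrative helper (not part of the original code) that converts back to a standard List so the round trip is easy to check:

```scala
// Assumes Scala 2.13. IntListHelpersDemo is a hypothetical wrapper; toList is an
// illustrative extra helper, not part of the post's code.
object IntListHelpersDemo {
  case class Fix[F[_]](unfix: F[Fix[F]])

  sealed trait IntListF[+F]
  final case class IntCons[+F](head: Int, tail: F) extends IntListF[F]
  case object IntNil extends IntListF[Nothing]

  type IntList = Fix[IntListF]

  object IntList {
    def apply(f: IntListF[IntList]): IntList = Fix[IntListF](f)
    def nil: IntList = apply(IntNil)
    def cons(head: Int, tail: IntList): IntList = apply(IntCons(head, tail))
    def fromList(is: Int*): IntList = is.foldRight(nil)(cons)

    // Back to a standard List, one Fix layer at a time.
    def toList(l: IntList): List[Int] = l.unfix match {
      case IntCons(head, tail) => head :: toList(tail)
      case IntNil              => Nil
    }
  }
}
```

Test expectations then read naturally, e.g. IntList.fromList(1, 2) instead of three nested Fix(...) calls.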

Done!

That's it! We're done. It may seem like a lot of work but there's benefit coming that greatly outweighs the cost. The cost isn't huge anyway, just a bit of one-time-only boilerplate.

All source code available here: https://github.com/japgolly/learning/tree/public/recursion-blog/example/src/main/scala/japgolly/blog/recursion

If you like what I do and you'd like to support me, this series or any of my other projects, become a patron! It will make a big difference to me and help me produce more content. You can also follow me on Twitter for updates about the next post in the series.

Dependently-Typed Functions

2017-06-05T21:27:00.002+10:00

It's been a while since my last blog post. This time around I'm going to show how to do something in Scala that you might first think would be very straightforward, but unfortunately, at least in Scala ≤ 2.12.2, requires a bit of hoop-jumping: dependently-typed functions.

Consider the following data type:

sealed trait Field { type Value }
object Field {
  case object Name extends Field { type Value = String }
  case object Age  extends Field { type Value = Int }
}

With this data type, you have the following path-dependent types:

Name.Value == String
Age .Value == Int

val n = Name
n.Value == String

You also have the following type projections:

Name .type#Value = String
Age  .type#Value = Int
Field     #Value = Any

1. A Dependently-Typed Method

Let's say you want to create the following, rather innocent-looking function:

def emptyValue(f: Field): f.Value

Likely your first attempt will be to pattern-match:

def emptyValue(f: Field): f.Value =
  f match {
    case Field.Name => ""
    case Field.Age  => 0
  }

Unfortunately scalac doesn't support this and you instead get compilation errors:

<console>:15: error: type mismatch;
 found   : String("")
 required: f.Value
           case Field.Name => ""
                              ^
<console>:16: error: type mismatch;
 found   : Int(0)
 required: f.Value
           case Field.Age  => 0
                              ^

Ok, let's switch to a different representation:

type Aux[A] = Field { type Value = A }

def emptyValueHack[A](f: Aux[A]): A =
  f match {
    case Field.Name => ""
    case Field.Age  => 0
  }

def emptyValue(f: Field): f.Value =
  emptyValueHack[f.Value](f)

This does work. Great! END BLOG POST. Right? Wrong. There's another problem. What happens if we forget a case:

def emptyValueHack2[A](f: Aux[A]): A =
  f match {
    case Field.Name => ""
    // case Field.Age  => 0
  }

It compiles successfully without any warnings. We've lost exhaustivity checking, which means a runtime exception (a MatchError) is just waiting to happen. Oh noes. Now what?

Consider: what are we doing when we do simple pattern-matching on each case? We're inspecting a value, choosing a corresponding function, and executing it. If each case has exactly one corresponding function then we have exhaustivity. That sounds like something we can already do in plain old Scala, without pattern-matching.

Ok, let's do this ourselves: exactly one function per case, chosen depending on the Field.

sealed trait Field {
  type Value
  def fold(n: Field.Name.type => Field.Name.Value,
           a: Field.Age .type => Field.Age .Value): Value
}

object Field {
  case object Name extends Field {
    override type Value = String
    override def fold(n: Field.Name.type => Field.Name.Value, a: Field.Age.type => Field.Age.Value): Value =
      n(this)
  }

  case object Age extends Field {
    override type Value = Int
    override def fold(n: Field.Name.type => Field.Name.Value, a: Field.Age.type => Field.Age.Value): Value =
      a(this)
  }
}

Take a good look at this. There are a few things that might seem odd. You're probably wondering why I'm passing statically-known singletons to arguments. Why not just:

def fold(name: => Field.Name.Value,
         age : => Field.Age .Value): Value

Three reasons:

Cases aren't always objects. What if they're case classes, like case class CustomField(label: String) extends Field? In such a case, it's important to pass the instance to the caller so that they have access to the additional/dynamic information in the label field.
It embodies the proof that the appropriate argument case is required for each field case. The types make it clear, and so long as you call the arg with this instead of Field.Name directly, you get an extra bit of proof of correctness. The workarounds described in this post introduce some tedium, and tedium often leads to copy-pasting, which can cause accidental bugs like this:

// Spot the bug...
case object Name extends Field {
  override type Value = String
  override def fold(n: Field.Name.type => Field.Name.Value, a: Field.Age.type => Field.Age.Value): Value =
    n(Name)
}
case object Age extends Field {
  override type Value = Int
  override def fold(n: Field.Name.type => Field.Name.Value, a: Field.Age.type => Field.Age.Value): Value =
    n(Name)
}

The caller has the same problem as above; if they're pattern-matching and want a downcast instance of Field in their case functions then this gives it to them, so that they don't manually reference Field.Name and end up with potential bugs.

2. Reducing Duplication

Next, let's think about what happens if we add a new case like Field.Address; we'll have to update the fold signature in our current three places and then add a fourth in Field.Address. So much repetition! If you find yourself copy-pasting a cumbersome list of method arguments, consider it a good practice to encapsulate them all in a class/record. That way you can consistently pass around and declare a single type, and changes to its fields don't cause changes to method signatures. Let's do that for our fold method:

sealed trait Field {
  type Value
  def fold(f: Field.Fold): Value
}

object Field {
  case object Name extends Field {
    override type Value = String
    override def fold(f: Field.Fold) = f.name(this)
  }

  case object Age extends Field {
    override type Value = Int
    override def fold(f: Field.Fold) = f.age(this)
  }

  final case class Fold(name: Field.Name.type => Field.Name.Value,
                        age : Field.Age .type => Field.Age .Value)
}

Look at that fold method, nice and easy. We can add cases ALL DAY LONG without repetition. High five thineself.

While we're in the vicinity, let's add a little convenience method to the Fold class to really prove to ourselves that we can accomplish our original goal. Pay attention to the signature:

final case class Fold(name: Field.Name.type => Field.Name.Value,
                      age : Field.Age .type => Field.Age .Value) {
  def apply(f: Field): f.Value =
    f.fold(this)                        
}

Now using the above, how can we rewrite our original emptyValue function?

def emptyValue(f: Field): f.Value =
  Field.Fold(
    name = _ => "",
    age  = _ => 0,
  )(f)

Goals: accomplished.

Concise
Exhaustive
Pattern-match in spirit

It's a little ugly but not a huge deal. Depending on your codebase and environment you can likely avoid the need for _ => bits without penalty which will make it a little nicer on the eyes.
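Here are the pieces from sections 1 and 2 assembled into one runnable sketch, assuming Scala 2.13 (FieldFoldDemo is a hypothetical wrapper). Note that the static type of emptyValue(Field.Age) is Int, not Any:

```scala
// Assumes Scala 2.13. FieldFoldDemo is a hypothetical wrapper object.
object FieldFoldDemo {
  sealed trait Field {
    type Value
    def fold(f: Field.Fold): Value
  }

  object Field {
    case object Name extends Field {
      override type Value = String
      override def fold(f: Field.Fold): Value = f.name(this) // pass `this`, not Field.Name
    }

    case object Age extends Field {
      override type Value = Int
      override def fold(f: Field.Fold): Value = f.age(this)
    }

    // One function per case: the record of arguments replaces pattern-matching.
    final case class Fold(name: Field.Name.type => Field.Name.Value,
                          age : Field.Age .type => Field.Age .Value) {
      def apply(f: Field): f.Value = f.fold(this)
    }
  }

  def emptyValue(f: Field): f.Value =
    Field.Fold(
      name = _ => "",
      age  = _ => 0
    )(f)
}
```

Adding a new Field case means adding a parameter to Fold, so every Fold(...) call site stops compiling until it handles the new case: exhaustivity, enforced by the compiler.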

3. A Dependently-Typed Function

Like most everything from OOP, methods are boring. You can't abstract over them, all you can do is call them. Functions allow you to do more because they themselves are values; you can pass them around which facilitates awesome collections methods like .map(A => B), .filter(A => Boolean), etc. and that's just the tip of the iceberg.

In the previous step we created a method whose output depends on its input value. How can we have the same in the form of a function? Function types are fixed, and the function type doesn't have a value to work with. Scala doesn't accept this kind of syntax:

// nope - not valid syntax
type MyFn = (f: Field) => f.Value

You might try using projections like this but you'll lose the relationship between input and output.

type MyFn = Field => Field#Value
// which is the same as
type MyFn = Field => Any

We've already encountered the answer. Surprise! Field.Fold is what we're looking for. It's a value, you can pass it around; you can apply it via its apply method to obtain a dependently-typed result.

Let's try it out:

def blah(field: Field, getValue: Field.Fold): field.Value =
  getValue(field)

Hmmm, yes, well it does work but it's admittedly not very interesting in that shape. It can only represent f => f.Value...

4. Dependently-Typed Functions

What if we want a function of any shape instead of just f => f.Value? Maybe you want to be able to print each field to the screen: f => f.Value => Unit. Maybe you want to reduce lists of each value to a single value: f => List[f.Value] => f.Value. Or perform validation: f => f.Value => Either[Error, f.Value].

Let's find an abstraction that can represent each example above. They all have the shape: f => {something including f.Value}. Add some type aliases for the right-hand sides:

type GetValue  [Value] = Value
type Print     [Value] = Value => Unit
type ReduceList[Value] = List[Value] => Value
type Validate  [Value] = Value => Either[Error, Value]

There's our abstraction:

type F[Value] = // various

with the fold being various cases of f => F[f.Value] where F[Value] = ….

Let's update the fold to allow this. (Field#fold needs a matching type parameter too: def fold[F[_]](f: Field.Fold[F]): F[Value].)

final case class Fold[F[_]](name: Field.Name.type => F[Field.Name.Value],
                            age : Field.Age .type => F[Field.Age .Value]) {
  def apply(f: Field): F[f.Value] =
    f.fold(this)                        
}

Now let's try it out:

// f: Field => f.Value
val getValue = Field.Fold[GetValue](
  name = _ => "George",
  age  = _ => 99)

// f: Field => f.Value => Unit
val printValue = Field.Fold[Print](
  name = _ => n => println(s"My name is $n."),
  age  = _ => i => println(s"I am $i years old."))

Great. Dependently-typed function values. And how do we use them? Just like normal functions.

val f = Field.Age
val v = getValue(f) // v = 99
printValue(f)(v)    // prints: I am 99 years old.
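A complete, runnable version of the polymorphic Fold, assuming Scala 2.13 and a hypothetical FieldFoldFDemo wrapper. It includes the change to Field#fold that Fold[F].apply relies on, and swaps Print for an illustrative Show alias (V => String) so results can be checked without side effects:

```scala
// Assumes Scala 2.13. FieldFoldFDemo and the Show alias are illustrative.
object FieldFoldFDemo {
  sealed trait Field {
    type Value
    // fold takes its own type parameter so it can accept any Fold[F].
    def fold[F[_]](f: Field.Fold[F]): F[Value]
  }

  object Field {
    case object Name extends Field {
      override type Value = String
      override def fold[F[_]](f: Field.Fold[F]): F[Value] = f.name(this)
    }

    case object Age extends Field {
      override type Value = Int
      override def fold[F[_]](f: Field.Fold[F]): F[Value] = f.age(this)
    }

    final case class Fold[F[_]](name: Field.Name.type => F[Field.Name.Value],
                                age : Field.Age .type => F[Field.Age .Value]) {
      def apply(f: Field): F[f.Value] = f.fold(this)
    }
  }

  type GetValue[V] = V
  type Show    [V] = V => String

  // f: Field => f.Value
  val getValue = Field.Fold[GetValue](
    name = _ => "George",
    age  = _ => 99)

  // f: Field => f.Value => String
  val showValue = Field.Fold[Show](
    name = _ => n => s"My name is $n.",
    age  = _ => i => s"I am $i years old.")
}
```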

Very good. I hope you found this interesting. Thanks for reading!

For more exhilaration, as an exercise for the reader, try to:

define composition of fold functions
create multiple folds for subsets of the same data type

Appendix

Type parameters

In terms of representation, using type parameters is isomorphic (ignoring variance). The following represents the exact same information:

sealed trait Field[V]
object Field {
  case object Name extends Field[String]
  case object Age  extends Field[Int]
}

In terms of usage however, the two representations are far from the same and have a drastic impact on how you'll use them. Some things are easier, some harder.

For example, in terms of generic access, existential types make life easier: it's simply Field instead of Field[_]. This is even more evident when there's a constraint on the type, like Field[_ <: AnyRef], whilst Field remains the same.

I've come across other scenarios but honestly can't remember right now, something something implicits... Instead take a look at this slide deck of @julienrf's for more comparison: https://julienrf.github.io/2017/existential-types/

Code

The final code in full of this blog post is available here: https://gist.github.com/japgolly/162f102d516abf86b54359ef0b1d3b65

Testing Scala.JS on Firefox & Chrome from SBT

2016-03-20T16:04:00.001+11:00

It's been fantastic being able to write Scala and compile it to JavaScript thanks to Scala.JS. Going further, Scala.JS lets you write accompanying unit tests, run them from SBT, and choose a target environment from {Rhino, Node.JS, Phantom.JS}. If you needed DOM in your testing, your only option was Phantom.JS which seems great at first but—because it only simulates DOM—you soon discover that there are many cases in which its behaviour diverges from normal browsers (eg. DOM types for <td> tags), or isn't supported at all (text selection, anything to do with focus, more). Oh, and it's also riddled with bugs, many significant and long-standing. So while Phantom.JS's effort is appreciated, it's no substitute for a real browser. This was where the story ended until very recently.

Recently, the Scala.JS team released scala-js-env-selenium which allows you to use Selenium in the same way you would the other JS environments. That means that Scala.JS can now interface with real browsers, namely Firefox and Chrome. Awesome!

The next step, and the purpose of this blog post, is to effectively integrate it into your SBT/Scala/Scala.JS project. Let me tell you about my ideal environment and then I'll show you how to achieve it.

Goals

In my ideal environment:

I write my Scala.JS unit tests the same way I already do, and I can continue to run them against Node.JS or Phantom.JS because it's fast. No changes there.
To run tests against Firefox or Chrome, I simply prefix my SBT command with firefox: or chrome:.
For example, firefox:test would run tests in Firefox, and chrome:testOnly a.b.c would run my a.b.c test in Chrome.
Testing in FF/Chrome doesn't require recompilation (All in the Land know Scalac is slow). It should use all of the same bits and bobs (especially the output JS) that normal tests use.
I specify that certain tests will only run in FF and/or Chrome.
For example, tests that use focus should skip Phantom.JS where I know focus doesn't work.
I run testAll to run the same tests in all environments (fast-env & Firefox & Chrome). This will happen concurrently.
I can use FF/Chrome headlessly (i.e. without the windows popping up when the browsers are launched and running).

1. Selenium support.

Add this to your project/plugins.sbt:

libraryDependencies += "org.scala-js" %% "scalajs-env-selenium" % "0.1.1"

Then install ChromeDriver which is needed so that Selenium can interface with Chrome.

2. General SBT Config

Create a file called project/InBrowserTesting.scala with the following content: This creates SBT configurations for each browser, then delegates most of the settings to the test:* settings.

3. Project-specific SBT config

Next you need to look at your existing SBT build settings.

For each cross-JVM/JS project, add:

  .configure(InBrowserTesting.cross)

For each JS-only project, add:

  .configure(InBrowserTesting.js)

For each JVM-only project, add:

  .configure(InBrowserTesting.jvm)

You might be wondering why JVM projects need any configuration at all. It's so that when testAll is run from a JVM project or from the root project, the JVM tests run too. Without this setting, testAll in a JVM project would do nothing at all, and testAll from the root would only run the JS tests.

4. Environment-Dependent Tests

I often forget that I'm writing JS when I write Scala.JS. As we're in JS land, all we have to do to determine our environment is check the user-agent. How you skip tests depends on the test framework you're using, but all you need is something like this in your test code:

  if (JsEnvUtils.isRealBrowser) {
    // test here
  } else {
    // skip
  }
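JsEnvUtils isn't shown here, so the following is only a hypothetical sketch of such a helper. It assumes Scala 2.13 and that simple substring checks on the user-agent are good enough for gating tests. The classification is a pure function so it can be tested anywhere; in a Scala.JS test you would pass it the real user-agent, for example dom.window.navigator.userAgent from scalajs-dom:

```scala
// Hypothetical JsEnvUtils-style helper (names are illustrative, not from the post).
// Pure function of the user-agent string; in Scala.JS, feed it
// org.scalajs.dom.window.navigator.userAgent.
object JsEnvUtilsSketch {
  sealed trait JsEnv
  object JsEnv {
    case object PhantomJS extends JsEnv
    case object Firefox   extends JsEnv
    case object Chrome    extends JsEnv
    case object Other     extends JsEnv // anything else, e.g. Node.JS
  }

  // Deliberately simple substring checks; good enough to decide which tests to skip.
  def classify(userAgent: String): JsEnv =
    if (userAgent.contains("PhantomJS"))    JsEnv.PhantomJS
    else if (userAgent.contains("Firefox")) JsEnv.Firefox
    else if (userAgent.contains("Chrome"))  JsEnv.Chrome
    else                                    JsEnv.Other

  def isRealBrowser(userAgent: String): Boolean = classify(userAgent) match {
    case JsEnv.Firefox | JsEnv.Chrome => true
    case _                            => false
  }
}
```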

5. Headlessness

Super simple (unless you're a Windows user). Install xvfb, "X Virtual FrameBuffer", which starts X without a graphics display.

Here are two different ways you can use it:

Either start it in a separate window via
```
Xvfb :1
```
or in the background
```
nohup Xvfb :1 &
```
then launch SBT like this:
```
DISPLAY=:1 sbt
```
Alternatively, here's a tip from danielkza (thanks!): you can simply prefix your SBT command with xvfb-run -a to have an X server spun up on demand, without needing to start one yourself. Beware, though, that xvfb-run is a bit naive and susceptible to race conditions, so while it'll be fine on your local machine, it may cause problems on your CI server or similar.

You'll no longer see any Firefox and Chrome windows; all your output just appears in the SBT console as usual. Too easy.

Note: The :1 indicates the X display-number, which is a means to uniquely identify an X server on a host. The 1 is completely arbitrary—you can choose any number so long as there isn't another X server running that's already associated with it.

Done.

All goals described are thus achieved.
I hope you found this helpful.
Happy coding!

Zero-overhead Recursive ADT Coproducts

2015-02-23T19:18:00.000+11:00

Zero-product Recursive AMD what???

Ok. Imagine this: you're building some app, and in certain parts users can type text with special tokens. (It's like in Twitter, when typing “Hello @japgolly” the “@japgolly” part gets special treatment.) You parse various types of tokens. You might have different locations to type text, and rules about which tokens are allowed in each location. You want the compiler to enforce those rules but you also want to handle tokens generically sometimes. How would you do such a thing in Scala?

Initial Attempts

Ideally you'd define an ADT (algebraic data type) for all tokens possible, then create new types for each location that form a subset of tokens allowed. If that were possible, here's what that would look like.

sealed trait Token
case class  PlainText(t: String)  extends Token
case object NewLine               extends Token
case class  Link(url: String)     extends Token
case class  Quote(q: List[Token]) extends Token

// type BlogTitle   = PlainText | Link    | Quote[BlogTitle]

// type BlogComment = PlainText | NewLine | Quote[BlogComment]

Now Scala won't let us create our BlogTitle type as shown above. It doesn't have a syntax for coproducts (which is what BlogTitle and BlogComment would be, also called “disjoint unions” and “sum-types”) over existing types. Seeing as we have control over the definition of the generic tokens, we can be tricky and invert the declarations like this:

sealed trait BlogTitle
sealed trait BlogComment

sealed trait Token
case class  PlainText(t: String) extends Token with BlogTitle with BlogComment
case object NewLine              extends Token                with BlogComment
case class  Link(url: String)    extends Token with BlogTitle
case class  Quote(q: List[????]) extends Token with BlogTitle with BlogComment
//                        ↑ hmmm...

...but as you can see, we hit a wall when we get to Quote, which is recursive. We want a Quote in a BlogTitle to only contain BlogTitle tokens, not just any type of token. We can continue our poor hack as follows.

abstract class Quote[A] extends Token { val q: List[A] }
case class QuoteInBlogTitle  (q: List[BlogTitle])   extends Quote[BlogTitle]   with BlogTitle
case class QuoteInBlogComment(q: List[BlogComment]) extends Quote[BlogComment] with BlogComment

Not pleasant. And we're not really sharing types anymore. What else could we do?

We could create separate ADTs for BlogTitle and BlogComment, that would mirror and wrap their matching generic tokens, then write converters from specific to generic. That's a lot of duplicated logic and tedium, plus we now double the allocations and memory needed. Let's try something else...

Shapeless

NOTE: This bit about Shapeless is an interesting detour, but it can be skipped.

We could use Shapeless! Shapeless is an ingeniously-sculpted library that facilitates abstractions that a panel of sane experts would deem impossible in Scala, one such abstraction being Coproducts. Here's what a solution looks like using Shapeless.

(Sorry I thought I had this working but just realised recursive coproducts don't work. I've commented-out Quote[A] for now. There is probably a way of doing this – Shapeless often doesn't take no from scalac for an answer – I'll update this if some kind 'netizen shares how.)

sealed trait Token
case class  PlainText(text: String) extends Token
case object NewLine                 extends Token
case class  Link(url: String)       extends Token
case class  Quote[A](q: List[A])    extends Token

type BlogTitle   = PlainText :+: Link         :+: /*Quote[BlogTitle]   :+: */ CNil
type BlogComment = PlainText :+: NewLine.type :+: /*Quote[BlogComment] :+: */ CNil

/* compiles → */ val title = Coproduct[BlogTitle](PlainText("cool"))
// error    → // val title = Coproduct[BlogTitle](NewLine)

So far so good. What would a Token ⇒ Text function look like?

object ToText extends Poly1 {
  implicit def caseText    = at[PlainText   ](_.text)
  implicit def caseNewLine = at[NewLine.type](_ => "\n")
  implicit def caseLink    = at[Link        ](_.url)
  // ...
}

val text: String = title.map(ToText).unify

Ok, I'm a little unhappy because I'm very fond of pattern-matching in these situations, but the above does work effectively. One thing to be aware of with Shapeless is how it works. To achieve its awesomeness, it must build up a hierarchy of proofs, which incurs time and space costs at both compile- and run-time: its awesomeness ain't free. The val title = ... statement above allocates at least 7 new objects at runtime, where I want 1. Depending on your usage and needs, that overhead might be nothing, but it might be significant. It's something to be aware of when you decide on your solution.

Zero-overhead Recursive ADT Coproducts

There's another way. I mentioned “zero-overhead” and it can be done. Here is a different solution that relies solely on standard Scala features, one such feature being path-dependent types.

You can create an abstract ADT, putting each constituent in a trait, then simply combine those traits into an object to have it reify a new, concrete, sealed ADT. Sealed! Let's see the new definition:

// Generic

sealed trait Base {
  sealed trait Token
}

sealed trait PlainTextT extends Base {
  case class PlainText(text: String) extends Token
}

sealed trait NewLineT extends Base {
  case class NewLine() extends Token
}

sealed trait LinkT extends Base {
  case class Link(url: String) extends Token
}

sealed trait QuoteT extends Base {
  case class Quote(content: List[Token]) extends Token
}

// Specific

object BlogTitle   extends PlainTextT with LinkT    with QuoteT
object BlogComment extends PlainTextT with NewLineT with QuoteT

Now let's use it:

   List[BlogTitle.Token](BlogTitle.PlainText("Hello")) // success
// List[BlogTitle.Token](BlogTitle.NewLine)   ← error: BlogTitle.NewLine doesn't exist
// List[BlogTitle.Token](BlogComment.NewLine) ← error: BlogComment tokens aren't BlogTitle tokens


// Specific
val blogTitleToText: BlogTitle.Token => String = {
  // case BlogTitle.NewLine   => ""     ← error: BlogTitle.NewLine doesn't exist
  // case BlogComment.NewLine => ""     ← error: BlogComment tokens not allowed
  case BlogTitle.PlainText(txt) => txt
  case BlogTitle.Link(url)      => url
  // Compiler warns missing BlogTitle.Quote(_) ✓
}

// General
val anyTokenToText: Base#Token => String = {
  case a: PlainTextT#PlainText => a.text
  case a: LinkT     #Link      => a.url
  case a: NewLineT  #NewLine   => "\n"
  // Compiler warns missing QuoteT#Quote ✓
}


// Recursive types
val t: BlogTitle  .Quote => List[BlogTitle  .Token] = _.content
val c: BlogComment.Quote => List[BlogComment.Token] = _.content
val g: QuoteT     #Quote => List[Base       #Token] = _.content

Look at that! That is awesome. These are some things that we get:

No duplicated definitions or logic.
Generic & specific hierarchies are sealed, meaning the compiler will let you know when you forget to cater for a case, or try to cater for a case not allowed.
Children of recursive types have the same specialisation.
Eg. a BlogTitle can only quote using BlogTitle tokens.
Tokens can be processed generically.
Zero-overhead. No additional computation or new memory allocation needed to store tokens, or move them into a generic context. No implicits.
Nice, neat pattern-matching which makes me happy.
It's just plain ol' Scala traits so you're free to encode more constraints & organisation. You can consolidate traits, add type aliases, all that jazz.
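To make the "processed generically" point concrete, here's a runnable sketch assuming Scala 2.13 (TokenDemo and toText are illustrative names). One function written against Base#Token handles tokens from either specific ADT, and recurses through Quote:

```scala
// Assumes Scala 2.13. TokenDemo and toText are illustrative names.
object TokenDemo {
  sealed trait Base { sealed trait Token }
  sealed trait PlainTextT extends Base { case class PlainText(text: String) extends Token }
  sealed trait NewLineT   extends Base { case class NewLine() extends Token }
  sealed trait LinkT      extends Base { case class Link(url: String) extends Token }
  sealed trait QuoteT     extends Base { case class Quote(content: List[Token]) extends Token }

  object BlogTitle   extends PlainTextT with LinkT    with QuoteT
  object BlogComment extends PlainTextT with NewLineT with QuoteT

  // Written once against Base#Token; Quote recurses into its children,
  // which always have the same specialisation as the Quote itself.
  def toText(t: Base#Token): String = t match {
    case a: PlainTextT#PlainText => a.text
    case a: LinkT#Link           => a.url
    case _: NewLineT#NewLine     => "\n"
    case a: QuoteT#Quote         => a.content.map(toText).mkString("\"", "", "\"")
  }
}
```

Mixing flavours is still rejected at compile time: BlogTitle.Quote(List(BlogComment.NewLine())) doesn't compile, because a BlogComment.Token isn't a BlogTitle.Token.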

There you go. Seems like a great solution to this particular scenario.

Nothing is without downsides though. Creation will likely be a little hairy; imagine writing a serialisation codec – the Generic ⇒ Binary part will be easy but Binary ⇒ Specific will be more effort. In my case I will only create this data thrice {serialisation, parsing, random data generation} but read and process it many, many times. Good tradeoff.

An Example of Functional Programming

2014-09-28T21:56:00.000+10:00

Many people, after reading my previous blog post, asked to see a practical example of FP with code. I know it's been a few months – I actually got married recently; wedding planning is very time consuming – but I've finally come up with an example. Please enjoy.

Most introductions to FP begin with pleas for immutability, but I'm going to do something different. I've come up with a real-world example that's not too contrived. It's about validating user data. There will be 5 incremental requirements that you can imagine coming in sequentially, each one building on the last. We'll code to satisfy each requirement incrementally without peeking at the following requirements. We'll code with an FP mindset and use Scala to do so, but the language isn't important. This isn't about Scala. It's about principles and a way of thinking. You can write FP in Java (if you enjoy pain) and you can write OO in Haskell (I know someone who does this and baffles his friends). The language you use affects the ease of writing FP, but FP isn't bound to or defined by any one language. It's more than that. If you don't use Scala, this will still be applicable and useful to you.

I know many readers will have programming experience but little FP experience, so I will try to make this as beginner-friendly as possible and avoid unexplained jargon.

Req 1. Reject invalid input.

The premise here is that we have data and we want to know if it's valid or not. For example, suppose we want to ensure a username conforms to certain rules before we accept it and store it in our app's database.

I'm championing functional programming here so let's use a function! What's the simplest thing we need to make this work? A function that takes some data and returns whether it's valid or not: A ⇒ Boolean.

Well, that's certainly simple, but I'm going to hand-wavily tell you that primitives are dangerous. They denote the format of the underlying data but not its meaning. If you refactor a function like

blah(dataWasValid: Boolean, hacksEnabled: Boolean,
killTheHostages: Boolean)

the compiler isn't going to help you if you get the arguments wrong somewhere. Have you ever had a bug where you used the ID of one data object in place of another because they were both longs? Did you hear about the NASA mission (the Mars Climate Orbiter) that failed because numbers in imperial and metric units (pound-force seconds vs newton-seconds) were indistinguishable?
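A minimal sketch of giving primitives a meaning the compiler can check: wrap each one in its own type. The names UserId, OrderId, Miles, Kilometres and Typed.ordersFor are hypothetical illustrations, not code from this post. (In real code you might also add `extends AnyVal` to make these value classes, which usually avoids allocating the wrapper.)

```scala
// Hypothetical wrappers that give bare primitives a meaning the compiler can check.
final case class UserId(value: Long)
final case class OrderId(value: Long)
final case class Miles(value: Double)
final case class Kilometres(value: Double)

object Typed {
  // Unit conversion must be explicit; Miles can't be passed where Kilometres are expected.
  def toKilometres(m: Miles): Kilometres = Kilometres(m.value * 1.609344)

  // Hypothetical lookup: ordersFor(OrderId(1L)) is a compile error rather than a silent bug.
  def ordersFor(user: UserId): List[OrderId] =
    if (user.value == 1L) List(OrderId(100L), OrderId(101L)) else Nil
}
```

Both IDs are still just longs underneath, but mixing them up is now caught at compile time instead of in production.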

So let's address that by first correcting the definition of our function. We want a function that takes some data and returns ~~whether it's valid or not~~ an indication of validity: A ⇒ Validity.

sealed trait Validity
case object Valid extends Validity
case object Invalid extends Validity

type Validator[A] = A => Validity

We'll also create a sample username validator and put it to use. First the validator:

val usernameV: Validator[String] = {
  val p = "^[a-z]+$".r.pattern
  s => if (p.matcher(s).matches) Valid else Invalid
}

Now a sample save function:

def example(u: String): Unit =
  usernameV(u) match {
    case Valid   => println("Fake-saving username.")
    case Invalid => println("Invalid username.")
  }

There's a problem here. Code like this will make FP practitioners cry, and for good reason. How would we test this function? How could we ever manipulate or depend on what it does, or its outcome? The problem here is “effects”, and unbridled, they are anathema to healthy, reusable code. An effect is anything that affects anything outside the function it lives in, relies on anything impure outside the function it lives in, or happens in place of the function returning a value. Examples are printing to the screen, throwing an exception, reading a file, and reading a global variable.

Instead, we will model effects as data. Whereas the above example would either 1) print “Fake-saving username.” or 2) print “Invalid username.”, we will now either 1) return an effect that, when invoked, performs the (fake) save, or 2) return a reason for failure.

We'll define our own datatype called Effect: a function that takes no input and returns no output.

(Note: If you're using Scalaz, scalaz.effect.IO is a decent catch-all for effects.)

type Effect = () => Unit
def fakeSave: Effect = () => println("Fake save")

Next, Scala provides a type Either[A,B] which can be inhabited by either Left[A] or Right[B] and we'll use this to return either an effect or failure reason.

Putting it all together we have this:

def example(u: String): Either[String, Effect] =
  usernameV(u) match {
    case Valid   => Right(fakeSave)
    case Invalid => Left("Invalid username.")
  }
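Because example now returns data instead of printing, a test can inspect its outcome without anything happening; the effect only runs when a caller explicitly invokes it. Here is a self-contained sketch of the Req 1 code, wrapped in a hypothetical Req1 object purely for packaging:

```scala
object Req1 {
  sealed trait Validity
  case object Valid extends Validity
  case object Invalid extends Validity

  type Validator[A] = A => Validity
  type Effect = () => Unit

  val usernameV: Validator[String] = {
    val p = "^[a-z]+$".r.pattern
    s => if (p.matcher(s).matches) Valid else Invalid
  }

  // Describes the save; nothing is printed until someone calls the function.
  def fakeSave: Effect = () => println("Fake save")

  def example(u: String): Either[String, Effect] =
    usernameV(u) match {
      case Valid   => Right(fakeSave)
      case Invalid => Left("Invalid username.")
    }
}
```

At the edge of the program, something like `Req1.example(input) match { case Right(save) => save(); case Left(err) => println(err) }` is where the effect finally runs; everything before that point is pure and easy to test.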

Req 2. Explain why input is invalid.

We need to specify a reason for failure now.

We still have two cases: valid with no error msg, invalid with an error msg. We'll simply add an error message to the Invalid case.

case class Invalid(e: String) extends Validity

Then we make it compile and return the invalidity result in our example.

 val usernameV: Validator[String] = {
   val p = "^[a-z]+$".r.pattern
-  s => if (p.matcher(s).matches) Valid else Invalid
+  s => if (p.matcher(s).matches) Valid else
+         Invalid("Username must be 1 or more lowercase letters.")
 }
 
 def example(u: String): Either[String, Effect] =
   usernameV(u) match {
     case Valid      => Right(fakeSave)
-    case Invalid    => Left("Invalid username.")
+    case Invalid(e) => Left(e)
   }

Req 3. Share reusable rules between validators.

Imagine our system has 50 data validation rules, 80% reject empty strings, 30% reject whitespace characters, 90% have maximum string lengths. We like reuse and D.R.Y. and all that; this requirement addresses that by demanding that we break rules into smaller constituents and reuse them.

We want to write small, independent units and join them into larger things. This leads us to an important and interesting topic: composability.

I want to suggest something that I know will cause many people to cringe – but hear me out – let's look to math. Remember basic arithmetic from ye olde youth?

8 = 5 + 3
8 = 5 + 1 + 2

Addition. This is great! It's building something from smaller parts. This seems like a perfect starting point for composition to me. There's a certain beauty and elegance to math, and its capability is proven; what better inspiration!

Let's look at some basic properties of addition.

Property #1:

8 = 8 + 0

8 = 0 + 8

Add 0 to any number and you get that number back unchanged.

Property #2:

8 = (1 + 3) + 4

8 = 1 + (3 + 4)

8 = 1 + 3 + 4

Parentheses don't matter. Add or remove them without changing the result.

Property #3:

I'll also mention that in primary school, you had full confidence in this:

number + number = number

It may seem silly to mention, but imagine if your primary school teacher told you that

number + number = number | null | InvalidArgumentException

+ has other properties too, like 2+6=6+2 but we don't want that for our scenario with validation. The above three provide enough benefit for what we need.

You might wonder why I'm describing these properties. Why should you care? Well as programmers you gain much by writing code with similar properties. Consider...

You know you don't have to remember to check for nulls, catch any exceptions, worry about our internal AddService™ being online.
As long as the overall order of elements is preserved, you needn't care about the order in which groups are composed, i.e. we know that a+b+c+d+e will safely yield the same result if we batch up execution of (a+b) and (c+d+e) then add their results last. And support for parentheses is already provided by the programming language.
If some code path forces you into composition, you can opt out by specifying 0, because we know that 0+x and x+0 are the same as x. No need to overload methods or whatnot.

Simple right? Well have you ever heard the term “monoid” thrown around? (Not “monad”.) Guess what? We've just discussed all that makes a monoid what it is and you learned it as a young child.

A monoid is a binary operation (x+x=x, i.e. combining two values of a type yields another value of that same type) that has 3 properties:

Identity: The 0 is what we call an identity element. 0+x = x = x+0
Associativity: That's the ability to add/remove parentheses without changing the result.
Closure: Always returns a result of the same type, no RuntimeExceptions, no nulls.
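Those three properties can be captured in a small type class. This is a hand-rolled sketch for illustration; the names Monoid, zero, combine, combineAll and lawsHold are our own (Scalaz and Algebird ship their own equivalents), and integer addition is the sample instance:

```scala
// A minimal Monoid type class, written by hand for illustration.
trait Monoid[A] {
  def zero: A                // identity element
  def combine(x: A, y: A): A // associative; always returns an A (closure)
}

object Monoid {
  val intAddition: Monoid[Int] = new Monoid[Int] {
    def zero = 0
    def combine(x: Int, y: Int) = x + y
  }

  // Folding starts from zero, so an empty list needs no special case.
  def combineAll[A](as: List[A])(m: Monoid[A]): A =
    as.foldLeft(m.zero)(m.combine)

  // Spot-check the identity and associativity laws for some sample values.
  def lawsHold[A](m: Monoid[A], a: A, b: A, c: A): Boolean =
    m.combine(m.zero, a) == a &&
    m.combine(a, m.zero) == a &&
    m.combine(m.combine(a, b), c) == m.combine(a, m.combine(b, c))
}
```

Because of associativity, `combineAll(List(1, 2))` and `combineAll(List(3, 4, 5))` can be computed separately and then combined to get the same answer as combining all five at once; because of the identity, an empty list simply yields 0.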

If jargon from abstract algebra intimidates you, know that it's mostly just terminology. You already know the concepts and have for years. The knowledge is very accessible and it's incredibly useful to be able to identify these kinds of properties about your code.

Speaking of code, let's implement this new requirement as a monoid. We'll add Validator.+ for composition and ensure it preserves the associativity property, and Validator.id for identity (also called zero).

(Note: If using Scalaz, Algebird or similar, you can explicitly declare your code to be a monoid to get a bunch of useful monoid-related features for free.)

case class Validator[A](f: A => Validity) {
  @inline final def apply(a: A) = f(a)

  def +(v: Validator[A]) = Validator[A](a =>
    apply(a) match {
      case Valid         => v(a)
      case e@ Invalid(_) => e
    })
}

object Validator {
  def id[A] = Validator[A](_ => Valid)
}
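The laws can be spot-checked rather than proven. The sketch below re-creates the Req 3 Validator inside a hypothetical Req3 object and compares composed validators on a handful of sample strings; the rules notEmpty, noSpaces and shortish, the sample list and sameOn are all made up for this check. It also shows why commutativity isn't wanted: swapping the operands changes which error is reported.

```scala
object Req3 {
  sealed trait Validity
  case object Valid extends Validity
  case class Invalid(e: String) extends Validity

  case class Validator[A](f: A => Validity) {
    def apply(a: A): Validity = f(a)

    // Run this validator first; only if it passes, run v.
    def +(v: Validator[A]) = Validator[A](a =>
      apply(a) match {
        case Valid         => v(a)
        case e@ Invalid(_) => e
      })
  }

  object Validator {
    def id[A] = Validator[A](_ => Valid)
  }

  // Hypothetical sample rules, only used for checking the laws.
  val notEmpty = Validator[String](s => if (s.nonEmpty) Valid else Invalid("is empty"))
  val noSpaces = Validator[String](s => if (s.contains(' ')) Invalid("has spaces") else Valid)
  val shortish = Validator[String](s => if (s.length <= 5) Valid else Invalid("too long"))

  // Two validators are considered equal if they agree on every sample input.
  val samples = List("", "ab", "a b", "abcdefg", "abc defg")
  def sameOn(x: Validator[String], y: Validator[String]): Boolean =
    samples.forall(s => x(s) == y(s))
}
```

Identity holds because `id` always passes, and associativity holds because, however the parentheses fall, the first failing rule (left to right) decides the result.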

The difficulty of building human-language sentences scales with expressiveness. For our demo it's enough to simply have validators contain error message clauses like “is empty”, “must be lowercase” and just tack the subject on later.

First we define some helper methods, pred and regex, then use them to create our validators:

object Validator {
  def pred[A](f: A => Boolean, err: => String) =
    Validator[A](a => if (f(a)) Valid else Invalid(err))

  def regex(r: java.util.regex.Pattern, err: => String) =
    pred[String](a => r.matcher(a).matches, err)
}

val nonEmpty = Validator.pred[String](_.nonEmpty, "must not be empty")
val lowercase = Validator.regex("^[a-z]*$".r.pattern, "must be lowercase")

val usernameV = nonEmpty + lowercase

Then we tack our subject on to our error messages before displaying them, and we're done.

def buildErrorMessage(field: String, err: String) = s"$field $err"

def example(u: String): Either[String, Effect] =
  usernameV(u) match {
    case Valid      => Right(fakeSave)
    case Invalid(e) => Left(buildErrorMessage("Username", e))
  }
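Because Validator.id is the identity, any list of shared rules can be folded into a single validator, even an empty list. A sketch of that reuse, assuming the Req 3 definitions above plus a hypothetical maxLength rule and a hypothetical all helper; validateUsername returns the username instead of an Effect purely so results are easy to compare:

```scala
object Req3Usage {
  sealed trait Validity
  case object Valid extends Validity
  case class Invalid(e: String) extends Validity

  case class Validator[A](f: A => Validity) {
    def apply(a: A): Validity = f(a)
    def +(v: Validator[A]) = Validator[A](a =>
      apply(a) match {
        case Valid         => v(a)
        case e@ Invalid(_) => e
      })
  }

  object Validator {
    def id[A] = Validator[A](_ => Valid)
    def pred[A](f: A => Boolean, err: => String) =
      Validator[A](a => if (f(a)) Valid else Invalid(err))
    def regex(r: java.util.regex.Pattern, err: => String) =
      pred[String](a => r.matcher(a).matches, err)
  }

  val nonEmpty  = Validator.pred[String](_.nonEmpty, "must not be empty")
  val lowercase = Validator.regex("^[a-z]*$".r.pattern, "must be lowercase")
  // Hypothetical extra rule, reusable by any field with a length limit.
  def maxLength(n: Int) = Validator.pred[String](_.length <= n, s"must be at most $n chars")

  // Hypothetical helper: combine any number of rules, starting from the identity.
  def all[A](vs: List[Validator[A]]): Validator[A] =
    vs.foldLeft(Validator.id[A])(_ + _)

  val usernameV = all(List(nonEmpty, lowercase, maxLength(20)))

  def buildErrorMessage(field: String, err: String) = s"$field $err"

  def validateUsername(u: String): Either[String, String] =
    usernameV(u) match {
      case Valid      => Right(u)
      case Invalid(e) => Left(buildErrorMessage("Username", e))
    }
}
```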

Req 4. Explain all the reasons for rejection.

Users are complaining that they get an error message, fix their data accordingly only to have it then rejected for a different reason. They again fix their data and it is rejected again for yet another reason. It would be better to inform the user of all the things left to fix so they can amend their data in one shot.

For example, an error message could look like “Username 1) must be less than 20 chars, 2) must contain at least one number.”

In other words there can be 1 or more reasons for invalidity now. Ok, we'll amend Invalid appropriately...

case class Invalid(e1: String, en: List[String]) extends Validity

Then we just make the compiler happy...

 case class Validator[A](f: A => Validity) {
   @inline final def apply(a: A) = f(a)
 
   def +(v: Validator[A]) = Validator[A](a =>
-    apply(a) match {
-      case Valid         => v(a)
-      case e@ Invalid(_) => e
-    })
+    (apply(a), v(a)) match {
+      case (Valid          , Valid          ) => Valid
+      case (Valid          , e@ Invalid(_,_)) => e
+      case (e@ Invalid(_,_), Valid          ) => e
+      case (Invalid(e1,en) , Invalid(e2,em) ) => Invalid(e1, en ::: e2 :: em)
+    })
 }
 
 object Validator {
   def pred[A](f: A => Boolean, err: => String) =
-    Validator[A](a => if (f(a)) Valid else Invalid(err))
+    Validator[A](a => if (f(a)) Valid else Invalid(err, Nil))
 }
 
-def buildErrorMessage(field: String, err: String) = s"$field $err"
+def buildErrorMessage(field: String, h: String, t: List[String]): String = t match {
+  case Nil => s"$field $h"
+  case _   => (h :: t).zipWithIndex.map{case (e,i) => s"${i+1}) $e"}.mkString(s"$field ", ", ", ".")
+}
 
 def example(u: String): Either[String, Effect] =
   usernameV(u) match {
     case Valid         => Right(fakeSave)
-    case Invalid(e)    => Left(buildErrorMessage("Username", e))
+    case Invalid(h, t) => Left(buildErrorMessage("Username", h, t))
   }
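To see the accumulation at work, here's a self-contained sketch of the Req 4 code with two hypothetical rules, maxLen20 and hasNumber, chosen so that one input can break both and reproduce the kind of message quoted at the start of this requirement. As before, validateUsername returns the input rather than an Effect to keep the check simple.

```scala
object Req4 {
  sealed trait Validity
  case object Valid extends Validity
  // One mandatory error plus any number of further errors.
  case class Invalid(e1: String, en: List[String]) extends Validity

  case class Validator[A](f: A => Validity) {
    def apply(a: A): Validity = f(a)

    // Run both validators and keep every error, in order.
    def +(v: Validator[A]) = Validator[A](a =>
      (apply(a), v(a)) match {
        case (Valid          , Valid          ) => Valid
        case (Valid          , e@ Invalid(_,_)) => e
        case (e@ Invalid(_,_), Valid          ) => e
        case (Invalid(e1,en) , Invalid(e2,em) ) => Invalid(e1, en ::: e2 :: em)
      })
  }

  object Validator {
    def pred[A](f: A => Boolean, err: => String) =
      Validator[A](a => if (f(a)) Valid else Invalid(err, Nil))
  }

  // Hypothetical rules that a single input can violate together.
  val maxLen20  = Validator.pred[String](_.length < 20, "must be less than 20 chars")
  val hasNumber = Validator.pred[String](_.exists(_.isDigit), "must contain at least one number")
  val usernameV = maxLen20 + hasNumber

  def buildErrorMessage(field: String, h: String, t: List[String]): String = t match {
    case Nil => s"$field $h"
    case _   => (h :: t).zipWithIndex.map { case (e, i) => s"${i + 1}) $e" }.mkString(s"$field ", ", ", ".")
  }

  def validateUsername(u: String): Either[String, String] =
    usernameV(u) match {
      case Valid         => Right(u)
      case Invalid(h, t) => Left(buildErrorMessage("Username", h, t))
    }
}
```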

(Note: If you're using Scalaz, NonEmptyList[A] is a better replacement for A, List[A] like I've done in Invalid. The same thing can also be achieved by OneAnd[List, A]. In fact OneAnd is a good way to have compiler-enforced non-emptiness.)

Req 5. Omit mutually-exclusive or redundant error messages.

Take this error message: “Your name must 1) include a given name, 2) include a surname, 3) not be empty”. If the user forgot to enter their name, you just want to say “hey, you forgot to enter your name”, not bombard them with details about potentially invalid names.

What does this mean? It means one rule is unnecessary if another rule fails. What we're really talking about here is the means by which rules are composed, so let's just add another composition method. We've already discussed the + operation in math; well, math also provides multiplication. Look at an expression like 6 + 14 + (7 * 8): two types of composition, with parentheses explicitly clarifying our intent. That's perfectly expressive to me, and it solves our new requirement simply and with minimal development. As a reminder that we can borrow from math without emulating it verbatim, instead of a symbol let's give this operation a wordy name like andIfSuccessful, so that we can write nonEmpty andIfSuccessful containsNumber to indicate a validator that will only check for numbers if the data isn't empty.

Just like these express different intents and yield different results

number = 4 * (2 + 10)

number = (4 * 2) + 10

So too can

rule = nonEmpty andIfSuccessful (containsNumber and isUnique)

rule = (nonEmpty andIfSuccessful containsNumber) and isUnique

Or if you don't mind custom operators

rule = nonEmpty >> (containsNumber + isUnique)

rule = (nonEmpty >> containsNumber) + isUnique

To implement this new requirement we add a single method to Validator:

def andIfSuccessful(v: Validator[A]) = Validator[A](a =>
  apply(a) match {
    case Valid           => v(a)
    case e@ Invalid(_,_) => e
  })
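A sketch of how the two kinds of composition interact, using the name example from this requirement. The rules nonEmpty, hasSurname and capitalised are hypothetical, >> is the optional symbolic alias mentioned above, and the Req 4 accumulating + is repeated so the block stands alone:

```scala
object Req5 {
  sealed trait Validity
  case object Valid extends Validity
  case class Invalid(e1: String, en: List[String]) extends Validity

  case class Validator[A](f: A => Validity) {
    def apply(a: A): Validity = f(a)

    // Req 4: run both sides and accumulate every error.
    def +(v: Validator[A]) = Validator[A](a =>
      (apply(a), v(a)) match {
        case (Valid          , Valid          ) => Valid
        case (Valid          , e@ Invalid(_,_)) => e
        case (e@ Invalid(_,_), Valid          ) => e
        case (Invalid(e1,en) , Invalid(e2,em) ) => Invalid(e1, en ::: e2 :: em)
      })

    // Req 5: only run v if this validator succeeds.
    def andIfSuccessful(v: Validator[A]) = Validator[A](a =>
      apply(a) match {
        case Valid           => v(a)
        case e@ Invalid(_,_) => e
      })

    // Optional symbolic alias for andIfSuccessful.
    def >>(v: Validator[A]) = andIfSuccessful(v)
  }

  object Validator {
    def pred[A](f: A => Boolean, err: => String) =
      Validator[A](a => if (f(a)) Valid else Invalid(err, Nil))
  }

  // Hypothetical name rules.
  val nonEmpty    = Validator.pred[String](_.trim.nonEmpty, "not be empty")
  val hasSurname  = Validator.pred[String](_.trim.contains(' '), "include a surname")
  val capitalised = Validator.pred[String](s => s.nonEmpty && s.head.isUpper, "start with a capital letter")

  // Only check the details once we know there's something to check.
  val nameV = nonEmpty >> (hasSurname + capitalised)

  // For comparison: plain accumulation reports everything, even for empty input.
  val noShortCircuit = nonEmpty + hasSurname + capitalised
}
```

For an empty name, nameV reports only “not be empty”, while noShortCircuit bombards the user with all three messages.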

Conclusion

And we're done.
It's not how I would've approached code years back in my OO/Java era, nor is it like any of the code I came across written by others in that job. As an experiment I started fulfilling these requirements in Java the way old me used to code and there was a loooot of wasted code between requirements. I'd get all annoyed at each new step, so much so that I didn't even bother finishing. On the contrary, I enjoyed writing the FP.

Right, what conclusions can we draw?

FP is simple. Each validation is a single function in a wrapper.

FP is flexible. Logic is reusable and can be assembled into complex expressions easily.

FP is easily maintainable & modifiable. It has less structure, fewer structural dependencies and less code, plus the compiler's got your back.

FP is easy on the author. There was next to no rewriting or throw-away of code between requirements, and each new requirement was easy to implement.

I hope this proves an effective concrete example of FP for programmers of different backgrounds. I also hope this enables you to write more reliable software and have a happier time doing it.

Go forth and function.

(Source code)

A Year of Functional Programming

2014-06-09T18:53:00.000+10:00

It's been a year since I first came across the concept of functional programming. To say it's changed my life is an unjust understatement. This is a reflection on that journey.

Warning: I use the term FP quite loosely throughout this article.

Where I Was

I've been coding since the age of 8, so 26 years now. I started with BASIC & different types of assembly then moved on to C, C++, PHP and Perl (those were different times, maaaan ☮), Java, JavaScript, Ruby. That's the brief gist of it. Basically: a lot of time on the clock and absolutely no exposure to FP concepts or languages. I thought functional programming just meant recursive functions (ooooo big deal right?). I really came in blind.

How It Started: Scala

Last year, I wanted the speed and static checking (note: not “types”) of Java with the conciseness and flexibility of Ruby. I came across Scala, skimmed a little and was impressed. I bought a copy of Scala for the Impatient and just ate it for breakfast. I read the entire thing in 2 or 3 days, jotted down everything useful and then just started coding. It was awesome! At first I was just coding the same way I would in Java, with less than half the lines of code. It is a very efficient Java.

Exposure: Haskell

Lurking around the Scala community, I came across a joke. Someone said “Scala is a gateway drug to Haskell.” I found that amusing, although not for the reasons that the author intended. Haskell? Isn't that some toy university language? An experiment or something. Is it even still alive? Scala's awesome and so powerful, why would that lead to Haskell? How.... intriguing. Inexplicably it piqued my interest and really stuck with me. Later I decided to look it up and yes, it sure was alive and very active. I was shocked to discover that it compiles to machine code (binary) and is comparable in speed to C/C++. What?! It seems idiotic now, but a year ago I thought it was some interpreted equation solver. I'm not alone in that ignorance, sadly; when I was talking about it with some mates over lunch last year, a friend incredulously burst out laughing, “Haskell?”, as if I was trying to tell him my dishwasher had an impressive static typing system. It saddens me to realise how in the dark I was, and how many people still are. Haskell is pretty frikken awesome! True to the gateway drug prophecy, I do now look to Haskell as an improvement to using Scala. But let's get back on track.

Exposure: Scalaz

I also started seeing contentious discussion of some library called Scalaz. Curious, I had a look at the code to see what people were on about, and didn't understand it at all. I'd see classes like Blah[F[_], A, B], methods with confusing names that take params like G[_], A → C, F[X → G[Y]], implementations like F.boo_(f(x))(g)(x), and I'd just think “What the hell is this? How is this useful?”. I was used to methods that did something pertaining to a goal in their domain. This Scalaz code was very alien to me and yet very intriguing. Some obviously smart person spent time making these alphabet-soup permutations. Why?

I've since discovered the answer to that question and I never could've imagined the amount of benefit it would yield. Instead of methods with domain-specific meaning, I now see functions for morphisms, in accordance with relevant laws. Simply put: changing the shape or content of types. No mention of any kind of problem-domain; applicable to all problem-domains! It's been surprising over the last year to discover just how applicable this is. This kind of abstraction is pervasive; it's in your code right now, disguised, and intertwined with your business logic. Without knowledge of these kinds of abstractions (and the awareness that a type-component level of abstraction is indeed possible) you are doomed to reinvent the wheel for all time and not even realise it. When you identify and use these abstractions, your code becomes more reusable, simple, readable, flexible, and testable. When you learn to recognise it, it's breathtaking.

FP: The Basics

Now that I'd been exposed to FP, I started actively learning about it. At first I learned about referential transparency, purity and side-effects. I nodded agreement but had major reservations about the feasibility of adhering to such principles, and so at first I wasn't really sold on it. Or rather, I was hesitant. I may have been guilty of mumbling sentences involving the term “real-world”. Next came immutability. Now, I'm a guy who used to use final religiously in Java, const on everything in C++ back in the day, and here FP is advocating for data immutability. Not just religiously advocating but providing real, elegant solutions to issues that you encounter using fully immutable data structures. Wow! So with immutability, composability and lenses, it had its hooks in me.

Next came advocacy for more expressive typing and (G)ADTs. That appealed in theory too, and again I was hesitant about its feasibility. Once I experimentally applied it to some parts of my project, I was blown away by how well it worked. That became the gateway into thinking of code/types algebraically, which led to...

FP: Maths

I loved maths back in school and always found it easy. Reading FP literature I started coming across lots of math and at first thought “great! I'm awesome at maths!” but then, trying to make sense of some FP math stuff, I'd find myself spending hours clicking link after link, realising that I wasn't getting it and, in many cases, still couldn't even make sense of the notation. It became daunting. Even depressing. Frequently demotivating.

The good news is that everything you need is out there; you just have to be prepared to learn more than you think you need. I persisted: I stopped viewing it as an annoying bridge and started treating it as a fun subject in its own right, and before long things made sense again. It opens new doors when you learn it.

Example: I had a browser tab (about the co-Yoneda lemma) open for 3 months because I couldn't make sense of it. It took months (granted, not every day) of trying, then confusion, then tangents to understand the background and whatever it was that threw me off. Once I learned that final piece of background info, I went from understanding only the first 5% to 100%. It was a great feeling.

Feeling Intimidated

Looking back, there were times when I found learning FP quite intimidating. When I'm in/around/reading conversations between experienced FP'ers, quite often I've seriously felt like a moron. I started wondering if I gave my head a good slap, would a potato fall out. It can be intimidating when you're not used to it. But really, my advice to you, Reader, is that everyone's nice and happy to help when you're confused. I have a problem asking for help but I've seen everyone else do it and be received kindly... then I swoop in and absorb all the knowledge, hehe.

It's a mindset change. I wish I'd known this earlier as it would've saved me frustration and doubt, but you kind of need to unlearn what you think you know about coding, then go back to the basics. Imagine you've driven trains for decades, and spontaneously decide you want to be a pilot. No, you can't just read a plane manual in the morning and be in Tokyo in the afternoon. No, if you grab a beer with experienced pilots you won't be able to talk about aviation at their level. It's normal, right? Be patient, learn the basics, have fun, you'll get there.

On that note, I highly recommend Functional Programming in Scala; it's a phenomenal book. It helped me wade my way from total confusion to comfortable comprehension on a large number of FP topics that I had been struggling to learn from blogs.

Realisation: Abstractions

Recently I looked at some code I wrote 8 months ago and was shocked! I looked at one file written in “good OO-style”, lots of inheritance and code reuse, and just thought “this is just a monoid and a bunch of crap because I didn't realise this is a monoid”, so I rewrote the entire thing to about a third of the code size and ended up with double the flexibility. Shortly after, I saw another file and this time thought “these are all just endofunctors,” and lo and behold, rewrote it to about a third of the code size, with the final product being both easier to use and more powerful.

I now see more abstractions than I used to. Amazingly, I'm also starting to see similar abstractions outside of code, like in UI design, in (software) requirements. It's brilliant! If you're not on-board but aspire to write “DRY” code, you will love this.

Realisation: Confidence. Types vs Tests

I require a degree of confidence in my code/system, that varies with the project. I do whatever must be done to achieve that. In Ruby, that often meant testing it from every angle imaginable, which cost me significant effort and negated the benefit of the language itself being concise. In Java too, I felt the need to test rigorously.

At first I was the same in Scala, but since learning more FP, I test way less and have more confidence. Why? The static type system. By combining lower-level abstractions, an expressive type system, efficient and immutable data structures, and the laws of parametricity, in most cases when something compiles, it works. Simple as that. There are hard proofs available to you; I'm not talking about fuzzy feelings here. I didn't have much respect for static types coming from Java because it's hard to get much mileage out of them (even in Java 8 – switch over enum needs a default block? Argh fuck off! Gee, maybe all interfaces should have a catch-all method too then. That really boiled my blood the other day. Sorry.) Anyway: Java as a static typing system is like an abusive alcoholic as a parent. They may put food on the table and clothes on your back, but that's a far cry from a good parent. (And you'll become damaged.) Scala, on the other hand, teaches you to trust again. Trust. Trust in the compiler. I've come to learn that when you trust the compiler and can express yourself appropriately, entire classes of problems go away, and with them the need for testing. It's joyous.

Sadly though, eventually you get to a point where Scala starts to struggle. It gets stupid, it can't work out what you mean, what you're saying, you have to gently hold its hand and coax it with explicit type declarations or tricks with type variance or rank-n types. Once you get to that level you start to feel like you've outgrown Scala and now need a big boy's compiler which can lead to habitual grumbling and regular reading about Haskell, Idris, Agda, Coq, et al.

However, when you do need tests, you can write a single test for a bunch of functions using a single expression. How? Laws. Properties. Don't know what I mean? Pushing an item on to a stack should always increase its size by 1; popping it off again should reduce the size by 1, return the item pushed, and leave a stack equivalent to the one you started with. Using libraries like ScalaCheck, you can turn that into a single expression like pop(push(start, item)) == (start, item), which is essentially all you need to write; ScalaCheck will generate test data for you.
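The stack law can be sketched without any library. The Stack alias, push, pop and the hand-rolled random check below are hypothetical stand-ins; with ScalaCheck you'd express the same property via Prop.forAll and let its generators supply the inputs. Here pop returns an Option because an empty stack has nothing to pop, so the law compares against `Some((start, item))`.

```scala
import scala.util.Random

object StackLaws {
  // A minimal immutable stack: a List whose head is the top.
  type Stack[A] = List[A]

  def push[A](s: Stack[A], a: A): Stack[A] = a :: s

  // Popping an empty stack has no result, hence Option.
  def pop[A](s: Stack[A]): Option[(Stack[A], A)] = s match {
    case top :: rest => Some((rest, top))
    case Nil         => None
  }

  // The law: push grows the stack by 1, and pop undoes push exactly.
  def pushPopLaw(start: Stack[Int], item: Int): Boolean =
    push(start, item).size == start.size + 1 &&
    pop(push(start, item)) == Some((start, item))

  // Hand-rolled stand-in for a property-testing library:
  // generate random inputs and check the law holds for every one.
  def check(trials: Int, seed: Long): Boolean = {
    val rnd = new Random(seed)
    (1 to trials).forall { _ =>
      val start = List.fill(rnd.nextInt(20))(rnd.nextInt())
      pushPopLaw(start, rnd.nextInt())
    }
  }
}
```

A real property-testing library adds shrinking of failing cases and smarter generators, but the shape is the same: one expression, many generated inputs.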

Where Next?

What does the future hold for me? Well, I could never go back to a dynamically-typed language again.

I will stick with Scala as I've invested a lot in it and it's still the best language I know well. I'd like to get more hands-on experience with Haskell; I don't know its trade-offs that well but its type system seems angelic. Got my eye on Idris, too.

Academia!

I used to get excited discovering new libraries. I'd always think “Great! I wonder what this will allow me to do.” Well now I feel that way about research & academic papers. They are the same thing except smarter, more lasting, more dependable, and they yield more flexibility. It's awesome and I've got decades of catching up to do! Over the next year I'll definitely spend a lot of time learning more FP and comp-sci theory. I'd also like to be able to understand most Haskell blogs I come across. They promise very intelligent constructs (which aren't specific to Haskell) but the typing usually gets a bit too dense; it'd be nice to be able to read those articles with ease.

Don't fall for the industry dogma that academia isn't applicable to the real world. What a load of horseshit that lie is. It does apply to the real world, it's here to help you achieve your goals, and it will save you significant time and effort, even with the initial learning overhead considered. Don't say you don't have time to do things in half the time. If you're always busy and the people in your business are super fast-paced agile scrum need-it-yesterday kinda people, well I know you don't have time, but what I'm offering is this: say you just need to get something out the door and can do it quickly and messily in 1000 lines in 6 hours with 10 bugs and 10 “abilities”; well, if you spend a bit of your own time learning, you could perform the same task in 400 lines in 4 hours with maybe 1 bug and 20 “abilities”. You've just saved 2 hours up-front, not to mention days of savings when adding new features, fixing bugs, etc. That's applicable to you, the “real-world” and the industry. I've spent years in the industry, and not just as a coder, and I wish I'd known about this stuff back then because it would've saved me so much time, effort and stress. There seems to be this odd disdain for academia throughout the industry. Reject it. It's an ignorant justification of laziness and short-sightedness. It's false. I encourage you to take the leap.

Scala: Methods vs Functions

2013-10-26T11:06:00.000+11:00

It's a bright, windy Saturday morning. Sipping a nice, warm coffee, I find myself musing over the performance implications of subtleties between methods and functions in Scala. Functions are first-class citizens in Scala and are represented internally as instances of FunctionN (where N is the arity); effectively, an interface with an apply method. Methods are methods are methods; they are directly invocable in JVM-land.

So what differences are there that could affect performance? I can think of:

Because functions are traits with type parameters, in JVM-land those type parameters will be erased to Object. I presume this means boxing for primitives like int and long (unless scalac has any tricks up its sleeve like @specialized).
When passing a method to a higher-order function, the method will need to be boxed into a Function. For example, in
def method(s: String) = s.length
List("abc").map(method)
the map method requires an instance of Function1, so (again, I presume) the compiler generates a synthetic instance of Function1 as a proxy to the target method.
The JVM can invoke methods directly. To invoke a function, it will have to first load up the Function object and then invoke its apply method. That's an extra hop.
Probably more.
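The boxing of a method into a function (eta-expansion) can be seen in a small sketch: passing `method` to `map` behaves exactly like passing an explicit lambda that calls it. The object and value names are hypothetical, and this only demonstrates semantics; the exact bytecode generated differs between Scala versions, so it says nothing about performance.

```scala
object MethodsVsFunctions {
  // A method: compiled to a plain JVM method.
  def method(s: String): Int = s.length

  // A function value: an instance of Function1[String, Int] with an apply method.
  val function: String => Int = s => s.length

  // Where a Function1 is expected, the compiler eta-expands the method,
  // i.e. wraps it in a function value that calls method.
  val viaMethod   = List("abc", "hello").map(method)
  val viaExplicit = List("abc", "hello").map(s => method(s))
  val viaFunction = List("abc", "hello").map(function)
}
```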

So those are some of the differences between functions and methods in Scala. Let's see how they perform. There's an awesome little micro-benchmarking tool called ScalaMeter. It takes about 2 min to get started with it. I decided to test 1,000,000 reps along three axes:

Direct invocation vs passing to someone else
Primitive vs Object argument
Primitive vs Object result

The Results

Invocation   Arg → Result   Fn improvement over Method
Direct       int → int      -6.32%
Direct       int → str      -9.53%
Direct       str → int      -4.39%
Direct       str → str      -1.09%
As Arg       int → int      0.31%
As Arg       int → str      0.94%
As Arg       str → int      -0.22%
As Arg       str → str      -0.24%

What can we see?

It seems that there is a boxing cost for functions' i/o.
It seems that there is a slight cost when invoking functions directly.
It seems that there is no real cost boxing methods into functions.

Conclusion

If you're like one-week-ago-me (pft, he's an idiot!), you might think you're helping the compiler out by writing functions instead of methods when their only use is to be passed around. Well, it doesn't appear to be so.
Just use methods and let your mind (if you're lucky enough to have its cooperation) worry about and solve other things.

Code and raw results available here: https://github.com/japgolly/misc/tree/methods_and_functions

Keyed Lenses

2013-09-05T14:56:00.000+10:00

TL;DR: Lenses are cool. I've come up with keyed lenses which I find helpful. Hopefully you will too. Do you? Have I just reinvented the wheel in ignorance?

Annual blog post time. So this year I've been imploding and exploding with enthusiasm over functional programming. Already I've found some FP perspectives and strategies mind-boggle-blow-blast-ingly effective, and beautiful. My day project is in a significantly better state owing to FP. If you're not onboard I recommend reading Learn You a Haskell for Great Good! and Functional Programming in Scala.

(Btw: big thanks to NICTA, specifically Tony Morris & Mark Hibberd who held 2 free FP courses and gracefully tolerated numerous dumb-shit moments from me. I think it's a semi-annual recurring thing so keep an eye out on scala-functional for the next one if you're interested and in Australia.)

What is a lens? If you don't know what a lens is, it's basically a decoupled getter/setter that can be composed with other lenses, so that the depth and structure of data can be hidden. In traditional OO you might not see the merits, but when your data structures are all immutable, the benefit is immense. There are plenty of good resources online to learn more, such as this, this and this.
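To make that concrete, here's a minimal lens sketch in plain Scala. It doesn't use Scalaz (whose lenses are much richer); the Lens type, Guitarist, Guitar and the lens names are illustrative assumptions only.

```scala
// A minimal lens: a getter and a setter for one part of an immutable structure.
final case class Lens[S, A](get: S => A, set: (S, A) => S) {
  // Compose with a lens that focuses further inside A.
  def andThen[B](inner: Lens[A, B]): Lens[S, B] =
    Lens[S, B](
      s => inner.get(get(s)),
      (s, b) => set(s, inner.set(get(s), b)))
}

final case class Guitar(tuning: String)
final case class Guitarist(name: String, guitar: Guitar)

object BasicLenses {
  val guitarL: Lens[Guitarist, Guitar] =
    Lens[Guitarist, Guitar](_.guitar, (p, g) => p.copy(guitar = g))

  val tuningL: Lens[Guitar, String] =
    Lens[Guitar, String](_.tuning, (g, t) => g.copy(tuning = t))

  // Callers of this lens never need to know the tuning lives inside a Guitar.
  val guitaristTuningL: Lens[Guitarist, String] = guitarL andThen tuningL
}
```

For example, setting guitaristTuningL on Guitarist("James", Guitar("EADGBE")) to "DADGBE" returns a new Guitarist with the new tuning and leaves the original value untouched.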

KeyedLenses

What I'm calling a KeyedLens is a lens that points to a value inside a composite value, such that a key is required to reach it. A Map is an obvious example.

(NOTE: Scalaz has some basic support for this -- I am aware of it -- but I find that it doesn't suit my needs and/or the way I try to use it. I find that the call-site syntax becomes long and nasty, it doesn't compose well, and it creates new lenses every time it's used, which is inefficient.)

Let's start with some toy data.
I'm going to use Scala and the awesome Scalaz library, and there's a link to the KeyedLens source code at the end of the post. (If you don't know Scala, just imagine it's pseudo-code. The concepts translate into almost anything.)

Let's model a band from a guitarist's point of view.

Here we're modelling the mighty band Tesseract (think Pink Floyd + Meshuggah).
There are two places where I'm going to use a keyed lens.

To access the guitar of a given band member. (guitarL)
To access the string gauge of a given guitar. (stringGaugeL)

Here are the lens definitions:

This gives us the following lenses:

LENS	TYPE
guitarTuningL	`LensFamily[Guitar, Guitar, String, String]`
stringGaugeL	`LensFamily[(Guitar,Int), Guitar, Double, Double]`
bandNameL	`LensFamily[Band, Band, String, String]`
guitarL	`LensFamily[(Band,Person), Band, Guitar, Guitar]`
guitaristsTuningL	`LensFamily[(Band,Person), Band, String, String]`
guitaristsGaugeL	`LensFamily[(Band,(Person,Int)), Band, Double, Double]`

Notice the keys always get propagated to the left.
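For anyone who wants to play along without Scalaz, here's a rough, library-free sketch of how keys end up on the left when lenses compose. It's an approximation, not the KeyedLenses.scala source: KLens simplifies LensFamily to three type parameters (read and write focus types are the same), compose and composeKeyed are hypothetical names, and the data model's field names are guesses inferred from the REPL output below.

```scala
// Simplified lens family: `get` reads an A from S1; `set` writes an A back,
// producing an S2. Plain lenses have S1 == S2; keyed lenses take (whole, key).
final case class KLens[S1, S2, A](get: S1 => A, set: (S1, A) => S2)

object KLens {
  // Any lens (plain or keyed) followed by a plain lens: the keys are unchanged.
  def compose[S1, S2, A, B](outer: KLens[S1, S2, A], inner: KLens[A, A, B]): KLens[S1, S2, B] =
    KLens[S1, S2, B](
      s => inner.get(outer.get(s)),
      (s, b) => outer.set(s, inner.set(outer.get(s), b)))

  // Keyed lens followed by keyed lens: both keys are gathered on the left.
  def composeKeyed[S, K1, A, K2, B](
      outer: KLens[(S, K1), S, A],
      inner: KLens[(A, K2), A, B]): KLens[(S, (K1, K2)), S, B] =
    KLens[(S, (K1, K2)), S, B](
      { skk =>
        val (s, (k1, k2)) = skk
        inner.get((outer.get((s, k1)), k2))
      },
      { (skk, b) =>
        val (s, (k1, k2)) = skk
        outer.set((s, k1), inner.set((outer.get((s, k1)), k2), b))
      })
}

final case class Person(name: String)
final case class Guitar(strings: Int, tuning: String, gauges: List[Double])
final case class Band(name: String, guitarists: Map[Person, Guitar], others: Set[Person])

object BandLenses {
  val guitarTuningL = KLens[Guitar, Guitar, String](_.tuning, (g, t) => g.copy(tuning = t))

  // Key: string number, counted from 1, so string n lives at index n - 1.
  val stringGaugeL = KLens[(Guitar, Int), Guitar, Double](
    gn => gn._1.gauges(gn._2 - 1),
    (gn, d) => gn._1.copy(gauges = gn._1.gauges.updated(gn._2 - 1, d)))

  val bandNameL = KLens[Band, Band, String](_.name, (b, n) => b.copy(name = n))

  // Key: the guitarist. Map#apply throws if that person has no guitar here.
  val guitarL = KLens[(Band, Person), Band, Guitar](
    bp => bp._1.guitarists(bp._2),
    (bp, g) => bp._1.copy(guitarists = bp._1.guitarists.updated(bp._2, g)))

  // KLens[(Band, Person), Band, String]
  val guitaristsTuningL = KLens.compose(guitarL, guitarTuningL)
  // KLens[(Band, (Person, Int)), Band, Double]
  val guitaristsGaugeL = KLens.composeKeyed(guitarL, stringGaugeL)
}

object Tesseract {
  val acle = Person("Acle")
  val james = Person("James")
  val band = Band(
    "Tesseract",
    Map(
      acle -> Guitar(7, "BEADGBE", List(0.011, 0.014, 0.018, 0.028, 0.038, 0.049, 0.059)),
      james -> Guitar(6, "EADGBE", List(0.01, 0.013, 0.017, 0.026, 0.036, 0.046))),
    Set(Person("Jay"), Person("Amos"), Person("Ashe")))
}
```

With this sketch, BandLenses.guitaristsGaugeL.get((Tesseract.band, (Tesseract.acle, 7))) gives 0.059, matching the REPL session below.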
Now let's see them in action. This is what appeals to me the most.

Usage. The Fun Part.

Get with one key. What is Acle's guitar tuned to?

scala> guitaristsTuningL.get(band, acle)
res0: String = BEADGBE

Get with two keys. What is the gauge of Acle's 7th string?

scala> guitaristsGaugeL.get(band, (acle, 7))
res1: Double = 0.059

Set with one key. I want to change Acle's tuning.

scala> guitaristsTuningL.set((band, acle), "G#FA#D#FA#D#")

res2: Band = Band(Tesseract,Map(Person(Acle) -> Guitar(7,G#FA#D#FA#D#,List(0.011, 0.014, 0.018, 0.028, 0.038, 0.049, 0.059)), Person(James) -> Guitar(6,EADGBE,List(0.01, 0.013, 0.017, 0.026, 0.036, 0.046))),Set(Person(Jay), Person(Amos), Person(Ashe)))

Set with two keys. I want to increase the gauge of Acle's 7th string.

scala> guitaristsGaugeL.set((band, (acle, 7)), 0.0666666)

res3: Band = Band(Tesseract,Map(Person(Acle) -> Guitar(7,BEADGBE,List(0.011, 0.014, 0.018, 0.028, 0.038, 0.049, 0.0666666)), Person(James) -> Guitar(6,EADGBE,List(0.01, 0.013, 0.017, 0.026, 0.036, 0.046))),Set(Person(Jay), Person(Amos), Person(Ashe)))

[If you're new to lenses, keep in mind that all the data here is immutable. Objects are copied and reused.]

I really like the brevity of these lines.
I like that you get a single lens that requires a key be provided.
I don't like the (A, (K1,K2)) type of guitaristsGaugeL. It's easily changeable into (A, K1, K2), but then what happens when it's further recomposed? I'd probably need methods like compose2, compose3, compose4, etc. Will think about it later.
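Flattening the keys after the fact is easy enough with an adapter, sketched below using a hypothetical KLens type (a simplified stand-in for LensFamily) and a hypothetical flatten2. The signature also shows the catch: the adapter handles exactly two keys, so each deeper composition would need its own arity-specific version, much like the compose2, compose3, compose4 idea.

```scala
// Simplified lens family: `get` reads an A from S1; `set` writes an A back,
// producing an S2. A keyed lens takes a (whole, key) pair as S1.
final case class KLens[S1, S2, A](get: S1 => A, set: (S1, A) => S2)

object FlatKeys {
  // Adapt a lens keyed by (S, (K1, K2)) into one keyed by (S, K1, K2).
  // Fixed at two keys: three keys would need a flatten3, and so on.
  def flatten2[S, K1, K2, S2, B](l: KLens[(S, (K1, K2)), S2, B]): KLens[(S, K1, K2), S2, B] =
    KLens[(S, K1, K2), S2, B](
      t => l.get((t._1, (t._2, t._3))),
      (t, b) => l.set((t._1, (t._2, t._3)), b))
}
```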

What do you think? Does anything like this already exist?

Source code for KeyedLenses.scala is here.

Thanks!

Trial: Choosing Lift over Rails

2013-04-25T11:57:00.003+10:00

Today, I'm going to start on a slightly scary journey. I'm going to start work on a new webapp, but I'm not going to use Ruby on Rails, which I (mostly) love (with plugins) and already know well. Instead I'm going to use the Scala-based Lift framework, which I have never used and which is completely alien to me. This is going to cost me a lot of time at the beginning, not just because I don't know the API, but because the mindset of the beast seems drastically different from the typical web frameworks that most of us are used to.

(NOTE re RAILS: I will make many comparisons to RoR because it's the web framework I know best and cos it's well-known. Just like everyone thinks of Maccas (i.e. McDonald's for non-Aussies) when they think of fast food. It's tough being on top. Bad luck RoR.)

(Have you made a similar transition? I'd love to know how it went. Let me know!)

Why?

In no particular order...

Security. Big security focus. Immune to lots of common vulnerabilities. I think it even automatically uses random param names for POSTs, etc.
Speed. Compiles to Java bytecode, runs on the JVM. Parallel rendering. Scala has built-in, native XML support, so that should be faster than parsing textual templates. I read somewhere that a modest, old machine can comfortably serve 300 req/sec on a single processor while doing a modest amount of transformation. With threading, I've read that this can be over 20x the speed of the same webapp in RoR running multi-process, presumably on something like Puma, Thin, whatnot. (No sources, sorry, it's arbitrary anyway. And we all know Ruby can be scaled, that's not the topic here.)
Snippets rather than MVC. MVC has never felt right to me. It's great for trivial DB-interface-like-webapp kind of stuff, but outside of that I've often run into ambiguous scenarios that don't feel comfortable... although I have used it happily for lack of a better alternative. Lift instead uses snippets, which are pieces of logic/functionality that you can use all over the place in as many views as you like. It seems much better in terms of reusability; as for organisation, I'm not sure yet. Snippets are also executed in parallel -- nice.
Easier Ajax with wiring/binding, server-push, more. In addition to what the links say, most of the Ajax plumbing seems to be automatic; you don't even need to declare URLs or actions. Also type and parameter safety -- excellent.

Those are the main reasons that come to mind. There are other niceties too, such as lazy loading. The doco flaunts designer-friendly templates as some awesome feature, but I'm personally on the fence about it. I'm a one-man everything team at the moment so it doesn't immediately appeal to my situation. I can see it potentially making CSS dev faster (because you can work off a static template with all cases hardcoded which gets wiped by Lift) but I think any small gains will be offset by the major productivity loss of not having HAML.

Downsides. The Oh Noes.

Basically, less <good thing>.

Less adoption. Less incumbency. This results in fewer libraries to choose from. Fewer plugins. I assume more reinvent-the-wheel kind of stuff for me to write.

Less community. It especially won't be as big as RoR's. This means fewer examples and less code online, fewer questions on StackOverflow, fewer forums and blogs, less information and help. It sounds like the mailing list is friendly enough and I'll be joining today, but it's nicer to have more resources at your disposal rather than always hitting up the same guys for help.

Fewer developers on the market. If this app makes me millions of dollars and I get to a point where I need to outsource or hire, then I'm going to have significantly fewer people who can do the job. It's hard not to compare to RoR, which is ubiquitous these days and would make finding capable helpers no sweat.

I'm going to accept these problems because Scala is awesome, Lift is philosophically fresh and whispers of great advantages once you climb the learning curve, and there are aspects of Lift that will have a direct impact on time, money and resources. For example, I'll be able to host my project for free for longer (due to improved performance). If my project gets popular, yes, it'll be harder to find manpower, but (and this runs contrary to my innate tendencies) I often read & hear (with supporting evidence) that it's best to focus entirely on the short term when starting up a small business or venture (notice I avoided calling it a "startup"). Worries like scalability, resourcing, support, etc. can and should be dealt with once the project premise is proven to be successful and profitable. Like the 37signals guys say, "A business without a path to profit isn't a business, it's a hobby", and if this gets off the ground and becomes just a hobby then I'm not going to fuss about that stuff anyway.

Lessons Learnt on My Second Android App - Pt.1

2013-04-17T14:55:00.000+10:00

Hi. Recently I released my second Android app. It's a collection of game timers and bells and whistles to be used whilst playing board games, card games, party games, the like. Practically, that means you can use it to play 4-player Scrabble where each player has 2 minutes per turn and a total of 25 min for the entire game, for example. Or, for a game like Scattergories, you can have an hourglass set to run for a random amount of time between 1~2 min and then scare the crap out of everyone when it goes off. That kind of thing.

If you're interested, you can peruse some screenshots here and find the app here: “Time Us!”

Moving on, I'd like to share, and record for future-me, the lessons I learned and observations I made whilst creating this app.

Scale Bitmaps Efficiently

Those tiny graphic files in your resources consume a surprising amount of memory when in use. A 100KB PNG is compressed, and when loaded it can consume 10MB of memory as a Bitmap. As we know, memory is quite important on mobile devices, especially older ones. If the user is on a small LDPI device, for example, their tired old phone isn't going to have much memory to spare. If you're scaling bitmaps, there's a chance that you're consuming more memory than you need to, and the amount of waste gets higher with older devices, where it matters more.
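The arithmetic behind that is simple: a decoded bitmap costs width × height × bytes-per-pixel, regardless of how small the PNG file was. A quick sketch, assuming the default ARGB_8888 config (4 bytes per pixel); the BitmapMemory name and the 1600 × 1600 example size are hypothetical, chosen to land near the 10MB figure above.

```scala
object BitmapMemory {
  // ARGB_8888, the default Bitmap config, stores 4 bytes per pixel
  // (RGB_565, for comparison, uses 2).
  val ArgbBytesPerPixel = 4

  // Bytes held by a decoded bitmap; the compressed file size doesn't enter into it.
  def decodedBytes(widthPx: Int, heightPx: Int, bytesPerPixel: Int = ArgbBytesPerPixel): Long =
    widthPx.toLong * heightPx * bytesPerPixel
}
```

For example, decodedBytes(1600, 1600) is 10,240,000 bytes (about 10MB), and halving both dimensions cuts that to a quarter.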

Say you have a large graphic that you scale to a smaller size, you're going to find that without some special options in-place, the device will load the full graphic into memory and retain it as it was loaded prior to scaling. I originally assumed that on a small LDPI device the graphic would scale down to about 20% (or whatever) of its size and accordingly only consume 20% of the full size in memory. Right? No. By default Android loads first, scales later and doesn't let go of that full-sized bitmap.

So how can you reduce memory consumption? The first recommended solution (because it is more processor-performant as well) is to prepare different copies of your graphics for each density. Ok great, but there are scenarios where that's not the best approach so what then? It turns out that there's a whole trove of info on this issue online in the official Android docs, the specifics on this scaling issue here:
http://developer.android.com/training/displaying-bitmaps/load-bitmap.html

You can mostly just copy-and-paste the utility code from the link above. By applying the methods therein, I was able to reduce memory usage by up to 80% or so (from memory), depending on the device. I didn't know this page existed until I needed it so I suggest you bookmark it and/or keep a mental note if you don't read the above page immediately.
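The key calculation on that page picks a power-of-two inSampleSize so the decoder produces a bitmap close to the size you'll actually display. Here's a hedged Scala port of that logic as a pure function, so it can be tested off-device. It follows the current version of the page, which may differ from the one available when this was written, and SampleSize is a hypothetical name. On a device you'd first decode with BitmapFactory.Options.inJustDecodeBounds = true to read the source dimensions, then set inSampleSize and decode again.

```scala
object SampleSize {
  /** Largest power-of-two sample size that keeps both decoded dimensions at
    * or above the requested size (pure function, no Android dependency).
    */
  def inSampleSize(srcWidth: Int, srcHeight: Int, reqWidth: Int, reqHeight: Int): Int = {
    var sample = 1
    if (srcWidth > reqWidth || srcHeight > reqHeight) {
      val halfWidth = srcWidth / 2
      val halfHeight = srcHeight / 2
      // Keep doubling while the next halving still leaves both sides big enough.
      while (halfWidth / sample >= reqWidth && halfHeight / sample >= reqHeight)
        sample *= 2
    }
    sample
  }
}
```

For example, a 4000 × 3000 source shown in a 400 × 300 view gets a sample size of 8, so it decodes at 500 × 375 and uses 1/64th of the full-size memory.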

Free Bitmap Memory Manually

Now that we know a little more about bitmap memory consumption, let's talk about another snag I encountered.

My app's home screen has two looks: dark & light. The light-mode background consists of 2 images. On a w360dp XHDPI phone, it consumes 5.8MB for bkgd img #1, and 4.0MB for #2, making a total of 9.8MB.

(FYI: You can easily see the memory profile of, and analyse your app by doing this from Eclipse: DDMS → Devices → Select process and click “Dump HPROF file” in the toolbar.)

What would you expect to happen to that 9.8MB when the user hits a button and the app moves on to the next activity (screen)? I expected that the OS would release the memory because it's not in use. Not so. As long as the home activity is in the task history, the memory will stay consumed and locked, which in my case means for the entirety of the app. Now maybe technology is catching up, but the old phone I had for two years before my newer Galaxy Nexus was an HTC Magic, and that thing was constantly chugging due to memory problems. Consequently, I'm not comfortable with holding onto 10MB for no reason on a mobile device.

So what can you do? Well I could draw both images onto a single canvas but that's not really solving the problem, that would just drop the 10MB down to 6MB and it would still be unavailable later. To solve the problem I did three things:

Remove, dereference, recycle the images in onStop().
I wrote a utility function to do just that, and then simply called it in my activity's onStop() method. It looks like this:
Set the images during onStart().
In your activity, just use View.setBackground(Drawable) or View.setBackgroundDrawable(Drawable) in your onStart() callback to restore the images that you clear in onStop().
Turn off hardware acceleration.
Devices running Android 4.2 require an additional kick in the pants to get them to let go of the memory. You'll want to turn hardware acceleration off or else you'll end up with instances of GLES20DisplayList in memory that consume exactly the same amount of space. Instructions to turn off hardware acceleration can be found here: http://developer.android.com/guide/topics/graphics/hardware-accel.html

There's actually a whole bunch of doco on bitmaps and memory on the official Android site so for more info, take a look at http://developer.android.com/training/displaying-bitmaps/index.html.

PNGOUT

If you have a moderate amount of graphics and, like me, you optimise (ie. re-generate) them for multiple densities, then the size of your APK can balloon fast. This situation led me to discover a utility called PNGOUT (Windows, ArchLinux, Other Linux + OSX). It optimises your PNGs and gets them down to a smaller size, if I remember correctly, using a special compression algorithm targeted at graphics. As with most types of ultra-compression it can be slow, but the results are worth it.

I ran it over my PNGs. It took 30min and reduced my total size from 7.4MB to 5.9MB, a 20% reduction. Nice.

Screen Widths

I learned a few things about screen width. Firstly I had to find a database of devices and specs. One doesn't exist, but I did find two admirable attempts here and here. After some analysis, from what I can tell, most older phones have a width of 320dp and I don't think there are any phones with less than that -- AdMob's smallest banner ads are 320dp. For LDPI and MDPI, 320dp is a safe bet. Once you get to HDPI, it seems that around 70% of devices are 320dp, 25% are 360dp and the rest are larger than that. Here's the most interesting piece of news: for XHDPI it seems that 360dp is the minimum and majority. There don't seem to be any 320dp XHDPI devices in existence!

What does this mean? Why do I care? It means that on some devices in some scenarios, you're going to end up with more free space than you expected, and your graphics will be perceived as smaller. A larger graphic is more appropriate there, and that's something to consider when generating graphics. You'll need to make a conscious decision about which dimension is more important to you for each graphic, and enlarge appropriately. If you want a graphic to be perceived as being the same size on a 320dp-width and a 360dp-width HDPI device, then you might want to generate another copy at 360 ÷ 320 = 1.125x the original size. If you're supporting tablets then it might pay to do something similar for the most common tablet sizes.

Here's what I ended up with:

Dir	Factor	Content
`drawable-ldpi`	0.75	Everything
`drawable-mdpi`	1	Everything
`drawable-hdpi`	1.5	Everything
`drawable-w360dp-hdpi`	1.6875	Width-sensitive GFX only
`drawable-xhdpi`	2 or 2.25 depending on the gfx	Everything
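The factors in that table come from multiplying the usual density factor by the width ratio (360 ÷ 320 = 1.125). A small sketch of the arithmetic; DrawableScale and its parameter names are hypothetical.

```scala
object DrawableScale {
  // Density factors relative to MDPI (the 1x baseline).
  val densityFactor: Map[String, Double] =
    Map("ldpi" -> 0.75, "mdpi" -> 1.0, "hdpi" -> 1.5, "xhdpi" -> 2.0)

  // Factor for a graphic that should occupy the same proportion of the
  // screen width on a widthDp-wide screen as it does on a 320dp one.
  def factor(density: String, widthDp: Int, baselineWidthDp: Int = 320): Double =
    densityFactor(density) * widthDp / baselineWidthDp
}
```

For example, factor("hdpi", 360) gives the 1.6875 used for drawable-w360dp-hdpi, and factor("xhdpi", 360) gives 2.25.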

If you'd like to learn more have a read of these:
Supporting Different Densities
Supporting Different Screen Sizes

More Next Time

I learnt more but I'll post that next time. If you read this far I hope I've been helpful.

Android Project Templates

2013-02-19T08:54:00.000+11:00

Today I'd like to share some templates I've made for creating Android apps. They all use Maven and have been coerced into compatibility with Eclipse. There are a few different flavours which I will describe below. If you want to skip the reading and just want the code, np, it's all here: https://github.com/japgolly/reference.

Template #1: Java

Code available here.

This is the gist of what I use when I'm using Android + Java. It comes with a few add-ons to mitigate Java's verbosity and inflexibility, and the fact that they all work together in harmony, even with Eclipse, is what makes this template gold. It took a while to get working way back when.

Features

Lombok: Integrated into both Maven and Eclipse.
Modified to play nice with Android, and added the ability to add @Inject to generated constructors.
What is Lombok?
AndroidAnnotations: Integrated into both Maven and Eclipse.
What is AndroidAnnotations?
CoFoJa: Contracts For Java provides annotations that can be used to verify pre-conditions, post-conditions and invariants. The validations are inherited so they're great for interfaces. Also they can be omitted from bytecode during compilation meaning you get more checking during tests and dev with no performance penalty in prod.
More about CoFoJa.
Unit testing: Robolectric is being used for unit testing (along with an example of usage). I'm actually using a test library of mine (open-source) that includes Robolectric, FEST, Mockito and some Android-specific testing utilities.
What is Robolectric?
Integration testing: Robotium is being used for integration testing (along with an example of usage).
What is Robotium?
Android Lint: There's a profile that runs Android Lint. Great for CI builds.
Findbugs: Findbugs has been integrated (not that it was effort). What's nice is that I've included a rules file to exclude certain Android-specific warnings. I'm a nice guy.
Proguard: Dev & release configs for Proguard are included.
Release: A release profile is included that runs Proguard, signs artifacts, and runs zipAlign.
AdMob: It's in there, ready to go but can also be deleted just as easily if you don't want it.
Eclipse settings: Auto-format on save. Auto-organise-imports on save. A TODOC tag for doco todos. More.

Template #2: Scala

Code available here.

Here you can use Scala instead of Java to build your app. The introduction of Scala makes Lombok and AndroidAnnotations unnecessary, which simplifies the build process. As above, everything in this build works from the CLI with Maven, and within Eclipse.
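As a small illustration of the Lombok side of that, a one-line case class gives you roughly what Lombok's @Value generates for a Java class (constructor, accessors, equals/hashCode and toString), plus a copy method. Timer and TimerExample are hypothetical examples.

```scala
// Constructor, accessors, equals/hashCode, toString and copy are all generated.
final case class Timer(name: String, seconds: Int)

object TimerExample {
  val turn = Timer("Scrabble turn", 120)
  val shorter = turn.copy(seconds = 60) // a new instance; `turn` is unchanged
}
```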

Features

Scala: Scala app code. Scala unit tests. Scala integration tests. Scala proguard. Scala Eclipse. Scala everything!
Unit testing: Robolectric is being used for unit testing (along with an example of usage). I'm actually using a test library of mine (open-source) that includes Robolectric, FEST, Mockito and some Android-specific testing utilities.
What is Robolectric?
Integration testing: Robotium is being used for integration testing (along with an example of usage).
What is Robotium?
Android Lint: There's a profile that runs Android Lint. Great for CI builds.
Proguard: Proguard is mandatory when Scala is used. Rules included so that Scala works. Dev & release configs for Proguard are included.
Release: A release profile is included that runs Proguard, signs artifacts, and runs zipAlign.
AdMob: It's in there, ready to go but can also be deleted just as easily if you don't want it.

NOTE: You cannot use Scala and Java together because it wrecks Eclipse. If you're using VIM or some other editor inferior to mighty VIM, you'll be fine with both languages simultaneously, Maven can handle it, just don't expect a happy life using Eclipse.

Template #3: Scala + Free/Paid Editions of Your App

Code available here.

Same as the Scala template above except this time instead of having a single app, you have an Android library and two separate apps that extend it.

Features

Everything in template #2. See above.
Shared APK Library: Anything shared between your free & paid apps will live in the shared library. It's an Android library so resources can be shared too.
Separate Apps: Your free and paid apps are separate apps that extend the shared library. Additional code and/or resources can be dropped in as necessary.

Usage

To use one of these templates just copy the entire folder (and ensure you capture files starting with .). Next use the find_template_values.sh script to find places in the project where you need to swap out example values for real values. There's also a helper script, replace.sh that you can use to perform recursive, mass regex replacements.

To load your project into Eclipse, simply hit File → Import... → Existing Projects into Workspace. All the required Eclipse settings and files are already there in the template.

Enjoy.

Releasing an App for Android - Part 2

2013-02-10T09:44:00.001+11:00

Continuing on from part 1...

4. Free and Paid Editions.

The Google Play marketplace doesn't allow free & paid editions of apps. If you want free & paid editions you will need to manage and release two separate apps with separate package names.

To do this I decided to go with the mainstream approach, namely, creating separate projects. If you have shared code without shared Android resources (unlikely), then you'd simply move the shared code out into its own project and include the resulting jar as a dependency in the free & paid projects. With Android it's similar: you create an APK library. Practically, this means adding android.library=true to your library's project properties and changing its pom packaging from apk to apklib. The free/paid projects won't contain much, but they will need their own AndroidManifests and a reference to the library in the project properties, such as android.library.reference.1=../library. That's the gist of it.

One annoying thing is that, like a few things that have come out of the Android dev camp, it's a hack. The resulting library is basically just a zip of the resources and the SOURCE CODE. When you use an Android library in a top-level project, it unzips, merges and recompiles the library code. How annoying. The reason is that the generated R.java will produce different ids outside of the library, hence the recompilation. As far as I understand, the system is therefore a quick hack. It might be fair enough - I don't know how much pressure the Android dudes are under or how well/poorly staffed they are. But technologically speaking, it's still a hack. Oh well. Make sure Scala incremental compilation is on, as it takes a large bite out of the build pain.

In retrospect it was a good thing that I saved this step until the end. Altering the build & structure near the end of the project (especially if you've got a working reference) takes much less time than managing 3 separate projects instead of 1 all along, plus waiting on the extra compilation, dexing and proguard'ing throughout.

Tip #1: Use a single project for as long as you can get away with but anticipate that towards the end you'll need a little time to split out library, free, and paid projects.

Tip #2: Anything edition-specific should be separated from anything shared. For example, create an ad banner layout and then <include> it rather than using an AdView directly. Later you can configure things so that the paid edition gets a NOP layout.

5. Advertising.

I started not knowing anything about ads. Basically I did this:

Sign up with AdMob.
Fill in all your bank details so they can pay you. You'll also need a merchant account with Google Play.
Add your app but don't give it a market URL yet. This allows you to get an ID that you can start using before you release.
AdMob doesn't just show you their own ads; they connect to other ad networks and can show you their ads too. That was surprising. Advantage us, because it means that once you get ads working in your app and you want to use a different ad provider like InMobi or MobFox, you can just sign up with those guys, get your ID and plug it into AdMob on their website. No code changes required. You can even configure AdMob to show ads from a combination of providers, or use different providers for certain countries.
Get the AdMob SDK and wire it into your project.
Follow the instructions on this page to integrate ads into your layout: https://developers.google.com/mobile-ads-sdk/docs/admob/fundamentals. Reading all those pages is a good idea.
Finally, if you want ads to refresh every 60 sec or so, don't do it in code; play around in the AdMob website and you'll find settings for it. You can also customise the appearance of ads on the website or in code.

Done.

Now I don't know much about how it all works yet, but this is what I've gleaned:

Every time an ad is shown to a user it's called an “impression”. You get nothing, $0.00, for impressions.
Every time an ad is clicked you get something small like $0.03.
RPM = “Revenue Per Mille” = the amount of money you've made per 1000 impressions.
Fill rate: “Fill rate represents the percentage of ad requests that satisfy the ad requests sent by the app. It is a measure of AdMob's ability to serve ads in your app with the existing inventory.”
You'll see eCPM everywhere; it stands for “effective cost per mille”. It's a metric indicating how effective/profitable a particular ad network has been for you so far. Don't freak out when AdMob says eCPM $0.00 when you first sign up, that just means you haven't had any clicks yet. The formula is 1000 x Cost-per-Click x Click-Through-Rate.
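Both metrics are simple ratios. A small sketch; AdMetrics is a hypothetical name, and the worked example pairs the roughly $0.03 per click mentioned above with an assumed 1% click-through rate, which works out to an eCPM of about $0.30.

```scala
object AdMetrics {
  // eCPM = 1000 × cost-per-click × click-through rate, per the formula above.
  def ecpm(costPerClick: Double, clickThroughRate: Double): Double =
    1000 * costPerClick * clickThroughRate

  // RPM: revenue earned per 1000 impressions.
  def rpm(revenue: Double, impressions: Long): Double = {
    require(impressions > 0, "need at least one impression")
    revenue / impressions * 1000
  }
}
```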

And we've reached the boundary of my knowledge on the topic. Hope you enjoyed the tour.

Tip #1: Filter logcat by tag “Ads” to see all your AdMob logs.

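Tip #2: Search the AdMob logs for your test device ID and then plug that into the ads:testDevices attribute of your view. That way your own device gets test ads instead of live ones, which matters because clicking live ads in your own app breaks AdMob's policies and can get your account suspended. From what I hear, AdMob support is notoriously bad, so if you get in trouble with your account, it's far from easy to get it back online.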

Tip #3: eCPM is a metric of how well an ad network has performed for you. It's not a setting that you need to configure (even though it looks that way from AdMob's UI).

6. Localisation.

Not much to say here. Android warns you constantly about externalising your strings, which is good. It doesn't monitor your source code though. If you're paranoid or have a large team, Eclipse can warn when it finds strings without something like //$NON-NLS-1$. Checkstyle et al probably have a similar feature.

Tip #1: Either put non-translatable strings into a file called donottranslate.xml (seriously – Android Lint requirement) or give them an attribute translatable="false".

Tip #2: A split to paid/free editions will introduce new strings. Consider this before sending out text for translation.

Tip #3: The marketplace description(s) will also need translation. Write and include that before sending out text for translation.

Tip #4: Your text is easier to manage when all in one place. Therefore don't include separate strings in the free/paid versions; put them all in the library project together then just reference them differently in each app. If you don't want to change the references then reference an alias and change the alias in each project. Example:

Library project:
  <string name="app_name_free">Bananas (free)</string>
  <string name="app_name_paid">Bananas (pro)</string>
  <string name="app_name">@string/app_name_paid</string>

Free project:
  <string name="app_name">@string/app_name_free</string>

Tip #5: Use the app in each language to spot-check the layout. Certain languages might need tweaking to look good. For example, a language like Japanese that doesn't use spaces might need a manual endline (\n) so that it doesn't end up with one character dangling and the word split across lines.

7. Proguard.

If you don't know what Proguard is, it's a tool that shrinks, optimises, obfuscates and preverifies your binaries.

Scala on Android demands it.
Optimisation is very slow.
Obfuscation didn't seem to do very much to my app although it happily obfuscated my dependency libraries.

I managed to speed things up by having two proguard configs: dev & release. Turn off optimisation and obfuscation in the dev config and you will save a lot of time. With optimisation enabled it can take minutes at a time. I've uploaded my configs here: https://gist.github.com/japgolly/4747423

Tip #1: Reuse existing proguard configs.

Tip #2: Use separate dev & release proguard configs.

8. Signing.

Your app will need to be signed (and then zip-aligned, in that order) in order to release it. To sign it you need to have your own certificate that doesn't expire until at least 2033 (which isn't the best for security but I don't make the rules). It's a Java thing so just google keytool and you'll be done pretty quickly. You'll need to create and save 2 passwords: one for the key, one for the store. I generated mine with this command: (tip: script it or purge it from shell history when done)

keytool -genkey -v \
  -keystore <keystore filename> \
  -alias <key alias> \
  -keyalg RSA -keysize 4096 -validity 10000 \
  -keypass 'xxxxxxxxxx' \
  -storepass 'yyyyyyyyyy'
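A quick way to sanity-check the -validity value against Google Play's requirement, sketched with java.time. The cutoff is the date Google's app-signing docs give (the key's validity must end after 22 October 2033); KeyValidity is a hypothetical name, and the example creation date is simply this post's date.

```scala
import java.time.LocalDate

object KeyValidity {
  // Google Play requires the signing key's validity to end after this date.
  val PlayCutoff: LocalDate = LocalDate.of(2033, 10, 22)

  // keytool's -validity is a number of days from key creation.
  def expiry(created: LocalDate, validityDays: Long): LocalDate =
    created.plusDays(validityDays)

  def okForPlay(created: LocalDate, validityDays: Long): Boolean =
    expiry(created, validityDays).isAfter(PlayCutoff)
}
```

Counting from early 2013, 10000 days lands in mid-2040, comfortably past the cutoff.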

Once you're done, make sure you backup your keys and passwords then it's time to integrate signing into your build. For a Maven project you need to weave a massive blob of shit into your poms in order to integrate signing and zip-align. They should be in a release profile and all up it comes to around 100 lines of XML (!). I don't remember how much I was able to refactor into my parent pom (as my project is multi-module) but I don't think it was all of it. I'll post all the Maven stuff shortly anyway. Most of the time spent on signing was getting Maven to work properly.

Tip #1: Either ditch Maven for something better, or make sure you copy a working Maven Android project that has signing. (I'll post my Maven stuff shortly.)

9. Build Automation.

Last but not least, we arrive at something that is actually important from day #1 of the coding period: build automation. Very important. Obviously. I used Maven and ended up creating a multi-module project which I'll post more on later. Suffice to say Maven gave me trouble, and cost me too much time. I want to switch to something else and I have my eye on SBT.

I did some tests on SBT recently and the results supported the rumours that SBT is faster than Maven. A build of a single, simple Android project took 35 sec with Maven, versus 24 sec for the same project with SBT. That's a 31% saving. A clean build of my multi-module Maven project takes 2 minutes and I wonder how fast it would be with SBT. If I assumed a flat 31% saving again, that's 120 sec down to 82 sec. Nice. I didn't investigate further because it's time-consuming and because I worry about Eclipse integration. I'm sure Eclipse + SBT = happy days, but it seems that Eclipse + ADT + SBT = pain. ADT is extremely inflexible and the sbt-android plugin is way too poorly documented for an SBT noob like me. If I changed the dir & file structure to suit ADT, an expert might be able to modify the sbt-android build pretty easily, but I can't. Also, it practically has a seizure if you give it a full AndroidManifest as it wants to create its own. I imagine that ADT wouldn't be too happy about that. Would loooooove to be shown the error of my ignorant ways here so if anyone knows, ping me.

Another thing is that Maven is archaic and verbose. Reuse between modules is impossible in some situations; there are 6-year-old bugs and feature requests for mixins that are still open... SBT on the other hand is concise and allows as much reuse as you can shake a stick at (and I can shake a stick at A LOT of reuse).

But any further bitching about Maven is just that at this point: bitching. So I'll suck it up and declare that through sweat and toil I've come to have a great multi-module Maven Android project setup with library, free/paid projects, and free/paid (instrumented) integration test projects working together, nuances coerced, effective for dev and release builds. I'll be posting it shortly.

In Conclusion

Looking back on all these issues I can now see how it took 5 or 6 days to go from 95% dev complete to released, despite doing 0%-95% in 2 days. I'll be applying these lessons during my other Android projects and I hope it helps others (yes you!) as well.

Releasing an App for Android - Part 1

2013-02-08T12:01:00.002+11:00

Last week I decided I would create a little micro Android app and release it so that I could get experience with the process. I recently came to the opinion that it's a good idea to hit learning curves with something small first rather than attempting both “big” and “new” at the same time. So I created an Android app called BPM Tapper. What I found surprising was that I went from vague idea to 95% code complete in 2 days, but then it took another 5 or 6 days to get it released.

Now, my goal here with this blog post is twofold. Firstly, to document what I went through releasing my first Android application so that in future I remember everything that needs to be done; great for planning. Secondly, I want to document tips and strategies to improve next time.

1. Android Completeness

By Android Completeness I'm referring to things that are either required or expected of common Android applications. The tasks I came across are as follows.

1.1. Inspecting Android Lint warnings.

This is an easy one. The Android SDK provides a lint tool that analyses your compiled code and resources, and creates a list of warnings. Not all Lint warnings are correct but they all should be vetted and resolved if necessary.

1.2. Handling interruption.

Your app can be interrupted at any time. For example, if the phone gets a call while your app is in use, the OS will kick your app out, and data will be lost if you don't handle it. Therefore, if you don't want users to lose their on-screen state when using your app (or changing orientation), you generally need to implement two methods in your activities: onRestoreInstanceState and onSaveInstanceState. Scala made this really simple; case classes are serialisable, so one simply calls Bundle.[get|put]Serializable. Example:

val measure = savedInstanceState.getSerializable(BUNDLE_KEY_MEASURE).asInstanceOf[Measure]

Too easy.

1.3. Integrating with the Android Backup Manager.

My app is stateless so I just flipped the allowBackup switch off in the AndroidManifest and didn't have the need to write any code here, but it is something to be aware of. It's easy to forget but very important for stateful apps.

1.4. Supporting multiple screen densities.

Different phones have different screen densities and you should at least be targeting support for MDPI through to XHDPI. Different screen densities (loosely) require different copies of your graphics at different sizes. This wasn't a problem for me because I create all my graphics in SVG and have a script to convert them to PNGs using ImageMagick. (See the Artwork section for such a script.)

1.5. Providing landscape layouts.

Not mandatory, but I've seen quite a few reviews on the marketplace where people get quite passionate (read: vehement) about not being able to turn their phone on its side. Personally I don't see why it's such a big deal, but I'm not everyone. Creating a new layout for landscape mode and having it coexist is easy. However, designing a landscape UI that is pleasing and effective is the time-consuming part.

Tip #1: Always use something like Inkscape to create the screen before you hit the code. Use real sizes and colours. Don't assume and leave out anything that you plan to do in code later.

Tip #2: Commit to doing this for all screens in your app up-front, or for none at all. Or do some. But make that decision early on and ensure you can make it work. You don't want to create 8 landscape layouts, then later come across a few that won't work and decide to scrap landscape orientation entirely.

1.6. Ensuring layouts look acceptable in various conditions.

This means compiling a testing matrix of relevant attributes such as:

Screen size.
Screen density.
Screen orientation.
Android version.
Ads vs no-ads (they do take up valuable screen space).
Language.

This is a pain. The theory is simple, right? The screen will look different depending on each factor above. You want your beloved app to look great on all users' phones.

Screen density I've addressed above. That, language and Android API version can just be eyeballed manually I think. No need to give them their own axis in a matrix. But I did want a matrix of ads-or-not, orientation, and screen size. What was annoying is that screen size is not screen size. There are resource qualifiers for screen size, namely small, normal, large, xlarge. Now they didn't work as expected for me. Due to my immense frustration and exhaustion at the time, I don't remember the details but I found them not aligning with my expectations at all. Instead I decided to use screen density to create different settings for different screen sizes, because there is a strong correlation between density and screen size anyway. I mean, no one has a 1200x800 LDPI phone. It doesn't exist. Thus I used a test matrix like this:

Density	Ads, portrait	Ads, landscape	No ads, portrait	No ads, landscape
ldpi	✓	✓	✓	✓
mdpi	✓	✓	✓	✓
hdpi	✓	✓	✓	✓
xhdpi	✓	✓	✓	✓

Tip #1: Create a test matrix and use it to ensure things look good.

Tip #2: Eyeball the smallest screen sizes once in each language. Button labels can be twice as long as in English, depending on the target language, which may cause side-effects in your layouts.

Tip #3: Always start with the lowest density and work your way up. An MDPI screen will choose values-ldpi/ over values/ if it exists and values-mdpi/ doesn't.

Tip #4: Don't trust the screen heights you can choose in the ADT layout editor. I have an old HTC Magic that has around 100dp less height than the closest config available from the editor.

Tip #5: Do not leave any hardcoded dimensions in your layout XML (obvious) or your styles.xml (not obvious). When adjusting for each screen it's going to be the dimensions like the text sizes, the padding & margin sizes that you'll be adjusting.

Tip #6: Do your utmost, do your friend's utmost (!) to avoid using dimensions to affect the layout itself. For example, a dimension for the margin between elements is one thing, but if you want a large river of negative space, don't use dimensions; anchor both sides of the space to stuff so that the river size is organic. This makes for more consistent layouts on different devices and means you won't have to adjust the value for different screens.

2. Scala.

Learn it! Use it! It is a brilliant language. It allows you to write better code, more concisely. There is so much boilerplate bullshit that Java requires, so many hoops to jump through, and so many obstacles that prevent you writing the kind of code that you (well, I) want to. Scala does away with nearly all of those! It has a strong focus on immutability, functions, maximum code reuse (yay for multiple inheritance again!) and conciseness, not to mention all the other modern goodness such as pattern matching and partial functions; for comprehensions; case classes; implicit conversions, classes, and parameters... The list goes on. It's a very productive language and you will be more productive with it.

Like anything, it does have its shortcomings though: flaky IDE support, slow compilation, WTF-inducing stacktraces, a controversial lack of binary compatibility between major versions (which doesn't bother me at all actually), a massive stdlib that Android can't handle without proguard, and too many custom operators at times (look at SBT with its <+= <<= ++= <++= etc, good god!).

But all in all, Scala is worth it. IDE support will continue to improve and compilation time and stdlib modularisation should be fixed later this year.

Scala can especially be a godsend when it comes to cutting out Android/Java repetition. Consider this Java:

protected TextView nameView;
protected void onCreate(Bundle savedInstanceState) {
    ...
    nameView = (TextView) findViewById(R.id.name);
}

And now consider this Scala:

lazy val nameView = find[TextView](R.id.name)

Here's another example. Java:

view.setOnClickListener(new OnClickListener() {
    @Override
    public void onClick(View view) {
        onDelete();
    }
});

Versus Scala:

view.setOnClickListener( onDelete _ )

Scala also saved me time on the testing front. Your test code becomes amazing when you can use traits for multiple inheritance, and write methods that accept lambdas or Ruby-style blocks. You'll end up with something approaching a testing DSL before you know it, and most of your test cases will be under 3 lines, which certainly takes the wind out of the sails of any test-case-writing laziness.

Tip #1: Use an incremental compiler for Scala. With the scala-maven-plugin this is a single config item.

Tip #2: Copy a proguard.cfg with Scala settings and set your build to always use proguard.

Tip #3: Use traits to create collections of test helper code. Do as little directly as possible in your immediate test case code.

Tip #4: Use implicit conversions for all things Android.

Tip #5: Create an ActivityHelperTrait or similar and load it up with implicit value classes and inlined helper methods. Example:

@inline def find[T <: View](resId: Int): T = findViewById(resId).asInstanceOf[T]

3. Artwork.

Get in the habit of using vector-based graphics and ImageMagick's convert tool (it's command-line). Vector graphics look great at any size, and you can convert them into bitmaps at various sizes (which Android needs). If you want, you can use SVGs directly in Android with svg-android, but there are still places where you need fixed-size images.

You will need to create or procure a visual to use as your application icon before release. Ensure you have it as an SVG (or some other vector-based format), then use ImageMagick to create the various sizes. The marketplace requires 512x512, whereas your app requires 36x36, 48x48, 72x72 and 96x96 for ldpi, mdpi, hdpi and xhdpi respectively.

For all kinds of visuals you'll need multiple copies in separate sizes to support different screen densities, so I wrote a Ruby script to convert all my visuals.
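The original script isn't reproduced here. As a rough idea of what such a script can look like, here is a minimal sketch that only builds the ImageMagick commands for one launcher icon. The helper name `icon_commands`, the `res/drawable-<density>/` layout and the exact `convert` options are assumptions for illustration; the 36/48/72/96 px sizes are the standard launcher icon sizes for ldpi to xhdpi.

```ruby
# Hypothetical helper (not the original script): builds ImageMagick `convert`
# commands that rasterise one SVG into a launcher icon per density bucket.
ICON_SIZES = {
  'ldpi'  => 36,
  'mdpi'  => 48,
  'hdpi'  => 72,
  'xhdpi' => 96,
}.freeze

def icon_commands(svg, name, res_dir = 'res')
  ICON_SIZES.map do |density, px|
    out = File.join(res_dir, "drawable-#{density}", "#{name}.png")
    # `-background none` (before the input) keeps the SVG's transparency;
    # the `PNG32:` prefix forces a 32-bit RGBA PNG so the alpha channel survives.
    ['convert', '-background', 'none', svg, '-resize', "#{px}x#{px}", "PNG32:#{out}"]
  end
end
```

Each command can then be run with `system(*cmd)` once its output directory exists. The 512x512 marketplace icon can be produced the same way with one more size.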

Tip #1: Use 32-bit PNGs if you want an alpha channel.

Tip #2: ImageMagick will be your best friend for conversion. Script it.

Tip #3: Inkscape is a great tool. Very easy to use and everything is vector-based.

Tip #4: There are plenty of good-looking, vector-based visuals out there that are public-domain. There are also sites like The Noun Project that have brilliant visuals that you can usually use free as long as you attribute the author, or can buy cheaply.

To be continued...

I have 6 more points. Read about them in part 2.

Maven: Now with Colour™ !

2012-11-18T19:40:00.003+11:00

Maven.

I'm a command-line kinda guy. GUIs didn't even exist when I was a kid. When I use Maven it's often from the command-line. A curious thing happens when you run Maven from the CLI more than 3 times in a row. You slowly lose the ability to read those useful little symbols that we affectionately refer to as "the alphabet", your eyes begin to ache and you start to wonder who's really in control: you or your eyelids. The list oozes on. The reason for this is that you get a lot of information as the build runs and, more importantly, it's all the same colour, usually bright white on black depending on your terminal config. Things get really hard to see and it takes a lot of precious concentration and brain-juice to decipher the results. Usually this is because you're searching for 1 particular line out of 100 or so with no real visual hints. I don't like it. My brain may well just be getting old, or maybe I'm just working too much lately, but I find that I don't really have much brainpower to spare, especially when I'm in the middle of a problem with a large context. Don't like it.

So I wrote a script.
Get it here: https://gist.github.com/4104053

The great thing about Linux is that you can treat nearly everything as either a file (like /dev and /proc, awesome!) or a pipeline. By simply piping Maven output through a processor (while being careful not to block) you can apply colour all over the place by using some smart regex to insert strings that the terminal parses in order to allow user control over various terminal attributes, in this case: colour. (See ANSI escape codes on Wikipedia for more info.) Now I'm not the first person to take this approach, but the great thing is that because these scripts are so easy to write, anyone can create or customise one for their own personal needs.
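To make the piping idea concrete, here is a minimal Ruby sketch of such a filter. It is not the gist linked above; the file name `mvn_colour.rb`, the rule set and the colour choices are assumptions. The regexes target typical Maven 3 and Surefire output lines such as `[INFO] BUILD SUCCESS` and `Tests run: 4, Failures: 0, Errors: 0, Skipped: 0`.

```ruby
# Hypothetical Maven output colouriser (a sketch, not the gist linked above).
ANSI = {
  reset: "\e[0m", bold: "\e[1m", red: "\e[31m",
  green: "\e[32m", yellow: "\e[33m", cyan: "\e[36m",
}.freeze

# First matching rule wins: regex => escape codes to wrap the line in.
RULES = [
  [/BUILD SUCCESS/,          ANSI[:green] + ANSI[:bold]],
  [/BUILD FAILURE/,          ANSI[:red] + ANSI[:bold]],
  [/^\[ERROR\]/,             ANSI[:red]],
  [/^\[WARNING\]/,           ANSI[:yellow]],
  [/^\[INFO\] --- .+ ---$/,  ANSI[:cyan]], # "--- plugin:version:goal (execution) @ module ---"
].freeze

def colourise(line)
  # Surefire summary lines: green if no failures/errors, red otherwise.
  if (m = line.match(/Tests run: (\d+), Failures: (\d+), Errors: (\d+)/))
    colour = (m[2].to_i + m[3].to_i).zero? ? ANSI[:green] : ANSI[:red]
    return "#{colour}#{line}#{ANSI[:reset]}"
  end
  rule = RULES.find { |pattern, _| pattern.match?(line) }
  rule ? "#{rule[1]}#{line}#{ANSI[:reset]}" : line
end

def run(input = $stdin, output = $stdout)
  output.sync = true          # flush each line immediately instead of buffering
  output.print "\e[H\e[2J"    # clear the screen first
  input.each_line { |l| output.puts colourise(l.chomp) }
end
```

Usage might look like `mvn clean install 2>&1 | ruby -r ./mvn_colour.rb -e run`. Reading line by line and syncing the output means colourised lines appear as Maven produces them rather than all at the end.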

My script uses said approach to do the following:

Clear the screen before running.
Highlight the name of each Maven phase and the plugin & goal responsible.
Colour the number of test passes, failures, errors.
Colour the test classes & methods of test failures and errors.
Colour Maven warnings & errors.
Highlight total build success / failure.
Remove that bunch of shit you get at the end of failed Maven builds. When Maven fails (or tests fail) it dumps a bunch of info that basically amounts to "Try using -e or -X else here's our website." Ok for beginners maybe but I don't need to see it all the time. I have "mvn -help" and Google if I need them.

Now that feature-set might not sound like much, but it makes a world of difference. It turns my frown upside down. Twice! (Hey...but that means-

Here are some screenshots:

Coloured Maven when things are happy

Coloured Maven when things go wrong

Rant: Stupid Maven

2012-11-01T17:56:00.000+11:00

I'm back in the Java world again, and already I'm annoyed. The Ruby world isn't without its problems but Ruby problems are generally much easier to solve when they do appear. You generally don't end up wanting to punch the screen and scream "give my life baaaack!" (Thor's internals get me close though.)

Today's problem: Maven.
Now, I've been a big fan and advocate of Maven for years. Therefore, upon returning to the Java world I happily sought out my good friend Maven and started using it immediately with my new project. ...Only to have problem... after problem... after problem... So now I'm going to vent.

TO DEPLOY to a local instance of Nexus you need to plonk a big blob of shit (ie. XML) in every single project POM file. Adding <distributionManagement> to the user profile's settings.xml doesn't work. WTF! In Maven-land, any artifact can be deployed to any repository so long as the repo configuration allows it. A beauty of Maven is the interoperability of artifacts & repositories; that is, a project that can deploy to Maven Central can also deploy to an intranet Nexus or Artifactory; no code needs to change; the protocol is standard. Forcing the storage of repo settings alongside the project's build instructions doesn't seem very bright to me. It's nice... but how to build a project shouldn't depend on where the project's gonna go after it's built (especially when there's no impact on the build anyway). In reality they are separate concerns, but the system doesn't allow that separation. If repo info could be stored in either project or user-profile settings so that people could decide based on the project's circumstances, then fine, but as it stands it is, infuriatingly, not supported.

SO, NOW to make things work, I need to have the same parent pom for all of my projects that I plan to deploy to my local Nexus. The alternative would be to copy & paste the same 20 lines or so of XML into each project: umm, no. What's annoying about this is that a) it's something else to maintain (as opposed to settings.xml, which wouldn't have to be deployed or available as an artifact); b) it introduces a new dependency (literally, not runtime) to all my projects now, great; and c) Maven poms use single inheritance (no mixins), so to inherit from a different parent I have more stuffing around to do. Bad architecture. Annoying.

Parent POMs (multi-module or standalone) allow you to configure plugins, however those plugin settings aren't automatically inherited in child projects. For settings I think you need to redeclare the plugin in the child project (not 100% sure, I don't even care anymore). What really bugs me is that version specifications are not inherited. WTF GUYS! So if I have a multi-module project (and I tried this) wherein I declare the version of a plugin in the parent pom, the child does not use the specified version of the plugin. OMG. I tried with <inherited>true</inherited>, I tried in both <plugins> and <pluginManagement>; it doesn't matter. Now I guess I'm supposed to declare the versions as properties and specify plugin versions in every single child module instead? Not only am I averse to duplication for the obvious reasons, but now my pom files are all massive! I googled this and apparently it's deliberate because "you should always specify specific versions, inheriting versions is evil and disallowed", yada yada. Yep, well, that's one thing in the context of isolated projects, but in the context of multiple modules comprising a single project it's the opposite. The modules are components that are meant to be built together to create a larger system. Declaring certain plugin versions at project scope is entirely reasonable and has many advantages. Secondly, when you don't specify a plugin version, where does it come from? The build still works so it's coming from somewhere, right? It comes from the Maven super POM, which is like a template that all poms inherit. In it, default plugin versions are specified left, right and centre, and they are inherited and used without explicit specification or "evil" consequences (well, there are some consequences, unless the release plugin has been updated to hardcode all plugin versions in child poms during release; I haven't used it in a few years). Double standards.

Finally, XML. Come on. It's 2012. XML has always been a tedious, ugly, unacceptably verbose format. The reasons that made it appealing in 1998 are no longer valid in my opinion, especially not in the context of pom.xml. One of the best things about Maven is that you don't have to write a bunch of crap for every project in order to get it to build in an automated fashion; Maven allows you to simply say "I'm blah v1.0 and I need v2 of X and v3.4 of Y to build. See ya!" and it takes care of the rest so long as you follow its standards. Add a few dependencies though, and a single setting (argLine) for surefire (the Maven testing plugin), and you're looking at over 100 lines of XML. Writing everything manually in Ant would only come to about 80 or so, and Ant uses XML too! There was a project called Maven Polyglot a while back that was supposed to allow the pom to work in different formats, but it seems dead, or at least progressing slowly enough that it's moot anyway. Progress is generally pretty slow in Maven-land. Point: in this day and age I resent having to use XML almost anywhere. Have you ever filled in a form by hand, a tax return, a car rego, and seen:
Phone Number: ________________ /Phone Number
I haven't. I really dislike XML and I'm glad it's finally starting to trend away. Look at how concise things could be:

pom.yml -- https://github.com/mrdon/maven-yamlpom-plugin/wiki
Gradle -- http://www.gradle.org/docs/current/userguide/artifact_dependencies_tutorial.html
Buildr -- http://stackoverflow.com/questions/1015525/why-use-buildr-instead-of-ant-or-maven and scroll down a little.

Rant.stop.

Problems with at_exit{}, exit(), and RSpec

2012-09-12T14:47:00.001+10:00

I had an interesting problem the other day, working on a Ruby project of mine. I ran my tests: (note: I have "rak" aliased to "bundle exec rake")

rak test

which internally expands to the equivalent of:

rak test:spec test:int

which runs my specs and integration tests in that order.

Then an odd thing happened. My specs failed but then the integration tests ran anyway and scrolled the spec failure off the screen to report happy success. After some digging around I discovered the following:

I'd had this test failure for a few commits without noticing.
My CI builds on Travis CI were all reporting success (although the failure message was there in the build logs should one manually check.)
Rake itself and RSpec's rake task were both fine.
Running my tests directly with the rspec CLI, I was getting an exit code of 0 on both success and failure.

The Problem

RSpec was returning an error/exit/status code of 0 despite test failure. It should be non-zero so that external processes like Travis CI and Rake can determine that something's gone wrong and react accordingly.

I'm going to cut a tedious story down to the result here. After investigation I learned that rspec worked as expected again when I avoided a particular piece of code in my tests. What that code does is:

create a temporary directory the first time it's called
reuse that temporary directory on subsequent calls
remove the directory at the end of the process's lifecycle
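The original snippet isn't reproduced here, but a helper of that shape might look like the sketch below. The module name `SpecTmpDir` and the `specs` prefix are hypothetical. The important part is the plain `at_exit` cleanup, which is the pattern that, on the Ruby version described in this post, ended up clobbering RSpec's exit status.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical test helper matching the three steps above.
module SpecTmpDir
  def self.path
    @path ||= begin
      dir = Dir.mktmpdir('specs')                     # created on the first call
      at_exit { FileUtils.remove_entry_secure(dir) }  # removed when the process ends
      dir
    end                                               # later calls reuse @path
  end
end
```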

Why does that affect RSpec returning non-zero on failure? Because RSpec itself doesn't run immediately; it wants to wait until all of your specs have been loaded first. The way it accomplishes that is it registers itself to run via an at_exit block and then calls exit when it's finished with your specs. Still sounds like there's no problem right? Well this is what happens from that point on...

RSpec finishes running tests and calls exit with 0 for success or 1 for failure.
Accordingly, Ruby creates an instance of SystemExit and plonks it into the $! global variable.
Now that the process is shutting down, the at_exit block in my temporary-directory code starts running.
My cleanup code (correctly, validly and legally) runs Ruby's FileUtils.remove_entry_secure to remove the temporary directory created during tests. This isn't a problem in itself.
Here's the gotcha: FileUtils.remove_entry_secure removes the directory and sets $! to nil to indicate that no exception occurred.
Ruby ends the process and sets the exit code to the result of $!.status which was lost in the previous step.
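The first three steps are easy to observe in isolation. The sketch below runs a tiny child Ruby process that exits with status 1 and has its `at_exit` handler report what it finds in `$!`. It only demonstrates where the exit status lives during shutdown. It does not reproduce the `remove_entry_secure` side effect, which depended on the Ruby version the author was using. `CHILD` and `run_child` are illustrative names.

```ruby
require 'open3'
require 'rbconfig'

# Child script: exit with status 1, then report $! from inside at_exit.
CHILD = <<~'RUBY'
  at_exit { puts "#{$!.class} #{$!.status}" }
  exit 1
RUBY

# Runs a script in a fresh Ruby process; returns [stdout, exit status].
def run_child(script)
  out, status = Open3.capture2(RbConfig.ruby, '-e', script)
  [out.chomp, status.exitstatus]
end

p run_child(CHILD) # prints ["SystemExit 1", 1]
```

If anything in an `at_exit` handler replaces or clears that `SystemExit`, the status Ruby finally reports changes with it.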

The Solution

That was the problem. Now what's the solution?
Simple. Just take care to preserve the exit status in your at_exit so that it ends with what it started with. I created a helper method for this, at_exit_preserving_exit_status, which you can view directly on GitHub in a utility library of mine.

Then the solution is simply to use at_exit_preserving_exit_status instead of at_exit in the temporary-directory code, and everything works again! Happy days!
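The published helper isn't reproduced here. A minimal sketch of the same idea follows: capture the `SystemExit` from `$!` before running the cleanup, then call `exit` again with its status. Calling `exit` inside an `at_exit` handler sets the process's final exit status. This sketch only deals with `SystemExit`; how it should treat other uncaught exceptions is left open.

```ruby
# Sketch in the spirit of at_exit_preserving_exit_status (not the published code).
def at_exit_preserving_exit_status(&block)
  at_exit do
    original = $!          # e.g. the SystemExit from RSpec's exit(1), if any
    block.call             # the real cleanup work, which might clobber $!
    # Re-assert the original exit code regardless of what the block did to $!.
    exit original.status if original.is_a?(SystemExit)
  end
end
```

In the temp-dir helper, `at_exit { FileUtils.remove_entry_secure(dir) }` would become `at_exit_preserving_exit_status { FileUtils.remove_entry_secure(dir) }`.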

TL;DR: Conclusion

Be careful that you don't corrupt the value of $! when using at_exit. If you're not careful (or don't use a handy, safe function like presented above), then you can corrupt the exit status of RSpec in particular, and other libraries that work in a similar fashion.

I Have Returned

2012-09-12T13:45:00.000+10:00

Over the past 4 months I've spent a lot of time overseas on holiday. I did a bunch of Asia with some mates, and I visited India with my girlfriend. It was great fun and good to get away and have some new experiences, and to be in situations that you normally wouldn't be in (or, in some cases, wouldn't even want to be in). I enjoyed myself and I'm glad I did it.

And now, I want no more of it for at least 6 months! I've been on 13 planes over the last 4 months. I'm tired of travelling. Now that I'm back home for good, I'm looking forward to finally being able to focus on work again.

Thus, I'll start giving this blog attention again. I have returned.

Ruby Mutex Reentrancy

2012-04-23T10:00:00.001+10:00

This morning I was making some Ruby code of mine thread-safe which is always fun. (I'm serious btw. I frikking love multithreaded programming!) In doing so I came across something that I found a bit surprising.

Consider a snippet that locks a Mutex and then, from inside its synchronize block, locks the same Mutex again. Think it will work? Let's try...

<internal:prelude>:8:in `lock': deadlock; recursive locking (ThreadError)
 from <internal:prelude>:8:in `synchronize'
 from reentrancy.rb:5:in `block in <main>'
 from <internal:prelude>:10:in `synchronize'
 from reentrancy.rb:4:in `<main>'

Shocking!
Mutex is not reentrant. Wow. Ok. Let's try something else...

Let's change that Mutex into a Monitor and try again. Alrighty, let's put on fresh underwear and give it a whirl...

Monitor is reentrant.

Ah, the world makes sense again. If I had to code my own reentrancy I would've cried and hated Ruby a little bit. My love and faith in Ruby remains, yay!
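Both experiments fit in a few lines. The sketch below is not the original `reentrancy.rb`, and `reentrant?` is an illustrative name. It takes the same lock twice from a single thread, just as a method holding a lock might call another method that takes that lock.

```ruby
require 'monitor'

# Lock the same object twice from one thread.
def reentrant?(lock)
  lock.synchronize { lock.synchronize { true } }
rescue ThreadError          # Mutex raises "deadlock; recursive locking"
  false
end

puts reentrant?(Mutex.new)    # prints false
puts reentrant?(Monitor.new)  # prints true
```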

Is There A Cost?

Nothing is free. Is there a performance penalty? Time for some benchmarks.

I wrote a little benchmarking script that acquires and releases both a mutex and a monitor 1 million times each. Benchmarking results:

                 user     system      total        real
Mutex        0.400000   0.000000   0.400000 (  0.406259)
Monitor      0.870000   0.010000   0.880000 (  0.864888)

Ouch, Monitor takes over double the time that Mutex does. That's the trade-off.
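The benchmarking script isn't reproduced here. A minimal sketch of the same measurement with Ruby's `benchmark` library looks like this. The method name `lock_benchmark` and its keyword arguments are illustrative. The optional `warmups` parameter runs untimed rounds first, which is relevant to the JRuby JIT results below.

```ruby
require 'benchmark'
require 'monitor'

# Acquire and release each lock `iterations` times and report the timings.
# `warmups` runs untimed rounds first so a JIT (e.g. JRuby's) can kick in.
def lock_benchmark(iterations: 1_000_000, warmups: 0)
  locks = { 'Mutex' => Mutex.new, 'Monitor' => Monitor.new }
  warmups.times do |i|
    puts "Warmup ##{i + 1}/#{warmups}"
    locks.each_value { |lock| iterations.times { lock.synchronize {} } }
  end
  # Benchmark.bm prints a table and returns the Benchmark::Tms results.
  Benchmark.bm(7) do |x|
    locks.each { |name, lock| x.report(name) { iterations.times { lock.synchronize {} } } }
  end
end
```

`lock_benchmark` corresponds to the MRI run above, while `lock_benchmark(warmups: 20)` mirrors the warmed-up JRuby run. Swapping `Benchmark.bm` for `Benchmark.bmbm` adds a rehearsal pass instead.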

What About JRuby

I'm curious, let's try JRuby too. We'll change bm to bmbm and fire it up.

Rehearsal ---------------------------------------------
Mutex       0.571000   0.000000   0.571000 (  0.539000)
Monitor     2.012000   0.000000   2.012000 (  2.012000)
------------------------------------ total: 2.583000sec

                user     system      total        real
Mutex       0.321000   0.000000   0.321000 (  0.321000)
Monitor     1.696000   0.000000   1.696000 (  1.696000)

Wow, Monitor is 5.3x slower when using JRuby!!! Hmm, I suspect the JIT just needs more time to warm up, so I wrote a new benchmarking script with a big warmup (20 rounds). And the results:

> jruby --1.9 --fast reentrancy-benchmark-jruby.rb
Warmup #1/20
...
Warmup #20/20
                user     system      total        real
Mutex       0.357000   0.000000   0.357000 (  0.357000)
Monitor     0.768000   0.000000   0.768000 (  0.768000)

Ok, that's on par with the MRI results. Mutex is fast off the bat with JRuby, whereas Monitor is a lot slower at first and then improves to taking a little over double the time of Mutex.

Conclusion

Mutex: No reentrancy. Fast; takes less than half the time of Monitor.
Monitor: Reentrancy. Slower; takes a little over twice as long as Mutex.

Ruby JSON Libraries

2012-04-18T20:27:00.000+10:00

Over the last 5 years or so I'd been away from the world of Ruby. I still used Ruby at work and home for various little things [cos it's awesome!] but that's quite different to living in it, especially seeing as its community is one of the most fast-paced I've seen. So when I came back recently and needed a JSON library, I went searching and found (what felt like) 100 different JSON libraries...

Long story short, I benchmarked them. Feel free to skip to the end of this post to just get the conclusion and be on your way.

What Was Used

NAME	VERSION	BUILD
MRI Ruby	1.9.3p125	ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-linux]
JRuby	1.6.7	jruby 1.6.7 (ruby-1.9.2-p312) (2012-02-22 3e82bc8) (OpenJDK 64-Bit Server VM 1.7.0_03-icedtea) [linux-amd64-java]
"	1.7.0.dev	jruby 1.7.0.dev (ruby-1.9.3-p139) (2012-04-15 b4b38d4) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_03) [linux-amd64-java]
OpenJDK	1.7.0_03	OpenJDK Runtime Environment (IcedTea7 2.1) (ArchLinux build 7.b147_2.1-3-x86_64) OpenJDK 64-Bit Server VM (build 22.0-b10, mixed mode)
Oracle Java	1.7.0_03	java version "1.7.0_03" Java(TM) SE Runtime Environment (build 1.7.0_03-b04) Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode)
MultiJson	1.3.2	c73bc389fa1b0b1c0b8225ea77ff3e2dee312304

The following JSON libraries were tested:

NAME	GEM NAME	VERSION
Optimized JSON (Oj)	oj	1.2.4
YAJL	yajl-ruby	1.1.0
JSON JRuby	json-jruby	1.5.0
JSON Pure	json_pure	1.6.6
JSON gem	json	1.6.6
OkJson	okjson	Version packaged with MultiJson 1.3.2

I created a little app that benchmarks each library by performing two functions 100,000 times and recording the time taken. Said two functions are:

[Writing] Generates JSON for Ruby data structure:

{
  a: 2,
  b: (1..50).to_a, # i.e. an array of 1,2,3,4,5,6, ... ,49,50
  c: %w[asf xcvb sdfg sdf gfsd],
  d: {
    omg: 'hedfasgdsfg',
    wewr: 34,
    sfgjbsdf: %w[sdfg sdfgsdfgnj klj kj hkuih ui hu kjb bkj b sdfg],
  },
}

[Reading] Parses this JSON:

{"a":2, "b":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50], "c":["asf","xcvb","sdfg","sdf","gfsd"], "d":{"omg":"hedfasgdsfg", "wewr":34, "sfgjbsdf":["sdfg","sdfgsdfgnj","klj","kj","hkuih","ui","hu","kjb","bkj","b","sdfg"]}}

If you're interested you can grab the code here: https://github.com/japgolly/WebServerBenchmark/tree/json_libraries
You're free to read it, play with it, hack it, print it and eat it, make love to it; it's all good.
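For a quick feel of the measurement without the full app, here is a stdlib-only sketch that times the two operations using the built-in `json` library. The real benchmark linked above goes through MultiJson to compare several libraries. The names `SAMPLE`, `SAMPLE_JSON` and `time_json` are illustrative. The input for the read test is generated compactly, so it lacks the spaces in the JSON shown above but holds the same data.

```ruby
require 'json'
require 'benchmark'

# The Ruby structure from the [Writing] test above.
SAMPLE = {
  a: 2,
  b: (1..50).to_a,
  c: %w[asf xcvb sdfg sdf gfsd],
  d: {
    omg: 'hedfasgdsfg',
    wewr: 34,
    sfgjbsdf: %w[sdfg sdfgsdfgnj klj kj hkuih ui hu kjb bkj b sdfg],
  },
}.freeze

# Input for the [Reading] test: the same data as compact JSON.
SAMPLE_JSON = JSON.generate(SAMPLE)

# Wall-clock seconds for n writes and n reads with the stdlib json library.
def time_json(n = 100_000)
  {
    write: Benchmark.realtime { n.times { JSON.generate(SAMPLE) } },
    read:  Benchmark.realtime { n.times { JSON.parse(SAMPLE_JSON) } },
  }
end
```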

Finally, these tests were performed on a Q9550 with 8GB RAM running Arch Linux 64-bit.

➤ uname -a
Linux golly-desktop 3.3.2-1-ARCH #1 SMP PREEMPT Sat Apr 14 09:48:37 CEST 2012 x86_64 Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz GenuineIntel GNU/Linux

Results: First Cut

The numeric axis represents number of seconds taken to perform 100,000 operations.
Less is better.

[Graphs: MRI and JRuby (on Oracle Java) results, each on linear and logarithmic scales.]

Wow. Two things are immediately obvious:

OkJson is extremely slow on both Ruby implementations. Whatever its design goals, performance isn't one of them.
JRuby runs YAJL like my grandmother runs marathons.

I really thought JRuby would perform better. Maybe 1.7 will be better. I know the JRuby team expect significant performance gains; I wonder if they've implemented them yet... Let's try using the latest dev version of JRuby 1.7.0!

Also I think I'll try using JRuby 1.6.7 with the OpenJDK implementation of Java and see if that gives better results.

Results: Lots of JRuby Love

[Update 2012-04-23: New JRuby library discovered, see conclusion.]

Alrighty, done. Here are the results: (note: using a logarithmic scale here again) And the same thing expressed differently:

Jeez, it's not getting much better for JRuby...

Ok, enough of this. Let's get rid of the council-working options [hey, Aussies get that!] and just look at the feasible ones.

Results: The Finalists

The numeric axis represents number of seconds taken to perform 100,000 operations.
Less is better.

Conclusion

If you use MRI, use Oj unless you generate more JSON than you parse, in which case use YAJL.

If you use JRuby, you've only really got one choice: json-jruby. You can expect roughly the same performance with OpenJDK, Oracle Java and the latest dev build of JRuby.

Update 2012-04-23: jrjackson for JRuby is approx 4x faster than json-jruby and faster than MRI. If you use JRuby, you want this!!

I Have Blog

2012-04-02T16:22:00.000+10:00

And I decided I would create a new blog, the first in nearly 6 years.

And I decided I would overcome the ennui of asynchronous communication with a strategy (!), seeing things with new perspectives won through experience now that I'm the ripe, hoary age of 31.

And I decided I would bless this new blog with an irrelevant excerpt I love from a brilliant saga called Malazan Book of the Fallen by Steven Erikson.

Her finger provided the drama, ploughing a traumatic furrow across the well-worn path. The ants scurried in confusion, and Samar Dev watched them scrabbling fierce with the insult, the soldiers with their heads lifted and mandibles opened wide as if they would challenge the gods.

I don't know why I love that so much, but I do. Hey! I did say it was irrelevant.