Sunday, June 23, 2013

Fake Theorems for Free

This article documents an element of Scalaz design that I practice, because I believe it to be an element of Scalaz design principles, and quite a good one at that. It explains why Functor[Set] was removed yet Foldable[Set] remains. More broadly, it explains why a functor may be considered invalid even though “it doesn't break any laws”. It is a useful discipline to apply to your own Scala code.
  1. Do not use runtime type information in an unconstrained way.
  2. Corollary: do not use Object#equals or Object#hashCode in unconstrained contexts, because that would count as #1.
The simplest way to state and remember it is “for all means for all”. Another, if you prefer, might be “if I don't know anything about it, I can't look at it”.

We accept this constraint for the same reason that we accept the constraint of referential transparency: it gives us powerful reasoning tools about our code. Specifically, it gives us our free theorems back.

Madness

Let's consider a basic signature.

     def const[A](a: A, a2: A): A

With the principle intact, there are only two total, referentially transparent functions that we can write with this signature.

     def const[A](a: A, a2: A): A = a
     
     def const2[A](a: A, a2: A): A = a2

That is, we can return one or the other argument. We can't “look at” either A, so we can't do tests on them or combine them in some way.

Much of Scalaz is minimally documented because it is easy enough to apply this approach to more complex functions once you have a bit of practice. Many Scalaz functions are the only function you could write with such a signature.

Now, let us imagine that we permit the unconstrained use of runtime type information. Here are functions that are referentially transparent, which you will find insane anyway.

     def const3[A](a: A, a2: A): A = (a, a2) match {
       case (s: Int, s2: Int) => if (s < s2) a else a2
       case _ => a
     }
     
     def const4[A](a: A, a2: A): A =
       if (a.## < a2.##) a else a2

Now, look at what we have lost! If the lowly const can be driven mad this way, imagine what could happen with fmap. One of our most powerful tools for reasoning about generic code has been lost. No, this kind of thing is not meant for the realm of Scalaz.

Missing theorems

For completeness's sake, let us see the list of theorems from Theorems for free!, figure 1 on page 3, for which I can think of a counterexample, still meeting the stated function's signature, if we violate the above explained principle.

In each case, I have assumed all other functions but the one in question have their standard definitions as explained on the previous page of the paper. I recommend having the paper open to page 3 to follow along. They are restated in fake-Scala because you might like that. Let lift(f) be (_ map f), or f* as written in the paper.

head[X]: List[X] => X
a compose head = head compose lift(a)

tail[X]: List[X] => List[X]
lift(a) compose tail = tail compose lift(a)

++[X]: (List[X], List[X]) => List[X]
lift(a)(xs ++ ys) = lift(a)(xs) ++ lift(a)(ys)

zip[X, Y]: ((List[X], List[Y])) => List[(X, Y)]
lift(a product b) compose zip = zip compose (lift(a) product lift(b))

filter[X]: (X => Boolean) => List[X] => List[X]
lift(a) compose filter(p compose a) = filter(p) compose a

sort[X]: ((X, X) => Boolean) => List[X] => List[X]
wherever for all x, y in A , (x < y) = (a(x) <' a(y)), also lift(a) compose sort(<) = sort(<') compose lift(a)

fold[X, Y]: ((X, Y) => Y, Y) => List[X] => Y
wherever for all x in A, y in B, b(x + y) = a(x) * b(y) and b(u) = u', also b compose fold(+, u) = fold(*, u') compose lift(a)

Object#equals and Object#hashCode are sufficient to break all these free theorems, though many creative obliterations via type tests of the const3 kind also exist.

By contrast, here are the ones which I think are preserved. I hesitate to positively state that they are, just because there are so many possibilities opened up by runtime type information.

fst[X, Y]: ((X, Y)) => X
a compose fst = fst compose (a product b)

snd[X, Y]: ((X, Y)) => Y
b compose snd = snd compose (a product b)

I[X]: X => X
a compose I = I compose a

K[X, Y]: (X, Y) => X
a(K(x, y)) = K(a(x), a(y))

Here is a useful excerpt from the paper itself, section 3.4 “Polymorphic equality”, of which you may consider this entire article a mere expansion.

... polymorphic equality cannot be defined in the pure polymorphic lambda calculus. Polymorphic equality can be added as a constant, but then parametricity will not hold (for terms containing the constant).

This suggests that we need some way to tame the power of the polymorphic equality operator. Exactly such taming is provided by the eqtype variables of Standard ML [Mil87], or more generally by the type classes of Haskell [HW88, WB89].

Compromise

Scalaz has some tools to help deal with things here. The Equal typeclass contains the equalIsNatural method as runtime evidence that Object#equals is expected to work; this evidence is used by other parts of Scalaz, and available to you.

Scalaz also provides

     implicit def setMonoid[A]: Monoid[Set[A]]

Relative to Functor, this is more or less harmless, because Monoid isn't so powerful; once you have Monoid evidence in hand, it doesn't “carry” any parametric polymorphism the way most Scalaz typeclasses do. It provides no means to actually fill sets, and the semigroup is also symmetric, so it seems unlikely that there is a way to write Monoid-generic code that can use this definition to break things.

More typical are definitions like

     implicit def setOrder[A: Order]: Order[Set[A]]

Which may use Object#equals, but is constrained in a way that they can be sure it's safe to do so, just as implied in the quote above.

Insofar as “compromise” characterizes the above choices, I think Scalaz's position in the space of possibilities is quite good. However, I would be loath to see any further relaxing of the principles I have described here, and I hope you would be too.

Mistakes are part of history

And sometimes, later, they turn out not to be mistakes at all.

Has this never happened to you?  For my part, sometimes I am mistaken, and sometimes I am even mistaken about what I am mistaken about.  So it is worthwhile to keep records of failed experiments.

You can always delete information later, as a log-viewing tool might, but you can never get it back if you just deleted it in the first place.

Please consider this, git lovers, before performing your next rebase or squashed merge.

(My favorite VC quote courtesy ddaa of GNU Arch land, of all places)