Today, I want to talk about a topic I’ve been meaning to drag up for a while. A couple of weeks ago, Robert started a kerfuffle by stating that Scala was not a functional programming language. One thing that became clear in the responses was that many of the people debating whether a given language X was or was not a functional programming language didn’t have a good idea of what a functional programming language is. The situation was made worse, I think, by the fact that many of the key terms in the programming industry do not come with rigorous definitions, and thus tend to shade into buzzwords at the edges. This is not the case with “functional”- there is a very rigorous definition of what is meant by “functional”, and I’d like to introduce people to it.

So, what is it that differentiates the functional programming languages from all the other programming languages? It is simply this: the functional programming languages use the lambda calculus as their fundamental model of computation, while all the other programming languages use the Turing machine. (Well, technically, I should say functional programming languages versus imperative programming languages- languages in other paradigms use other models. For example, SQL uses the relational model, Prolog uses a logic model, and so on. However, pretty much all the languages people actually think about when discussing programming languages are either functional or imperative, so I’ll stick with the easy generality.)

What do I mean by “fundamental model of computation”? Well, all languages can be thought of in two layers: one, some core Turing-complete language, and then layers of either abstractions or syntactic sugar (depending upon whether you like them or not) which are defined in terms of the base Turing-complete language. The core language for imperative languages is then a variant of the classic Turing machine model of computation one might call “the C language”. In this language, memory is an array of bytes that can be read from and written to, and you have one or more CPUs which read memory, perform simple arithmetic, branch on conditions, and so on. That’s what I mean when I say that the fundamental model of computation of these languages is the Turing Machine.

For functional languages, the fundamental model of computation is the Lambda Calculus, and this shows up in two different ways. First, many functional languages specify the behavior of a program by giving an explicit translation into the lambda calculus (this is known as “denotational semantics”). And second, almost all functional programming languages implement their compilers using an explicit lambda-calculus-like intermediate language- Haskell has Core, Lisp and Scheme have their “desugared” representation (after all macros have been applied), OCaml has its Lisp-ish intermediate representation, and so on.

So what is this lambda calculus I’ve been going on about? Well, the basic idea is that, to do any computation, you only need two things. The first thing you need is function abstraction- the definition of an unnamed, single-argument function. Alonzo Church, who first defined the Lambda Calculus, used a rather obscure notation: a function is the Greek letter lambda, followed by the one-character name of the argument to the function, followed by a period, followed by the expression which is the body of the function. So the identity function, which, given any value, simply returns that value, would look like “λx.x”. I’m going to use a slightly more human-readable approach- I’m going to replace the λ character with the word “fun”, the period with “->”, and allow whitespace and multi-character names. So I might write the identity function as “fun x -> x”, or even “fun whatever -> whatever”. The change in notation doesn’t change the fundamental nature. Note that this is the source of the name “lambda expression” in languages like Haskell and Lisp- expressions that introduce unnamed local functions.
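This notation maps directly onto the anonymous functions of modern languages. A minimal sketch in Python (the names here are mine, purely for illustration):

```python
# The identity function, "fun x -> x", as a Python lambda.
identity = lambda x: x

# Applying it returns its argument unchanged.
print(identity(42))        # 42
print(identity("hello"))   # hello
```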

The only other thing you can do in the Lambda Calculus is to call functions. You call a function by applying an argument to it. I’m going to follow the standard convention that application is just the two names in a row- so “f x” is applying the value x to the function named f. We can replace f with some other expression, including a Lambda expression, if we want. When you apply an argument to a function, you replace the application with the body of the function, with all the occurrences of the argument name replaced with whatever value was applied. So the expression “(fun x -> x x) y” becomes “y y”.

The theoreticians went to great lengths to precisely define what is meant by “replacing all occurrences of the variable with the value applied”, and can go on at length about how precisely this works (throwing around terms like “alpha renaming”), but in the end things work exactly like you expect them to. The expression “(fun x -> x x) (x y)” becomes “(x y) (x y)”- there is no confusion between the argument x within the anonymous function, and the x in the value being applied. This works even at multiple levels- the expression “(fun x -> (fun x -> x x) (x x)) (x y)” becomes first “(fun x -> x x) ((x y) (x y))” and then “((x y) (x y)) ((x y) (x y))”. The x in the inner function (“(fun x -> x x)”) is a different x than the other x’s.

It is perfectly valid to think of function application as a string manipulation. If I have “(fun x -> some expression)”, and I apply some value to it, then the result is just “some expression” with all the x’s *textually* replaced with “some value” (except for those which are shadowed by another argument).

As an aside, I will add parentheses where needed to disambiguate things, and elide them where not needed. The only difference they make is grouping; they have no other meaning.

So that’s all there is to the Lambda Calculus. No, really, that’s all- just anonymous function abstraction, and function application. I can see you’re doubtful about this, so let me address some of your concerns.

First, I specified that a function only takes one argument- how do you have a function that takes two, or more, arguments? Easy- you have a function that takes one argument, and returns a function that takes the second argument. For example, function composition could be defined as fun f -> (fun g -> (fun x -> f (g x))) – read that as a function that takes an argument f, and returns a function that takes an argument g and returns a function that takes an argument x and returns f (g x).
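The curried composition function above can be written, one argument at a time, as nested Python lambdas (names are mine, for illustration):

```python
# fun f -> (fun g -> (fun x -> f (g x)))
compose = lambda f: lambda g: lambda x: f(g(x))

add_one = lambda n: n + 1
double  = lambda n: n * 2

# compose(add_one)(double) means "add_one after double":
print(compose(add_one)(double)(10))  # (10 * 2) + 1 = 21
```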

So how do we represent integers, using only functions and applications? Easily (if not obviously)- the number one, for instance, is the function fun s -> fun z -> s z – given a “successor” function s and a “zero” z, one is then the successor to zero. Two is fun s -> fun z -> s (s z), the successor to the successor to zero, three is fun s -> fun z -> s (s (s z)), and so on (the parentheses are needed because application associates to the left- “s s z” would mean “(s s) z”).
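These numerals transcribe directly into Python. A sketch (the decoding helper to_int is my own addition, so we can see the result as an ordinary number):

```python
# Church numerals: the number n is a function that applies s to z, n times.
zero  = lambda s: lambda z: z
one   = lambda s: lambda z: s(z)
two   = lambda s: lambda z: s(s(z))
three = lambda s: lambda z: s(s(s(z)))

# To read a Church numeral back out, hand it an ordinary successor
# function and an ordinary zero.
to_int = lambda n: n(lambda k: k + 1)(0)

print(to_int(zero))   # 0
print(to_int(three))  # 3
```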

To add two numbers, say x and y, is again simple, if subtle. The addition function is just fun x -> fun y -> fun s -> fun z -> x s (y s z). This looks odd, so let me run you through an example to show that it does, in fact, work- let’s add the numbers 3 and 2. Now, three is just (fun s -> fun z -> s (s (s z))) and two is just (fun s -> fun z -> s (s z)), so then we get (each step applying one argument to one function, in no particular order):

(fun x -> fun y -> fun s -> fun z -> x s (y s z)) (fun s -> fun z -> s (s (s z))) (fun s -> fun z -> s (s z))

(fun y -> fun s -> fun z -> (fun s -> fun z -> s (s (s z))) s (y s z)) (fun s -> fun z -> s (s z))

(fun y -> fun s -> fun z -> (fun z -> s (s (s z))) (y s z)) (fun s -> fun z -> s (s z))

(fun y -> fun s -> fun z -> s (s (s (y s z)))) (fun s -> fun z -> s (s z))

(fun s -> fun z -> s (s (s ((fun s -> fun z -> s (s z)) s z))))

(fun s -> fun z -> s (s (s ((fun z -> s (s z)) z))))

(fun s -> fun z -> s (s (s (s (s z)))))

And at the end we get the unsurprising answer of the successor to the successor to the successor to the successor to the successor to zero, known more colloquially as five. Addition works by replacing the “zero” (or where we start counting) of the x value with the y value- to define multiplication, we instead diddle with the concept of “successor”:

(fun x -> fun y -> fun s -> fun z -> x (y s) z)

I’ll leave it to you to verify that the above code does, in fact, multiply two numbers. For the record, this trick is known as the “Church encoding” of the numerals. Obviously, this is a horribly inefficient way of implementing numbers- adding two numbers M and N takes O(M+N) steps. In real implementations, Church Numerals are replaced by more efficient representations (generally based on hardware ints). But this is nothing more than an optimization.
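Both arithmetic functions can be checked mechanically in Python. A sketch (numerals and the to_int decoder are restated here so the snippet stands alone; the names are mine):

```python
two   = lambda s: lambda z: s(s(z))
three = lambda s: lambda z: s(s(s(z)))

# Addition: fun x -> fun y -> fun s -> fun z -> x s (y s z)
# (count x more successors starting from y)
add = lambda x: lambda y: lambda s: lambda z: x(s)(y(s)(z))

# Multiplication: fun x -> fun y -> fun s -> fun z -> x (y s) z
# (the "successor" handed to x is itself "apply s, y times")
mul = lambda x: lambda y: lambda s: lambda z: x(y(s))(z)

to_int = lambda n: n(lambda k: k + 1)(0)

print(to_int(add(three)(two)))  # 5
print(to_int(mul(three)(two)))  # 6
```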

Another example I’d like to give is how to represent lists in the lambda calculus. Obviously, as all we have are functions, the representation of a list has to be some function- I’m going to choose the right fold function. The right fold takes a function f, an initial value y, and a list of values x1, x2, … xn, and calculates the result f x1 (f x2 (… (f xn y))). Note that it starts from the end of the list and works backwards (this is unlike the left fold, or reduce, operation, which starts at the front of the list and works forward). I’ll get to why I chose the right fold in a moment- let’s make the list.

The empty list is simple- it just returns the initial value unmodified: (fun f -> fun y -> y). Note the structure of our list: it is a function of two arguments, the function to fold over the list, and the initial value. To prepend an element onto the head of the list, we just add a new layer onto the right fold: (fun x -> fun t -> fun f -> fun y -> f x (t f y)). Conceptually, think of this function as taking two arguments- the value to prepend onto the list, and the list to prepend onto- and it returns a new list, except that the lists passed in and returned are themselves right-fold functions.

So what do we get when we cons the values x1, x2, and x3 onto the empty list? I’ll let you work through the steps, but it shouldn’t surprise you to learn you end up with (fun f -> fun y -> f x1 (f x2 (f x3 y))), which is exactly what we expect it to be.
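The list encoding above also transcribes straight into Python. A sketch (the concrete elements 1, 2, 3 and the two example folds are my choices, for illustration):

```python
# The empty list: folding it returns the initial value unchanged.
nil = lambda f: lambda y: y

# Cons: fun x -> fun t -> fun f -> fun y -> f x (t f y)
cons = lambda x: lambda t: lambda f: lambda y: f(x)(t(f)(y))

# Build the list [1, 2, 3] by consing onto nil.
lst = cons(1)(cons(2)(cons(3)(nil)))

# The list *is* its right fold: folding with curried + and 0 sums it...
print(lst(lambda a: lambda b: a + b)(0))  # 6

# ...and folding with "prepend to a Python list" recovers the elements.
print(lst(lambda a: lambda b: [a] + b)([]))  # [1, 2, 3]
```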

I picked fold right, instead of fold left, because almost everything else done on lists is easy to implement given fold right. For example, map (given a function f, create a new list by calling f on every element of a given list) is just folding right with a function of (fun x -> fun y -> cons (f x) y) (where cons is our prepend-element function, defined above), and the initial value being the empty list. To append list a to list b, you simply fold right over list a with a function of cons and an initial value of list b. And so on.
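Both derived operations can be checked in Python. A sketch (nil, cons, and the to_pylist decoder are restated so the snippet stands alone; all names are mine):

```python
nil  = lambda f: lambda y: y
cons = lambda x: lambda t: lambda f: lambda y: f(x)(t(f)(y))
to_pylist = lambda t: t(lambda a: lambda b: [a] + b)([])

# map: fold right with (fun x -> fun y -> cons (f x) y), starting from nil.
church_map = lambda f: lambda t: t(lambda x: lambda y: cons(f(x))(y))(nil)

# append: fold right over list a with cons, starting from list b.
append = lambda a: lambda b: a(cons)(b)

xs = cons(1)(cons(2)(nil))
ys = cons(3)(cons(4)(nil))

print(to_pylist(church_map(lambda n: n * 10)(xs)))  # [10, 20]
print(to_pylist(append(xs)(ys)))                    # [1, 2, 3, 4]
```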

Again, this is not how lists are actually implemented in functional languages- generally the implementations use some variation on the classic cons cell approach. But again, this is just an optimization.

There is lots more to the lambda calculus- for example, I haven’t even mentioned the famed Y combinator, which introduces recursion to the Lambda Calculus. There is way more than I can cover in a single blog post- in fact, it’d take a large chunk of a book (which, I comment, has already been written).

But what I’m trying to do here is to give you a taste, a feel for the lambda calculus- because the laundry list of features which are normally given as requirements to be a functional programming language all come directly and demonstrably from using the lambda calculus as the language’s fundamental model of computation.

Take, for example, first class functions- the ability to pass functions around as values, return them from functions, even store them in data structures. In the lambda calculus, functions are the *only* first class objects, as we’ve seen. Even things like integers are second class citizens in the lambda calculus. This isn’t some political statement about the Kingdom of the Nouns. It arises directly from the Lambda Calculus.

Immutability of data structures is another hallmark of functional programming. The list implementation I gave above is immutable- the cons function, prepending an element onto the head of a list, doesn’t change the original list. It creates a new list, which is the same as the old list but with the new element added. The Lambda Calculus is innocent of the notion of mutability- most functional languages have some sort of mutability tacked on (as it were) to a greater or lesser extent, but the default assumption of both the programmers and the language is that data structures are immutable. You don’t modify the old data structure, you create a new data structure and return it.

Or take tail call optimization- I have stated the opinion that all real functional languages have tail call optimization. Why is this? Because the Lambda Calculus doesn’t have loops in the classic C style. The very concept of a loop in C is mutable/imperative in nature- you have one or more mutable “loop” variables (often named “i” or “j”) which get updated. Instead, the Lambda Calculus uses recursion, specifically recursive tail calls. But the Lambda Calculus lives in the realm of pure theory, where stacks (if they even exist, being an implementation detail after all) are infinitely big. Here in the real world, stacks exist and are not infinitely big, so you need tail call optimization to allow the use of recursion for looping. Again, this isn’t to say that languages that don’t have tail call optimization are bad languages, just that they are not functional programming languages.
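Python itself makes a handy demonstration of the problem, precisely because it does *not* optimize tail calls. A sketch (function name and limits are mine; the exact recursion limit varies by interpreter):

```python
# A loop written the Lambda Calculus way: a recursive tail call,
# with the running total carried in an accumulator argument.
def sum_to(n, acc=0):
    if n == 0:
        return acc
    return sum_to(n - 1, acc + n)  # tail position: nothing left to do after

print(sum_to(100))  # 5050

# Without tail call optimization, each call still consumes a stack frame,
# so a deep enough "loop" blows the stack:
try:
    sum_to(1_000_000)
except RecursionError:
    print("stack overflow: Python does not optimize tail calls")
```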

Currying (taking a function that takes two arguments and turning it into a function that takes one argument and returns a function that takes the second argument) and partial application also come out of the Lambda Calculus- remember, the Lambda Calculus doesn’t have multi-argument functions *at all*, it (implicitly) curries everything. And partial application is also the norm.
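With curried functions, partial application is just applying fewer arguments than the function ultimately wants. A sketch in Python (the power example and names are mine; functools.partial is the standard-library equivalent for ordinary multi-argument functions):

```python
from functools import partial

# A curried function takes its arguments one at a time.
power = lambda base: lambda exponent: base ** exponent

# Partial application falls out for free: supply only the first argument.
two_to_the = power(2)
print(two_to_the(10))  # 1024

# functools.partial does the same for ordinary Python functions.
def add(a, b):
    return a + b

add_five = partial(add, 5)
print(add_five(3))  # 8
```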

Some things which are common in functional languages nevertheless do not arise naturally out of the Lambda Calculus, and are thus not required to be a functional language. For example, variant types (sorry, Robert) and static typing- neither is *required* for a language to be functional (and we see that the Lisps- Lisp, Scheme, Clojure, et al.- don’t have either).

At this point, I feel compelled to point out two things. The first is that there is a difference between programming in the functional *style*, and being a functional programming language. This makes sense if you step back and think about it for a moment- for example, you can program in an Object Oriented fashion in a non-OO language, as GTK (written in C) demonstrates. But that doesn’t mean that C is an OO language (not C++, C). Likewise, you can program in a functional style in a non-functional language. However, it’s much harder to learn the functional style in a non-functional language, for the same reason it’s hard to learn OO in pure C. So the question of what is, and what isn’t, a functional programming language still matters. The people claiming that you can learn functional programming in, for example, Python or Ruby, are as wrong as those who maintained you could learn OO in C or Pascal, and for the same reason.

The second thing I need to point out is that simply because language X is not a functional programming language, doesn’t make it a bad programming language. In fact, not being a functional programming language is often an advantage- the best programming languages are the ones that have a specific vision of what the language should be. And this works both ways, I comment- functional programming languages are a lousy place to learn OO. In those languages that even try to do OO (OCaml, Clojure), the OO parts are clearly second-class citizens that seasoned developers avoid wherever possible. And many functional programming languages (Haskell, SML) don’t even try to do OO. Languages that try to be all things to all people (C++, I’m looking at you) are so complex that they end up being nothing to nobody (anyone else here remember PL/I?).

In this sense, I think Guido was right- Python isn’t a functional programming language, *and shouldn’t try to be*. Rather than trying to be a second-rate Haskell or Scheme, it should concentrate on being a first-rate Python.
