What is a functional programming language? July 23, 2009 | 09:58 am

Today, I want to talk about a topic I’ve been meaning to drag up for a while. A couple of weeks ago, Robert started a kerfuffle by stating that Scala was not a functional programming language. One of the things that became clear in the responses was that many people debating whether a given language X was or was not a functional programming language didn’t have a good idea of what a functional programming language was. The situation was made worse, I think, by the fact that many of the key terms in the programming industry do not come with rigorous definitions, and thus tend to shade into buzzwords at the edges. This is not the case with “functional”- there is a very rigorous definition of what the term means, and I’d like to introduce people to it.

So, what is it that differentiates the functional programming languages from all the other programming languages? It is simply this: the functional programming languages use, as their fundamental model of computation, the lambda calculus, while all the other programming languages use the Turing machine as their fundamental model of computation. (Well, technically, I should say functional programming languages versus imperative programming languages- languages in other paradigms use other models. For example, SQL uses the relational model, Prolog uses a logic model, and so on. However, pretty much all the languages people actually think about when discussing programming languages are either functional or imperative, so I’ll stick with the easy generality.)

What do I mean by “fundamental model of computation”? Well, all languages can be thought of in two layers: one, some core Turing-complete language, and then layers of either abstractions or syntactic sugar (depending upon whether you like them or not) which are defined in terms of the base Turing-complete language. The core language for imperative languages is then a variant of the classic Turing machine model of computation one might call “the C language”. In this language, memory is an array of bytes that can be read from and written to, and you have one or more CPUs which read memory, perform simple arithmetic, branch on conditions, and so on. That’s what I mean when I say the fundamental model of computation of these languages is the Turing Machine.

For functional languages, the fundamental model of computation is the Lambda Calculus, and this shows up in two different ways. First, many functional languages write their specifications explicitly as a translation to the lambda calculus to specify the behavior of a program written in the language (this is known as “denotational semantics”). And second, almost all functional programming languages implement their compilers using an explicit lambda-calculus-like intermediate language- Haskell has Core, Lisp and Scheme have their “desugared” representation (after all macros have been applied), Ocaml has its lispish intermediate representation, and so on.

So what is this lambda calculus I’ve been going on about? Well, the basic idea is that, to do any computation, you only need two things. The first thing you need is function abstraction- the definition of an unnamed, single-argument function. Alonzo Church, who first defined the Lambda Calculus, used a rather obscure notation: a function is written as the greek letter lambda, followed by the one-character name of the argument to the function, followed by a period, followed by the expression which is the body of the function. So the identity function, which, given any value, simply returns that value, would look like “λx.x”. I’m going to use a slightly more human-readable approach- I’m going to replace the λ character with the word “fun”, the period with “->”, and allow white space and multi-character names. So I might write the identity function as “fun x -> x”, or even “fun whatever -> whatever”. The change in notation doesn’t change the fundamental nature. Note that this is the source of the name “lambda expression” in languages like Haskell and Lisp- expressions that introduce unnamed local functions.

The only other thing you can do in the Lambda Calculus is call functions. You call a function by applying an argument to it. I’m going to follow the standard convention that application is just two names in a row- so “f x” is applying the value x to the function named f. We can replace f with some other expression, including a Lambda expression, if we want. When you apply an argument to an expression, you replace the application with the body of the function, with all the occurrences of the argument name replaced with whatever value was applied. So the expression “(fun x -> x x) y” becomes “y y”.

The theoreticians went to great lengths to precisely define what they mean by “replacing all occurrences of the variable with the value applied”, and can go on at great length about how precisely this works (throwing around terms like “alpha renaming”), but in the end things work exactly like you expect them to. The expression “(fun x -> x x) (x y)” becomes “(x y) (x y)”- there is no confusion between the argument x within the anonymous function, and the x in the value being applied. This works even at multiple levels- the expression “(fun x -> (fun x -> x x) (x x)) (x y)” becomes first “(fun x -> x x) ((x y) (x y))” and then “((x y) (x y)) ((x y) (x y))”. The x in the innermost function (“(fun x -> x x)”) is a different x than the other x’s.

It is perfectly valid to think of function application as a string manipulation. If I have a (fun x -> some expression), and I apply some value to it, then the result is just some expression with all the x’s textually replaced with the “some value” (except for those which are shadowed by another argument).
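Since the notation above isn’t directly runnable, here is the same idea sketched in Python- not itself a functional language, but its lambdas map almost one-for-one onto the notation (the names `identity`, `self_apply`, and `outer` are my own, for illustration only):

```python
# "fun x -> x" becomes:
identity = lambda x: x

# "fun x -> x x" (a function that applies its argument to itself) becomes:
self_apply = lambda x: x(x)

# Application "f x" is just f(x):
assert identity(42) == 42

# self_apply applied to identity reduces to identity(identity), i.e. identity:
assert self_apply(identity) is identity

# Shadowing works as described: the inner x is a different x than the outer x.
# "(fun x -> (fun x -> x) 1)" ignores its outer argument entirely:
outer = lambda x: (lambda x: x)(1)
assert outer(99) == 1
```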

As an aside, I will add parentheses where needed to disambiguate things, and elide them where not needed. The only difference they make is grouping; they have no other meaning.

So that’s all there is to the Lambda Calculus. No, really, that’s all- just anonymous function abstraction, and function application. I can see you’re doubtful about this, so let me address some of your concerns.

First, I specified that a function only takes one argument- how do you have a function that takes two, or more, arguments? Easy- you have a function that takes one argument, and returns a function that takes the second argument. For example, function composition could be defined as fun f -> (fun g -> (fun x -> f (g x))) – read that as a function that takes an argument f, and returns a function that takes an argument g and returns a function that takes an argument x and returns f (g x).
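This trick transcribes directly into Python lambdas- a sketch, with `add_one` and `double` as made-up example functions:

```python
# Composition as a chain of one-argument functions,
# mirroring "fun f -> fun g -> fun x -> f (g x)":
compose = lambda f: lambda g: lambda x: f(g(x))

add_one = lambda n: n + 1
double = lambda n: n * 2

# Arguments are applied one at a time; each application returns
# a new function awaiting the next argument:
add_then_double = compose(double)(add_one)
assert add_then_double(5) == 12  # double(add_one(5)) = double(6) = 12
```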

So how do we represent integers, using only functions and applications? Easily (if not obviously)- the number one, for instance, is a function fun s -> fun z -> s z – given a “successor” function s and a “zero” z, one is then the successor to zero. Two is fun s -> fun z -> s (s z), the successor to the successor to zero, three is fun s -> fun z -> s (s (s z)), and so on. (Note the parentheses- application is left-associative, so “s s z” would mean “(s s) z”, which is not what we want.)

To add two numbers, say x and y, is again simple, if subtle. The addition function is just fun x -> fun y -> fun s -> fun z -> x s (y s z). This looks odd, so let me run you through an example to show that it does, in fact, work- let’s add the numbers 3 and 2. Now, three is just (fun s -> fun z -> s (s (s z))) and two is just (fun s -> fun z -> s (s z)), so then we get (each step applying one argument to one function, in no particular order):

(fun x -> fun y -> fun s -> fun z -> x s (y s z)) (fun s -> fun z -> s (s (s z))) (fun s -> fun z -> s (s z))

(fun y -> fun s -> fun z -> (fun s -> fun z -> s (s (s z))) s (y s z)) (fun s -> fun z -> s (s z))

(fun y -> fun s -> fun z -> (fun z -> s (s (s z))) (y s z)) (fun s -> fun z -> s (s z))

(fun y -> fun s -> fun z -> s (s (s (y s z)))) (fun s -> fun z -> s (s z))

(fun s -> fun z -> s (s (s ((fun s -> fun z -> s (s z)) s z))))

(fun s -> fun z -> s (s (s ((fun z -> s (s z)) z))))

(fun s -> fun z -> s (s (s (s (s z)))))

And at the end we get the unsurprising answer of the successor to the successor to the successor to the successor to the successor to zero, known more colloquially as five. Addition works by replacing the “zero” (or where we start counting) of the x value with the y value- to define multiplication, we instead diddle with the concept of “successor”:

(fun x -> fun y -> fun s -> fun z -> x (y s) z)

I’ll leave it to you to verify that the above code does, in fact, multiply two numbers. For the record, this trick is known as the “Church encoding” of the numerals. Obviously, this is a horribly inefficient method of implementing numbers- adding two numbers M and N is O(M+N) steps. In real implementations, Church Numerals are replaced by more efficient implementations (generally based on hardware representations of ints). But this is nothing more than an optimization.
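The Church encoding transcribes almost directly into Python lambdas, which makes verifying the arithmetic easy- a sketch, where the `to_int` helper (handing a numeral the ordinary successor function and 0) is my own addition for inspecting results:

```python
# Church numerals: a number n is "apply s to z, n times".
zero = lambda s: lambda z: z
two = lambda s: lambda z: s(s(z))
three = lambda s: lambda z: s(s(s(z)))

# Addition: "fun x -> fun y -> fun s -> fun z -> x s (y s z)"
# -- count x successors starting from y.
add = lambda x: lambda y: lambda s: lambda z: x(s)(y(s)(z))

# Multiplication: "fun x -> fun y -> fun s -> fun z -> x (y s) z"
# -- the "successor" handed to x is itself "apply s, y times".
mul = lambda x: lambda y: lambda s: lambda z: x(y(s))(z)

# Convert a numeral back to an ordinary int by using a real successor:
to_int = lambda n: n(lambda i: i + 1)(0)

assert to_int(add(three)(two)) == 5
assert to_int(mul(three)(two)) == 6
```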

Another example I’d like to make is how to represent lists in the lambda calculus. Obviously, as all we have are functions, the representation of a list has to be some function- I’m going to choose the right fold function. The right fold takes a function f, an initial value y, and a list of values x1, x2, … xn, and calculates the result f x1 (f x2 (… (f xn y))). Note that it starts from the end of the list and works backwards (this is unlike the left fold, or reduce, operation, which starts at the front of the list and works forward). I’ll get to why I chose the right fold in a moment; let’s make the list.

The empty list is simple- it just returns the initial value unmodified: (fun f -> fun y -> y). Note that the structure of our list is a function of two arguments: the function to fold over the list, and the initial value. To prepend an element onto the head of the list, we just add a new layer onto the right fold: (fun x -> fun t -> fun f -> fun y -> f x (t f y)). Conceptually, think of this function as taking two arguments- the value to prepend onto the list, and the list to prepend onto- and it returns a new list, except that the lists passed in and returned are themselves right-fold functions.

So what do we get when we cons the values x1, x2, and x3 onto the empty list? I’ll let you work through the steps, but it shouldn’t surprise you to learn you end up with (fun f -> fun y -> f x1 (f x2 (f x3 y))), which is exactly what we expect it to be.

I picked fold right, instead of fold left, because almost everything else done on lists is easy to implement given fold right. For example, map (given a function f, create a new list by calling f on every element of a given list) is just folding right with a function of (fun x -> fun y -> cons (f x) y) (where cons is our prepend-element function, defined above), and the initial value being the empty list. To append list a to list b, you simply fold right over list a with a function of cons and an initial value of list b. And so on.
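The list encoding above, transcribed into Python lambdas as a sketch (names `nil`, `cons`, and `fmap` are mine; to inspect a list, we fold it with an ordinary Python prepend):

```python
# The empty list: "fun f -> fun y -> y" -- folding it returns the initial value.
nil = lambda f: lambda y: y

# Prepend: "fun x -> fun t -> fun f -> fun y -> f x (t f y)"
cons = lambda x: lambda t: lambda f: lambda y: f(x)(t(f)(y))

# Build the list [1, 2, 3] -- the "list" IS its own right-fold function.
lst = cons(1)(cons(2)(cons(3)(nil)))

# Fold with an ordinary Python prepend to see the elements:
assert lst(lambda x: lambda acc: [x] + acc)([]) == [1, 2, 3]

# map, defined purely as a right fold with cons, as described above:
fmap = lambda f: lambda t: t(lambda x: lambda y: cons(f(x))(y))(nil)
doubled = fmap(lambda n: n * 2)(lst)
assert doubled(lambda x: lambda acc: [x] + acc)([]) == [2, 4, 6]
```

Note that `cons` never modifies the old list- it returns a new fold function wrapping it, which is exactly the immutability discussed below.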

Again, this is not how lists are actually implemented in functional languages- generally the implementations use some variation on the classic cons cell approach. But again, this is just an optimization.

There is lots more to the lambda calculus- for example, I haven’t even mentioned the famed Y-combinator, which introduces recursion to the Lambda Calculus. There is way more than I can cover in a single blog post; in fact, it’d take a large chunk of a book (which, I comment, has already been written).

But what I’m trying to do here is to give you a taste, a feel for the lambda calculus- because the laundry list of features which are normally given as requirements to be a functional programming language all come directly and demonstrably from using the lambda calculus as the language’s fundamental model of computation.

Take, for example, first class functions- the ability to pass functions around as values, return them from functions, even store them in data structures. In the lambda calculus, functions are the only first class objects, as we’ve seen. Even things like integers are second class citizens in the lambda calculus. This isn’t some political statement about the Kingdom of the Nouns. It arises directly from the Lambda Calculus.

Immutability of data structures is another hallmark of functional programming. The list implementation I gave above is immutable- the cons function, prepending an element onto the head of a list, doesn’t change the original list. It creates a new list, which is the same as the old list but plus the new element. The Lambda Calculus is innocent of the notion of mutability- most functional languages have some sort of mutability tacked on (as it were) to a greater or lesser extent, but the default assumption of both the programmers and the language is that data structures are immutable. You don’t modify the old data structure, you create a new data structure and return it.

Or take tail call optimization- I have stated the opinion that all real functional languages have tail call optimization. Why is this? Because the Lambda Calculus doesn’t have loops in the classic C style. The very concept of a loop in C is mutable/imperative in nature- you have one or more mutable “loop” variables (often named “i” or “j”) which get updated. Instead, the Lambda Calculus uses recursion, specifically recursive tail calls. But the Lambda Calculus lives in the realm of pure theory, where stacks (if they even exist, being an implementation detail after all) are infinitely big. Here in the real world, stacks exist and are not infinitely big, so you need tail call optimization to allow the use of recursion for looping. Again, this isn’t to say that languages that don’t have tail call optimization are bad languages, just that they are not functional programming languages.
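Python happens to illustrate the point nicely, precisely because it does not do tail call optimization- a sketch (`sum_to` is a made-up example, not from the post):

```python
import sys

# Looping via recursion, as the Lambda Calculus would: sum 1..n,
# carrying the running total in an accumulator. The recursive call
# is in tail position, but Python does not optimize it away.
def sum_to(n, acc=0):
    if n == 0:
        return acc
    return sum_to(n - 1, acc + n)

assert sum_to(100) == 5050  # shallow recursion works fine

# Without tail call optimization, each call still burns a stack frame,
# so a depth past the interpreter's recursion limit fails:
try:
    sum_to(sys.getrecursionlimit() * 2)
    overflowed = False
except RecursionError:
    overflowed = True
assert overflowed
```

In a language with tail call optimization, the same tail-recursive definition runs in constant stack space, which is what makes recursion a viable replacement for loops.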

Currying (taking a function that takes two arguments and turning it into a function that takes one argument and returns a function that takes the second argument) and partial function evaluation also come out of Lambda Calculus- remember, the Lambda Calculus doesn’t have multi-argument functions at all, it (implicitly) curries everything. And partial function evaluation is also the norm.

Some things which are common in functional languages nevertheless do not arise naturally out of the Lambda Calculus, and are thus not required to be a functional language. For example, variant types (sorry, Robert) and static typing- neither is required to be a functional language (and we see that the Lisps- Lisp, Scheme, Clojure, et al.- don’t have either).

At this point, I feel compelled to point out two things. The first is that there is a difference between programming in the functional style, and being a functional programming language. This makes sense if you step back and think about it for a moment- for example, you can program in an Object Oriented fashion in a non-OO language, as GTK (written in C) demonstrates. But that doesn’t mean that C is an OO language (not C++, C). Likewise, you can program in a functional style in a non-functional language. However, it’s much harder to learn the functional style in a non-functional language, for the same reason it’s hard to learn OO in pure C. So what is a functional programming language and what isn’t, still matters. The people claiming that you can learn functional programming in, for example, Python or Ruby, are as wrong as those who maintained you could learn OO in C or Pascal, and for the same reason.

The second thing I need to point out is that simply because language X is not a functional programming language doesn’t make it a bad programming language. In fact, not being a functional programming language is often an advantage- the best programming languages are the ones that have a specific vision for what the language should be. And this works both ways, I comment- functional programming languages are a lousy place to learn OO. In those languages that even try to do OO (OCaml, Clojure), the OO parts are clearly second class citizens that the seasoned developers avoid wherever possible. And many functional programming languages (Haskell, SML) don’t even try to do OO. Languages that try to be all things to all people (C++, I’m looking at you) are so complex that they end up being nothing to nobody (anyone else here remember PL/I?).

In this sense, I think Guido was right- Python isn’t a functional programming language, and shouldn’t try to be. Rather than trying to be a second-rate Haskell or Scheme, it should concentrate on being a first-rate Python.

  • syskill

    The core language for imperative languages is then a variant of the classic Turing machine model of computation one might call “the C language”. In this language, memory is an array of bytes that can be read from and written to, and you have one or more CPUs which read memory, perform simple arithmetic, branch on conditions, and so on.

    That is an accurate description of a von Neumann architecture, not so much of a Turing machine. I don’t know of any programming language that is actually based on the Turing machine concept.

  • bluestorm

    syskill > Brainfuck ?

    > The very concept of a loop in C is mutable/imperative in nature- you
    > have one or more mutable loop variables (often named i or j)
    > which get updated.

That explanation only encompasses a few special cases of for loops. A better one is that, whatever the loop conditions are (if any), the body of the loop doesn’t “return” anything, and is thus necessarily a side effect.

    You’re right that typing isn’t included per se in untyped lambda calculus, but typing is indeed a major component of the studies of various lambda calculi (normalization, yada yada). One could say that the typing discipline of functional languages derives from the typing experiments around lambda calculus (even if types were used for set theory and other sort of things), and that typing is an essential part of, well, typed functional languages.

  • Brian

Syskill: I will cop to sweeping a theoretical complexity under the rug there. The Von Neumann architecture isn’t used for proving computability or complexity theorems (that I know of)- it’s not “at the same level” as the Lambda Calculus. The Turing Machine is the correct analog.

    In retrospect, there is a parallel here I should have explicitly called out- as you say, modern programming languages (at least those that aren’t intended to be jokes, like Brainfuck) don’t actually use the Turing Machine as their target, instead they target the Von Neumann Architecture. Likewise, functional programming languages don’t go all the way to the Lambda Calculus either- for example, no one actually uses Church encoded numerals as their basic integer type (except for joke languages like Unlambda). There isn’t a good name for the “enriched Lambda Calculus” that functional languages compile to, that I know of, that has any currency at all (anyone remember the Categorical Abstract Machine?).

Of course, actually looking at the implementations of compilers makes the situation more confusing yet, as you end up running smack dab into the equivalence between the Lambda Calculus and the Turing Machine/Von Neumann Architecture. You can implement the Lambda Calculus on a Turing Machine or Von Neumann architecture, and you can implement both Turing Machines and Von Neumann architectures in the Lambda Calculus. So the compilers for functional languages “spit out” code that assumes a Von Neumann architecture. Worse than that, most compilers for imperative languages first translate the imperative language into a static single assignment form- which is basically just a weird form of a functional language- which then gets compiled back into an imperative form to be executed by the stock Von Neumann architecture hardware. So within the compiler for a given language of either type, there is a confusion of paradigms.

  • Pingback: links for 2009-07-23 | .:: a few thoughts on the subject by rob wright ::.

  • ezekiel

    A functional programming language is based on the idea of a mathematical function. A function is an atomic unit of code that takes input values and returns a result value–with no other side effects. The key benefit of such an approach is referential transparency. To this “functional” core, most real-world languages bend the rules to allow for side effects in controlled ways.

    Various functional programming languages have extended this idea in various directions: different extents of lambda calculus, different tolerance of mutable variables, different approaches to data types, etc.

    The article is correct to note that functional programming languages tend to be more abstracted from the microprocessor that runs the program.

  • Dan

    (fun x -> (fun x -> x x)) (x x)) (x y) should instead be
    (fun x -> (fun x -> x x) (x x)) (x y)

  • Marc Stock

    “Those languages that even try to do OO (OCaml, Clojure), the OO parts are clearly second class citizens that the seasoned developers avoid where ever possible. ”

    Clojure is not an OO language, nor does it try to be. I presume you meant to write Scala?

  • Andy Troutman

    This multi function evaluation seems incorrect to me, and if not I’m assuming there is a gap in my understanding.

    >> (fun x -> (fun x -> x x)) (x x)) (x y)” becomes first
    >>“(fun x -> x x) ((x y) (x y))” and then
    >>“((x y) (x y)) ((x y) (x y))”.
    >>The x in the innermost function (”(fun x -> x x)”) is a different x than the other x’s.

    shouldn’t it first become
    (fun x -> (xx) (xx))(x y)?
    I would have thought that the right most xx pair would first be applied to the inner-most function, leaving you with what I have above, but your example seems to have applied xy first. in either case, we end up at the same final solution of
    (xy)(xy)(xy)(xy), just wondering where my mistake in evaluation is…
    Thanks,
    Andy T

  • Dan

    @Andy Troutman: as I wrote a couple of comments ago the parenthesization is not correct (there is an unbalanced closed parenthesis which was not opened before).
    So (fun x -> (fun x -> x x)) (x x)) (x y) should be (fun x -> (fun x -> x x) (x x)) (x y) which is alpha-equivalent to (fun a -> (fun b -> b b) (a a)) (x y) and now it is up to your reduction strategy: if you have the full beta you can choose between:

    … -> (fun a -> (a a) (a a)) (x y) -> ((x y) (x y)) ((x y) (x y))
    or
    … -> (fun b -> b b) ((x y) (x y)) -> ((x y) (x y)) ((x y) (x y))

and as you can see, and have noticed yourself, both strategies yield the same result; this does not happen by chance but is a very important property of every well-formed lambda-calculus term- see the Church-Rosser theorem for more details [http://en.wikipedia.org/wiki/Church%E2%80%93Rosser_theorem]

  • http://james-iry.blogspot.com/2009/05/brief-incomplete-and-mostly-wrong.html James Iry

Your premise is off. There actually are many different definitions of “functional.” Some people use it the way you’ve used it, but historically Lisp was called functional even before it had scoping rules that resembled lambda calculus. Joy is generally called functional, but its mapping to LC is complicated (LC is about “points” where Joy is basically about being “point-free”).

    Even when you try to put rigor to the definition, you end up very broad. For instance Sabry (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.27.7800) says ” A language is purely functional if (i) it includes every simply typed calculus term and (ii) its call-by-name, call-by-need, and call-by-value implementations are equivalent (modulo divergence and errors).” The second part of his definition is the “pure” part. The first is “functional.” By that definition Scala is functional (though, obviously not pure). Every term in the simply typed LC is a term in Scala (in fact, you can encode every term in both untyped LC and much more richly typed LCs).

    Finally, it’s worth reading Cardelli’s work on embedding OO style dynamic dispatch semantics into LC. It shows that if you dig beneath the skin, standard OO and “impure” functional languages aren’t all that different and, further, there’s no fundamental reason that an OO language can’t be pure in the same way that Haskell is.

  • Brian

There is an emotional “logic” that goes something like this: “Functional programming languages are good. My programming language is good. Therefore my programming language is a functional programming language.” The people using this logic then look for a definition of “functional programming language” which allows the conclusion to be true.

The down side of this is that, after enough people do this enough times, no words have any meaning other than “subjectively good” and “subjectively bad”. At which point all debates devolve into screeching, chest thumping, and poo flinging. If you don’t believe this, look at politics.

    An important point I didn’t make is that this is not my definition. This isn’t Brian Hurt trying to enforce his will on the debate. This is the accepted (if not explicitly stated) definition the functional programming community itself has accepted for decades. For example, when reading John Hughes’ seminal “Why Functional Programming Matters”, this is the definition he’s using. He doesn’t explicitly come out and say “lambda calculus”, but his description leaves little doubt that is what he means.

    And John McCarthy intended Lisp to be an implementation of the Lambda Calculus from day one. It was only his (admitted) lack of understanding of the Lambda Calculus that caused him to screw it up originally. Lisp today is most definitely based on the Lambda Calculus, and thus is a functional programming language.

    There is also a difference between a “pure functional programming language” and just a “functional programming language”. In a pure functional programming language, call by need, call by name, and call by value are indeed equivalent- but in an impure functional language they aren’t. I’m just stating the definition of what a functional language, pure or impure, is. If you insist that the only functional programming languages are the pure ones, then SML, Ocaml, and Lisp are not functional languages (which will earn you a bwuh? from all involved).

My definition is based on the “user interface” of the language, not the implementation. As I mentioned in the comment above, it’s not unusual for the implementations of languages to switch between paradigms- imperative languages being translated to functional styles and then back again, for example. Obviously I didn’t make that clear enough in the original post.

    There are languages that are neither functional nor imperative. In the original article I explicitly call out Prolog as a language that is neither. It’s quite likely that J is in a similar boat, and is neither fish nor fowl. Again, it’s not about the implementation of the languages- to work on modern hardware, all languages must sooner or later be compiled down to some variant of the Von Neumann architecture. It’s the user interface.

    As for OO being capable of being pure, remember I’m an Ocaml programmer. Ocaml does include support for purely functional objects. Object Orientation isn’t Turing complete, and thus needs some additional language on top of it to actually do anything. OO on top of functional works quite well, actually.

  • http://james-iry.blogspot.com/2009/05/brief-incomplete-and-mostly-wrong.html James Iry

    Note that I am not arguing with your definition exactly. I’m arguing that there have been many commonly used definitions and that yours is one among many. In particular, if McCarthy’s Lisp was the first language to be called functional then the people who called it that were using a different definition from yours. McCarthy’s Lisp is by no means a conservative extension to LC. Same deal with Joy.

    And Sabry’s definition is also different. You use the phrase “based on” where he uses a broader phrase “includes.” I’ve even seen definitions that say something like “the STLC is macro expressible” (where macro expressible is defined by http://www.ccs.neu.edu/scheme/pubs/scp91-felleisen.ps.gz). That’s an extremely broad definition that includes Java as a functional language! But it’s a useful definition because it allows Joy into the fold.

I’m not saying that any of these definitions is “right.” That’s silly. We’re talking about natural language, where definitions are malleable beasts. What I am saying is that the word “functional” (and the related phrases “functional programming” and “functional programming language”) has had so many common and widely used definitions that calling any of them “right” absolutely is going to run you into the wall of some other historical usage.

  • Aaron

Doesn’t forcing functions to take only a single parameter limit you to separable functions though? In math, not all multi-variable functions are reducible to separable single variable functions.

  • Brian

Aaron: No. Let’s say you have a non-separable two-argument function, with say x and y being the arguments. You’d create this function as (fun x -> (fun y -> some expr using x and y)). If you apply a value, say z, to this function, the result is to take the body of the function- in this case, (fun y -> some expr)- and in it replace all occurrences of x with z, even down into the body of this lambda expression. So the value of applying z to (fun x -> (fun y -> some expression using x and y)) is (fun y -> some expression using z and y). Or, more concretely, the result of applying the value 1 to the expression (fun x -> (fun y -> x + y)) is (fun y -> 1 + y). You can generalize this to however many arguments you want.

  • http://www.cs.tufts.edu/~nr/ Norman Ramsey

    How do algebraic data types fit into your picture? They are an important model of computation in many functional languages, but neither Church nor von Neumann would recognize them.

    Rod Burstall, phone your office!

  • http://enfranchisedmind.com/blog/posts/author/bhurt-aw/ Brian Hurt

    Algebraic data types are very nice to have, but not necessary to be a functional language (as witnessed by neither Lisp nor Scheme having them).

  • Pingback: Scala: Post-Functional, Post-Modern, or Just Perl++?

  • Bo

    Function application is left-associative.

    Therefore (f x y z) means (((f x) y) z).

Therefore your church numerals are *all wrong*.

    For example, three is not (… s s s z) but (… s (s (s z))).

  • Mike Hat

    Does anyone think that we will soon see computing architectures that natively support functional “definitions” of what we want done?

Won’t MHz yield to M-core when we finally get beyond the current crop of simplistic CPUs? Math will return to its rightful place atop the pyramid of knowledge and the way we express it.

  • Pingback: Stuff i'd like to share with developers/programmers - Digit Technology Discussion Forum - Tech Discussion Forums in India

  • Michael

    Great post! Certainly cleared up quite a few concepts for me; I haven’t delved deep into functional languages.

    One question though. I’m not sure I understood how integers are represented. Your example takes 1 and basically represents it as “the successor of zero”, which is a pretty natural thing to say considering how the set of integers can be constructed, but my question is, how do you then define/represent zero?

    This has me stumped – maybe the answer is obvious but I can’t see it.

    Cheers.

  • http://enfranchisedmind.com/blog/posts/author/bhurt-aw/ Brian Hurt

    Sorry for the delay in responding. Anyways, zero is just (fun s -> (fun z -> z)).