Folding in Haskell¶
(These notes are based in part on chapter 10 of Haskell Programming from First Principles, by Christopher Allen and Julie Mornouki.)
Folding is a general name for a family of related recursive patterns. The essential idea of folding is to take a list and reduce it to, for instance, a single number.
For example, to sum the list [1,2,3,4]
, we can evaluate it as 1 + (2 +
(3 + 4))
. Folding generalizes this pattern to work with lists of any type
and any folding function that makes sense.
Concrete Examples of Folding¶
To understand folding in general, lets look at a few concrete examples of functions that follow the fold pattern.
mysum lst
sums the numbers on a list:
> mysum :: [Int] -> Int
> mysum [] = 0 -- important: base case must be 1 for multiplication
> mysum (x:xs) = x + mysum xs
For example:
mysum [2,1,5,2] 10
myprod
calculates the products of numbers on the list, and is very
similar:
> myprod :: [Int] -> Int
> myprod [] = 1 -- important: base case must be 1 for multiplication
> myprod (x:xs) = x * myprod xs
For example:
> myprod [2,1,5,2]
20
Notice how similar the implementation of myprod
is to mysum
. The only
differences are the name, the value returned for the base case, and function
used in the recursive case.
Calculating the length a list is also an example of a fold:
> len :: [a] -> Int
> len [] = 0
> len (_:xs) = 1 + len xs -- _ is an anonymous variable, i.e. one we don't use
For example:
> len [2,1,5,2]
4
The standard Haskell function concat lol
concatenates all the list in
lol
, e.g.:
> concat ["one","two","three"]
"onetwothree"
So we can write our own version like this:
> myconcat :: [[a]] -> [a]
> myconcat [] = []
> myconcat (x:xs) = x ++ (myconcat xs)
For example:
> myconcat ["one","two","three"]
"onetwothree"
Right Folds¶
All the functions in the previous section have the general structure of a right-associative fold:
- A base case that returns a value such as
0
,1
, or[]
. The exact value depends upon the what makes sense for function. - A recursive case that takes the first element of the list and combines it with the rest of the folded list. The combining function always takes two inputs.
Here is the general right fold:
> myfoldr :: (a -> b -> b) -> b -> [a] -> b
> myfoldr _ init_val [] = init_val
> myfoldr f init_val (x:xs) = f x (myfoldr f init_val xs)
Take some time understand the type signature — it provides a lot of useful information!
Also note that Haskell already provides a standard version of this function
called foldr
.
myfoldr
lets us write single-line versions of all the functions in the
previous section:
> myfoldr (+) 0 [2,1,5,2] -- sum of a list
10
> myfoldr (*) 1 [2,1,5,2] -- product of a list
20
> myfoldr (\x y -> 1 + y) 0 [2,1,5,2] -- length of a list
4
> myfoldr (++) [] [[1,2], [3,4], [5]] -- concatenation of a list
[1,2,3,4,5]
myfoldr
is right associative, and so myfoldr (+) 0 [2,1,5,2]
is
evaluated like this:
2 + (1 + (5 + (2 + 0)))
The right-most +
is evaluated first, then the second to right-most +
,
and so on to the very first +
which is evaluated last.
A left associative fold is like a right fold, but the expression is
instead bracketed from left to right. Using the myfoldl
function defined
below, the expression myfoldl (+) 0 [2,1,5,2]
is evaluated like this:
(((0 + 2) + 1) + 5) + 2
Here’s a definition of myfoldl
:
> myfoldl :: (b -> a -> b) -> b -> [a] -> b
> myfoldl f acc [] = acc
> myfoldl f acc (x:xs) = myfoldl f (f acc x) xs
For some cases, there is no difference between using a left fold or a right
fold. For example, 2 + (1 + (5 + (2 + 0)))
and (((0 + 2) +
1) + 5) + 2
evaluate to the same thing because +
is associative, i.e.
\((a + b) + c = a + (b + c)\) is true for any numbers
\(a\), \(b\), and \(c\).
*
and ++
are also associative, and so in the examples above you can
use either myfoldr
with myfoldl
. However, (\x y -> 1 + y)
, the
combiner function used to get the length of a list, is not associative, and
so myfoldr
and myfoldl
give different results:
myfoldr (\x y -> 1 + y) 0 [2,1,5,2] -- length of the list
4
myfoldl (\x y -> 1 + y) 0 [2,1,5,2] -- oops: not the length of the list
3
The folds are different because (\x y -> 1 + y)
is not an associative
function. The easiest way to see this is to let f
equal (\x y -> 1 +
y)
, and then do these two calculations:
(f (f 1 2) 3) -- left associative = (f 3 3) = 4
(f 1 (f 2 3)) -- right associative
= (f 1 4)
= 5
Another difference between left and right folds is how they interact with
Haskells lazy evaluation. For example, left folds can work with infinite
lists in some cases because they fold starting at the beginning of the list
and move to the right. But right folds cannot work with infinite lists
because they start at the right end of the list; but infinite lists don’t have
a right end, and foldr
loops through the list forever looking for a final
element.
There is also a version of foldl
called foldl'
that is a more
efficient version of foldl
. In practice, the choice is usually between
foldr
and foldl'
.
Note
There is much more to folding that we have discussed here! We have just been folding lists, but it is possible to fold other structures, such as trees. In mathematics, folds are known as catamorphisms, and there is a body of theoretical work that defines exactly what properties are needed to do folding, and what you can calculate with them.
foldr and the Structure of Lists¶
A useful way of thinking about folding comes from examining the structure of a
list. In Haskell, the list [1,2,3]
can be written like this:
(1 : (2 : (3 : [])))
:
is the cons operator, i.e. if xs
is a list, then x:xs
creates
a new list that is the same as xs
except x
has been added to the
front.
myfoldr f init lst
can be thought of as replacing each :
in lst
with f
, and then replacing the []
in lst
with init
:
myfoldr f init lst
(1 `f` (2 `f` (3 `f` init))) -- infix style
(f 1 (f 2 (f 3 init))) -- prefix style
The notation `f`
is how Haskell lets you use a pre-fix function in an
infix way, i.e. a `f` b
is the same as f a b
.
The Type Signature for Folding¶
The type signatures for foldl
and foldr
are similar, but not quite the
same:
> :type foldr
foldr :: (a -> b -> b) -> b -> [a] -> b
> :type foldl
foldl :: (a -> b -> a) -> a -> [b] -> a
Look carefully at the three inputs foldr
takes:
:type foldr
foldr :: (a -> b -> b) -> b -> [a] -> b
- First input: a function of type
a -> b -> b
. This is the combiner function that takes an input of typea
, an input of typeb
, and then returns an output of typeb
. - Second input: An item of type
b
. This is the initial value of the fold, i.e. for summing this would be 0. Note that this is the same as the type of the second parameter of the first function. - Third input: A list of items of type
a
. This is the list being folded. Note thata
is the same type as the first input to the function passed tofoldr
. - Output: An item of type
b
. The list is reduced to a value of typeb
. Note that this is the same type as the initial value (the second input) passed tofoldr
.
A similarly careful analysis of the type signature for foldl
shows that it
is different. For instance, the initial value passed to foldl
s combiner
function is the first input, while for foldr
it was the second input.
Exercises¶
What does
foldr (:) [] "apple"
evaluate to?Suppose
snoc
is defined like this:-- snoc 5 [1,2,3] is [1,2,3,5] snoc :: a -> [a] -> [a] snoc x [] = [x] snoc x (y:ys) = y:(snoc x ys)
What does
foldr snoc [] "apple"
evaluate to?The standard Prelude function
const
is defined like this:const :: a -> b -> a const x y = x
- What is the type of
foldr const
? Remember, don’t worry aboutFoldable
: you should assume here that we are only folding lists. - What does
foldr const 'a' "mouse"
evaluate to?
- What is the type of
Suppose
f = foldr (-) 0
andg = foldl (-) 0
.- Do
f
andg
have the same type? - Find a value
x
such thatf x /= g x
. - Find a value
y
, with length greater than 1, such thatf y == g y
. - Find all values of
a
,b
, andc
such thatf [a,b,c] == g [a,b,c]
.
- Do
Answers¶
"apple"
"elppa"
Answers:
b -> [b] -> b
'm'
Answers:
Yes:
> :type foldr (-) 0 foldr (-) 0 :: (Num b, Foldable t) => t b -> b > :type foldl (-) 0 foldl (-) 0 :: (Num b, Foldable t) => t b -> b
There are many such values of
x
that will work, for example:> foldr (-) 0 [1,2] -1 > foldl (-) 0 [1,2] -3
There are many such values of
y
that will work, for example:> foldl (-) 0 [1,2,-1] -2 > foldr (-) 0 [1,2,-1] -2
To figure this out we will evaluate both expressions on their own. First,
f [a,b,c]
isa - (b - (c - 0))
, which simplifies toa - (b - c) = a - b + c
.Second,
g [a,b,c]
is((0 - a) - b) - c
, which simplifies to-a - b -c
.Since
f [a,b,c] == g [a,b,c]
,a - b + c == -a - b -c
. The-b
terms cancel out, so this simplifies toa + c == -a + -c
, ora + c == -(a + c)
. The only way this equation can be true is ifa + c == 0
, i.e. ifa == -c
.So
f [a,b,c] == g [a,b,c]
just whena == -c
, e.g. when the list has the form[-c,b,c]
. For example:> foldr (-) 0 [0,6,0] -6 > foldl (-) 0 [0,6,0] -6 > foldr (-) 0 [14,2,-14] -2 > foldl (-) 0 [14,2,-14] -2