Thinking Functionally with Haskell (26 page)

BOOK: Thinking Functionally with Haskell
13.07Mb size Format: txt, pdf, ePub

= {definition of
revcat
}

reverse (x:xs) ++ ys

= {definition of
reverse
}

(reverse xs ++ [x]) ++ ys

= {associativity of
(++)
}

reverse xs ++ ([x] ++ ys)

= {definition of
(:)
}

reverse xs ++ (x:ys)

= {definition of
revcat
}

revcat xs (x:ys)

Hence

revcat [] ys
= ys

revcat (x:xs) ys = revcat xs (x:ys)

As to the running time,
T
(
revcat
)(
m
,
n
) = Θ(
m
). In particular,
T
(
reverse
(
n
) =
T
(
revcat
(
n
, 0) = Θ(
n
) That gives a linear-time computation for reversing a list.

Here is another example. The function
length
is defined by

length :: [a] -> Int

length []
= 0

length (x:xs) = length xs + 1

We have
T
(
length
)(
n
) = Θ(
n
), so there is no time advantage in calculating another definition. Nevertheless, define
lenplus
by

lenplus :: [a] -> Int -> Int

lenplus xs n = length xs + n

If we go through exactly the same calculation for
lenplus
as we did for
revcat
, we arrive at

lenplus [] n
= n

lenplus (x:xs) n = lenplus xs (1+n)

The reason the calculation goes through is that
(+)
, like
(++)
, is an associative operation. The advantage of defining
length xs = lenplus xs 0 = foldl (\n x -> 1+n) 0 xs

is that, by using
foldl'
in place of
foldl
, the length of a list can be computed in constant space. That indeed is how
length
is defined in Haskell’s prelude.

As the really astute reader might have spotted, there is actually no need to go through the calculations above. Both the examples are, in fact, instances of a law already described in the previous chapter, namely that
foldr (<>) e xs = foldl (@) e xs

for all finite lists
xs
provided

x <> (y @ z) = (x <> y) @ z

x <> e = e @ x

The two instances are:

foldr (\x n -> n+1) 0 xs = foldl (\n x -> 1+n) 0 xs

foldr (\x xs -> xs++[x]) [] xs

= foldl (\xs x -> [x]++xs) [] xs

We leave the detailed verification of these equations as an exercise.

For a final demonstration of the accumulating parameter technique we move from lists to trees. Consider the data declaration
data GenTree a = Node a [GenTree a]

An element of this type is a tree consisting of a node with a label and a list of subtrees. Such trees arise in problems that can be formulated in terms of positions and moves. The label of a node specifies the current position, and the number of subtrees corresponds to the number of possible moves in the current position. Each subtree has a label that specifies the result of making the move, and its subtrees describe the moves that can be made from the new position. And so on.

Here is a function for computing the list of labels in a tree:

labels :: GenTree a -> [a]

labels (Node x ts) = x:concat (map labels ts)

The method is simple enough: compute the labels of each subtree, concatenate the results, and stick the label of the tree at the front of the final list.

Let us analyse the running time of this program on a tree
t
. To keep things simple, suppose that
t
is a
perfect k
-ary tree of height
h
. What that means is that if
h
= 1 then
t
has no subtrees, while if
h
> 1 then
t
has exactly
k
subtrees, each with height
h
−1. The number
s
(
h
,
k
) of labels in such a tree satisfies
s
(1,
t
) = 1,

s
(
h
+1,
k
) = 1 +
ks
(
h
,
k
),

with solution
s
(
h
,
k
) = Θ(
k
h
). Now we have

T
(
labels
)(1,
k
) = Θ(1),

T
(
labels
)(
h
+1,
k
) = Θ(1) +
T
(
concat
)(
k
,
s
) +
T
(
map labels
)(
h
,
k
),

where
s
=
s
(
h
,
k
). The term
T
(
map labels
)(
h
,
k
) estimates the running time of applying
map labels
to a list of length
k
of trees all of height
h
. In general, given a list of length
k
consisting of elements each of size
n
, we have
T
(
map f
)(
k
,
n
) =
kT
(
f
)(
n
) +Θ(
k
).

Furthermore
T
(
concat
)(
k
,
s
) = Θ(
ks
) = Θ(
k
h
+1
). Hence
T
(
labels
)(
h
+1,
k
) = Θ(
k
h
+1
) +
kT
(
labels
)(
h
,
k
) since Θ(1) +Θ(
k
) = Θ(
k
). The solution is given by

T
(
labels
)(
h
,
k
) = Θ(
hk
h
) = Θ(
s
log
s
).

In words, computing the labels of a tree using the definition above takes time that is asymptotically greater than the size of the tree by a logarithmic factor.

Let us now see what an accumulating parameter can do. Define
labcat
by

labcat :: [GenTree a] -> [a] -> [a]

labcat ts xs = concat (map labels ts) ++ xs

As well as adding in a list
xs
we have also generalised the first argument from a tree to a list of trees. We have
labels t = labcat [t] []
, so any improvement on
labcat
leads to a corresponding improvement on
labels
.

We now synthesise an alternative definition for
labcat
. For the base case we obtain
labcat [] xs = xs

For the inductive case we reason:

labcat (Node x us:vs) xs

=
{definition}

concat (map labels (Node x us:vs)) ++ xs

=
{definitions}

labels (Node x us) ++ concat (map labels vs) ++ xs

=
{definition}

x:concat (map labels us) ++ concat (map labels vs) ++ xs

=
{definition of
labcat
}

x:concat (map labels us) ++ labcat vs xs

=
{definition of
labcat
(again)}

labcat us (labcat vs xs)

The result of this calculation is the following program for
labels
:

labels t = labcat [t] []

labcat [] xs
= xs

labcat (Node x us:vs) = x:labcat us (labcat vs xs)

For the timing analysis, let
T
(
labcat
)(
h
,
k
,
n
) estimate the running time of
labcat ts xs

when
ts
is a list of length
n
of trees, each of which is a perfect
k
-ary tree of height
h
(the size of
xs
is ignored since it doesn’t affect the estimate). Then
T
(
labcat
)(
h
,
k
, 0) = Θ(1),

T
(
labcat
)(1,
k
,
n
+1) = Θ(1) +
T
(
labcat
)(1,
k
,
n
)),
T
(
labcat
)(
h
+1,
k
,
n
+1) = Θ(1) +
T
(
labcat
)(
h
,
k
,
k
) +

T
(
labcat
)(
h
+1,
k
,
n
).

Solving the first two equations gives
T
(
labcat
)(1,
k
,
n
) = Θ(
n
). An induction argument now shows
T
(
labcat
)(
h
,
k
,
n
) = Θ(
k
h
n
). Hence
T
(
labels
)(
h
,
k
) =
T
(
labcat
)(
h
,
k
, 1) =Θ(
k
h
) = Θ(
s
).

That means we can compute the labels of a tree in time proportional to the size of the tree, a logarithmic improvement over our first version.

7.6 Tupling

We met the idea of tupling two functions in the discussion of the function
mean
. Tupling is sort of dual to the method of accumulating parameters: we generalise a function not by including an extra argument but by including an extra result.

The canonical example of the power of tupling is the Fibonacci function:

fib :: Int -> Integer

fib 0 = 0

fib 1 = 1

fib n = fib (n-1) + fib (n-2)

The time to evaluate
fib
by these three equations is given by

T
(
fib
)(0) = Θ(1),

T
(
fib
)(1) = Θ(1),

T
(
fib
)(
n
) =
T
(
fib
)(
n
−1) +
T
(
fib
)(
n
−2) +Θ(1).

The timing function therefore satisfies equations very like that of
fib
itself. In fact
T
(
fib
)(
n
) = Θ(
ϕ
n
), where
ϕ
is the golden ratio
ϕ
= (1 + √5)/2. That means that the running time to compute
fib
on an input
n
is exponential in
n
.

Now consider the function
fib2
defined by

fib2 n = (fib n,fib (n+1))

Clearly
fib n = fst (fib2 n)
. Synthesis of a direct recursive definition of
fib2
yields

fib2 0 = (0,1)

fib2 n = (b,a+b) where (a,b) = fib2 (n-1)

This program takes linear time. In this example the tupling strategy leads to a dramatic increase in efficiency, from exponential time to linear time.

It’s great fun to formulate general laws that encapsulate gains in efficiency. One such law concerns the computation of
(foldr f a xs, foldr g b xs)

As expressed above, the two applications of
foldr
involve two traversals of the list
xs
. There is a modest time advantage, and possibly a greater space advantage, in formulating a version that traverses the list only once. In fact
(foldr f a xs, foldr g b xs) = foldr h (a,b) xs

where

h x (y,z) = (f x y,g x z)

The result can be proved by induction and we leave details as an easy exercise.

As one more example, we again move from lists to trees. But this time we have a different kind of tree, a
leaf-labelled binary tree
:
data BinTree a = Leaf a | Fork (BinTree a) (BinTree a)

In contrast to a
GenTree
discussed above, a
BinTree
is either a leaf, with an associated label, or a fork of two subtrees.

Suppose we wanted to build such a tree with a given list as the labels. More precisely, we want to define a function
build
satisfying
labels (build xs) = xs

for all finite nonempty lists
xs
, where
labels
returns the labels of a binary tree:

labels :: BinTree a -> [a]

labels (Leaf x)
= [x]

labels (Fork u v) = labels u ++ labels v

We are attuned now to possible optimisations, and the definition of
labels
suggest that it could be improved with an accumulating parameter. So it can, but that is not our primary interest here, and we leave the optimisation as an exercise.

One way to build a tree is to arrange that half the list goes into the left subtree, and the other half into the right subtree:

build :: [a] -> BinTree a

build [x] = Leaf x

build xs
= Fork (build ys) (build zs)

where (ys,zs) = halve xs

The function
halve
made an appearance in
Section 4.8
:

halve xs = (take m xs,drop m xs)

where m = length xs `div` 2

Thus
halve
splits a list into two approximately equal halves. The definition of
halve
involves a traversal of the list to find its length, and two further (partial) traversals to compute the two components. It is therefore a prime candidate for applying the tupling strategy to get something better. But as with
labels
we are going to ignore that particular optimisation for now. And we are also going to
ignore the proof that this definition of
build
meets its specification. That’s three calculations we are leaving as exercises in order to concentrate on a fourth.

Other books

His Call by Emma Hart
Saved by the SEAL by Diana Gardin
Holding Out for a Hero by Stacey Joy Netzel
A History Maker by Alasdair Gray
Wisdom Keeper by Ilarion Merculieff