tim.src

%	July 95: added Allyn Dimock's patches to include Gofer versions


H> module Tim where
H> import Utils
H> import Language

% $Date: 91/09/10 14:48:59 $
% $Revision: 1.9 $
% (c) 1991 Simon Peyton Jones & David Lester.
\chapter{TIM: the three instruction machine}
\label{sect:tim}

TIM\index{TIM},
the Three Instruction Machine, at first appears to be a very different
form of reduction machine from those we have seen so far.  Nevertheless,
it turns out that we can transform a G-machine into a TIM in
a series of relatively simple steps.
In this chapter we describe these steps, thereby showing how the TIM works,
define a complete minimal
TIM compiler and evaluator, and then develop a sequence of improvements and
optimisations to it.

TIM was invented by Fairbairn and Wray, and their original paper \cite{FW87}
is well worth reading.  It describes
TIM in a completely different way from the approach taken in this chapter.
The material developed in this chapter goes considerably beyond
Fairbairn and Wray's work, however, so the level of detail increases
in later sections where less well-known ideas are discussed and implemented.
Many of the new ideas presented are due to Guy Argo
and are presented in his FPCA paper
\cite{Argo89} and his Ph.D.\ thesis \cite{ArgoThesis}.
%Wakeling and Dix have also made a useful contribution
%\cite{WD89}.

\section{Background: How TIM works}

Consider the following function definition:
\begin{verbatim}
	f x y = g E1 E2
\end{verbatim}
where @E1@ and @E2@ are arbitrary (and perhaps complex) expressions, and
@g@ is some other function.
Both the template instantiation machine (Chapter~\ref{sect:template})
and the G-machine (Chapter~\ref{sect:g-machine}) will perform
the following reduction:
\begin{verbatim}
		@       reduces to      @
	       / \                     / \
	      @   y                   @   E2
	     / \                     / \
	    f   x                   g   E1
\end{verbatim}
The G-machine will take quite a few (simple) instructions to do this, whereas
the template machine does it in one (complicated) step, but the
net result is the same.

In this picture, @E1@ and @E2@ are the {\em graphs of\/}
the expressions @E1@ and @E2@.  For example, if @E1@ was @(x+y)*(x-y)@, the
first argument of @g@ would be a graph of @(x+y)*(x-y)@.  This graph has to
be laboriously built in the heap (by code generated by
the ${\cal C}$ compilation scheme).
Sadly this might be wasted work, because
@g@ might discard its first argument without using it.
We would like to find some way of limiting the amount of graph-building
done for arguments to functions.

\subsection{Flattening\index{flattening}}
\label{sect:tim:flatten}

Step 1 of our transformation does just this.  Suppose we replace the
definition of @f@ with the following new one:
\begin{verbatim}
	f x y = g (c1 x y) (c2 x y)
	c1 x y = E1
	c2 x y = E2
\end{verbatim}
We have invented two auxiliary functions, @c1@ and @c2@.
This definition is plainly equivalent to the old one, but
{\em no matter how large or complicated @E1@ is, the only work done during
the @f@ reduction is to build the graph of @(c1 x y)@}.

Better still, for a G-machine implementation, there is a
further benefit which we get automatically.
With the first definition, @E1@ would be compiled by the ${\cal C}$
scheme; no advantage can be taken of the optimisations present in the
${\cal E}$ scheme when compiling arithmetic expressions.
But with the second definition, the expression @E1@ is now the right-hand
side of a supercombinator, so all these optimisations apply.
We can evaluate @(x+y)*(x-y)@ much more efficiently in this way.

Of course, @E1@ and @E2@ might themselves contain large expressions
which will get compiled with the ${\cal C}$ scheme (for example, suppose
@E2@ was @(h E3 E4)@), so we must apply the
transformation again to the right-hand sides of @c1@ and @c2@.
The result is a {\em flattened\/} program, so-called because
no expression has a nested structure.
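
The transformation can be tried out directly in Haskell.  The following
sketch is only an illustration: the definitions of @g@ and @h@, and the
particular choices of @E1@ and @E2@, are assumptions made for the sake of
the example, and @f'@ is simply our name for the flattened version of @f@.
\begin{verbatim}
    -- Hypothetical stand-ins for the g and h of the text.
    g :: Int -> Int -> Int
    g a b = a - b

    h :: Int -> Int -> Int
    h a b = a + b

    -- Original definition, with E1 = (x+y)*(x-y)
    -- and E2 = h (x*x) (y*y).
    f :: Int -> Int -> Int
    f x y = g ((x+y)*(x-y)) (h (x*x) (y*y))

    -- After Step 1: each argument is a call to an auxiliary
    -- function applied to x and y.
    f' :: Int -> Int -> Int
    f' x y = g (c1 x y) (c2 x y)

    c1, c2, c3, c4 :: Int -> Int -> Int
    c1 x y = (x+y)*(x-y)
    c2 x y = h (c3 x y) (c4 x y)   -- E2 flattened in its turn
    c3 x y = x*x
    c4 x y = y*y

    -- Compare the two definitions on a small range of inputs.
    main :: IO ()
    main = print (and [f x y == f' x y | x <- [-3..3], y <- [-3..3]])
\end{verbatim}
The two definitions agree on every input, so @main@ prints @True@; the
difference lies only in how much graph is built when @f@ is reduced.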

\subsection{Tupling\index{tupling}}

The next observation is that both @c1@ and @c2@ are applied to both @x@ and
@y@, so we have to construct the graphs of @(c1 x y)@ and @(c2 x y)@ before
calling @g@.  If @c1@ and @c2@ had lots of arguments, rather than just two,
the graphs could get quite big.
The two graphs are so similar to each other that
it is natural to ask whether these argument graphs
could share some common part to avoid
duplication, and thereby reduce heap allocation.
We can express this idea with a
second transformation:
\begin{verbatim}
	f x y = let tup = (x,y)
		in g (c1 tup) (c2 tup)
	c1 (x,y) = E1
	c2 (x,y) = E2
\end{verbatim}
The idea is that @f@ first packages up its arguments into a tuple, and
then passes this single tuple to @c1@ and @c2@.
With this definition of @f@, the @f@-reduction looks like this:
\begin{verbatim}
		@       reduces to      @
	       / \                     / \
	      @   y                   /   @
	     / \                     @   / \
	    f   x                   / \ c2  \
				   g   @     \
				      / \_____\
				     c1        \
						-----
						|  -|---> x
						-----
						|  -|---> y
						-----
\end{verbatim}

\subsection{Spinelessness\index{spinelessness}}

Looking at the previous picture, you can see that {\em
the arguments pointed to
by the spine are always of the form @(c tup)@}, for some supercombinator
@c@ and tuple @tup@.
During reduction, we build up a stack of pointers to these arguments.
But since they are now all of the same form, we could instead stack
the (roots of the) arguments themselves!  So, after the @f@-reduction, the
stack would look like this:
\begin{verbatim}
	|       |       |
	|---------------|
	|   c2  |   ----|---\
	|---------------|    \    ---------------
	|   c1  |   ------------> |     |       | x
	|---------------|         |-------------|
				  |     |       | y
				  ---------------
\end{verbatim}
Each item on the spine stack\index{spine stack} is now
a pair of a code pointer and a pointer to a tuple.
You can think of this pair as an application node, the code defining a function
which is being applied to the tuple.
On entry to @f@, the (roots of the) arguments @x@ and @y@ were on the stack,
so the tuple of @x@ and @y@ is actually a tuple of code pointer/tuple pointer
pairs.

A code pointer/tuple pointer pair is called a {\em closure\index{closure}},
and a tuple
of such closures is called a {\em frame\index{frame}}.
A pointer to a frame is called
a {\em frame pointer\index{frame pointer}}.
Notice that there is no spine in the heap any more; the stack {\em is\/} the
spine of the expression being evaluated.
TIM is a spineless machine.

\subsection{An example}
\label{sect:tim:compose-eg}

It is time for an example of how a TIM program might work.
Consider the function @compose2@, defined like this:
\begin{verbatim}
	compose2 f g x = f (g x x)
\end{verbatim}
The `flattened' form of @compose2@ would be
\begin{verbatim}
	compose2 f g x = f (c1 g x)
	c1 g x = g x x
\end{verbatim}

When @compose2@ is entered, its three arguments will be on top of the stack,
like this:
\begin{verbatim}
	|       |       |
	|---------------|
      x | x-code| x-frm |
	|---------------|
      g | g-code| g-frm |
	|---------------|
      f | f-code| f-frm |
	|---------------|
\end{verbatim}
The first thing to do is to form the tuple (frame)
of these three arguments in the
heap.  We can then remove them from the stack.  We will keep a pointer to the
new frame in a special register, called the {\em frame pointer}.
This is done by the instruction
\begin{verbatim}
	Take 3
\end{verbatim}
The state of the machine now looks like this:
\begin{verbatim}
	|       |       |
	|---------------|
					-----------------
  Frame ptr ------------------------> f | f-code| f-frm |
					|---------------|
				      g | g-code| g-frm |
					|---------------|
				      x | x-code| x-frm |
					-----------------
\end{verbatim}

Next, we have to prepare the arguments for @f@.  There is only one, namely
@(g x x)@, and we want to push a
closure for it onto the stack.  The frame pointer
for the closure is just the current frame pointer register, and so
the instruction need only supply a code label:
\begin{verbatim}
	Push (Label "c1")
\end{verbatim}

Finally, we want to jump to @f@.  Since @f@ is an argument to @compose2@,
not a global supercombinator, @f@ is represented by a closure
in the current frame. What we must
do is fetch the closure,
load its frame pointer into the frame pointer register, and
its code pointer into the program counter.
This is done by the instruction:
\begin{verbatim}
	Enter (Arg 1)           -- f is argument 1
\end{verbatim}
After this instruction, the state of the machine is like this:
\begin{verbatim}
	|       |       |
	|---------------|               -----------------
	|   c1  |   ----|-----------> f | f-code| f-frm |
	|---------------|               |---------------|
				      g | g-code| g-frm |
  Frame ptr:   f-frm                    |---------------|
  Program ctr: f-code                 x | x-code| x-frm |
					-----------------
\end{verbatim}

That is it!  The main body of @compose2@ consists of just these three
instructions:
\begin{verbatim}
    compose2:   Take 3                  -- 3 arguments
		Push (Label "c1")       -- closure for (g x x)
		Enter (Arg 1)           -- f is argument 1
\end{verbatim}
We still need to deal with the label @c1@, though.
When the closure for @(g x x)@ is needed, it will be entered with the @Enter@
instruction, so that the program counter will point to @c1@, and the
frame pointer to the original frame containing @f@, @g@ and @x@.
At this point, all we need do is to prepare the arguments for @g@, namely two
copies of @x@, and enter @g@:
\begin{verbatim}
	c1:     Push (Arg 3)            -- x is argument 3
		Push (Arg 3)            -- x again
		Enter (Arg 2)           -- g is argument 2
\end{verbatim}
The @Push (Arg 3)@ instruction fetches a copy of the closure for @x@ from the
current frame, and pushes it onto the stack.  Then the @Enter (Arg 2)@
instruction applies @g@ to the argument(s) now on the
stack\footnote{%
There might be more than just two if the stack was non-empty when the
@(g x x)@ closure was entered.}.

\subsection{Defining the machine with state transition rules}

You can see why it is called the Three Instruction Machine: there are
three dominant instructions: @Take@, @Push@ and @Enter@.  In some ways,
it is rather optimistic to claim that it has only three instructions,
because @Push@ and @Enter@ both have
several `addressing modes'\index{addressing modes} and,
furthermore, we will need to invent quite a few
brand new instructions in due course.  Still, it makes a nice name.

As usual, we use state transition rules\index{state transition rules}
to express the precise
effect of each instruction.
First of all we must define the {\em state\/} of the machine.  It is
a quintuple:
\[
\mbox{\em (instructions, frame pointer, stack, heap, code store)}
\]
or $(i,f,s,h,c)$ for short.
The code store is the only item which has not already been described.
It contains a collection
of pieces of code, each of which has a label.
In practice, the code store contains the compiled supercombinator definitions,
each labelled with the name of the supercombinator, though in principle it
could also contain other labelled code fragments if that proved useful.

We now develop the transition rules for each of the instructions.
$@Take@~ n$ forms the top $n$ elements of the stack into a new frame, and
makes the current frame pointer point to it.
\timrule{
    \timstate
	{@Take@~n:i}
	{f}
	{c_1:\ldots:c_n:s}
	{h} {c}
}{
    \timstate
	{i} {f'} {s}
	{h[f':\langle c_1,\ldots,c_n \rangle]}
	{c}
} \label{rule:take}

Now we come to the rules for @Push@ and @Enter@.  These two instructions
have just the same addressing modes\index{addressing modes}
(@Arg@, @Label@ and so on), and
there is a very definite relationship between them,
which we dignify with a formal statement:
\begin{important}
{\em The @Push@/@Enter@ relationship.}
\index{Push/Enter relationship@@@Push@/@Enter@ relationship}
If the instruction $@Push@~arg$ pushes a closure $(i,f)$ onto the stack,
then $@Enter@~arg$ will load $i$ into the program counter and $f$ into
the current frame pointer.
\end{important}
The instruction
$@Push@~(@Arg@~ k)$ fetches the $k$th closure from the current frame, and
pushes it onto the stack.
\timrule{
    \timstate
	{@Push@\ (@Arg@\ k):i}
	{f} {s}
	{h[f:\langle (i_1,f_1),\ldots,(i_k,f_k),\ldots,(i_n,f_n) \rangle]}
	{c}
}{
    \timstate
	{i} {f}
	{(i_k,f_k):s}
	{h} {c}
}
$@Push@~(@Label@~ l)$ looks up the label $l$ in the code store, and
pushes a closure consisting of this code pointer together with the
current frame pointer:
\timrule{
    \timstate
	{@Push@\ (@Label@\ l):i}
	{f} {s} {h} {c[l:i']}
}{
    \timstate
	{i} {f}
	{(i',f):s}
	{h} {c}
}

In the @compose2@ example, we had to invent an arbitrary label @c1@.
It is a nuisance having to invent these labels, and instead we will
simply
add a new form for the push instruction, $@Push (Code@~i@)@$, which makes the
target code sequence $i$ part of the instruction itself.
Thus, instead of
\begin{verbatim}
	Push (Label "c1")
\end{verbatim}
we can write
\begin{verbatim}
	Push (Code [Push (Arg 3), Push (Arg 3), Enter (Arg 2)])
\end{verbatim}

Here is the appropriate state transition rule:
\timrule{
    \timstate
	{@Push@\ (@Code@\ i'):i}
	{f} {s} {h} {c}
}{
    \timstate
	{i} {f}
	{(i',f):s}
	{h} {c}
}
So far we have three `addressing modes'\index{addressing modes}:
@Arg@, @Code@, @Label@.  We need
to add one more, @IntConst@, for integer constants.  For example, the call
@(f 6)@ would compile to the code
\begin{verbatim}
	Push (IntConst 6)
	Enter (Label "f")
\end{verbatim}

The @Push@ instruction always pushes a closure (that is, a
code pointer/frame pointer pair)
onto the stack, but in the case of integer constants it is
not at all obvious what closure it should push.
Since we need somewhere to store
the integer\index{integer!representation in TIM}
itself, let us `steal' the frame pointer slot for that
purpose\footnote{%
We are making the implicit assumption
that an integer is no larger than a frame pointer, which
is usually true in practice.
}.  This decision leads to the following rule, where @intCode@ is the
(as yet undetermined) code sequence for integer closures:
\timrule
{\timstate{@Push (IntConst@~n@)@:i}{f}{s}{h}{c}}
{\timstate{i}{f}{(@intCode@,n):s}{h}{c}}
\label{rule:intconst}

What should @intCode@ do?  For the present our machine will do no arithmetic,
so an easy solution is to make @intCode@ the empty code sequence:

1> intCode = []

If an integer closure is ever entered, the machine will jump to the
empty code sequence, which will halt execution.  This will allow us
to write programs which return integers, which is enough for Mark 1.

So much for the @Push@ instruction.
The rules for the @Enter@ instruction, one for each addressing mode,
follow directly from the @Push@/@Enter@ relationship:
\timrule{
    \timstate
	{[@Enter@\ (@Label@\ l)]}
	{f} {s} {h} {c[l:i]}
}{
   \timstate
	{i} {f} {s} {h} {c}
}
\timrule{
   \timstate
	{[@Enter@\ (@Arg@\ k)]}
	{f} {s}
	{h[f:\langle (i_1,f_1),\ldots,(i_k,f_k),\ldots,(i_n,f_n) \rangle]}
	{c}
}{
   \timstate
	{i_k} {f_k} {s} {h} {c}
}
\timrule{
    \timstate
	{[@Enter@\ (@Code@\ i)]}
	{f} {s} {h} {c}
}{
    \timstate
	{i} {f}
	{s}
	{h} {c}
} \label{rule:tim:enter-code}
\timrule{
    \timstate
	{[@Enter@\ (@IntConst@\ n)]}
	{f} {s} {h} {c}
}{
    \timstate
	{@intCode@} {n}
	{s} {h} {c}
}
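
The rules given so far can be collected into a small executable sketch.
It is not the Mark 1 implementation developed below: it uses the
five-component state of the transition rules directly, represents the heap
and code store by association lists, and the names @closureOf@, @miniStep@
and @miniRun@ are invented for this illustration.
The point to notice is that a single function, @closureOf@, serves both
@Push@ and @Enter@, which is exactly the @Push@/@Enter@ relationship.
\begin{verbatim}
    data AMode = Arg Int | Label String | Code [Instr] | IntConst Int
    data Instr = Take Int | Push AMode | Enter AMode
    data FramePtr = FrameAddr Int | FrameInt Int | FrameNull
        deriving Show

    type Closure   = ([Instr], FramePtr)
    type Heap      = [(Int, [Closure])]     -- frames, keyed by address
    type CodeStore = [(String, [Instr])]    -- labelled code
    type State     = ([Instr], FramePtr, [Closure], Heap, CodeStore)

    -- The closure (i,f) denoted by an addressing mode.  Push stacks it;
    -- Enter loads it into the program counter and frame pointer.
    closureOf :: AMode -> FramePtr -> Heap -> CodeStore -> Closure
    closureOf (Arg k) (FrameAddr a) h _ =
        maybe (error "No such frame") (!! (k-1)) (lookup a h)
    closureOf (Arg _)      _ _ _ = error "Arg needs a frame address"
    closureOf (Label l)    f _ c =
        (maybe (error "Unknown label") id (lookup l c), f)
    closureOf (Code i)     f _ _ = (i, f)
    closureOf (IntConst n) _ _ _ = ([], FrameInt n)   -- intCode is []

    miniStep :: State -> State
    miniStep (Take n : i, _, s, h, c)
        | length s >= n = (i, FrameAddr a, drop n s, (a, take n s) : h, c)
        | otherwise     = error "Too few args for Take"
        where a = length h + 1              -- a fresh address
    miniStep (Push am : i, f, s, h, c) = (i, f, closureOf am f h c : s, h, c)
    miniStep ([Enter am], f, s, h, c)  = (i', f', s, h, c)
        where (i', f') = closureOf am f h c
    miniStep _ = error "No rule applies"

    -- Run until the instruction sequence is empty.
    miniRun :: State -> State
    miniRun st = case st of
        ([], _, _, _, _) -> st
        _                -> miniRun (miniStep st)

    -- Hand-written code for:  id x = x ;  main = id 3
    main :: IO ()
    main = print finalFramePtr
      where
        code = [ ("id",   [Take 1, Enter (Arg 1)])
               , ("main", [Take 0, Push (IntConst 3), Enter (Label "id")]) ]
        (_, finalFramePtr, _, _, _) =
            miniRun ([Enter (Label "main")], FrameNull, [], [], code)
\end{verbatim}
Running @main@ should leave the integer closure for @3@ entered, so that
the final frame pointer is @FrameInt 3@ and the instruction sequence is
empty.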

\subsection{Compilation}

We have now given a precise statement of what each TIM instruction does.
It remains to describe how to translate a program into TIM instructions.
This we do, as before, using a set of {\em compilation schemes}.
\index{compilation schemes}
Each supercombinator is compiled with the \tSC{} scheme, which is given
in Figure~\ref{fig:tim:schemes}.
The initial environment passed into \tSC{} binds each supercombinator name
to a @Label@ addressing mode for it.
The \tSC{} scheme just produces a
@Take@ instruction and invokes the \tR{} scheme, passing it an environment
augmented by bindings which say what addressing mode to use for each argument.
\begin{figure*}
$\begin{array}{|l|}
\hline
\\
\parbox{29pc}{
$\SC{def}~\rho$ is the TIM code for the supercombinator definition $def$,
in the environment $\rho$}
\\
\\
\begin{array}{rcll}
\SC{f\ x_1\ \ldots\ x_n\ @=@\ e}~\rho & = & @Take@\ n\ @:@\
		\R{e}\ \rho[x_1 \mapsto @Arg@~1,\ldots,x_n \mapsto @Arg@~n] &
\end{array} \\
\\
\hline
\\
%
\parbox[t]{29pc}{
$\R{e}~\rho$ is TIM code which applies the value of the expression
$e$ in environment $\rho$ to the arguments on the stack.
}
\\
\\
\begin{array}{rcll}
\R{e_1~e_2}~\rho & = & @Push@~(\A{e_2}~\rho)~@:@~\R{e_1}~\rho & \\
%
\R{a}~\rho       & = & @Enter@~(\A{a}~\rho) &
				  \mbox{where $a$ is an integer, variable,} \\
		 &   &          & \mbox{or supercombinator}
\end{array} \\
\\
\hline
\\
\parbox[t]{29pc}{
$\A{e}~\rho$ is a TIM addressing mode for expression $e$
in environment $\rho$.}
\\
\\
\begin{array}{rcll}
\A{x}~\rho      & = & \rho~x &
			\mbox{where $x$ is bound by $\rho$}     \\
%
\A{n}~\rho      & = & @IntConst@~ n &
			\mbox{where $n$ is an integer}  \\
%
\A{e}~\rho      & = & @Code@~(\R{e}~\rho) & \mbox{otherwise}
\end{array} \\
\\
\hline
\end{array}$
\caption{The \tSC{}, \tR{} and \tA{} compilation schemes}
\label{fig:tim:schemes}
\end{figure*}

The \tR{} scheme (Figure~\ref{fig:tim:schemes})
simply pushes arguments onto the stack until it finds
a variable, supercombinator or integer, which it enters.
It uses the \tA{} scheme to generate the
correct addressing mode.  Notice the way that the flattening\index{flattening}
process
described in Section~\ref{sect:tim:flatten} is carried out `on the fly'
by these rules.
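
As a worked example of the schemes, consider compiling the @compose2@
function of Section~\ref{sect:tim:compose-eg}.
Writing $\rho'$ for
$\rho[@f@ \mapsto @Arg@~1, @g@ \mapsto @Arg@~2, @x@ \mapsto @Arg@~3]$,
and unfolding the rules of Figure~\ref{fig:tim:schemes} one at a time,
we would expect:
\[
\begin{array}{cl}
  & \SC{@compose2 f g x = f (g x x)@}~\rho \\
= & @Take 3 :@~\R{@f (g x x)@}~\rho' \\
= & @Take 3 : Push@~(\A{@g x x@}~\rho')~@:@~\R{@f@}~\rho' \\
= & @Take 3 : Push (Code@~(\R{@g x x@}~\rho')@) : Enter (Arg 1)@
\end{array}
\]
where the code for the argument is
\[
\begin{array}{cl}
  & \R{@g x x@}~\rho' \\
= & @Push@~(\A{@x@}~\rho')~@:@~\R{@g x@}~\rho' \\
= & @Push (Arg 3) : Push@~(\A{@x@}~\rho')~@:@~\R{@g@}~\rho' \\
= & @Push (Arg 3) : Push (Arg 3) : Enter (Arg 2)@
\end{array}
\]
This agrees with the hand-written code given earlier, with the @Code@
addressing mode taking the place of the label @c1@.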

For the present, we omit arithmetic, data structures, case
analysis and @let(rec)@ expressions.
They will all be added later.

\subsection{Updating}

So far there has been no mention of updating\index{updates}.
That is because, now that
the spine has vanished, there are no spine nodes to update!
Indeed, the machine as so far described is a tree-reduction machine.
Shared arguments may be evaluated repeatedly.
Doing updates properly is the Achilles' heel of
spineless implementations.  It is utterly necessary, because otherwise
an unbounded amount of work could be duplicated,
yet it adds complexity
which loses some of the elegance and speed (duplication aside)
of the non-updating version.

We will return to updating later in Section~\ref{sect:tim-updates},
but meanwhile it is enough to implement the non-updating version.

\section{Mark 1: A minimal TIM}
\label{minimal-tim}
\index{TIM!Mark 1}

In this section we will develop a minimal, but complete, TIM implementation,
without arithmetic, data structures or updates.  These will be added
in subsequent sections.

\subsection{Overall structure}

The structure is much the same as for the template instantiation
interpreter.
The @run@ function (@runProg@ in the Haskell version)
is the composition of four functions, @parse@, @compile@,
@eval@ and @showResults@, just as before.  The type of @parse@ is given in
Chapter~\ref{sect:language}; the types for the other three are given below:

M> run         :: [char] -> [char]
M> compile     :: coreProgram -> timState
M> eval        :: timState -> [timState]
M> showResults :: [timState] -> [char]
GH> runProg     :: [Char] -> [Char]
GH> compile     :: CoreProgram -> TimState
GH> eval        :: TimState -> [TimState]
GH> showResults :: [TimState] -> [Char]
>
M> run = showResults . eval . compile . parse
GH> runProg = showResults . eval . compile . parse

\par
It is often convenient to see all the intermediate states, so we
also provide @fullRun@, which uses @showFullResults@ to show each state:

M> fullRun :: [char] -> [char]
GH> fullRun :: [Char] -> [Char]
> fullRun = showFullResults . eval . compile . parse

We need to import the language module:

M> %include "language"
G> -- :a language.lhs  -- parser data types

\subsection{Data type definitions}

The data type for TIM instructions corresponds directly to the instructions
introduced so far.

M1> instruction ::= Take num
M1>                 | Enter timAMode
M1>                 | Push timAMode
GH1> data Instruction = Take Int
GH1>                  | Enter TimAMode
GH1>                  | Push TimAMode

The type of addressing modes, @timAMode@,
is separated out as a distinct data type to stress the
relationship between @Push@ and @Enter@.

M1-4> timAMode ::= Arg num
M1-4>              | Label [char]
M1-4>              | Code [instruction]
M1-4>              | IntConst num
GH1-4> data TimAMode = Arg Int
GH1-4>               | Label [Char]
GH1-4>               | Code [Instruction]
GH1-4>               | IntConst Int

The state of the TIM machine is given by the following definition:

M1-4> timState == ([instruction],        || The current instruction stream
M1-4>              framePtr,             || Address of current frame
M1-4>              timStack,             || Stack of arguments
M1-4>              timValueStack,        || Value stack (not used yet)
M1-4>              timDump,              || Dump (not used yet)
M1-4>              timHeap,              || Heap of frames
M1-4>              codeStore,            || Labelled blocks of code
M1-4>              timStats)             || Statistics
GH1-4> type TimState = ([Instruction],        -- The current instruction stream
GH1-4>                  FramePtr,             -- Address of current frame
GH1-4>                  TimStack,             -- Stack of arguments
GH1-4>                  TimValueStack,        -- Value stack (not used yet)
GH1-4>                  TimDump,              -- Dump (not used yet)
GH1-4>                  TimHeap,              -- Heap of frames
GH1-4>                  CodeStore,            -- Labelled blocks of code
GH1-4>                  TimStats)             -- Statistics

The value stack\index{value stack} and dump\index{dump}
are only required later on in this chapter, but it
is more convenient to add placeholders for them right away.

We consider the representation for each of these components in turn.
\begin{itemize}
\item
The {\em current instruction stream\/} is represented by a list of instructions.
In a real machine this would be the program counter together with
the program memory.

\item
The {\em frame pointer\/}\index{frame pointer} is usually the
address of a frame in the heap, but there are two other possibilities:
it might be used to hold an integer value, or it might be
uninitialised.  The machine always `knows' which of these three
possibilities to expect, but it is
convenient in our implementation to distinguish them by using an
algebraic data type for @framePtr@:

M> framePtr ::= FrameAddr addr           || The address of a frame
M>              | FrameInt num           || An integer value
M>              | FrameNull              || Uninitialised
GH> data FramePtr = FrameAddr Addr         -- The address of a frame
GH>               | FrameInt Int           -- An integer value
GH>               | FrameNull              -- Uninitialised

If we do not do this, Miranda will (legitimately) complain
of a type error when we try to use an address as a number.
Furthermore, having a constructor for the uninitialised state @FrameNull@
means that our interpreter will discover if we ever mistakenly try to use
an uninitialised value as a valid address.

\item
The {\em stack\/}\index{stack!in TIM} contains {\em closures}, each of which is a
pair containing a code pointer and a frame pointer.
We represent the stack as a list.

M> timStack == [closure]
GH> type TimStack = [Closure]
M> closure == ([instruction], framePtr)
GH> type Closure = ([Instruction], FramePtr)

\item
The {\em value stack\/}\index{value stack!in TIM}
and {\em dump\/}\index{dump!in TIM} are not used at all to begin with,
so we represent each of them with a dummy algebraic data type which has
just one nullary constructor.  Later we will replace these definitions
with more interesting ones.

M1> timValueStack ::= DummyTimValueStack
GH1> data TimValueStack = DummyTimValueStack
M1-3> timDump ::= DummyTimDump
GH1-3> data TimDump = DummyTimDump

\item
The {\em heap\/}\index{heap} contains
{\em frames}, each of which is a tuple of closures.
The data type of frames is important enough to merit
an abstract data type of its own.

M> timHeap == heap frame
GH> type TimHeap = Heap Frame
>
M> abstype frame
M> with fAlloc   :: timHeap -> [closure] -> (timHeap, framePtr)
M>      fGet     :: timHeap -> framePtr -> num -> closure
M>      fUpdate  :: timHeap -> framePtr -> num -> closure -> timHeap
M>      fList    :: frame -> [closure]           || Used when printing
GH> fAlloc   :: TimHeap -> [Closure] -> (TimHeap, FramePtr)
GH> fGet     :: TimHeap -> FramePtr -> Int -> Closure
GH> fUpdate  :: TimHeap -> FramePtr -> Int -> Closure -> TimHeap
GH> fList    :: Frame -> [Closure]           -- Used when printing

These operations allow frames to be built, and components to
be extracted and updated.
The first element of the list given to @fAlloc@ is numbered @1@ for
the purposes of @fGet@ and @fUpdate@.
Here is a simple implementation based on lists.

M> frame == [closure]
GH> type Frame = [Closure]
>
> fAlloc heap xs = (heap', FrameAddr addr)
>                  where
>                  (heap', addr) = hAlloc heap xs
>
M> fGet heap (FrameAddr addr) n = f ! (n-1)      || Miranda's ! operator
M>                                               || uses zero indexing
GH> fGet heap (FrameAddr addr) n = f !! (n-1)
>                                where
>                                f = hLookup heap addr
>
> fUpdate heap (FrameAddr addr) n closure
>  = hUpdate heap addr new_frame
>    where
>    frame = hLookup heap addr
>    new_frame = take (n-1) frame ++ [closure] ++ drop n frame
>
> fList f = f

\item
For each label, the {\em code store\/} gives the corresponding compiled code:

M1-5> codeStore == assoc name [instruction]
GH1-5> type CodeStore = ASSOC Name [Instruction]

We take the opportunity to provide a lookup function for
labels, which generates an error message if it fails:

M1-5> codeLookup :: codeStore -> name -> [instruction]
GH1-5> codeLookup :: CodeStore -> Name -> [Instruction]
1-5> codeLookup cstore l
1-5>  = aLookup cstore l (error ("Attempt to jump to unknown label "
1-5>                             ++ show l))

\item
As usual, we make the {\em statistics\/} into an abstract data type which we
can add to easily:

M> abstype timStats
M> with  statInitial  :: timStats
M>       statIncSteps :: timStats -> timStats
M>       statGetSteps :: timStats -> num
GH> statInitial  :: TimStats
GH> statIncSteps :: TimStats -> TimStats
GH> statGetSteps :: TimStats -> Int

\end{itemize}
The first implementation, which counts only the number of steps,
is rather simple:

M> timStats == num               || The number of steps
GH> type TimStats = Int           -- The number of steps
> statInitial = 0
> statIncSteps s = s+1
> statGetSteps s = s

Finally, we need the code for heaps and stacks:

M> %include "utils"
GH> -- :a util.lhs -- heap data type and other library functions

\subsection{Compiling a program}
\label{sect:tim:compiler}

@compile@ works very much like the template instantiation compiler,
creating an initial machine state from the program it is given.
The main difference lies in the compilation function @compileSC@ which
is applied to each supercombinator.

1-4> compile program
M1-4>     = ([Enter (Label "main")],     || Initial instructions
M1-4>        FrameNull,                  || Null frame pointer
M1-4>        initialArgStack,            || Argument stack
M1-4>        initialValueStack,          || Value stack
M1-4>        initialDump,                || Dump
M1-4>        hInitial,                   || Empty heap
M1-4>        compiled_code,              || Compiled code for supercombinators
M1-4>        statInitial)                || Initial statistics
GH1-4>     = ([Enter (Label "main")],     -- Initial instructions
GH1-4>        FrameNull,                  -- Null frame pointer
GH1-4>        initialArgStack,            -- Argument stack
GH1-4>        initialValueStack,          -- Value stack
GH1-4>        initialDump,                -- Dump
GH1-4>        hInitial,                   -- Empty heap
GH1-4>        compiled_code,              -- Compiled code for supercombinators
GH1-4>        statInitial)                -- Initial statistics
1-4>        where
1-4>        sc_defs          = preludeDefs ++ program
1-4>        compiled_sc_defs = map (compileSC initial_env) sc_defs
1-4>        compiled_code    = compiled_sc_defs ++ compiledPrimitives
1-4>        initial_env = [(name, Label name) | (name, args, body) <- sc_defs]
1-4>			  ++ [(name, Label name) | (name, code) <- compiledPrimitives]

For the moment, the argument stack is initialised to be empty.

1> initialArgStack = []

For now the value stack and dump are initialised to their dummy values. Later
we will change these definitions.

1> initialValueStack = DummyTimValueStack
1-3> initialDump = DummyTimDump

\par
\sloppy
The compiled supercombinators, @compiled_sc_defs@, are obtained by compiling
each of the supercombinators in the program, using @compileSC@.
The initial environment passed to @compileSC@ gives a suitable addressing
mode for each supercombinator.
The code store, @compiled_code@, is obtained by combining @compiled_sc_defs@
with @compiledPrimitives@.  The latter is intended to contain compiled
code for built-in primitives, but it is empty for the present:

> compiledPrimitives = []

Unlike in the template machine and the G-machine, the initial heap here is empty.
The reason for a non-empty initial heap in those cases was to retain sharing
for CAFs\index{CAF} (that is, supercombinators with no arguments
-- Section~\ref{sect:caf}).
In this initial version of the TIM machine, the compiled TIM code for a CAF
will be executed each time it is called, so the work
of evaluating the CAF is not shared.  We will address this problem much later,
in Section~\ref{sect:tim:caf}.

The heart of the compiler is a direct translation of the compilation
schemes \tSC{}, \tR{} and \tA{} into the functions
@compileSC@, @compileR@ and @compileA@ respectively.
The environment, $\rho$, is represented by an association list binding
names to addressing modes.  The G-machine compiler used a mapping from
names to stack offsets, but the extra flexibility of using
addressing modes turns out to be rather useful.

M> timCompilerEnv == [(name, timAMode)]
GH> type TimCompilerEnv = [(Name, TimAMode)]

\par
Now we are ready to define @compileSC@:

M> compileSC :: timCompilerEnv -> coreScDefn -> (name, [instruction])
GH> compileSC :: TimCompilerEnv -> CoreScDefn -> (Name, [Instruction])
1-2> compileSC env (name, args, body)
M1-2>  = (name, Take (#args) : instructions)
GH1-2>  = (name, Take (length args) : instructions)
1-2>     where
1-2>     instructions = compileR body new_env
1-2>     new_env = (zip2 args (map Arg [1..])) ++ env

@compileR@ takes an expression and an environment, and delivers a list
of instructions:

M1-2> compileR :: coreExpr -> timCompilerEnv -> [instruction]
GH1-2> compileR :: CoreExpr -> TimCompilerEnv -> [Instruction]
1> compileR (EAp e1 e2) env = Push (compileA e2 env) : compileR e1 env
1> compileR (EVar v)    env = [Enter (compileA (EVar v) env)]
1> compileR (ENum n)    env = [Enter (compileA (ENum n) env)]
1> compileR e           env = error "compileR: can't do this yet"

M1-2> compileA :: coreExpr -> timCompilerEnv -> timAMode
GH1-2> compileA :: CoreExpr -> TimCompilerEnv -> TimAMode
1-2> compileA (EVar v) env = aLookup env v (error ("Unknown variable " ++ v))
1-2> compileA (ENum n) env = IntConst n
1-2> compileA e        env = Code (compileR e env)
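
To experiment with these three functions in isolation, the following
stand-alone sketch may be helpful.
It is a simplification, not the chapter's code: it declares a cut-down
expression type @Expr@ of its own instead of importing @CoreExpr@, uses
the standard @lookup@ in place of @aLookup@, and handles only variables,
numbers and applications.
The supercombinator @s@ used in the test is chosen purely as an example.
\begin{verbatim}
    data Expr = EVar String | ENum Int | EAp Expr Expr

    data AMode = Arg Int | Label String | Code [Instr] | IntConst Int
        deriving Show
    data Instr = Take Int | Push AMode | Enter AMode
        deriving Show

    type Env = [(String, AMode)]

    compileSC :: Env -> (String, [String], Expr) -> (String, [Instr])
    compileSC env (name, args, body)
        = (name, Take (length args) : compileR body newEnv)
        where newEnv = zip args (map Arg [1..]) ++ env

    compileR :: Expr -> Env -> [Instr]
    compileR (EAp e1 e2) env = Push (compileA e2 env) : compileR e1 env
    compileR e           env = [Enter (compileA e env)]

    compileA :: Expr -> Env -> AMode
    compileA (EVar v) env =
        maybe (error ("Unknown variable " ++ v)) id (lookup v env)
    compileA (ENum n) _   = IntConst n
    compileA e        env = Code (compileR e env)

    -- Compile:  s f g x = f x (g x)
    main :: IO ()
    main = print (compileSC [] ("s", ["f", "g", "x"], body))
      where
        body = EAp (EAp (EVar "f") (EVar "x"))
                   (EAp (EVar "g") (EVar "x"))
\end{verbatim}
The instructions produced for @s@ should be @Take 3@, then a @Push@ of the
@Code@ closure for @(g x)@ (itself @Push (Arg 3)@ followed by
@Enter (Arg 2)@), then @Push (Arg 3)@ and finally @Enter (Arg 1)@.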

\subsection{The evaluator}

Next we need to define how the evaluator actually works.
The definition of
@eval@ is exactly as for the template instantiation machine:

> eval state
M> = state : rest_states  where
M>                        rest_states = [],               timFinal state
M>                                    = eval next_state,  otherwise
M>                        next_state  = doAdmin (step state)
GH>  = state : rest_states  where
GH>                         rest_states | timFinal state = []
GH>                                     | otherwise      = eval next_state
GH>                         next_state  = doAdmin (step state)
>
> doAdmin state = applyToStats statIncSteps state

The @timFinal@ function says when a state is a final state.
We could invent a @Stop@ instruction, but it
is just as easy
to say that we have finished when the code sequence is empty:

1-4> timFinal ([], frame, stack, vstack, dump, heap, cstore, stats) = True
1-4> timFinal state                                                 = False

The @applyToStats@ function just applies a function to the
statistics component of the state:

1-4> applyToStats stats_fun (instr, frame, stack, vstack,
1-4>                         dump, heap, cstore, stats)
1-4>  = (instr, frame, stack, vstack, dump, heap, cstore, stats_fun stats)


\subsubsection{Taking a step}

@step@ does the case analysis which takes a single instruction and
executes it. The @Take@ equation is a
straightforward transliteration of the corresponding state transition rule
(\ref{rule:take}):

1> step ((Take n:instr), fptr, stack, vstack, dump, heap, cstore,stats)
M1>  = (instr, fptr', drop n stack, vstack, dump, heap', cstore, stats), #stack >= n
GH1>  | length stack >= n = (instr, fptr', drop n stack, vstack, dump, heap', cstore, stats)
M1>  = error "Too few args for Take instruction",			       otherwise
GH1>  | otherwise         = error "Too few args for Take instruction"
1>    where (heap', fptr') = fAlloc heap (take n stack)

The equations for @Enter@ and @Push@ take advantage of the @Push@/@Enter@
relationship\index{Push/Enter relationship@@@Push@/@Enter@ relationship}
by using a common function @amToClosure@ which converts
a @timAMode@ to a closure:

1> step ([Enter am], fptr, stack, vstack, dump, heap, cstore, stats)
1>  = (instr', fptr', stack, vstack, dump, heap, cstore, stats)
1>    where (instr',fptr') = amToClosure am fptr heap cstore

1> step ((Push am:instr), fptr, stack, vstack, dump, heap, cstore, stats)
1>  = (instr, fptr, amToClosure am fptr heap cstore : stack,
1>     vstack, dump, heap, cstore, stats)

@amToClosure@ delivers the closure addressed by the addressing mode
which is its first argument:

M1-4> amToClosure :: timAMode -> framePtr -> timHeap -> codeStore -> closure
GH1-4> amToClosure :: TimAMode -> FramePtr -> TimHeap -> CodeStore -> Closure
1-4> amToClosure (Arg n)      fptr heap cstore = fGet heap fptr n
1-4> amToClosure (Code il)    fptr heap cstore = (il, fptr)
1-4> amToClosure (Label l)    fptr heap cstore = (codeLookup cstore l, fptr)
1-4> amToClosure (IntConst n) fptr heap cstore = (intCode, FrameInt n)
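
To watch these rules run, here is a deliberately stripped-down sketch of the
Mark 1 machine executing @main = S K K 4@.  It is not the implementation
developed in the text: the four-component state, the association-list heap,
the hand-written @codeStore@ and the helper names @frameOf@, @codeFor@, @run@
and @result@ are all simplifications assumed for this sketch.

```haskell
data AMode    = Arg Int | Label String | Code [Instr] | IntConst Int
data Instr    = Take Int | Push AMode | Enter AMode
data FramePtr = FrameAddr Int | FrameInt Int | FrameNull
                deriving (Eq, Show)

type Closure = ([Instr], FramePtr)
type Heap    = [(Int, [Closure])]              -- toy heap: address -> frame
type State   = ([Instr], FramePtr, [Closure], Heap)

-- Code written out by hand, in the shape compileSC would produce.
codeStore :: [(String, [Instr])]
codeStore =
  [ ("s",    [ Take 3, Push (Code [Push (Arg 3), Enter (Arg 2)])
             , Push (Arg 3), Enter (Arg 1) ])
  , ("k",    [ Take 2, Enter (Arg 1) ])
  , ("main", [ Take 0, Push (IntConst 4), Push (Label "k")
             , Push (Label "k"), Enter (Label "s") ])
  ]

codeFor :: String -> [Instr]
codeFor l = maybe (error ("Unknown label " ++ l)) id (lookup l codeStore)

frameOf :: Int -> Heap -> [Closure]
frameOf a heap = maybe (error "dangling frame pointer") id (lookup a heap)

amToClosure :: AMode -> FramePtr -> Heap -> Closure
amToClosure (Arg n)      (FrameAddr a) heap = frameOf a heap !! (n - 1)
amToClosure (Arg _)      _             _    = error "Arg: no current frame"
amToClosure (Code il)    fptr          _    = (il, fptr)
amToClosure (Label l)    fptr          _    = (codeFor l, fptr)
amToClosure (IntConst n) _             _    = ([], FrameInt n)  -- empty intCode

step :: State -> State
step (Take n : instr, _, stack, heap)
  | length stack >= n = (instr, FrameAddr addr, drop n stack,
                         (addr, take n stack) : heap)
  | otherwise         = error "Too few args for Take instruction"
  where addr = length heap + 1                 -- next unused address
step ([Enter am], fptr, stack, heap) = (instr', fptr', stack, heap)
  where (instr', fptr') = amToClosure am fptr heap
step (Push am : instr, fptr, stack, heap)
  = (instr, fptr, amToClosure am fptr heap : stack, heap)
step _ = error "step: no rule applies"

-- Keep stepping until the code sequence is empty.
run :: State -> State
run state = case state of
  ([], _, _, _) -> state
  _             -> run (step state)

result :: FramePtr
result = fptr
  where (_, fptr, _, _) = run (codeFor "main", FrameNull, [], [])
```

The machine halts when it enters the closure for @4@, whose code is empty, so
@result@ is @FrameInt 4@: the answer is left sitting in the frame pointer.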

\subsection{Printing the results}

As with the template instantiation version we need a rather boring
collection of functions to print the results in a sensible way.
It is often useful to print out the supercombinator definitions, so
@showFullResults@ begins by doing so, using the definitions in the first
state:

> showFullResults states
>  = iDisplay (iConcat [
>        iStr "Supercombinator definitions", iNewline, iNewline,
>        showSCDefns first_state, iNewline, iNewline,
>        iStr "State transitions", iNewline,
>        iLayn (map showState states), iNewline, iNewline,
>        showStats (last states)
>    ])
>    where
>    (first_state:rest_states) = states

@showResults@ just shows the last state and some statistics:

1-4> showResults states
1-4>  = iDisplay (iConcat [
1-4>     showState last_state, iNewline, iNewline, showStats last_state
1-4>    ])
1-4>    where last_state = last states

\par
The rest of the functions are straightforward.  @showSCDefns@ displays
the code for each supercombinator.

M> showSCDefns :: timState -> iseq
GH> showSCDefns :: TimState -> Iseq
1-4> showSCDefns (instr, fptr, stack, vstack, dump, heap, cstore, stats)
1-4>  = iInterleave iNewline (map showSC cstore)

M> showSC :: (name, [instruction]) -> iseq
GH> showSC :: (Name, [Instruction]) -> Iseq
> showSC (name, il)
>  = iConcat [
>        iStr "Code for ", iStr name, iStr ":", iNewline,
>        iStr "   ", showInstructions Full il, iNewline, iNewline
>    ]

@showState@ displays a TIM machine state.

M> showState :: timState -> iseq
GH> showState :: TimState -> Iseq
1-4> showState (instr, fptr, stack, vstack, dump, heap, cstore, stats)
1-4>  = iConcat [
1-4>     iStr "Code:  ", showInstructions Terse instr, iNewline,
1-4>     showFrame heap fptr,
1-4>     showStack stack,
1-4>     showValueStack vstack,
1-4>     showDump dump,
1-4>     iNewline
1-4>    ]

@showFrame@ shows the frame component of a state, using @showClosure@
to display each of the closures inside it.

M> showFrame :: timHeap -> framePtr -> iseq
GH> showFrame :: TimHeap -> FramePtr -> Iseq
M> showFrame heap FrameNull = iStr "Null frame ptr" $iAppend iNewline
GH> showFrame heap FrameNull = iStr "Null frame ptr" `iAppend` iNewline
> showFrame heap (FrameAddr addr)
>  = iConcat [
>        iStr "Frame: <",
>        iIndent (iInterleave iNewline
>                             (map showClosure (fList (hLookup heap addr)))),
>        iStr ">", iNewline
>    ]
> showFrame heap (FrameInt n)
>  = iConcat [ iStr "Frame ptr (int): ", iNum n, iNewline ]

@showStack@ displays the argument stack, using @showClosure@ to display each
closure.

M> showStack :: timStack -> iseq
GH> showStack :: TimStack -> Iseq
> showStack stack
>  = iConcat [   iStr "Arg stack: [",
>                iIndent (iInterleave iNewline (map showClosure stack)),
>                iStr "]", iNewline
>    ]

\par
For the present, @showValueStack@ and @showDump@, which display the value
stack and dump, are stubs, because we are not yet using these components
of the state.

M> showValueStack :: timValueStack -> iseq
GH> showValueStack :: TimValueStack -> Iseq
1> showValueStack vstack = iNil

M> showDump :: timDump -> iseq
GH> showDump :: TimDump -> Iseq
1-3> showDump dump = iNil

@showClosure@ displays a closure, using @showFramePtr@ to display
the frame pointer.

M> showClosure :: closure -> iseq
GH> showClosure :: Closure -> Iseq
> showClosure (i,f)
>  = iConcat [   iStr "(",  showInstructions Terse i,  iStr ", ",
>                showFramePtr f,  iStr ")"
>    ]

M> showFramePtr :: framePtr -> iseq
GH> showFramePtr :: FramePtr -> Iseq
> showFramePtr FrameNull = iStr "null"
> showFramePtr (FrameAddr a) = iStr (show a)
M> showFramePtr (FrameInt n) = iStr "int " $iAppend iNum n
GH> showFramePtr (FrameInt n) = iStr "int " `iAppend` iNum n

@showStats@ is responsible for printing out accumulated statistics:

M> showStats :: timState -> iseq
GH> showStats :: TimState -> Iseq
1-4> showStats (instr, fptr, stack, vstack, dump, heap, code, stats)
1-4>  = iConcat [ iStr "Steps taken = ", iNum (statGetSteps stats), iNewline,
1-4>              iStr "No of frames allocated = ", iNum (hSize heap),
1-4>              iNewline
1-4>    ]

\subsubsection{Printing instructions}

We are going to need to print instructions and instruction sequences.
If a sequence of instructions is printed as one long line, it is
rather hard to read, so it is worth writing some code to pretty-print
them.

In fact we want to be able to print either the entire
code for an instruction sequence (for example when printing a supercombinator
definition), or just some abbreviated form of it.
An example of the latter occurs when printing the contents of the stack;
it can be helpful to see some part of the code in each closure, but we do not
want to see it all!  Accordingly, we give an extra argument, @d@, to each
function to tell it how fully to print.
The value of this argument is either
@Full@, @Terse@ or @None@.

M> howMuchToPrint ::= Full | Terse | None
GH> data HowMuchToPrint = Full | Terse | None

\par
@showInstructions@ turns a list of instructions into an @iseq@.
When @d@ is @None@, only an ellipsis is printed.
If @d@ is @Terse@, the instructions are printed all on one line, and
nested instructions are printed with @d@ as @None@.
If @d@ is @Full@, the instructions are laid out one per line, and printed
in full.

M> showInstructions :: howMuchToPrint -> [instruction] -> iseq
GH> showInstructions :: HowMuchToPrint -> [Instruction] -> Iseq
> showInstructions None il = iStr "{..}"
> showInstructions Terse il
>  = iConcat [iStr "{", iIndent (iInterleave (iStr ", ") body), iStr "}"]
>    where
>       instrs = map (showInstruction None) il
M>       body = instrs,                   #il <= nTerse
M>            = (take nTerse instrs)
M>              ++ [iStr ".."],           otherwise
GH>       body | length il <= nTerse = instrs
GH>            | otherwise           = (take nTerse instrs) ++ [iStr ".."]
> showInstructions Full il
>  = iConcat [iStr "{ ", iIndent (iInterleave sep instrs), iStr " }"]
>    where
M>    sep = iStr "," $iAppend iNewline
GH>    sep = iStr "," `iAppend` iNewline
>    instrs = map (showInstruction Full) il

@showInstruction@ turns a single instruction into an @iseq@.

M1> showInstruction d (Take m)  = (iStr "Take ")  $iAppend (iNum m)
GH1> showInstruction d (Take m)  = (iStr "Take ")  `iAppend` (iNum m)
M1> showInstruction d (Enter x) = (iStr "Enter ") $iAppend (showArg d x)
GH1> showInstruction d (Enter x) = (iStr "Enter ") `iAppend` (showArg d x)
M1> showInstruction d (Push x)  = (iStr "Push ")  $iAppend (showArg d x)
GH1> showInstruction d (Push x)  = (iStr "Push ")  `iAppend` (showArg d x)

M1-4> showArg d (Arg m)      = (iStr "Arg ")   $iAppend (iNum m)
GH1-4> showArg d (Arg m)      = (iStr "Arg ")   `iAppend` (iNum m)
M1-4> showArg d (Code il)    = (iStr "Code ")  $iAppend (showInstructions d il)
GH1-4> showArg d (Code il)    = (iStr "Code ")  `iAppend` (showInstructions d il)
M1-4> showArg d (Label s)    = (iStr "Label ") $iAppend (iStr s)
GH1-4> showArg d (Label s)    = (iStr "Label ") `iAppend` (iStr s)
M1-4> showArg d (IntConst n) = (iStr "IntConst ") $iAppend (iNum n)
GH1-4> showArg d (IntConst n) = (iStr "IntConst ") `iAppend` (iNum n)

@nTerse@ says how many instructions of a sequence should be
printed in terse form.

> nTerse = 3

%\subsection{Exercises}

\begin{exercise}
Run the machine using the following definition of @main@:
\begin{verbatim}
	main = S K K 4
\end{verbatim}
Since @S K K@ is the identity function, @main@ should reduce to @4@,
which halts the machine.  Experiment with making it a little more elaborate;
for example
\begin{verbatim}
	id = S K K ;
	id1 = id id ;
	main = id1 4
\end{verbatim}
\end{exercise}

\begin{exercise}
Add more performance instrumentation.
For example:
\begin{itemize}
\item
Measure execution time, counting one time unit for each instruction
except @Take@, for which you should count as many time units as the frame
has elements.
\item
Measure the heap usage, printing the total amount of heap allocated in a run.
Take account of the size of the frames,
so that you can compare your results directly with those from the
template instantiation version.
\item
Measure the maximum stack depth.
\end{itemize}
\end{exercise}

\begin{exercise} \label{ex:tim:take}
If $n=0$, then $@Take@~n$ does nothing useful.  Adapt the definition
of @compileSC@ to spot this optimisation by omitting the @Take@ instruction
altogether for CAFs\index{CAF}.
\end{exercise}

\subsection{Garbage collection\index{garbage collection}\advanced}
\label{sect:tim:gc1}

Like any heap-based system, TIM requires a garbage collector, but it
also requires one with a little added sophistication.
As usual, the garbage collector finds all the live data by starting from
the machine state; that is, from the stack and the frame pointer.
Each closure on the stack points to a frame, which must clearly be retained.
But that frame in turn contains pointers to further frames, and so on.
The question arises: {\em given a particular frame, which frame
pointers within it should be recursively followed?}

The safe answer is
`follow all of them', but this risks retaining far more data than
required.  For example, the closure for @(g x x)@ in the @compose2@ example of
Section~\ref{sect:tim:compose-eg} has a pointer to a frame containing
@f@, @g@ and @x@, but it only requires the closures for @g@ and @x@.
A naive garbage collector might follow the frame pointer from @f@'s closure
as well, thus retaining data unnecessarily.
This unwanted retention is called a \stress{space leak},
and can cause garbage collection to occur much more frequently than would
otherwise be the case.

However, this particular space leak is straightforward, if rather
tedious, to eliminate.  Each closure consists of a code pointer paired with
a frame pointer.  The code `knows' which frame elements it is going to
need, and this information
can be recorded with the code, for the garbage collector
to examine.  For example, what we have been calling a `code pointer' could
actually point to a pair, consisting
of a list of slot numbers used by the code, and
the code itself.  (In a real implementation the list might be encoded as a
bit-mask.)  How can the list of useful slots be derived?  It is simple:
just find the free variables of the expression being compiled, and use the
environment to map them into slot numbers.
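
A minimal sketch of that derivation follows.  It assumes a cut-down expression
type with no binding constructs (so every variable occurrence is free), and it
assumes that @f@, @g@ and @x@ occupy slots 1, 2 and 3 of the frame; the names
@usedSlots@, @composeEnv@ and @gxx@ are invented for the illustration.

```haskell
import Data.List (nub, sort)

data Expr  = EVar String | ENum Int | EAp Expr Expr
data AMode = Arg Int | Label String
             deriving (Eq, Show)

-- With no let or lambda in this cut-down language, every variable is free.
freeVars :: Expr -> [String]
freeVars (EVar v)    = [v]
freeVars (ENum _)    = []
freeVars (EAp e1 e2) = nub (freeVars e1 ++ freeVars e2)

-- Frame slots that code compiled from the expression can refer to.
-- Names bound to labels (supercombinators) do not live in the frame.
usedSlots :: [(String, AMode)] -> Expr -> [Int]
usedSlots env e = sort [n | v <- freeVars e, Just (Arg n) <- [lookup v env]]

-- Assumed environment for a three-argument supercombinator.
composeEnv :: [(String, AMode)]
composeEnv = [ ("f", Arg 1), ("g", Arg 2), ("x", Arg 3)
             , ("compose2", Label "compose2") ]

-- The expression (g x x)
gxx :: Expr
gxx = EAp (EAp (EVar "g") (EVar "x")) (EVar "x")
```

With these definitions @usedSlots composeEnv gxx@ is @[2,3]@, so a collector
given this list would not follow the frame pointer held in slot 1.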

\section{Mark 2: Adding arithmetic}
\index{TIM!Mark 2}

In this section we will add arithmetic\index{arithmetic!in TIM}
to our machine.

\subsection{Overview: how arithmetic works}
\label{sect:tim:arith-overview}

The original Fairbairn and Wray TIM machine had a rather devious scheme
for doing arithmetic.  Their main motivation was to keep the machine
{\em minimal}, but their approach is quite hard to understand
and requires considerable massaging to give an efficient implementation.

Instead, we will modify the TIM in a way exactly analogous to the
V-stack\index{V-stack}
of the G-machine (Section~\ref{sect:v-stack}).
We modify the state by introducing a {\em value stack\index{value stack!in TIM}},
which is a stack of (evaluated, unboxed) integers.
We extend the instruction set with a family of instructions $@Op@~op$ which
perform the arithmetic operation $op$
on the top elements of the value stack, leaving the
result on top of the value stack.
For example, the @Op Sub@ instruction removes the top two elements of the
value stack, subtracts them and pushes the result onto the value stack:
\timruleV
{\timstateV{@Op Sub@:i} {f} {s} {n_1:n_2:v} {h} {c}}
{\timstateV{i}       {f} {s} {(n_1-n_2):v} {h} {c}}

It is easy to define a complete family of arithmetic instructions,
@Op Add@, @Op Sub@, @Op Mult@, @Op Div@, @Op Neg@ and so on,
in this way.
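
As a small sketch of such a family, the function below applies one arithmetic
instruction to a value stack represented as a list with its top at the head.
The name @applyOp@ is invented here, and the operand order for @Div@ and the
use of integer division are assumptions made by analogy with the @Op Sub@ rule.

```haskell
data Op = Add | Sub | Mult | Div | Neg
          deriving (Eq, Show)

-- Apply one arithmetic instruction to the value stack (top element first).
-- n1 is the top of the stack and n2 the element beneath it.
applyOp :: Op -> [Int] -> [Int]
applyOp Neg  (n : v)       = negate n : v
applyOp Add  (n1 : n2 : v) = (n1 + n2) : v
applyOp Sub  (n1 : n2 : v) = (n1 - n2) : v
applyOp Mult (n1 : n2 : v) = (n1 * n2) : v
applyOp Div  (n1 : n2 : v) = div n1 n2 : v
applyOp _    _             = error "Op: too few values on the value stack"
```

For instance, @applyOp Sub [7,2,10]@ gives @[5,10]@: the top two elements are
replaced by their difference and the rest of the stack is untouched.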

Now consider the following function @sub@:
\begin{verbatim}
	sub a b = a - b
\end{verbatim}
What code should we generate for @sub@?
It has to take the following steps:
\begin{enumerate}
\item
The usual @Take 2@ to form its arguments into a frame.
\item
Evaluate @b@, putting its value on the value stack.
\item
Evaluate @a@, doing likewise.
\item
Subtract the value of @b@ from the value of @a@, using the @Op Sub@ instruction,
which leaves its result on top of the value stack.
\item
`Return' to the `caller'.
\end{enumerate}

We will consider the evaluation of @a@ and @b@ first.  They are represented by
closures, held in the current frame, and the
only thing we can do to a closure is
to enter it.  So presumably to evaluate @a@ we must enter the closure for @a@,
but what does it mean to enter an integer-valued closure?  So far we have
only entered {\em functions}, and integers are not functions.
Here is the key idea:
\begin{important}
{\em Integer invariant\/}\index{integer!invariant (in TIM)}:
when an integer-valued closure is entered, it computes the value of the
integer, pushes it onto the value stack, and enters the top closure on the
argument stack.
\end{important}
The closure on top of the argument stack is called the \stress{continuation},
because it says what to do next, once the evaluation of the integer is
complete.
The continuation pairs an instruction sequence, which is the code to run next,
with the current frame pointer, which is saved because evaluating the integer
may change it.
In other words, the continuation is a perfectly
ordinary closure.

So the code for @sub@ looks like this:
\begin{verbatim}
sub:    Take 2
	Push (Label L1)         -- Push the continuation
	Enter (Arg 2)           -- Evaluate b

L1:     Push (Label L2)         -- Push another continuation
	Enter (Arg 1)           -- Evaluate a

L2:     Op Sub                  -- Compute a-b on value stack
	Return
\end{verbatim}
What should the @Return@ instruction do?
Since the
value returned by @sub@ is an integer, and after the @Op Sub@ instruction this
integer is on top of the value stack, all @Return@ has to do is to pop
the closure on top of the argument stack and enter it:
\timruleV
{\timstateV{[@Return@]}{f}{(i',f'):s}{v}{h}{c}}
{\timstateV{i'}{f'}{s}{v}{h}{c}}
\label{rule:tim-return}
We have used labels to write the code
for @sub@.  This is not the only way to do it; an alternative is
to use the @Push Code@ instruction, which avoids the tiresome
necessity of inventing new labels.  In this style the code for @sub@ becomes:
\begin{verbatim}
sub:    Take 2
	Push (Code [    Push (Code [Op Sub, Return]),
			Enter (Arg 1)
	     ])
	Enter (Arg 2)
\end{verbatim}
Written like this, it is less easy to see what is going on
than by using labels, so we will continue to use labels in the exposition
where it makes code fragments easier to understand, but we will use the
@Push Code@ version in the compiler.

Now we must return to the question of integer constants.  Consider the
expression @(sub 4 2)@.  It will compile to the code
\begin{verbatim}
	Push (IntConst 2)
	Push (IntConst 4)
	Enter (Label "sub")
\end{verbatim}
The code for @sub@ will soon enter the closure @(IntConst 2)@, which will
place the integer @2@ in the frame pointer and jump to @intCode@.
Currently, @intCode@ is the empty code sequence (so that the machine stops
if we ever enter an integer), but we need to change that.
What should @intCode@ now do?  The answer is given by the integer invariant:
it must push the integer onto the value stack and return, thus:

2-> intCode = [PushV FramePtr, Return]

\par
@PushV FramePtr@ is a new instruction which pushes the number currently
masquerading as the frame pointer onto the top of the value stack:
\timruleV
{\timstateV{@PushV FramePtr@:i}{n}{s}{v}{h}{c}}
{\timstateV{i}{n}{s}{n:v}{h}{c}}
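
The whole mechanism can be traced with a small sketch that runs the code for
@(sub 4 2)@.  It is a simplification rather than the implementation in the
text: the state has five components instead of eight, the heap is an
association list, @Label@ resolves only the single name @"sub"@, and the
argument stack is assumed to start with one closure whose code is empty, so
that the final @Return@ has something to enter.  The names @run@ and @answer@
are invented for the example.

```haskell
data AMode    = Arg Int | Label String | Code [Instr] | IntConst Int
data VAMode   = FramePtr
data Op       = Add | Sub
data Instr    = Take Int | Push AMode | PushV VAMode | Enter AMode
              | Return | Op Op
data FramePtr = FrameAddr Int | FrameInt Int | FrameNull

type Closure = ([Instr], FramePtr)
type Heap    = [(Int, [Closure])]
type State   = ([Instr], FramePtr, [Closure], [Int], Heap)

intCode, subCode, mainCode :: [Instr]
intCode  = [PushV FramePtr, Return]
subCode  = [ Take 2
           , Push (Code [Push (Code [Op Sub, Return]), Enter (Arg 1)])
           , Enter (Arg 2) ]
mainCode = [Push (IntConst 2), Push (IntConst 4), Enter (Label "sub")]

amToClosure :: AMode -> FramePtr -> Heap -> Closure
amToClosure (Arg n) (FrameAddr a) heap = case lookup a heap of
  Just frame -> frame !! (n - 1)
  Nothing    -> error "dangling frame pointer"
amToClosure (Arg _)       _    _ = error "Arg: no current frame"
amToClosure (Code il)     fptr _ = (il, fptr)
amToClosure (Label "sub") fptr _ = (subCode, fptr)
amToClosure (Label l)     _    _ = error ("Unknown label " ++ l)
amToClosure (IntConst n)  _    _ = (intCode, FrameInt n)

step :: State -> State
step (Take n : i, _, s, v, h) = (i, FrameAddr a, drop n s, v, (a, take n s) : h)
  where a = length h + 1
step (Push am : i, f, s, v, h) = (i, f, amToClosure am f h : s, v, h)
step ([Enter am], f, s, v, h)  = (i', f', s, v, h)
  where (i', f') = amToClosure am f h
-- The integer masquerading as the frame pointer goes onto the value stack.
step (PushV FramePtr : i, FrameInt n, s, v, h) = (i, FrameInt n, s, n : v, h)
-- Return pops the continuation and enters it.
step ([Return], _, (i', f') : s, v, h)  = (i', f', s, v, h)
step (Op Sub : i, f, s, n1 : n2 : v, h) = (i, f, s, (n1 - n2) : v, h)
step (Op Add : i, f, s, n1 : n2 : v, h) = (i, f, s, (n1 + n2) : v, h)
step _ = error "step: no rule applies"

-- Run to completion and deliver the final value stack.
run :: State -> [Int]
run ([], _, _, v, _) = v
run state            = run (step state)

answer :: [Int]
answer = run (mainCode, FrameNull, [([], FrameNull)], [], [])
```

Running it leaves @answer@ equal to @[2]@.  The value of @b@ is pushed first
and the value of @a@ second, so @a@ is on top when @Op Sub@ executes and the
result is $4-2$.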

\subsection{Adding simple arithmetic to the implementation}

Now we are ready to modify our implementation.
We keep the modifications to a minimum by adding code for each of the
arithmetic functions to @compiledPrimitives@.  Recall that when we write
(for example)
@p-q@ in a program, the parser converts it to
\begin{verbatim}
	EAp (EAp (EVar "-") (EVar "p")) (EVar "q")
\end{verbatim}
All we need do is
to work out some suitable code for the primitive
@-@, and add this code to the code store.
The compiler can then treat @-@ in the same way as any other supercombinator.
Finally, the code for @-@ that we want
is exactly that which we developed in the
previous section for @sub@, and similar code is easy to write for other
arithmetic operations.

So the steps required are as follows:
\begin{itemize}
\item
Add the following type definition and initialisation for the value stack:

M2-> timValueStack == [num]
GH2-> type TimValueStack = [Int]
2-> initialValueStack = []

\item
Add the new instructions @PushV@, @Return@ and @Op@
to the @instruction@ data type.  We take the opportunity to add
one further instruction,
@Cond@, which has not yet been discussed but is the subject
of a later exercise.
TIM is no longer a three instruction machine!

M2> instruction ::= Take num
M2>                  | Push timAMode
M2>                  | PushV valueAMode
M2>                  | Enter timAMode
M2>                  | Return
M2>                  | Op op
M2>                  | Cond [instruction] [instruction]
GH2> data Instruction = Take Int
GH2>                  | Push TimAMode
GH2>                  | PushV ValueAMode
GH2>                  | Enter TimAMode
GH2>                  | Return
GH2>                  | Op Op
GH2>                  | Cond [Instruction] [Instruction]
M2>
M2-> op ::= Add  | Sub | Mult | Div | Neg
M2->        | Gr | GrEq | Lt | LtEq | Eq | NotEq
GH2-> data Op = Add  | Sub | Mult | Div | Neg
GH2->         | Gr | GrEq | Lt | LtEq | Eq | NotEq
H2->       deriving (Eq) -- KH

So far the argument of a @PushV@ instruction can only be @FramePtr@,
but we will shortly add a second form which allows us to push literal
constants onto the value stack.  So it is worth declaring an algebraic
data type for @valueAMode@:

M2-> valueAMode ::= FramePtr
M2->                | IntVConst num
GH2-> data ValueAMode = FramePtr
GH2->                 | IntVConst Int

The @showInstruction@ function must be altered to deal with this additional
structure.

\item
Modify the @step@ function to implement the extra instructions.  This is
just a question of translating the state transition rules into Miranda.

\item
Add to @compiledPrimitives@ suitable definitions for @+@, @-@ and so on.

\item
Now that @intCode@ is no longer empty, we must initialise
the stack to have
a suitable continuation (return address) for @main@ to return to.
The way to do this is to make @compile@ initialise the stack with the
closure @([],FrameNull)@, by redefining @initialArgStack@:

2-> initialArgStack = [([], FrameNull)]

This continuation has an empty code sequence, so
the machine will now halt with the result on top of the value stack.
\end{itemize}

\begin{exercise}
Implement these changes on your prototype.  Try it out on some simple
examples; for example
\begin{verbatim}
	four = 2 * 2
	main = four + four
\end{verbatim}
\end{exercise}

\begin{exercise}
We still cannot execute `interesting' programs, because we do not yet
have a conditional\index{conditional!in TIM},
and without a conditional we cannot use recursion.
A simple solution is to add a new instruction @Cond i1 i2@, which
removes a value from the top of the value stack, checks whether it was zero and
if so continues with instruction sequence @i1@, otherwise continues with
@i2@.  Here are its state transition rules:
\begin{etimruleV}
\arule  {\timstateV{[@Cond@~i_1~i_2]}{f}{s}{0:v}{h}{c}}
	{\timstateV{i_1}{f}{s}{v}{h}{c}}
\\ \hline
\arule  {\timstateV{[@Cond@~i_1~i_2]}{f}{s}{n:v}{h}{c}}
	{\timstateV{i_2}{f}{s}{v}{h}{c}} \\
& \multicolumn{6}{l|}{\mbox{where $n\not= 0$}}
\end{etimruleV}
The first rule matches if zero is on top of the value stack; otherwise the
second rule applies.

You also need to add a primitive @if@, which
behaves as follows:
\begin{verbatim}
	if 0 t f = t
	if n t f = f
\end{verbatim}
You need to work out the TIM code for @if@, using the @Cond@ instruction,
and add it to @compiledPrimitives@.  Finally, you can test your improved
system with the factorial function:
\sloppy
\begin{verbatim}
	factorial n = if n 1 (n * factorial (n-1))
	main = factorial 3
\end{verbatim}
\end{exercise}

\subsection{Compilation schemes for arithmetic}
\label{sect:tim:better-arith}

Just as with the G-machine, we can do a much better job of compiling for
our machine than we are doing at present.  Consider a function such as
\begin{verbatim}
	f x y z = (x+y) * z
\end{verbatim}
As things stand, this will get parsed to
\begin{verbatim}
	f x y z = * (+ x y) z
\end{verbatim}
and code for @f@ will get compiled which will call the standard functions
@*@ and @+@.  But we could do much better than this!  Instead of building
a closure for @(+ x y)@ and passing it to @*@, for example, we can just do
the operations in-line, using the following steps:
\begin{enumerate}
\item
evaluate @x@
\item
evaluate @y@
\item
add them
\item
evaluate @z@
\item
multiply
\item
return
\end{enumerate}
No closures need be built and no jumps need occur (except those needed to
evaluate @x@, @y@ and @z@).

To express this improvement, we introduce a new compilation scheme to deal
with expressions whose value is an integer,
the \tB{} scheme.  It is defined like this: for any expression
$e$ whose value is an integer, and for any code sequence $cont$,
\begin{important}
$(\B{e}~\rho~cont)$ is a code sequence which, when executed with a current
frame laid out as described by $\rho$, will push the
value of the expression $e$ onto the value stack, and then execute the code
sequence $cont$.
\end{important}
The compilation scheme uses a \stress{continuation-passing style}, in which
the $cont$ argument says what to do after the value has been computed.
Figure~\ref{fig:tim:arith2} gives the \tB{} compilation scheme,
together with the revised
\tR{} and \tA{} schemes.
When \tR{} finds an expression which is an arithmetic expression it calls
\tB{} to compile it.  \tB{} has special cases for constants and
applications of
arithmetic operators, which avoid explicitly pushing the continuation.
If it encounters an expression which it cannot handle specially, it just
pushes the continuation and calls \tR{}.
\begin{figure*}
$\begin{array}{|l|}
\hline
\\
%
\parbox[t]{29pc}{
$\R{e}~\rho$ is TIM code which applies the value of the expression
$e$ in environment $\rho$ to the arguments on the stack.
}
\\
\\
\begin{array}{rcll}
\R{e}~\rho              & = & \B{e}~\rho~ [@Return@] &
	\parbox[t]{2.0in}
		{where $e$ is an arithmetic expression, such as $e_1+e_2$,
		 or a number}   \\
\R{e_1~e_2}~\rho        & = & @Push@~(\A{e_2}~\rho)~@:@~\R{e_1}~\rho & \\
\R{a}~\rho      & = & @Enter@~(\A{a}~\rho) &
	\parbox[t]{2.0in}{where $a$ is a variable, or supercombinator}
\end{array} \\

\\
\hline
\\

\parbox[t]{29pc}{
$\A{e}~\rho$ is a TIM addressing mode for expression $e$
in environment $\rho$.}
\\
\\
\begin{array}{rcll}
\A{x}~\rho      & = & \rho~x &
		\mbox{where $x$ is bound by $\rho$}     \\
\A{n}~ \rho     & = & @IntConst@~ n     &
		\mbox{where $n$ is an integer constant} \\
\A{e}~\rho      & = & @Code@~(\R{e}~\rho) &
		\mbox{otherwise}
\end{array} \\

\\
\hline
\\

\parbox{29pc}{
$\B{e}~\rho~cont$ is TIM code which evaluates $e$ in environment $\rho$, and
puts its value, which should be an integer, on top of the value stack,
and then continues with the code sequence $cont$.}
\\
\\
\begin{array}{rcll}
\B{e_1 ~@+@~ e_2}~\rho~ cont
		& = & \B{e_2}~\rho~ (\B{e_1}~\rho~ (@Op Add@ ~:~ cont)) &  \\
\multicolumn{4}{l}{\qquad \mbox{\em \ldots and similar rules
		for other arithmetic primitives}} \\
\\
\B{n}~\rho~cont & = & @PushV@~(@IntVConst@~n) ~:~ cont
		& \mbox{where $n$ is a number}  \\
\B{e}~\rho~cont & = & @Push@~(@Code@~cont) ~:~ \R{e}~\rho
		& \mbox{otherwise}
\end{array} \\
\\
\hline
\end{array}$
\caption{Revised compilation schemes for arithmetic}
\label{fig:tim:arith2}
\end{figure*}

One new instruction is required, for the case where \tB{} is asked
to compile a constant: the instruction
$@PushV@~ (@IntVConst@~n)$ pushes an integer
constant onto the value stack.  Its transition rule is quite simple:
\timruleV
{\timstateV{@PushV@~(@IntVConst@~n):i}{f}{s}{v}{h}{c}}
{\timstateV{i}{f}{s}{n:v}{h}{c}}
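
The schemes of Figure~\ref{fig:tim:arith2} translate into code quite directly.
The following sketch is one possible transliteration over cut-down types; it
is an illustration under stated assumptions, not the definitive compiler.  The
helper @arith@, which recognises a full application of a built-in operator by
its name, and the example names @fEnv@, @fBody@ and @fCode@ are invented here.

```haskell
data Expr   = EVar String | ENum Int | EAp Expr Expr
data AMode  = Arg Int | Label String | Code [Instr] | IntConst Int
              deriving (Eq, Show)
data VAMode = FramePtr | IntVConst Int
              deriving (Eq, Show)
data Op     = Add | Sub | Mult | Div
              deriving (Eq, Show)
data Instr  = Take Int | Push AMode | PushV VAMode | Enter AMode
            | Return | Op Op
              deriving (Eq, Show)

type Env = [(String, AMode)]

-- Recognise (op e1 e2) where op is a built-in arithmetic operator.
arith :: Expr -> Maybe (Op, Expr, Expr)
arith (EAp (EAp (EVar name) e1) e2)
  = fmap (\op -> (op, e1, e2)) (lookup name ops)
  where ops = [("+", Add), ("-", Sub), ("*", Mult), ("/", Div)]
arith _ = Nothing

compileR :: Expr -> Env -> [Instr]
compileR e env = case (arith e, e) of
  (Just _, _)    -> compileB e env [Return]    -- arithmetic expression
  (_, ENum _)    -> compileB e env [Return]    -- a number
  (_, EAp e1 e2) -> Push (compileA e2 env) : compileR e1 env
  (_, EVar _)    -> [Enter (compileA e env)]

compileA :: Expr -> Env -> AMode
compileA (EVar v) env = maybe (error ("Unknown variable " ++ v)) id (lookup v env)
compileA (ENum n) _   = IntConst n
compileA e        env = Code (compileR e env)

compileB :: Expr -> Env -> [Instr] -> [Instr]
compileB e env cont = case (arith e, e) of
  (Just (op, e1, e2), _) -> compileB e2 env (compileB e1 env (Op op : cont))
  (_, ENum n)            -> PushV (IntVConst n) : cont
  _                      -> Push (Code cont) : compileR e env

-- Body of  f x y z = (x+y) * z
fEnv :: Env
fEnv = [("x", Arg 1), ("y", Arg 2), ("z", Arg 3)]

fBody :: Expr
fBody = EAp (EAp (EVar "*") (EAp (EAp (EVar "+") (EVar "x")) (EVar "y")))
            (EVar "z")

fCode :: [Instr]
fCode = compileR fBody fEnv
```

In @fCode@ the variable @z@ is entered first, then @y@, then @x@, because the
rule for a binary operator compiles $e_2$ before $e_1$; the innermost
continuation is @[Op Add, Op Mult, Return]@.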

\begin{exercise}
Implement the improved compilation scheme.
Compare the performance of your implementation
with that from before.
\end{exercise}

\begin{exercise}
\label{ex:tim:cond1}
Add a new rule to the \tR{} scheme to match a (full) application of
@if@.  You should be able to generate much better code than you get
by calling the @if@ primitive.  Implement the change and measure the
improvement in performance.
\end{exercise}

\begin{exercise}
\label{ex:tim:cond2}
Suppose we want to generalise our conditionals to deal with more general
arithmetic comparisons, such as that required by
\begin{verbatim}
	fib n = if (n < 2) 1 (fib (n-1) + fib (n-2))
\end{verbatim}
What is required is a new instruction @Op Lt@ which pops the top two items on
the value stack, compares them, and pushes @1@ or @0@ onto the value stack
depending on the result of the comparison.
Now the @Cond@ instruction can inspect this result.

Implement a family of such comparison instructions, and add special cases
for them to
the \tB{} scheme, in exactly the same way as for the other
arithmetic operators.  Test your improvement.
\end{exercise}

\begin{exercise}
In the previous exercise, you may have wondered why we did not modify the
@Cond@ instruction so that it had an extra `comparison mode'.  It could
then compare the top two items on the value stack according to this mode,
and act accordingly.  Why did we not do this?

Hint: what would happen for programs like this?
\begin{verbatim}
	multipleof3 x = ((x / 3) * 3) == x
	f y = if (multipleof3 y) 0 1
\end{verbatim}
\end{exercise}
%
% Once the $\tB{} scheme have been implemented, it looks as
% though we might no longer need the hand-generated code for
% code for @+@, @*@, (and other arithmetic operators)
% in @compiledPrimitives@.  But we still do
%
% rather implausible-looking definitions in @preludeDefs@:
% \begin{verbatim}
%       + a b = a + b
%       * a b = a * b
% \end{verbatim}
% They look as if they are defining @+@ in terms of @+@ and @*@ in terms of
% @*@.
% But the compiler will not generate any reference to @+@ when
% compiling the right-hand-side of the first definition, because the
% right-hand-side matches a special case in the \tB{} scheme.
% Indeed, it will thereby generate precisely the code which we have generated
% by hand in @compiledPrimitives@.
% As Lennart Augustsson put it, in a comment in the LML compiler source code:
% `The ice is thin here --- but it works'.
%
The material of this section is discussed in \cite{Argo89}
and corresponds precisely to the improved G-machine
compilation schemes discussed
in Chapter 20 of \cite{PJBook}.

\section{Mark 3: @let(rec)@ expressions}
\index{TIM!Mark 3}

At present the compiler cannot handle
@let(rec)@ expressions\index{let(rec) expressions@@@let(rec)@ expressions},
a problem which
we remedy in this section.  Two main new ideas are introduced:
\begin{itemize}
\item
We modify the @Take@ instruction to allocate a frame with extra space
to contain
the @let(rec)@-bound variables, as well as the formal parameters.
\item
We introduce the idea of an {\em indirection closure}.
\end{itemize}

\subsection{@let@ expressions}
\index{let expressions@@@let@ expressions!in TIM}

When we compile a @let@ expression, we must generate code
to build new closures for the right-hand sides of the
definitions.
Where should these new closures be put?
In order to treat @let(rec)@-bound names in the same way as argument
names, they have to be put in the current frame\footnote{%
A hint of the material in this section is in \cite{WD89}, but it
is not fully worked out.}.
This requires two modifications to the run-time machinery:
\begin{itemize}
\item
The @Take@ instruction should allocate a frame large enough to contain
closures for all the @let@ definitions which can occur during the
execution of the supercombinator.
The @Take@ instruction must be modified to the form $@Take@~t~n$, where
$t\geq n$.
This instruction allocates a frame of size $t$, takes $n$ closures from the
top of the stack,
and puts them into the first $n$ locations of the frame.
\item
We need a new instruction, $@Move@~i~a$,  for moving a new closure
$a$ into slot $i$ of the current frame.
Here $a$ is of type @timAMode@ as for @Push@ and @Enter@.
\end{itemize}

For example, the following definition:
\begin{verbatim}
	f x = let y = f 3 in g x y
\end{verbatim}
would compile to this code:
\begin{verbatim}
	[ Take 2 1,
	  Move 2 (Code [Push (IntConst 3), Enter (Label "f")]),
	  Push (Arg 2),
	  Push (Arg 1),
	  Enter (Label "g")
	]
\end{verbatim}
Here is a slightly more elaborate example:
\begin{verbatim}
	f x = let y = f 3
	      in
	      g (let z = 4 in h z) y
\end{verbatim}
which generates the code:
\begin{verbatim}
	[ Take 3 1,
	  Move 2 (Code [Push (IntConst 3), Enter (Label "f")]),
	  Push (Arg 2),
	  Push (Code [Move 3 (IntConst 4), Push (Arg 3), Enter (Label "h")]),
	  Enter (Label "g")
	]
\end{verbatim}
Notice the way that the initial @Take@ allocates space for {\em all\/} the
slots required by any of the closures in the body of the supercombinator.
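
To make the effect of the two instructions on the frame concrete, here is a
small sketch which models the frame alone.
It is an illustration rather than part of the machine:
the names @Frame@, @takeFrame@ and @moveSlot@ are invented for this sketch,
closures are left abstract, and the heap, frame pointer and the rest of the
machine state (which the exercise below asks you to deal with) are ignored.
\begin{verbatim}
-- A frame is a list of slots; Nothing marks a slot not yet initialised.
type Frame c = [Maybe c]

-- Take t n: allocate t slots and fill the first n from the top of the
-- stack (the head of the list).  Fails if t < n or the stack is too short.
takeFrame :: Int -> Int -> [c] -> Maybe (Frame c, [c])
takeFrame t n stack
  | t < n || length stack < n = Nothing
  | otherwise = Just (map Just args ++ replicate (t - n) Nothing, rest)
  where (args, rest) = splitAt n stack

-- Move i c: overwrite slot i (counting from 1) with closure c.
-- Assumes 1 <= i <= length frame.
moveSlot :: Int -> c -> Frame c -> Frame c
moveSlot i c frame = take (i - 1) frame ++ [Just c] ++ drop i frame
\end{verbatim}
In this model, @Take 2 1@ applied to a stack holding one argument gives a
two-slot frame whose second slot is empty, and a subsequent @Move 2@ fills it.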

\begin{exercise}
Write state transition rules for the new @Take@ and @Move@ instructions.
\end{exercise}

Next, we need to modify the compiler to generate the new @Take@ and @Move@
instructions.
When we encounter a @let@ expression we need to assign a free slot in the
frame to each bound variable, so we
need to keep track of which slots in the
frame are in use and which are free.
To do this, we add an extra parameter $d$ to each compilation scheme,
to record that the frame slots from $d+1$ onwards are free, but that
the slots from $1$ to $d$ might be occupied.

The remaining complication is that we need to discover the maximum value
that $d$ can take, so that we can allocate a big enough frame with the
initial @Take@ instruction.  This requires each compilation scheme to return
a pair: the compiled code, and the maximum value taken by $d$.
The new compilation schemes are given in Figure~\ref{fig:tim:let-schemes}.
(In this figure, and subsequently,
we use the notation $is_1 \plusplus is_2$ to denote the
concatenation of the instruction sequences $is_1$ and $is_2$.)
\begin{figure*}
$\begin{array}{|l|}
\hline
\\
\parbox{29pc}{
$\SC{def}~\rho$ is the TIM code for the supercombinator definition $def$
compiled in environment $\rho$.} \\
\\
\begin{array}{lcl}
\SC{f~ x_1~ \ldots~ x_n~ =~ e}~\rho & = & @Take@~ d'~ n~ @:@~ is        \\
&& \mbox{where}~ (d', is) = \R{e}~
	\rho[x_1 \mapsto @Arg@~1,\ldots,x_n \mapsto @Arg@~n]~ n
\end{array} \\

\\
\hline
\\

\parbox[t]{29pc}{
$\R{e}~\rho~d$ is a pair $(d', is)$, where $is$ is
TIM code which applies the value of the expression
$e$ in environment $\rho$ to the arguments on the stack.
The code $is$ assumes that the first $d$ slots of the frame are occupied, and
it uses slots $(d+1 \ldots d')$.}
\\
\\
\begin{array}{rcl}
\\
\R{e}~\rho~d              & = & \B{e}~\rho~d~[@Return@] \\
	&& \mbox{where $e$ is an arithmetic expression or a number}     \\
\\
\multicolumn{3}{l}{
\R{@let@~ x_1@=@e_1@;@~\ldots@;@~ x_n@=@e_n ~@in@~ e}~ \rho~ d} \\
 & = &
	(d',~ [@Move@~ (d+1)~ am_1,  \ldots,
		~ @Move@~ (d+n)~ am_n] ~\plusplus~ is)  \\
	&& \mbox{where}~ \begin{array}[t]{lcl}
		(d_1, am_1) & = & \A{e_1}~ \rho~ (d+n)  \\
		(d_2, am_2) & = & \A{e_2}~ \rho~ d_1    \\
		\ldots &&                               \\
		(d_n, am_n) & = & \A{e_n}~ \rho~ d_{n-1}\\
		\rho' & = &
	\rho[x_1 \mapsto @Arg@~(d+1), \ldots, x_n \mapsto @Arg@~(d+n)]  \\
		(d', is) & = & \R{e}~ \rho'~ d_n
	       \end{array}      \\

\\
\R{e_1~ e_2}~ \rho~ d   & = & (d_2, ~@Push@~ am ~@:@~ is)       \\
	&& \mbox{where}~\begin{array}[t]{lcl}
			(d_1, am) & = & \A{e_2}~ \rho~ d        \\
			(d_2, is) & = & \R{e_1}~ \rho~ d_1
		      \end{array}                       \\
\\
\R{a}~ \rho~ d  & = & (d',~ [@Enter@~ am])                      \\
	&& \mbox{where $a$ is a constant, supercombinator or local variable} \\
	&& \mbox{and}~(d', am) = \A{a}~ \rho~ d
\end{array} \\

\\
\hline
\\

\parbox[t]{29pc}{
$\A{e}~\rho~ d$ is a pair $(d', am)$, where $am$ is a TIM
addressing mode for expression $e$
in environment $\rho$.
The code assumes that the first $d$ slots of the frame are occupied, and
it uses slots $(d+1 \ldots d')$.
}
\\
\\
\begin{array}{rcll}
\A{x}~ \rho~ d  & = & (d,~ \rho~ x) &
			\mbox{where $x$ is bound by $\rho$}     \\
%
\A{n}~ \rho~ d  & = & (d,~ @IntConst@~ n)       &
		\mbox{where $n$ is an integer constant} \\
%
\A{e}~ \rho~ d  & = & (d',~ @Code@~ is) & \mbox{otherwise}      \\
		&& \mbox{where}~ (d', is) = \R{e}~ \rho~ d
\end{array} \\
\\
\hline
\end{array}$
\caption{Compilation schemes for @let@ expressions}
\label{fig:tim:let-schemes}
\end{figure*}

In the \tSC{} scheme you can see how the maximum frame size $d'$,
returned from the compilation of the supercombinator body, is used to
decide how large a @Take@ to perform.
In the @let@ expression case of the \tR{} scheme,
for each definition we generate
an instruction $@Move@~ i~ a$,
where $i$ is the number of a free slot in the current frame,
and $a$ is the result of compiling the corresponding right-hand side $e_i$ with the \tA{} scheme.
Notice the way in which the compilation of each right-hand side
is given the index of the last slot occupied by the previous right-hand side,
thus ensuring that all the right-hand sides use different slots.
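
As a worked illustration, here is how $d$ is threaded when the schemes of
Figure~\ref{fig:tim:let-schemes} are applied by hand to the second example
given earlier,
@f x = let y = f 3 in g (let z = 4 in h z) y@.
The trace is informal: each line shows the scheme applied, the expression,
and the value of $d$ passed in or returned.
\begin{verbatim}
	SC  f x = ...                 body compiled with d = 1 (x is in slot 1)
	R   let y = f 3 in ...        y gets slot 2; rhs compiled with d = 2
	A     f 3                     uses no new slots; returns d = 2
	R   g (let z = 4 in h z) y    compiled with d = 2
	A     y                       gives Arg 2; returns d = 2
	A     let z = 4 in h z        z gets slot 3; returns d = 3
	R     g                       gives Enter (Label "g"); returns d = 3
\end{verbatim}
The final value $d' = 3$ is what gives rise to the initial @Take 3 1@ in the
code shown for this example.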

\begin{exercise}
Implement the changes described in this section: add the
new instructions to the @instruction@ type, add new cases to @step@
and @showInstruction@ to handle them,
and implement the new compilation schemes.
\end{exercise}

\begin{exercise}
Consider the program
\begin{verbatim}
	f x y z = let p = x+y in p+x+y+z
	main = f 1 2 3
\end{verbatim}
In the absence of @let@ expressions, it would have to be written using an
auxiliary function, like this:
\begin{verbatim}
	f' p x y z = p+x+y+z
	f x y z = f' (x+y) x y z
	main = f 1 2 3
\end{verbatim}
Compare the code generated by these two programs, and measure the difference
in store consumed and steps executed.  What is the main saving obtained
by implementing @let@ expressions directly?
\end{exercise}

\subsection{@letrec@ expressions}
\label{sect:tim:letrec-expr}
\index{letrec expressions@@@letrec@ expressions!in TIM}

What needs to be done to handle @letrec@ expressions as well?
At first it seems very easy: the @letrec@ case for the \tR{} scheme is
exactly the same as the @let@ case, except that we need to replace
$\rho$ by $\rho'$ in the definitions of the $am_i$.
This is because the $x_i$ are in scope in their own right-hand sides.

\begin{exercise}
Implement this extra case in @compileR@, and try it out on the program
\begin{verbatim}
	f x = letrec p = if (x==0) 1 q ;
		     q = if (x==0) p 2
	      in p+q
	main = f 1
\end{verbatim}
Make sure you understand the code which is generated, and test it.
\end{exercise}

Unfortunately there is a subtle bug in this implementation!
Consider the code generated from:
\begin{verbatim}
	f x = letrec a = b ;
		     b = x
	      in a
\end{verbatim}
which is as follows:
\begin{verbatim}
	[Take 3 1, Move 2 (Arg 3), Move 3 (Arg 1), Enter (Arg 2)]
\end{verbatim}
The closure for @b@ is copied by the first @Move@ before it is
assigned by the second @Move@!

There are two ways out of this.  The first is to declare that
this is a silly program; just replace @b@ by @x@ in the scope of the
binding for @b@.
But there is a more interesting approach which will be instructive later,
and which allows even silly programs like the one above to work.
Suppose we generate instead the following code for the first @Move@:
\begin{verbatim}
	Move 2 (Code [Enter (Arg 3)])
\end{verbatim}
Now everything will be fine: slot 3 will be assigned before the
@Enter (Arg 3)@ gets executed.
In fact you can think of the closure @([Enter (Arg 3)], f)@ as
an \stress{indirection} to slot @3@ of frame @f@.

This code can be obtained by modifying the @let(rec)@ case of
the \tR{} compilation scheme, so that it records an {\em indirection
addressing mode\/} in the environment for each variable
bound by the @let(rec)@.
Referring to Figure~\ref{fig:tim:let-schemes}, the change required is to
the rule for @let@ in the \tR{} scheme, where the definition of $\rho'$
becomes
\[
\rho' =
\begin{array}[t]{l}
\rho[x_1 \mapsto \I{d+1}, \ldots, x_n \mapsto \I{d+n}] \\
\mbox{where}~ \I{d} = @Code@~[@Enter@~(@Arg@~d)]
\end{array}
\]
Of course, this is rather conservative: it returns an indirection in
lots of cases where it is not necessary to do so, but the resulting code
will still work fine, albeit less efficiently.

\begin{exercise}
\label{ex:tim:letrec-bug}
Modify @compileR@ to implement this idea, and
check that it generates correct code for the above example.
In modifying @compileR@ use the auxiliary function @mkIndMode@
(corresponding to the \tI{} scheme)
to generate the indirection addressing modes in the new environment:

M> mkIndMode :: num -> timAMode
GH> mkIndMode :: Int -> TimAMode
> mkIndMode n = Code [Enter (Arg n)]

\end{exercise}

\subsection{Reusing frame slots\index{reusing frame slots}\advanced}

At present, every definition on the right-hand side of a supercombinator
definition gets its own private slot in the frame.
Sometimes you may be able to figure out that you can safely share
slots between different @let(rec)@s.
For example, consider the definition
\begin{verbatim}
	f x = if x (let ... in ...) (let ... in ...)
\end{verbatim}
Now it is plain that only one of the two @let@ expressions can ever be
evaluated, so it would be perfectly safe to use the same slots for their
definitions.

Similarly, in the expression $e_1~ @+@~ e_2$, any @let(rec)@ slots used
during the evaluation of $e_1$ will be finished with by the time $e_2$
is evaluated (or vice versa if @+@ happened to evaluate its arguments in
the reverse order), so any @let(rec)@-bound variables in $e_1$ can share
slots with those in $e_2$.

\begin{exercise}
Modify your compiler to spot this and take advantage of it.
\end{exercise}

\subsection{Garbage collection\index{garbage collection}\advanced}
\label{sect:tim:gc2}

In Section~\ref{sect:tim:gc1} we remarked that it would be desirable to
record which frame slots were used by a code sequence, so that space leaks
can be avoided.
If @Take@ does not initialise the extra frame slots which it allocates,
there is a danger that the garbage collector will treat the contents
of these uninitialised slots as valid pointers, with unpredictable results.
The easiest solution is to initialise all slots, but this is quite expensive.
It is better to adopt the solution of Section~\ref{sect:tim:gc1}, and
record with each code sequence the list of slots which should be retained.
Uninitialised slots will then never be looked at by the garbage collector.

\section{Mark 4: Updating}\index{updates!in TIM}
\label{sect:tim-updates}
\index{TIM!Mark 4}

So far we have been performing tree reduction not graph reduction,
because we repeatedly evaluate shared redexes.
It is time to fix this.
Figuring out exactly how the various ways of performing TIM updates
work is a little tricky, but at least we have a prototype implementation
so that our development will be quite concrete.

\subsection{The basic technology}

The standard template instantiation machine and the G-machine
perform an update after every reduction.  (The G-machine has a few
optimisations for tail calls, but the principle is the same.)
Because TIM is a {\em spineless\/}\index{spinelessness} machine,
its updating technique has to be rather different.
The key idea is this:
\begin{important}
Updates are not performed after each
reduction, as they are in the G-machine.
Instead,
when evaluation of a closure is started (that is, when
the closure is entered), the following steps are taken:
\begin{itemize}
\item
The current stack, and the address of the closure being entered,
are pushed onto the \stress{dump}, a new component of the machine state.
\item
A `mouse-trap'\index{mouse-trap}
is set up which is triggered when evaluation of the closure
is complete.
\item
Evaluation of the closure now proceeds normally, starting with an empty stack.
\item
When the mouse-trap is triggered, the closure is updated with
its normal form, and the old stack is restored from the dump.
\end{itemize}
The `mouse-trap' is the following. Since the evaluation of the
closure is carried out on a new stack,
the evaluation must eventually grind to a halt, either because
a @Return@ instruction finds an empty stack,
or because a supercombinator is applied to too few arguments.
At this point the expression
has reached (head) normal form, so an update should be performed.
\end{important}
To begin with, let us focus on updating a closure whose value is
of integer type.
We will arrange that just before the closure is entered, a new
instruction $@PushMarker@~ x$ is executed, which sets up the update mechanism
by pushing some information on the dump.  Specifically, $@PushMarker@~x$
pushes onto the dump\footnote{%
In papers about TIM this operation is often called `pushing an update
marker'\index{marker}, because the stack is `marked' so that the attempt to
use arguments below the `marker' will trigger an update.
}:
\begin{itemize}
\item
The current stack.
\item
The current frame pointer, which points to the frame
containing the closure to be updated.
\item
The index, $x$, of the closure to be updated within the frame.
\end{itemize}
Now that it has saved the current stack on the dump,
@PushMarker@ continues with an empty stack.
Here is its state transition rule:
\timruleVD
{\timstateVD{@PushMarker@~ x : i}{f}{s}{v}{d}{h}{c}}
{\timstateVD{i}{f}{[]}{v}{(f,x,s):d}{h}{c}}

\par
Some while later, evaluation of the closure will be complete.  Since its
value is an integer, it will be on top of the value stack, and
a @Return@ instruction will be executed in the expectation of returning
to the continuation on top of the stack.  But the stack will be empty
at this point!  This is what triggers the update: the dump is popped,
the update is performed, and the @Return@ instruction is re-executed
with the restored stack.
This action is described by the following transition rules:
\begin{etimruleVD}
\arule{\timstateVD{[@Return@]}{f}{[]}{n:v}{(f_u,x,s):d}{h}{c}}
      {\timstateVD{[@Return@]}{f}{s}{n:v}{d}{h'}{c} \\
	& \multicolumn{7}{l|}{h' = h[f_u:\langle \ldots, d_{x-1},
		(@intCode@, n), d_{x+1}, \ldots \rangle]}
}
\\ \hline
\arule{\timstateVD{[@Return@]}{f}{(i,f'):s}{n:v}{d}{h}{c}}
      {\timstateVD{i}{f'}{s}{n:v}{d}{h}{c}}
\end{etimruleVD}

The first rule describes the update of the $x$th closure in frame $f_u$,
with the closure $(@intCode@, n)$.  This is a closure which when entered
immediately pushes $n$ onto the value stack and returns
(see Section~\ref{sect:tim:arith-overview}).
Notice that the @Return@ instruction is retried, in case there is a further
update to perform; this is indicated by the fact that the code
sequence in the right-hand side of the rule is still $[@Return@]$.

The second rule is just \ruleref{rule:tim-return} written out again.
It covers the case when there is no update to be performed; the
continuation on top of the stack is loaded into the program counter and
current frame pointer.
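
The two rules involving the dump can be sketched as functions over a
cut-down state.
This is only a sketch under simplifying assumptions: the types and the names
@pushMarker@ and @returnUpdate@ are invented here, continuations on the stack
are kept abstract as strings, @IntClosure n@ stands for the closure
$(@intCode@, n)$, and the heap is modelled with @Data.Map@ rather than with
the heap of the @Utils@ module.
\begin{verbatim}
import qualified Data.Map as Map

type FramePtr = Int
data Closure  = IntClosure Int       -- stands for (intCode, n)
              | Unevaluated String   -- any other closure, kept abstract
              deriving (Eq, Show)
type Heap     = Map.Map FramePtr [Closure]
type Stack    = [String]             -- continuations, kept abstract
type Dump     = [(FramePtr, Int, Stack)]

-- PushMarker x: save the current frame pointer f, the slot index x and
-- the stack s on the dump, and continue with an empty stack.
pushMarker :: Int -> FramePtr -> Stack -> Dump -> (Stack, Dump)
pushMarker x f s d = ([], (f, x, s) : d)

-- Return finding an empty stack, with n on top of the value stack:
-- pop the dump, overwrite slot x of frame fu, and restore the old stack.
-- The caller should then retry Return with the result.
returnUpdate :: Int -> Dump -> Heap -> Maybe (Stack, Dump, Heap)
returnUpdate _ []               _ = Nothing
returnUpdate n ((fu, x, s) : d) h = Just (s, d, Map.adjust setSlot fu h)
  where setSlot cs = take (x - 1) cs ++ [IntClosure n] ++ drop x cs
\end{verbatim}
Returning the restored stack, rather than resuming a continuation directly,
mirrors the fact that the rule leaves $[@Return@]$ as the code to be executed,
so that a further update can be triggered if the restored stack is also empty.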

\subsection{Compiling @PushMarker@ instructions}

The execution of the @PushMarker@ and @Return@ instructions
is thus quite straightforward,
but the tricky question is: where should the compiler plant @PushMarker@
instructions?
To answer this we have to recall the motivation for the whole updating
exercise: it is to ensure that each redex\index{redex} is only evaluated once,
by overwriting the redex with its value once it has been evaluated.
In TIM, a `redex' is a closure.  The key insight is this:
\begin{important}
we have to be very careful when
copying closures, because
once two copies exist there is no way we can ever share their evaluation.
\end{important}
For example, consider the following function definitions:
\begin{verbatim}
	g x = h x x
	h p q = q - p
\end{verbatim}
At present we will generate the
following code for @g@:
\begin{verbatim}
	[ Take 1 1, Push (Arg 1), Push (Arg 1), Enter (Label "h") ]
\end{verbatim}
The two @Push Arg@ instructions will each take a copy of the same closure,
and @h@ will subsequently evaluate each of them independently.

What we really want to do is to push not a {\em copy\/} of the closure
for @x@, but rather a {\em pointer to\/} it.
Recalling the idea of an \stress{indirection} closure
from Section \ref{sect:tim:letrec-expr}, this is easily done, by replacing
@Push (Arg 1)@ with @Push (Code [Enter (Arg 1)])@.

This gets us half-way; we are not duplicating the closure, but we still are not
updating it.  But now it is easy!  All we need to do is to precede the
@Enter (Arg 1)@ instruction with @PushMarker 1@, thus:
\[
@Push (Code [PushMarker 1, Enter (Arg 1)])@
\]
That is, just before entering the shared closure, we set up the update
mechanism which will cause it to be updated when its evaluation
is complete.

The addressing mode $(@Code@~ [@PushMarker@~ n,~ @Enter@~ (@Arg@~ n)])$
is called an \stress{updating indirection} to the $n$th closure of the frame,
because it is an indirection which will cause an update to take place.
Exactly the same considerations apply to entering an argument (rather than
pushing it on the stack).  An updating indirection to the argument
must be entered, rather than the argument itself:
@Enter (Arg 1)@ must be replaced by
@Enter (Code [PushMarker 1, Enter (Arg 1)])@.

The changes to the compilation schemes
are simple.  Only the \tSC{} and \tR{} schemes are affected,
and they are both affected in the same way:
where they build an environment, they should bind each variable
to an updating indirection addressing mode.  For example, in the
@let(rec)@ case of the \tR{} scheme, we now use the following definition
for $\rho'$ (cf.\ Figure~\ref{fig:tim:let-schemes}):
\[
\rho' =
\begin{array}[t]{l}
\rho[x_1 \mapsto \J{d+1}, \ldots, x_n \mapsto \J{d+n}] \\
\mbox{where}~ \J{d} = @Code@~[@PushMarker@~d,~@Enter@~(@Arg@~d)]
\end{array}
\]

\subsection{Implementing the updating mechanism}

To implement the new updating mechanism, we need to make the following changes:
\begin{itemize}
\item
Give a type definition for the dump\index{dump!in TIM}.  It is just a stack of
triples, represented as a list, and initialised to be empty:

M4-> timDump == [(framePtr,  || The frame to be updated
M4->              num,       || Index of slot to be updated
M4->              timStack)  || Old stack
M4->            ]
GH4-> type TimDump = [(FramePtr,  -- The frame to be updated
GH4->                  Int,       -- Index of slot to be updated
GH4->                  TimStack)  -- Old stack
GH4->                ]
4-> initialDump = []

\item
Add the @PushMarker@ instruction to the @instruction@ type, with appropriate
modifications to @showInstruction@.
\item
Add a new case to @step@ for the @PushMarker@ instruction, and modify the
case for @Return@.
\item
Modify @compileSC@ and the @ELet@ case of @compileR@
to build environments which bind each variable to an updating
indirection addressing mode.
Use the function @mkUpdIndMode@ to implement the \tJ{} scheme:

M4-> mkUpdIndMode :: num -> timAMode
GH4-> mkUpdIndMode :: Int -> TimAMode
4-> mkUpdIndMode n = Code [PushMarker n, Enter (Arg n)]

\end{itemize}

\begin{exercise}
Implement the updating mechanism as described.

When running the new system on some test programs, you should
be able to watch @PushMarker@ adding update information to the dump,
and @Return@ performing the updates.  Here is one possible test
program:
\begin{verbatim}
	f x = x + x
	main = f (1+2)
\end{verbatim}
The evaluation of @(1+2)@ should only happen once!
\end{exercise}

\begin{exercise}
Here is an easy optimisation you can perform.  For a function such as:
\begin{verbatim}
	compose f g x = f (g x)
\end{verbatim}
you will see that the code for @compose@ is like this:
\begin{verbatim}
compose:        Take 3 3
		Push (Code [...])
		Enter (Code [PushMarker 1, Enter (Arg 1)])
\end{verbatim}
where the @[...]@ is code for @(g x)@.  The final instruction enters
an updating indirection for @f@.
But it is a fact that
\[
@Enter@~(@Code@~i) \qquad \mbox{is equivalent to} \qquad i
\]
(This follows immediately from \ruleref{rule:tim:enter-code}.)
So the equivalent code for @compose@ is
\begin{verbatim}
compose:        Take 3 3
		Push (Code [...])
		PushMarker 1
		Enter (Arg 1)
\end{verbatim}
Implement this optimisation.  Much the nicest way to do this is by
replacing all expressions in the compiler of the form $@[Enter@~e@]@$
with $@(mkEnter@~e@)@$, where @mkEnter@ is
defined like this:

M> mkEnter :: timAMode -> [instruction]
GH> mkEnter :: TimAMode -> [Instruction]
> mkEnter (Code i) = i
> mkEnter other_am = [Enter other_am]

@mkEnter@ is an `active' form of the @Enter@ constructor, which
checks for a special case before generating an @Enter@ instruction.
\end{exercise}

There are a number of other improvements we can make to this scheme,
and we will study them in the following sections.

\subsection{Problems with updating indirections}\index{updating indirections}

While it is simple enough, this updating mechanism is horribly inefficient.
There are two main problems, which were first distinguished by
Argo \cite{Argo89}.
The first problem is that of {\em identical updates}\index{updates!identical}.  
Consider the program
given in the previous section:
\begin{verbatim}
	f x = x+x
	main = f (1+2)
\end{verbatim}
For each use of @x@, @f@ will enter an updating indirection to its argument @x@.
The first time, @x@ will be updated with its value.  The second time,
a second (and entirely redundant) update will take place, which overwrites
the @x@ with its value again.  You should be able to watch this happening
as you execute the example on your implementation.

In this case, of course, a clever compiler
could spot that @x@ was sure to be evaluated, and just copy @x@ instead
of entering an indirection to it.  But this complicates the compiler, and
in general may be impossible to spot.  For example, suppose @f@ was defined
like this:
\begin{verbatim}
	f x = g x x
\end{verbatim}
Unless @f@ analyses @g@ to discover in which order the different @x@'s are
evaluated (and in general there may be no one answer to this question), it
has to be pessimistic and push updating indirections as arguments to @g@.

The second problem is that of \stress{indirection chains}.  Consider the
program
\begin{verbatim}
	g x = x+x
	f y = g y
	main = f (1+2)
\end{verbatim}
Here, @f@ passes to @g@ an updating indirection to its argument @y@; but
@g@ enters an updating indirection to its argument @x@.  Thus @g@ enters
an indirection to an indirection. In short, chains of indirections
build up, because {\em an indirection is added
every time an argument is passed on as an argument to another function}.
Just imagine how many indirections to @m@
could build up in the following tail-recursive function!
\begin{verbatim}
	horrid n m = if (n==0) m (horrid (n-1) m)
\end{verbatim}
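
The growth of the chain can be pictured with a toy model.
This is purely illustrative: the type @Closure@ and the functions @passOn@,
@chainLength@ and @afterCalls@ are invented for the purpose, and the model
ignores updates altogether, recording only how many indirections lie between
a use of @m@ and the original closure.
\begin{verbatim}
-- A closure is either the original argument or an indirection to
-- another closure.
data Closure = Original | Ind Closure

-- Passing an argument on to another function wraps it in one more
-- indirection.
passOn :: Closure -> Closure
passOn = Ind

-- Number of indirections to be entered before the original is reached.
chainLength :: Closure -> Int
chainLength Original = 0
chainLength (Ind c)  = 1 + chainLength c

-- The closure received as m after k recursive calls.
afterCalls :: Int -> Closure
afterCalls k = iterate passOn Original !! k
\end{verbatim}
In this model @chainLength (afterCalls k)@ is @k@, so the chain grows
linearly with the depth of the recursion.
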
We will not solve these problems yet.  Instead, the next section shows
how to deal with updates for @let(rec)@-bound variables, and why these
two problems do not arise.  This points the way to a better solution for
supercombinator arguments as well.

\subsection{Updating shared @let(rec)@-bound variables}
\label{sect:tim:self-update}

So far we have assumed that @let(rec)@-bound variables are updated
in exactly the same way as supercombinator arguments, by always using
an updating indirection addressing mode for them.
For example, consider the following supercombinator definition:
\begin{verbatim}
	f x = let y = ...
	      in
	      g y y
\end{verbatim}
where the `@...@' stands for an arbitrary right-hand side for @y@.
Treating @y@ just the same as @x@, we will generate this code for @f@:
\begin{verbatim}
f:      Take 2 1                                    -- Frame with room for y
	Move 2 (Code [...code for y...])            -- Closure for y
	Push (Code [PushMarker 2, Enter (Arg 2)])   -- Indirection to y
	Push (Code [PushMarker 2, Enter (Arg 2)])   -- Indirection to y
	Enter (Label "g")
\end{verbatim}
where the `@...code for y...@' stands for the code generated from
@y@'s right-hand side.
This code suffers from the 
identical-update\index{updates!identical} problem outlined earlier.

But a much better
solution is readily available.
Suppose we generate the following code for @f@ instead:
\begin{verbatim}
f:      Take 2 1                            -- Frame with room for y
	Move 2 (Code (PushMarker 2 :
		      [...code for y...]))  -- Closure for y
	Push (Code [Enter (Arg 2)])         -- Non-updating indirection to y
	Push (Code [Enter (Arg 2)])         -- Indirection to y
	Enter (Label "g")
\end{verbatim}
The @PushMarker@ instruction has moved from the {\em uses\/} of @y@
to its {\em definition}.  The closure for @y@ built by the @Move@ instruction
is now a \stress{self-updating closure};
that is, when entered it will set up the update
mechanism which will update itself.  Once this has happened, it will never
happen again because the pointer to the code with the @PushMarker@ instruction
has now been overwritten!

In general, the idea is this:
\begin{itemize}
\item
Use a self-updating closure for the right-hand side of a @let(rec)@ binding,
by beginning the code with a @PushMarker@ instruction.
\item
Use ordinary (non-updating) indirection addressing modes when pushing
@let(rec)@-bound variables onto the stack.
We still need to use indirections, rather than taking a copy of the
closure because, until it is updated, copying it would give rise to
duplicated work.
\end{itemize}
The modifications needed to implement this idea are:
\begin{itemize}
\item
Modify the \tR{} scheme so that it generates {\em non-updating\/} indirections
for
@let(rec)@-bound variables; that is, it builds the new environment
using the \tI{} scheme rather than \tJ{}.
(For the present, \tSC{} should continue to generate
{\em updating\/} indirections, using \tJ{}, for supercombinator arguments.)

\item
Modify the \tR{} scheme for @let(rec)@ expressions, so that it generates
a @PushMarker@ instruction at the start of the code for every right-hand
side.
This is most conveniently done by creating a new compilation scheme,
the \tAL{} scheme (see Figure~\ref{fig:tim:al-scheme}),
which is used in the @let(rec)@ case of the \tR{}
scheme to compile the right-hand sides of definitions.
\tAL{} needs an extra argument to tell it which slot in the current
frame should be updated, and uses this argument to generate an
appropriate @PushMarker@ instruction.
Figure~\ref{fig:tim:al-scheme} also gives the revised @let@ equation for
the \tR{} scheme.  The modification to the @letrec@ case is exactly
analogous.
\begin{figure*}
$\begin{array}{|l|}
\hline
\\
\parbox[t]{29pc}{
$\AL{e}~u~\rho~ d$ is a pair $(d', am)$, where $am$ is a TIM
addressing mode for expression $e$
in environment $\rho$.
If the closure addressed by $am$ is entered, it will update slot $u$
of the current frame with its normal form.
The code assumes that the first $d$ slots of the frame are occupied, and
it uses slots $(d+1 \ldots d')$.
}
\\
\\
\begin{array}{rcl}
\AL{e}~ u~ \rho~ d      & = & (d',~ @Code@~ (@PushMarker@~u : is)) \\
		&& \mbox{where}~ (d', is) = \R{e}~ \rho~ d
\end{array} \\
\\
\hline
\\
\begin{array}{rcl}
\multicolumn{3}{l}{
\R{@let@~ x_1@=@e_1@;@~\ldots@;@~ x_n@=@e_n ~@in@~ e}~ \rho~ d} \\
 & = &
	(d',~  [@Move@~ (d+1)~ am_1, ~ \ldots,
		@Move@~ (d+n)~ am_n] ~\plusplus~ is)    \\
	&& \mbox{where}~ \begin{array}[t]{lcl}
			(d_1, am_1) & = & \AL{e_1}~ (d+1)~ \rho~ (d+n)  \\
			(d_2, am_2) & = & \AL{e_2}~ (d+2)~ \rho~ d_1    \\
			\ldots &&                               \\
			(d_n, am_n) & = & \AL{e_n}~ (d+n)~ \rho~ d_{n-1}\\
			\rho'    & = & \begin{array}[t]{l}
		\rho[x_1 \mapsto \I{d+1}, \ldots, x_n \mapsto \I{d+n}] \\
				\mbox{where}~ \I{d} = @Code@~[@Enter@~(@Arg@~d)]
				\end{array} \\
			(d', is) & = & \R{e}~ \rho'~ d_n
		       \end{array}      \\
\\
\multicolumn{3}{l}{\parbox{0.9\textwidth}
	{The @letrec@ case is similar, except that $\rho'$ is passed to the
	 calls to $\AL{}$ instead of $\rho$.}}
\end{array} \\
\\
\hline
\end{array}$
\caption{The \tAL{} compilation scheme, and revised \tR{} rule for @let@}
\label{fig:tim:al-scheme}
\end{figure*}
\end{itemize}
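
By analogy with @mkIndMode@ and @mkUpdIndMode@, the addressing mode built by
the \tAL{} scheme can be captured in a small helper.
The following is a sketch only: the name @mkSelfUpdMode@ is our own invention,
and the two data types are cut-down versions, given here merely to make the
fragment self-contained, which omit the arithmetic instructions.
\begin{verbatim}
data TimAMode = Arg Int
              | Label String
              | Code [Instruction]
              | IntConst Int
              deriving (Eq, Show)

data Instruction = Take Int Int
                 | Move Int TimAMode
                 | Push TimAMode
                 | Enter TimAMode
                 | PushMarker Int
                 | Return
                 deriving (Eq, Show)

-- Given the slot u to be updated and the code for a right-hand side,
-- build a self-updating closure by planting PushMarker u at the front.
mkSelfUpdMode :: Int -> [Instruction] -> TimAMode
mkSelfUpdMode u is = Code (PushMarker u : is)
\end{verbatim}
For example, @mkSelfUpdMode 2 [Push (IntConst 3), Enter (Label "f")]@ is
@Code [PushMarker 2, Push (IntConst 3), Enter (Label "f")]@.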

\begin{exercise}
Try out this idea and measure its effectiveness in terms of how many
steps are saved.
\end{exercise}

\begin{exercise}
Consider the expression
\begin{verbatim}
	let x = 3 in x+x
\end{verbatim}
In this case, the right-hand side of the @let@ expression is already in
normal form, so there is no point in the \tAL{} scheme generating a
@PushMarker@ instruction.  Rather, \tAL{} can simply return an @IntConst@
addressing mode for this case.

Modify the \tAL{} compilation scheme, and the corresponding @compileU@
function, and confirm that the modification works correctly.
\end{exercise}

\subsection{Eliminating indirection chains\index{indirection!chains}}
\label{sect:tim:sc-args}

The idea of the previous section shows how to eliminate identical
updates\index{updates!identical}
for @let(rec)@-bound variables.  In this section we show how
to extend the idea to eliminate identical updates for supercombinator
arguments as well and, at the same time, to eradicate the indirection chain
problem.
The idea was first proposed by Argo \cite{Argo89}.

We start with indirection chains.
As noted earlier, indirection chains build up because a supercombinator
has to assume that it must not copy any of its argument closures, so
if it uses them more than once it had better use indirections.
This gives rise to indirection chains because often the argument closure
is an indirection already, and it would be perfectly safe to copy it.

This suggests an alternative strategy:
\begin{important}
adopt the convention that
every argument closure must be freely copyable without loss of sharing.
\end{important}
This calling convention is clearly convenient for the called function,
but how can the caller ensure that it is met?
An argument is either:
\begin{itemize}
\item
a constant, whose closure is freely copyable;
\item
a supercombinator, which has the same property;
\item
a @let(rec)@-bound variable, which also can be freely copied (using the ideas
of the previous section);
\item
an argument to the current supercombinator, which is freely copyable
because of our new convention;
\item
a non-atomic expression, whose closure (as things stand) is {\em not\/} freely
copyable.
\end{itemize}

It follows that all we have to do to adopt this new calling convention is
find some way of passing non-atomic arguments as freely copyable closures.
For example, consider the expression
\begin{verbatim}
	f (factorial 20)
\end{verbatim}
How can the
argument @(factorial 20)@ be passed as a freely copyable closure?
The solution is simple: transform the expression to the following
equivalent form:
\begin{verbatim}
	let arg = factorial 20 in f arg
\end{verbatim}
The @let@ expression will allocate a slot in the current frame for
the closure for @(factorial 20)@, will put a self-updating closure in it,
and an (ordinary) indirection to this closure will be passed
to @f@.
(Notice that this transformation need only be carried out if the closure
passed to @f@ might be entered more than once.
There is an opportunity here
for a global sharing analysis to be used to generate more efficient code.)

Once this transformation
is done we can freely copy argument closures, though we must
still use (non-updating) indirections for @let(rec)@-bound closures.  No
indirection chains will build up, nor will identical updates take place.

It is interesting to reflect on what has happened.  At first it appeared as
though TIM would allocate much less heap than a G-machine, because
the entire allocation for a supercombinator call was the frame
required to hold its arguments.  However, using our new updating techniques,
we see that every sub-expression
within the supercombinator body requires a slot in the frame to hold it.
Similarly, since most supercombinator arguments are now indirections,
TIM is behaving quite like the G-machine which passes pointers to arguments
rather than the arguments themselves.  So the problems of lazy updating
have forced TIM to become more G-machine-like.

We have presented the technique as a program transformation which
introduces a @let@ expression for every argument expression, but doing so
is somewhat tiresome because it involves inventing new arbitrary variable
names.  It is easier to write the new compilation schemes more directly.
The main alteration is to the application case of the \tR{} scheme, which
is given in Figure~\ref{fig:tim:indarg-schemes}.
The first equation deals with the case where the argument to the application
is an atomic expression (variable or constant), using the \tA{} scheme to
generate the appropriate addressing mode as before.
The second equation deals with the case where the argument is a compound
expression; it initialises the next free slot in the frame with a
self-updating closure for the argument expression, and pushes an indirection
to this closure.
\begin{figure*}
$\begin{array}{|l|}
\hline
\\
\begin{array}{@@{}lcll@@{}}
\R{e~ a}~ \rho~ d       & = & (d_1,~ @Push@~ (\A{a}~\rho) ~@:@~ is)     \\
	&& \mbox{where}~\begin{array}[t]{lcl}
			\multicolumn{3}{l}{a~ \mbox{is a supercombinator,
				local variable, or constant}}   \\
			(d_1, is) & = & \R{e}~ \rho~ d
		      \end{array}                       \\
\\
\R{e_{fun}~ e_{arg}}~ \rho~ d   & = &
	(d_2,~ @Move@~(d+1)~am_{arg} :
	       @Push@~(@Code@~[@Enter@~(@Arg@~(d+1))]) :
	       is_{fun})  \\
	&& \mbox{where}~\begin{array}[t]{lcl}
			(d_1,am_{arg}) & = & \AL{e_{arg}}~(d+1)~\rho~(d+1) \\
			(d_2, is_{fun}) & = & \R{e_{fun}}~ \rho~ d_1
		      \end{array}
\end{array} \\

\\
\hline
\\

\begin{array}{@@{}lcll@@{}}
\A{n}~ \rho & = & @IntConst@~ n & \mbox{where $n$ is a number}\\
\A{x}~ \rho & = & \rho~x     & \mbox{where $x$ is bound by $\rho$}
\end{array} \\
\\
\hline
\end{array}$
\caption{Modifications to \tR{} and \tA{} for copyable arguments}
\label{fig:tim:indarg-schemes}
\end{figure*}

The \tA{} scheme, also given in Figure~\ref{fig:tim:indarg-schemes},
now has one case fewer than before, because it is only invoked
with an atomic expression (variable or constant) as its argument.
For the same reason, it no longer needs to take $d$ as an argument
and return it as a result, because it never uses any frame slots.

\begin{exercise}
Implement this revised scheme, and measure the difference in performance
from the previous version.
\end{exercise}

\subsection{Updating partial applications\index{partial applications}}

So far we have successfully dealt with the update of closures whose value
is an integer.  When a @Return@ instruction finds an empty stack, it
performs an update and pops a new stack from the dump\index{dump}.

But there is another instruction which consumes items from the stack,
namely @Take@.  What should happen if a @Take@ instruction finds fewer
items on the stack than it requires?  For example, consider the program
\begin{verbatim}
	add a b = a+b
	twice f x = f (f x)
	g x = add (x*x)
	main = twice (g 3) 4
\end{verbatim}
When @twice@ enters @f@ it will do so via an indirection, which will set up
an update for @f@.  In this example, @f@ will be bound to @(g 3)@, which
evaluates to a partial application of @add@ to one argument.  The @Take 2@
instruction at the beginning of the code for @add@ will discover that
there is only one argument on the stack, which indicates that an update
should take place, overwriting the closure for @(g 3)@ with one for
the partial application @(add (3*3))@.

In general:
\begin{important}
when a @Take@ instruction finds too few arguments on the stack, it
should perform an update on the closure identified by the top item on
the dump, glue the items on the current stack on top of the stack
recovered from the dump, and retry the @Take@ instruction (in case another
update is required).
\end{important}

The @Take@ instruction is already complicated enough, and now it has a further
task to perform.  To avoid @Take@ getting too unwieldy, we split it into
two instructions: @UpdateMarkers@, which performs the check as to whether
there are enough arguments, and @Take@ which actually builds the new frame.
Every $@Take@~t~n$ instruction is immediately preceded by an
$@UpdateMarkers@~n$ instruction.

The transition rule for @Take@ is therefore unchanged. The rules for
@UpdateMarkers@ are as follows:
\begin{etimruleVD}
\arule{
	\timstateVD{@UpdateMarkers@~n:i}{f}{c_1:\ldots:c_m:s}{v}{d}{h}{c}
}{      \timstateVD{i}{f}{c_1:\ldots:c_m:s}{v}{d}{h}{c} \\
	& \multicolumn{7}{l|}{\mbox{where $m \geq n$}}
}
\\ \hline
\arule{
	\timstateVD{@UpdateMarkers@~n:i}{f}{c_1:\ldots:c_m:[]}
			{v}{(f_u,x,s):d}{h}{c}
}{      \timstateVD{@UpdateMarkers@~n:i}{f}{c_1:\ldots:c_m:s}{v}{d}
		{h'}{c} \\
	& \multicolumn{7}{l|}
	  {\mbox{where} ~\begin{array}[t]{l} m < n \\
			 h' = h[f_u:\langle \ldots, d_{x-1},
				(i', f'), d_{x+1}, \ldots \rangle]
			\end{array}}
}
\end{etimruleVD}
The first rule deals with the case where there are enough arguments, so that
@UpdateMarkers@ does nothing.  The second deals with the
other case, where an update
needs to take place; the appropriate closure is updated, the current stack
is glued on top of the old one, and the @UpdateMarkers@ is retried.

In this rule, $i'$ and $f'$ are the
code pointer and frame pointer which overwrite the target closure, but
so far we have not specified just what values they should take.
The way to figure out what they should be is to ask the question: what should
happen when the closure $(i',f')$ is entered?  This closure represents the
partial
application of the supercombinator to the arguments $c_1, \ldots, c_m$.  Hence,
when it is entered, it should push $c_1, \ldots, c_m$, and then jump to
the code for the supercombinator.  It follows that
\begin{itemize}
\item
$f'$ must point to a newly allocated frame $\langle c_1, \ldots, c_m \rangle$.
\item
$i'$ must be the code sequence
\[
@Push@~(@Arg@~m): \ldots: @Push@~(@Arg@~1): @UpdateMarkers@~n: i
\]
Here, the @Push@ instructions place the arguments of the partial
application onto the stack, the @UpdateMarkers@ instruction checks for
any further updates that need to take place, and $i$ is the rest of the
code for the supercombinator.
\end{itemize}
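
The two rules, together with the construction of $i'$ and $f'$, can be
sketched in Haskell as follows.  This is a minimal sketch, not the
implementation asked for in the exercise below: the machine state is cut down
to the four components the rules mention, the heap is a plain association
list, and the names @State@, @alloc@, @update@ and @stepUpdateMarkers@ are
assumptions made for the illustration.

```haskell
type Addr    = Int
data AMode   = Arg Int deriving (Eq, Show)
data Instr   = Push AMode | UpdateMarkers Int | Take Int Int | Return
               deriving (Eq, Show)
type Closure = ([Instr], Addr)
type Frame   = [Closure]
type Heap    = [(Addr, Frame)]           -- association list (simplification)
type Dump    = [(Addr, Int, [Closure])]  -- (frame to update, slot, saved stack)

data State = State
  { code  :: [Instr]
  , stack :: [Closure]
  , dump  :: Dump
  , heap  :: Heap
  } deriving (Eq, Show)

-- Allocate a frame at a fresh address.
alloc :: Frame -> Heap -> (Heap, Addr)
alloc fr h = ((a, fr) : h, a) where a = length h + 1

-- Overwrite slot x of the frame at address fu.
update :: Addr -> Int -> Closure -> Heap -> Heap
update fu x c h = [(a, if a == fu then replace fr else fr) | (a, fr) <- h]
  where replace fr = take (x - 1) fr ++ [c] ++ drop x fr

stepUpdateMarkers :: State -> State
stepUpdateMarkers st =
  case (code st, dump st) of
    -- Rule 1: enough arguments, so do nothing.
    (UpdateMarkers n : i, _)
      | length (stack st) >= n -> st { code = i }
    -- Rule 2: too few arguments; update, glue the stacks, and retry.
    (UpdateMarkers n : i, (fu, x, s) : d) ->
      let cs       = stack st
          m        = length cs
          (h1, f') = alloc cs (heap st)
          i'       = [Push (Arg k) | k <- [m, m - 1 .. 1]] ++ UpdateMarkers n : i
          h2       = update fu x (i', f') h1
      in st { stack = cs ++ s, dump = d, heap = h2 }
    _ -> error "stepUpdateMarkers: no rule applies"
```

In the second case the code component is left unchanged, so the next step
executes the same @UpdateMarkers@ instruction again, as the rule requires.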

\begin{exercise}
Implement the @UpdateMarkers@ instruction, and modify the compiler to place
one before each @Take@ instruction.  Test your implementation before and
after the modification on the following program.  The program
uses higher-order functions to implement pairs\index{pairs}
(Section~\ref{sect:template:data-str-hof}).
The pair @w@ is shared, and evaluates to a partial application of the @pair@
function.
\begin{verbatim}
	pair x y f = f x y
	fst p = p K
	snd p = p K1
	main = let w = pair 2 3
	       in (fst w) * (snd w)
\end{verbatim}
You should see @w@ being updated with the partial application
for @(pair 2 3)@.   To make it a little more convincing, you could
make the right-hand side of @w@ involve a little more computation: for example
\begin{verbatim}
	main = let w = if (2*3 > 4) (pair 2 3) (pair 3 2)
	       in (fst w) * (snd w)
\end{verbatim}
\end{exercise}

\begin{exercise} \label{ex:tim:upd-zero}
Just as @Take 0 0@ does nothing, @UpdateMarkers 0@ does nothing.
Modify @compileSC@ so that it omits both of these instructions when
appropriate.  (This is a simple extension of Exercise~\ref{ex:tim:take}.)
\end{exercise}

There are a few other points worth noticing:
\begin{itemize}
\item
In a real implementation, the code $i'$ would not be manufactured afresh
whenever an update takes place, as the rule appears to say.
Instead, the code for the supercombinator $i$ can be preceded by a sequence
of @Push@ instructions, and the code pointer for a partial application
can just point into the appropriate place in the sequence.

\item
The @UpdateMarkers@ rule duplicates the closures $c_1, \ldots, c_m$.
This is fine now that supercombinator arguments are freely copyable,
a modification we introduced in Section~\ref{sect:tim:sc-args}.
Prior to that modification, making such a copy would have risked duplicating
a redex, so instead the @UpdateMarkers@ rule would have been
further complicated with indirections.  It is for this reason that
the introduction of @UpdateMarkers@ has been left so late.

\item
Suppose we are compiling code for the expression $(@f@~ e_1~ e_2)$, where
@f@ is known to be a supercombinator of 2 (or fewer) arguments.
In this case, the @UpdateMarkers@ instruction at the start of @f@ will
certainly do nothing, because the stack is sure to be deep enough
to satisfy it.
So when compiling a call to a supercombinator applied to all its
arguments (or more) we can enter its code {\em after\/} the @UpdateMarkers@
instruction.

Many of the function applications in a typical program are saturated
applications of known supercombinators, so this optimisation is frequently
applicable.
\end{itemize}

\section{Mark 5: Structured data\index{structured data!in TIM}}
\index{TIM!Mark 5}
\index{data structures!in TIM}

In this section we will study how to add
algebraic data types to TIM.  It is possible to implement data structures
without any of the material of this section, using higher-order functions
as described in Section~\ref{sect:template:data-str-hof}; but it is
rather inefficient to do so.  Instead, we will develop the
approach we used for arithmetic to be able to handle more general data
structures.

\subsection{The general approach}

Consider the function @is_empty@, which returns $1$ if its argument
is an empty list, and $0$ if not.
It is given in the context of a program which applies it to a singleton list.
\begin{verbatim}
	is_empty xs = case xs of
			<1>      -> 1
			<2> y ys -> 0

	cons a b = Pack{2,2} a b
	nil = Pack{1,0}

	main = is_empty (cons 1 nil)
\end{verbatim}

Recall from Section~\ref{sect:lang:constructors} that constructors are
denoted by $@Pack{@tag,arity@}@$.  In this program, which manipulates
lists, the empty list constructor @nil@ has tag\index{tag!of constructor}
1 and arity\index{arity!of constructor} 0, while
the list constructor @cons@ has tag 2 and arity 2.   Pattern matching
is performed only by @case@ expressions; nested patterns are matched
by nested @case@ expressions.

We consider first what code we should generate for a @case@ expression.
Just as arithmetic operators require their arguments to be evaluated,
a @case@ expression requires an expression, @xs@ in the @is_empty@ example,
to be evaluated.  After this, a multi-way jump can be taken depending on
the tag of the object returned.  Taking a similar approach to the one
we used for arithmetic
operators suggests the following conventions:
\begin{itemize}
\item
To evaluate a closure representing a
data object, a continuation is pushed onto the argument
stack, and the closure is entered.
\item
When it is evaluated to (head) normal form, this continuation is popped
from the stack and entered.
\item
The tag of the data object is returned on top of the value stack.
\item
The components of the data object (if any) are returned in a frame pointed
to by a new register, the \stressD{data frame pointer}.
\end{itemize}

So the code we would produce for @is_empty@ would be like this\footnote{%
As usual we write the code with explicit labels for continuations,
but in reality we
would compile uses of the @Code@ addressing mode so as to avoid generating
fresh labels.
}:
\begin{verbatim}
is_empty:       Take 1 1                -- One argument
		Push (Label "cont")     -- Continuation
		Enter (Arg 1)           -- Evaluate xs

cont:           Switch [ 1 -> [PushV (IntVConst 1), Return]
			 2 -> [PushV (IntVConst 0), Return]
		]
\end{verbatim}
The @Switch@ instruction does a multi-way jump based on the top item on the
value stack.  In this example,
both branches of the @case@ expression just return
a constant number.

In this example the components of the scrutinised list cell were not
used.  This is not always the case.  Consider, for example, the @sum@ function:
\begin{verbatim}
	sum xs = case xs of
			<1>      -> 0
			<2> y ys -> y + sum ys
\end{verbatim}
@sum@ computes the sum of the elements of a list.  The new feature is that
the expression @y + sum ys@ uses the components, @y@ and @ys@,  of the
list cell.  As indicated earlier, these
components are returned to the continuation in a frame
pointed to by the data frame pointer, a new register.  (Exercise: why
cannot the ordinary frame pointer be used for this purpose?)

So far, every local variable (that is, supercombinator argument
or @let(rec)@-bound variable) has a slot in the current frame which contains
its closure, so it seems logical to extend the idea, and add further slots
for @y@ and @ys@.  All we need to do is to move the closures out of the
list cell frame, and into the current frame.  Here, then, is the code for
@sum@:
\begin{verbatim}
sum:    Take 3 1                -- One argument, two extra slots for y,ys
	Push (Label "cont")     -- Continuation for case
	Enter (Arg 1)           -- Evaluate xs

cont:   Switch [
	    1 -> [PushV (IntVConst 0), Return]
	    2 -> [Move 2 (Data 1)
		  Move 3 (Data 2)
		  ...code to compute y + sum ys...
		 ]
	]
\end{verbatim}
The @Move@ instructions use a new addressing mode @Data@, which
addresses a
closure in the frame pointed to by the data frame pointer.
The two @Move@ instructions copy @y@ and @ys@ from the list cell into
the current frame (the one which contains @xs@).

In summary, a @case@ expression is compiled into five steps:
\begin{enumerate}
\item
Push a continuation.
\item
Enter the closure to be scrutinised.  When it is evaluated, it will
enter the continuation pushed in Step 1.
\item
The continuation uses a @Switch@ instruction to take a multi-way jump
based on the tag, which is returned on top of the value stack.
\item
Each branch of the @Switch@ begins with @Move@ instructions to copy the
contents of the data object into the current frame.
Since this copies the closure, we must be sure that all closures in data
objects have the property that they can freely be copied
(Section~\ref{sect:tim:sc-args}).
\item
Each alternative then continues with the code for that alternative,
compiled exactly as usual.
\end{enumerate}

Finally, we can ask what code should be generated for the expression
$@Pack{@tag,arity@}@$.  Consider, for example, the expression
\[
@Pack{2,2}@~e1~e2
\]
which builds a list cell.  The minimalist approach is to treat @Pack{2,2}@
as a supercombinator, and generate the following code\footnote{%
In principle there are an infinite number of possible constructors, so
it seems that we need an infinite family of similar
code fragments for them in the
code store.  In practice this is easily avoided as will be seen when we
write the detailed compilation schemes.
}:
\begin{verbatim}
	Push (...addressing mode for e2...)
	Push (...addressing mode for e1...)
	Enter (Label "Pack{2,2}")
\end{verbatim}

The code for @Pack{2,2}@ is very simple:
\begin{verbatim}
Pack{2,2}:      UpdateMarkers 2
		Take 2 2
		ReturnConstr 2
\end{verbatim}
The first two instructions are just the same as for any other supercombinator.
The @UpdateMarkers@ instruction performs any necessary updates, and
the @Take@ instruction builds the frame containing the two components of
the list cell, putting a pointer to it in the current frame pointer.
Finally, a new instruction, @ReturnConstr@, enters the continuation, while
pushing
a tag of @2@ onto the value stack, and copying the current frame
pointer into the data frame pointer.  Like @Return@, @ReturnConstr@ needs to
check for updates and perform them when necessary.

\subsection{Transition rules and compilation schemes for data structures}

Now that we have completed the outline, we can give the details of
the transition rules and compilation schemes for the new constructs.
The rule for @Switch@ is as follows:
\begin{etimruleVDD}
\arule{\timstateVDD{[@Switch@~ [\ldots~t~@->@~i~\ldots]]}
		{f}{f_d}{s}{t:v}{d}{h}{c}
}{
       \timstateVDD{i}{f}{f_d}{s}{v}{d}{h}{c}
}
\end{etimruleVDD}

There are two rules for @ReturnConstr@, because it has to account for
the possibility that an update is required.  The first is straightforward,
when there is no update to be done:
\begin{etimruleVDD}
\arule{\timstateVDD{[@ReturnConstr@~t]}{f}{f_d}{(i,f'):s}{v}{d}{h}{c}}
      {\timstateVDD{i}{f'}{f}{s}{t:v}{d}{h}{c}}
\end{etimruleVDD}
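The second rule deals with updating.  It overwrites the
closure to be updated with a closure whose code is just a @ReturnConstr@
instruction and whose frame pointer is the current frame pointer $f$ (the
frame holding the components of the data object), and then retries
the @ReturnConstr@: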
\begin{etimruleVDD}
\arule{\timstateVDD{[@ReturnConstr@~t]}{f}{f_d}{[]}{v}{(f_u,x,s):d}{h}{c}}
      {\timstateVDD{[@ReturnConstr@~t]}{f}{f_d}{s}{v}{d}{h'}{c} \\
	& \multicolumn{8}{l|}
	  {\mbox{where}~h'=h[f_u:\langle \ldots, d_{x-1},
		([@ReturnConstr@~t], f), d_{x+1}, \ldots \rangle]}
}
\end{etimruleVDD}
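
The three rules above can be sketched as a single step function.  This is a
sketch under simplifying assumptions, not a complete evaluator: only the
instructions needed here are modelled, the heap is an association list, and
the names @State@, @fptr@, @dfptr@, @update@ and @step@ are invented for the
illustration.

```haskell
type Addr       = Int
data ValueAMode = IntVConst Int deriving (Eq, Show)
data Instr      = Switch [(Int, [Instr])]
                | ReturnConstr Int
                | PushV ValueAMode
                | Return
                deriving (Eq, Show)
type Closure    = ([Instr], Addr)
type Heap       = [(Addr, [Closure])]   -- association list (simplification)

data State = State
  { code   :: [Instr]
  , fptr   :: Addr                      -- current frame pointer f
  , dfptr  :: Addr                      -- data frame pointer f_d
  , stack  :: [Closure]
  , vstack :: [Int]
  , dump   :: [(Addr, Int, [Closure])]
  , heap   :: Heap
  } deriving (Eq, Show)

-- Overwrite slot x of the frame at address fu.
update :: Addr -> Int -> Closure -> Heap -> Heap
update fu x c h = [(a, if a == fu then replace fr else fr) | (a, fr) <- h]
  where replace fr = take (x - 1) fr ++ [c] ++ drop x fr

step :: State -> State
step st = case (code st, stack st, vstack st, dump st) of
  -- Switch: pop the tag and jump to the matching branch.
  ([Switch alts], _, t : v, _) ->
    case lookup t alts of
      Just i  -> st { code = i, vstack = v }
      Nothing -> error "Switch: no branch for tag"
  -- ReturnConstr, no update: enter the continuation on top of the stack.
  ([ReturnConstr t], (i, f') : s, v, _) ->
    st { code = i, fptr = f', dfptr = fptr st, stack = s, vstack = t : v }
  -- ReturnConstr, empty stack: update the closure, restore the stack, retry.
  ([ReturnConstr t], [], _, (fu, x, s) : d) ->
    st { stack = s, dump = d
       , heap = update fu x ([ReturnConstr t], fptr st) (heap st) }
  _ -> error "step: no rule applies"
```

Note that the updating case leaves the code component alone, so the
@ReturnConstr@ is executed again on the restored stack.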

The only changes to the compilation schemes are to add extra cases to the
\tR{} scheme for constructors and for @case@ expressions.  The latter
is structured by the use of an auxiliary scheme \tE{}, which
compiles a @case@ alternative (Figure~\ref{fig:tim:case-schemes}).
Notice that constructors are compiled `in-line' as they are encountered,
which avoids the need for
an infinite family of definitions to be added to the code store.
\begin{figure*} %\centering
$\begin{array}{|l|}
\hline
\\
\begin{array}{@@{}lcl@@{}}
\R{@Pack{@t@,@a@}@}~\rho~d & = &
		(d,~ [@UpdateMarkers@~a,~ @Take@~a~a,~@ReturnConstr@~t]) \\
\\
\R{@case@~e~@of@~alt_1~ \ldots ~alt_n}~ \rho~ d & = &
	(d',~ @Push@~ (@Code@~[@Switch@~[branch_1 ~\ldots~ branch_n]]):~
		is_e) \\
	&& \mbox{where}~\begin{array}[t]{@@{}lcl@@{}}
			(d_1, branch_1) & = & \E{alt_1}~\rho~d \\
			\ldots && \\
			(d_n, branch_n) & = & \E{alt_n}~\rho~d  \\
			(d', is_e) & = & \R{e}~ \rho~ max(d_1, \ldots, d_n) \\
		      \end{array}
\end{array} \\
\\
\hline
\\
\parbox[t]{29pc}{
$\E{alt}~\rho~d$, where $alt$ is a @case@ alternative, is a pair $(d',branch)$,
where $branch$ is the @Switch@ branch compiled in environment $\rho$.
The code assumes that the first $d$ slots of the frame are occupied, and
it uses slots $(d+1 \ldots d')$.}\\
\\
\begin{array}{@@{}lcl@@{}}
\E{@<@t@>@~x_1\ldots x_n~@->@~body}~\rho~d
	& = & (d',~ t~@->@~(is_{moves} \plusplus is_{body}))    \\
	&& \mbox{where}~\begin{array}[t]{@@{}lcl@@{}}
		is_{moves} & = & [@Move@~(d+1)~(@Data@~1), \\
			   & &   \ldots,                   \\
			   & &   @Move@~(d+n)~(@Data@~n)]   \\
		(d',is_{body}) & = & \R{body}~\rho'~(d+n)          \\
		\rho' & = &
		\rho[x_1 \mapsto @Arg@~(d+1), \\
		      & & \ldots,\\
		      & & x_n \mapsto @Arg@~(d+n)]
		\end{array}
\end{array} \\
\\
\hline
\end{array}$
\caption{Compilation schemes for @case@ expressions}
\label{fig:tim:case-schemes}
\end{figure*}
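
The bookkeeping done by the \tE{} scheme before it compiles the body of an
alternative can be sketched as below.  This is an illustrative fragment only:
it computes the @Move@ instructions, the extended environment and the new
frame depth, and leaves the compilation of the body to the \tR{} scheme.  The
function name @altPrologue@ and the cut-down types are assumptions of the
sketch.

```haskell
type Name  = String
data AMode = Arg Int | Data Int deriving (Eq, Show)
data Instr = Move Int AMode deriving (Eq, Show)
type Env   = [(Name, AMode)]

-- altPrologue d xs rho gives, for an alternative binding variables xs
-- when the first d frame slots are occupied:
--   * the depth (d+n) at which the body should be compiled,
--   * the Move instructions copying the data frame into slots d+1..d+n,
--   * the environment in which the body should be compiled.
altPrologue :: Int -> [Name] -> Env -> (Int, [Instr], Env)
altPrologue d xs rho = (d + n, moves, rho')
  where
    n     = length xs
    moves = [Move (d + k) (Data k) | k <- [1 .. n]]
    rho'  = zip xs [Arg (d + k) | k <- [1 .. n]] ++ rho
```

For the second alternative of @sum@, with @xs@ in slot 1, the call
@altPrologue 1 ["y","ys"] [("xs", Arg 1)]@ yields the two @Move@ instructions
shown in the code for @sum@ earlier, and binds @y@ and @ys@ to slots 2 and 3.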

\subsection{Trying it out}
\label{sect:tim:data-prelude}

We can use the new machinery to implement lists\index{lists} and booleans\index{booleans},
by using the following extra Core-language definitions:
\begin{verbatim}
	cons = Pack{2,2}
	nil  = Pack{1,0}

	true  = Pack{2,0}
	false = Pack{1,0}
	if cond tbranch fbranch = case cond of
					<1> -> fbranch
					<2> -> tbranch
\end{verbatim}
Notice that @if@, which previously had a special instruction and
case in the compilation schemes, is now just a supercombinator definition
like any other. Even so, it is often clearer to write programs using @if@
rather than @case@, so you may want to leave the special case in your
compiler; but now you can generate a @Switch@ instruction rather than
a @Cond@ instruction.  (The latter can vanish.)

\begin{exercise}
Implement the new instructions and compilation schemes.

Test your new implementation on the following program:
\begin{verbatim}
	length xs = case xs of
			<1>      -> 0
			<2> p ps -> 1 + length ps

	main = length (cons 1 (cons 2 nil))
\end{verbatim}
A more interesting example, which will demonstrate whether your
update code is working correctly, is this:
\begin{verbatim}
	append xs ys = case xs of
			<1>      -> ys
			<2> p ps -> cons p (append ps ys)

	main = let xs = append (cons 1 nil) (cons 2 nil)
	       in
	       length xs + length xs
\end{verbatim}
Here @xs@ is used twice, but the work of appending should
only be done once.
\end{exercise}

\begin{exercise}
If the arity, $a$, of the constructor is zero, then $\R{@Pack{@t@,@a@}@}$ will
generate the code $@[UpdateMarkers 0, Take 0 0, ReturnConstr@~t@]@$.  Optimise
the \tR{} scheme and @compileR@ function to generate better code for this case
(cf.\ Exercise~\ref{ex:tim:upd-zero}).
\end{exercise}

\subsection{Printing a list\index{printing}}
\label{sect:tim:print-list}

The example programs suggested so far have all returned an integer,
but it would be nice to be able to return and print a list instead.

The way we expressed this in the G-machine chapter was to add an extra
component to the machine state to represent the {\em output}, together with
an instruction @Print@, which appends a number to the end of the output.
In our case, numbers are returned on the value stack, so @Print@ consumes
a number from the value stack and appends it to the output.

At present @compile@ initialises the stack with the continuation
@([],FrameNull)@,
which has the effect of stopping the machine when it is entered.
All we need to do is change this continuation to do the printing.
This time, the continuation expects the value of the program to be a list,
so it must do case analysis to decide how to proceed.  If the list is empty,
the machine should halt, so that branch can just have the empty code sequence.
Otherwise, the head of the list should be evaluated and printed, and the
tail then given the original continuation again.  Here is the code:
\begin{verbatim}
topCont:        Switch [ 1 -> []
			 2 -> [ Move 1 (Data 1)         -- Head
				Move 2 (Data 2)         -- Tail
				Push (Label "headCont")
				Enter (Arg 1)           -- Evaluate head
			      ]
		]

headCont:       Print
		Push (Label "topCont")
		Enter (Arg 2)                           -- Do the same to tail
\end{verbatim}
Notice that the @topCont@ code needs a 2-slot frame for working storage,
which @compile@ had better provide for it.  @compile@ therefore initialises
the stack with the continuation
\begin{verbatim}
	(topCont, frame)
\end{verbatim}
where @topCont@ is the code sequence above, and @frame@ is the address of
a 2-slot frame allocated from the heap.

\begin{exercise}
Implement list printing as described.  The only tiresome aspect is that
you need to add an extra component to the machine state (again).

As usual, you can use @Push (Code ...)@ instead of @Push (Label "headCont")@,
and in fact you can do the same for @Push (Label "topCont")@, by using a
little recursion!

Test your work on the following program:
\begin{verbatim}
	between n m = if (n>m) nil (cons n (between (n+1) m))
	main = between 1 4
\end{verbatim}
\end{exercise}

\begin{exercise}
When running a program whose result is a list, it is nice to have the
elements of the list printed as soon as they become available.  With
our present implementation, either we print every state (if we use
@showFullResults@) or we print only the last state (using
@showResults@).  In the former case we get far too much output, while
in the latter we get no output at all until the program terminates.

Modify @showResults@ so that it prints the output as it is produced.
The easiest way to do this is to compare the output component of
successive pairs of states, and to print the last element when the output
gets longer between one state and the next.

Another possible modification to @showResults@ is to print a dot for each state
(or ten states), to give a rough idea of how much work is done between each
output step.
\end{exercise}

\subsection{Using data structures directly\advanced}

One might ask why we cannot use the components of a data structure
directly in the arms of a @Switch@ instruction, by using
@Data@ addressing modes in instructions other than @Move@.
The reason can be found in the @sum@ example, which we repeat here:
\begin{verbatim}
	sum xs = case xs of
			<1>      -> 0
			<2> y ys -> y + sum ys
\end{verbatim}
Now, let us follow the code for @y + sum ys@ a little further.
This code
must first evaluate @y@, which may take a lot of computation, quite possibly
using the data frame pointer register.  Hence, by the time it comes to
evaluate @ys@, the data frame pointer may well have been changed, so @ys@ can
no longer be relied upon to be accessible via the data frame pointer.
By moving the contents of the list cell into the
current frame, we enable them to be preserved across further evaluations.

Sometimes, no further evaluation is to be done, as in the @head@ function:
\begin{verbatim}
	head xs = case xs of
			<1>      -> error
			<2> y ys -> y
\end{verbatim}
In this case, as an optimisation we could use @y@ directly from the
data frame; that is,
the second branch of the @Switch@ instruction would be simply
$[@Enter@~(@Data@~1)]$.

Similarly, if a variable is not used at all in the branch of the
@case@ expression, there is no need to move it into the current frame.

\section{Mark 6: Constant applicative forms and the code store\advanced}
\label{sect:tim:caf}
\index{TIM!Mark 6}

As we mentioned earlier (Section~\ref{sect:tim:compiler}),
our decision to represent
the code store as an association list of names and code sequences
means that CAFs\index{CAF} do not get updated.  Instead, their code is
executed each time they are called, which will perhaps duplicate
work.  We would like to avoid this extra work, but the solution for
the TIM is not quite as easy as that for our earlier implementations.

In the case of the template instantiation machine and the G-machine,
the solution was to allocate a node in the heap to represent
each supercombinator.  When a CAF is called, the root of the redex is
the supercombinator node itself, and so the node is updated with
the result of the reduction (that is, an instance of the right-hand
side of the supercombinator definition).   Any subsequent use of the
supercombinator will see this updated node instead of the
original supercombinator.
The trouble is that the TIM does not have heap nodes at all; what corresponds
to a node is a closure within a frame.  So what we have to do is
to allocate in the initial heap a single giant frame, the
\stressD{global frame},
which contains a closure for each supercombinator.

The code store is now represented by the address, $f_G$, of the global frame,
together with an association list, $g$,
mapping supercombinator names to their offset in the frame.
The @Label@ addressing mode uses this association list to find the offset, and
then fetches the closure for the supercombinator from the global frame.
The new transition rule for @Push Label@ formalises these
ideas:
\timrule{
   \timstate
	{@Push@~ (@Label@~ l): i}
	{f}{s}
	{h[f_G:\langle (i_1,f_1),\ldots,(i_n,f_n)\rangle]}
	{(f_G,g[l:k])}
}{
   \timstate
	{i}{f}{(i_k,f_k):s}{h}{(f_G,g)}
}

The rule for @Enter Label@ follows directly from the @Push@/@Enter@
relationship.
Each closure in the global frame is a self-updating
closure, as described in the context of @let(rec)@-bound variables
in Section~\ref{sect:tim:self-update}.   Just as for @let(rec)@-bound variables,
when pushing a supercombinator on the stack we should use a (non-updating)
indirection (Section~\ref{sect:tim:self-update}).
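
The lookup performed by the @Label@ addressing mode can be sketched as
follows.  This is a minimal sketch with invented names: @labelClosure@ is not
a function of the machine as developed so far, code sequences are abbreviated
to lists of strings, and the heap is an association list.

```haskell
type Addr      = Int
type Name      = String
type Closure   = ([String], Addr)        -- code abbreviated to strings
type Heap      = [(Addr, [Closure])]     -- association list (simplification)
type CodeStore = (Addr, [(Name, Int)])   -- (f_G, g)

-- Find the offset of the label in g, then fetch that slot of the
-- global frame from the heap.
labelClosure :: Heap -> CodeStore -> Name -> Closure
labelClosure h (fG, g) l =
  case (lookup l g, lookup fG h) of
    (Just k, Just frame)
      | k >= 1 && k <= length frame -> frame !! (k - 1)
    _ -> error ("labelClosure: cannot resolve " ++ l)
```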

\subsection{Implementing CAFs}

Here is what needs to be done to add
proper updating for CAFs to a Mark 4 or Mark 5 TIM.
\begin{itemize}
\item
The code store component of the machine state now contains the address of the
global frame and an association between supercombinator names and frame offsets:

M6> codeStore == (addr, assoc name num)
GH6> type CodeStore = (Addr, ASSOC Name Int)

The @showSCDefns@ function must be altered to take account of this change.

\item
The function @amToClosure@ must take different
action for a @Label@ addressing mode,
as described above.

\item
The initial environment, @initial_env@, computed in the @compile@ function,
must be altered to
generate an indirection addressing mode for each supercombinator.

\item
The last modification involves the most work.
We need to alter the @compile@ function
to build the initial heap, just as we did in
the @compile@ function of the template instantiation machine and the G-machine.
\end{itemize}

The last of these items needs a little more discussion.  Instead of
starting with an empty heap, @compile@ now needs to build an initial
heap, using an auxiliary function @allocateInitialHeap@.
@allocateInitialHeap@ is passed the @compiled_code@ from the @compile@
function.  It allocates a single big frame containing a closure for
each element of @compiled_code@, and returns the initial heap and the
@codeStore@:

M6> allocateInitialHeap :: [(name, [instruction])] -> (timHeap, codeStore)
GH6> allocateInitialHeap :: [(Name, [Instruction])] -> (TimHeap, CodeStore)
6> allocateInitialHeap compiled_code
6>  = (heap, (global_frame_addr, offsets))
6>    where
6>    indexed_code = zip2 [1..] compiled_code
6>    offsets = [(name, offset) | (offset, (name, code)) <- indexed_code]
6>    closures = [(PushMarker offset : code, global_frame_addr) |
6>                       (offset, (name, code)) <- indexed_code]
6>    (heap, global_frame_addr) = fAlloc hInitial closures

@allocateInitialHeap@ works as follows.  First the @compiled_code@
list is indexed, by pairing each element with a frame offset, starting
at 1.  Now this list is separately processed to produce @offsets@, the
mapping from supercombinator names to frame offsets, and @closures@, the
list of closures to be placed in the global frame.  Finally, the
global frame is allocated, and the resulting heap is returned together
with the code store.

Notice that @global_frame_addr@ is used in constructing @closures@;
the frame pointer of each supercombinator closure is the global frame
pointer itself, so that the @PushMarker@ instruction pushes an update
frame referring to the global frame.
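
This circular use of @global_frame_addr@ relies on lazy evaluation: the
address must be computable without inspecting the closures being allocated.
The following self-contained variant illustrates the point.  It is a sketch
only: the @fAlloc@ given here is a stand-in for the one in the utilities
module, working on an association-list heap, and the instruction type is cut
down to what the example needs.

```haskell
type Addr      = Int
type Name      = String
data Instr     = PushMarker Int | Other String deriving (Eq, Show)
type Closure   = ([Instr], Addr)
type Heap      = [(Addr, [Closure])]
type CodeStore = (Addr, [(Name, Int)])

-- Stand-in allocator: the new address depends only on the size of the
-- heap, never on the contents of the frame being allocated.
fAlloc :: Heap -> [Closure] -> (Heap, Addr)
fAlloc h fr = ((a, fr) : h, a) where a = length h + 1

allocateInitialHeap :: [(Name, [Instr])] -> (Heap, CodeStore)
allocateInitialHeap compiledCode = (heap, (globalFrameAddr, offsets))
  where
    indexedCode = zip [1 ..] compiledCode
    offsets     = [(name, offset) | (offset, (name, _)) <- indexedCode]
    -- closures mentions globalFrameAddr, which is itself a result of
    -- allocating closures; laziness makes this well defined.
    closures    = [ (PushMarker offset : code, globalFrameAddr)
                  | (offset, (_, code)) <- indexedCode ]
    (heap, globalFrameAddr) = fAlloc [] closures
```

An allocator that forced the frame pointers of the closures before returning
the address would make this definition loop.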

\begin{exercise}
Make the required modifications to @showSCDefns@,
@compileA@, @amToClosure@ and @compile@.
Test whether updating of CAFs does in fact take place.
\end{exercise}

\begin{exercise}
The @PushMarker@ instruction added inside @allocateInitialHeap@ is only required
for CAFs, and is a waste of time for supercombinators with one or more
arguments.
Modify @allocateInitialHeap@ to plant the @PushMarker@ instruction only for
CAFs. (Hint: you can identify non-CAFs by the fact that their code
begins with an $@UpdateMarkers@~n$ instruction (or, in a machine without
@UpdateMarkers@, a $@Take@~t~n$ instruction), where $n>0$.)
Measure the improvement.
\end{exercise}

\begin{exercise}
An indirection addressing mode is only required for CAFs, and not
for non-CAF supercombinators.
Modify the construction of @initial_env@ to take advantage of this fact.
\end{exercise}

\subsection{Modelling the code store more faithfully}

There is something a little odd about our handling of @Label@s so far.
It is this: the names of supercombinators get looked up in the
environment at compile-time (to map them to a @Label@ addressing mode),
and then again at run-time (to map them to an offset in the global
frame\footnote{%
We are assuming that we have implemented the changes suggested in the
previous section for CAFs, but this section applies also to the
pre-CAF versions of the machine.
}).
This is hardly realistic: in a real compiler, names will be looked up
at compile-time, but will be linked to a hard machine address before run-time,
so no run-time lookups take place.

We can model this by changing the @Label@ constructor to take two arguments
instead of one, thus:
\begin{verbatim}
  timAMode ::= Label name num
	       | ...as before...
\end{verbatim}
The @name@ field records the name of the supercombinator as before, but
now the @num@ says what offset to use in the global frame.
Just as in the @NSupercomb@ constructor of the template machine, the
@name@ field is only there for documentation and debugging purposes.
The code store component now becomes simply the address of the
global frame, as you can see from the
revised rule for @Push Label@:
\timrule{
   \timstate
	{@Push@~ (@Label@~ l~k): i}
	{f}{s}
	{h[g:\langle (i_1,f_1),\ldots,(i_n,f_n)\rangle]}
	{g}
}{
   \timstate
	{i}{f}{(i_k,f_k):s}{h}{g}
}
The rule for @Enter@ follows from the @Push@/@Enter@ relationship.
\begin{exercise}
Implement this idea.  To do this:
\begin{itemize}
\item
Change the @timAMode@ type as described.
\item
Change the @codeStore@ type to consist only of a frame pointer.
\item
Change the @compile@ function so that it generates the correct
initial state for the machine.
In particular, it must generate
an @initial_env@ with the right @Label@ addressing
modes.
\item
Adjust the @show@ functions to account for these changes.
\end{itemize}
\end{exercise}

\section{Summary}

The final TIM compilation schemes are summarised in
Figures~\ref{fig:tim-final1} and \ref{fig:tim-final2}.
The obvious question is `is the TIM better or worse than the G-machine?';
it is a hard one to answer.  Our prototypes are very useful for exploring
design choices, but really no good at all for making serious performance
comparisons.  How can one establish, for example, the relative costs of
a @Take@ instruction compared with a G-machine @Mkap@?
About the only really comparable measure we have available is the heap
consumption of the two.
\begin{figure*}
$\begin{array}{|l|}
\hline

%                       THE SC SCHEME

\\
\parbox{0.9\textwidth}{
$\SC{def}~\rho$ is the TIM code for the supercombinator definition $def$
compiled in environment $\rho$.} \\
\\
\begin{array}{lcl}
\SC{f~ x_1~ \ldots~ x_n~ =~ e}~\rho
		& = & @UpdateMarkers@~n ~:~ @Take@~ d'~ n~ @:@~ is    \\
&& \mbox{where}~ (d', is) = \R{e}~
		\rho[x_1 \mapsto @Arg@~1,\ldots,x_n \mapsto @Arg@~n]~ n
\end{array} \\
\\
\hline


%                       THE R SCHEME

\\

\parbox[t]{0.9\textwidth}{
$\R{e}~\rho~d$ is a pair $(d', is)$, where $is$ is
TIM code which applies the value of the expression
$e$ in environment $\rho$ to the arguments on the stack.
The code $is$ assumes that the first $d$ slots of the frame are occupied, and
it uses slots $(d+1 \ldots d')$.} \\
\\
\begin{array}{rcl}
\R{e}~\rho~d              & = & \B{e}~\rho~d~ [@Return@] \\
	&& \mbox{where $e$ is an integer or arithmetic expression}      \\
\\
\R{a}~ \rho~ d  & = & (d,~ [@Enter@~ (\A{a}~\rho)])                     \\
	&& \mbox{where $a$ is a supercombinator or local variable} \\
\\
\R{e~ a}~ \rho~ d       & = & (d_1,~ @Push@~ (\A{a}~\rho) ~@:@~ is)     \\
	&& \mbox{where}~\begin{array}[t]{lcl}
			\multicolumn{3}{l}{a~ \mbox{is a supercombinator,
				local variable, or integer}}    \\
			(d_1, is) & = & \R{e}~ \rho~ d
		      \end{array}                       \\
\\
\R{e_{fun}~ e_{arg}}~ \rho~ d   & = &
	(d_2,~ @Move@~(d+1)~am_{arg} : @Push@~\I{d+1} :
	       is_{fun})  \\
	&& \mbox{where}~\begin{array}[t]{lcl}
			(d_1,am_{arg}) & = & \AL{e_{arg}}~(d+1)~\rho~(d+1) \\
			(d_2, is_{fun}) & = & \R{e_{fun}}~ \rho~ d_1
		      \end{array} \\
\\
\multicolumn{3}{l}{
\R{@let@~ x_1@=@e_1@;@~\ldots@;@~ x_n@=@e_n ~@in@~ e}~ \rho~ d} \\
 & = &
	(d',~  [@Move@~ (d+1)~ am_1, ~ \ldots,
		@Move@~ (d+n)~ am_n] ~\plusplus~ is)    \\
	&& \mbox{where}~ \begin{array}[t]{lcl}
			(d_1, am_1) & = & \AL{e_1}~ (d+1)~ \rho~ (d+n)  \\
			(d_2, am_2) & = & \AL{e_2}~ (d+2)~ \rho~ d_1    \\
			\ldots &&                               \\
			(d_n, am_n) & = & \AL{e_n}~ (d+n)~ \rho~ d_{n-1}\\
			\rho'    & = &
		\rho[x_1 \mapsto \I{d+1}, \ldots, x_n \mapsto \I{d+n}] \\
			(d', is) & = & \R{e}~ \rho'~ d_n
		       \end{array}      \\
\\
\multicolumn{3}{l}{\parbox{0.9\textwidth}
	{The @letrec@ case is similar, except that $\rho'$ is passed to the
	 calls to $\AL{}$ instead of $\rho$.}} \\
\\
\R{@Pack{@t@,@a@}@}~\rho~d & = &
		(d,~ [@UpdateMarkers@~a,~ @Take@~a~a,~@ReturnConstr@~t]) \\
\\
\multicolumn{3}{l}{
\R{@case@~e~@of@~alt_1~ \ldots ~alt_n}~ \rho~ d} \\
 & = &  (d',~ @Push@~ (@Code@~[@Switch@~[branch_1 ~\ldots~ branch_n]]):~
		is_e) \\
	&& \mbox{where}~\begin{array}[t]{lcl}
			(d_1, branch_1) & = & \E{alt_1}~\rho~d \\
			\ldots && \\
			(d_n, branch_n) & = & \E{alt_n}~\rho~d  \\
			(d', is_e) & = & \R{e}~ \rho~ \max(d_1, \ldots, d_n) \\
		      \end{array}                       \\
\end{array} \\
\\
\hline
\end{array}$
\caption{Final TIM compilation schemes (part 1)}
\label{fig:tim-final1}
\end{figure*}
\begin{figure*}
$\begin{array}{|l|}
\hline

%                       THE E SCHEME

\\
\parbox[t]{0.9\textwidth}{
$\E{alt}~\rho~d$, where $alt$ is a @case@ alternative, is a pair
$(d', branch)$,
where $branch$ is the @Switch@ branch compiled in environment $\rho$.
The code assumes that the first $d$ slots of the frame are occupied, and
it uses slots $(d+1 \ldots d')$.} \\
\\
\begin{array}{rcl}
\E{@<@t@>@~x_1\ldots x_n~@->@~body}~\rho~d
	& = & (d',~ t~@->@~(is_{moves} \plusplus is_{body}))    \\
	&& \mbox{where}~\begin{array}[t]{@@{}lcl@@{}}
		is_{moves} & = & [@Move@~(d+1)~(@Data@~1), \\
			    & &   \ldots,                  \\
			    & &  @Move@~(d+n)~(@Data@~n)]  \\
	    (d',is_{body}) & = & \R{body}~\rho'~(d+n)          \\
		     \rho' & = & \rho[x_1 \mapsto @Arg@~(d+1), \\
			    & &   \ldots,                      \\
			    & &  x_n \mapsto @Arg@~(d+n)]
		\end{array}     \\
\end{array} \\
\hline

%                       THE AL SCHEME

\\
\parbox[t]{0.9\textwidth}{
$\AL{e}~u~\rho~ d$ is a pair $(d', am)$, where $am$ is a TIM
addressing mode for expression $e$
in environment $\rho$.
If the closure addressed by $am$ is entered, it will update slot $u$
of the current frame with its normal form.
The code assumes that the first $d$ slots of the frame are occupied, and
it uses slots $(d+1 \ldots d')$.} \\
\\
\begin{array}{rcll}
\AL{n}~ u~ \rho~ d      & = & (d,~ @IntConst@~ n)       &
		\mbox{where $n$ is an integer constant} \\
%
\AL{e}~ u~ \rho~ d      & = & (d',~ @Code@~ (@PushMarker@~u : is))
			& \mbox{otherwise}  \\
		&& \mbox{where}~ (d', is) = \R{e}~ \rho~ d
\end{array} \\
\hline


%                       THE A SCHEME

\\
\parbox[t]{0.9\textwidth}{
$\A{e}~\rho$ is a TIM addressing mode for expression $e$
in environment $\rho$.} \\
\\
\begin{array}{lcll}
\A{n}~ \rho & = & @IntConst@~ n & \mbox{where $n$ is a number}\\
\A{x}~ \rho & = & \rho~x     & \mbox{where $x$ is bound by $\rho$}
\end{array} \\
\hline


%                       THE I SCHEME

\\
\parbox[t]{0.9\textwidth}{
$\I{d}$ is an indirection addressing mode for frame offset $d$.} \\
\\
\I{d} = @Code@~[@Enter@~(@Arg@~d)] \\
\hline

%                       THE B SCHEME

\\
\parbox{0.9\textwidth}{
$\B{e}~\rho~d~cont$ is a pair $(d',is)$, where $is$ is TIM code 
which evaluates $e$ in environment $\rho$, 
putting its value (which should be an integer) on top of the value stack,
and continuing with the code sequence $cont$. 
The code assumes that the first $d$ slots of the frame are occupied, and
it uses slots $(d+1 \ldots d')$.} \\
\\
\begin{array}{rcll}
\B{e_1 ~@+@~ e_2}~\rho~ d~cont
		& = & \B{e_2}~\rho~ d_1~is_1 & \\
		&& \multicolumn{2}{l}
		   {\mbox{where $(d_1,is_1) = \B{e_1}~\rho~ d~(@Op Add@ ~:~ cont)$}} \\
\multicolumn{4}{l}{\qquad \mbox{\em \ldots and similar rules
		for other arithmetic primitives}} \\
\\
\B{n}~\rho~d~cont & = & (d,@PushV@~(@IntVConst@~n) ~:~ cont)
		& \mbox{where $n$ is a number}  \\
\B{e}~\rho~d~cont & = & (d',@Push@~(@Code@~cont) ~:~ is)
		& \mbox{otherwise} \\
		&& \multicolumn{2}{l}
		   {\mbox{where $(d',is) =  \R{e}~\rho~d$}}
\end{array} \\
\\
\hline
\end{array}$
\caption{Final TIM compilation schemes (part 2)}
\label{fig:tim-final2}
\end{figure*}

Still, it can be very illuminating to explore another evaluation model,
as we have done in this chapter, because it suggests other design avenues
which combine aspects of the TIM with those of the G-machine.  One attempt to
do so is the Spineless Tagless G-machine\index{Spineless Tagless G-machine}
\cite{STG1,STG2}, which adopts
the spinelessness and update mechanism of TIM, but whose stack consists
of pointers to heap objects (like the G-machine) rather than code-frame
pairs (as in TIM).
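
As a concrete reading of the $\B{}$ scheme in
Figure~\ref{fig:tim-final2}, the following sketch transcribes its three
rules for a cut-down expression type.  Everything here is an assumption made
for illustration: the @Expr@ type has only numbers, addition and variables,
the instruction set is trimmed to what the rules mention, and
@compileB@ and @compileR@ are hypothetical names rather than the functions
developed in this chapter.
\begin{verbatim}
-- Hypothetical, cut-down types for illustration only.
data Expr = ENum Int | EAdd Expr Expr | EVar String
  deriving (Eq, Show)

data Op = Add
  deriving (Eq, Show)

data ValueAMode = IntVConst Int
  deriving (Eq, Show)

data AMode = Arg Int | Code [Instruction]
  deriving (Eq, Show)

data Instruction
  = Push AMode
  | PushV ValueAMode
  | Enter AMode
  | Op Op
  | Return
  deriving (Eq, Show)

type Env = [(String, AMode)]

-- B scheme: evaluate e, leave its value on the value stack, then
-- continue with cont.  The Int is the number of occupied frame slots.
compileB :: Expr -> Env -> Int -> [Instruction] -> (Int, [Instruction])
compileB (EAdd e1 e2) env d cont = compileB e2 env d1 is1
  where (d1, is1) = compileB e1 env d (Op Add : cont)
compileB (ENum n) _ d cont = (d, PushV (IntVConst n) : cont)
-- Otherwise: push the continuation as a return address, then
-- evaluate e with the R scheme.
compileB e env d cont = (d', Push (Code cont) : is)
  where (d', is) = compileR e env d

-- Only two R-scheme cases are sketched: variables, and arithmetic
-- expressions (which are handed to the B scheme).
compileR :: Expr -> Env -> Int -> (Int, [Instruction])
compileR (EVar x) env d = case lookup x env of
  Just am -> (d, [Enter am])
  Nothing -> error ("unbound variable " ++ x)
compileR e env d = compileB e env d [Return]
\end{verbatim}
For example, compiling @EAdd (EVar "x") (ENum 2)@ with @x@ bound to @Arg 1@,
zero occupied slots and continuation @[Return]@ yields the code
@[PushV (IntVConst 2), Push (Code [Op Add, Return]), Enter (Arg 1)]@:
the constant is pushed inline, while the variable is entered with the rest
of the computation as its return address.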

\theendnotes

% end of chap04