1 Mathematical Preliminaries
Throughout the course, we are going to define sets recursively, define relations on those sets (also recursively), and then prove relationships between these sets and elements of the sets that satisfy specific relations. We will also use a particularly expressive notion called a context as a tool to streamline our definitions of relations. This tool is very powerful, so we’ll explore it in a more familiar setting in this chapter before putting it to use in our study of type systems.
1.1 Defining Sets
We will define sets via rules that describe the membership of the set recursively. Generally, the definitions will consist of a number of base case elements of the set and ways to build bigger elements out of elements that are already built.
Here is a first example definition of a set, specifically the set of binary trees:
In words, this definition says that is the
smallest set satisfying two conditions, one for each line of
the definition. The first line says that
is a
member of the set
. The second line says that, for
any integer
and any two other binary trees,
and
,
is a
member of the set
.
For example, we know that
is a member of the set
by using the first rule three times and the second
rule two times. Conversely, we know that
and
are not members of
the set, as the rules provide no way to build up elements
looking like those.
As a notational device, we will pun on the use of
both as the set and as an element of the set, using the name
of the element to indicate which set it comes from.
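The two-line definition can be mirrored as a data type with one constructor per line. This is a minimal Python sketch, not the notation of these notes: the names `Leaf`, `Node`, and `is_tree` are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    """First line of the definition: the base case."""


@dataclass(frozen=True)
class Node:
    """Second line: an integer together with two smaller trees."""
    value: int
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def is_tree(x: object) -> bool:
    """True exactly when x can be built using only the two rules."""
    if isinstance(x, Leaf):
        return True
    if isinstance(x, Node):
        return (isinstance(x.value, int)
                and is_tree(x.left) and is_tree(x.right))
    return False


# Built with the leaf rule three times and the node rule twice.
example = Node(1, Leaf(), Node(2, Leaf(), Leaf()))
```

Anything that the two rules cannot build, such as a node whose child is a bare integer, is rejected by `is_tree`.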
1.2 Proving Properties by Induction
To do proofs with these relations, we use structural induction, a proof technique that generalizes the usual form of induction on natural numbers. When we do a proof by induction on natural numbers, we’re proving a property that’s indexed by natural numbers. We start by proving that the property is true when the natural number is zero, and then we assume that the property is true for some given natural number and prove that it holds for the next one. These two lemmas enable us to build up a proof for any particular natural number that we encounter later on, as any particular natural number can be found by starting at zero and counting up, one number at a time.
The root of this idea is the idea of a well-founded order, specifically, an ordering on a set that covers all of the elements in the set but where all of the descending chains in the order are finite. For the natural numbers, the ordering is simply the usual less-than ordering. That is, given any particular natural number, there are only a finite number of other naturals that are less than it, so chaining together these lemmas will always eventually bottom out, making the proof technique sound.
Taking this idea to our binary tree set definition, there
are two ways to create binary trees: either they are leaves
or they are nodes (which contain two smaller trees). So, if
we want to prove some property of binary trees we can
organize the proof into two parts: first we prove the
property for leaves and then we assume the property holds
for two arbitrary trees and show that it holds for the tree
you get by combining the two together with an arbitrary
integer using . These then become the two lemmas
that can be chained together to obtain the proof for any
particular binary tree. In other words, if we can prove
those two facts, we know the property holds for all of the
binary trees.
Theorem. For any with l leaves and height h,
l ≤ 2^h.
Case one: the tree is a leaf. In this case, we know l is 1 and h is 0, so specializing the claim, we get 1 ≤ 2^0, which is true.
Case two: the tree is built with
. Each node contains two binary trees; let’s name them
and
. Let’s also name their heights h_1 and h_2, respectively and name the number of leaves they have l_1 and l_2. Our goal is to show that the number of leaves in the tree
, which is l_1+l_2 (since all of the leaves of both
and
are in the tree) is at most 2^{max(h_1,h_2)+1}, as the tree
has the height max(h_1,h_2)+1 (from the definition of the height of a tree).
As we are doing this proof by induction, we also get to assume that the lemma we are proving holds for the trees
and
. So, we know that l_1 ≤ 2^{h_1} and l_2 ≤ 2^{h_2}.
To complete the proof, we need to use some facts about numbers. By adding the two inductive assumptions, we know that l_1+l_2 ≤ 2^{h_1} + 2^{h_2}. Since the maximum of two numbers is at least as large as either of them and exponentiation is increasing, we know 2^{h_1} ≤ 2^{max(h_1,h_2)} and similarly 2^{h_2} ≤ 2^{max(h_1,h_2)}, so we have l_1 + l_2 ≤ 2^{max(h_1,h_2)} + 2^{max(h_1,h_2)}. We can rearrange the right-hand side using properties of exponentiation and arrive at l_1+l_2 ≤ 2^{max(h_1,h_2) + 1}, which was the goal.
QED.
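The structure of the proof matches the structure of the recursive functions that compute l and h: one clause for leaves and one for nodes. As a sanity check (testing, not a proof), the following Python sketch evaluates the bound on every tree shape with at most five nodes. The names `Leaf`, `Node`, `leaves`, `height`, and `all_trees` are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    pass


@dataclass(frozen=True)
class Node:
    value: int
    left: object
    right: object


def leaves(t):
    """1 for a leaf; for a node, the leaves of both subtrees."""
    if isinstance(t, Leaf):
        return 1
    return leaves(t.left) + leaves(t.right)


def height(t):
    """0 for a leaf; for a node, one more than the taller subtree."""
    if isinstance(t, Leaf):
        return 0
    return max(height(t.left), height(t.right)) + 1


def all_trees(nodes):
    """Every tree shape with exactly `nodes` nodes (all values are 0)."""
    if nodes == 0:
        return [Leaf()]
    result = []
    for k in range(nodes):
        for left in all_trees(k):
            for right in all_trees(nodes - 1 - k):
                result.append(Node(0, left, right))
    return result


checked = [t for n in range(6) for t in all_trees(n)]
bound_holds = all(leaves(t) <= 2 ** height(t) for t in checked)
```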
1.3 Defining Relations to Identify Desired Subsets
In order to capture particular, preferred subsets of terms,
we will use relations on the sets of terms. The idea is that
these subsets will have some nice property that we are
interested in studying. As a first example, we can define a
relation that captures which trees are perfect binary trees.
A perfect binary tree is one where every path from the root
to a leaf has the same length. We write
to indicate that
is a
perfect binary tree and all the paths to leaves have length
.
We define the relation inductively, just as the sets are defined inductively, using rules. We write the rules as sequents, meaning we write premises (assumptions) above a bar and a conclusion below the bar, and a name for the rule beside the bar. We take this to mean that whenever the premises hold, then the conclusion is true. Just like the definitions of the sets, the relations are the smallest ones that satisfy the rules, meaning that the only way to show membership in the relation is to use the rules.
Let’s clarify this with an example. Here are the rules for perfect trees:
The first rule says that it is always the case (i.e.,
requiring no assumptions), that leaf nodes are perfect trees
with a path-length of zero. The second rule says that, if
is a perfect tree of length
and so is
, then the tree
is a
perfect tree of length
.
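Because each rule's conclusion matches exactly one tree shape, the two rules can be read as a recursive function with one clause per rule. The sketch below is a hypothetical Python rendering; `Leaf`, `Node`, and `perfect` are names assumed for illustration. It returns the path length when a derivation exists and `None` when no rule applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    pass


@dataclass(frozen=True)
class Node:
    value: int
    left: object
    right: object


def perfect(t):
    """Return n if t is a perfect tree with path length n, else None."""
    if isinstance(t, Leaf):
        # Leaf rule: no premises, path length zero.
        return 0
    n_left = perfect(t.left)
    n_right = perfect(t.right)
    if n_left is not None and n_left == n_right:
        # Node rule: both premises hold with the same length.
        return n_left + 1
    # No rule applies, so the tree is not in the relation.
    return None


three_nodes = Node(1, Node(2, Leaf(), Leaf()), Node(3, Leaf(), Leaf()))
lopsided = Node(1, Node(2, Leaf(), Leaf()), Leaf())
```

Here `perfect(three_nodes)` is 2, while `perfect(lopsided)` is `None` because the two premises would need different lengths.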
Perfect trees have only certain fixed sizes; there is, for
example, no perfect tree that has four nodes in it. We can
generalize the idea of perfect trees a little bit to allow
such trees by saying that every path from the root to a leaf
has either a length or
and,
furthermore, all of the paths of length
are to the
left and the paths of length
are to the
right, when drawn out. Such trees are called complete trees.
For example, the tree on the left is a complete tree of height four and the tree on the right is a tree of height four that is not complete because the bottom row of nodes is not filled in from the left.
We’ll write to indicate that
is a complete tree that has paths that are either
of length
or of length
. We can
define the relation in a manner similar to the definition
for perfect trees, using rules with assumptions and
conclusions:
Note that these rules introduce a subtle point: with
complete trees, there are two different rules that can both
construct trees that end with a .
Here’s an example derivation, showing how the rules capture
the complete tree with four nodes shown above; it uses both
variants of the rule.
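The prose description of complete trees can also be checked directly, without going through the rules. The following Python sketch is an assumption-laden illustration: it implements the informal description (every path has one of two adjacent lengths, with the longer paths to the left), not the inference rules themselves, and the names `leaf_depths` and `is_complete` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    pass


@dataclass(frozen=True)
class Node:
    value: int
    left: object
    right: object


def leaf_depths(t, depth=0):
    """Path lengths from the root to each leaf, in left-to-right order."""
    if isinstance(t, Leaf):
        return [depth]
    return (leaf_depths(t.left, depth + 1)
            + leaf_depths(t.right, depth + 1))


def is_complete(t):
    depths = leaf_depths(t)
    longest = max(depths)
    # Every path has the longest length or is exactly one shorter.
    two_lengths = all(d in (longest, longest - 1) for d in depths)
    # Longer paths are to the left of shorter ones.
    long_first = all(a >= b for a, b in zip(depths, depths[1:]))
    return two_lengths and long_first


L = Leaf()
# A four-node tree whose bottom row is filled in from the left.
filled_left = Node(1, Node(2, Node(3, L, L), L), Node(4, L, L))
# The same four nodes, but the bottom node is not leftmost.
gap_on_left = Node(1, Node(2, L, Node(3, L, L)), Node(4, L, L))
```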
1.4 Proving Properties by Induction using Relations
Earlier, we showed an exponential upper bound on the number of leaves in a tree. For an arbitrary tree, it is possible to include just a few nodes such that there is just one more leaf than the height of the tree. For example, here’s a tree with a height of 4 and 5 leaves:
For perfect and complete binary trees, however, there must be many more leaves than the height. Let’s start with perfect trees, where there must be exactly 2^h leaves.
To prove this, we need to do induction again, and on the
structure of the tree, but because we know that the tree is
perfect, we will have more information at each stage. There
is a catch, however: we will also have more requirements to
be able to use induction. The additional information and the
additional requirement both come from the definition of the
relation.
To be able to use the relation, we first prove a result that connects the shape of the binary tree (i.e., which rule was used to construct the tree) to the definition of the relation. This lemma is called inversion, and each relation comes with its own inversion lemma.
If
is
then
must be 0.
If
is
then
must be at least
and
and
We can also connect the size of the natural number in the perfect binary tree relation to the possible tree shapes in a similar manner.
If
is 0 then
must be
.
If
is at least 1, then
must be
for some
,
, and
. Furthermore, we know that
and
Equipped with the inversion lemma, we can prove the result about perfect binary trees. There is one subtle point here, however, that comes up in the second case. Specifically, bear in mind that the property we are trying to prove is itself an implication.
Theorem. For any binary tree with l leaves and
height h, if
, then 2^h = l.
This “if ...” plays a role as an
assumption (we get to assume that about the given tree) but
it also plays the role of an obligation, when we try to use
the inductive hypothesis.
Proof. By induction on the structure of the tree.
Case one: the tree is a leaf. In this case, we know l is 1 and h is 0, so specializing the claim, we get 2^0 = 1, which is true.
Case two: the tree is a node, so there are two other trees
and
as well as an integer
such that
is
. Let’s say that the height of
is h_1 and it has l_1 leaves; also the height of
is h_2 and it has l_2 leaves.
Now, as in the previous proof, we can do induction using
and
. But, just as the theorem we are proving requires us to know that
and
are perfect, so too the inductive hypothesis requires us to show that the trees are perfect before we can use it. Here is where the assumption of
and the inversion lemma come in. Since we know that
is true, by inversion we know that
and
. This lets us apply induction, telling us that 2^{h_1} = l_1 and 2^{h_2} = l_2. Take care when using induction in settings like this! The inductive hypothesis matches the entire theorem statement, and that statement has an implication in it; we must discharge that assumption in order to use the inductive hypothesis. In this case, it is straightforward, as we can satisfy the assumption directly from inversion, but that won’t always be the case.
From here, we have to do algebraic manipulations to obtain the goal. Let’s start by adding the left- and right-hand sides of the facts we obtained from induction to get 2^{h_1} + 2^{h_2} = l_1 + l_2. Since our original tree
has all of the leaves of
and
and no more, we can simplify the right-hand side to just l: 2^{h_1} + 2^{h_2} = l. Because the path length to any leaf is always the same, we know that h_1 = h_2, and thus 2^{h_1} + 2^{h_1} = l. Using properties of the exponential function, we can simplify the left-hand side: 2^{h_1+1} = l. Furthermore, using the fact that h_1 and h_2 are the same and a property of max, we can adjust the left-hand side to 2^{max(h_1,h_2)+1} = l. Now, the exponent on the left-hand side matches the definition of the height function, so we can replace it with h: 2^h = l, which completes the proof.
QED.
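The role of the “if” in the theorem shows up clearly when the claim is tested: only trees that satisfy the premise are checked, and other trees are allowed to violate the equation. The Python sketch below is an illustration under assumed names (`Leaf`, `Node`, `perfect`, `all_trees`); it enumerates every tree shape with at most seven nodes and is a sanity check, not a proof.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    pass


@dataclass(frozen=True)
class Node:
    value: int
    left: object
    right: object


def leaves(t):
    return 1 if isinstance(t, Leaf) else leaves(t.left) + leaves(t.right)


def height(t):
    if isinstance(t, Leaf):
        return 0
    return max(height(t.left), height(t.right)) + 1


def perfect(t):
    """Return n if t is perfect with path length n, else None."""
    if isinstance(t, Leaf):
        return 0
    n_left, n_right = perfect(t.left), perfect(t.right)
    if n_left is not None and n_left == n_right:
        return n_left + 1
    return None


def all_trees(nodes):
    """Every tree shape with exactly `nodes` nodes (all values are 0)."""
    if nodes == 0:
        return [Leaf()]
    return [Node(0, left, right)
            for k in range(nodes)
            for left in all_trees(k)
            for right in all_trees(nodes - 1 - k)]


trees = [t for n in range(8) for t in all_trees(n)]
# The premise of the theorem: keep only the perfect trees.
perfect_trees = [t for t in trees if perfect(t) is not None]
theorem_holds = all(2 ** height(t) == leaves(t) for t in perfect_trees)

# A tree outside the relation is free to break the equation.
lopsided = Node(0, Node(0, Leaf(), Leaf()), Leaf())
```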
Exercise 1. Our proof above used the fact that if two trees
and
are both perfect with the same
, i.e.,
and
, then the heights of the two trees
are the same. It is possible to prove this fact using
induction, but a simpler fact to prove is that if
, then the height of
is
, and it implies the desired lemma. Prove it.
Complete trees do not have a simple characterization for the exact number of leaves, but there still have to be many leaves compared to the example from the start of this section. In particular, we can bound the number of leaves in a complete tree from below; in a complete binary tree of height h, there must be at least 2^{h-1} leaves.
Because the definition of complete trees is more complex, the proof requires a little more sophistication. To start we need an inversion lemma.
If
is
then
must be 0.
- If
is
then either
is at least 1,
, and
, or
is at least 2,
, and
Theorem. For any complete tree with l leaves and
height h, if
, then 2^{h-1} ≤ l.
Proof. By induction on the structure of the tree.
Case one: the tree is a leaf. In this case, we know l is 1 and h is 0, so specializing the claim, we get 2^{-1} ≤ 1, which is true.
Case two: the tree is
. Let’s say that
has height h_1 and l_1 leaves, and that
has height h_2 and l_2 leaves.
Since the addition of one in the definition of the height and the subtraction of one from the theorem statement cancel out, our goal is to show that 2^{max(h_1,h_2)} ≤ l_1 + l_2.
Inversion tells us we have two subcases. In the first,
is at least 1,
, and
. In this case, we can use our earlier theorem about perfect trees to conclude that 2^{h_1} ≤ l_1 and induction to conclude that 2^{h_2-1} ≤ l_2. Furthermore, by the proof in the exercise above and the one in the exercise below, we know that the height of
and
are both
and thus equal to each other, so let’s replace the h_2s in the goal with h_1, and we can simplify the use of max. So, our goal specializes to 2^{h_1} ≤ l_1 + l_2. From the induction on
, and since adding l_2 onto l_1 does not decrease it, we have finished this case.
In the other subcase, we know that
is at least 2,
, and
. As in the previous case, by the results from the two exercises, we know that the max expression in the goal specializes to h_2, meaning our goal becomes 2^{h_2} ≤ l_1 + l_2. We can also use the previous result about perfect trees to conclude that 2^{h_2} = l_2, which gives us the overall result.
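The lower bound can be tested on small trees in the same spirit as the earlier results. This Python sketch makes two assumptions worth flagging: completeness is checked against the informal description (two adjacent path lengths, longer paths to the left) rather than the rules, and all names are hypothetical. The bound 2^{h-1} ≤ l is tested in the equivalent integer form 2^h ≤ 2l.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    pass


@dataclass(frozen=True)
class Node:
    value: int
    left: object
    right: object


def leaf_depths(t, depth=0):
    """Path lengths to each leaf, in left-to-right order."""
    if isinstance(t, Leaf):
        return [depth]
    return (leaf_depths(t.left, depth + 1)
            + leaf_depths(t.right, depth + 1))


def is_complete(t):
    depths = leaf_depths(t)
    longest = max(depths)
    return (all(d in (longest, longest - 1) for d in depths)
            and all(a >= b for a, b in zip(depths, depths[1:])))


def all_trees(nodes):
    """Every tree shape with exactly `nodes` nodes (all values are 0)."""
    if nodes == 0:
        return [Leaf()]
    return [Node(0, left, right)
            for k in range(nodes)
            for left in all_trees(k)
            for right in all_trees(nodes - 1 - k)]


trees = [t for n in range(7) for t in all_trees(n)]
complete_trees = [t for t in trees if is_complete(t)]

# For each complete tree: h is the longest path, l the number of leaves.
bound_holds = all(
    2 ** max(leaf_depths(t)) <= 2 * len(leaf_depths(t))
    for t in complete_trees
)
```

Under this reading there is exactly one complete shape for each number of nodes, so `complete_trees` has seven elements.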
Exercise 2. Show that, if we know that , then
the height of
is
.
Exercise 3. Show that, with the specific given definitions
above, for any , if
, then
.
Exercise 4. The converse of the claim in the previous exercise
is false. That is, it is not true that if
then
. Find one of the smallest possible binary trees that is
complete but not perfect and write out the derivation showing it is
complete.
1.5 Contexts
A very powerful concept for defining relations is the idea of a context, which we’ll focus on for this section, returning to how the context is actually useful in later sections.
We write , to decompose a binary
tree into two pieces, a context
, which is a
binary tree with a specific spot called the hole
somewhere inside it, plus another binary tree that is placed
at the hole in the context.
Here is the definition of the set of contexts ; they
use the same grammar-based definition technique as before,
but when we define them, we ensure that the definition is
formulated so that there is exactly one hole, written
, in each element of the set
.
As an example, on the left we have an element of the set
, and on the right we have an ordinary binary tree.
Contexts can also be drawn as trees, but we just write
somewhere at the bottom. Here’s the same context
and tree:
We can combine them by placing the tree in the hole:
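Contexts and the operation of placing a tree in the hole can be sketched in Python as follows. The three shapes of context used here (the hole itself, a node with the hole somewhere in its left subtree, and a node with the hole somewhere in its right subtree) follow the case analysis used in the proofs of this section; the names `Hole`, `InLeft`, `InRight`, and `plug` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    pass


@dataclass(frozen=True)
class Node:
    value: int
    left: object
    right: object


@dataclass(frozen=True)
class Hole:
    """The context that is just the hole."""


@dataclass(frozen=True)
class InLeft:
    """A node whose left subtree contains the hole."""
    value: int
    context: object
    right: object


@dataclass(frozen=True)
class InRight:
    """A node whose right subtree contains the hole."""
    value: int
    left: object
    context: object


def plug(c, t):
    """Place the tree t at the hole of context c, giving a tree."""
    if isinstance(c, Hole):
        return t
    if isinstance(c, InLeft):
        return Node(c.value, plug(c.context, t), c.right)
    return Node(c.value, c.left, plug(c.context, t))


context = InLeft(1, InRight(2, Leaf(), Hole()), Leaf())
subtree = Node(3, Leaf(), Leaf())
combined = plug(context, subtree)
```

Since each constructor carries exactly one inner context (or is the hole), every value of this type has exactly one hole by construction.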
As contexts allow us to pick out subtrees of binary trees, they also allow us to pick out subtrees of perfect and complete binary trees. Let’s prove a lemma that says that such subtrees of perfect trees are themselves perfect.
Theorem. For any context and binary tree
, if
then
for
some
.
If
is
, then
is
. So we can choose
to be
to satisfy the goal.
If
is
then we should apply induction on the inner
. To do so, we have to show that
is perfect. For that, we turn to our assumption that
. In this case, however, we know the outer layer of
, which means that
is perfect. Because of what it means to replace a term in the hole, this is the same thing as
and thus we know that
. By inversion on the definition of a perfect binary tree, we know that both
and
are perfect binary trees (with height
), setting us up to use induction, which gives us the desired result.
The final case is that
is
. This proceeds in an analogous manner to the previous case.
QED.
Exercise 5. The converse, namely that if
then
for some
is
false. Give a counterexample.
Exercise 6. Prove that for any context and binary tree
, if
then
for
some
.
1.6 Relations as Computation
Beyond using relations to capture particular desirable subsets of the sets we have defined, we can also use relations to capture a form of computation, where we relate one element of a set to another one.
The interpretation of these relations will be that some small amount of computation has occurred to transform one tree into the other one.
For our binary trees, we’ll use a relation that, step by
step, adds up the values in nodes, removing one node at a
time as it does so. The relation is written
to indicate that we can add two
integers together in
and update the tree by
removing one of the nodes to produce
.
Contexts offer great expressiveness in defining these relations because we can factor out the specific computational step from the place where it occurs inside the tree. Here is the definition of the relation to illustrate the idea.
This relation has two rules. First, focus on the part inside the hole in the first rule. In the portion before the arrow, it has a node with two children that each have two leaves for children. In the portion after the arrow, we remove the right child, and update the left child’s value to be the sum of the two values in the original children. The second rule is similar, but this time the outer node has one node for a child and a leaf, and we sum the values into the node.
But because the in the rule can be an arbitrary
element of the set
, it might also have been the
example above, meaning that these two trees are also related
by the relation, as are many others.
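One way to read the two rules operationally is as a function that lists every tree reachable in a single step. The Python sketch below is based only on the prose description of the rules above, so its details are assumptions: in particular, it assumes that in the second rule the node child is on the left and the leaf on the right, and the names `is_bare_node` and `steps` are hypothetical. Instead of building contexts explicitly, the function searches for the hole position by recurring into both subtrees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    pass


@dataclass(frozen=True)
class Node:
    value: int
    left: object
    right: object


def is_bare_node(t):
    """A node whose two children are both leaves."""
    return (isinstance(t, Node)
            and isinstance(t.left, Leaf)
            and isinstance(t.right, Leaf))


def steps(t):
    """All trees related to t by one step, at any position in t."""
    if isinstance(t, Leaf):
        return []
    results = []
    # First rule at the root: two bare children merge into the left one.
    if is_bare_node(t.left) and is_bare_node(t.right):
        merged = Node(t.left.value + t.right.value, Leaf(), Leaf())
        results.append(Node(t.value, merged, Leaf()))
    # Second rule at the root (assumed shape): bare left child, leaf right.
    if is_bare_node(t.left) and isinstance(t.right, Leaf):
        results.append(Node(t.value + t.left.value, Leaf(), Leaf()))
    # The context is not just the hole: step inside a subtree.
    results += [Node(t.value, s, t.right) for s in steps(t.left)]
    results += [Node(t.value, t.left, s) for s in steps(t.right)]
    return results


L = Leaf()
start = Node(1, Node(2, L, L), Node(3, L, L))
after_one = steps(start)
after_two = steps(after_one[0])
```

Starting from a three-node tree holding 1, 2, and 3, two steps lead to a single node holding 6, after which no rule applies.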
1.7 A Preservation Proof
The two rules in the definition of the
relation are enough to reduce every
binary tree that has any numbers to one that contains just a
single number. Additionally, it is even possible to reduce
every complete tree to another complete tree. Let’s try to
prove this.
Although this proof differs in specific ways from the kinds of proofs we will be doing for type systems, it is illustrative of the general kind of thinking we need when doing those proofs.
is
is
for some
, or
There is a
such that
for some
and
.
As stated, this theorem is true, but not amenable to induction. Let’s see what goes wrong.
is
, which is one of the cases in the conclusion.
is
for some
,
, and
. Since
, by inversion we know that there are two subcases. Let’s focus on the first one to see where the proof goes wrong. It says that
is at least 1,
, and
. Since we know that
is complete, we can apply induction, which gives us three possibilities. The first one is that
is
and the second one is that
is
, and these are not problematic.
In the third situation that induction gives us, we know that there is a
such that
for some
, and
. At this point, we might wish to say that binary tree
is complete to finish this case. Unfortunately, all we know is that
and we need to know that
. In particular, we do not know that
is the same as
. And, in fact, it might not be! For example, the tree
is complete with height 2, and it is related to
by the [one child] rule, but that binary tree has height 1.
This does not make the theorem false, however. What has gone wrong is that our inductive hypothesis is not strong enough. That is, the information we learn from induction is weaker than what is actually true. This is one of the essential truths when working with proofs by induction: sometimes we have to state a stronger result to be able to get useful facts from induction. Indeed, this becomes a balancing act, as stating a stronger result gives us more information from induction, but also means we have to establish harder-to-prove goals.
The next step after, however, demonstrates the other situation. It gives us a complete tree with a height that’s one smaller. And, when that happens, we will actually always get a tree that’s not just complete, but also perfect:
Let’s turn this observation into a more precise statement that we can use as a lemma to prove the original theorem.
is
is
for some
,
there is a
such that
and
, or
there is a
such that
and
, or
Proof. By induction on .
is
, which is one of the cases in the conclusion.
is
for some
,
, and
. Since
, by inversion we know that there are two subcases.
- In the first subcase from inversion, we know that
is at least 1,
, and
. Since we know that
is complete, we can apply induction, which gives us four possibilities.
The first case is that
is
. We can use inversion on the fact that
to learn that
is zero, and thus
must be 1. Therefore, we know that
which, from inversion tells us that
must be
. Therefore, this case satisfies the second clause in the lemma statement.
The second case from induction is that
is
. This case is similar to the previous case, except that the binary tree is larger; as before, examining the definition of complete binary trees gives us enough information to establish this case; it uses the third case in the lemma statement.
In the third situation that induction gives us, we know that there is a
such that
and
. The
that satisfies the lemma is
and it does so via the third alternative in the lemma statement. To prove that, we have to show that
and
. For the first, we can use the [right] rule in the definition of complete binary trees. For the second, consider the
that was used to conclude
. We can extend it to
and then use that to conclude that
.
In the fourth situation that induction gives us, we know that there is a
that is perfect with height
such that
. The
that satisfies the lemma in this case is
. We can conclude that
by extending the context as we did in the previous case, but this time to
.
To show that
is complete, let’s remind ourselves what we know about the subtrees of
. From inversion at the start of this case, we know that
. We also learned, from induction, that
. We can use these facts together with the [right] rule to conclude that
, which satisfies the third clause of the goal of the lemma.
- In the second subcase from inversion, we know that
is at least 2,
, and
. This time, we can do induction on
, giving us four more subcases:
In the first subcase, we know that
is
. By inversion of
we know that
must be zero. But this is impossible and thus
could not actually have been
in this situation.
Exercise 7. Around half of the proof above is missing,
specifically the last three subcases of 2b (the second case of inversion in
the case that is
).
Complete the proof.