Admin note:

You can choose any programming language you want. If you don't have a pet PL that you love, then it is probably best to stick with PLAI or Racket (or some other Lisp or Lisp derivative), because in other languages you'd need to implement the 'read' function yourself. Specifically, each phase of the compiler (i.e., each homework assignment) will consume and print out s-expressions. These are easy to write out and read in any language, but trivial in Racket.

Here is the language you'll be compiling:

p ::= ((i ...)
       (label i ...)
       ...)

A 'p' is a sequence of functions, the first of which is unnamed (i.e., has no initial label). That one is the (body of the) main function for the program.

Each function's body is just a series of instructions. Here are the instructions:

i ::= (x <- s)                  ;; assign to a register
    | (x <- (mem x n4))         ;; read from memory @ x+n4
    | ((mem x n4) <- s)         ;; update memory @ x+n4
    | (x aop= t)                ;; update x with an arith op and t.
    | (x sop= sx)               ;; update x with a shifting op and sx.
    | (x sop= num)              ;; update x with a shifting op and num.
    | (cx <- t cmp t)           ;; save result of a comparison
    | label                     ;; target of a jump
    | (goto label)              ;; unconditional jump
    | (cjump t cmp t label label) ;; conditional jump
    | (call u)                  ;; call a function [see below]
    | (tail-call u)             ;; tail call a function [see below]
    | (return)                  ;; return from a function

    ;; three calls into the runtime system:

    ;; one to print a value
    ;; (returns the encoded version of 0, which gets stored in eax)
    | (eax <- (print t))

    ;; one to allocate & initialize some space
    ;; (returns the location where the allocated memory begins)
    | (eax <- (allocate t t))

    ;; one to signal a runtime error on array dereference and
    ;; terminate the program (doesn't actually return)
    | (eax <- (array-error t t))

And a few helper non-terminals:

s     ::= x | num | label
u     ::= x | label
t     ::= x | num
x     ::= cx | esi | edi | ebp | esp
cx    ::= eax | ecx | edx | ebx
sx    ::= ecx
aop=  ::= += | -= | *= | &=
sop=  ::= <<= | >>=
cmp   ::= < | <= | =
label ::= sequence of alpha-numeric characters or underscores,
          starting with a colon followed by a non-digit, i.e.,
          matching this regexp: #rx"^:[a-zA-Z_][a-zA-Z_0-9]*$"
num   ::= integer between (inclusive) -2^31 and (2^31)-1
n4    ::= integer between (inclusive) -2^31 and (2^31)-1,
          that is divisible by 4

;; eax, edx, and ecx are the argument registers (in that order)
;; esi and edi are callee / function save
;; eax, edx, ecx, and ebx are caller / application save

Okay, here's an implementation of fib in this language:

(((esi <- 89)  ;; loop termination variable; this is 44 in reality
  (eax <- (allocate esi 3))
  (edi <- eax) ;; save this register for the array base pointer
  (ebx <- 5)   ;; loop index variable, starts at 2 (encoded as 5).

  :loop
  (cjump ebx < esi :keep_going :done)
  :keep_going

  ;; compute a[ebx-2], putting it into edx
  (edx <- ebx)
  (edx -= 4)   ;; - decrement by 2.
  (edx -= 1)   ;; - convert a 2a+1 number into
  (edx *= 2)   ;;   4a for the index calculation
  (edx += edi) ;; - add in the base pointer
  (edx += 4)   ;; - skip past the size word
  (edx <- (mem edx 0))

  ;; compute a[ebx-1], putting it into ecx
  (ecx <- ebx)
  (ecx -= 2)   ;; - decrement by 1.
  (ecx -= 1)   ;; - convert a 2a+1 number into
  (ecx *= 2)   ;;   4a for the index calculation
  (ecx += edi) ;; - add in the base pointer
  (ecx += 4)   ;; - skip past the size word
  (ecx <- (mem ecx 0))

  ;; put the sum of a[ebx-2]+a[ebx-1] into edx
  (edx += ecx)
  (edx -= 1)   ;; number conversion...

  ;; compute the location of a[ebx], putting it into ecx
  (ecx <- ebx)
  (ecx -= 1)   ;; - convert a 2a+1 number into
  (ecx *= 2)   ;;   4a for the index calculation
  (ecx += 4)   ;; - skip past the size word
  (ecx += edi) ;; - add in the base pointer

  ;; store the sum a[ebx-2]+a[ebx-1] into a[ebx]
  ((mem ecx 0) <- edx)

  ;; increment the loop counter
  (ebx += 2)   ;; this is what adding by one is.

  ;; go back to the loop test
  (goto :loop)

  :done
  (eax <- (print edi))))

Here's one that an optimizing compiler might produce from the same source code as the one above (i.e., where the loop contains something like a[x] = a[x-1]+a[x-2]):

(((esi <- 89)  ;; size of the array to create
  (eax <- (allocate esi 3))
  (edi <- eax) ;; save this register for the array base pointer
  (ebx <- edi) ;; - loop index variable is now a pointer into
  (ebx += 12)  ;;   the array; it starts at a[2].
  (esi *= 2)   ;; - convert esi so it is the location of
  (esi += 2)   ;;   the word just past the end of
  (esi += edi) ;;   the array

  :loop
  (cjump ebx < esi :keep_going :done)
  :keep_going

  (edx <- (mem ebx -8))
  (ecx <- (mem ebx -4))
  (edx += ecx)
  (edx -= 1)
  ((mem ebx 0) <- edx)
  (ebx += 4)

  ;; go back to the loop test
  (goto :loop)

  :done
  (eax <- (print edi))))

Note that we can do even better here: we don't really ever need to look at the array's contents. We can just save the results from the last two iterations of the loop in registers and use those.
A compiler typically won't be able to do that kind of optimization, though, because it can't be sure that the contents of the array are not changing. But it would, generally speaking, be able to generate that code if we had used local variables in the original source program instead of array references.

===== Numbers =====

The numbers in that program looked strange. Instead of using 44, we used 89. The problem stems from the fact that we want to be able to print out the values and we do not have any type information available in the source programming language. So, what we do instead is tag the values so we can tell what they are. We have two kinds of values: numbers and arrays. Since pointers have to be a multiple of 4 (and thus also a multiple of 2), we know that no odd number will be a pointer. Thus, we can encode an integer a as the number 2a+1.

This has lots of implications for our compiler later on, but in this language, the only things that it affects are printing and allocation (i.e., our runtime system). In other words, when you see:

  (eax += ebx)

you just turn that into

  add %ebx, %eax

and we expect an earlier phase in the compiler to have turned an expression like:

  (let ((x (+ y z))) ...)

into this series of instructions (or something like this):

  (x <- y)
  (x += z)
  (x -= 1)

On the other hand, when you see

  (eax <- (print ebx))

the printer will decode the contents of ebx before printing it. Similarly,

  (eax <- (allocate ebx 3))

means that ebx should be decoded (and must decode into a number!) to get the size of the array, and the array should be initialized with the number 3 (but that's really a bunch of "1"s).
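
To make the 2a+1 encoding concrete, here is a small sketch in Python (any language is allowed for the course; Python is used here only for illustration). The names encode, decode, is_number, add_encoded, and element_offset are made up for this example and are not part of the runtime system. Python integers are unbounded, so the sketch does not model 32-bit overflow.

```python
def encode(a: int) -> int:
    """Tag an integer a as the odd number 2a+1."""
    return 2 * a + 1


def is_number(v: int) -> bool:
    """Odd values are encoded numbers; pointers are multiples of 4, hence even."""
    return v % 2 == 1


def decode(v: int) -> int:
    """Recover a from its encoding 2a+1; reject even values (pointers)."""
    if not is_number(v):
        raise ValueError("even value: this is a pointer, not an encoded number")
    return (v - 1) // 2


def add_encoded(y: int, z: int) -> int:
    """Mirror the sequence (x <- y) (x += z) (x -= 1) on encoded values."""
    x = y
    x += z   # (2a+1) + (2b+1) = 2(a+b) + 2
    x -= 1   # drop the extra 1 to get 2(a+b) + 1
    return x


def element_offset(encoded_index: int) -> int:
    """Byte offset of a[i] from the array base pointer, as in the fib
    listing: subtract 1, multiply by 2 (giving 4i), then skip the size word."""
    return (encoded_index - 1) * 2 + 4


if __name__ == "__main__":
    print(encode(44))                                   # the 89 used in fib
    print(decode(add_encoded(encode(2), encode(3))))    # 2 + 3
    print(element_offset(encode(2)))                    # offset of a[2]
```

For instance, element_offset(5) works out to 12, which is the same constant the optimized listing adds to the base pointer to reach a[2].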