75 yes---if a compiler can find out that for example a variable |
75 yes---if a compiler can find out that for example a variable |
76 will never be negative and this variable is used as an index |
76 will never be negative and this variable is used as an index |
77 for an array, then the compiler does not need to generate code |
77 for an array, then the compiler does not need to generate code |
78 for an underflow-check. Remember some languages are immune to |
78 for an underflow-check. Remember some languages are immune to |
79 buffer-overflow attacks, but they need to add underflow and |
79 buffer-overflow attacks, but they need to add underflow and |
80 overflow checks everywhere. According to Regehr, an expert in |
80 overflow checks everywhere. According to John Regher, an |
81 the field of compilers, overflow checks can cause 5-10\% |
81 expert in the field of compilers, overflow checks can cause |
82 slowdown, and in some languages even 100\% for tight |
82 5-10\% slowdown, in some languages even 100\% for tight |
83 loops.\footnote{\url{http://blog.regehr.org/archives/1154}} If |
83 loops.\footnote{\url{http://blog.regehr.org/archives/1154}} If |
84 the compiler can omit the underflow check, for example, then |
84 the compiler can omit the underflow check, for example, then |
85 this can potentially drastically speed up the generated code. |
85 this can potentially drastically speed up the generated code. |
86 |
86 |
87 What do programs in our simple programming language look like? |
87 What do programs in our simple programming language look like? |
217 describe an interpreter for our programming language, which |
217 describe an interpreter for our programming language, which |
218 specifies exactly how programs are supposed to be |
218 specifies exactly how programs are supposed to be |
219 run\ldots{}at least we will specify this for all \emph{good} |
219 run\ldots{}at least we will specify this for all \emph{good} |
220 programs. By good programs I mean where all variables are |
220 programs. By good programs I mean where all variables are |
221 initialised, for example. Our interpreter will just crash if |
221 initialised, for example. Our interpreter will just crash if |
222 it cannot find out the value for a variable, in case it is not |
222 it cannot find out the value for a variable when it is not |
223 initialised. Also, we will assume that labels in good programs |
223 initialised. Also, we will assume that labels in good programs |
224 are unique, otherwise our programs will calculate ``garbage''. |
224 are unique, otherwise our programs will calculate ``garbage''. |
225 |
225 |
226 First we will pre-process our programs. This will simplify the |
226 First we will pre-process our programs. This will simplify the |
227 definition of our interpreter later on. We will transform |
227 definition of our interpreter later on. By pre-processing our |
228 programs into \emph{snippets}. Their purpose is to simplify |
228 programs we will transform programs into \emph{snippets}. |
229 the definition of what the interpreter should do in case of a |
229 Their purpose is to simplify the definition of what the |
230 jump. A snippet is a label and all code that comes after the |
230 interpreter should do in case of a jump. A snippet is a label |
231 label. This essentially means a snippet is a \emph{map} from |
231 and all the code that comes after the label. This essentially |
232 labels to code. Given that programs are sequences (or lists) |
232 means a snippet is a \emph{map} from labels to |
233 of statements, we can easily calculate the snippets by just |
233 code.\footnote{Be sure you know what maps are. In a |
234 traversing this sequence and recursively generating the map. |
234 programming context they are often represented as association |
235 Suppose a program is of the general form |
235 list where some data is associated with a key.} |
|
236 |
|
237 Given that programs are sequences (or lists) of statements, we |
|
238 can easily calculate the snippets by just traversing this |
|
239 sequence and recursively generating the map. Suppose a program |
|
240 is of the general form |
236 |
241 |
237 \begin{center} |
242 \begin{center} |
238 $stmt_1\;stmt_2\; \ldots\; stmt_n$ |
243 $stmt_1\;stmt_2\; \ldots\; stmt_n$ |
239 \end{center} |
244 \end{center} |
240 |
245 |
431 \end{tabular} |
436 \end{tabular} |
432 \end{center} |
437 \end{center} |
433 |
438 |
434 \noindent This environment $env$ also acts like a map: it |
439 \noindent This environment $env$ also acts like a map: it |
435 associates variable with their current values. For example |
440 associates variable with their current values. For example |
436 such an environment might look as follows |
441 ߆after evaluating the first two lines in our factorial |
|
442 program, such an environment might look as follows |
437 |
443 |
438 \begin{center} |
444 \begin{center} |
439 \begin{tabular}{ll} |
445 \begin{tabular}{ll} |
440 $\fbox{\texttt{a}} \mapsto \texttt{1}$ & |
446 $\fbox{\texttt{a}} \mapsto \texttt{1}$ & |
441 $\fbox{\texttt{n}} \mapsto \texttt{5}$ |
447 $\fbox{\texttt{n}} \mapsto \texttt{5}$ |
442 \end{tabular} |
448 \end{tabular} |
443 \end{center} |
449 \end{center} |
444 |
450 |
445 \noindent Again I hilighted the keys. In the clause for |
451 \noindent Again I highlighted the keys. In the clause for |
446 variables, we can therefore consult this environment and |
452 variables, we can therefore consult this environment and |
447 return whatever value is currently stored for this variable. |
453 return whatever value is currently stored for this variable. |
448 This is written $env(x)$. If we query this map with $x$ we |
454 This is written $env(x)$. If we query this map with $x$ we |
449 obtain the corresponding number. You might ask what happens if |
455 obtain the corresponding number. You might ask what happens if |
450 an environment does not contain any value for, say, the |
456 an environment does not contain any value for, say, the |
451 variable $x$? Well, then our interpreter just crashes, or will |
457 variable $x$? Well, then our interpreter just ``crashes'', or |
452 raise an exception. In this case we have a ``bad'' program |
458 more precisely will raise an exception. In this case we have a |
453 that tried to use a variable before it was initialised. The |
459 ``bad'' program that tried to use a variable before it was |
454 programmer should not have don this. With the second version |
460 initialised. The programmer should not have done this. In a |
455 of \textit{eval\_exp} we completed our definition for |
461 real programming language we would of course try a bit harder |
456 evaluating expressions. |
462 and for example give an error at compile time, or design our |
|
463 language in such a way that this can never happen. With the |
|
464 second version of \textit{eval\_exp} we completed our |
|
465 definition for evaluating expressions. |
457 |
466 |
458 Next comes the evaluation function for statements. We define |
467 Next comes the evaluation function for statements. We define |
459 this function in such a way that we evaluate a whole sequence |
468 this function in such a way that we recursively evaluate a |
460 of statements. Assume a program $p$ and its preprocessed |
469 whole sequence of statements. Assume a program $p$ (you want |
461 snippets $sn$. Then we define: |
470 to evaluate) and its pre-processed snippets $sn$. Then we can |
|
471 define: |
462 |
472 |
463 \begin{center} |
473 \begin{center} |
464 \begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}} |
474 \begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}} |
465 $\textit{eval\_stmts}([], env)$ & $\dn$ & $env$\\ |
475 $\textit{eval\_stmts}([], env)$ & $\dn$ & $env$\\ |
466 $\textit{eval\_stmts}(\texttt{label:}\;rest, env)$ & |
476 $\textit{eval\_stmts}(\texttt{label:}\;rest, env)$ & |
467 $\dn$ & $\textit{eval\_stmts}(rest, env)$ \\ |
477 $\dn$ & $\textit{eval\_stmts}(rest, env)$ \\ |
468 $\textit{eval\_stmts}(\texttt{x\,:=\,e}\;rest, env)$ & |
478 $\textit{eval\_stmts}(\texttt{x\,:=\,e}\;rest, env)$ & |
469 $\dn$ & $\textit{eval\_stmts}(rest, |
479 $\dn$ & $\textit{eval\_stmts}(rest, |
470 env[x := \textit{eval\_exp}(\texttt{e}, env)])$\\ |
480 env[x := \textit{eval\_exp}(\texttt{e}, env)])$\\ |
|
481 $\textit{eval\_stmts}(\texttt{goto\,lbl}\;rest, env)$ |
|
482 & $\dn$ & $\textit{eval\_stmts}(sn(\texttt{lbl}), env)$\\ |
471 $\textit{eval\_stmts}(\texttt{jmp?\,e\,lbl}\;rest, env)$ |
483 $\textit{eval\_stmts}(\texttt{jmp?\,e\,lbl}\;rest, env)$ |
472 & $\dn$ & $\begin{cases}\begin{array}{@{}l@{\hspace{-12mm}}r@{}} |
484 & $\dn$ & $\begin{cases}\begin{array}{@{}l@{\hspace{-12mm}}r@{}} |
473 \textit{eval\_stmts}(sn(\texttt{lbl}), env)\\ |
485 \textit{eval\_stmts}(sn(\texttt{lbl}), env)\\ |
474 & \text{if}\;\textit{eval\_exp}(\texttt{e}, env) = 1\\ |
486 & \text{if}\;\textit{eval\_exp}(\texttt{e}, env) = 1\\ |
475 \textit{eval\_stmts}(rest, env) & \text{otherwise}\\ |
487 \textit{eval\_stmts}(rest, env) & \text{otherwise}\\ |
476 \end{array}\end{cases}$\\ |
488 \end{array}\end{cases}$ |
477 $\textit{eval\_stmts}(\texttt{goto\,lbl}\;rest, env)$ |
|
478 & $\dn$ & $\textit{eval\_stmts}(sn(\texttt{lbl}), env)$ |
|
479 \end{tabular} |
489 \end{tabular} |
480 \end{center} |
490 \end{center} |
481 |
491 |
482 |
492 \noindent The first clause is for the empty program, or when |
483 |
493 we arrived at the end of the program. In this case we just |
484 |
494 return the environment. The second clause is for when the next |
|
495 statement is a label. That means the program is of the form |
|
496 $\texttt{label:}\;rest$ where the label is some string and |
|
497 $rest$ stands for all following statement. This case is easy, |
|
498 because our evaluation function just discards the label and |
|
499 evaluates the rest of the statements (we already extracted all |
|
500 important information about labels when we pre-processed our |
|
501 programs and generated the snippets). The third clause is for |
|
502 variable assignments. Again we just evaluate the rest for the |
|
503 statements, but with a modified environment---since the |
|
504 variable assignment is supposed to introduce a new variable or |
|
505 change the current value of a variable. For this modification |
|
506 of the environment we first evaluate the expression |
|
507 $\texttt{e}$ using our evaluation function for expressions. |
|
508 This gives us a number. Then we assign this number to the |
|
509 variable $x$ in the environment. This modified environment |
|
510 will be used to evaluate the rest of the program. The fourth |
|
511 clause is for the unconditional jump to a label. That means we |
|
512 have to look up in our snippets map $sn$ what are the next |
|
513 statements for this label are. Therefore we will continue with |
|
514 evaluating, not with the rest of the program, but with the |
|
515 statements stored in the snippets-map under the label |
|
516 $\texttt{lbl}$. The fifth clause for conditional jumps is |
|
517 similar, but in order to make the jump we first need to |
|
518 evaluate the expression $\texttt{e}$ in order to find out |
|
519 whether it is $1$. If yes, we jump, otherwise we just continue |
|
520 with evaluating the rest of the program. |
|
521 |
|
522 Our interpreter works in two stages: First we pre-process our |
|
523 program generating the snippets map $sn$, say. Second we call |
|
524 the evaluation function with the default entry point and the |
|
525 empty environment: |
|
526 |
|
527 \begin{center} |
|
528 $\textit{eval\_stmts}(sn(\pcode{""}), \varnothing)$ |
|
529 \end{center} |
|
530 |
|
531 \noindent It is interesting to not that our interpreter when |
|
532 it comes to the end of the program returns an environment. Our |
|
533 programming language does not contain any constructs for input |
|
534 and output. Therefore this environment is the only effect we |
|
535 can observe when running the program (apart from that our |
|
536 interpreter might need some time before finishing the |
|
537 evaluation of the program) |
485 |
538 |
486 |
539 |
487 |
540 |
488 |
541 |
489 \end{document} |
542 \end{document} |