180 complexity is really beside the point here. Furthermore, real |
180 complexity is really beside the point here. Furthermore, real |
181 machine code has many instructions for manipulating memory. We |
181 machine code has many instructions for manipulating memory. We |
182 do not have this at all. This is actually a more serious |
182 do not have this at all. This is actually a more serious |
183 simplification because we assume numbers to be arbitrary small |
183 simplification because we assume numbers to be arbitrary small |
184 or large, which is not the case with real machine code. In |
184 or large, which is not the case with real machine code. In |
185 real code basic number formats have a range and might |
185 real machine code, basic number formats have a range and might |
186 over-flow or under-flow from this range. Also the number of |
186 over-flow or under-flow this range. Also the number of |
187 variables in our programs is potentially unlimited, while |
187 variables in our programs is potentially unlimited, while |
188 memory in an actual computer, of course, is always limited |
188 memory in an actual computer, of course, is always limited. To |
189 somehow on any actual. To sum up, our language might look |
189 sum up, our language might look ridiculously simple, but it is |
190 ridiculously simple, but it is not far removed from |
190 not too far removed from practically relevant issues. |
191 practically relevant issues. |
|
192 |
191 |
193 |
192 |
194 \subsubsection*{An Interpreter} |
193 \subsubsection*{An Interpreter} |
195 |
194 |
196 Designing a language is like playing god: you can say what |
195 Designing a language is like playing god: you can say what |
264 \noindent In the first clause we just return the empty map for |
263 \noindent In the first clause we just return the empty map for |
265 the program that does not contain any statement. In the second |
264 the program that does not contain any statement. In the second |
266 clause, we have to distinguish the case where the first |
265 clause, we have to distinguish the case where the first |
267 statement is a label or not. As said before, if not, then we |
266 statement is a label or not. As said before, if not, then we |
268 just ``throw away'' the label and recursively calculate the |
267 just ``throw away'' the label and recursively calculate the |
269 snippets for the rest of the program. If yes, then we do the |
268 snippets for the rest of the program (the otherwise clause). |
270 same, but also update the map so that $label$ now points to |
269 If yes, then we do the same, but also update the map so that |
271 the rest of the statements. There is one small problem we need |
270 $label$ now points to the rest of the statements (the if |
272 to overcome: our two programs shown so far have no label as |
271 clause). This looks all realtively straightforward, but there |
273 \emph{entry point}---that is where the execution is supposed |
272 is one small problem we need to overcome: our two programs |
274 to start. We usually assume that the first statement will be |
273 shown so far have no label as \emph{entry point}---that is |
275 run first. To make this the default, it is convenient if we |
274 where the execution is supposed to start. We usually assume |
276 add to all our programs a default label, say \pcode{""} (the |
275 that the first statement will be run first. To make this the |
277 empty string). With this we can define our pre-processing of |
276 default, it is convenient if we add to all our programs a |
278 programs as follows |
277 default label, say \pcode{""} (the empty string). With this we |
|
278 can define our pre-processing of programs as follows |
279 |
279 |
280 \begin{center} |
280 \begin{center} |
281 $\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ |
281 $\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ |
282 \end{center} |
282 \end{center} |
283 |
283 |
364 0 & \text{otherwise} |
364 0 & \text{otherwise} |
365 \end{cases}$ |
365 \end{cases}$ |
366 \end{tabular} |
366 \end{tabular} |
367 \end{center} |
367 \end{center} |
368 |
368 |
369 \noindent While this should look all relatively |
369 \noindent While this should look all rather intuitive`, still |
370 straightforward, still be very careful. There is a subtlety |
370 be very careful. There is a subtlety which can be easily |
371 which can be easily overlooked: The function |
371 overlooked: The function \textit{eval\_exp} takes an |
372 \textit{eval\_exp} takes an expression of our programming |
372 expression of our programming language as input and returns a |
373 language as input and returns a number as output. Therefore |
373 number as output. Therefore whenever we encounter a number in |
374 whenever we have a number in our program, we just return this |
374 our program, we just return this number---this is defined in |
375 number---this is defined in the first clause above. Whenever |
375 the first clause above. Whenever we encounter an addition, |
376 we encounter an addition, well then we first evaluate the |
376 well then we first evaluate the left-hand side |
377 left-hand side $\texttt{e}_\texttt{1}$ of the addition (this |
377 $\texttt{e}_\texttt{1}$ of the addition (this will give a |
378 will give a number), then evaluate the right-hand side |
378 number), then evaluate the right-hand side |
379 $\texttt{e}_\texttt{2}$ (this gives another number), and |
379 $\texttt{e}_\texttt{2}$ (this gives another number), and |
380 finally add both numbers together. Here is the subtlety: on |
380 finally add both numbers together. Here is the subtlety: on |
381 the left-hand side of the $\dn$ we have a \texttt{+} (in the |
381 the left-hand side of the $\dn$ we have a \texttt{+} (in the |
382 teletype font) which is the symbol for addition in our |
382 teletype font) which is the symbol for addition in our |
383 programming language. On the right-hand side we have $+$ which |
383 programming language. On the right-hand side we have $+$ which |
388 implementation of our interpreter, the mathematical operation |
388 implementation of our interpreter, the mathematical operation |
389 will be the function for addition from the programming |
389 will be the function for addition from the programming |
390 language in which we \underline{\smash{implement}} our |
390 language in which we \underline{\smash{implement}} our |
391 interpreter. While the \texttt{+} is just a symbol that is |
391 interpreter. While the \texttt{+} is just a symbol that is |
392 used in our programming language. Clearly we have to use a |
392 used in our programming language. Clearly we have to use a |
393 symbol that is a good mnemonic for addition otherwise we will |
393 symbol that is a good mnemonic for addition, otherwise we will |
394 confuse the programmers working with our language. Therefore |
394 confuse the programmers working with our language. Therefore |
395 we use $\texttt{+}$. A similar choice is made for times in the |
395 we use $\texttt{+}$. A similar choice is made for times in the |
396 third clause and equality in the fourth clause. Remember I |
396 third clause and equality in the fourth clause. |
397 wrote at the beginning of this section about being god when |
397 |
398 designing a programming language. You can see this here: we |
398 Remember I wrote at the beginning of this section about being |
399 need to give meaning to symbols. |
399 god when designing a programming language. You can see this |
400 |
400 here: we need to give meaning to symbols. At the moment |
401 At the moment however, we are a poor fallible god. Look again |
401 however, we are a poor fallible god. Look again at the grammar |
402 at the grammar of our programming language and our definition. |
402 of our programming language and our definition. Clearly, an |
403 Clearly, an expression can contain variables. So far we have |
403 expression can contain variables. So far we have ignored them. |
404 ignored them. What should our interpreter do with variables? |
404 What should our interpreter do with variables? They might |
405 They might change during the evaluation of a program. For |
405 change during the evaluation of a program. For example the |
406 example the variable \pcode{n} in the factorial program counts |
406 variable \pcode{n} in the factorial program counts down from 5 |
407 down from 5 up to 0. How can we improve our definition above |
407 down to 0. How can we improve our definition above to give also |
408 to give also an answer whenever our interpreter encounters a |
408 an answer whenever our interpreter encounters a variable in an |
409 variable in an expression? The solution is to add an |
409 expression? The solution is to add an \emph{environment}, |
410 \emph{environment}, written $env$, as an additional input |
410 written $env$, as an additional input argument to our |
411 argument to our \textit{eval\_exp} function. |
411 \textit{eval\_exp} function. |
412 |
412 |
413 \begin{center} |
413 \begin{center} |
414 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} |
414 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} |
415 $\textit{eval\_exp}(\texttt{n}, env)$ & $\dn$ & $n$ |
415 $\textit{eval\_exp}(\texttt{n}, env)$ & $\dn$ & $n$ |
416 \qquad\text{if}\; \texttt{n}\; \text{is a number like} |
416 \qquad\text{if}\; \texttt{n}\; \text{is a number like} |
434 $\textit{eval\_exp}(\texttt{x}, env)$ & $\dn$ & $env(x)$ |
434 $\textit{eval\_exp}(\texttt{x}, env)$ & $\dn$ & $env(x)$ |
435 \end{tabular} |
435 \end{tabular} |
436 \end{center} |
436 \end{center} |
437 |
437 |
438 \noindent This environment $env$ also acts like a map: it |
438 \noindent This environment $env$ also acts like a map: it |
439 associates variable with their current values. For example |
439 associates variables with their current values. For example |
440 ߆after evaluating the first two lines in our factorial |
440 after evaluating the first two lines in our factorial |
441 program, such an environment might look as follows |
441 program, such an environment might look as follows |
442 |
442 |
443 \begin{center} |
443 \begin{center} |
444 \begin{tabular}{ll} |
444 \begin{tabular}{ll} |
445 $\fbox{\texttt{a}} \mapsto \texttt{1}$ & |
445 $\fbox{\texttt{a}} \mapsto \texttt{1}$ & |
448 \end{center} |
448 \end{center} |
449 |
449 |
450 \noindent Again I highlighted the keys. In the clause for |
450 \noindent Again I highlighted the keys. In the clause for |
451 variables, we can therefore consult this environment and |
451 variables, we can therefore consult this environment and |
452 return whatever value is currently stored for this variable. |
452 return whatever value is currently stored for this variable. |
453 This is written $env(x)$. If we query this map with $x$ we |
453 This is written $env(x)$---meaning we query this map with $x$ |
454 obtain the corresponding number. You might ask what happens if |
454 we obtain the corresponding number. You might ask what happens |
455 an environment does not contain any value for, say, the |
455 if an environment does not contain any value for, say, the |
456 variable $x$? Well, then our interpreter just ``crashes'', or |
456 variable $x$? Well, then our interpreter just ``crashes'', or |
457 more precisely will raise an exception. In this case we have a |
457 more precisely will raise an exception. In this case we have a |
458 ``bad'' program that tried to use a variable before it was |
458 ``bad'' program that tried to use a variable before it was |
459 initialised. The programmer should not have done this. In a |
459 initialised. The programmer should not have done this. In a |
460 real programming language we would of course try a bit harder |
460 real programming language, we would of course try a bit harder |
461 and for example give an error at compile time, or design our |
461 and for example give an error at compile time, or design our |
462 language in such a way that this can never happen. With the |
462 language in such a way that this can never happen. With the |
463 second version of \textit{eval\_exp} we completed our |
463 second version of \textit{eval\_exp} we completed our |
464 definition for evaluating expressions. |
464 definition for evaluating expressions. |
465 |
465 |
491 \noindent The first clause is for the empty program, or when |
491 \noindent The first clause is for the empty program, or when |
492 we arrived at the end of the program. In this case we just |
492 we arrived at the end of the program. In this case we just |
493 return the environment. The second clause is for when the next |
493 return the environment. The second clause is for when the next |
494 statement is a label. That means the program is of the form |
494 statement is a label. That means the program is of the form |
495 $\texttt{label:}\;rest$ where the label is some string and |
495 $\texttt{label:}\;rest$ where the label is some string and |
496 $rest$ stands for all following statement. This case is easy, |
496 $rest$ stands for all following statements. This case is easy, |
497 because our evaluation function just discards the label and |
497 because our evaluation function just discards the label and |
498 evaluates the rest of the statements (we already extracted all |
498 evaluates the rest of the statements (we already extracted all |
499 important information about labels when we pre-processed our |
499 important information about labels when we pre-processed our |
500 programs and generated the snippets). The third clause is for |
500 programs and generated the snippets). The third clause is for |
501 variable assignments. Again we just evaluate the rest for the |
501 variable assignments. Again we just evaluate the rest for the |
505 of the environment we first evaluate the expression |
505 of the environment we first evaluate the expression |
506 $\texttt{e}$ using our evaluation function for expressions. |
506 $\texttt{e}$ using our evaluation function for expressions. |
507 This gives us a number. Then we assign this number to the |
507 This gives us a number. Then we assign this number to the |
508 variable $x$ in the environment. This modified environment |
508 variable $x$ in the environment. This modified environment |
509 will be used to evaluate the rest of the program. The fourth |
509 will be used to evaluate the rest of the program. The fourth |
510 clause is for the unconditional jump to a label. That means we |
510 clause is for the unconditional jump to a label, called |
511 have to look up in our snippets map $sn$ what are the next |
511 \texttt{lbl}. That means we have to look up in our snippets |
512 statements for this label are. Therefore we will continue with |
512 map $sn$ what are the next statements for this label. |
513 evaluating, not with the rest of the program, but with the |
513 Therefore we will continue with evaluating, not with the rest |
514 statements stored in the snippets-map under the label |
514 of the program, but with the statements stored in the |
515 $\texttt{lbl}$. The fifth clause for conditional jumps is |
515 snippets-map under the label $\texttt{lbl}$. The fifth clause |
516 similar, but in order to make the jump we first need to |
516 for conditional jumps is similar, but to decide whether to |
517 evaluate the expression $\texttt{e}$ in order to find out |
517 make the jump we first need to evaluate the expression |
518 whether it is $1$. If yes, we jump, otherwise we just continue |
518 $\texttt{e}$ in order to find out whether it is $1$. If yes, |
519 with evaluating the rest of the program. |
519 we jump, otherwise we just continue with evaluating the rest |
|
520 of the program. |
520 |
521 |
521 Our interpreter works in two stages: First we pre-process our |
522 Our interpreter works in two stages: First we pre-process our |
522 program generating the snippets map $sn$, say. Second we call |
523 program generating the snippets map $sn$, say. Second we call |
523 the evaluation function with the default entry point and the |
524 the evaluation function with the default entry point and the |
524 empty environment: |
525 empty environment: |
543 $\fbox{\texttt{n}} \mapsto \texttt{0}$ |
544 $\fbox{\texttt{n}} \mapsto \texttt{0}$ |
544 \end{tabular} |
545 \end{tabular} |
545 \end{center} |
546 \end{center} |
546 |
547 |
547 \noindent While the discussion above should have illustrated |
548 \noindent While the discussion above should have illustrated |
548 the ideas, in order to do some serious calculation we clearly |
549 the ideas, in order to do some serious calculations, we clearly |
549 need to implement the interpreter. |
550 need to implement the interpreter. |
550 |
551 |
551 |
552 |
552 \subsubsection*{Code of the Interpreter} |
553 \subsubsection*{Scala Code for the Interpreter} |
553 |
554 |
554 Functional programming languages are very convenient for |
555 Functional programming languages are very convenient for |
555 implementations of interpreters. A convenient choice for a |
556 implementations of interpreters. A good choice for a |
556 functional programming language is Scala. This is a |
557 functional programming language is Scala, a programming |
557 programming language that combines functional and |
558 language that combines functional and object-oriented |
558 object-oriented programming-styles. It has received in the |
559 programming-styles. It has received in the last five years or |
559 last five years or so quite a bit of attention. One reason for |
560 so quite a bit of attention. One reason for this attention is |
560 this attention is that, like the Java programming language, |
561 that, like the Java programming language, Scala compiles to |
561 Scala compiles to the Java Virtual Machine (JVM) and therefore |
562 the Java Virtual Machine (JVM) and therefore Scala programs |
562 Scala programs can run under MacOSX, Linux and |
563 can run under MacOSX, Linux and Windows.\footnote{There are |
563 Windows.\footnote{There are also experimental backends for |
564 also experimental backends for Android and JavaScript.} Unlike |
564 Android and JavaScript.} Unlike Java, however, Scala often |
565 Java, however, Scala often allows programmers to write very |
565 allows programmers to write very concise and elegant code. |
566 concise and elegant code. Some therefore say Scala is the much |
566 Some therefore say Scala is the much better Java. A number of |
567 better Java. A number of companies, The Guardian, Twitter, |
567 companies, The Guardian, Twitter, Coursera, FourSquare, |
568 Coursera, FourSquare, LinkedIn to name a few, either use Scala |
568 LinkedIn to name a few, either use Scala exclusively in |
569 exclusively in production code, or at least to some |
569 production code, or at least to some substantial degree. If |
570 substantial degree. If you want to try out Scala yourself, the |
570 you want to try out Scala yourself, the Scala compiler can be |
571 Scala compiler can be downloaded from |
571 downloaded from |
|
572 |
572 |
573 \begin{quote} |
573 \begin{quote} |
574 \url{http://www.scala-lang.org} |
574 \url{http://www.scala-lang.org} |
575 \end{quote} |
575 \end{quote} |
576 |
576 |