728 (domain:\texttt{kcl}), |
728 (domain:\texttt{kcl}), |
729 (top\_level:\texttt{ac.uk})]$ |
729 (top\_level:\texttt{ac.uk})]$ |
730 \end{center} |
730 \end{center} |
731 |
731 |
732 Recall that we want to lex a little programming language, |
732 Recall that we want to lex a little programming language, |
733 called the \emph{While}-language. The main keywords in this |
733 called the \emph{While}-language. A simple program in this |
734 language are \pcode{while}, \pcode{if}, \pcode{then} and |
734 language is shown in Figure~\ref{while}. The main keywords in |
735 \pcode{else}. As usual we have identifiers, operators, numbers |
735 this language are \pcode{while}, \pcode{if}, \pcode{then} and |
736 and so on. For this we would need to design the corresponding |
736 \pcode{else}. As usual we have syntactic categories for |
737 regular expressions to recognise these syntactic categories. I |
737 identifiers, operators, numbers and so on. For this we would |
738 let you do this design task. Having these regular expressions |
738 need to design the corresponding regular expressions to |
739 at our disposal we can form the regular expression |
739 recognise these syntactic categories. I let you do this design |
740 |
740 task. Having these regular expressions at our disposal, we can |
741 \begin{center} |
741 form the regular expression |
742 \begin{tabular}{rcl} |
742 |
743 \textit{WhileRegs} & $\dn$ & (($k$, KEYWORD) +\\ |
743 \begin{figure}[t] |
744 & & ($i$, ID) +\\ |
744 \begin{center} |
745 & & ($o$, OP) + \\ |
745 \mbox{\lstinputlisting[language=while]{../progs/fib.while}} |
746 & & ($n$, NUM) + \\ |
746 \end{center} |
747 & & ($s$, SEMI) + \\ |
747 \caption{The Fibonacci program in the While-language.\label{while}} |
748 & & ($p$, (LPAREN + RPAREN)) +\\ |
748 \end{figure} |
749 & & ($b$, (BEGIN + END)) + \\ |
749 |
750 & & ($w$, WHITESPACE))$^*$ |
750 \begin{center} |
751 \end{tabular} |
751 $\textit{WhileRegs} \;\dn\; |
752 \end{center} |
752 \left(\begin{array}{l} |
753 |
753 (k, KeyWords)\; +\\ |
|
754 (i, Ids)\;+\\ |
|
755 (o, Ops)\;+ \\ |
|
756 (n, Nums)\;+ \\ |
|
757 (s, Semis)\;+ \\ |
|
758 (p, (LParens + RParens))\;+\\ |
|
759 (b, (Begin + End))\;+ \\ |
|
760 (w, WhiteSpacess) |
|
761 \end{array}\right)\LARGE^\mbox{\LARGE*}$ |
|
762 \end{center} |
|
763 |
|
764 \noindent and ask the algorithm by Sulzmann \& Lu to lex, say |
|
765 the following string |
|
766 |
|
767 \begin{center}\large |
|
768 \code{"if true then then 42 else +"} |
|
769 \end{center} |
|
770 |
|
771 \noindent |
|
772 By using the records and extracting the environment, the |
|
773 result is the following list: |
|
774 |
|
775 \begin{center}\tt |
|
776 \begin{tabular}{l} |
|
777 KEYWORD(if),\\ |
|
778 WHITESPACE,\\ |
|
779 IDENT(true),\\ |
|
780 WHITESPACE,\\ |
|
781 KEYWORD(then),\\ |
|
782 WHITESPACE,\\ |
|
783 KEYWORD(then),\\ |
|
784 WHITESPACE,\\ |
|
785 NUM(42),\\ |
|
786 WHITESPACE,\\ |
|
787 KEYWORD(else),\\ |
|
788 WHITESPACE,\\ |
|
789 OP(+) |
|
790 \end{tabular} |
|
791 \end{center} |
|
792 |
|
793 \noindent Typically we are not interested in the whitespaces |
|
794 and comments and would filter them out: this gives |
|
795 |
|
796 \begin{center}\tt |
|
797 \begin{tabular}{l} |
|
798 KEYWORD(if),\\ |
|
799 IDENT(true),\\ |
|
800 KEYWORD(then),\\ |
|
801 KEYWORD(then),\\ |
|
802 NUM(42),\\ |
|
803 KEYWORD(else),\\ |
|
804 OP(+) |
|
805 \end{tabular} |
|
806 \end{center} |
|
807 |
|
808 \noindent |
|
809 which will be the input for the next phase of our compiler. |
754 |
810 |
755 \end{document} |
811 \end{document} |
756 |
812 |
757 %%% Local Variables: |
813 %%% Local Variables: |
758 %%% mode: latex |
814 %%% mode: latex |