\item parentheses are \texttt{(}, \texttt{\{}, \texttt{)} and \texttt{\}}
\item there are semicolons \texttt{;}
\item whitespaces are either \texttt{" "} (one or more) or \texttt{$\backslash$n}
\item identifiers are letters followed by underscores \texttt{\_\!\_},
letters or digits
\item numbers are \pcode{0}, \pcode{1}, \ldots and so on; give
a regular expression that can recognise \pcode{0}, but not numbers
with leading zeroes, such as \pcode{001}
\end{enumerate}

\noindent
You can use the basic regular expressions

$r^?$ & optional $r$\\
$r^{\{n\}}$ & $n$-times $r$\\
\end{tabular}
\end{center}
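
\noindent
Note that the last two can be understood as abbreviations: $r^?$
matches the same strings as $r + \epsilon$, and $r^{\{n\}}$ the same
strings as the $n$-fold sequence $r\cdot\ldots\cdot r$ (with
$r^{\{0\}}$ matching just the empty string).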

\noindent Try to design your regular expressions to be as
small as possible. For example, you should use character
ranges for identifiers and numbers.

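\noindent For instance, with character ranges the identifiers could
take a shape like $[a\mbox{-}z]\cdot([a\mbox{-}z] + [0\mbox{-}9] +
\_)^*$ and the numbers without leading zeroes a shape like
$0 + [1\mbox{-}9]\cdot[0\mbox{-}9]^*$ (sketches only; the precise
classes of letters and digits are up to you).
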
\subsection*{Question 2 (marked with 3\%)}

Implement the Sulzmann \& Lu tokeniser from the lectures. For
this you need to implement the functions $nullable$ and $der$
$inj\, (r^{\{n\}})\,c\,\ldots$ & $\dn$ & $?$\\
\end{tabular}
\end{center}

\noindent where $inj$ takes three arguments: a regular
expression, a character and a value. Test your lexer code
with at least the two small examples below:

\begin{center}
\begin{tabular}{ll}
regex: & string:\smallskip\\
$a^{\{3\}}$ & $aaa$\\
$(a + \epsilon)^{\{3\}}$ & $aa$
\end{tabular}
\end{center}

\noindent Both strings should be successfully lexed by the
respective regular expression, that is, the lexer should
return a value in both examples.

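\noindent To give an idea of the expected structure, here is a sketch
in Scala of $nullable$ and $der$ for the basic regular expressions
only (the clauses for $r^?$, $r^{\{n\}}$ and the other extended
constructors from the table above still need to be added; all names
are illustrative, not prescribed):

\begin{verbatim}
// basic regular expressions -- a sketch, names not prescribed
abstract class Rexp
case object ZERO extends Rexp                    // matches nothing
case object ONE extends Rexp                     // matches ""
case class CHAR(c: Char) extends Rexp            // matches "c"
case class ALT(r1: Rexp, r2: Rexp) extends Rexp  // r1 + r2
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp  // r1 . r2
case class STAR(r: Rexp) extends Rexp            // r*

// whether r can match the empty string
def nullable(r: Rexp): Boolean = r match {
  case ZERO => false
  case ONE => true
  case CHAR(_) => false
  case ALT(r1, r2) => nullable(r1) || nullable(r2)
  case SEQ(r1, r2) => nullable(r1) && nullable(r2)
  case STAR(_) => true
}

// the derivative of r with respect to the character c
def der(c: Char, r: Rexp): Rexp = r match {
  case ZERO => ZERO
  case ONE => ZERO
  case CHAR(d) => if (c == d) ONE else ZERO
  case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
  case SEQ(r1, r2) =>
    if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
    else SEQ(der(c, r1), r2)
  case STAR(r1) => SEQ(der(c, r1), STAR(r1))
}
\end{verbatim}

\noindent With a constructor for $r^{\{n\}}$ added, the first example
above amounts to lexing the string \pcode{"aaa"} with $a^{\{3\}}$; the
lexer should return a value rather than raise an error.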

Also add the record regular expression from the
lectures to your tokeniser and implement a function, say
\pcode{env}, that returns all assignments from a value (so
that you can easily extract the tokens from a value).\medskip
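
\noindent One possible shape for \pcode{env} is sketched below,
assuming the value datatype from the lectures (the constructor names,
and the helper \pcode{flatten} that turns a value back into the string
it stands for, are assumptions):

\begin{verbatim}
// records and values -- a sketch, names not prescribed
case class RECD(x: String, r: Rexp) extends Rexp

abstract class Val
case object Empty extends Val
case class Chr(c: Char) extends Val
case class Sequ(v1: Val, v2: Val) extends Val
case class Left(v: Val) extends Val
case class Right(v: Val) extends Val
case class Stars(vs: List[Val]) extends Val
case class Rec(x: String, v: Val) extends Val

// the string a value stands for
def flatten(v: Val): String = v match {
  case Empty => ""
  case Chr(c) => c.toString
  case Left(v) => flatten(v)
  case Right(v) => flatten(v)
  case Sequ(v1, v2) => flatten(v1) + flatten(v2)
  case Stars(vs) => vs.map(flatten).mkString
  case Rec(_, v) => flatten(v)
}

// all (name, matched string) assignments recorded in a value
def env(v: Val): List[(String, String)] = v match {
  case Empty => Nil
  case Chr(_) => Nil
  case Left(v) => env(v)
  case Right(v) => env(v)
  case Sequ(v1, v2) => env(v1) ::: env(v2)
  case Stars(vs) => vs.flatMap(env)
  case Rec(x, v) => (x, flatten(v)) :: env(v)
}
\end{verbatim}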

\noindent
Finally, give the tokens for your regular expressions from Q1 and the
string
