13 \usetikzlibrary{shadows} |
13 \usetikzlibrary{shadows} |
14 \usetikzlibrary{positioning} |
14 \usetikzlibrary{positioning} |
15 \usetikzlibrary{calc} |
15 \usetikzlibrary{calc} |
16 \usetikzlibrary{fit} |
16 \usetikzlibrary{fit} |
17 \usetikzlibrary{backgrounds} |
17 \usetikzlibrary{backgrounds} |
|
18 \usepackage{fontspec} |
|
19 \setmonofont{Consolas} |
18 |
20 |
19 \newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}% |
21 \newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}% |
20 |
22 |
21 \definecolor{javared}{rgb}{0.6,0,0} % for strings |
23 \definecolor{javared}{rgb}{0.6,0,0} % for strings |
22 \definecolor{javagreen}{rgb}{0.25,0.5,0.35} % comments |
24 \definecolor{javagreen}{rgb}{0.25,0.5,0.35} % comments |
38 morestring=[b]", |
40 morestring=[b]", |
39 morestring=[b]', |
41 morestring=[b]', |
40 morestring=[b]""" |
42 morestring=[b]""" |
41 } |
43 } |
42 |
44 |
|
45 \lstdefinelanguage{while}{ |
|
46 morekeywords={while, if, then. else, read, write}, |
|
47 otherkeywords={=>,<-,<\%,<:,>:,\#,@}, |
|
48 sensitive=true, |
|
49 morecomment=[l]{//}, |
|
50 morecomment=[n]{/*}{*/}, |
|
51 morestring=[b]", |
|
52 morestring=[b]', |
|
53 morestring=[b]""" |
|
54 } |
|
55 |
|
56 |
43 \lstset{language=Scala, |
57 \lstset{language=Scala, |
44 basicstyle=\ttfamily, |
58 basicstyle=\ttfamily, |
45 keywordstyle=\color{javapurple}\bfseries, |
59 keywordstyle=\color{javapurple}\bfseries, |
46 stringstyle=\color{javagreen}, |
60 stringstyle=\color{javagreen}, |
47 commentstyle=\color{javagreen}, |
61 commentstyle=\color{javagreen}, |
52 numbersep=10pt, |
66 numbersep=10pt, |
53 tabsize=2, |
67 tabsize=2, |
54 showspaces=false, |
68 showspaces=false, |
55 showstringspaces=false} |
69 showstringspaces=false} |
56 |
70 |
|
71 \newcommand\grid[1]{% |
|
72 \begin{tikzpicture}[baseline=(char.base)] |
|
73 \path[use as bounding box] |
|
74 (0,0) rectangle (1em,1em); |
|
75 \draw[red!50, fill=red!20] |
|
76 (0,0) rectangle (1em,1em); |
|
77 \node[inner sep=1pt,anchor=base west] |
|
78 (char) at (0em,\gridraiseamount) {#1}; |
|
79 \end{tikzpicture}} |
|
80 \newcommand\gridraiseamount{0.12em} |
|
81 |
|
82 \makeatletter |
|
83 \newcommand\Grid[1]{% |
|
84 \@tfor\z:=#1\do{\grid{\z}}} |
|
85 \makeatother |
|
86 |
|
87 \newcommand\Vspace[1][.3em]{% |
|
88 \mbox{\kern.06em\vrule height.3ex}% |
|
89 \vbox{\hrule width#1}% |
|
90 \hbox{\vrule height.3ex}} |
|
91 |
|
92 \def\VS{\Vspace[0.6em]} |
|
93 |
57 \begin{document} |
94 \begin{document} |
58 |
95 |
59 \section*{Handout 5} |
96 \section*{Handout 5} |
60 |
97 |
61 Whenever you want to design a programming language or implement a compiler for an |
98 Whenever you want to design a programming language or implement a compiler for an |
62 existing language, the first task is to fix the basic ``words'' of the language, like what are the k |
99 existing language, the first task is to fix the basic ``words'' of the language, like what are the |
63 eywords, what are permitted identifiers and so on. One convenient way to do this is, of |
100 keywords or reserved words of the language, what are permitted identifiers, numbers and so |
|
101 on. One convenient way to do this is, of |
64 course, to use regular expressions. In this course we want to take a closer look at the |
102 course, to use regular expressions. In this course we want to take a closer look at the |
65 WHILE-language. This is a simple imperative language consisting of arithmetic |
103 WHILE-language. This is a simple imperative language consisting of arithmetic |
66 expressions, assignments and loops only. For example the Fibonacci program can be |
104 expressions, assignments and loops only. For example the Fibonacci program can be |
67 written in this language as follows |
105 written in this language as follows |
|
106 |
|
107 \begin{center} |
|
108 \mbox{\lstinputlisting[language=while]{../progs/fib.while}} |
|
109 \end{center} |
|
110 |
|
111 \noindent |
|
112 The keywords in this language will be |
|
113 |
|
114 \begin{center} |
|
115 \texttt{while}, \texttt{if}, \texttt{then}, \texttt{else}, \texttt{write}, \texttt{read} |
|
116 \end{center} |
|
117 |
|
118 \noindent |
|
119 In addition we will have some typical operators, like \texttt{<}, \texttt{>}, \texttt{:=} and so on; numbers |
|
120 and strings (which we however ignore for the moment). We also need to specify what the ``white space'' |
|
121 is in our programming language as well as what comments should look like. |
|
122 We might specify the regular expressions for our language roughly as follows |
|
123 |
|
124 \begin{center} |
|
125 \begin{tabular}{lcl} |
|
126 $\textit{KEYWORD}$ & $:=$ & $\texttt{while} + \texttt{if} + \texttt{then} + \texttt{else} + \dots$\\ |
|
127 $\textit{IDENT}$ & $:=$ & $\textit{LETTER} \cdot (\textit{LETTER} + \textit{DIGIT} + {\_})^*$\\ |
|
128 $\textit{OP}$ & $:=$ & $\texttt{:=} + \texttt{<} + \ldots$\\ |
|
129 $\textit{NUM}$ & $:=$ & $\textit{DIGIT}^+$\\ |
|
130 $\textit{WHITESPACE}$ & $:=$ & $"\hspace{2mm}" + \backslash\texttt{n}$ |
|
131 \end{tabular} |
|
132 \end{center} |
|
133 |
|
134 \noindent |
|
135 with the usual meanings for the regular expressions $\textit{LETTER}$ and $\textit{DIGIT}$. |
|
136 |
|
137 The problem we have to solve is given a string of our programming language, which regular expression |
|
138 matches which part of the string. For example the input string |
|
139 |
|
140 \begin{center} |
|
141 \texttt{\Grid{if\VS true\VS then\VS x+2\VS else\VS x+3}} |
|
142 \end{center} |
|
143 |
|
144 \noindent |
|
145 needs to be recognised as |
|
146 |
|
147 \begin{center} |
|
148 \tt |
|
149 \Grid{if} |
|
150 \Grid{\VS} |
|
151 \Grid{true} |
|
152 \Grid{\VS} |
|
153 \Grid{then} |
|
154 \Grid{\VS} |
|
155 \Grid{x} |
|
156 \Grid{+} |
|
157 \Grid{2} |
|
158 \Grid{\VS} |
|
159 \Grid{else} |
|
160 \Grid{\VS} |
|
161 \Grid{x} |
|
162 \Grid{+} |
|
163 \Grid{3} |
|
164 \end{center} |
|
165 |
|
166 \noindent |
|
167 Since \texttt{if} matches the \textit{KEYWORD} regular expression, \VS{} is a whitespace and so on. This process |
|
168 of separating an input string into components is often called \emph{lexing} or \emph{scanning}. |
|
169 It is usually the first phase of a compiler. Note that the separation into words cannot, in general, |
|
170 be done by looking at whitespaces: while \texttt{if} and \texttt{true} are separated by a whitespace, |
|
171 the components in \texttt{x+2} are not. Another reason for recognising whitespaces explicitly is |
|
172 that in some languages, for example Python, whitespace matters. However in our small language we will eventually filter out all whitespaces and also comments. |
68 |
173 |
69 |
174 |
70 \end{document} |
175 \end{document} |
71 |
176 |
72 %%% Local Variables: |
177 %%% Local Variables: |