--- a/handouts/ho05.tex Fri Oct 25 14:33:35 2013 +0100
+++ b/handouts/ho05.tex Fri Oct 25 17:06:19 2013 +0100
@@ -15,6 +15,8 @@
\usetikzlibrary{calc}
\usetikzlibrary{fit}
\usetikzlibrary{backgrounds}
+\usepackage{fontspec}
+\setmonofont{Consolas}
\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}%
@@ -40,6 +42,18 @@
morestring=[b]"""
}
+\lstdefinelanguage{while}{
+ morekeywords={while, if, then. else, read, write},
+ otherkeywords={=>,<-,<\%,<:,>:,\#,@},
+ sensitive=true,
+ morecomment=[l]{//},
+ morecomment=[n]{/*}{*/},
+ morestring=[b]",
+ morestring=[b]',
+ morestring=[b]"""
+}
+
+
\lstset{language=Scala,
basicstyle=\ttfamily,
keywordstyle=\color{javapurple}\bfseries,
@@ -54,18 +68,109 @@
showspaces=false,
showstringspaces=false}
+\newcommand\grid[1]{%
+\begin{tikzpicture}[baseline=(char.base)]
+ \path[use as bounding box]
+ (0,0) rectangle (1em,1em);
+ \draw[red!50, fill=red!20]
+ (0,0) rectangle (1em,1em);
+ \node[inner sep=1pt,anchor=base west]
+ (char) at (0em,\gridraiseamount) {#1};
+\end{tikzpicture}}
+\newcommand\gridraiseamount{0.12em}
+
+\makeatletter
+\newcommand\Grid[1]{%
+ \@tfor\z:=#1\do{\grid{\z}}}
+\makeatother
+
+\newcommand\Vspace[1][.3em]{%
+ \mbox{\kern.06em\vrule height.3ex}%
+ \vbox{\hrule width#1}%
+ \hbox{\vrule height.3ex}}
+
+\def\VS{\Vspace[0.6em]}
+
\begin{document}
\section*{Handout 5}
Whenever you want to design a programming language or implement a compiler for an
-existing language, the first task is to fix the basic ``words'' of the language, like what are the k
-eywords, what are permitted identifiers and so on. One convenient way to do this is, of
+existing language, the first task is to fix the basic ``words'' of the language, like what are the
+keywords or reserved words of the language, what are permitted identifiers, numbers and so
+on. One convenient way to do this is, of
course, to use regular expressions. In this course we want to take a closer look at the
WHILE-language. This is a simple imperative language consisting of arithmetic
expressions, assignments and loops only. For example the Fibonacci program can be
written in this language as follows
+\begin{center}
+\mbox{\lstinputlisting[language=while]{../progs/fib.while}}
+\end{center}
+
+\noindent
+The keywords in this language will be
+
+\begin{center}
+\texttt{while}, \texttt{if}, \texttt{then}, \texttt{else}, \texttt{write}, \texttt{read}
+\end{center}
+
+\noindent
+In addition we will have some typical operators, like \texttt{<}, \texttt{>}, \texttt{:=} and so on; numbers
+and strings (which we however ignore for the moment). We also need to specify what the ``white space''
+is in our programming language as well as what comments should look like.
+We might specify the regular expressions for our language roughly as follows
+
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{KEYWORD}$ & $:=$ & $\texttt{while} + \texttt{if} + \texttt{then} + \texttt{else} + \dots$\\
+$\textit{IDENT}$ & $:=$ & $\textit{LETTER} \cdot (\textit{LETTER} + \textit{DIGIT} + {\_})^*$\\
+$\textit{OP}$ & $:=$ & $\texttt{:=} + \texttt{<} + \ldots$\\
+$\textit{NUM}$ & $:=$ & $\textit{DIGIT}^+$\\
+$\textit{WHITESPACE}$ & $:=$ & $"\hspace{2mm}" + \backslash\texttt{n}$
+\end{tabular}
+\end{center}
+
+\noindent
+with the usual meanings for the regular expressions $\textit{LETTER}$ and $\textit{DIGIT}$.
+
+The problem we have to solve is given a string of our programming language, which regular expression
+matches which part of the string. For example the input string
+
+\begin{center}
+\texttt{\Grid{if\VS true\VS then\VS x+2\VS else\VS x+3}}
+\end{center}
+
+\noindent
+needs to be recognised as
+
+\begin{center}
+\tt
+\Grid{if}
+\Grid{\VS}
+\Grid{true}
+\Grid{\VS}
+\Grid{then}
+\Grid{\VS}
+\Grid{x}
+\Grid{+}
+\Grid{2}
+\Grid{\VS}
+\Grid{else}
+\Grid{\VS}
+\Grid{x}
+\Grid{+}
+\Grid{3}
+\end{center}
+
+\noindent
+Since \texttt{if} matches the \textit{KEYWORD} regular expression, \VS{} is a whitespace and so on. This process
+of separating an input string into components is often called \emph{lexing} or \emph{scanning}.
+It is usually the first phase of a compiler. Note that the separation into words cannot, in general,
+be done by looking at whitespaces: while \texttt{if} and \texttt{true} are separated by a whitespace,
+the components in \texttt{x+2} are not. Another reason for recognising whitespaces explicitly is
+that in some languages, for example Python, whitespace matters. However in our small language we will eventually filter out all whitespaces and also comments.
+
\end{document}