afl-material: comparison handouts/ho05.tex

equal deleted inserted replaced

-:90e27fafc5c7
+:70ab41cb610e
 \usetikzlibrary{shadows}
 \usetikzlibrary{positioning}
 \usetikzlibrary{calc}
 \usetikzlibrary{fit}
 \usetikzlibrary{backgrounds}
+\usepackage{fontspec}
+\setmonofont{Consolas}
 \newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}%
 \definecolor{javared}{rgb}{0.6,0,0} % for strings
 \definecolor{javagreen}{rgb}{0.25,0.5,0.35} % comments
 morestring=[b]",
 morestring=[b]',
 morestring=[b]"""
 }
+\lstdefinelanguage{while}{
+morekeywords={while, if, then. else, read, write},
+otherkeywords={=>,<-,<\%,<:,>:,\#,@},
+sensitive=true,
+morecomment=[l]{//},
+morecomment=[n]{/*}{*/},
+morestring=[b]",
+morestring=[b]',
+morestring=[b]"""
+}
 \lstset{language=Scala,
 	basicstyle=\ttfamily,
 	keywordstyle=\color{javapurple}\bfseries,
 	stringstyle=\color{javagreen},
 	commentstyle=\color{javagreen},
 	numbersep=10pt,
 	tabsize=2,
 	showspaces=false,
 	showstringspaces=false}
+\newcommand\grid[1]{%
+\begin{tikzpicture}[baseline=(char.base)]
+\path[use as bounding box]
+(0,0) rectangle (1em,1em);
+\draw[red!50, fill=red!20]
+(0,0) rectangle (1em,1em);
+\node[inner sep=1pt,anchor=base west]
+(char) at (0em,\gridraiseamount) {#1};
+\end{tikzpicture}}
+\newcommand\gridraiseamount{0.12em}
+\makeatletter
+\newcommand\Grid[1]{%
+\@tfor\z:=#1\do{\grid{\z}}}
+\makeatother
+\newcommand\Vspace[1][.3em]{%
+\mbox{\kern.06em\vrule height.3ex}%
+\vbox{\hrule width#1}%
+\hbox{\vrule height.3ex}}
+\def\VS{\Vspace[0.6em]}
 \begin{document}
 \section*{Handout 5}
 Whenever you want to design a programming language or implement a compiler for an
-existing language, the first task is to fix the basic ``words'' of the language, like what are the k
+existing language, the first task is to fix the basic ``words'' of the language, like what are the
-eywords, what are permitted identifiers and so on. One convenient way to do this is, of
+keywords or reserved words of the language, what are permitted identifiers, numbers and so
+on. One convenient way to do this is, of
 course, to use regular expressions. In this course we want to take a closer look at the
 WHILE-language. This is a simple imperative language consisting of arithmetic
 expressions, assignments and loops only. For example the Fibonacci program can be
 written in this language as follows
+\begin{center}
+\mbox{\lstinputlisting[language=while]{../progs/fib.while}}
+\end{center}
+\noindent
+The keywords in this language will be
+\begin{center}
+\texttt{while}, \texttt{if}, \texttt{then}, \texttt{else}, \texttt{write}, \texttt{read}
+\end{center}
+\noindent
+In addition we will have some typical operators, like \texttt{<}, \texttt{>}, \texttt{:=} and so on; numbers
+and strings (which we however ignore for the moment). We also need to specify what the ``white space''
+is in our programming language as well as what comments should look like.
+We might specify the regular expressions for our language roughly as follows
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{KEYWORD}$ & $:=$ & $\texttt{while}  + \texttt{if} + \texttt{then} + \texttt{else} + \dots$\\
+$\textit{IDENT}$ & $:=$ & $\textit{LETTER} \cdot (\textit{LETTER} + \textit{DIGIT} + {\_})^*$\\
+$\textit{OP}$      &  $:=$ & $\texttt{:=} + \texttt{<} + \ldots$\\
+$\textit{NUM}$   &  $:=$ & $\textit{DIGIT}^+$\\
+$\textit{WHITESPACE}$ & $:=$ & $"\hspace{2mm}" + \backslash\texttt{n}$
+\end{tabular}
+\end{center}
+\noindent
+with the usual meanings for the regular expressions $\textit{LETTER}$ and $\textit{DIGIT}$.
+The problem we have to solve is given a string of our programming language, which regular expression
+matches which part of the string. For example the input string
+\begin{center}
+\texttt{\Grid{if\VS true\VS then\VS x+2\VS else\VS x+3}}
+\end{center}
+\noindent
+needs to be recognised as
+\begin{center}
+\tt
+\Grid{if}
+\Grid{\VS}
+\Grid{true}
+\Grid{\VS}
+\Grid{then}
+\Grid{\VS}
+\Grid{x}
+\Grid{+}
+\Grid{2}
+\Grid{\VS}
+\Grid{else}
+\Grid{\VS}
+\Grid{x}
+\Grid{+}
+\Grid{3}
+\end{center}
+\noindent
+Since \texttt{if} matches the \textit{KEYWORD} regular expression, \VS{}  is a whitespace and so on. This process
+of separating an input string into components is often called \emph{lexing} or \emph{scanning}.
+It is usually the first phase of a compiler. Note that the separation into words cannot, in general,
+be done by looking at whitespaces: while \texttt{if} and \texttt{true} are separated by a whitespace,
+the components in \texttt{x+2} are not. Another reason for recognising whitespaces explicitly is
+that in some languages, for example Python, whitespace matters. However in our small language we will eventually filter out all whitespaces and also comments.
 \end{document}
 %%% Local Variables:

changeset 153	70ab41cb610e
parent 152	90e27fafc5c7
child 154	51d6b8b828c4