handouts/ho05.tex
changeset 153 70ab41cb610e
parent 152 90e27fafc5c7
child 154 51d6b8b828c4
equal deleted inserted replaced
152:90e27fafc5c7 153:70ab41cb610e
    13 \usetikzlibrary{shadows}
    13 \usetikzlibrary{shadows}
    14 \usetikzlibrary{positioning}
    14 \usetikzlibrary{positioning}
    15 \usetikzlibrary{calc}
    15 \usetikzlibrary{calc}
    16 \usetikzlibrary{fit}
    16 \usetikzlibrary{fit}
    17 \usetikzlibrary{backgrounds}
    17 \usetikzlibrary{backgrounds}
       
    18 \usepackage{fontspec}
       
    19 \setmonofont{Consolas}
    18 
    20 
    19 \newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}%
    21 \newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}%
    20 
    22 
    21 \definecolor{javared}{rgb}{0.6,0,0} % for strings
    23 \definecolor{javared}{rgb}{0.6,0,0} % for strings
    22 \definecolor{javagreen}{rgb}{0.25,0.5,0.35} % comments
    24 \definecolor{javagreen}{rgb}{0.25,0.5,0.35} % comments
    38   morestring=[b]",
    40   morestring=[b]",
    39   morestring=[b]',
    41   morestring=[b]',
    40   morestring=[b]"""
    42   morestring=[b]"""
    41 }
    43 }
    42 
    44 
       
    45 \lstdefinelanguage{while}{
       
    46   morekeywords={while, if, then. else, read, write},
       
    47   otherkeywords={=>,<-,<\%,<:,>:,\#,@},
       
    48   sensitive=true,
       
    49   morecomment=[l]{//},
       
    50   morecomment=[n]{/*}{*/},
       
    51   morestring=[b]",
       
    52   morestring=[b]',
       
    53   morestring=[b]"""
       
    54 }
       
    55 
       
    56 
    43 \lstset{language=Scala,
    57 \lstset{language=Scala,
    44 	basicstyle=\ttfamily,
    58 	basicstyle=\ttfamily,
    45 	keywordstyle=\color{javapurple}\bfseries,
    59 	keywordstyle=\color{javapurple}\bfseries,
    46 	stringstyle=\color{javagreen},
    60 	stringstyle=\color{javagreen},
    47 	commentstyle=\color{javagreen},
    61 	commentstyle=\color{javagreen},
    52 	numbersep=10pt,
    66 	numbersep=10pt,
    53 	tabsize=2,
    67 	tabsize=2,
    54 	showspaces=false,
    68 	showspaces=false,
    55 	showstringspaces=false}
    69 	showstringspaces=false}
    56 	
    70 	
       
    71 \newcommand\grid[1]{%
       
    72 \begin{tikzpicture}[baseline=(char.base)]
       
    73   \path[use as bounding box]
       
    74     (0,0) rectangle (1em,1em);
       
    75   \draw[red!50, fill=red!20]
       
    76     (0,0) rectangle (1em,1em);
       
    77   \node[inner sep=1pt,anchor=base west]
       
    78     (char) at (0em,\gridraiseamount) {#1};
       
    79 \end{tikzpicture}}
       
    80 \newcommand\gridraiseamount{0.12em}
       
    81 
       
    82 \makeatletter
       
    83 \newcommand\Grid[1]{%
       
    84   \@tfor\z:=#1\do{\grid{\z}}}
       
    85 \makeatother	
       
    86 
       
    87 \newcommand\Vspace[1][.3em]{%
       
    88   \mbox{\kern.06em\vrule height.3ex}%
       
    89   \vbox{\hrule width#1}%
       
    90   \hbox{\vrule height.3ex}}
       
    91 
       
    92 \def\VS{\Vspace[0.6em]}
       
    93 	
    57 \begin{document}
    94 \begin{document}
    58 
    95 
    59 \section*{Handout 5}
    96 \section*{Handout 5}
    60 
    97 
    61 Whenever you want to design a programming language or implement a compiler for an
    98 Whenever you want to design a programming language or implement a compiler for an
    62 existing language, the first task is to fix the basic ``words'' of the language, like what are the k
    99 existing language, the first task is to fix the basic ``words'' of the language, like what are the 
    63 eywords, what are permitted identifiers and so on. One convenient way to do this is, of 
   100 keywords or reserved words of the language, what are permitted identifiers, numbers and so 
       
   101 on. One convenient way to do this is, of 
    64 course, to use regular expressions. In this course we want to take a closer look at the 
   102 course, to use regular expressions. In this course we want to take a closer look at the 
    65 WHILE-language. This is a simple imperative language consisting of arithmetic
   103 WHILE-language. This is a simple imperative language consisting of arithmetic
    66 expressions, assignments and loops only. For example the Fibonacci program can be
   104 expressions, assignments and loops only. For example the Fibonacci program can be
    67 written in this language as follows
   105 written in this language as follows
       
   106 
       
   107 \begin{center}
       
   108 \mbox{\lstinputlisting[language=while]{../progs/fib.while}}
       
   109 \end{center}
       
   110 
       
   111 \noindent
       
   112 The keywords in this language will be
       
   113 
       
   114 \begin{center}
       
   115 \texttt{while}, \texttt{if}, \texttt{then}, \texttt{else}, \texttt{write}, \texttt{read}
       
   116 \end{center}
       
   117 
       
   118 \noindent
       
   119 In addition we will have some typical operators, like \texttt{<}, \texttt{>}, \texttt{:=} and so on; numbers
       
   120 and strings (which we however ignore for the moment). We also need to specify what the ``white space''
       
   121 is in our programming language as well as what comments should look like.
       
   122 We might specify the regular expressions for our language roughly as follows
       
   123 
       
   124 \begin{center}
       
   125 \begin{tabular}{lcl}
       
   126 $\textit{KEYWORD}$ & $:=$ & $\texttt{while}  + \texttt{if} + \texttt{then} + \texttt{else} + \dots$\\
       
   127 $\textit{IDENT}$ & $:=$ & $\textit{LETTER} \cdot (\textit{LETTER} + \textit{DIGIT} + {\_})^*$\\ 
       
   128 $\textit{OP}$      &  $:=$ & $\texttt{:=} + \texttt{<} + \ldots$\\
       
   129 $\textit{NUM}$   &  $:=$ & $\textit{DIGIT}^+$\\
       
   130 $\textit{WHITESPACE}$ & $:=$ & $"\hspace{2mm}" + \backslash\texttt{n}$
       
   131 \end{tabular}
       
   132 \end{center}
       
   133 
       
   134 \noindent
       
   135 with the usual meanings for the regular expressions $\textit{LETTER}$ and $\textit{DIGIT}$.
       
   136 
       
   137 The problem we have to solve is given a string of our programming language, which regular expression 
       
   138 matches which part of the string. For example the input string
       
   139 
       
   140 \begin{center}
       
   141 \texttt{\Grid{if\VS true\VS then\VS x+2\VS else\VS x+3}}
       
   142 \end{center}
       
   143 
       
   144 \noindent
       
   145 needs to be recognised as
       
   146 
       
   147 \begin{center}
       
   148 \tt
       
   149 \Grid{if} 
       
   150 \Grid{\VS} 
       
   151 \Grid{true} 
       
   152 \Grid{\VS} 
       
   153 \Grid{then} 
       
   154 \Grid{\VS} 
       
   155 \Grid{x} 
       
   156 \Grid{+} 
       
   157 \Grid{2}
       
   158 \Grid{\VS} 
       
   159 \Grid{else}
       
   160 \Grid{\VS}
       
   161 \Grid{x}
       
   162 \Grid{+}
       
   163 \Grid{3}
       
   164 \end{center}
       
   165 
       
   166 \noindent
       
   167 Since \texttt{if} matches the \textit{KEYWORD} regular expression, \VS{}  is a whitespace and so on. This process 
       
   168 of separating an input string into components is often called \emph{lexing} or \emph{scanning}.
       
   169 It is usually the first phase of a compiler. Note that the separation into words cannot, in general, 
       
   170 be done by looking at whitespaces: while \texttt{if} and \texttt{true} are separated by a whitespace,
       
   171 the components in \texttt{x+2} are not. Another reason for recognising whitespaces explicitly is
       
   172 that in some languages, for example Python, whitespace matters. However in our small language we will eventually filter out all whitespaces and also comments.
    68 
   173 
    69 
   174 
    70 \end{document}
   175 \end{document}
    71 
   176 
    72 %%% Local Variables: 
   177 %%% Local Variables: