hws/hw06.tex
author Christian Urban <christian.urban@kcl.ac.uk>
Wed, 29 May 2024 13:25:30 +0100
changeset 960 c7009356ddd8
parent 955 47acfd7f9096
permissions -rw-r--r--
updated

\documentclass{article}
\usepackage{../style}
\usepackage{../graphics}
\usepackage{../grammar}

\begin{document}

\section*{Homework 6}

%\HEADER

\begin{enumerate}
\item (i) Give the regular expressions for lexing a language
      consisting of whitespaces, identifiers (some letters
      followed by digits), numbers, operations \texttt{=},
      \texttt{<} and \texttt{>}, and the keywords \texttt{if},
      \texttt{then} and \texttt{else}. (ii) Decide whether the
      following strings can be lexed in this language?

\begin{enumerate}
\item \texttt{"if y4 = 3 then 1 else 3"}
\item \texttt{"if33 ifif then then23 else else 32"}
\item \texttt{"if x4x < 33 then 1 else 3"}
\end{enumerate}

In case they can, give the corresponding token sequences. (Hint: 
Observe the maximal munch rule and priorities of your regular
expressions that make the process of lexing unambiguous.)

\solution{
  a and b are ok; c is not matched because x4x is not an identifier according to the rules.
}


\item Suppose the grammar

\begin{plstx}[margin=1cm]    
:\meta{E} ::= \meta{F} \;|\; \meta{F} \cdot \texttt{*} \cdot \meta{F} \;|\; \meta{F} \cdot \backslash \cdot \meta{F}\\
:\meta{F} ::= \meta{T} \;|\; \meta{T} \cdot \texttt{+} \cdot \meta{T} \;|\; \meta{T} \cdot \texttt{-} \cdot \meta{T}\\
:\meta{T} ::= $num$ \;|\; \texttt{(} \cdot E \cdot \texttt{)}\\
\end{plstx}

where \meta{E}, \meta{F} and \meta{T} are non-terminals, \meta{E} is
the starting symbol of the grammar, and $num$ stands for a number
token. Give a parse tree for the string \texttt{(3+3)+(2*3)}. Note that
\meta{F} and \meta{T} are ``exchanged'' in this grammar in comparison with
the usual grammar for arithmetic expressions. What does this mean in terms
of precedences of the operators?

\solution{
  \begin{tikzpicture}[level distance=10mm, black,sibling distance=10mm]
  \node {\meta{E}}
  child[sibling distance=20mm] {node {\meta{F}} 
    child {node {\meta{T}}
      child[sibling distance=5mm] {node {(}}
      child[sibling distance=5mm]  {node {\meta{E}}
        child {node {\meta{F}}
          child {node {\meta{T}} child {node {3}}}
          child {node {+}}
          child {node {\meta{T}} child {node {3}}}
        }  
      }
      child[sibling distance=5mm]  {node {)}}
    }
    child {node {+}}
    child {node {\meta{T}}
      child[sibling distance=5mm] {node {(}}
      child[sibling distance=5mm] {node {\meta{E}}
        child {node {\meta{F}}
          child {node {\meta{T}} child {node {2}}}
        }
        child {node {*}}
        child {node {\meta{F}}
          child {node {\meta{T}} child {node {3}}}
        }
        }
      child[sibling distance=5mm] {node {)}}
    }
    };
\end{tikzpicture}

  
  For the second part "1+2*3" will be parsed as (1+2)*3, meaning + and - bind
  tighter than * and $\backslash$
}

\item Define what it means for a grammar to be ambiguous. Give an example of
an ambiguous grammar.

\solution{
  Already the grammar
  \begin{plstx}[margin=1cm]
    : \meta{E} ::= \meta{E}\cdot +\cdot\meta{E} | num\\
  \end{plstx}

  is ambiguous because a string like "1+2+3" can be parsed
  as (1+2)+3 or 1+(2+3).
  }


\item Suppose boolean expressions are built up from

\begin{center}
\begin{tabular}{ll}
1.) & tokens for \texttt{true} and \texttt{false},\\
2.) & the infix operations \texttt{$\wedge$} and \texttt{$\vee$},\\
3.) & the prefix operation $\neg$, and\\
4.) & can be enclosed in parentheses.
\end{tabular}
\end{center}

(i) Give a grammar that can recognise such boolean expressions
and (ii) give a sample string involving all rules given in 1.-4.~that 
can be parsed by this grammar.

\solution{
The simplest grammar is
  
\begin{plstx}[margin=1cm]    
  :\meta{B} ::= true \;|\; false \;|\;\meta{B} \cdot $\wedge$ \cdot \meta{B} \;|\; \meta{B} \cdot $\vee$ \cdot \meta{B}
  \;|\; $\neg$\meta{B} \;|\; (\cdot\meta{B}\cdot)\\
\end{plstx}

This is unfortunately a left-recursive grammar. Giving a not left-recursive one is a bit more involved.
  }

\item Parsing combinators consist of atomic parsers, alternative
  parsers, sequence parsers and semantic actions.  What is the purpose
  of (1) atomic parsers and of (2) map-parsers (also called semantic actions)?

  \solution{
    Atomic parsers look at a concrete prefix of the input, like num-tokens or identifiers.
    Map-parsers apply a function to the output of a parser. In this way you can transform
    the output from, for example, a string to an integer.
  }

  
\item Parser combinators can directly be given a string as
      input, without the need of a lexer. What are the
      advantages to first lex a string and then feed a
      sequence of tokens as input to the parser?

      \solution{ Reason 1 you can filter out whitespaces and comments,
        which makes the grammar rules simpler. If you have to make
        sure that a whitespace comes after a variable say, then your
        parser rule for variables gets more complicated. Same with
        comments which do not contribute anything to the parse tree.
        
        Reason 2) The lexer can already classify tokens, for example
        as numbers, keywords or identifiers. This again makes the grammar
        rules more deterministic and as a result faster to parse.

        The point is that parser combinators can be used to process
        strings, but in case of compilers where whitespaces and
        comments need to be filtered out, the lexing phase is actually
        quite useful.  }
      
\item The injection function for sequence regular expressions is defined
      by three clauses:

\begin{center}
\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
  $\inj\,(r_1 \cdot r_2)\,c\,\,Seq(v_1,v_2)$ & $\dn$  & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\
  $\inj\,(r_1 \cdot r_2)\,c\,\,\Left(Seq(v_1,v_2))$ & $\dn$  & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\
  $\inj\,(r_1 \cdot r_2)\,c\,\,Right(v)$ & $\dn$  & $Seq(\textit{mkeps}(r_1),\inj\,r_2\,c\,v)$\\
\end{tabular}
\end{center}

     Explain why there are three cases in the injection function for sequence
     regular expressions.

     \solution{
       This is because the derivative of sequences can be of the form

       \begin{itemize}
       \item $(der\,c\,r_1)\cdot r_2$
       \item $(der\,c\,r_1)\cdot r_2 \;+\; der\,c\,r_2$  
       \end{itemize}

       In the first case the value needs to be of the form $Seq$, in the second case $\Left$ or $Right$.
       Therefore 3 cases.
       }
      
\item \POSTSCRIPT        
\end{enumerate}

\end{document}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End: