hws/hw01.tex
author Christian Urban <christian.urban@kcl.ac.uk>
Fri, 24 Oct 2025 08:55:14 +0100
changeset 1016 c02d409ed7f4
parent 967 ce5de01b9632
permissions -rw-r--r--
added amm faq

% !TEX program = xelatex
\documentclass{article}
\usepackage{../style}

\begin{document}

\section*{Homework 1}

%%\HEADER

\begin{enumerate}

\item {\bf (Optional)} If you want to run the code presented
      in the lectures, install the Scala programming language
      available (for free) from

      \begin{center}
        \url{http://www.scala-lang.org}
      \end{center}

      and the Ammonite REPL from
       
       \begin{center}
       \url{https://ammonite.io}
       \end{center}      

      If you want to follow the code I present during the
      lectures, it might be useful to install VS Code or Codium.
      Please have a look at the handout about Ammonite and
      if you need a refresher for Scala - I linked on KEATS
      the Scala handout from PEP.
      %I will be using Scala Version 3.5, which has the \texttt{scala-cli}
      %REPL used in PEP already built in.
      
      %handout about Scala.
      %Make sure Ammonite
      %uses the Scala 3 compiler.

%\item {\bf (Optional)} Have a look at the crawler programs.
%      Can you find a usage for them in your daily programming
%      life? Can you improve them? For example in cases there
%      are links that appear on different recursion levels, the
%      crawlers visit such web-pages several times. Can this be
%      avoided? Also, the crawlers flag as problematic any page
%      that gives an error, but probably only 404 Not Found
%      errors should be flagged. Can you change that?)

\item {\bf (Optional)} Have a look at the catastrophic backtracking
  programs uploaded on KEATS. Convince yourself that they really require
  a lot of computation time. If you have similar examples in your own
  favourite programming language, I am happy to hear about it.


\item Read the handout of the first lecture and the handout
      about notation. Make sure you understand the concepts of
      strings and languages. In the context of the CFL-course,
      what is meant by the term \emph{language}?

      \solution{A language - in this context - is just a set of
        strings. Some of these sets can actually not be described by
        regular expressions. Only regular​ languages can. This is
        something for lecture 3.}
      
\item Give the definition for regular expressions---this is an
      inductive datatype. What is the
      meaning of a regular expression? (Hint: The meaning is
      defined recursively.)

      \solution{Here I would also expect the grammar for basic regular
        expressions and the definition of the recursive L-function. Discuss
        differences between $r_1 + r_2$ and $r^+$. Discuss differences between
      ``real-life regexes'' and regexes in this module.}

\item Assume the concatenation operation of two strings is
      written as $s_1 @ s_2$. Define the operation of
      \emph{concatenating} two sets of strings. This operation
      is also written as $\_ \,@\, \_$. According to 
      this definition, what is $A \,@\, \{\}$ equal to?
      Is in general $A\,@\,B$ equal to $B\,@\,A$?

      \solution{ What is $A @ {[]}$? Are there special cases
        where $A @ B = B @ A$? Obviously when $A = B$ the stament is true.
        But there are also cases when $A \not= B$, for example $A = \{a\}$
      and $B = \{aaa\}$.}

\item Assume a set $A$ contains 4 strings and a set $B$
      contains 7 strings. None of the strings is the empty
      string. How many strings are in $A \,@\, B$?

      \solution{Everyone will probably answer with 28, but there are corner cases where there are fewer
        than 28 elements. Can students think of such corner cases?
      For example $A = \{a, ab, \ldots\}$, $B = \{bc, c,\ldots\}$ }

\item How is the power of a language defined? (Hint: There are two
  rules, one for $\_^0$ and one for $\_^{n+1}$.)

     \solution{Two rules: 0-case and n+1 case.}

\item Let $A = \{[a], [b], [c], [d]\}$. (1) How many strings
      are in $A^4$? (2) Consider also the case of $A^4$ where one of
      the strings in $A$ is the empty string, for example $A =
      \{[a], [b], [c], []\}$.

      \solution{121 is correct. But make sure you understand why it is 121
        in cases you do not have a computer at your fingertips.}

\item (1) How many basic regular expressions are there to match
      \textbf{only} the string $abcd$? (2) How many if they cannot include
      $\ONE$ and $\ZERO$? (3) How many if they are also not
      allowed to contain stars? (4) How many if they are also
      not allowed to contain $\_ + \_$?

      \solution{1-3 are infinite (tell the idea why and give examples);
        4 is five - remember regexes are trees (that is the main point of the question.}
      
\item When are two regular expressions equivalent? Can you
      think of instances where two regular expressions match
      the same strings, but it is not so obvious that they do?
      For example $a + b$ and $b + a$ do not count\ldots they
      obviously match the same strings, namely $[a]$ and
      $[b]$.

      \solution{for example $r^* = 1 + r \cdot r^*$ for any regular expression $r$.
        Can students think about why this is the case? - this would need a proof.\bigskip


        Lemma to prove: $A^* = \{[]\} \cup A \times A^*$ for any language $A$.\bigskip

        The definition of ${}^*$: $\bigcup n. A^n$\bigskip

        We first show $\subseteq$ of the lemma: a string $s \in A^*$, must also be in
        $\bigcup n. A^n$, which means there is an $n$ such that $s\in A^n$.
        If $n = 0$ then $s = []$ and $s$ is also in $\{[]\} \cup A \times A^*$.
        If $n$ is bigger than $0$, then $s\in A^n$, which means by
        definition of power that $s\in A \times A^{n - 1}$. But then
        also $s \in A \times A^*$. That is one direction.\bigskip

        The other direction: Two cases: (i) $s\in \{[]\}$ then
        also $s\in A^*$. (ii) $s\in A \times A^*$ means there exists
        an $n$ such that $s\in A\times A^n$. This in turn means
        $s\in A^{n + 1}$ (by definition of power). But this also means $s\in A^*$.
        So $\{[]\} \cup A \times A^*$ is a subset of $A^*$. Done!
      }

\item What is meant by the notions \emph{evil regular expressions}
  and by \emph{catastrophic backtracking}?

  \solution{catastrophic backtracking also applies to other regexes,
    not just $(a^*)^*b$.  Maybe
    \url{https://www.trevorlasn.com/blog/when-regex-goes-wrong/} is
    of help - even the CrowdStrike issue had an underlying problem
    with a regex, though this one was not due to catastrophic
    backtracking.}

\item Given the regular expression $(a + b)^* \cdot b \cdot (a + b)^*$,
  which of the following regular expressions are equivalent

\begin{center}
\begin{tabular}{ll}    
  1) & $(ab + bb)^* \cdot (a + b)^*$\\                     % no
  2) & $(a + b)^* \cdot (ba + bb + b) \cdot (a + b)^*$\\   % yes
  3) & $(a + b)^* \cdot (a + b) \cdot (a + b)^*$           % no
\end{tabular}
\end{center}

  \solution{no, yes (why?), no.}


\item Given the extended regular expression \texttt{[b-d]a?e+},
  what does the equivalent basic regular expression look like?
  
  \solution{$(b + c + d) \cdot (a + \ONE) \cdot (e \cdot e^*)$}
  
  
\item \POSTSCRIPT  
\end{enumerate}

\end{document}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End: