hws/hw04.tex
changeset 943 5365ef60707e
parent 939 f85e784d3014
--- a/hws/hw04.tex	Fri Oct 13 23:49:34 2023 +0100
+++ b/hws/hw04.tex	Sat Oct 21 09:09:09 2023 +0100
@@ -22,13 +22,49 @@
 there are several values for how these regular expressions can
 recognise the strings (for 1) $ab$ and (for 2) $aaa$. Give in each case
 \emph{all} the values and indicate which one is the POSIX value.
+
+\solution{
+  1) There are only 2 values (writing $a$ for $Char(a)$ and so on)
+
+  \begin{center}
+  \begin{tabular}{l}
+    $Sequ(Left(Sequ(a,b)),Left(Empty))$\\
+    $Sequ(Right(a),Left(b))$\\ 
+  \end{tabular}    
+  \end{center}
   
+  The first is the POSIX value because of the preference for $Left$.
+
+  2) There are three ``main'' values, namely
+
+  \begin{center}
+  \begin{tabular}{l}
+    $Stars\,[Left(Sequ(a,a)),Right(a)]$\\
+    $Stars\,[Right(a), Left(Sequ(a,a))]$\\
+    $Stars\,[Right(a), Right(a), Right(a)]$\\
+  \end{tabular}    
+  \end{center}
+
+  Again the first one is the POSIX value, but if it just about all
+  possible values, then there are in fact infinitely many values because
+  the following 
+
+  \begin{center}
+  \begin{tabular}{l}
+    $Stars\,[Left(Sequ(a,a)),Empty,Right(a)]$\\
+    $Stars\,[Left(Sequ(a,a)),Empty,Empty,Right(a)]$\\
+    $Stars\,[Left(Sequ(a,a)),Empty,Right(a), Empty]$, \ldots\\
+  \end{tabular}    
+  \end{center}
+
+  are also values for this regex and the string $aaa$.
+}  
 
 \item If a regular expression $r$ does not contain any occurrence of $\ZERO$,  
   is it possible for $L(r)$ to be empty? Explain why, or give a proof.
 
   \solution{
-    The property to prove is
+    No. The property to prove by induction is
 
     \begin{center}
     $P(r)$: If $r$ does not contain $\ZERO$, then $L(r) \not= \emptyset$.  
@@ -76,6 +112,10 @@
   In case they can, can you give the corresponding token
   sequences.
 
+  \solution{
+  The first 2 are lexibile. The 3 one contains $/$ which is not an operator.
+  }  
+
 \item Assume $r$ is nullable. Show that
   \[ 1 + r + r\cdot r \;\equiv\; r\cdot r
   \]
@@ -128,8 +168,8 @@
   password should be at least 8 characters long and consist of upper-
   and lower-case letters and digits. It should contain at least a
   single lower-case letter, at least a single upper-case letter and at
-  least a single digit. If possible ise the intersection regular
-  expression from CW1, written $\_\&\_$, the bounded regular
+  least a single digit. If possible use the intersection regular
+  expression from CW1, written $\_\&\_$, and the bounded regular
   expressions; you can also assume a regular expression written
   \texttt{ALL} that can match any character.
 
@@ -144,6 +184,8 @@
                    & \;\&\; & $(ALL^*\cdot [0-9]\cdot ALL^*)$\\
   \end{tabular}
   \end{center}
+
+  $ALL$ could be represented as $\sim \ZERO$.
   }
   
 \item Assume the delimiters for comments are
@@ -161,7 +203,13 @@
       that can recognise any character, and a regular
       expression \texttt{NOT} that recognises the complement
       of a regular expression.)
-  
+
+      \solution{
+        \[/ * \sim (ALL^* * / ALL^*) * /\]
+
+      The idea to make sure in between $/ *$ and $* /$ ar no strings that contain $* /$.  
+      }
+      
 \item Simplify the regular expression
 
 \[
@@ -170,7 +218,12 @@
 \]
 
       Does simplification always preserve the meaning of a
-      regular expression? 
+      regular expression?
+
+      \solution{ Yes, simplification preserves the language. It
+        simplifies to just $\ONE$. It should be remembered that the
+        Brzozowski does not simplify under stars. This does not apply
+        in this example, though.  }
      
 \item The Sulzmann \& Lu algorithm contains the function
       $mkeps$ which answers how a regular expression can match
@@ -184,10 +237,28 @@
     (a + \ONE) \cdot (\ONE + \ONE)\\
     a^*
   \end{array}
-  \]
+\]
+
+\solution{
+  The values are
+  \begin{center}
+  \begin{tabular}{l}
+    $Right(Right(Empty))$\\
+    $Sequ(Right(\ONE),Left(\ONE))$\\
+    $Stars\,[]$
+  \end{tabular}  
+  \end{center}
+
+  The last one uses the rule that $mkeps$ for the star returns always $Star\,[]$.
+  }
 
 \item What is the purpose of the record regular expression in
-      the Sulzmann \& Lu algorithm?
+  the Sulzmann \& Lu algorithm?
+
+  \solution{
+    It marks a part of a regular expression and can be used to extract the part of the
+    string that is matched by this marked part of the regular expression.
+  }
 
 \item Recall the functions \textit{nullable} and
       \textit{zeroable}.  Define recursive functions
@@ -252,7 +323,7 @@
           i(r^*)  &\dn \neg a(r)
         \end{align}
 
-        Here the interesting bit is that as soon $r$ can match at least a single string, then $r^*$
+        Here the interesting bit is that as soon $r$ can match at least a single non-empty string, then $r^*$
         will match infinitely many strings.
         }
 
@@ -264,6 +335,15 @@
   engines like the one in Rust generate DFAs. Explain what is the
   problem with these regex engines and also what is the problem with $a^{\{1000\}}$
   in these engines.
+
+  \solution{
+    Why they use NFAs? NFAs are of similar size as the regular expression (they do not explode
+    for the basic regular expressions. Python regex library supports constructions like
+    back-refernces which cannot be represented by DFAs (string matching with back-references
+    can be NP. What is the problem with $a^{\{1000\}}$. When generating DFAs (and NFAs) for the
+    bounded regular expressions, one has to make $n$ copies, which means their size can grow
+    drastically for large counters.
+  }
       
 %\item (Optional) The tokenizer in \texttt{regexp3.scala} takes as
 %argument a string and a list of rules. The result is a list of tokens. Improve this tokenizer so