# HG changeset patch # User Christian Urban # Date 1417799613 0 # Node ID 7975e4f0d4de5086f0f76c0ea8d98027c18b177d # Parent a61b50c5d57fa68f9b24e4321035f12d24701a6b updated diff -r a61b50c5d57f -r 7975e4f0d4de handouts/ho01.pdf Binary file handouts/ho01.pdf has changed diff -r a61b50c5d57f -r 7975e4f0d4de handouts/ho01.tex --- a/handouts/ho01.tex Fri Dec 05 01:00:34 2014 +0000 +++ b/handouts/ho01.tex Fri Dec 05 17:13:33 2014 +0000 @@ -69,7 +69,7 @@ disposable.style.email.with+symbol@example.com \end{lstlisting} -Identifiers, or variables, in program text are often required +As mentioned above, identifiers, or variables, in program text are often required to satisfy the constraints that they start with a letter and then can be followed by zero or more letters or numbers and also can include underscores, but not as the first character. @@ -118,7 +118,8 @@ \ref{crawler3}.\footnote{There is an interesting twist in the web-scraper where \pcode{re*?} is used instead of \pcode{re*}.} Note, however, the regular expression for -http-addresses in web-pages is meant to be +http-addresses in web-pages in Figure~\ref{crawler1}, Line 15, is +intended to be \[ \pcode{"https?://[^"]*"} @@ -162,7 +163,7 @@ scaled ticks=false, axis lines=left, width=7cm, - height=5cm, + height=4.5cm, legend entries={Python,Ruby}, legend pos=north west, legend cell align=left] @@ -217,7 +218,7 @@ scaled ticks=false, axis lines=left, width=9cm, - height=5cm] + height=4.5cm] \addplot[green,mark=square*,mark options={fill=white}] table {re2b.data}; \addplot[black,mark=square*,mark options={fill=white}] table {re3.data}; \end{axis} @@ -393,6 +394,8 @@ between the different regular expressions $(r_1 + r_2) + r_3$ and $r_1 + (r_2 + r_3)$\ldots they are not the same regular expression, but they have the same meaning. For example +you can do the following calculation which shows they +have the same meaning: \begin{eqnarray*} L((r_1 + r_2) + r_3) & = & L(r_1 + r_2) \cup L(r_3)\\ @@ -517,7 +520,7 @@ specification and that the corresponding implementations do not contain any bugs. We are close, but not yet quite there. -Despite my fascination, I am also happy to admit that regular +My fascination non withstanding, I am also happy to admit that regular expressions have their shortcomings. There are some well-known ``theoretical'' shortcomings, for example recognising strings of the form $a^{n}b^{n}$. I am not so bothered by them. What I diff -r a61b50c5d57f -r 7975e4f0d4de handouts/ho02.pdf Binary file handouts/ho02.pdf has changed diff -r a61b50c5d57f -r 7975e4f0d4de handouts/ho02.tex --- a/handouts/ho02.tex Fri Dec 05 01:00:34 2014 +0000 +++ b/handouts/ho02.tex Fri Dec 05 17:13:33 2014 +0000 @@ -133,7 +133,7 @@ with these equivalences and non-equivalences. -For our matching algorithm however the following six +For our matching algorithm however the following seven equivalences will play an important role: \begin{center} @@ -333,7 +333,7 @@ \noindent Again the operation $Der$ might help to rationalise this algorithm. We want to know whether $abc \in L(r_1)$. We -do not know yet. But lets assume it is. Then $Der\,a\,L(r_1)$ +do not know yet. But let us assume it is. Then $Der\,a\,L(r_1)$ builds the set where all the strings not starting with $a$ are filtered out. Of the remaining strings, the $a$ is stripped off. Then we continue with filtering out all strings not @@ -492,7 +492,7 @@ size of regular expressions it needs to handle. This is of course obvious because both $nullable$ and $der$ need to traverse the whole regular expression. There seems to be one -more source of making the algorithm run faster. The derivative +more issue for making the algorithm run faster. The derivative function often produces ``useless'' $\varnothing$s and $\epsilon$s. To see this, consider $r = ((a \cdot b) + b)^*$ and the following two derivatives diff -r a61b50c5d57f -r 7975e4f0d4de handouts/ho03.pdf Binary file handouts/ho03.pdf has changed diff -r a61b50c5d57f -r 7975e4f0d4de handouts/ho03.tex --- a/handouts/ho03.tex Fri Dec 05 01:00:34 2014 +0000 +++ b/handouts/ho03.tex Fri Dec 05 17:13:33 2014 +0000 @@ -497,8 +497,7 @@ left-hand side is $q_1$ and the right-hand side $q_0\,a$. The right-hand side is essentially all possible ways how to end up in $q_1$. There is only one incoming edge from $q_0$ consuming -an $a$. Therefore we say: if we are in $q_0$ consuming an $a$ -then we end up in $q_1$. Therefore the right hand side is +an $a$. Therefore the right hand side is state followed by character---in this case $q_0\,a$. Now lets have a look at the third equation: there are two incoming edges. Therefore we have two terms, namely $q_1\,a$ and diff -r a61b50c5d57f -r 7975e4f0d4de mk --- a/mk Fri Dec 05 01:00:34 2014 +0000 +++ b/mk Fri Dec 05 17:13:33 2014 +0000 @@ -1,6 +1,6 @@ #!/bin/sh -subdirs="slides hws slides coursework handouts" +subdirs="slides handouts hws coursework" for sd in $subdirs; do cd $sd