afl-material: handouts/ho04.tex@3e5f5d19f514 (annotated)

669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	1	% !TEX program = xelatex
251 5b5a68df6d16 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	2	\documentclass{article}
5b5a68df6d16 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	3	\usepackage{../style}
5b5a68df6d16 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	4	\usepackage{../langs}
282 3e3b927a85cf added ho04 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	5	\usepackage{../graphics}
671 83e38043ed78 updated Christian Urban <urbanc@in.tum.de> parents: 669 diff changeset	6	\usepackage{skull}
251 5b5a68df6d16 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	7
5b5a68df6d16 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	8	\begin{document}
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	9	\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2019}
251 5b5a68df6d16 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	10
282 3e3b927a85cf added ho04 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	11	\section*{Handout 4 (Sulzmann \& Lu Algorithm)}
3e3b927a85cf added ho04 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	12
716 df7d47a507f8 updated Christian Urban <urbanc@in.tum.de> parents: 671 diff changeset	13	So far our algorithm based on derivatives is only able to say
df7d47a507f8 updated Christian Urban <urbanc@in.tum.de> parents: 671 diff changeset	14	yes or no depending on whether a string is matched by a regular
282 3e3b927a85cf added ho04 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	15	expression or not. Often a more interesting question is to
3e3b927a85cf added ho04 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	16	find out \emph{how} a regular expression matched a string?
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	17	Answering this question will also help us with the problem we
319 e7b110f93697 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 296 diff changeset	18	are after, namely tokenising an input string.
282 3e3b927a85cf added ho04 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	19
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	20	The algorithm we will be looking at in this lecture was designed by Sulzmann
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	21	\& Lu in a rather recent research paper (from 2014). A link to it is
350 c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	22	provided on KEATS, in case you are interested.\footnote{In my
c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	23	humble opinion this is an interesting instance of the research
c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	24	literature: it contains a very neat idea, but its presentation
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	25	is rather sloppy. In earlier versions of this paper, a King's
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	26	undergraduate student and I found several rather annoying typos in the
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	27	examples and definitions.} My former PhD student Fahad Ausaf and I even more recently
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	28	wrote a paper where we build on their result: we provided a
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	29	mathematical proof that their algorithm is really correct---the proof
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	30	Sulzmann \& Lu had originally given contained major flaws. Such correctness
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	31	proofs are important: Kuklewicz maintains a unit-test library
520 2849c305b12d updated cu parents: 492 diff changeset	32	for the kind of algorithms we are interested in here and he showed
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	33	that many implementations in the ``wild'' are buggy, that is not
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	34	satisfy his unit tests:
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	35
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	36	\begin{center}
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	37	\url{http://www.haskell.org/haskellwiki/Regex_Posix}
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	38	\end{center}
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	39
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	40
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	41	Coming back to the algorithm by Sulzmann \& Lu, their idea is to
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	42	introduce \emph{values} for producing an answer for \emph{how} a regular
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	43	expression matches a string. So rather than a boolean like in the
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	44	Brzozowski algorithm, a value will be the output of the Sulzman \& Lu
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	45	algorithm whenever the regular expression matches the string. If the
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	46	string does not match the string, an error will be raised. The
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	47	definition for values is given below. They are shown together with the
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	48	regular expressions $r$ to which they correspond:
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	49
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	50	\begin{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	51	\begin{tabular}{cc}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	52	\begin{tabular}{@{}rrl@{}}
319 e7b110f93697 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 296 diff changeset	53	\multicolumn{3}{c}{regular expressions}\medskip\\
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	54	$r$ & $::=$ & $\ZERO$\\
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	55	& $\mid$ & $\ONE$ \\
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	56	& $\mid$ & $c$ \\
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	57	& $\mid$ & $r_1 \cdot r_2$\\
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	58	& $\mid$ & $r_1 + r_2$ \\
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	59	\\
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	60	& $\mid$ & $r^*$ \\
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	61	\end{tabular}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	62	&
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	63	\begin{tabular}{@{\hspace{0mm}}rrl@{}}
319 e7b110f93697 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 296 diff changeset	64	\multicolumn{3}{c}{values}\medskip\\
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	65	$v$ & $::=$ & \\
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	66	& & $Empty$ \\
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	67	& $\mid$ & $Char(c)$ \\
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	68	& $\mid$ & $Seq(v_1,v_2)$\\
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	69	& $\mid$ & $\Left(v)$ \\
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	70	& $\mid$ & $Right(v)$ \\
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	71	& $\mid$ & $Stars\,[v_1,\ldots\,v_n]$ \\
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	72	\end{tabular}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	73	\end{tabular}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	74	\end{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	75
417 e74c696821a2 updated Christian Urban <urbanc@in.tum.de> parents: 400 diff changeset	76	\noindent There is no value for the
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	77	$\ZERO$ regular expression, since it does not match any
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	78	string. Otherwise there is exactly one value corresponding to
0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	79	each regular expression with the exception of $r_1 + r_2$
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	80	where there are two values, namely $\Left(v)$ and $Right(v)$
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	81	corresponding to the two alternatives. Note that $r^*$ is
0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	82	associated with a list of values, one for each copy of $r$
0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	83	that was needed to match the string. This means we might also
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	84	return the empty list $Stars\,[]$, if no copy was needed
520 2849c305b12d updated cu parents: 492 diff changeset	85	for $r^*$. For sequence, there is exactly one value, composed
350 c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	86	of two component values ($v_1$ and $v_2$).
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	87
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	88	My implementation of regular expressions and values in Scala is
716 df7d47a507f8 updated Christian Urban <urbanc@in.tum.de> parents: 671 diff changeset	89	shown below. I use the convention that
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	90	regular expressions are written entirely with upper-case
716 df7d47a507f8 updated Christian Urban <urbanc@in.tum.de> parents: 671 diff changeset	91	letters, whereas values start with a single upper-case
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	92	character and the rest are lower-case letters.
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	93
490 4fee50f38305 updated Christian Urban <urbanc@in.tum.de> parents: 444 diff changeset	94	{\small\lstinputlisting[language=Scala,numbers=none,linebackgroundcolor=
4fee50f38305 updated Christian Urban <urbanc@in.tum.de> parents: 444 diff changeset	95	{\ifodd\value{lstnumber}\color{capri!3}\fi}]
352 1e1b0fe66107 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 350 diff changeset	96	{../progs/app01.scala}}
350 c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	97
c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	98
490 4fee50f38305 updated Christian Urban <urbanc@in.tum.de> parents: 444 diff changeset	99	{\small\lstinputlisting[language=Scala,numbers=none,linebackgroundcolor=
4fee50f38305 updated Christian Urban <urbanc@in.tum.de> parents: 444 diff changeset	100	{\ifodd\value{lstnumber}\color{capri!3}\fi}]
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	101	{../progs/app02.scala}}
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	102
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	103
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	104	Graphically the algorithm by Sulzmann \& Lu can be illustrated
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	105	by the picture in Figure~\ref{Sulz} where the path from the
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	106	left to the right involving $\der/\nullable$ is the first phase
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	107	of the algorithm and $\textit{mkeps}/\inj$, the path from right to left,
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	108	the second phase. This picture shows the steps required when a
0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	109	regular expression, say $r_1$, matches the string $abc$. We
0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	110	first build the three derivatives (according to $a$, $b$ and
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	111	$c$). We then use $\nullable$ to find out whether the resulting
296 796b9b81ac8d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 288 diff changeset	112	regular expression can match the empty string. If yes, we call
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	113	the function $\textit{mkeps}$. The $\textit{mkeps}$ function calculates a value for how a regular
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	114	expression has matched the empty string. Its definition
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	115	is as follows:
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	116
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	117	\begin{figure}[t]
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	118	\begin{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	119	\begin{tikzpicture}[scale=2,node distance=1.2cm,
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	120	every node/.style={minimum size=7mm}]
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	121	\node (r1) {$r_1$};
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	122	\node (r2) [right=of r1]{$r_2$};
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	123	\draw[->,line width=1mm](r1)--(r2) node[above,midway] {$\der\,a$};
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	124	\node (r3) [right=of r2]{$r_3$};
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	125	\draw[->,line width=1mm](r2)--(r3) node[above,midway] {$\der\,b$};
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	126	\node (r4) [right=of r3]{$r_4$};
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	127	\draw[->,line width=1mm](r3)--(r4) node[above,midway] {$\der\,c$};
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	128	\draw (r4) node[anchor=west] {\;\raisebox{3mm}{$\nullable$}};
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	129	\node (v4) [below=of r4]{$v_4$};
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	130	\draw[->,line width=1mm](r4) -- (v4);
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	131	\node (v3) [left=of v4] {$v_3$};
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	132	\draw[->,line width=1mm](v4)--(v3) node[below,midway] {$\inj\,c$};
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	133	\node (v2) [left=of v3]{$v_2$};
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	134	\draw[->,line width=1mm](v3)--(v2) node[below,midway] {$\inj\,b$};
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	135	\node (v1) [left=of v2] {$v_1$};
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	136	\draw[->,line width=1mm](v2)--(v1) node[below,midway] {$\inj\,a$};
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	137	\draw (r4) node[anchor=north west] {\;\raisebox{-8mm}{$\textit{mkeps}$}};
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	138	\end{tikzpicture}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	139	\end{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	140	\caption{The two phases of the algorithm by Sulzmann \& Lu.
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	141	\label{Sulz}}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	142	\end{figure}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	143
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	144
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	145
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	146	\begin{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	147	\begin{tabular}{lcl}
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	148	$\textit{mkeps}(\ONE)$ & $\dn$ & $Empty$\\
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	149	$\textit{mkeps}(r_1 + r_2)$ & $\dn$ & if $\nullable(r_1)$ \\
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	150	& & then $\Left(\textit{mkeps}(r_1))$\\
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	151	& & else $Right(\textit{mkeps}(r_2))$\\
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	152	$\textit{mkeps}(r_1 \cdot r_2)$ & $\dn$ & $Seq(\textit{mkeps}(r_1),\textit{mkeps}(r_2))$\\
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	153	$\textit{mkeps}(r^*)$ & $\dn$ & $Stars\,[]$ \\
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	154	\end{tabular}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	155	\end{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	156
596 6d6e79f95933 updated Christian Urban <urbanc@in.tum.de> parents: 551 diff changeset	157	\noindent There are no cases for $\ZERO$ and $c$, since
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	158	these regular expression cannot match the empty string. Note
0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	159	also that in case of alternatives we give preference to the
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	160	regular expression on the left-hand side. This will become
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	161	important later on.
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	162
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	163	The second phase of the algorithm is organised so that it will
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	164	calculate a value for how the derivative regular expression
350 c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	165	has matched a string. For this we need a function that
c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	166	reverses this ``chopping off'' for values which we did in the
c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	167	first phase for derivatives. The corresponding function is
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	168	called $\inj$ (short for injection). This function takes three
350 c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	169	arguments: the first one is a regular expression for which we
c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	170	want to calculate the value, the second is the character we
c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	171	want to inject and the third argument is the value where we
c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	172	will inject the character into. The result of this function is a
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	173	new value. The definition of $\inj$ is as follows:
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	174
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	175	\begin{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	176	\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	177	$\inj\,(c)\,c\,Empty$ & $\dn$ & $Char\,c$\\
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	178	$\inj\,(r_1 + r_2)\,c\,\Left(v)$ & $\dn$ & $\Left(\inj\,r_1\,c\,v)$\\
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	179	$\inj\,(r_1 + r_2)\,c\,Right(v)$ & $\dn$ & $Right(\inj\,r_2\,c\,v)$\\
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	180	$\inj\,(r_1 \cdot r_2)\,c\,Seq(v_1,v_2)$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	181	$\inj\,(r_1 \cdot r_2)\,c\,\Left(Seq(v_1,v_2))$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	182	$\inj\,(r_1 \cdot r_2)\,c\,Right(v)$ & $\dn$ & $Seq(\textit{mkeps}(r_1),\inj\,r_2\,c\,v)$\\
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	183	$\inj\,(r^*)\,c\,Seq(v,Stars\,vs)$ & $\dn$ & $Stars(\inj\,r\,c\,v\,::\,vs)$\\
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	184	\end{tabular}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	185	\end{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	186
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	187	\noindent This definition is by recursion on the regular
0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	188	expression and by analysing the shape of the values. Therefore
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	189	there are three cases for sequence regular
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	190	expressions (for all possible shapes of the value we can encounter). The last
350 c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	191	clause for the star regular expression returns a list where
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	192	the first element is $\inj\,r\,c\,v$ and the other elements are
350 c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	193	$vs$. That means $\_\,::\,\_$ should be read as list cons.
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	194
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	195	To understand what is going on, it might be best to do some
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	196	example calculations and compare them with Figure~\ref{Sulz}.
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	197	For this note that we have not yet dealt with the need of
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	198	simplifying regular expressions (this will be a topic on its
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	199	own later). Suppose the regular expression is $a \cdot (b
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	200	\cdot c)$ and the input string is $abc$. The derivatives from
0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	201	the first phase are as follows:
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	202
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	203	\begin{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	204	\begin{tabular}{ll}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	205	$r_1$: & $a \cdot (b \cdot c)$\\
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	206	$r_2$: & $\ONE \cdot (b \cdot c)$\\
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	207	$r_3$: & $(\ZERO \cdot (b \cdot c)) + (\ONE \cdot c)$\\
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	208	$r_4$: & $(\ZERO \cdot (b \cdot c)) + ((\ZERO \cdot c) + \ONE)$\\
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	209	\end{tabular}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	210	\end{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	211
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	212	\noindent According to the simple algorithm, we would test
350 c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	213	whether $r_4$ is nullable, which in this case it indeed is.
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	214	This means we can use the function $\textit{mkeps}$ to calculate a
350 c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	215	value for how $r_4$ was able to match the empty string.
c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	216	Remember that this function gives preference for alternatives
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	217	on the left-hand side. However there is only $\ONE$ on the
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	218	very right-hand side of $r_4$ (underlined)
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	219
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	220	\begin{center}
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	221	$r_4$:\;$(\ZERO \cdot (b \cdot c)) +
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	222	((\ZERO \cdot c) + \underline{\ONE})$\\
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	223	\end{center}
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	224
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	225	\noindent
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	226	that matches the empty string.
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	227	Therefore $\textit{mkeps}$ returns the value
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	228
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	229	\begin{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	230	$v_4:\;Right(Right(Empty))$
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	231	\end{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	232
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	233	\noindent If there had been a $\ONE$ on the left, then
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	234	$\textit{mkeps}$ would have returned something of the form
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	235	$\Left(\ldots)$. The point is that from this value we can
350 c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	236	directly read off which part of $r_4$ matched the empty
c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	237	string: take the right-alternative first, and then the
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	238	right-alternative again, then you got to the $\ONE$.
326 94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	239
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	240	Next we have to ``inject'' the last character, that is $c$ in
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	241	the running example, into this value $v_4$ in order to
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	242	calculate how $r_3$ could have matched the string $c$.
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	243	For this we call injection with $\inj(r_3, c, v_4)$.
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	244	According to the definition of $\inj$ we obtain
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	245
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	246	\begin{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	247	$v_3:\;Right(Seq(Empty, Char(c)))$
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	248	\end{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	249
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	250	\noindent This is the correct result, because $r_3$ needs
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	251	to use the right-hand alternative, and then $\ONE$ needs
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	252	to match the empty string and $c$ needs to match $c$.
596 6d6e79f95933 updated Christian Urban <urbanc@in.tum.de> parents: 551 diff changeset	253	Next we need to inject back the letter $b$ into $v_3$.
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	254	For this we call injection with $\inj(r_2, b, v_3)$.
596 6d6e79f95933 updated Christian Urban <urbanc@in.tum.de> parents: 551 diff changeset	255	This gives
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	256
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	257	\begin{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	258	$v_2:\;Seq(Empty, Seq(Char(b), Char(c)))$
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	259	\end{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	260
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	261	\noindent which is again the correct result for how $r_2$
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	262	matched the string $bc$. Finally we need to inject back the
596 6d6e79f95933 updated Christian Urban <urbanc@in.tum.de> parents: 551 diff changeset	263	letter $a$ into $v_2$ giving the final result.
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	264	For this we call injection with $\inj(r_1, a, v_2)$
596 6d6e79f95933 updated Christian Urban <urbanc@in.tum.de> parents: 551 diff changeset	265	and obtain
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	266
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	267	\begin{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	268	$v_1:\;Seq(Char(a), Seq(Char(b), Char(c)))$
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	269	\end{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	270
596 6d6e79f95933 updated Christian Urban <urbanc@in.tum.de> parents: 551 diff changeset	271	\noindent This value corresponds to how the regular expression $r_1$,
6d6e79f95933 updated Christian Urban <urbanc@in.tum.de> parents: 551 diff changeset	272	namely $a \cdot (b \cdot c)$, matched the string $abc$.
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	273
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	274
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	275	There are a few auxiliary functions that are of interest
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	276	when analysing this algorithm. One is called \emph{flatten},
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	277	written $\|\_\|$, which extracts the string ``underlying'' a
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	278	value. It is defined recursively as
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	279
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	280	\begin{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	281	\begin{tabular}{lcl}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	282	$\|Empty\|$ & $\dn$ & $[]$\\
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	283	$\|Char(c)\|$ & $\dn$ & $[c]$\\
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	284	$\|\Left(v)\|$ & $\dn$ & $\|v\|$\\
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	285	$\|Right(v)\|$ & $\dn$ & $\|v\|$\\
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	286	$\|Seq(v_1,v_2)\|$& $\dn$ & $\|v_1\| \,@\, \|v_2\|$\\
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	287	$\|Stars\,[v_1,\ldots ,v_n]\|$ & $\dn$ & $\|v_1\| \,@\ldots @\, \|v_n\|$\\
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	288	\end{tabular}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	289	\end{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	290
350 c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	291	\noindent Using flatten we can see what are the strings behind
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	292	the values calculated by $\textit{mkeps}$ and $\inj$. In our running
350 c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	293	example:
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	294
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	295	\begin{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	296	\begin{tabular}{ll}
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	297	$\|v_4\|$: & $[]$\\
0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	298	$\|v_3\|$: & $c$\\
0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	299	$\|v_2\|$: & $bc$\\
0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	300	$\|v_1\|$: & $abc$
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	301	\end{tabular}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	302	\end{center}
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	303
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	304	\noindent This indicates that $\inj$ indeed is injecting, or
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	305	adding, back a character into the value.
350 c4e7caa06c74 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 326 diff changeset	306
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	307	The definition of $\inj$ might still be very puzzling and each clause
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	308	might look arbitrary, but there is in fact a relatively simple idea
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	309	behind it. Ultimately the $\inj$-functions is determined by the
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	310	derivative functions. For this consider one of the ``squares'' from
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	311	Figure~\ref{Sulz}:
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	312
720 ecbed0155f72 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 716 diff changeset	313	\begin{center}
ecbed0155f72 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 716 diff changeset	314	\begin{tikzpicture}[scale=2,node distance=1.2cm,
ecbed0155f72 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 716 diff changeset	315	every node/.style={minimum size=7mm}]
ecbed0155f72 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 716 diff changeset	316	\node (r) {$r$};
ecbed0155f72 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 716 diff changeset	317	\node (rd) [right=of r]{$r_{der}$};
ecbed0155f72 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 716 diff changeset	318	\draw[->,line width=1mm](r)--(rd) node[above,midway] {$\der\,c$};
ecbed0155f72 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 716 diff changeset	319	\node (vd) [below=of r2]{$v_{der}$};
ecbed0155f72 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 716 diff changeset	320	\draw[->,line width=1mm](rd) -- (vd);
ecbed0155f72 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 716 diff changeset	321	\node (v) [left=of vd] {$v$};
ecbed0155f72 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 716 diff changeset	322	\draw[->,line width=1mm](vd)--(v) node[below,midway] {$\inj\,c$};
ecbed0155f72 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 716 diff changeset	323	\draw[->,line width=0.5mm,dotted](r) -- (v) node[left,midway,red] {\bf ?};
ecbed0155f72 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 716 diff changeset	324	\end{tikzpicture}
ecbed0155f72 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 716 diff changeset	325	\end{center}
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	326
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	327	\noindent
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	328	The input to the $\inj$-function is $r$ (on the top left), $c$ (the
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	329	character to be injected) and $v_{der}$ (bottom right). The output is
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	330	the value $v$ for how the regular expression $r$ matched the
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	331	corresponding string. How does $\inj$ calculate this value? Well, it has
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	332	to analyse the value $v_{der}$ and transform it into the value $v$ for
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	333	the regular expression $r$. The value $v_{der}$ corresponds to the
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	334	$r_{der}$-regular expression which is the derivative of $r$. Remember
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	335	that $v_{der}$ is the value for how $r_{der}$ matches the corresponding string
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	336	where $c$ has been chopped off.
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	337
720 ecbed0155f72 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 716 diff changeset	338	For a concrete example, let $r$ be $r_1 + r_2$. Then $r_{der}$
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	339	is of the form $(\der\,c\,r_1) + (\der\,c\,r_2)$. What are the possible
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	340	values corresponding to $r_{der}$? Well, they can be only of the form
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	341	$\Left(\ldots)$ and $\Right(\ldots)$. Therefore you have two
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	342	cases in the $\inj$ function -- one for $\Left$ and one for $\Right$.
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	343
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	344	\begin{center}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	345	\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{2mm}}l}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	346	$\inj\,(r_1 + r_2)\,c\,\,\Left(v)$ & $\dn$ & $\ldots$\\
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	347	$\inj\,(r_1 + r_2)\,c\,\,\Right(v)$ & $\dn$ & $\ldots$\\
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	348	\end{tabular}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	349	\end{center}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	350
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	351	\noindent
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	352	Let's look next at the sequence case where $r = r_1 \cdot r_2$. What does the
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	353	derivative of $r$ look like? According to the definition it is:
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	354
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	355	\begin{center}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	356	\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	357	$\der\, c\, (r_1 \cdot r_2)$ & $\dn$ & if $\nullable (r_1)$\\
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	358	& & then $(\der\,c\,r_1) \cdot r_2 + \der\, c\, r_2$\\
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	359	& & else $(\der\, c\, r_1) \cdot r_2$\\
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	360	\end{tabular}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	361	\end{center}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	362
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	363	\noindent As you can see there is a derivative in the if-branch and another
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	364	in the else-branch. In the if-branch we have an alternative of the form
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	365	$\_ + \_$. We already know what the possible values are for such a regular
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	366	expression, namely $\Left$ or $\Right$. Therefore we have in $\inj$ the
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	367	two cases:
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	368
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	369	\begin{center}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	370	\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{2mm}}l}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	371	$\inj\,(r_1 \cdot r_2)\,c\,\,\Left(Seq(v_1,v_2))$ & $\dn$ & $\ldots$\\
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	372	$\inj\,(r_1 \cdot r_2)\,c\,\,Right(v)$ & $\dn$ & $\ldots$\\
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	373	\end{tabular}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	374	\end{center}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	375
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	376	\noindent
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	377	In the first case we even know that it is not just a $\Left$-value, but
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	378	$\Left(Seq(\ldots))$, because the corresponding regular expression
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	379	in the derivation is \mbox{$(\der\,c\,r_1) \cdot r_2$}. Hence we know
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	380	it must be a $Seq$-value enclosed inside $\Left$.
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	381	The third clause for $r_1\cdot r_2$ in the $\inj$-function
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	382
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	383	\begin{center}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	384	\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	385	$\inj\,(r_1 \cdot r_2)\,c\,\,Seq(v_1,v_2)$ & $\dn$ & $\ldots$\\
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	386	\end{tabular}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	387	\end{center}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	388
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	389	\noindent corresponds to the else-branch in the derivative. In this
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	390	case we know the derivative is of the form $(\der\,c\,r_1) \cdot r_2$ and
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	391	therefore the value must be of the form $Seq(\ldots)$.
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	392
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	393	Hopefully the $inj$-function makes now more sense. I let you explain
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	394	the $r^$ case. What do the derivative of $r^$ and
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	395	the corresponding value look like? Does this explain the shape of
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	396	the clause?
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	397
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	398	\begin{center}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	399	\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	400	$\inj\,(r^*)\,c\,\,Seq(v,Stars\,vs)$ & $\dn$ & $Stars(\inj\,r\,c\,v\,::\,vs)$\\
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	401	\end{tabular}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	402	\end{center}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	403
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	404	\noindent If yes, you made sense of the left-hand sides of
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	405	the $\inj$-definition.
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	406
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	407	How about the right-hand sides? Well, in the
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	408	$r^*$ case for example we have according to the square in the picture
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	409	above a value $v_{der}$ which says how the derivative $r_{der}$
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	410	matched the string. Since the derivative is of the form
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	411	$(\der\,c\,r) \cdot (r^*)$ the corresponding value is of the
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	412	form $Seq(v, Stars\,vs)$. But for $r^*$ we are looking for a value
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	413	for the original (top left) regular expression --- so it cannot
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	414	be a value of the shape $Seq(\ldots, Stars\ldots)$ because that is the
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	415	shape for sequence regular expressions. What we need is a value
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	416	of the form $Stars \ldots$ (remember the correspondence between
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	417	values and regular expressions). This suggests to define the right hand
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	418	side as
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	419
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	420	\begin{center}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	421	$\ldots \dn Stars(\inj\,r\,c\,v\,::\,vs)$
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	422	\end{center}
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	423
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	424	\noindent It is a value of the right shape, namely $Stars$. It injected
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	425	$c$ into the first-value, which is in fact the value where we need to
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	426	undo the derivative. Remember again the shape of the derivative of $r^*$.
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	427	In place where we chopped off the $c$, we now need to do the $\inj$ of $c$.
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	428	Therefore $\inj\,r\,c\,v$ in the definition above. That is the same with
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	429	the other clauses in $\inj$.
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	430
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	431
671 83e38043ed78 updated Christian Urban <urbanc@in.tum.de> parents: 669 diff changeset	432	Phew\ldots{}Sweat\ldots!\#@$\skull$\%\ldots Unfortunately, there is
83e38043ed78 updated Christian Urban <urbanc@in.tum.de> parents: 669 diff changeset	433	a gigantic problem with the described algorithm so far: it is very
720 ecbed0155f72 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 716 diff changeset	434	slow. To make it faster, we have to include in all this the simplification
ecbed0155f72 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 716 diff changeset	435	from Lecture 2\ldots{}and what rotten luck: simplification messes things
ecbed0155f72 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 716 diff changeset	436	up and we need to rectify the mess. This is what we shall do next.
282 3e3b927a85cf added ho04 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	437
3e3b927a85cf added ho04 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	438
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	439	\subsubsection*{Simplification}
282 3e3b927a85cf added ho04 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	440
283 c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	441	Generally the matching algorithms based on derivatives do
c14e5ebf0c3b updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 282 diff changeset	442	poorly unless the regular expressions are simplified after
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	443	each derivative step. But this is a bit more involved in the
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	444	algorithm of Sulzmann \& Lu. So what follows might require you
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	445	to read several times before it makes sense and also might
326 94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	446	require that you do some example calculations yourself. As a
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	447	first example consider the last derivation step in our earlier
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	448	example:
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	449
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	450	\begin{equation}\label{regexfour}
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	451	r_4 = \der\,c\,r_3 =
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	452	(\ZERO \cdot (b \cdot c)) + ((\ZERO \cdot c) + \ONE)
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	453	\end{equation}
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	454
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	455	\noindent Simplifying this regular expression would just give
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	456	us $\ONE$. Running $\textit{mkeps}$ with this $\ONE$ as
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	457	input, however, would give us with the value $Empty$ instead of
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	458
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	459	\[Right(Right(Empty))\]
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	460
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	461	\noindent
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	462	that was obtained without the simplification. The problem is we need
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	463	to recreate this more complicated value, rather than just return
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	464	$Empty$. This will require what I call \emph{rectification functions}.
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	465	They need to be calculated whenever a regular expression gets
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	466	simplified.
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	467
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	468	Rectification functions take a value as argument and return a
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	469	(rectified) value. In the example above the argument would be $Empty$
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	470	and the output $Right(Right(Empty))$. Before we define these
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	471	rectification functions in general, let us first take a look again at
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	472	our simplification rules:
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	473
0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	474	\begin{center}
0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	475	\begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}l}
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	476	$r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	477	$\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	478	$r \cdot \ONE$ & $\mapsto$ & $r$\\
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	479	$\ONE \cdot r$ & $\mapsto$ & $r$\\
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	480	$r + \ZERO$ & $\mapsto$ & $r$\\
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	481	$\ZERO + r$ & $\mapsto$ & $r$\\
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	482	$r + r$ & $\mapsto$ & $r$\\
0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	483	\end{tabular}
0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	484	\end{center}
0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	485
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	486	\noindent Applying them to $r_4$ in \eqref{regexfour} will require several nested
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	487	simplifications in order end up with just $\ONE$. However,
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	488	it is possible to apply them in a depth-first, or inside-out,
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	489	manner in order to calculate this simplified regular
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	490	expression.
285 8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	491
356 d9c784c71305 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 352 diff changeset	492	The rectification we can implement by letting simp return not
d9c784c71305 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 352 diff changeset	493	just a (simplified) regular expression, but also a
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	494	rectification function. Let us consider the alternative case,
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	495	$r_1 + r_2$, first. By going depth-first, we first simplify
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	496	the component regular expressions $r_1$ and $r_2.$ This will
356 d9c784c71305 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 352 diff changeset	497	return simplified versions, say $r_{1s}$ and $r_{2s}$, of the
d9c784c71305 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 352 diff changeset	498	component regular expressions (if they can be simplified) but
d9c784c71305 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 352 diff changeset	499	also two rectification functions $f_{1s}$ and $f_{2s}$. We
d9c784c71305 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 352 diff changeset	500	need to assemble them in order to obtain a rectified value for
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	501	$r_1 + r_2$. In case $r_{1s}$ simplified to $\ZERO$, we
356 d9c784c71305 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 352 diff changeset	502	continue the derivative calculation with $r_{2s}$. The
d9c784c71305 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 352 diff changeset	503	Sulzmann \& Lu algorithm would return a corresponding value,
d9c784c71305 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 352 diff changeset	504	say $v_{2s}$. But now this value needs to be ``rectified'' to
d9c784c71305 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 352 diff changeset	505	the value
285 8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	506
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	507	\begin{center}
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	508	$Right(v_{2s})$
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	509	\end{center}
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	510
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	511	\noindent The reason is that we look for the value that tells
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	512	us how $r_1 + r_2$ could have matched the string, not just
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	513	$r_2$ or $r_{2s}$. Unfortunately, this is still not the right
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	514	value in general because there might be some simplifications
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	515	that happened inside $r_2$ and for which the simplification
520 2849c305b12d updated cu parents: 492 diff changeset	516	function returned also a rectification function $f_{2s}$. So in
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	517	fact we need to apply this one too which gives
285 8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	518
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	519	\begin{center}
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	520	$Right(f_{2s}(v_{2s}))$
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	521	\end{center}
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	522
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	523	\noindent This is now the correct, or rectified, value. Since
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	524	the simplification will be done in the first phase of the
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	525	algorithm, but the rectification needs to be done to the
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	526	values in the second phase, it is advantageous to calculate
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	527	the rectification as a function, remember this function and
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	528	then apply the value to this function during the second phase.
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	529	So if we want to implement the rectification as function, we
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	530	would need to return
285 8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	531
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	532	\begin{center}
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	533	$\lambda v.\,Right(f_{2s}(v))$
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	534	\end{center}
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	535
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	536	\noindent which is the lambda-calculus notation for
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	537	a function that expects a value $v$ and returns everything
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	538	after the dot where $v$ is replaced by whatever value is
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	539	given.
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	540
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	541	Let us package this idea with rectification functions
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	542	into a single function (still only considering the alternative
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	543	case):
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	544
285 8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	545	\begin{center}
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	546	\begin{tabular}{ll}
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	547	$_1$ & $simp(r)$:\\
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	548	$_2$ & \quad case $r = r_1 + r_2$\\
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	549	$_3$ & \qquad let $(r_{1s}, f_{1s}) = simp(r_1)$\\
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	550	$_4$ & \qquad \phantom{let} $(r_{2s}, f_{2s}) = simp(r_2)$\smallskip\\
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	551	$_5$ & \qquad case $r_{1s} = \ZERO$:
285 8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	552	return $(r_{2s}, \lambda v. \,Right(f_{2s}(v)))$\\
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	553	$_6$ & \qquad case $r_{2s} = \ZERO$:
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	554	return $(r_{1s}, \lambda v. \,\Left(f_{1s}(v)))$\\
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	555	$_7$ & \qquad case $r_{1s} = r_{2s}$:
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	556	return $(r_{1s}, \lambda v. \,\Left(f_{1s}(v)))$\\
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	557	$_8$ & \qquad otherwise:
285 8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	558	return $(r_{1s} + r_{2s}, f_{alt}(f_{1s}, f_{2s}))$\\
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	559	\end{tabular}
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	560	\end{center}
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	561
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	562	\noindent We first recursively call the simplification with
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	563	$r_1$ and $r_2$ (Lines 3 and 4). This gives simplified regular expressions,
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	564	$r_{1s}$ and $r_{2s}$, as well as two rectification functions
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	565	$f_{1s}$ and $f_{2s}$. We next need to test whether the
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	566	simplified regular expressions are $\ZERO$ so as to make
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	567	further simplifications. In case $r_{1s}$ is $\ZERO$ (Line 5),
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	568	then we can return $r_{2s}$ (the other alternative). However
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	569	for this we need to build a corresponding rectification
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	570	function, which as said above is
285 8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	571
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	572	\begin{center}
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	573	$\lambda v.\,Right(f_{2s}(v))$
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	574	\end{center}
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	575
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	576	\noindent The case where $r_{2s} = \ZERO$ is similar:
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	577	We return $r_{1s}$ and rectify with $\Left(\_)$ and the
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	578	other calculated rectification function $f_{1s}$. This gives
284 0afe43616b6a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 283 diff changeset	579
285 8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	580	\begin{center}
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	581	$\lambda v.\,\Left(f_{1s}(v))$
285 8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	582	\end{center}
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	583
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	584	\noindent The next case where $r_{1s} = r_{2s}$ can be treated
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	585	like the one where $r_{2s} = \ZERO$. We return $r_{1s}$
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	586	and rectify with $\Left(\_)$ and so on.
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	587
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	588
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	589	The otherwise-case is slightly more complicated. In this case
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	590	neither $r_{1s}$ nor $r_{2s}$ are $\ZERO$ and also
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	591	$r_{1s} \not= r_{2s}$, which means no further simplification
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	592	can be applied. Accordingly, we return $r_{1s} + r_{2s}$ as
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	593	the simplified regular expression. In principle we also do not
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	594	have to do any rectification, because no simplification was
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	595	done in this case. But this is actually not true: There might
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	596	have been simplifications inside $r_{1s}$ and $r_{2s}$. We
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	597	therefore need to take into account the calculated
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	598	rectification functions $f_{1s}$ and $f_{2s}$. We can do this
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	599	by defining a rectification function $f_{alt}$ which takes two
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	600	rectification functions as arguments and applies them
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	601	according to whether the value is of the form $\Left(\_)$ or
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	602	$Right(\_)$:
285 8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	603
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	604	\begin{center}
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	605	\begin{tabular}{l@{\hspace{1mm}}l}
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	606	$f_{alt}(f_1, f_2) \dn$\\
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	607	\qquad $\lambda v.\,$ case $v = \Left(v')$:
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	608	& return $\Left(f_1(v'))$\\
285 8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	609	\qquad \phantom{$\lambda v.\,$} case $v = Right(v')$:
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	610	& return $Right(f_2(v'))$\\
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	611	\end{tabular}
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	612	\end{center}
8a222559278f updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 284 diff changeset	613
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	614	The other interesting case with simplification is the sequence
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	615	case. In this case the main simplification function is as
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	616	follows
286 19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	617
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	618	\begin{center}
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	619	\begin{tabular}{l}
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	620	$simp(r)$:\hspace{5cm} (continued)\\
286 19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	621	\quad case $r = r_1 \cdot r_2$\\
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	622	\qquad let $(r_{1s}, f_{1s}) = simp(r_1)$\\
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	623	\qquad \phantom{let} $(r_{2s}, f_{2s}) = simp(r_2)$\smallskip\\
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	624	\qquad case $r_{1s} = \ZERO$:
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	625	return $(\ZERO, f_{error})$\\
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	626	\qquad case $r_{2s} = \ZERO$:
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	627	return $(\ZERO, f_{error})$\\
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	628	\qquad case $r_{1s} = \ONE$:
286 19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	629	return $(r_{2s}, \lambda v. \,Seq(f_{1s}(Empty), f_{2s}(v)))$\\
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	630	\qquad case $r_{2s} = \ONE$:
286 19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	631	return $(r_{1s}, \lambda v. \,Seq(f_{1s}(v), f_{2s}(Empty)))$\\
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	632	\qquad otherwise:
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	633	return $(r_{1s} \cdot r_{2s}, f_{seq}(f_{1s}, f_{2s}))$\\
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	634	\end{tabular}
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	635	\end{center}
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	636
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	637	\noindent whereby in the last line $f_{seq}$ is again pushing
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	638	the two rectification functions into the two components of the
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	639	Seq-value:
286 19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	640
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	641	\begin{center}
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	642	\begin{tabular}{l@{\hspace{1mm}}l}
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	643	$f_{seq}(f_1, f_2) \dn$\\
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	644	\qquad $\lambda v.\,$ case $v = Seq(v_1, v_2)$:
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	645	& return $Seq(f_1(v_1), f_2(v_2))$\\
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	646	\end{tabular}
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	647	\end{center}
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	648
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	649	\noindent Note that in the case of $r_{1s} = \ZERO$
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	650	(similarly $r_{2s}$) we use the function $f_{error}$ for
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	651	rectification. If you think carefully, then you will realise
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	652	that this function will actually never been called. This is
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	653	because a sequence with $\ZERO$ will never recognise any
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	654	string and therefore the second phase of the algorithm would
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	655	never been called. The simplification function still expects
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	656	us to give a function. So in my own implementation I just
326 94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	657	returned a function that raises an error. In the case
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	658	where $r_{1s} = \ONE$ (similarly $r_{2s}$) we have
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	659	to create a sequence where the first component is a rectified
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	660	version of $Empty$. Therefore we call $f_{1s}$ with $Empty$.
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	661
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	662	Since we only simplify regular expressions of the form $r_1 +
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	663	r_2$ and $r_1 \cdot r_2$ we do not have to do anything else
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	664	in the remaining cases. The rectification function will
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	665	be just the identity, which in lambda-calculus terms is
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	666	just
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	667
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	668	\begin{center}
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	669	$\lambda v.\,v$
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	670	\end{center}
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	671
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	672	\noindent This completes the high-level version of the simplification
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	673	function, which is shown again in Figure~\ref{simp}. The Scala code
444 3056a4c071b0 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 422 diff changeset	674	for the simplification function is in Figure~\ref{simprect}.
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	675
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	676	\begin{figure}[t]
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	677	\begin{center}
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	678	\begin{tabular}{l}
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	679	$simp(r)$:\\
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	680	\quad case $r = r_1 + r_2$\\
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	681	\qquad let $(r_{1s}, f_{1s}) = simp(r_1)$\\
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	682	\qquad \phantom{let} $(r_{2s}, f_{2s}) = simp(r_2)$\smallskip\\
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	683	\qquad case $r_{1s} = \ZERO$:
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	684	return $(r_{2s}, \lambda v. \,Right(f_{2s}(v)))$\\
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	685	\qquad case $r_{2s} = \ZERO$:
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	686	return $(r_{1s}, \lambda v. \,\Left(f_{1s}(v)))$\\
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	687	\qquad case $r_{1s} = r_{2s}$:
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	688	return $(r_{1s}, \lambda v. \,\Left(f_{1s}(v)))$\\
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	689	\qquad otherwise:
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	690	return $(r_{1s} + r_{2s}, f_{alt}(f_{1s}, f_{2s}))$
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	691	\medskip\\
286 19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	692
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	693	\quad case $r = r_1 \cdot r_2$\\
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	694	\qquad let $(r_{1s}, f_{1s}) = simp(r_1)$\\
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	695	\qquad \phantom{let} $(r_{2s}, f_{2s}) = simp(r_2)$\smallskip\\
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	696	\qquad case $r_{1s} = \ZERO$:
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	697	return $(\ZERO, f_{error})$\\
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	698	\qquad case $r_{2s} = \ZERO$:
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	699	return $(\ZERO, f_{error})$\\
e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	700	\qquad case $r_{1s} = \ONE$:
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	701	return $(r_{2s}, \lambda v. \,Seq(f_{1s}(Empty), f_{2s}(v)))$\\
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	702	\qquad case $r_{2s} = \ONE$:
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	703	return $(r_{1s}, \lambda v. \,Seq(f_{1s}(v), f_{2s}(Empty)))$\\
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	704	\qquad otherwise:
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	705	return $(r_{1s} \cdot r_{2s}, f_{seq}(f_{1s}, f_{2s}))$
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	706	\medskip\\
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	707
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	708	\quad otherwise:\\
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	709	\qquad return $(r, \lambda v.\,v)$\\
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	710	\end{tabular}
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	711	\end{center}
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	712	\caption{The simplification function that returns a simplified
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	713	regular expression and a rectification function.\label{simp}}
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	714	\end{figure}
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	715
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	716	\begin{figure}[p]
492 39b7ff2cf1bc updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	717	\small
490 4fee50f38305 updated Christian Urban <urbanc@in.tum.de> parents: 444 diff changeset	718	\lstinputlisting[numbers=left,linebackgroundcolor=
4fee50f38305 updated Christian Urban <urbanc@in.tum.de> parents: 444 diff changeset	719	{\ifodd\value{lstnumber}\color{capri!3}\fi}]{../progs/app61.scala}
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	720
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	721	\caption{The Scala code for the simplification function. The
520 2849c305b12d updated cu parents: 492 diff changeset	722	first part defines some auxiliary functions for the rectification.
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	723	The second part give the simplification function.
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	724	\label{simprect}}
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	725	\end{figure}
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	726
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	727
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	728	We are now in the position to define a \emph{lexing function} as follows:
5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	729
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	730	\begin{center}
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	731	\begin{tabular}{lcl}
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	732	$lex\,r\,[]$ & $\dn$ & if $\nullable(r)$ then $\textit{mkeps}(r)$\\
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	733	& & else $error$\\
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	734	$lex\,r\,c\!::\!s$ & $\dn$ & let
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	735	$(r_{simp}, f_{rect}) = simp(\der(c, r))$\\
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	736	& & $\inj\,r\,c\,f_{rect}(lex\,r_{simp}\,s)$
287 2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	737	\end{tabular}
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	738	\end{center}
2c50b8b5886c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 286 diff changeset	739
326 94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	740	\noindent This corresponds to the $matches$ function we have
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	741	seen in earlier lectures. In the first clause we are given an
520 2849c305b12d updated cu parents: 492 diff changeset	742	empty string, $[]$, and need to test whether the regular
326 94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	743	expression is $nullable$. If yes, we can proceed normally and
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	744	just return the value calculated by $\textit{mkeps}$. The second clause
326 94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	745	is for strings where the first character is $c$, say, and the
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	746	rest of the string is $s$. We first build the derivative of
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	747	$r$ with respect to $c$; simplify the resulting regular
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	748	expression. We continue lexing with the simplified regular
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	749	expression and the string $s$. Whatever will be returned as
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	750	value, we sill need to rectify using the $f_{rect}$ from the
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	751	simplification and finally inject $c$ back into the
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	752	(rectified) value.
286 19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	753
19020b75d75e updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 285 diff changeset	754
288 39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	755	\subsubsection*{Records}
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	756
326 94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	757	Remember we wanted to tokenize input strings, that means
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	758	splitting strings into their ``word'' components. Furthermore
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	759	we want to classify each token as being a keyword or
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	760	identifier and so on. For this one more feature will be
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	761	required, which I call a \emph{record} regular expression.
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	762	While values encode how a regular expression matches a string,
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	763	records can be used to ``focus'' on some particular parts of
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	764	the regular expression and ``forget'' about others.
288 39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	765
326 94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	766	Let us look at an example. Suppose you have the regular
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	767	expression $a\cdot b + a\cdot c$. Clearly this regular expression can only
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	768	recognise two strings. But suppose you are not interested
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	769	whether it can recognise $ab$ or $ac$, but rather if it
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	770	matched, then what was the last character of the matched
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	771	string\ldots either $b$ or $c$. You can do this by annotating
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	772	the regular expression with a record, written in general
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	773	$(x:r)$, where $x$ is just an identifier (in my implementation
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	774	a plain string) and $r$ is a regular expression. A record will
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	775	be regarded as a regular expression. The extended definition
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	776	in Scala therefore looks as follows:
288 39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	777
490 4fee50f38305 updated Christian Urban <urbanc@in.tum.de> parents: 444 diff changeset	778	{\small\lstinputlisting[language=Scala, numbers=none,linebackgroundcolor=
4fee50f38305 updated Christian Urban <urbanc@in.tum.de> parents: 444 diff changeset	779	{\ifodd\value{lstnumber}\color{capri!3}\fi}]
288 39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	780	{../progs/app03.scala}}
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	781
326 94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	782	\noindent Since we regard records as regular expressions we
669 2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	783	need to extend the functions $\nullable$ and $\der$. Similarly
2f5a4d76756d updated Christian Urban <urbanc@in.tum.de> parents: 643 diff changeset	784	$\textit{mkeps}$ and $\inj$ need to be extended. This means we also need
326 94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	785	to extend the definition of values, which in Scala looks as
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	786	follows:
288 39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	787
490 4fee50f38305 updated Christian Urban <urbanc@in.tum.de> parents: 444 diff changeset	788	{\small\lstinputlisting[language=Scala, numbers=none,linebackgroundcolor=
4fee50f38305 updated Christian Urban <urbanc@in.tum.de> parents: 444 diff changeset	789	{\ifodd\value{lstnumber}\color{capri!3}\fi}]
288 39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	790	{../progs/app04.scala}}
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	791
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	792	\noindent Let us now look at the purpose of records more
326 94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	793	closely and let us return to our question whether the string
288 39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	794	terminated in a $b$ or $c$. We can do this as follows: we
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	795	annotate the regular expression $ab + ac$ with a record
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	796	as follows
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	797
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	798	\begin{center}
326 94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	799	$a\cdot (x:b) + a\cdot (x:c)$
288 39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	800	\end{center}
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	801
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	802	\noindent This regular expression can still only recognise
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	803	the strings $ab$ and $ac$, but we can now use a function
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	804	that takes a value and returns all records. I call this
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	805	function \emph{env} for environment\ldots it builds a list
326 94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	806	of identifiers associated with a string. This function
288 39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	807	can be defined as follows:
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	808
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	809	\begin{center}
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	810	\begin{tabular}{lcl}
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	811	$env(Empty)$ & $\dn$ & $[]$\\
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	812	$env(Char(c))$ & $\dn$ & $[]$\\
400 e4afe3f46c29 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 394 diff changeset	813	$env(\Left(v))$ & $\dn$ & $env(v)$\\
288 39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	814	$env(Right(v))$ & $\dn$ & $env(v)$\\
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	815	$env(Seq(v_1,v_2))$& $\dn$ & $env(v_1) \,@\, env(v_2)$\\
422 5deefcc8cffa updated programs Christian Urban <urbanc@in.tum.de> parents: 417 diff changeset	816	$env(Stars\,[v_1,\ldots ,v_n])$ & $\dn$ &
288 39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	817	$env(v_1) \,@\ldots @\, env(v_n)$\\
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	818	$env(Rec(x:v))$ & $\dn$ & $(x:\|v\|) :: env(v)$\\
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	819	\end{tabular}
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	820	\end{center}
251 5b5a68df6d16 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	821
326 94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	822	\noindent where in the last clause we use the flatten function
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	823	defined earlier. As can be seen, the function $env$ ``picks''
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	824	out all underlying strings where a record is given. Since
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	825	there can be more than one, the environment will potentially
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	826	contain many ``records''. If we now postprocess the value
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	827	calculated by $lex$ extracting all records using $env$, we can
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	828	answer the question whether the last element in the string was
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	829	an $b$ or a $c$. Lets see this in action: if we use $a\cdot b
94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	830	+ a\cdot c$ and $ac$ the calculated value will be
288 39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	831
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	832	\begin{center}
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	833	$Right(Seq(Char(a), Char(c)))$
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	834	\end{center}
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	835
326 94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	836	\noindent If we use instead $a\cdot (x:b) + a\cdot (x:c)$ and
288 39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	837	use the $env$ function to extract the recording for
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	838	$x$ we obtain
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	839
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	840	\begin{center}
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	841	$[(x:c)]$
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	842	\end{center}
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	843
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	844	\noindent If we had given the string $ab$ instead, then the
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	845	record would have been $[(x:b)]$. The fun starts if we
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	846	iterate this. Consider the regular expression
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	847
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	848	\begin{center}
326 94700593a2d5 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 319 diff changeset	849	$(a\cdot (x:b) + a\cdot (y:c))^*$
288 39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	850	\end{center}
251 5b5a68df6d16 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	851
288 39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	852	\noindent and the string $ababacabacab$. This string is
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	853	clearly matched by the regular expression, but we are only
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	854	interested in the sequence of $b$s and $c$s. Using $env$
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	855	we obtain
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	856
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	857	\begin{center}
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	858	$[(x:b), (x:b), (y:c), (x:b), (y:c), (x:b)]$
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	859	\end{center}
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	860
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	861	\noindent While this feature might look silly, it is in fact
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	862	quite useful. For example if we want to match the name of
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	863	an email we might use the regular expression
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	864
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	865	\[
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	866	(name: [a\mbox{-}z0\mbox{-}9\_\!\_\,.-]^+)\cdot @\cdot
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	867	(domain: [a\mbox{-}z0\mbox{-}9\,.-]^+)\cdot .\cdot
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	868	(top\_level: [a\mbox{-}z\,.]^{\{2,6\}})
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	869	\]
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	870
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	871	\noindent Then if we match the email address
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	872
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	873	\[
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	874	\texttt{christian.urban@kcl.ac.uk}
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	875	\]
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	876
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	877	\noindent we can use the $env$ function and find out
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	878	what the name, domain and top-level part of the email
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	879	address are:
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	880
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	881	\begin{center}
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	882	$[(name:\texttt{christian.urban}),
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	883	(domain:\texttt{kcl}),
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	884	(top\_level:\texttt{ac.uk})]$
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	885	\end{center}
39aeca14af8c updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 287 diff changeset	886
721 e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	887	Recall that we want to lex a little programming language, called the
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	888	\emph{While}-language. A simple program in this language is shown in
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	889	Figure~\ref{while}. The main keywords in this language are
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	890	\pcode{while}, \pcode{if}, \pcode{then} and
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	891	\pcode{else}.\footnote{Contrast this with the COBOL programming
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	892	language, which was developed around 1960 and thought to be dead for
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	893	many decades---even your friendly lecturer is not old enough to have
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	894	been taught this language. Anyway, this language had over 600
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	895	keywords (or what they called \emph{reserved words}). Interestingly
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	896	though this language is still used in 2020: during the height of
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	897	Corona crisis the State of New Jewrsey in the US was looking for
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	898	COBOL programers who could fix the state's national insurance
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	899	webpage. You were probably paid in gold and diamonds, if you were
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	900	able to program in COBOL. If you fixed their webpage, surely you
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	901	were allowed to marry the governer's son/daughter. } As usual we
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	902	have syntactic categories for identifiers, operators, numbers and so
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	903	on. For this we would need to design the corresponding regular
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	904	expressions to recognise these syntactic categories. I let you do this
e3c64f22dd31 added slides from Rochester Christian Urban <christian.urban@kcl.ac.uk> parents: 720 diff changeset	905	design task. Having these regular expressions at our disposal, we can
357 603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	906	form the regular expression
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	907
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	908	\begin{figure}[t]
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	909	\begin{center}
778 3e5f5d19f514 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 721 diff changeset	910	\mbox{\lstinputlisting[language=while]{../progs/while-tests/fib.while}}
357 603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	911	\end{center}
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	912	\caption{The Fibonacci program in the While-language.\label{while}}
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	913	\end{figure}
356 d9c784c71305 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 352 diff changeset	914
d9c784c71305 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 352 diff changeset	915	\begin{center}
357 603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	916	$\textit{WhileRegs} \;\dn\;
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	917	\left(\begin{array}{l}
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	918	(k, KeyWords)\; +\\
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	919	(i, Ids)\;+\\
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	920	(o, Ops)\;+ \\
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	921	(n, Nums)\;+ \\
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	922	(s, Semis)\;+ \\
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	923	(p, (LParens + RParens))\;+\\
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	924	(b, (Begin + End))\;+ \\
551 bd551ede2be6 updated Christian Urban <urbanc@in.tum.de> parents: 520 diff changeset	925	(w, WhiteSpaces)
394 2f9fe225ecc8 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 357 diff changeset	926	\end{array}\right)^{\mbox{\LARGE{}*}}$
357 603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	927	\end{center}
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	928
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	929	\noindent and ask the algorithm by Sulzmann \& Lu to lex, say
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	930	the following string
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	931
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	932	\begin{center}\large
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	933	\code{"if true then then 42 else +"}
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	934	\end{center}
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	935
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	936	\noindent
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	937	By using the records and extracting the environment, the
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	938	result is the following list:
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	939
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	940	\begin{center}\tt
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	941	\begin{tabular}{l}
643 08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	942	(k, if),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	943	(w, " "),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	944	(i, true),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	945	(w, " "),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	946	(k, then),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	947	(w, " "),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	948	(k, then),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	949	(w, " "),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	950	(n, 42),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	951	(w, " "),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	952	(k, else),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	953	(w, " "),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	954	(o, +)
356 d9c784c71305 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 352 diff changeset	955	\end{tabular}
d9c784c71305 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 352 diff changeset	956	\end{center}
d9c784c71305 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 352 diff changeset	957
357 603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	958	\noindent Typically we are not interested in the whitespaces
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	959	and comments and would filter them out: this gives
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	960
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	961	\begin{center}\tt
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	962	\begin{tabular}{l}
643 08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	963	(k, if),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	964	(i, true),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	965	(k, then),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	966	(k, then),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	967	(n, 42),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	968	(k, else),\\
08375ca3874e updated Christian Urban <urbanc@in.tum.de> parents: 596 diff changeset	969	(o, +)
357 603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	970	\end{tabular}
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	971	\end{center}
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	972
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	973	\noindent
603e171a7b48 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 356 diff changeset	974	which will be the input for the next phase of our compiler.
356 d9c784c71305 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 352 diff changeset	975
251 5b5a68df6d16 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	976	\end{document}
5b5a68df6d16 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	977
5b5a68df6d16 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	978	%%% Local Variables:
5b5a68df6d16 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	979	%%% mode: latex
5b5a68df6d16 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	980	%%% TeX-master: t
5b5a68df6d16 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	981	%%% End:

author	Christian Urban <christian.urban@kcl.ac.uk>
	Sun, 11 Oct 2020 09:10:08 +0100 (2020-10-11)
changeset 778	3e5f5d19f514
parent 721	e3c64f22dd31
child 792	34132a854d03
permissions	-rw-r--r--