# HG changeset patch # User Chengsong # Date 1562102667 -3600 # Node ID 7d18745dd7c9d9e28c44c387e5c051cdc44c3e19 # Parent b5363c0dcd99d27a24b89180c0ca7dd5e6e51544 changes diff -r b5363c0dcd99 -r 7d18745dd7c9 ninems/.DS_Store Binary file ninems/.DS_Store has changed diff -r b5363c0dcd99 -r 7d18745dd7c9 ninems/ninems.tex --- a/ninems/ninems.tex Tue Jul 02 14:01:42 2019 +0100 +++ b/ninems/ninems.tex Tue Jul 02 22:24:27 2019 +0100 @@ -338,9 +338,9 @@ \] This algorithm can be illustrated as follows: \begin{tikzcd}\label{graph:*} -r_0 \arrow[r, "\backslash c_0"] & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed] & r_n +r_0 \arrow[r, "\backslash c_0"] & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed] & r_n \arrow[r,"nullable?"] & Yes/No \end{tikzcd} - +where we bnuild the successive derivative until we exhaust the string. One limitation, however, of Brzozowski's algorithm is that it only produces a YES/NO answer for whether a string is being matched by a @@ -375,10 +375,11 @@ \end{center} \noindent - Here we put the regular expression and values of the same shape on the same level to illustrate the corresponding relation between them. - - The flatten notation $| v |$ means extracting the characters in the value $v$ to form a string. For example, $|\mathit{Seq} \, \mathit{Char(c)} \, \mathit{Char(d)}|$ = $cd$. We omit this straightforward definition. - Values are a way of expressing parse trees(the tree structure that tells how a sub-regex matches a substring). For example, $\Seq\,v_1\, v_2$ tells us how the string $|v_1| \cdot |v_2|$ matches the regex $r_1 \cdot r_2$: $r_1$ matches $|v_1|$ and $r_2$ matches $|v_2|$. Exactly how these two are matched are contained in the sub-structure of $v_1$ and $v_2$. +Values and regular expressions correspond to each other as illustrated by placing corresponding values next to the regular expressions. +The idea of values is to express parse trees. +Suppose by using a flatten operation, written $|v|$, we can extract the underlying string of v. +For example, $|\mathit{Seq} \, \mathit{Char(c)} \, \mathit{Char(d)}|$ = $cd$. We omit this straightforward definition. +Using this flatten notation, we now elaborate how values can express parse trees. $\Seq\,v_1\, v_2$ tells us how the string $|v_1| \cdot |v_2|$ matches the regex $r_1 \cdot r_2$: $r_1$ matches $|v_1|$ and respectively $r_2$ matches $|v_2|$. Exactly how these two are matched are contained in the sub-structure of $v_1$ and $v_2$. To give a concrete example of how value works, consider the string $xy$ and the regular expression $(x + (y + xy))^*$. We can view this regular @@ -410,14 +411,11 @@ The contribution of Sulzmann and Lu is an extension of Brzozowski's algorithm by a second phase (the first phase being building successive derivatives). In this second phase, for every successful match the -corresponding POSIX value is computed. The whole process looks like the following diagram(the working flow of the simple matching algorithm that just gives a $YES/NO$ answer is given before \ref{graph:*}):\\ +corresponding POSIX value is computed. Pictorially, the Sulzmann and Lu algorithm looks like the following diagram(the working flow of the simple matching algorithm that just gives a $YES/NO$ answer is given before \ref{graph:*}):\\ \begin{tikzcd} r_0 \arrow[r, "\backslash c_0"] \arrow[d] & r_1 \arrow[r, "\backslash c_1"] \arrow[d] & r_2 \arrow[r, dashed] \arrow[d] & r_n \arrow[d, "mkeps" description] \\ v_0 & v_1 \arrow[l,"inj_{r_0} c_0"] & v_2 \arrow[l, "inj_{r_1} c_1"] & v_n \arrow[l, dashed] \end{tikzcd} -\begin{tikzcd} -r_0 \arrow[r, "\backslash c_0"] & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed] & r_n -\end{tikzcd} We shall briefly explain this interesting process.\\ For the convenience of explanation, we have the following notations: the regular expression $r$ used for matching is also called $r_0$ and the string $s$ is composed of $n$ characters $c_0 c_1 ... c_{n-1}$.