equal
deleted
inserted
replaced
139 sequence of tokens, POSIX is the more natural disambiguation strategy for |
139 sequence of tokens, POSIX is the more natural disambiguation strategy for |
140 what programmers consider basic syntactic building blocks in their programs. |
140 what programmers consider basic syntactic building blocks in their programs. |
141 These building blocks are often specified by some regular expressions, say |
141 These building blocks are often specified by some regular expressions, say |
142 @{text "r\<^bsub>key\<^esub>"} and @{text "r\<^bsub>id\<^esub>"} for recognising keywords and |
142 @{text "r\<^bsub>key\<^esub>"} and @{text "r\<^bsub>id\<^esub>"} for recognising keywords and |
143 identifiers, respectively. There are two underlying (informal) rules behind |
143 identifiers, respectively. There are two underlying (informal) rules behind |
144 tokenising a string in a POSIX fashion: |
144 tokenising a string in a POSIX fashion according to a collection of regular |
|
145 expressions: |
145 |
146 |
146 \begin{itemize} |
147 \begin{itemize} |
147 \item[$\bullet$] \underline{The Longest Match Rule (or ``maximal munch rule''):} |
148 \item[$\bullet$] \underline{The Longest Match Rule (or ``maximal munch rule''):} |
148 |
149 |
149 The longest initial substring matched by any regular expression is taken as |
150 The longest initial substring matched by any regular expression is taken as |
164 token, not a keyword followed by an identifier. For @{text "if"} we obtain by |
165 token, not a keyword followed by an identifier. For @{text "if"} we obtain by |
165 the priority rule a keyword token, not an identifier token---even if @{text |
166 the priority rule a keyword token, not an identifier token---even if @{text |
166 "r\<^bsub>id\<^esub>"} matches also.\bigskip |
167 "r\<^bsub>id\<^esub>"} matches also.\bigskip |
167 |
168 |
168 \noindent {\bf Contributions:} We have implemented in Isabelle/HOL the |
169 \noindent {\bf Contributions:} We have implemented in Isabelle/HOL the |
169 derivative-based regular expression matching algorithm as described by |
170 derivative-based regular expression matching algorithm of |
170 Sulzmann and Lu \cite{Sulzmann2014}. We have proved the correctness of this |
171 Sulzmann and Lu \cite{Sulzmann2014}. We have proved the correctness of this |
171 algorithm according to our specification of what a POSIX value is. Sulzmann |
172 algorithm according to our specification of what a POSIX value is. Sulzmann |
172 and Lu sketch in \cite{Sulzmann2014} an informal correctness proof: but to |
173 and Lu sketch in \cite{Sulzmann2014} an informal correctness proof: but to |
173 us it contains unfillable gaps.\footnote{An extended version of |
174 us it contains unfillable gaps.\footnote{An extended version of |
174 \cite{Sulzmann2014} is available at the website of its first author; this |
175 \cite{Sulzmann2014} is available at the website of its first author; this |
1133 implemented easily in functional languages. A bespoke lexer for the |
1134 implemented easily in functional languages. A bespoke lexer for the |
1134 Imp-Language is formalised in Coq as part of the Software Foundations book |
1135 Imp-Language is formalised in Coq as part of the Software Foundations book |
1135 \cite{Pierce2015}. The disadvantage of such bespoke lexers is that they |
1136 \cite{Pierce2015}. The disadvantage of such bespoke lexers is that they |
1136 do not generalise easily to more advanced features. |
1137 do not generalise easily to more advanced features. |
1137 Our formalisation is available from |
1138 Our formalisation is available from |
1138 |
1139 \url{http://www.inf.kcl.ac.uk/staff/urbanc/lex}. |
1139 \begin{center} |
|
1140 \url{http://www.inf.kcl.ac.uk/staff/urbanc/lex} |
|
1141 \end{center} |
|
1142 |
1140 |
1143 %\noindent |
1141 %\noindent |
1144 %{\bf Acknowledgements:} |
1142 %{\bf Acknowledgements:} |
1145 %We are grateful for the comments we received from anonymous |
1143 %We are grateful for the comments we received from anonymous |
1146 %referees. |
1144 %referees. |