|
1 Title: |
|
2 ====== |
|
3 POSIX Lexing with Derivatives of Regular Expressions |
|
4 |
|
5 |
|
6 Authors: |
|
7 ======== |
|
8 Fahad Ausaf <fahad.ausaf at icloud.com>, 2016 |
|
9 Roy Dyckhoff <roy.dyckhoff at st-andrews.ac.uk>, 2016 |
|
10 Christian Urban <christian.urban at kcl.ac.uk>, 2016 |
|
11 |
|
12 |
|
13 Abstract: |
|
14 ========= |
|
15 |
|
16 Brzozowski introduced the notion of derivatives for regular |
|
17 expressions. They can be used for a very simple regular expression |
|
18 matching algorithm. Sulzmann and Lu cleverly extended this algorithm |
|
19 in order to deal with POSIX matching, which is the underlying |
|
20 disambiguation strategy for regular expressions needed in |
|
21 lexers. Sulzmann and Lu have made available on-line what they call a |
|
22 ``rigorous proof'' of the correctness of their algorithm w.r.t. their |
|
23 specification; regrettably, it appears to us to have unfillable |
|
24 gaps. In the first part of this paper we give our inductive definition |
|
25 of what a POSIX value is and show (i) that such a value is unique (for |
|
26 given regular expression and string being matched) and (ii) that |
|
27 Sulzmann and Lu's algorithm always generates such a value (provided |
|
28 that the regular expression matches the string). We also prove the |
|
29 correctness of an optimised version of the POSIX matching |
|
30 algorithm. Our definitions and proof are much simpler than those by |
|
31 Sulzmann and Lu and can be easily formalised in Isabelle/HOL. In the |
|
32 second part we analyse the correctness argument by Sulzmann and Lu and |
|
33 explain why the gaps in this argument cannot be filled easily. |
|
34 |
|
35 |
|
36 New Theories: |
|
37 ============= |
|
38 |
|
39 Lexer.thy |
|
40 Simplifying.thy |
|
41 |
|
42 The repository can be checked using Isabelle 2016. |
|
43 |
|
44 isabelle build -c -v -d . Posix-Lexing |
|
45 |
|
46 |
|
47 |
|
48 |
|
49 |
|
50 |
|
51 |
|
52 |
|
53 |