thys2/Paper/Paper.thy
changeset 460 6e269f557fc5
parent 459 484403cf0c9d
child 461 c4b6906068a9
equal deleted inserted replaced
459:484403cf0c9d 460:6e269f557fc5
  1420    \cite[Page 14]{Sulzmann2014}. 
  1420    \cite[Page 14]{Sulzmann2014}. 
  1421    Given the growth of the
  1421    Given the growth of the
  1422    derivatives in some cases even after aggressive simplification, this
  1422    derivatives in some cases even after aggressive simplification, this
  1423    is a hard to believe fact. A similar claim about a theoretical runtime
  1423    is a hard to believe fact. A similar claim about a theoretical runtime
  1424    of @{text "O(n\<^sup>2)"} is made for the Verbatim lexer, which calculates
  1424    of @{text "O(n\<^sup>2)"} is made for the Verbatim lexer, which calculates
  1425    POSIX matches and is based on
  1425    tokens according to POSIX rules \cite{verbatim}. For this it uses Brzozowski's
  1426    derivatives \cite{verbatim}. In this case derivatives are not even simplified.
  1426    derivatives . 
  1427    They write: ``For a specific list of lexical rules, Verbatim has quadratic
  1427    They write: ``The results of our empirical tests [..] confirm that Verbatim has
  1428    theoretical time complexity with respect to the length of the input string.''
  1428    $O(n^2)$ time complexity.'' \cite[Section~VII]{verbatim}.
  1429    While their correctness proof for Verbatim is formalised in Coq, the claim about
  1429    While their correctness proof for Verbatim is formalised in Coq, the claim about
  1430    the runtime complexity is only supported by emperical evidence.
  1430    the runtime complexity is only supported by some emperical evidence.
  1431    When we
  1431    In the context of our observation with the ``growth problem'' of derivatives,
  1432    tried out their extracted OCaml code with our example
  1432    we
  1433    \mbox{@{text "(a + aa)\<^sup>*"}}, it took around 5 minutes to tokenise a
  1433    tried out their extracted OCaml code with the example
       
  1434    \mbox{@{text "(a + aa)\<^sup>*"}} as a single lexing rule, and it took for us around 5 minutes to tokenise a
  1434    string of 40 $a$'s and that increased to approximately 19 minutes when the
  1435    string of 40 $a$'s and that increased to approximately 19 minutes when the
  1435    string was 50 $a$'s long. Given that derivatives are not simplified in
  1436    string is 50 $a$'s long. Given that derivatives are not simplified in the Verbatim
  1436    the work on Verbatim, such numbers are not surprising. 
  1437    lexer, such numbers are not surprising. 
  1437    Clearly our result of having finite
  1438    Clearly our result of having finite
  1438    derivatives might sound rather weak in this context but we think such effeciency claims
  1439    derivatives might sound rather weak in this context but we think such effeciency claims
  1439    really require further scrutiny.\medskip
  1440    really require further scrutiny.\medskip
  1440 
  1441 
  1441    \noindent
  1442    \noindent