209 This work is about regular expressions and derivatives. It combines functional algorithms and their formal verification in the Isabelle/HOL theorem prover. |
209 This work is about regular expressions and derivatives. It combines functional algorithms and their formal verification in the Isabelle/HOL theorem prover. |
210 |
210 |
211 Theoretical results say that regular expression matching |
211 Theoretical results say that regular expression matching |
212 should be linear with respect to the input. |
212 should be linear with respect to the input. |
213 Under a certain class of regular expressions and inputs though, |
213 Under a certain class of regular expressions and inputs though, |
214 practical implementations suffer from non-linear or even |
214 practical implementations often suffer from non-linear or even |
215 exponential running time, |
215 exponential running time, |
216 allowing a ReDoS (regular expression denial-of-service ) attack. |
216 allowing ReDoS (regular expression denial-of-service ) attacks. |
217 |
217 The reason behind this is that regex libraries in popular programming |
218 |
218 languages often want to support richer constructs |
219 The reason behind is that regex libraries in popular language engines |
219 than the usual basic regular expressions. Unfortunately, this means that even simple cases |
220 often want to support richer constructs |
220 are "infected" with atrocious running time. |
221 than the most basic regular expressions, and lexing rather than matching |
221 |
222 is needed for sub-match extraction. |
222 |
223 |
223 This work aims to address the above vulnerability by a combination |
224 This work aims to address the above vulnerability by the combination |
|
225 of Brzozowski's derivatives and interactive theorem proving. We give an |
224 of Brzozowski's derivatives and interactive theorem proving. We give an |
226 improved version of Sulzmann and Lu's bit-coded algorithm using |
225 improved version of Sulzmann and Lu's bit-coded algorithm using |
227 derivatives, which come with a formal guarantee in terms of correctness and |
226 derivatives, and give a formal guarantee in terms of correctness and |
228 running time as an Isabelle/HOL proof. |
227 size of derivatives, which we formalised in Isabelle/HOL. |
229 Then we improve the algorithm with an even stronger version of |
228 Then we improve the algorithm with a stronger version of |
230 simplification, and prove a time bound linear to input and |
229 simplification, and prove a time bound linear to input and |
231 cubic to regular expression size using a technique by |
230 cubic to regular expression size using a technique introduced by |
232 Antimirov. |
231 Antimirov. |
233 The result is an algorithm with provable guarantees on |
232 The result is an algorithm with provable guarantees on |
234 correctness and running time. We believe this is the first |
233 correctness and running time. We believe this is the first |
235 work with these two guarantees together. |
234 work with these two guarantees together. |
236 |
235 |