I have been rewriting my own regex in C just for fun, since I found that python's implementation is not that fast, after reading following article.
<Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)
I only implemented python's findall function. Actually, its relatively fast, but I am just doing it for fun.
Test string Length : 7200000 (7.2M Characters)
Sample : abbbcdexfbeczexczczkefanncdehbzzdezf * 200000
Computer : Mobile Intel Celeron 1.6MHz, 512 MB
1. a bit complicated regex, mine is 2.5x faster than python's re
regex pattern: [a]?[a-b]?[a]?[bezn][a-z0-9_]{1,3}?[e-k][^dsag]f?
my regex:
0.960999965668 s1000000 matches [u'abbbcdexf', u'beczex', u'zczkef', u'anncdeh', u'bzzdezf', u'abbbcdexf', u'beczex'] .. [u'zczkef', u'anncde
python re:
2.3029999733 s1000000 matches [u'abbbcdexf', u'beczex', u'zczkef', u'anncdeh', u'bzzdezf', u'abbbcdexf', u'beczex'] .. [u'zczkef', u'anncde
2. simple regex, 5x faster than python's re
regex pattern: [a-z]
my regex:
1.83299994469 s7200000 matches [u'a', u'b', u'b', u'b', u'c', u'd', u'e'] .. [u'e', u'z', u'f']
python re:
9.35300016403 s7200000 matches [u'a', u'b', u'b', u'b', u'c', u'd', u'e'] .. [u'e', u'z', u'f']
3. non-greedy matching, 4x faster than python's re
regex pattern: [a-z]{2,5}?
my regex:
1.40199995041 s3600000 matches [u'ab', u'bb', u'cd', u'ex', u'fb', u'ec', u'ze'] .. [u'zz', u'de', u'zf']
python re:
5.8789999485 s3600000 matches [u'ab', u'bb', u'cd', u'ex', u'fb', u'ec', u'ze'] .. [u'zz', u'de', u'zf']
4. fix range matching, 4x faster than python's re
regex pattern: [a-z]{5}
my regex:
0.690999984741 s1440000 matches [u'abbbc', u'dexfb', u'eczex', u'czczk', u'efann', u'cdehb', u'zzdez'] .. [u'fannc', u'dehbz', u'zdezf']
python re:
2.66400003433 s1440000 matches [u'abbbc', u'dexfb', u'eczex', u'czczk', u'efann', u'cdehb', u'zzdez'] .. [u'fannc', u'dehbz', u'zdezf']
Cheers,
Mark