I have been rewriting my own regex in C just for fun, since I found that python's implementation is not that fast, after reading following article.
<Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)
I only implemented python's findall function. Actually, its relatively fast, but I am just doing it for fun.
Test string Length : 7200000 (7.2M Characters)
Sample : abbbcdexfbeczexczczkefanncdehbzzdezf * 200000
Computer : Mobile Intel Celeron 1.6MHz, 512 MB
1. a bit complicated regex, mine is 2.5x faster than python's re
my regex: 0.960999965668 s
1000000 matches [u'abbbcdexf', u'beczex', u'zczkef', u'anncdeh', u'bzzdezf', u'abbbcdexf', u'beczex'] .. [u'zczkef', u'anncde
python re: 2.3029999733 s
1000000 matches [u'abbbcdexf', u'beczex', u'zczkef', u'anncdeh', u'bzzdezf', u'abbbcdexf', u'beczex'] .. [u'zczkef', u'anncde
2. simple regex, 5x faster than python's re
my regex: 1.83299994469 s
7200000 matches [u'a', u'b', u'b', u'b', u'c', u'd', u'e'] .. [u'e', u'z', u'f']
python re: 9.35300016403 s
7200000 matches [u'a', u'b', u'b', u'b', u'c', u'd', u'e'] .. [u'e', u'z', u'f']
3. non-greedy matching, 4x faster than python's re
my regex: 1.40199995041 s
3600000 matches [u'ab', u'bb', u'cd', u'ex', u'fb', u'ec', u'ze'] .. [u'zz', u'de', u'zf']
python re: 5.8789999485 s
3600000 matches [u'ab', u'bb', u'cd', u'ex', u'fb', u'ec', u'ze'] .. [u'zz', u'de', u'zf']
4. fix range matching, 4x faster than python's re
my regex: 0.690999984741 s
1440000 matches [u'abbbc', u'dexfb', u'eczex', u'czczk', u'efann', u'cdehb', u'zzdez'] .. [u'fannc', u'dehbz', u'zdezf']
python re: 2.66400003433 s
1440000 matches [u'abbbc', u'dexfb', u'eczex', u'czczk', u'efann', u'cdehb', u'zzdez'] .. [u'fannc', u'dehbz', u'zdezf']
Cheers,
Mark
2 comments:
Appraently, Google Go's implementation http://golang.org/pkg/regexp/ is worse than Python's. http://groups.google.com/group/golang-nuts/browse_thread/thread/f436d16dee9c8875?pli=1
So go and write a good one in Go and submit it for merging. Gain geek credits :P
Yeah bro, I knew that, but I am not going to write full feature regex engine, so may be they dont need that.
Post a Comment