Hi, Line!
On Feb 29, Line Bie Pedersen wrote:
I have some questions related to this project. Foremost, of course: what kind of enhancement are you looking for? Are you looking for an update of the existing code? I looked at the existing code and it seems to be the old regex library written by Henry Spencer. I did not look into it in detail, so I have no clear idea of the changes needed to bring it up to speed. Or are you looking to replace it with an existing library? Or perhaps a rewrite suited to your needs? I can't see whether you use the regex library for anything but deciding simple acceptance, but this would probably be a big factor in deciding. I would be happy to look into all three suggestions, or even a fourth, if you prefer something else.
There are two answers to that.
The first one, on the level of requirements: I am very much looking for multi-byte support in the regex library. A REGEX_REPLACE() function would be nice too - users ask for it quite often.
How will the REGEX_REPLACE() function work?
Similar to any other regex replace. Something like REGEX_REPLACE(orig_str, regex_str, replace_str, [flags]). For example, REGEX_REPLACE('3.1415926', '(1).', '\\1_', 'g') would return 3.1_1_926
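To make the intended semantics concrete, here is a minimal sketch in Python. It is only an illustration, not server code: the function name regex_replace and the handling of the flags string are assumptions, and only the 'g' (replace all matches) flag from the example above is modelled. Note that the SQL literal '\\1_' reaches the function as \1_, which is why a raw string is used below.

```python
import re


def regex_replace(orig_str, regex_str, replace_str, flags=""):
    """Hypothetical stand-in for the proposed REGEX_REPLACE().

    Assumption: without 'g' only the first match is replaced;
    with 'g' every non-overlapping match is replaced.
    """
    # re.sub treats count=0 as "replace all", count=1 as "first only".
    count = 0 if "g" in flags else 1
    return re.sub(regex_str, replace_str, orig_str, count=count)


if __name__ == "__main__":
    # '(1).' matches a '1' plus the character after it; '\1_' keeps the
    # captured '1' and replaces the following character with '_'.
    print(regex_replace("3.1415926", "(1).", r"\1_", "g"))  # 3.1_1_926
    print(regex_replace("3.1415926", "(1).", r"\1_"))       # 3.1_15926
```

With 'g', both "14" and "15" are rewritten; without it, only the first match "14" is.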
The second answer - I thought that the simple way of getting multi-byte support would be to remove the Henry Spencer library and put in some modern regex implementation instead. That could be relatively easy to do for any undergraduate student with a little experience - this task was one of the ideas we had for Google Summer of Code.
Agree, this is the simple solution. Did you already have a library picked out? Are there any requirements to license and language? Would it be useful if you could limit the amount of memory the library can use?
No, I didn't pick a library. I would ask a student to compile a list of libraries and suggest which one we should use. If I were doing it myself, I'd start by looking at PCRE, but I didn't "pick" it, because I haven't really considered the alternatives yet.
On the license: GPL will not do. LGPL is fine; BSD and MIT are fine too. Language: C or C++. And it should allow concurrent multi-threaded use.
On limiting memory: yes, it would be a very useful feature. Otherwise a low-privileged user may consume an arbitrary amount of memory and DoS the system.
Now, with your experience, you may prefer to do something else - not replace the library, but rewrite it, or extend it, whatever. I'm fine either way. It's only important that the result works with multi-byte character sets (ideally, with our character set code), and that it can support the Henry Spencer regex syntax.
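To illustrate what "works with multi-byte character sets" means in practice, here is a small Python sketch. It does not use the server's character set code; it only contrasts a matcher that sees raw bytes with one that sees characters, using UTF-8 as an assumed example encoding. The helper names are hypothetical.

```python
import re


def dot_matches_as_bytes(text, encoding="utf-8"):
    """Byte-oriented matching: '.' consumes exactly one byte."""
    raw = text.encode(encoding)
    return re.fullmatch(rb".", raw) is not None


def dot_matches_as_chars(text):
    """Character-oriented matching: '.' consumes one whole character."""
    return re.fullmatch(r".", text) is not None


if __name__ == "__main__":
    # 'é' is one character but two bytes (0xC3 0xA9) in UTF-8, so a
    # byte-oriented '.' cannot match the whole string.
    print(dot_matches_as_bytes("é"))   # False
    print(dot_matches_as_chars("é"))   # True
    # A single-byte character behaves the same either way.
    print(dot_matches_as_bytes("e"))   # True
```

A byte-oriented engine gets the same wrong answer for character classes, repetition counts, and case folding on any multi-byte character, which is why the engine has to be aware of the character set.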
What I would really like to do is rewrite the thing! However, I do not have an unlimited amount of time for this project. I'll look into Henry Spencer's old library and let you know what I decide to do.
Great! Thank you very much!
Regards,
Sergei