Hi Sudheera and Sergei,

> In case I missed some libraries, I guess you will point me to them so I can study them too. Considering the requirements, I didn't see Asian multi-byte support implemented anywhere. What should we do about that?
Do you know "Oniguruma"? Oniguruma is a regular expression library that supports multi-byte character sets such as Big5, EUC-KR and Shift_JIS. Oniguruma is used by "mregexp", a multi-byte-aware regex UDF for MySQL, so I think mregexp will make it easy for you to see how to use it.
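To illustrate why multi-byte awareness matters, here is a small sketch in Python, used only for illustration (it does not use Oniguruma or mregexp). It assumes a matcher that scans raw bytes rather than characters. In Shift_JIS, the katakana "ソ" (U+30BD) is encoded as 0x83 0x5C, and 0x5C is the ASCII backslash, so a byte-wise search for a backslash falsely matches inside that character, while a character-aware search does not. The helper name `naive_byte_match` is hypothetical.

```python
import re

def naive_byte_match(pattern: bytes, text: str, encoding: str) -> bool:
    """Hypothetical helper: search the encoded bytes of `text`,
    the way a byte-oriented regex engine would see them."""
    return re.search(pattern, text.encode(encoding)) is not None

text = "ソ"  # katakana "so", U+30BD

# Shift_JIS encodes it as 0x83 0x5C; the second byte is ASCII '\'.
print(text.encode("shift_jis"))                     # b'\x83\\'

# Byte-wise search for a backslash: a false positive.
print(naive_byte_match(rb"\\", text, "shift_jis"))  # True

# Character-aware search on the decoded string: no match.
print(re.search(r"\\", text) is not None)           # False
```

A library with real multi-byte support, such as Oniguruma with its Shift_JIS encoding, matches on character boundaries and avoids this kind of false match.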
Thanks,
Kentoku

2013/4/20 Sudheera Palihakkara <catchsudheera@gmail.com>

In the google-melange page, under the application template, there is a field called "Project description". What should I include there? I mean, do you expect a full description of the project, including figures, or just a brief one like on the project ideas page?

In case I missed some libraries, I guess you will point me to them so I can study them too. Considering the requirements, I didn't see Asian multi-byte support implemented anywhere. What should we do about that?

Hello Sir,

I've been working on this project for the past couple of days. I found that there are a few good regex libraries suitable for this task. Considering the requirements, I think PCRE, ICU regex and RGX would do the job. ICU regex doesn't support recursion, but its code is well documented and easy to understand. Currently I think PCRE is the best option we have.
Thank you.

On Fri, Apr 19, 2013 at 3:46 PM, Sergei Golubchik <serg@askmonty.org> wrote:
Hi, Sudheera!
On Apr 19, Sudheera Palihakkara wrote:
> Hi,
> I went through other threads on this topic. In one thread you mentioned
> choosing a suitable regex library:
> *(Preliminary research - only about choosing a regex library to use in
> MariaDB. You should be able to explain why we should use this library and
> not some other one.)*
>
> What do you mean by "choosing"? Don't we have to enhance the existing regex
> library? Or choose from existing, already implemented libraries which are
> free to use? Sorry if it's a stupid question, but I'm confused. :O

Enhancing our old regex library to support all modern features and
multiple charsets is complex and bug-prone work.
I don't see why we should bother doing it, when there are plenty of
regex libraries available.
There's PHP's mb_regex, there's pcre, and many others too. We'd better
just pick the one that works best for MariaDB, and put it in place of
Henry Spencer's library.
Regards,
Sergei
P.S. Please don't reply only to me; use reply-to-all, so that your
mails appear on the mailing list.
--
Sudheera Palihakkara.
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa,
Faculty of Engineering,
Sri Lanka.
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help : https://help.launchpad.net/ListHelp