Re: [Maria-developers] [Gsoc] Regex enhancements Project
Hi, Sudheera!
On Apr 19, Sudheera Palihakkara wrote:
Hi, I went through other threads on this topic. In one thread you mentioned choosing a suitable regex library.
(Preliminary research - only about choosing a regex library to use in MariaDB. You should be able to explain why we should use this library and not some other one.)
What do you mean by "choosing"? Don't we have to enhance the existing regex library? Or choose from existing, already implemented libraries which are free to use? Sorry if it's a stupid question, but I'm confused. :O
Enhancing our old regex library to support all modern features and multiple charsets is complex and bug-prone work.
I don't see why we should bother doing it when there are plenty of regex libraries available.
There's PHP's mb_regex, there's PCRE, and many others too. We'd better just pick one that works better for MariaDB and put it in place of Henry Spencer's library.
Regards, Sergei
P.S. Please don't reply to me only; use reply-to-all, so that your mails appear on the mailing list.
Hello Sir,
I've been working on this project for the past couple of days. I found a few good regex libraries that are suitable for this task. Considering the requirements, I think PCRE, ICU regex and RGX would do the job. ICU regex doesn't support recursion, but its code is well documented and easy to understand. Currently I think PCRE is the best option we have.
In case I missed some libraries, I guess you will point me to them so I can study them too. Considering the requirements, I didn't see Asian multi-byte support implemented anywhere. What should we do about that?
On the google-melange page, the application template has a field called "Project description". What should I include there? I mean, do you expect a full description of the project, including figures, or just a brief one like on the project ideas page?
Thank you.
On Fri, Apr 19, 2013 at 3:46 PM, Sergei Golubchik <serg@askmonty.org> wrote:
-- Sudheera Palihakkara. Undergraduate, Department of Computer Science and Engineering, Faculty of Engineering, University of Moratuwa, Sri Lanka.
Hi Sudheera and Sergei,
In case I missed some libraries, I guess you will point me to them so I can study them too. Considering the requirements, I didn't see Asian multi-byte support implemented anywhere. What should we do about that?
Do you know "oniguruma"?
http://www.geocities.jp/kosako3/oniguruma/
http://en.wikipedia.org/wiki/Oniguruma
Oniguruma is a regular expression library that supports multi-byte character sets such as big5, euc-kr and shift_jis. Oniguruma is used by "mregexp", a regex UDF for MySQL with multi-byte support, so I think you can easily see how to use it from there.
Thanks, Kentoku
2013/4/20 Sudheera Palihakkara <catchsudheera@gmail.com>
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
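To make the multi-byte question concrete, here is a minimal sketch that uses Python's standard `re` module as a stand-in; it is not Oniguruma's API and not MariaDB code, and the helper names `byte_search` and `char_search` are hypothetical. A byte-oriented engine treats every byte as a character, so the trailing byte of a double-byte Shift_JIS character can produce a false match, which is roughly the problem an encoding-aware library is meant to avoid.

```python
import re
from typing import Optional


def byte_search(pattern: bytes, data: bytes) -> Optional[int]:
    """Byte-oriented search: the engine knows nothing about the charset."""
    m = re.search(pattern, data)
    return None if m is None else m.start()


def char_search(pattern: str, data: bytes, encoding: str) -> Optional[int]:
    """Charset-aware search: decode first, so each multi-byte character is one unit.

    The returned offset counts characters, not bytes.
    """
    m = re.search(pattern, data.decode(encoding))
    return None if m is None else m.start()


# Katakana "ア" is two bytes in Shift_JIS (0x83 0x41); the trailing byte 0x41 is ASCII "A".
sjis = "ア".encode("shift_jis")

print(len(sjis))                             # 2
print(byte_search(rb"A", sjis))              # 1: false match inside the character
print(char_search(r"A", sjis, "shift_jis"))  # None: the text contains no "A"
# "." sees two units in the raw bytes but one character after decoding.
print(len(re.findall(rb".", sjis, re.DOTALL)))                     # 2
print(len(re.findall(r".", sjis.decode("shift_jis"), re.DOTALL)))  # 1
```

Decoding first is just the simplest way to get character semantics in this sketch; a library that understands the charset itself would work on the encoded bytes directly.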
Hi Kentoku,
Thank you, I will surely study Oniguruma! :)
On Sat, Apr 20, 2013 at 11:26 PM, kentoku <kentokushiba@gmail.com> wrote:
Hi Kentoku,
Oniguruma is looking great. But I can't find out whether the following features are implemented in Oniguruma (even the Wikipedia page doesn't have that information). Any idea where I can find out?
* look-aheads/look-behinds
* non-greedy modifiers
* recursion
Thanks.
On Sun, Apr 21, 2013 at 12:00 PM, Sudheera Palihakkara <catchsudheera@gmail.com> wrote:
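For reference, two of the three features asked about can be illustrated with Python's standard `re` module, used here only as a stand-in (this is not Oniguruma or PCRE, and the sample strings are made up). Python's `re` has no recursion, so that feature is not shown; PCRE, for example, spells a recursive call to the whole pattern as `(?R)`, which is what allows matching balanced brackets.

```python
import re

text = "cost $42, qty 3, tax $7;"

# Look-ahead: digits that are followed by "," (the "," itself is not consumed).
ahead = re.findall(r"\d+(?=,)", text)

# Look-behind: digits that are preceded by "$" (Python requires a fixed-width look-behind).
behind = re.findall(r"(?<=\$)\d+", text)

html = "<b>bold</b> and <i>italic</i>"

# Greedy ".*" extends to the last ">" in the string...
greedy = re.findall(r"<.*>", html)
# ...while the non-greedy ".*?" stops at the first ">" it reaches.
lazy = re.findall(r"<.*?>", html)

print(ahead)   # ['42', '3']
print(behind)  # ['42', '7']
print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```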
Hi Sergei,
Has this project already started?
Thanks, Kentoku
2013/4/21 Sudheera Palihakkara <catchsudheera@gmail.com>
Hi, Kentoku!
On Apr 22, kentoku wrote:
Hi Sergei,
Has this project already started?
No, it's a Google Summer of Code project. The work basically starts in June. For now, students only look at the project ideas of the different mentoring organizations and decide what they want to work on over the summer.
Regards, Sergei
Hi Sudheera and Sergei,
Sudheera, how much do you want to prepare for this project before GSOC starts?
Sergei, how can I and other people help with this project after GSOC starts? Is it the same as now?
Thanks, Kentoku
2013/4/21 Sudheera Palihakkara <catchsudheera@gmail.com>
Hi, Kentoku!
On May 05, kentoku wrote:
Hi Sudheera and Sergei,
Sudheera, how much do you want to prepare for this project before GSOC starts?
Sergei, how can I and other people help with this project after GSOC starts? Is it the same as now?
The mentor for this project is Alexander Barkov. I've added him to Cc. And here's the task description: https://mariadb.atlassian.net/browse/MDEV-4425
But let's wait until we know which student proposal (if any) got accepted and who will be working on this project.
Regards, Sergei
Hi Sergei!
The mentor for this project is Alexander Barkov. I've added him to Cc. And here's the task description: https://mariadb.atlassian.net/browse/MDEV-4425
But let's wait until we know which student proposal (if any) got accepted and who will be working on this project.
O.K.
Thanks, Kentoku
2013/5/4 Sergei Golubchik <serg@askmonty.org>
participants (3)
- kentoku
- Sergei Golubchik
- Sudheera Palihakkara