Re: [Maria-developers] [Gsoc] Regex enhancements Project

Sergei Golubchik

19 Apr 2013

10:16 a.m.

Hi, Sudheera!

On Apr 19, Sudheera Palihakkara wrote:

Hi, I went through other threads on this topic. In one thread you mentioned choosing a suitable regex library.

(Preliminary research - only about choosing a regex library to use in MariaDB. You should be able to explain why we should use this library and not some other one.)

What do you mean by "choosing"? Don't we have to enhance the existing regex library? Or do we choose from existing, already implemented libraries which are free to use? Sorry if it's a stupid question, but I'm confused. :O

Enhancing our old regex library to support all modern features and multiple charsets is complex and bug-prone work. I don't see why we should bother doing it when there are plenty of regex libraries available. There's PHP's mb_regex, there's PCRE, and many others too. We'd better just pick the one that works best for MariaDB and use it in place of Henry Spencer's library.

Regards, Sergei

P.S. Please don't reply to me only; use reply-to-all, so that your mails appear on the mailing list.

Sudheera Palihakkara

20 Apr

7:45 a.m.

Hello Sir,

I've been working on this project for the past couple of days. I found that there are a few good regex libraries suitable for this task. Considering the requirements, I think PCRE, ICU regex and RGX would do the job. ICU regex doesn't have recursion, but it has well-documented, easy-to-understand code. Currently I think PCRE is the best option we have.

In case I missed some libraries, I hope you will point them out so that I can study them too. Considering the requirements, I didn't see Asian multi-byte support implemented anywhere. What would we do about that?

On the google-melange page, under the application template, there is a field called "Project description". What should I include there? I mean, do you expect a full description of the project including figures, or just a brief one like those on the project ideas page?

Thank you.

-- Sudheera Palihakkara. Undergraduate, Department of Computer Science and Engineering, Faculty of Engineering, University of Moratuwa, Sri Lanka.

kentoku

5:56 p.m.

Hi Sudheera and Sergei,

On 2013/4/20, Sudheera Palihakkara <catchsudheera@gmail.com> wrote:

In case I missed some libraries, I hope you will point them out so that I can study them too. Considering the requirements, I didn't see Asian multi-byte support implemented anywhere. What would we do about that?

Do you know "oniguruma"?
http://www.geocities.jp/kosako3/oniguruma/
http://en.wikipedia.org/wiki/Oniguruma

Oniguruma is a regular expression library that supports multi-byte character sets like big5, euc-kr and shift_jis. Oniguruma is used by "mregexp", a regex UDF for MySQL with multi-byte support, so I think you can easily see how to use it.

Thanks, Kentoku

_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help : https://help.launchpad.net/ListHelp
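To illustrate why multi-byte awareness matters for character sets such as shift_jis, here is a minimal sketch. It is not from the thread: Python is used purely for illustration, and it assumes only Python's standard `re` module and its built-in `shift_jis` codec. It shows how a byte-oriented search can report a match inside a multi-byte character, while a character-oriented search does not.

```python
import re

# "表" is a two-byte character in Shift_JIS whose second byte is 0x5C,
# the same value as an ASCII backslash.
text = "表"
raw = text.encode("shift_jis")

# Byte-oriented matching: the pattern is a single literal backslash byte.
# It "finds" a backslash that is really the tail of a multi-byte character.
byte_hit = re.search(rb"\\", raw) is not None

# Character-oriented matching on the decoded string: no backslash is present.
char_hit = re.search(r"\\", raw.decode("shift_jis")) is not None

print(raw.hex(), byte_hit, char_hit)
```

A regex library that understands the character set walks the input character by character, so it avoids this kind of false match without requiring a conversion step first.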

Sudheera Palihakkara

21 Apr

6:30 a.m.

Hi Kentoku,

Thank you, I will certainly study Oniguruma! :)

Sudheera Palihakkara

11:19 a.m.

Hi Kentoku,

Oniguruma looks great, but I can't find out whether the following features are implemented in Oniguruma (even the Wikipedia page doesn't have that information). Any idea where I can find this?

* look-aheads/look-behinds
* non-greedy modifiers
* recursion

Thanks.
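For readers unfamiliar with the three features asked about above, here is a short sketch of what each one means. It is not from the thread and says nothing about Oniguruma itself: it uses Python's standard `re` module only because it is convenient for illustration, and the sample strings are made up. Python's `re` does not support recursion, so that feature is described in a comment using PCRE's `(?R)` syntax rather than executed.

```python
import re

# Look-ahead: match "foo" only when it is followed by "bar".
lookahead = re.findall(r"foo(?=bar)", "foobar foobaz")

# Look-behind: match digits only when they are preceded by "$".
lookbehind = re.findall(r"(?<=\$)\d+", "cost $42, qty 7")

# Greedy vs non-greedy: ".+" takes as much as possible, ".+?" as little.
greedy = re.findall(r"<.+>", "<a><b>")
lazy = re.findall(r"<.+?>", "<a><b>")

# Recursion lets a pattern refer to itself, e.g. in PCRE syntax
#   \((?:[^()]|(?R))*\)
# matches balanced parentheses. Python's re module has no equivalent,
# so it is shown here only as a comment.

print(lookahead, lookbehind, greedy, lazy)
```

Running the same kind of small patterns against each candidate library is one simple way to check which of the listed features it supports.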

kentoku

6:27 p.m.

Hi Sergei,

Is this project already started?

Thanks, Kentoku

Sergei Golubchik

22 Apr

2:41 a.m.

Hi, Kentoku!

On Apr 22, kentoku wrote:

Hi Sergei,

Is this project already started?

No, it's a Google Summer of Code project. The work basically starts in June. For now, students are only looking at the project ideas of different mentoring organizations and deciding what they want to work on over the summer.

Regards, Sergei

kentoku

4 May

3:40 p.m.

Hi Sudheera and Sergei,

Sudheera, how much do you want to prepare for this project before GSoC starts?

Sergei, how can I and other people help with this project after GSoC starts? Will it be the same as now?

Thanks, Kentoku

Sergei Golubchik

3:49 p.m.

Hi, Kentoku!

On May 05, kentoku wrote:

Hi Sudheera and Sergei,

Sudheera, how much do you want to prepare for this project before GSoC starts?

Sergei, how can I and other people help with this project after GSoC starts? Will it be the same as now?

The mentor for this project is Alexander Barkov; I've added him to Cc. Here's the task description: https://mariadb.atlassian.net/browse/MDEV-4425

But let's wait until we know which student proposal (if any) gets accepted and who will be working on this project.

Regards, Sergei

kentoku

4:07 p.m.

Hi Sergei!

On 2013/5/4, Sergei Golubchik <serg@askmonty.org> wrote:

The mentor for this project is Alexander Barkov; I've added him to Cc. Here's the task description: https://mariadb.atlassian.net/browse/MDEV-4425

But let's wait until we know which student proposal (if any) gets accepted and who will be working on this project.

O.K.

Thanks, Kentoku
