[Maria-discuss] GSOC 2015 :UDFs returning arrays
Hi My name is Blessing Atie and i would love to participate in the Google summer of code 2015 under Mariadb. I am interested in the project UDF returning arrays. I have build mariadb from source and and i also ran the test suites. I am currently studing the sql/sql.** files to understand UDF's and i also downloaded some ebook on mysql internals and mysql plugin development which im reading while hoping to find a breakthrough on this project in them. I would love to know if there are any bugs related to this task on JIRA I could work on as a means of preparing for the project. Please give me the link. Regards
Hi, MySQL has no ARRAY data type. In array is basicaly a nested table data type and MySQL doesn't support nested tables either, except through dynamic columns. So, a UDF that returned an array could only be used by other UDF that understand arrays. That isn't very useful. I see two options if you want to push this forward: a) If you want this to work with MySQL too (the UDF interface is identical) then pass data back as delimited text. For example, I have a UDF called RAPID_EXTRACT_ALL(..) which extracts all values for a key from a JSON document, and returns the values separated by newline. I have stored routines (or other UDF) that understand newline separated input (essentially arrays). I don't think you can do much better for MySQL at the moment. b) If you want to be compatible with only MariaDB, what you could do is return a BLOB that is properly formatted as a DYNCOL. This way COLUMN_GET, COLUMN_JSON, etc, could work on the blob, and there is a well defined interface for transmitting and accessing the data. Neither really require development on your part though, except to understand how to pack a DYNCOL blob properly. --Justin On Tue, Mar 10, 2015 at 11:28 AM, Delveri chick <chickdelveri@gmail.com> wrote:
Hi My name is Blessing Atie and i would love to participate in the Google summer of code 2015 under Mariadb. I am interested in the project UDF returning arrays. I have build mariadb from source and and i also ran the test suites. I am currently studing the sql/sql.** files to understand UDF's and i also downloaded some ebook on mysql internals and mysql plugin development which im reading while hoping to find a breakthrough on this project in them. I would love to know if there are any bugs related to this task on JIRA I could work on as a means of preparing for the project. Please give me the link.
Regards
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hi, One more thing: The UDF interface is stuck in the 2000s (3.23 timeframe). It has not, and can not, change. So your only option for returning complex data is through a string anyway, so using a delimited string or packing a blob are really the only options. Maybe someday this bug: http://bugs.mysql.com/bug.php?id=70947 will get fixed, and a UDF will be able to: get THD, handle character sets, and return different/new data types. Even if it were implemenetd you'll still have to implement the ARRAY data type in the server though, and then storage engines will have to be modified to be able to use it. That requires table functions too, really, which are part of wl#820/mdev-820. --Justin On Tue, Mar 10, 2015 at 11:38 AM, Justin Swanhart <greenlion@gmail.com> wrote:
Hi,
MySQL has no ARRAY data type. In array is basicaly a nested table data type and MySQL doesn't support nested tables either, except through dynamic columns. So, a UDF that returned an array could only be used by other UDF that understand arrays. That isn't very useful.
I see two options if you want to push this forward: a) If you want this to work with MySQL too (the UDF interface is identical) then pass data back as delimited text. For example, I have a UDF called RAPID_EXTRACT_ALL(..) which extracts all values for a key from a JSON document, and returns the values separated by newline. I have stored routines (or other UDF) that understand newline separated input (essentially arrays). I don't think you can do much better for MySQL at the moment.
b) If you want to be compatible with only MariaDB, what you could do is return a BLOB that is properly formatted as a DYNCOL. This way COLUMN_GET, COLUMN_JSON, etc, could work on the blob, and there is a well defined interface for transmitting and accessing the data.
Neither really require development on your part though, except to understand how to pack a DYNCOL blob properly.
--Justin
On Tue, Mar 10, 2015 at 11:28 AM, Delveri chick <chickdelveri@gmail.com> wrote:
Hi My name is Blessing Atie and i would love to participate in the Google summer of code 2015 under Mariadb. I am interested in the project UDF returning arrays. I have build mariadb from source and and i also ran the test suites. I am currently studing the sql/sql.** files to understand UDF's and i also downloaded some ebook on mysql internals and mysql plugin development which im reading while hoping to find a breakthrough on this project in them. I would love to know if there are any bugs related to this task on JIRA I could work on as a means of preparing for the project. Please give me the link.
Regards
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hi! On 10.03.15 19:38, Justin Swanhart wrote:
Hi,
MySQL has no ARRAY data type. In array is basicaly a nested table data type and MySQL doesn't support nested tables either, except through dynamic columns. So, a UDF that returned an array could only be used by other UDF that understand arrays. That isn't very useful.
Strictly speaking server support row type (also kind of array). It can be used in comparison operations and returned/used in a subquery. [skip]
[accidentally replied only to sender, pasting reply here to all] Hi, So how exactly would you go about returning a row type? You get four choices with UDF: STRING_RESULT INT_RESULT REAL_RESULT DECIMAL_RESULT (which is handled just like strings, because decimal was a string when UDF interface was baked) You can't change the UDF specification. It is not versioned :) On Tue, Mar 10, 2015 at 12:46 PM, Oleksandr Byelkin <sanja@montyprogram.com> wrote:
Hi!
On 10.03.15 19:38, Justin Swanhart wrote:
Hi,
MySQL has no ARRAY data type. In array is basicaly a nested table data type and MySQL doesn't support nested tables either, except through dynamic columns. So, a UDF that returned an array could only be used by other UDF that understand arrays. That isn't very useful.
Strictly speaking server support row type (also kind of array). It can be used in comparison operations and returned/used in a subquery.
[skip]
Hi! On 10.03.15 21:01, Justin Swanhart wrote:
[accidentally replied only to sender, pasting reply here to all]
Hi,
So how exactly would you go about returning a row type? You get four choices with UDF: STRING_RESULT INT_RESULT REAL_RESULT DECIMAL_RESULT (which is handled just like strings, because decimal was a string when UDF interface was baked)
You can't change the UDF specification. It is not versioned :)
I was talking about server internal support which is: enum Item_result { STRING_RESULT=0, REAL_RESULT, INT_RESULT, ROW_RESULT, DECIMAL_RESULT, TIME_RESULT,IMPOSSIBLE_RESULT }; and UDF also can support more :) [skip]
No, You SAY the UDF can support them, but you neglect to point to a technical way how. If you want to return a row as an array, well, how do you add an item to a row outside of the UDF, remove one? How do you count the items in the row? How do you get an item by index? If you want to use internal Item_result, well, how do you access item functions without a THD? You can't get a THD in a UDF, except an opaque pointer to one, unless you pull in half the server and it isn't really "legal" to do so. How do you pass an array to a stored routine, how does the routine modify it, can routines return arrays or just UDF? Row as an array is a ludicrous hack. You can't even pass row to other functions. If you want a new UDF interface that can do those things, use a pluggable item function which I linked to the patch to. You absolutely can't change an unversioned binary interface like the existing UDF without risking server crash. So, if you want ARRAY to be returned by a UDF, you have to start with a new UDF interface. Again, pluggable item func are the right way to do it. --Justin On Tue, Mar 10, 2015 at 1:08 PM, Oleksandr Byelkin <sanja@montyprogram.com> wrote:
Hi!
On 10.03.15 21:01, Justin Swanhart wrote:
[accidentally replied only to sender, pasting reply here to all]
Hi,
So how exactly would you go about returning a row type? You get four choices with UDF: STRING_RESULT INT_RESULT REAL_RESULT DECIMAL_RESULT (which is handled just like strings, because decimal was a string when UDF interface was baked)
You can't change the UDF specification. It is not versioned :)
I was talking about server internal support which is: enum Item_result { STRING_RESULT=0, REAL_RESULT, INT_RESULT, ROW_RESULT, DECIMAL_RESULT, TIME_RESULT,IMPOSSIBLE_RESULT };
and UDF also can support more :)
[skip]
Hi! On 10.03.15 21:41, Justin Swanhart wrote:
No,
You SAY the UDF can support them, but you neglect to point to a technical way how.
If you want to return a row as an array, well, how do you add an item to a row outside of the UDF, remove one? How do you count the items in the row? How do you get an item by index? If you want to use internal Item_result, well, how do you access item functions without a THD? You can't get a THD in a UDF, except an opaque pointer to one, unless you pull in half the server and it isn't really "legal" to do so. How do you pass an array to a stored routine, how does the routine modify it, can routines return arrays or just UDF? Row as an array is a ludicrous hack. You can't even pass row to other functions.
If you want a new UDF interface that can do those things, use a pluggable item function which I linked to the patch to. You absolutely can't change an unversioned binary interface like the existing UDF without risking server crash.
So, if you want ARRAY to be returned by a UDF, you have to start with a new UDF interface. Again, pluggable item func are the right way to do it.
It looks like you misunderstand me. I do not want something with UDF at all, I just mentioned that internally there is something array-like and somebody who will do the task can be aware of it. P.S.: But now I see that UDF interface should be changed due to problems you mentioned.
Hi, Sorry for the misunderstanding. On Tue, Mar 10, 2015 at 3:17 PM, Oleksandr Byelkin <sanja@montyprogram.com> wrote:
Hi!
On 10.03.15 21:41, Justin Swanhart wrote:
No,
You SAY the UDF can support them, but you neglect to point to a technical way how.
If you want to return a row as an array, well, how do you add an item to a row outside of the UDF, remove one? How do you count the items in the row? How do you get an item by index? If you want to use internal Item_result, well, how do you access item functions without a THD? You can't get a THD in a UDF, except an opaque pointer to one, unless you pull in half the server and it isn't really "legal" to do so. How do you pass an array to a stored routine, how does the routine modify it, can routines return arrays or just UDF? Row as an array is a ludicrous hack. You can't even pass row to other functions.
If you want a new UDF interface that can do those things, use a pluggable item function which I linked to the patch to. You absolutely can't change an unversioned binary interface like the existing UDF without risking server crash.
So, if you want ARRAY to be returned by a UDF, you have to start with a new UDF interface. Again, pluggable item func are the right way to do it.
It looks like you misunderstand me.
I do not want something with UDF at all, I just mentioned that internally there is something array-like and somebody who will do the task can be aware of it.
P.S.: But now I see that UDF interface should be changed due to problems you mentioned.
This JIRA task doesn't mention GSoC, but it seems to be what you are asking for: https://mariadb.atlassian.net/browse/MDEV-5199 Hope it helps. Federico -------------------------------------------- Mar 10/3/15, Delveri chick <chickdelveri@gmail.com> ha scritto: Oggetto: [Maria-discuss] GSOC 2015 :UDFs returning arrays A: maria-discuss@lists.launchpad.net Data: Martedì 10 marzo 2015, 19:28 Hi My name is Blessing Atie and i would love to participate in the Google summer of code 2015 under Mariadb. I am interested in the project UDF returning arrays. I have build mariadb from source and and i also ran the test suites. I am currently studing the sql/sql.** files to understand UDF's and i also downloaded some ebook on mysql internals and mysql plugin development which im reading while hoping to find a breakthrough on this project in them. I would love to know if there are any bugs related to this task on JIRA I could work on as a means of preparing for the project.Please give me the link. Regards -----Segue allegato----- _______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hi, I am not sure of the point of that mdev. First, arrays are not table functions. Arrays are data structures. A table function can certainly return all the items in an array data structure as rows (which would be part of #820) but that doesn't mean that a table function could replace an ARRAY data type. Arrays also have to be able to be passed to stored routines and stored in tables too, or they are virtually useless. Second, WL#820, which far precedes that mdev, adds real table functions without magic "pretend" storage engine as suggested in the later mdev. In any case, you can't return an array or table, from a UDF, without changing the UDF interface which will break other UDF. The UDF interface IS NOT VERSIONED. wl#820 adds external stored procedures and doesn't modify the udf interface (UDF can rot) The mysql bug link I sent makes item functions internally pluggable, so they could support table functions as well and provide a replacement for UDF. Add an ARRAY data type and implement these patches (which are from Antony Curtis) and you'll get what you want. It would be a real shame to replace wl#820 work with some half-backed UDF that has not access to internals. --Justin On Tue, Mar 10, 2015 at 12:04 PM, Federico Razzoli <federico_raz@yahoo.it> wrote:
This JIRA task doesn't mention GSoC, but it seems to be what you are asking for:
https://mariadb.atlassian.net/browse/MDEV-5199
Hope it helps.
Federico
-------------------------------------------- Mar 10/3/15, Delveri chick <chickdelveri@gmail.com> ha scritto:
Oggetto: [Maria-discuss] GSOC 2015 :UDFs returning arrays A: maria-discuss@lists.launchpad.net Data: Martedì 10 marzo 2015, 19:28
Hi My name is Blessing Atie and i would love to participate in the Google summer of code 2015 under Mariadb. I am interested in the project UDF returning arrays. I have build mariadb from source and and i also ran the test suites. I am currently studing the sql/sql.** files to understand UDF's and i also downloaded some ebook on mysql internals and mysql plugin development which im reading while hoping to find a breakthrough on this project in them. I would love to know if there are any bugs related to this task on JIRA I could work on as a means of preparing for the project.Please give me the link. Regards -----Segue allegato-----
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hi, Justin! On Mar 10, Justin Swanhart wrote:
I am not sure of the point of that mdev. First, arrays are not table functions. Arrays are data structures. A table function can certainly return all the items in an array data structure as rows (which would be part of #820) but that doesn't mean that a table function could replace an ARRAY data type. Arrays also have to be able to be passed to stored routines and stored in tables too, or they are virtually useless.
Yes, the subject says "UDF returning arrays/ result set", and the point is to have result set, not just an array.
Second, WL#820, which far precedes that mdev, adds real table functions without magic "pretend" storage engine as suggested in the later mdev. In any case, you can't return an array or table, from a UDF, without changing the UDF interface which will break other UDF. The UDF interface IS NOT VERSIONED.
Right. But MDEV-5199 doesn't need to use or extend existing UDF interface. It simply uses "UDF" to mean "user definable function" without implying that it has anything to do with existing UDFs.
wl#820 adds external stored procedures and doesn't modify the udf interface (UDF can rot) The mysql bug link I sent makes item functions internally pluggable, so they could support table functions as well and provide a replacement for UDF. Add an ARRAY data type and implement these patches (which are from Antony Curtis) and you'll get what you want. It would be a real shame to replace wl#820 work with some half-backed UDF that has not access to internals.
This is largely unrelated. This MDEV-5199 is about user functions that can return a table. Something like SELECT * FROM t1, tf(15, NOW()) as t2 WHERE t1.a=t2.b; So, it's about the server that can invoke a UDF in this context and retrieve many rows from the user function. While wl#820 is about the language in which this user function is written. Regards, Sergei
wl#820 also has table functions in it: http://antbits.blogspot.com/2009/01/table-functions-in-mysql.html mysql> INSTALL PLUGIN Deep_Thought SONAME 'psm_example.so'; Query OK, 0 rows affected (0.10 sec) mysql> CREATE FUNCTION test.FooAnswer() -> RETURNS TABLE(answer TEXT) -> NO SQL LANGUAGE Deep_Thought EXTERNAL NAME 'compute'; Query OK, 0 rows affected (0.01 sec) mysql> CREATE VIEW foobar AS SELECT * FROM TABLE(test.FooAnswer) AS wibble; Query OK, 0 rows affected (0.01 sec) mysql> SELECT * FROM foobar WHERE answer LIKE 'F%'; +-----------+ | answer | +-----------+ | Forty-Two | +-----------+ 1 row in set (0.01 sec) On Tue, Mar 10, 2015 at 1:45 PM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Justin!
On Mar 10, Justin Swanhart wrote:
I am not sure of the point of that mdev. First, arrays are not table functions. Arrays are data structures. A table function can certainly return all the items in an array data structure as rows (which would be part of #820) but that doesn't mean that a table function could replace
an
ARRAY data type. Arrays also have to be able to be passed to stored routines and stored in tables too, or they are virtually useless.
Yes, the subject says "UDF returning arrays/ result set", and the point is to have result set, not just an array.
Second, WL#820, which far precedes that mdev, adds real table functions without magic "pretend" storage engine as suggested in the later mdev. In any case, you can't return an array or table, from a UDF, without changing the UDF interface which will break other UDF. The UDF interface IS NOT VERSIONED.
Right. But MDEV-5199 doesn't need to use or extend existing UDF interface. It simply uses "UDF" to mean "user definable function" without implying that it has anything to do with existing UDFs.
wl#820 adds external stored procedures and doesn't modify the udf interface (UDF can rot) The mysql bug link I sent makes item functions internally pluggable, so they could support table functions as well and provide a replacement for UDF. Add an ARRAY data type and implement these patches (which are from Antony Curtis) and you'll get what you want. It would be a real shame to replace wl#820 work with some half-backed UDF that has not access to internals.
This is largely unrelated. This MDEV-5199 is about user functions that can return a table. Something like
SELECT * FROM t1, tf(15, NOW()) as t2 WHERE t1.a=t2.b;
So, it's about the server that can invoke a UDF in this context and retrieve many rows from the user function.
While wl#820 is about the language in which this user function is written.
Regards, Sergei
Hi, Justin! On Mar 10, Justin Swanhart wrote:
wl#820 also has table functions in it: http://antbits.blogspot.com/2009/01/table-functions-in-mysql.html
mysql> INSTALL PLUGIN Deep_Thought SONAME 'psm_example.so'; mysql> CREATE FUNCTION test.FooAnswer() -> RETURNS TABLE(answer TEXT) -> NO SQL LANGUAGE Deep_Thought EXTERNAL NAME 'compute'; mysql> CREATE VIEW foobar AS SELECT * FROM TABLE(test.FooAnswer) AS wibble; mysql> SELECT * FROM foobar WHERE answer LIKE 'F%'; +-----------+ | answer | +-----------+ | Forty-Two | +-----------+
Ah, indeed. Thanks. It's still an unrelated feature - I mean, there is no reason why a table function must necessarily be impemented as an external procedure. And the implementation - I believe - creates a temporary table and puts all the data into it. I would rather prefer a generator-like interface, that produces rows one by one. It works much better for big data sets. Or, may be, we can do both. Either way - this implementation can be very helpful. But it'll need at least to be updated to work in 10.1. And may be we add the generator interface. Regards, Sergei
Hi, Right, my point was that UDF (classical UDF) really can't do arrays or table functions, you need a different type of UDF, and external procs are one type of "new udf". The other type of "new UDF" I would like to see my s Antony's other patch I linked to, which adds "pluggable item functions". Such functions could also use the same table function interface (be it via generator or temp table, or whatever). The biggest advantage over classic UDF is that pluggable item functions get a THD, can use SQL security, properly deal with character sets, etc, and it is a small patch. I'd kind of like two or three type of table func: One returns a SQL statement and is evaluated like a dependent subquery - it doesn't have to be materialized, but if it is, it can be indexed. This is like a "dynamic view" which is difficult to do without ugly hacks. His would be the easiest to implement. Another type would returns a temporary table name, and the temp table is used in the query. These would be evaluated at start of execution like old dependent subqueries. This would allow table functions with a varying number of columns, or that essentially can return more than one data type based in the input. While conceptually similar to the first type, this type can get data from external sources and store it in the table. The temporary would be dropped in deinit() function. Finally, generator functions would use fixed schema and could return big data sets, but they probably can't open any tables without locking problems (all table locks must be taken at query execution start). I hope that makes sense. Sent from my iPhone
On Mar 11, 2015, at 7:35 AM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Justin!
On Mar 10, Justin Swanhart wrote: wl#820 also has table functions in it: http://antbits.blogspot.com/2009/01/table-functions-in-mysql.html
mysql> INSTALL PLUGIN Deep_Thought SONAME 'psm_example.so'; mysql> CREATE FUNCTION test.FooAnswer() -> RETURNS TABLE(answer TEXT) -> NO SQL LANGUAGE Deep_Thought EXTERNAL NAME 'compute'; mysql> CREATE VIEW foobar AS SELECT * FROM TABLE(test.FooAnswer) AS wibble; mysql> SELECT * FROM foobar WHERE answer LIKE 'F%'; +-----------+ | answer | +-----------+ | Forty-Two | +-----------+
Ah, indeed. Thanks.
It's still an unrelated feature - I mean, there is no reason why a table function must necessarily be impemented as an external procedure.
And the implementation - I believe - creates a temporary table and puts all the data into it. I would rather prefer a generator-like interface, that produces rows one by one. It works much better for big data sets. Or, may be, we can do both.
Either way - this implementation can be very helpful. But it'll need at least to be updated to work in 10.1. And may be we add the generator interface.
Regards, Sergei
participants (5)
-
Delveri chick
-
Federico Razzoli
-
Justin Swanhart
-
Oleksandr Byelkin
-
Sergei Golubchik