[Maria-discuss] GSOC 2015 :UDFs returning arrays
Hi My name is Blessing Atie and i would love to participate in the Google summer of code 2015 under Mariadb. I am interested in the project UDF returning arrays. I have build mariadb from source and and i also ran the test suites. I am currently studing the sql/sql.** files to understand UDF's and i also downloaded some ebook on mysql internals and mysql plugin development which im reading while hoping to find a breakthrough on this project in them. I would love to know if there are any bugs related to this task on JIRA I could work on as a means of preparing for the project. Please give me the link. Regards
Hi,
MySQL has no ARRAY data type. In array is basicaly a nested table data
type and MySQL doesn't support nested tables either, except through dynamic
columns. So, a UDF that returned an array could only be used by other UDF
that understand arrays. That isn't very useful.
I see two options if you want to push this forward:
a) If you want this to work with MySQL too (the UDF interface is identical)
then pass data back as delimited text. For example, I have a UDF called
RAPID_EXTRACT_ALL(..) which extracts all values for a key from a JSON
document, and returns the values separated by newline. I have stored
routines (or other UDF) that understand newline separated input
(essentially arrays). I don't think you can do much better for MySQL at
the moment.
b) If you want to be compatible with only MariaDB, what you could do is
return a BLOB that is properly formatted as a DYNCOL. This way COLUMN_GET,
COLUMN_JSON, etc, could work on the blob, and there is a well defined
interface for transmitting and accessing the data.
Neither really require development on your part though, except to
understand how to pack a DYNCOL blob properly.
--Justin
On Tue, Mar 10, 2015 at 11:28 AM, Delveri chick
Hi My name is Blessing Atie and i would love to participate in the Google summer of code 2015 under Mariadb. I am interested in the project UDF returning arrays. I have build mariadb from source and and i also ran the test suites. I am currently studing the sql/sql.** files to understand UDF's and i also downloaded some ebook on mysql internals and mysql plugin development which im reading while hoping to find a breakthrough on this project in them. I would love to know if there are any bugs related to this task on JIRA I could work on as a means of preparing for the project. Please give me the link.
Regards
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hi,
One more thing: The UDF interface is stuck in the 2000s (3.23 timeframe).
It has not, and can not, change. So your only option for returning complex
data is through a string anyway, so using a delimited string or packing a
blob are really the only options. Maybe someday this bug:
http://bugs.mysql.com/bug.php?id=70947 will get fixed, and a UDF will be
able to: get THD, handle character sets, and return different/new data
types. Even if it were implemenetd you'll still have to implement the
ARRAY data type in the server though, and then storage engines will have to
be modified to be able to use it. That requires table functions too,
really, which are part of wl#820/mdev-820.
--Justin
On Tue, Mar 10, 2015 at 11:38 AM, Justin Swanhart
Hi,
MySQL has no ARRAY data type. In array is basicaly a nested table data type and MySQL doesn't support nested tables either, except through dynamic columns. So, a UDF that returned an array could only be used by other UDF that understand arrays. That isn't very useful.
I see two options if you want to push this forward: a) If you want this to work with MySQL too (the UDF interface is identical) then pass data back as delimited text. For example, I have a UDF called RAPID_EXTRACT_ALL(..) which extracts all values for a key from a JSON document, and returns the values separated by newline. I have stored routines (or other UDF) that understand newline separated input (essentially arrays). I don't think you can do much better for MySQL at the moment.
b) If you want to be compatible with only MariaDB, what you could do is return a BLOB that is properly formatted as a DYNCOL. This way COLUMN_GET, COLUMN_JSON, etc, could work on the blob, and there is a well defined interface for transmitting and accessing the data.
Neither really require development on your part though, except to understand how to pack a DYNCOL blob properly.
--Justin
On Tue, Mar 10, 2015 at 11:28 AM, Delveri chick
wrote: Hi My name is Blessing Atie and i would love to participate in the Google summer of code 2015 under Mariadb. I am interested in the project UDF returning arrays. I have build mariadb from source and and i also ran the test suites. I am currently studing the sql/sql.** files to understand UDF's and i also downloaded some ebook on mysql internals and mysql plugin development which im reading while hoping to find a breakthrough on this project in them. I would love to know if there are any bugs related to this task on JIRA I could work on as a means of preparing for the project. Please give me the link.
Regards
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hi! On 10.03.15 19:38, Justin Swanhart wrote:
Hi,
MySQL has no ARRAY data type. In array is basicaly a nested table data type and MySQL doesn't support nested tables either, except through dynamic columns. So, a UDF that returned an array could only be used by other UDF that understand arrays. That isn't very useful.
Strictly speaking server support row type (also kind of array). It can be used in comparison operations and returned/used in a subquery. [skip]
[accidentally replied only to sender, pasting reply here to all]
Hi,
So how exactly would you go about returning a row type? You get four
choices with UDF:
STRING_RESULT
INT_RESULT
REAL_RESULT
DECIMAL_RESULT (which is handled just like strings, because decimal was a
string when UDF interface was baked)
You can't change the UDF specification. It is not versioned :)
On Tue, Mar 10, 2015 at 12:46 PM, Oleksandr Byelkin
Hi!
On 10.03.15 19:38, Justin Swanhart wrote:
Hi,
MySQL has no ARRAY data type. In array is basicaly a nested table data type and MySQL doesn't support nested tables either, except through dynamic columns. So, a UDF that returned an array could only be used by other UDF that understand arrays. That isn't very useful.
Strictly speaking server support row type (also kind of array). It can be used in comparison operations and returned/used in a subquery.
[skip]
Hi! On 10.03.15 21:01, Justin Swanhart wrote:
[accidentally replied only to sender, pasting reply here to all]
Hi,
So how exactly would you go about returning a row type? You get four choices with UDF: STRING_RESULT INT_RESULT REAL_RESULT DECIMAL_RESULT (which is handled just like strings, because decimal was a string when UDF interface was baked)
You can't change the UDF specification. It is not versioned :)
I was talking about server internal support which is: enum Item_result { STRING_RESULT=0, REAL_RESULT, INT_RESULT, ROW_RESULT, DECIMAL_RESULT, TIME_RESULT,IMPOSSIBLE_RESULT }; and UDF also can support more :) [skip]
No,
You SAY the UDF can support them, but you neglect to point to a technical
way how.
If you want to return a row as an array, well, how do you add an item to a
row outside of the UDF, remove one? How do you count the items in the
row? How do you get an item by index? If you want to use internal
Item_result, well, how do you access item functions without a THD? You
can't get a THD in a UDF, except an opaque pointer to one, unless you pull
in half the server and it isn't really "legal" to do so. How do you pass
an array to a stored routine, how does the routine modify it, can routines
return arrays or just UDF? Row as an array is a ludicrous hack. You can't
even pass row to other functions.
If you want a new UDF interface that can do those things, use a pluggable
item function which I linked to the patch to. You absolutely can't change
an unversioned binary interface like the existing UDF without risking
server crash.
So, if you want ARRAY to be returned by a UDF, you have to start with a new
UDF interface. Again, pluggable item func are the right way to do it.
--Justin
On Tue, Mar 10, 2015 at 1:08 PM, Oleksandr Byelkin
Hi!
On 10.03.15 21:01, Justin Swanhart wrote:
[accidentally replied only to sender, pasting reply here to all]
Hi,
So how exactly would you go about returning a row type? You get four choices with UDF: STRING_RESULT INT_RESULT REAL_RESULT DECIMAL_RESULT (which is handled just like strings, because decimal was a string when UDF interface was baked)
You can't change the UDF specification. It is not versioned :)
I was talking about server internal support which is: enum Item_result { STRING_RESULT=0, REAL_RESULT, INT_RESULT, ROW_RESULT, DECIMAL_RESULT, TIME_RESULT,IMPOSSIBLE_RESULT };
and UDF also can support more :)
[skip]
Hi! On 10.03.15 21:41, Justin Swanhart wrote:
No,
You SAY the UDF can support them, but you neglect to point to a technical way how.
If you want to return a row as an array, well, how do you add an item to a row outside of the UDF, remove one? How do you count the items in the row? How do you get an item by index? If you want to use internal Item_result, well, how do you access item functions without a THD? You can't get a THD in a UDF, except an opaque pointer to one, unless you pull in half the server and it isn't really "legal" to do so. How do you pass an array to a stored routine, how does the routine modify it, can routines return arrays or just UDF? Row as an array is a ludicrous hack. You can't even pass row to other functions.
If you want a new UDF interface that can do those things, use a pluggable item function which I linked to the patch to. You absolutely can't change an unversioned binary interface like the existing UDF without risking server crash.
So, if you want ARRAY to be returned by a UDF, you have to start with a new UDF interface. Again, pluggable item func are the right way to do it.
It looks like you misunderstand me. I do not want something with UDF at all, I just mentioned that internally there is something array-like and somebody who will do the task can be aware of it. P.S.: But now I see that UDF interface should be changed due to problems you mentioned.
Hi,
Sorry for the misunderstanding.
On Tue, Mar 10, 2015 at 3:17 PM, Oleksandr Byelkin
Hi!
On 10.03.15 21:41, Justin Swanhart wrote:
No,
You SAY the UDF can support them, but you neglect to point to a technical way how.
If you want to return a row as an array, well, how do you add an item to a row outside of the UDF, remove one? How do you count the items in the row? How do you get an item by index? If you want to use internal Item_result, well, how do you access item functions without a THD? You can't get a THD in a UDF, except an opaque pointer to one, unless you pull in half the server and it isn't really "legal" to do so. How do you pass an array to a stored routine, how does the routine modify it, can routines return arrays or just UDF? Row as an array is a ludicrous hack. You can't even pass row to other functions.
If you want a new UDF interface that can do those things, use a pluggable item function which I linked to the patch to. You absolutely can't change an unversioned binary interface like the existing UDF without risking server crash.
So, if you want ARRAY to be returned by a UDF, you have to start with a new UDF interface. Again, pluggable item func are the right way to do it.
It looks like you misunderstand me.
I do not want something with UDF at all, I just mentioned that internally there is something array-like and somebody who will do the task can be aware of it.
P.S.: But now I see that UDF interface should be changed due to problems you mentioned.
This JIRA task doesn't mention GSoC, but it seems to be what you are asking for:
https://mariadb.atlassian.net/browse/MDEV-5199
Hope it helps.
Federico
--------------------------------------------
Mar 10/3/15, Delveri chick
Hi,
I am not sure of the point of that mdev. First, arrays are not table
functions. Arrays are data structures. A table function can certainly
return all the items in an array data structure as rows (which would be
part of #820) but that doesn't mean that a table function could replace an
ARRAY data type. Arrays also have to be able to be passed to stored
routines and stored in tables too, or they are virtually useless.
Second, WL#820, which far precedes that mdev, adds real table functions
without magic "pretend" storage engine as suggested in the later mdev. In
any case, you can't return an array or table, from a UDF, without changing
the UDF interface which will break other UDF. The UDF interface IS NOT
VERSIONED.
wl#820 adds external stored procedures and doesn't modify the udf interface
(UDF can rot) The mysql bug link I sent makes item functions internally
pluggable, so they could support table functions as well and provide a
replacement for UDF. Add an ARRAY data type and implement these patches
(which are from Antony Curtis) and you'll get what you want.
It would be a real shame to replace wl#820 work with some half-backed UDF
that has not access to internals.
--Justin
On Tue, Mar 10, 2015 at 12:04 PM, Federico Razzoli
This JIRA task doesn't mention GSoC, but it seems to be what you are asking for:
https://mariadb.atlassian.net/browse/MDEV-5199
Hope it helps.
Federico
-------------------------------------------- Mar 10/3/15, Delveri chick
ha scritto: Oggetto: [Maria-discuss] GSOC 2015 :UDFs returning arrays A: maria-discuss@lists.launchpad.net Data: Martedì 10 marzo 2015, 19:28
Hi My name is Blessing Atie and i would love to participate in the Google summer of code 2015 under Mariadb. I am interested in the project UDF returning arrays. I have build mariadb from source and and i also ran the test suites. I am currently studing the sql/sql.** files to understand UDF's and i also downloaded some ebook on mysql internals and mysql plugin development which im reading while hoping to find a breakthrough on this project in them. I would love to know if there are any bugs related to this task on JIRA I could work on as a means of preparing for the project.Please give me the link. Regards -----Segue allegato-----
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hi, Justin! On Mar 10, Justin Swanhart wrote:
I am not sure of the point of that mdev. First, arrays are not table functions. Arrays are data structures. A table function can certainly return all the items in an array data structure as rows (which would be part of #820) but that doesn't mean that a table function could replace an ARRAY data type. Arrays also have to be able to be passed to stored routines and stored in tables too, or they are virtually useless.
Yes, the subject says "UDF returning arrays/ result set", and the point is to have result set, not just an array.
Second, WL#820, which far precedes that mdev, adds real table functions without magic "pretend" storage engine as suggested in the later mdev. In any case, you can't return an array or table, from a UDF, without changing the UDF interface which will break other UDF. The UDF interface IS NOT VERSIONED.
Right. But MDEV-5199 doesn't need to use or extend existing UDF interface. It simply uses "UDF" to mean "user definable function" without implying that it has anything to do with existing UDFs.
wl#820 adds external stored procedures and doesn't modify the udf interface (UDF can rot) The mysql bug link I sent makes item functions internally pluggable, so they could support table functions as well and provide a replacement for UDF. Add an ARRAY data type and implement these patches (which are from Antony Curtis) and you'll get what you want. It would be a real shame to replace wl#820 work with some half-backed UDF that has not access to internals.
This is largely unrelated. This MDEV-5199 is about user functions that can return a table. Something like SELECT * FROM t1, tf(15, NOW()) as t2 WHERE t1.a=t2.b; So, it's about the server that can invoke a UDF in this context and retrieve many rows from the user function. While wl#820 is about the language in which this user function is written. Regards, Sergei
wl#820 also has table functions in it:
http://antbits.blogspot.com/2009/01/table-functions-in-mysql.html
mysql> INSTALL PLUGIN Deep_Thought SONAME 'psm_example.so';
Query OK, 0 rows affected (0.10 sec)
mysql> CREATE FUNCTION test.FooAnswer()
-> RETURNS TABLE(answer TEXT)
-> NO SQL LANGUAGE Deep_Thought EXTERNAL NAME 'compute';
Query OK, 0 rows affected (0.01 sec)
mysql> CREATE VIEW foobar AS SELECT * FROM TABLE(test.FooAnswer) AS wibble;
Query OK, 0 rows affected (0.01 sec)
mysql> SELECT * FROM foobar WHERE answer LIKE 'F%';
+-----------+
| answer |
+-----------+
| Forty-Two |
+-----------+
1 row in set (0.01 sec)
On Tue, Mar 10, 2015 at 1:45 PM, Sergei Golubchik
Hi, Justin!
On Mar 10, Justin Swanhart wrote:
I am not sure of the point of that mdev. First, arrays are not table functions. Arrays are data structures. A table function can certainly return all the items in an array data structure as rows (which would be part of #820) but that doesn't mean that a table function could replace
an
ARRAY data type. Arrays also have to be able to be passed to stored routines and stored in tables too, or they are virtually useless.
Yes, the subject says "UDF returning arrays/ result set", and the point is to have result set, not just an array.
Second, WL#820, which far precedes that mdev, adds real table functions without magic "pretend" storage engine as suggested in the later mdev. In any case, you can't return an array or table, from a UDF, without changing the UDF interface which will break other UDF. The UDF interface IS NOT VERSIONED.
Right. But MDEV-5199 doesn't need to use or extend existing UDF interface. It simply uses "UDF" to mean "user definable function" without implying that it has anything to do with existing UDFs.
wl#820 adds external stored procedures and doesn't modify the udf interface (UDF can rot) The mysql bug link I sent makes item functions internally pluggable, so they could support table functions as well and provide a replacement for UDF. Add an ARRAY data type and implement these patches (which are from Antony Curtis) and you'll get what you want. It would be a real shame to replace wl#820 work with some half-backed UDF that has not access to internals.
This is largely unrelated. This MDEV-5199 is about user functions that can return a table. Something like
SELECT * FROM t1, tf(15, NOW()) as t2 WHERE t1.a=t2.b;
So, it's about the server that can invoke a UDF in this context and retrieve many rows from the user function.
While wl#820 is about the language in which this user function is written.
Regards, Sergei
Hi, Justin! On Mar 10, Justin Swanhart wrote:
wl#820 also has table functions in it: http://antbits.blogspot.com/2009/01/table-functions-in-mysql.html
mysql> INSTALL PLUGIN Deep_Thought SONAME 'psm_example.so'; mysql> CREATE FUNCTION test.FooAnswer() -> RETURNS TABLE(answer TEXT) -> NO SQL LANGUAGE Deep_Thought EXTERNAL NAME 'compute'; mysql> CREATE VIEW foobar AS SELECT * FROM TABLE(test.FooAnswer) AS wibble; mysql> SELECT * FROM foobar WHERE answer LIKE 'F%'; +-----------+ | answer | +-----------+ | Forty-Two | +-----------+
Ah, indeed. Thanks. It's still an unrelated feature - I mean, there is no reason why a table function must necessarily be impemented as an external procedure. And the implementation - I believe - creates a temporary table and puts all the data into it. I would rather prefer a generator-like interface, that produces rows one by one. It works much better for big data sets. Or, may be, we can do both. Either way - this implementation can be very helpful. But it'll need at least to be updated to work in 10.1. And may be we add the generator interface. Regards, Sergei
Hi, Right, my point was that UDF (classical UDF) really can't do arrays or table functions, you need a different type of UDF, and external procs are one type of "new udf". The other type of "new UDF" I would like to see my s Antony's other patch I linked to, which adds "pluggable item functions". Such functions could also use the same table function interface (be it via generator or temp table, or whatever). The biggest advantage over classic UDF is that pluggable item functions get a THD, can use SQL security, properly deal with character sets, etc, and it is a small patch. I'd kind of like two or three type of table func: One returns a SQL statement and is evaluated like a dependent subquery - it doesn't have to be materialized, but if it is, it can be indexed. This is like a "dynamic view" which is difficult to do without ugly hacks. His would be the easiest to implement. Another type would returns a temporary table name, and the temp table is used in the query. These would be evaluated at start of execution like old dependent subqueries. This would allow table functions with a varying number of columns, or that essentially can return more than one data type based in the input. While conceptually similar to the first type, this type can get data from external sources and store it in the table. The temporary would be dropped in deinit() function. Finally, generator functions would use fixed schema and could return big data sets, but they probably can't open any tables without locking problems (all table locks must be taken at query execution start). I hope that makes sense. Sent from my iPhone
On Mar 11, 2015, at 7:35 AM, Sergei Golubchik
wrote: Hi, Justin!
On Mar 10, Justin Swanhart wrote: wl#820 also has table functions in it: http://antbits.blogspot.com/2009/01/table-functions-in-mysql.html
mysql> INSTALL PLUGIN Deep_Thought SONAME 'psm_example.so'; mysql> CREATE FUNCTION test.FooAnswer() -> RETURNS TABLE(answer TEXT) -> NO SQL LANGUAGE Deep_Thought EXTERNAL NAME 'compute'; mysql> CREATE VIEW foobar AS SELECT * FROM TABLE(test.FooAnswer) AS wibble; mysql> SELECT * FROM foobar WHERE answer LIKE 'F%'; +-----------+ | answer | +-----------+ | Forty-Two | +-----------+
Ah, indeed. Thanks.
It's still an unrelated feature - I mean, there is no reason why a table function must necessarily be impemented as an external procedure.
And the implementation - I believe - creates a temporary table and puts all the data into it. I would rather prefer a generator-like interface, that produces rows one by one. It works much better for big data sets. Or, may be, we can do both.
Either way - this implementation can be very helpful. But it'll need at least to be updated to work in 10.1. And may be we add the generator interface.
Regards, Sergei
participants (5)
-
Delveri chick
-
Federico Razzoli
-
Justin Swanhart
-
Oleksandr Byelkin
-
Sergei Golubchik