[Maria-developers] Aggregate stored functions [MDEV-7773]
Well i have checked out the syntax for the databases for aggregate functions. I thought using the syntax similar to that of HSQL would be good. In it if we are having N tuples then we are computing the values for the N rows and then after that we make another call and return the value for the function. Well this seems very reasonable to do. Using PostgreSQL <http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html> we need to have two regular functions but if we have an aggregate function like Count(column_name) , then this would be done in just a single function , I guess we would not need two functions for it . Regards, Varun
Hi, Varun! On Mar 02, Varun Gupta wrote:
Well i have checked out the syntax for the databases for aggregate functions. I thought using the syntax similar to that of HSQL would be good. In it if we are having N tuples then we are computing the values for the N rows and then after that we make another call and return the value for the function. Well this seems very reasonable to do. Using PostgreSQL <http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html> we need to have two regular functions but if we have an aggregate function like Count(column_name) , then this would be done in just a single function , I guess we would not need two functions for it .
Yes, I agree about PostgreSQL syntax. An aggregate function there has "artefacts" - two regular functions that show up in the list of functions and can be called directly too. I think this is confusing. HSQL syntax, I mean, exactly as in HSQL, is full of random limitations. http://hsqldb.org/doc/guide/sqlroutines-chapt.html#src_aggregate_functions In HSQL an aggregate function can have only *one* argument, and the state is, always, exactly *two* variables. We, of course, won't have these limitation, if we'll do HSQL-style syntax. But this syntax is also kind of hackish. One function that has two different semantics and the 'flag' argument that selects which one to use. And many parameters in the function declaration, while only one is in the function invocation. Oracle uses an object-oriented syntax. Did you find any other DBMS that support this feature? Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Well in HSQL we have the limitation of having maximum of 4 arguments , but in our syntax we can increase the number of arguments.The main thing is what I understood from HSQL and PostgreSQL is that an aggregate function would have 2 states. 1) when we compute the result that is the result keeps on getting updated on various calls to the function 2) when we return the result HSQL does that using the flag so we need to call an additional time with flag == TRUE so that result is returned . While in PostgreSQL we split the work into two regular functions . The problem is that the second function in PostgreSQL is optional so that adds to a bit of confusion. The syntax for ORACLE I did not understand properly, would give it a try again . For SYBASE I have been going through: http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc0... For SQL server https://msdn.microsoft.com/en-in/library/ms182741.aspx DB2 does not support user-defined aggregate functions http://www.ibm.com/developerworks/data/library/techarticle/0309stolze/0309st... On Wed, Mar 2, 2016 at 4:10 PM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Varun!
On Mar 02, Varun Gupta wrote:
Well i have checked out the syntax for the databases for aggregate functions. I thought using the syntax similar to that of HSQL would be good. In it if we are having N tuples then we are computing the values for the N rows and then after that we make another call and return the value for the function. Well this seems very reasonable to do. Using PostgreSQL <http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html> we need to have two regular functions but if we have an aggregate function like Count(column_name) , then this would be done in just a single function , I guess we would not need two functions for it .
Yes, I agree about PostgreSQL syntax. An aggregate function there has "artefacts" - two regular functions that show up in the list of functions and can be called directly too. I think this is confusing.
HSQL syntax, I mean, exactly as in HSQL, is full of random limitations. http://hsqldb.org/doc/guide/sqlroutines-chapt.html#src_aggregate_functions
In HSQL an aggregate function can have only *one* argument, and the state is, always, exactly *two* variables. We, of course, won't have these limitation, if we'll do HSQL-style syntax.
But this syntax is also kind of hackish. One function that has two different semantics and the 'flag' argument that selects which one to use. And many parameters in the function declaration, while only one is in the function invocation.
Oracle uses an object-oriented syntax.
Did you find any other DBMS that support this feature?
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi, Varun! On Mar 02, Varun Gupta wrote:
Well in HSQL we have the limitation of having maximum of 4 arguments , but in our syntax we can increase the number of arguments.The main thing is what I understood from HSQL and PostgreSQL is that an aggregate function would have 2 states. 1) when we compute the result that is the result keeps on getting updated on various calls to the function 2) when we return the result
HSQL does that using the flag so we need to call an additional time with flag == TRUE so that result is returned . While in PostgreSQL we split the work into two regular functions . The problem is that the second function in PostgreSQL is optional so that adds to a bit of confusion.
What about other ideas presented in MDEV-7773? I like the cursor one. Compare the standard SQL function: CREATE FUNCTION avg() RETURNS DOUBLE BEGIN DECLARE count INT DEFAULT 0; DECLARE sum DOUBLE DEFAULT 0; DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' RETURN sum/count; DECLARE x DOUBLE; DECLARE cur1 CURSOR FOR SELECT col1 FROM t1; OPEN cur1; LOOP FETCH cur1 INTO x; SET count:=count+1; SET sum:=sum+x; END LOOP; END This is a regular standard non-aggregate function that returns an average of all values in a column 'col1' of the table 't1'. The cursor idea of a syntax for an aggregate function uses the same familiar logic: CREATE AGGREGATE FUNCTION avg(x DOUBLE) RETURNS DOUBLE BEGIN DECLARE count INT DEFAULT 0; DECLARE sum DOUBLE DEFAULT 0; DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' RETURN sum/count; LOOP FETCH GROUP NEXT ROW; SET count:=count+1; SET sum:=sum+x; END LOOP; END This looks simple, familiar, natural, leaves no artefacts. What I don't like is that a function's declared argument is not used directly. Looks a bit artificial :(
The syntax for ORACLE I did not understand properly, would give it a try again.
Don't bother, it's an object oriented approach. An aggregate function is an *object*. Object has different methods for result and for getting updated. Object data fields used to store the intermediate state. This is quite natural, if the DBMS supports objects. But we don't.
For SYBASE I have been going through: http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc0...
Okay, so "External C/C++."
For SQL server https://msdn.microsoft.com/en-in/library/ms182741.aspx
Same, "implementation is defined in a class of an assembly in the .NET Framework"
DB2 does not support user-defined aggregate functions http://www.ibm.com/developerworks/data/library/techarticle/0309stolze/0309st...
Right. Regards, Sergei Chief Architect MariaDB and security@mariadb.org
With the cursor approach, I think it looks more easier to understand . Also it covers the point that the state has to be saved when the function is called for different values of x ,so we know the approach is correct. We are using the function attribute X in calculating the sum. I don't understand what you mean by not using declared argument directly. Well at least the cursor method looks far simpler from the ones we discussed earlier . On Wed, Mar 2, 2016 at 6:41 PM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Varun!
On Mar 02, Varun Gupta wrote:
Well in HSQL we have the limitation of having maximum of 4 arguments , but in our syntax we can increase the number of arguments.The main thing is what I understood from HSQL and PostgreSQL is that an aggregate function would have 2 states. 1) when we compute the result that is the result keeps on getting updated on various calls to the function 2) when we return the result
HSQL does that using the flag so we need to call an additional time with flag == TRUE so that result is returned . While in PostgreSQL we split the work into two regular functions . The problem is that the second function in PostgreSQL is optional so that adds to a bit of confusion.
What about other ideas presented in MDEV-7773? I like the cursor one. Compare the standard SQL function:
CREATE FUNCTION avg() RETURNS DOUBLE BEGIN DECLARE count INT DEFAULT 0; DECLARE sum DOUBLE DEFAULT 0; DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' RETURN sum/count; DECLARE x DOUBLE; DECLARE cur1 CURSOR FOR SELECT col1 FROM t1; OPEN cur1; LOOP FETCH cur1 INTO x; SET count:=count+1; SET sum:=sum+x; END LOOP; END
This is a regular standard non-aggregate function that returns an average of all values in a column 'col1' of the table 't1'. The cursor idea of a syntax for an aggregate function uses the same familiar logic:
CREATE AGGREGATE FUNCTION avg(x DOUBLE) RETURNS DOUBLE BEGIN DECLARE count INT DEFAULT 0; DECLARE sum DOUBLE DEFAULT 0; DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' RETURN sum/count; LOOP FETCH GROUP NEXT ROW; SET count:=count+1; SET sum:=sum+x; END LOOP; END
This looks simple, familiar, natural, leaves no artefacts. What I don't like is that a function's declared argument is not used directly. Looks a bit artificial :(
The syntax for ORACLE I did not understand properly, would give it a try again.
Don't bother, it's an object oriented approach. An aggregate function is an *object*. Object has different methods for result and for getting updated. Object data fields used to store the intermediate state. This is quite natural, if the DBMS supports objects. But we don't.
For SYBASE I have been going through:
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc0...
Okay, so "External C/C++."
For SQL server https://msdn.microsoft.com/en-in/library/ms182741.aspx
Same, "implementation is defined in a class of an assembly in the .NET Framework"
DB2 does not support user-defined aggregate functions
http://www.ibm.com/developerworks/data/library/techarticle/0309stolze/0309st...
Right.
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
With the syntax more or less clear to us , can we have a discussion about how we have to go on with the implementation for the aggregate functions. Firstly I thought I should look at the implementation of CREATE FUNCTION, any other suggestions ? On Thu, Mar 3, 2016 at 4:56 PM, Varun Gupta <varungupta1803@gmail.com> wrote:
With the cursor approach, I think it looks more easier to understand . Also it covers the point that the state has to be saved when the function is called for different values of x ,so we know the approach is correct. We are using the function attribute X in calculating the sum. I don't understand what you mean by not using declared argument directly. Well at least the cursor method looks far simpler from the ones we discussed earlier .
On Wed, Mar 2, 2016 at 6:41 PM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Varun!
On Mar 02, Varun Gupta wrote:
Well in HSQL we have the limitation of having maximum of 4 arguments , but in our syntax we can increase the number of arguments.The main thing is what I understood from HSQL and PostgreSQL is that an aggregate function would have 2 states. 1) when we compute the result that is the result keeps on getting updated on various calls to the function 2) when we return the result
HSQL does that using the flag so we need to call an additional time with flag == TRUE so that result is returned . While in PostgreSQL we split the work into two regular functions . The problem is that the second function in PostgreSQL is optional so that adds to a bit of confusion.
What about other ideas presented in MDEV-7773? I like the cursor one. Compare the standard SQL function:
CREATE FUNCTION avg() RETURNS DOUBLE BEGIN DECLARE count INT DEFAULT 0; DECLARE sum DOUBLE DEFAULT 0; DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' RETURN sum/count; DECLARE x DOUBLE; DECLARE cur1 CURSOR FOR SELECT col1 FROM t1; OPEN cur1; LOOP FETCH cur1 INTO x; SET count:=count+1; SET sum:=sum+x; END LOOP; END
This is a regular standard non-aggregate function that returns an average of all values in a column 'col1' of the table 't1'. The cursor idea of a syntax for an aggregate function uses the same familiar logic:
CREATE AGGREGATE FUNCTION avg(x DOUBLE) RETURNS DOUBLE BEGIN DECLARE count INT DEFAULT 0; DECLARE sum DOUBLE DEFAULT 0; DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' RETURN sum/count; LOOP FETCH GROUP NEXT ROW; SET count:=count+1; SET sum:=sum+x; END LOOP; END
This looks simple, familiar, natural, leaves no artefacts. What I don't like is that a function's declared argument is not used directly. Looks a bit artificial :(
The syntax for ORACLE I did not understand properly, would give it a try again.
Don't bother, it's an object oriented approach. An aggregate function is an *object*. Object has different methods for result and for getting updated. Object data fields used to store the intermediate state. This is quite natural, if the DBMS supports objects. But we don't.
For SYBASE I have been going through:
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc0...
Okay, so "External C/C++."
For SQL server https://msdn.microsoft.com/en-in/library/ms182741.aspx
Same, "implementation is defined in a class of an assembly in the .NET Framework"
DB2 does not support user-defined aggregate functions
http://www.ibm.com/developerworks/data/library/techarticle/0309stolze/0309st...
Right.
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi, Varun! On Mar 04, Varun Gupta wrote:
With the syntax more or less clear to us, can we have a discussion about how we have to go on with the implementation for the aggregate functions. Firstly I thought I should look at the implementation of CREATE FUNCTION, any other suggestions ?
Sure, why not. But, as you've seen, that suggested syntax is almost identical to the standard one, so hopefully it'll need very little changes to the current CREATE FUNCTION implementation. More important would be to look at the function execution (not definition). This mainly happens in sql/sp_head.cc but other sql/sp_* files are also used. This cursor-like syntax introduces an new mode of executing stored routines. Currently, a routine (a procedure or a function) is executed completely from the beginning to the end. But with this cursor-like syntax, the server will need to start executing a stored function, stop at the FETCH statement, postpone the execution, then later continue from that point. There is no code to do it now. Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi Sergei, I understand that having a cursor like syntax we would have to write some new code . What I would have liked to know is that the more details should be discussed which need to be incorporated in the implementation of the aggregate functions. Basically what I am asking is what all details have to be there in the implementation of the aggregate functions. On Fri, Mar 4, 2016 at 10:21 PM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Varun!
On Mar 04, Varun Gupta wrote:
With the syntax more or less clear to us, can we have a discussion about how we have to go on with the implementation for the aggregate functions. Firstly I thought I should look at the implementation of CREATE FUNCTION, any other suggestions ?
Sure, why not. But, as you've seen, that suggested syntax is almost identical to the standard one, so hopefully it'll need very little changes to the current CREATE FUNCTION implementation.
More important would be to look at the function execution (not definition). This mainly happens in sql/sp_head.cc but other sql/sp_* files are also used.
This cursor-like syntax introduces an new mode of executing stored routines. Currently, a routine (a procedure or a function) is executed completely from the beginning to the end. But with this cursor-like syntax, the server will need to start executing a stored function, stop at the FETCH statement, postpone the execution, then later continue from that point. There is no code to do it now.
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi, Varun! On Mar 06, Varun Gupta wrote:
Hi Sergei, I understand that having a cursor like syntax we would have to write some new code . What I would have liked to know is that the more details should be discussed which need to be incorporated in the implementation of the aggregate functions. Basically what I am asking is what all details have to be there in the implementation of the aggregate functions.
I don't think I understand your question. But anyway to implement aggregate stored functions one would need 1. fix the parser to support the new syntax 2. fix the saving/loading code to store additional attributes (like, store the fact that the function is aggregate) 3. fix stored function execution code to be able to pause and resume the execution 4. create an Item_sum_sp to be able to use aggregate stored functions in queries. optional: 5. add support for window functions Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi Sergei, Thanks that answers my question, I will soon get back with questions . On Sun, Mar 6, 2016 at 2:52 PM, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Varun!
On Mar 06, Varun Gupta wrote:
Hi Sergei, I understand that having a cursor like syntax we would have to write some new code . What I would have liked to know is that the more details should be discussed which need to be incorporated in the implementation of the aggregate functions. Basically what I am asking is what all details have to be there in the implementation of the aggregate functions.
I don't think I understand your question. But anyway to implement aggregate stored functions one would need 1. fix the parser to support the new syntax 2. fix the saving/loading code to store additional attributes (like, store the fact that the function is aggregate) 3. fix stored function execution code to be able to pause and resume the execution 4. create an Item_sum_sp to be able to use aggregate stored functions in queries.
optional:
5. add support for window functions
Regards, Sergei Chief Architect MariaDB and security@mariadb.org
Hi Sergei, I have gone through the code you suggested. I have got a good understanding of what the code actually does. So now I wanted to start writing the proposal for the idea. Can you tell me what all is needed to written in the proposal. I am asking for a template or something that could guide me to write the proposal. Regards, Varun
Hi, Varun! On Mar 19, Varun Gupta wrote:
Hi Sergei, I have gone through the code you suggested. I have got a good understanding of what the code actually does. So now I wanted to start writing the proposal for the idea. Can you tell me what all is needed to written in the proposal. I am asking for a template or something that could guide me to write the proposal.
We don't have a template, but other organizations too, google "gsoc student template". Also there are many tutorials and guides about writing a proposal. And, of course, there's the official GSoC student manual: http://write.flossmanuals.net/gsocstudentguide/what-is-google-summer-of-code... (it's also linked from the main GSoC site, link "HELP" at the bottom). Regards, Sergei Chief Architect MariaDB and security@mariadb.org
participants (2)
-
Sergei Golubchik
-
Varun Gupta