[Maria-discuss] The future of Dynamic Columns
I've been excited about the potential uses of Maria's Dynamic Columns (MDC) since I first read about them. Now I'm starting development of a new application and want to use MDC. It's a server-side PHP webapp using the Yii 2.0 MVC framework. These days, for typical webapp pages, I seldom write SQL and use the ORM instead. So I started writing an extension to the Yii 2.0 Active Record ORM to provide an abstract interface to read and write structured dynamic attributes, as well as use them in queries. I'm trying to do it so that I can make the same API work with PostgreSQL jsonb and perhaps other back ends. While this is very interesting work, I have developed a vague, uncomfortable feeling about MDC. I have the sense that this wickedly powerful feature is suffering from atrophy and possibly apathy. Contributing to this sense:

1. I haven't been able to discuss MDC with anyone who uses it. (If you do, contacts below.)
2. In at least two places (one was here <https://mariadb.com/kb/en/mariadb/indexoptimizations-in-dynamic-columns/>) I've read text with the implied meaning, "we may improve such-and-such about MDC if and when people start actually using it."
3. The MDC syntax is so clunky I worry that people who might find it useful are turned away.
4. Did the 5.3 version with numbers for column names turn people off MDC for good?
5. The requirement to specify a datatype when querying a dynamic column is agonizing. The manual says it is a limitation of SQL itself (presumably immutable) but PostgreSQL manages to avoid it when querying jsonb doc properties.
6. The lack of indexing of dynamic columns is a severe limitation, but PostgreSQL manages to index jsonb document properties.

And then...

7. I heard a rumor that some kind of JSON features are coming to Maria.

So, do Maria's Dynamic Columns have a future? Are my fears justified that MDC will be neglected while some better solution to the same problem receives all the attention? Basically, I'm trying to gauge the risk vs. return in my investment in developing the ORM extension I mentioned at the top.

Tom
fsb@thefsb.org
tom[] on Freenode
I use dynamic columns without problems. They don't have a grand future as such, because they are essentially just a set of functions to encode/decode data; in other words, the future is today's present plus some new features to handle data more easily or to optimize space, CPU, or memory use.

Make sure you are using a suitable encoding system. Dynamic columns are an encoding system, like JSON, like PHP serialize/unserialize, like base64 encode/decode, each with its own pros and cons. When using an encoding system, check whether the CPU and data size costs are acceptable for your problem. For example, the integer 1024 can be saved as a 4-byte string or as a 2-byte integer; over 1,000,000 rows that saves 2,000,000 bytes (allowing better CPU/memory use). Now think about encoding a 64-bit integer as a string versus storing it in a BIGINT column, multiplied by 1M rows...

The problem with encoded data isn't only the encoding; the problem is how the database searches the encoded values. We don't have indexes over functions, and the optimizer doesn't rewrite virtual columns and functions (yet). That's probably something document-oriented databases do very well: searching inside documents.

--- Some ideas...

1) Specific index idea: we could have a dedicated dynamic-columns index, instead of a btree index, something like what full-text indexes do.

2) General idea: if we could have INDEX(some_function(field)), and the optimizer could resolve "some_function(field) = 'xxxx'" using that index, this would solve many problems of dynamic columns and encoded data in the database. With more complexity we could include some index optimizations for functions and "rewrite" the queries.

Examples...

2.1) The optimizer should be intelligent enough to rewrite expressions that match a virtual column's defining function. Say we have virtual_column_indexed = SUBSTRING(column,1,20) and we execute:

WHERE SUBSTRING(column,1,20)='1234' <- full table scan + a function call per row (is SUBSTRING CPU/memory intensive?)

The optimizer should rewrite this to:

WHERE virtual_column_indexed = '1234' <- uses the index without executing SUBSTRING per row (less CPU/memory used)

Or, if the CPU time spent in SUBSTRING is very high and we have virtual_column_not_indexed = SUBSTRING(column,1,20), the optimizer could rewrite to:

WHERE virtual_column_not_indexed = '1234' <- full table scan, but without executing SUBSTRING per row

2.2) Functions themselves should be indexable (deterministic functions only). When we create an index over a function, we shouldn't need a new column. Think about MyISAM storage: we don't need the data in the MYD file, only in the MYI file. Function indexes would be roughly "indexed hidden virtual columns". When we execute:

WHERE SUBSTRING(column,1,20)='1234' <- full table scan + CPU/memory cost of SUBSTRING per row

the optimizer should rewrite to:

WHERE internal_hidden_column = '1234' <- internal index

2.3) Functions could expose rewrite hints. For example, when we execute WHERE SUBSTRING(any_column,1,20)='1234', we can rewrite it using some understanding of SUBSTRING(value, start, length): we know it returns the same string, starting at position start, with at most the given length. In other words, if we want to 'optimize' (or not) this function, we could rewrite to:

WHERE any_column LIKE '1234%' AND <-- (1) possibly uses a column index
length(any_column) <= 20 AND <-- (2) a length check is 'faster' than comparing strings
SUBSTRING(any_column,1,20)='1234' <-- (3) the original function as a final check, executed only if (1) and (2) are true

If many rows fail at (3) anyway, (1) and (2) still prune cheaply; if any_column is indexed, the LIKE prefix lookup is really fast. But if most rows pass (1) and (2), we have just added CPU time for two extra checks, and the plain SUBSTRING alone would be faster (one function instead of three). The same goes for WHERE SUBSTRING(any_column,5,20)='1234', which could be rewritten (the match starts at position 5, so four arbitrary leading characters) to:

WHERE any_column LIKE '____1234%' AND length(any_column) <= 20 AND SUBSTRING(any_column,5,20)='1234'

2.4) If we have a very CPU-intensive function, the SELECT part could be optimized too:

SELECT expensive_function() FROM table WHERE <single condition>

Today we can optimize this with virtual columns: define virtual_column = expensive_function(), and write SELECT virtual_column instead of SELECT expensive_function(). But if the optimizer is intelligent, why can't it rewrite the query itself?

----

Note that none of this is a limitation of dynamic columns, only of the optimizer/indexes and of rewriting queries to search faster when functions are involved. I don't know whether there are other ideas we could include to tune/optimize the database.
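[Idea 2.1 can already be approximated by hand in MariaDB 5.2+ with a PERSISTENT virtual column, which is indexable; what's missing is only the automatic rewrite. A sketch with illustrative names — queries must reference the virtual column explicitly, since the optimizer does not rewrite SUBSTRING(doc,1,20) to use it:]

```sql
CREATE TABLE t (
  doc TEXT,
  -- materialize the function result; PERSISTENT columns can be indexed
  doc_prefix VARCHAR(20) AS (SUBSTRING(doc, 1, 20)) PERSISTENT,
  INDEX (doc_prefix)
);

-- indexed lookup, no per-row SUBSTRING call:
SELECT * FROM t WHERE doc_prefix = '1234';
```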
On 12/26/14 I posted this question to this mailing list: *Dynamic columns support in libmaria client library?*

This page https://mariadb.com/kb/en/mariadb/documentation/nosql/dynamic-columns-api/ does not mention anything about it. It just says: "The API is a part of the libmysql C client library. In order to use it, one needs to include this header file: #include <mysql/ma_dyncol.h> and link against libmysql."

Now, can ma_dyncol.h be linked against libmaria as well? There are 2 questions actually:
1) technical: will it work with libmaria?
2) legal: can a client program using libmaria under the LGPL license include ma_dyncol.h?

No reply yet. Very bad! :-( :-(

-- Peter
-- Webyog

On Tue, Jan 27, 2015 at 5:44 PM, Roberto Spadim <roberto@spadim.com.br> wrote:
[skip]
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hi! On 27.01.15 16:39, Tom Worster wrote: [skip]
2. In at least two places (one was here <https://mariadb.com/kb/en/mariadb/indexoptimizations-in-dynamic-columns/>) I've read text with the implied meaning, "we may improve such-and-such about MDC if and when people start actually using it."
True. We can't develop features if there is no money or interest in them. We have some plans but have to be sure before implementing them.
3. The MDC syntax is so clunky I worry that people who might find it useful are turned away.
It is just functions; there is no additional syntax involved.
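[For readers who haven't seen that function-based interface, a minimal sketch — the table and attribute names are invented, but COLUMN_CREATE, COLUMN_ADD, COLUMN_GET, and COLUMN_EXISTS are the documented MariaDB dynamic-column functions:]

```sql
CREATE TABLE item (id INT PRIMARY KEY, attrs BLOB);

-- write: pack name/value pairs into the blob
INSERT INTO item VALUES (1, COLUMN_CREATE('color', 'red', 'size', 42));

-- update: add or replace one attribute
UPDATE item SET attrs = COLUMN_ADD(attrs, 'weight', 1.5) WHERE id = 1;

-- read: the type must be named on every access
SELECT COLUMN_GET(attrs, 'size' AS INT)
FROM item
WHERE COLUMN_EXISTS(attrs, 'color');
```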
6. The lack of indexing of dynamic columns is a severe limitation but PostgreSQL manages to index jsonb document properties.
There are plans for functional indexes for dynamic columns and for virtual columns, but plans have to be backed by money or people.
And then...
7. I heard a rumor that some kind of JSON features are coming to Maria.
All the plans I know about connect JSON to dynamic columns (as a storage format for JSON). But I have to say that these plans have no dates or versions yet.
So, do Maria's Dynamic Columns have a future?
Yes.
Are my fears justified that MDC will be neglected as some better solution to the same problem receives all the attention?
If you or somebody else makes something better for MariaDB, then yes, it is possible that Dynamic Columns will be neglected, but there is no such threat now. [skip]
Hi Oleksandr,

Thank you for the response. It helps me make my decision by narrowing the options.

I cannot help with money. My company doesn't have any (really! our market is non-commercial radio stations) but if the ORM extension turns out well then perhaps it helps some more people to use MDC. But I am a little sad to have confirmation that MDC will receive some love if and only if interest/ROI can be proven.

It's weird. To webapp developers such as me (and I hang out on IRC with lots of others), MDC provides an excellent solution to an important and difficult problem that a lot of us face quite frequently. It provides what is effectively a hybrid of a relational SQL DBMS with a structured document store -- exactly the kind of hybrid we need for things like user or client profiles, account configurations, product tables and other such everyday things that are hard to map to tables. So why the lack of interest? I think this is a technical marketing problem.

I can try to do my bit from my little corner of the web dev community if I can do a decent job with the ORM extension I'm working on. If successful, maybe some popular ORMs will follow the idea.

Could the Maria community maybe do more to "sell" MDC as it currently exists? I really don't know. But nicely written examples, howtos and case studies are often a good way to promulgate new stuff like this.

For what it's worth, here's my work at present: https://github.com/tom--/dynamic-ar . I share it for the README, which documents motivation and may be of interest here. The implementation is very early, rough and wrong.

Tom

On 1/27/15, 5:18 PM, "Oleksandr Byelkin" <sanja@montyprogram.com> wrote:
[skip]
Hi, Tom! On 28.01.15 18:04, Tom Worster wrote:
Hi Oleksandr,
Thank you for the response. It helps me make my decision by narrowing the options.
I cannot help with money. My company doesn't have any (really! our market is non-commercial radio stations) but if the ORM extension turns out well then perhaps it helps some more people to use MDC.
Money is good, but ideas also count. :)
But I am a little sad to have confirmation that MDC will receive some love if and only if interest/ROI can be proven.
It's weird. To webapp developers such as me (and I hang out on IRC with lots of others) MDC provides an excellent solution to an important and difficult problem that a lot of us face quite frequently. It provides what is effectively a hybrid of a relational SQL DBMS with a structured document store -- exactly the kind of hybrid we need for things like user or client profiles, account configurations, product tables and other such everyday things that are hard to map to tables. So why the lack of interest? I think this is a technical marketing problem.
If there is interest, it should somehow show itself, and this letter actually counts, too. But ideas and requests about what people really lack or need are also much appreciated.
I can try to do my bit from my little corner of the web dev community if I can do a decent job with the ORM extension I'm working on. If successful, maybe some popular ORMs will follow the idea.
Could the Maria community maybe do more to "sell" MDC as it currently exists? I really don't know. But nicely written examples, howtos and case studies are often a good way to promulgate new stuff like this.
For what it's worth, here's my work at present https://github.com/tom--/dynamic-ar . I share it for the README which documents motivation, which may be of interest here. The implementation is very early, rough and wrong.
Thank you a lot, that is what I meant in the previous paragraph! [skip]
Hi, Tom! On Jan 27, Tom Worster wrote:
2. In at least two places (one was here <https://mariadb.com/kb/en/mariadb/indexoptimizations-in-dynamic-columns/>) I've read text with the implied meaning, "we may improve such-and-such about MDC if and when people start actually using it."
While I know that there are users of dynamic columns, I don't remember seeing any feature requests for them. Okay, there was one, for JSON-like arrays, but it depends on having an array data type, like in regular columns. Not a standalone feature for dynamic columns that we can just go on and implement.
3. The MDC syntax is so clunky I worry that people who might find it useful are turned away.
That could well be. And it's bad. What is worse, they're turned away without complaining, so we don't know what to do to fix dynamic columns. Any suggestions for how we can make the syntax less clunky?
4. Did the 5.3 version with numbers for column names turn people off MDC for good?
As far as I have heard, those people who wanted dynamic columns in the first place found the implementation with numbers quite sufficient for their needs. But new users probably didn't, I know :)
5. The requirement to specify datatype when querying a dynamic column is agonizing. The manual says it is a limitation of SQL itself (presumably immutable) but PostgreSQL manages to avoid it when querying jsonb doc properties.
We could work around this requirement by assuming everything is a string, for example. This can be implemented, if that's what users want.
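[Something close to this already works today, since COLUMN_GET converts the stored value to the requested type on retrieval; a sketch with hypothetical names — an implicit-string default would simply make the AS CHAR annotation unnecessary:]

```sql
-- 'size' was stored as an integer, but can be read back as a string:
SELECT COLUMN_GET(attrs, 'size' AS CHAR) FROM item;
```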
6. The lack of indexing of dynamic columns is a severe limitation but PostgreSQL manages to index jsonb document properties.
Right, I know. We've heard this feature request a few times already. That's why - no promises! - this feature is currently being discussed for 10.2.
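[For comparison, the PostgreSQL capability Tom refers to looks like this, assuming a jsonb column named doc on table t — a single GIN index covers containment queries over all properties:]

```sql
CREATE INDEX t_doc_gin ON t USING GIN (doc);

-- served by the GIN index:
SELECT * FROM t WHERE doc @> '{"color": "red"}';
```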
7. I heard a rumor that some kind of JSON features are coming to Maria.
I wonder what exactly you've heard and where :) There are many different ways of implementing "JSON features"; some of them actually use dynamic columns as the underlying storage format. Currently there is no well-defined JSON task in mariadb.org/jira, so nobody knows what exactly will be implemented yet. But now is the time to affect the 10.2 roadmap. What "JSON features" would you like to see in MariaDB?

Regards, Sergei
I will give some ideas; some could be stupid, but they're just ideas...

1) It could be nice to have a JSON notation for result data, for example:

SELECT {'some array':'some value', column1:column2} AS column1 FROM some_table

The result of {} is a utf8 string, but in the well-known JSON format. The SQL server just builds it and returns it to the client as a string; at the client side we can include a JSON feature to auto-translate it or not. Internally we could use the MDC format, for example...

2) SELECT dynamic_column AS column1 FROM some_table <- that's the main problem

It could return the JSON representation of the dynamic column. How? I don't know... maybe include a flag on the column (in CREATE TABLE) marking whether the column value is a dynamic column or not? Or, at the client side, a function to translate it. Note that in this example sending the raw MDC format to the client isn't a "good" idea (since JSON is human readable and the client side knows better how to use JSON), but maybe at the storage level MDC is smaller than JSON; must think about it...

3) INSERT INTO table (string_column) VALUES ({'some_array':'some value'});

In this example {} is auto-translated to MDC.

Other problems...

WHERE column = {'a':'b','c':'d'} <- auto-translate to an MDC string? Note that we will have problems, since {'c':'d','a':'b'} !== {'a':'b','c':'d'}. Maybe we should include a new operator? === <- identical, == / = <- equal? The difference is the order of object keys.

4) WHERE column LIKE {'some_array':'abc'} <- what should be done? A JSONPath search?! Maybe we should include a new function SEARCH: WHERE column SEARCH {'some_array':'abc'}?

2015-01-28 13:16 GMT-02:00 Sergei Golubchik <serg@mariadb.org>:
[skip]
-- Roberto Spadim SPAEmpresarial - Software ERP Eng. Automação e Controle
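[On the key-order question above: PostgreSQL resolved it by normalizing jsonb on input, so plain equality there already behaves like the proposed order-insensitive "==", while comparing the raw text behaves like the order-sensitive "===". A sketch (PostgreSQL syntax):]

```sql
-- jsonb normalizes key order, so = is order-insensitive:
SELECT '{"a":"b","c":"d"}'::jsonb = '{"c":"d","a":"b"}'::jsonb;  -- true

-- the raw text representations still differ:
SELECT '{"a":"b","c":"d"}' = '{"c":"d","a":"b"}';                -- false
```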
This discussion reminds me of 6-8 years back when XML was introduced in MySQL ("LOAD XML", "mysqldump --xml", XML functions such as ExtractValue(), etc.). It was a *big hype* at the time it was introduced. Now it seems to be an almost forgotten feature used by close to nobody.

What should a JSON option be used for? Passing data series to javascript/HTML5 would be relevant (?). But there are probably many more use cases - including sharing data with NoSQL systems (MongoDB etc.). And each use case may have its own requirements.

I think the first thing to define is the *audience* and *target*/*purpose* for such a feature, and to discuss external interfaces rather than internals at this point.

-- Peter

On Wed, Jan 28, 2015 at 4:47 PM, Roberto Spadim <roberto@spadim.com.br> wrote:
[skip]
Some points inline below. 2015-01-28 14:06 GMT-02:00 Peter Laursen <peter_laursen@webyog.com>:
This discussion reminds me of 6-8 years back when XML was introduced in MySQL ("LOAD DATA XML, "mysqldump --xml", XML functions (ExtractValue etc.). It was a *big hype* at the time it was introduced. Now it seems to be an almost forgotten feature used by close_to_nobody.
Yes, I remember that, but it's in the past now... we have the CONNECT storage engine =) working with XML and JSON =)
What should a JSON option be used for?
I think it's a nice human-readable format for arrays/objects, nothing less, nothing more. Instead of ARRAY('blablabla') and OBJECT('blablabla'), an easier-to-interpret notation is ['blablabla'] and {'blablabla'}. Internally, I don't know whether this would mean a very big job in the parser.

Passing data series to javascript/HTML5 would be relevant (?).
I think not... I use dynamic columns for a historic data log. Example:

create table history (date_time datetime not null, `values` blob, primary key (date_time));

Into `values` I insert data with dynamic columns, instead of creating something like:

create table history (date_time datetime not null, `key` varchar(255) not null, `value` blob, primary key (date_time, `key`));

I don't use dynamic columns for document search, since we don't have an index. When I need that in a log, I create a specific table or a specific column:

create table history (date_time datetime not null, key_1 varchar(255) not null, key_2 varchar(255), primary key (date_time), index (key_1, date_time));

But there are probably many more use cases - including sharing data with NoSQL systems (MongoDB etc.).
yeap, but i think it's more related to storage engines... maybe just the array/object syntax would be nice to support in the SQL language
And each use case may have its own requirements. I think the first thing to define is the *audience* and *target*/*purpose* for such a feature, and to discuss external interfaces rather than internals at this point.
agree
-- Peter
Hi Sergei, Your reply complements Oleksandr's and is welcome. Your remarks about the tepid user response to Dynamic Columns make me want to reiterate the comments about better marketing in my reply to Oleksandr. But you make it even more clear that the flip side is very important: users need to show their love. I'll forge ahead with the ORM extension and perhaps we can generate some users in the Yii community. Maybe then we can write up something for mariadb.com. Maybe we can even make Rails and Doctrine users covet :-) Some more responses inline below... On 1/28/15, 10:16 AM, "Sergei Golubchik" <serg@mariadb.org> wrote:
That could as well be. And it's bad. What is worse - they're turned away without complaining so we don't know what to do to fix dynamic columns.
Any suggestions how we can make the syntax less clunky?
I had to invent a "dynamic attribute token" notation for the ORM to make it work in general. It might be of interest: https://github.com/tom--/dynamic-ar#dynamicactivequery For the SQL, what PostgreSQL did looks reasonable, although I haven't used it in anger yet. Here's a brief intro: http://clarkdave.net/2013/06/what-can-you-do-with-postgresql-and-json/
5. The requirement to specify datatype when querying a dynamic column is agonizing. The manual says it is a limitation of SQL itself (presumably immutable) but PostgreSQL manages to avoid it when querying jsonb doc properties.
We could work around this requirement by assuming everything is a string, for example. This can be implemented, if that's what users want.
This is better than nothing but perhaps doesn't add much. In the context of a webapp, CHAR will very often be the correct choice. In my design, I use CHAR as the default unless specified otherwise, as described in the README I linked above. Another possibility is to provide a function that returns a string that is an SQL representation of the dynamic column's value and its datatype. It would always involve a cast. If you were to use this string verbatim in an insert, you'd get exactly the same thing. The DB abstraction layer has the option to map datatypes.
7. I heard a rumor that some kind of JSON features are coming to Maria.
I wonder what exactly you've heard and where :)
On Jan 24th, I was told "json support is in the road map" in #maria on Freenode. Nothing more.
But now is the time to affect the 10.2 roadmap. What "JSON features" would you like to see in MariaDB?
None! Not if I'm going to be working on this :P Tom
i think a good start is "expression indexes", and after that JSON syntax in the SQL language, and some new functions to handle json/etc.

CREATE INDEX name ON table ( some_function(field) )

maybe dynamic column functions, or json functions? or any deterministic function (for example, the NOW() function can't be used), and use it in WHERE clauses:

WHERE some_function(field) = '1234' <- uses index
WHERE some_function(table.field) = '1234'
WHERE some_function(field) IN ( sub select )
WHERE some_function(field) LIKE "blabla%"
WHERE some_function(field) LIKE "%blabla%"
WHERE some_function(field) LIKE "%blabla"
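The expression-index idea above can be illustrated in SQLite, which already supports indexes on deterministic expressions. This is only a sketch in another engine (table and index names are made up for the example), not MariaDB syntax:

```python
import sqlite3

# Expression-index sketch (SQLite 3.9+). Only deterministic expressions are
# allowed, matching the note above that NOW() can't be used.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE INDEX idx_lower_name ON t (lower(name))")
con.executemany("INSERT INTO t (name) VALUES (?)", [("BlaBla",), ("Other",)])

# A WHERE clause using the exact same expression can be served by the index.
rows = con.execute("SELECT id FROM t WHERE lower(name) = 'blabla'").fetchall()
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM t WHERE lower(name) = 'blabla'"
).fetchall()
print(rows)                           # [(1,)]
print("idx_lower_name" in str(plan))  # True: the planner picked the index
```

A similar effect is available in MariaDB today by indexing a persistent virtual column computed from the blob, which sidesteps the one-datatype-per-index problem discussed later in this thread.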
@Roberto. A comment to "i think it's a nice human readable format for arrays/objects, nothing less nothing more". But why will you then need to **store** as JSON? Could not something like "SELECT .. INTO JSON ..." do the trick? (with hindsight: "SELECT .. INTO XML ..." could have been implemented in MySQL when other XML functionalities were implemented but never was, it seems. You can mysqldump to XML but not "SELECT .. INTO XML ..." - using OUTFILE or not - refer http://dev.mysql.com/doc/refman/5.6/en/select-into.html) -- Peter On Wed, Jan 28, 2015 at 7:29 PM, Roberto Spadim <roberto@spadim.com.br> wrote:
While functions for getting and setting via json might be useful, I think they are peripheral to the core datatype problem here. The interface to dynamic columns in Maria is SQL. The internal datatypes are SQLish. We can be completely sure of what was set, in SQL terms, by a COLUMN_CREATE(). But if we use a JSON getter then we might get something different because JSON datatypes don't match SQL datatypes. Sometimes it's no problem but not in general -- depends on the datatypes involved and the application's tolerance of differences. What's missing is a getter that tells us the value and datatype of a dynamic column, a getter that tells us what was set. What if we had a getter that returns a string like: COLUMN_GET_SQL(`product`, 'price')
"CAST(123.456 AS DECIMAL(11,3))"
And Maria promises us that we could use that expression to write the value back and the result will be the exact same thing. A sort of idempotency promise (kinda). This doesn't make the interface any less clunky (on the contrary) but I have resigned myself to a clunky interface. The only alternative is to forget all this and start something different. SQL is an insanely clunky old language and that's why we use some kind of abstraction software to make it nicer. If we're going to use SQL we may as well try to use what's there. Something like this COLUMN_GET_SQL() would at least allow the application the possibility to get back what it put in. Tom
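A toy sketch of what the proposed getter could produce. This is purely illustrative: COLUMN_GET_SQL() does not exist in MariaDB, and the function below is a stand-in for the idea of returning a fragment that carries both value and datatype:

```python
# Hypothetical sketch of the proposed COLUMN_GET_SQL(): a getter returning
# an SQL fragment that, written back, reproduces exactly what was stored.
def column_get_sql(value: str, sql_type: str) -> str:
    return f"CAST({value} AS {sql_type})"

print(column_get_sql("123.456", "DECIMAL(11,3)"))  # CAST(123.456 AS DECIMAL(11,3))
```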
Hi, Tom! On 28.01.15 23:01, Tom Worster wrote:
While functions for getting and setting via json might be useful, I think they are peripheral to the core datatype problem here.
The interface to dynamic columns in Maria is SQL. The internal datatypes are SQLish. We can be completely sure of what was set, in SQL terms, by a COLUMN_CREATE(). But if we use a JSON getter then we might get something different because JSON datatypes don't match SQL datatypes. Sometimes it's no problem but not in general -- depends on the datatypes involved and the application's tolerance of differences.

AFAIK we lack only a boolean type and arrays (arrays could be emulated now, but you can't tell whether it is an array or an object with numeric keys) to fit the JSON types.
column_create/alter do not require a type (they detect it automatically), so the type there can be given to enforce some encoding you want (store dates as dates, not as strings). column_get() requires the type of the output; if you do not care, use CHAR. But this is required by the SQL expression implementation. There is JSON output, which does not need the types of columns.
What's missing is a getter that tells us the value and datatype of a dynamic column, a getter that tells us what was set.
What if we had a getter that returns a string like:
COLUMN_GET_SQL(`product`, 'price')
"CAST(123.456 AS DECIMAL(11,3))"
And Maria promises us that we could use that expression to write the value back and the result will be the exact same thing. A sort of idempotency promise (kinda).
This doesn't make the interface any less clunky (on the contrary) but I have resigned myself to a clunky interface. The only alternative is to forget all this and start something different. SQL is an insanely clunky old language and that's why we use some kind of abstraction software to make it nicer. If we're going to use SQL we may as well try to use what's there.
Something like this COLUMN_GET_SQL() would at least allow the application the possibility to get back what it put in.
I am not sure that I understand what you need. Is it the ability to know the internal storage type? [skip]
On 1/28/15, 5:41 PM, "Oleksandr Byelkin" <sanja@montyprogram.com> wrote:
Hi, Tom!
On 28.01.15 23:01, Tom Worster wrote:
While functions for getting and setting via json might be useful, I think they are peripheral to the core datatype problem here.
The interface to dynamic columns in Maria is SQL. The internal datatypes are SQLish. We can be completely sure of what was set, in SQL terms, by a COLUMN_CREATE(). But if we use a JSON getter then we might get something different because JSON datatypes don't match SQL datatypes. Sometimes it's no problem but not in general -- depends on the datatypes involved and the application's tolerance of differences.

AFAIK we lack only a boolean type and arrays (arrays could be emulated now, but you can't tell whether it is an array or an object with numeric keys) to fit the JSON types.
Numeric types don't always match very well. JSON, like JavaScript, has only one numeric type: number. Unfortunately it is the wrong type: float. In the JSON spec it is decimal with an exponent but in real life is nearly always an IEEE double on at least one side of the serialization. When you do exact numerical work, that's not acceptable. I suppose the database server has fulfilled its obligation if the decimal JSON representation of the number is, on read, an exactly equivalent representation of the number literal that was in the corresponding COLUMN_CREATE() expression.
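The mismatch is easy to demonstrate: a value that SQL DECIMAL stores exactly stops being exact once it passes through JSON's single number type. Python's json module is used here as a stand-in for any JSON (de)serializer:

```python
import json
from decimal import Decimal

exact = Decimal("0.1")                           # what DECIMAL storage keeps exactly
via_json = json.loads(json.dumps(float(exact)))  # JSON number -> IEEE double
print(via_json == 0.1)             # True: the text "0.1" appears to round-trip...
print(Decimal(via_json) == exact)  # False: the underlying double is not exactly 0.1
```

The decimal text survives the serialization, but the in-memory value on either side is a binary double, which is the "wrong type" problem described above.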
column_create/alter do not require a type (they detect it automatically), so the type there can be given to enforce some encoding you want (store dates as dates, not as strings)
Yes.
column_get() requires the type of the output; if you do not care, use CHAR. But this is required by the SQL expression implementation. There is JSON output, which does not need the types of columns.
Yes.
What's missing is a getter that tells us the value and datatype of a dynamic column, a getter that tells us what was set.
What if we had a getter that returns a string like:
COLUMN_GET_SQL(`product`, 'price')
"CAST(123.456 AS DECIMAL(11,3))"
And Maria promises us that we could use that expression to write the value back and the result will be the exact same thing. A sort of idempotency promise (kinda).
This doesn't make the interface any less clunky (on the contrary) but I have resigned myself to a clunky interface. The only alternative is to forget all this and start something different. SQL is an insanely clunky old language and that's why we use some kind of abstraction software to make it nicer. If we're going to use SQL we may as well try to use what's there.
Something like this COLUMN_GET_SQL() would at least allow the application the possibility to get back what it put in.
I am not sure that I understand what you need. Is it the ability to know the internal storage type?
No. When I use COLUMN_CREATE('x', ___ ) I provide in the ___ a literal that has, in its SQL context, both a value and a datatype. So I have a way to put something into the database that I cannot read back out. COLUMN_CREATE() is the SQL way to deposit (value, datatype) into the database. There is no corresponding SQL way to make the withdrawal of (value, datatype). My proposal was aimed at resolving that. My understanding is that we have to fix the datatype of what will be returned on a read before any data is read. So we have been talking about just fixing it as a string and encoding the (value, datatype) tuple into it. OK. But why a JavaScript literal? In documenting my ORM stuff, I wrote "In Maria the situation is worse. Data is saved via SQL and retrieved via JSON. This may seem perverse but..." And I end up with a lifecycle for a model in the ORM that's just strange: Start in PHP machine encoding. Convert to SQL and send to the DB. Request a JSON representation of the data from the DB. Convert the JSON back to machine encoding. PHP isn't doing anything weird. It uses very conventional representations just like in C. PHP -> SQL -> JSON -> PHP. There is exactly one too many data representations here. Do I explain myself any better now? Tom
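The lifecycle complaint can be sketched like this. The dict below is an illustrative stand-in for what a COLUMN_CREATE() expression captures, not a real client API: the SQL leg carries (value, datatype), but the JSON leg can only carry JSON's own types, so the datatype is dropped on the way back.

```python
import json

# (value, SQL datatype) as deposited by a COLUMN_CREATE(...) expression
stored = {"price": ("123.456", "DECIMAL(11,3)")}

# The JSON read path (e.g. COLUMN_JSON) can only return JSON types:
as_json = json.dumps({k: float(v) for k, (v, _t) in stored.items()})
back = json.loads(as_json)
print(back)  # {'price': 123.456} -- the DECIMAL(11,3) part did not survive
```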
Hi, Tom! On 29.01.15 01:12, Tom Worster wrote: [skip]
And Maria promises us that we could use that expression to write the value back and the result will be the exact same thing. A sort of idempotency promise (kinda).
This doesn't make the interface any less clunky (on the contrary) but I have resigned myself to a clunky interface. The only alternative is to forget all this and start something different. SQL is an insanely clunky old language and that's why we use some kind of abstraction software to make it nicer. If we're going to use SQL we may as well try to use what's there.
Something like this COLUMN_GET_SQL() would at least allow the application the possibility to get back what it put in.

I am not sure that I understand what you need. Is it the ability to know the internal storage type?

No.
When I use COLUMN_CREATE('x', ___ ) I provide in the ___ a literal that has, in its SQL context, both a value and a datatype. So I have a way to put something into the database that I cannot read back out. COLUMN_CREATE() is the SQL way to deposit (value, datatype) into the database. There is no corresponding SQL way to make the withdrawal of (value, datatype). My proposal was aimed at resolving that.
My understanding is that we have to fix the datatype of what will be returned on a reading before any data is read. So we have been talking about just fixing it as string and encode the (value, datatype) tuple into it. OK. But why a JavaScript literal?
In documenting my ORM stuff, I wrote "In Maria the situation is worse. Data is saved via SQL and retrieved via JSON. This may seem perverse but..." And I end up with a lifecycle for a model in the ORM that's just strange: Start in PHP machine encoding. Convert to SQL and send to DB. Request JSON representation of the data from the DB. Convert the JSON back to machine encoding. PHP isn't doing anything weird. It uses very conventional representations just like in C.
PHP -> SQL -> JSON -> PHP. There is exactly one too many data representations here.
Do I explain myself any better now?
So the problem is to get the same data back after conversion via JSON (right?). Data are stored with their type (not exactly the SQL type, but close; the encoding aims to keep data as compact as possible). But if we need Dynamic Columns -> JSON -> Dynamic Columns type conversion such that after conversion we have the same data as before (not internally, but from the point of view of external requests), it is doable, because column_get() also does type casting, so we can have exact results even if the representation differs. The problems of mapping JSON types back to columns that I can see:

1) numeric types (here we can store in the most compact way without losing precision), trying to use: unsigned -> signed -> decimal (with a limit on digits) -> double (for everything which cannot be represented by decimal). Dynamic columns do automatic conversion, so even if the number is requested another way (column_get()) we still get the same data, except floating point; but floating point should always be compared with precision taken into account (it is not recommended to compare a == b, but abs(a - b) < precision_delta).

2) date/time: JSON has no dates (AFAIK), so string is safe (but not compact; that is a disadvantage).

3) boolean/array: as I said, this is a problem, and here dynamic columns could be extended to support them, so that conversion to JSON and back would not change the data representation. [skip]
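The recommendation in point 1 is the standard epsilon comparison for floating point, e.g.:

```python
a = 0.1 + 0.2
b = 0.3
print(a == b)             # False: exact comparison of binary doubles fails
print(abs(a - b) < 1e-9)  # True: compare with a precision_delta instead
```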
Hi Oleksandr, I'm going to stop bitching about Maria now. My needs are in fact met by the current dyncol functions:

- It's not so hard to write methods that generate COLUMN_CREATE() expressions from a general data structure.
- COLUMN_JSON() is just fine for all my reading purposes. I see no need for a naked COLUMN_GET() in a SELECT.
- Generating COLUMN_GET() expressions for use in queries ends up being a regex replace with some packaging.
- As I said in the email to Frederico this morning, indexes are not unimportant, but a lot of apps can live without them and many will be fine.

When I wrote my original email, that list of seven concerns were factors in my vague worry that dyncols are going to remain ignored and unloved. That original email is adequately answered and I have the information I need. Tom
2015-01-29 11:57 GMT-02:00 Tom Worster <fsb@thefsb.org>:
Hi Oleksandr,
I'm going to stop bitching about Maria now. My needs are in fact met by the current dyncol functions:
- It's not so hard to write methods that generate CREATE_COLUMN() expressions from a general data structure.
- COLUMN_JSON() is just fine for all my reading purposes. I see no need for a naked COLUMN_GET() in a SELECT.
take care: when you return more data than you really need, you use more network, memory, cpu, cache, buffers... something like SELECT * FROM table when you only need SELECT column_a FROM table. with a big system or small hardware you may have problems
- Generating COLUMN_GET() expressions for use in queries ends up being a regex replace with some packaging.
take care with the size of the final query... for example SELECT COLUMN_GET(), COLUMN_GET(), COLUMN_GET()... 256 times FROM table. if your network is a problem, maybe it is better to SELECT column FROM table and execute the column_get at the client side; this reduces network usage and the memory used at the client and server side for the SQL string (query)
- As I said in the email to Frederico this morning, indexes are not unimportant but a lot of apps can live without and many will be fine.
When I wrote my original email, that list of seven concerns were factors in my vague worry that dyncols are going to remain ignored and unloved.
i like and use them =] but i don't love dynamic columns without an index; today i use workarounds to solve this
That original email is adequately answered and I have the information I need.
nice =]
Tom
I'm going to respond to your previous two emails with a generality: It is sometimes better to optimize the software development process than the software's data processing. If this were not true, I would not be using this ORM at all and nobody should. But the converse is also clear: sometimes the layers of indirection and abstraction that make the life of an application developer more comfortable and productive are not appropriate for the reasons you mentioned, among others. Knowing which of the many tools to choose is part of being a carpenter.
I have one final technical question. It is for my understanding and does not reflect a problem I have as a user.

The manual says:

> The answer is: SQL is a statically-typed language. The SQL interpreter needs to know the datatypes of all expressions before the query is run (for example, when one is using prepared statements and runs "select COLUMN_GET(...)", the prepared statement API requires the server to inform the client about the datatype of the column being read before the query is executed and the server can see what datatype the column actually has).

This appears to say that the constraint is that the server needs to determine the type of each column in the data sent to the client. SQL such as:

SELECT x, y, COLUMN_GET(stuff, 'z') FROM t

can't work because the datatype of the 3rd column returned to the client can't be determined before reading data.

I think I can follow this explanation. But I have no use for SQL like that (the naked COLUMN_GET, i.e. a COLUMN_GET that has no context). I use COLUMN_JSON() to load data into memory. It's a better fit for the AR ORM I use. I only need COLUMN_GET in the context of a larger expression, such as:

SELECT x, y FROM t WHERE COLUMN_GET(stuff, 'z') = 5

In this select, the datatypes for the returned data are clear before reading any data. So why do I have to explicitly cast the dyncol 'z'? Or what if there is no returned data at all:

DELETE FROM t WHERE COLUMN_GET(stuff, 'z') = 5

If the need to cast is to determine the type of data sent to the client, that argument surely doesn't apply here. It really wouldn't apply to this either:

SELECT x, y, CONCAT('Hello. My name is ', COLUMN_GET(stuff, 'z')) FROM t

If I understand correctly (doubtful) and the cast is only necessary for the naked COLUMN_GET, then isn't the cast in the wrong place? The cast should be put after the COLUMN_GET function to remove the nakedness. The server should throw an SQL error if it encounters a naked COLUMN_GET. In other words, COLUMN_GET can be used only as _part_of_ an expression.

Tom
2015-01-29 12:34 GMT-02:00 Tom Worster <fsb@thefsb.org>:
I have one final technical question. It is for my understanding and does not reflect a problem I have as a user.
The manual says:
The answer is: SQL is a statically-typed language. The SQL interpreter needs to know the datatypes of all expressions before the query is run (for example, when one is using prepared statements and runs "select COLUMN_GET(...)", the prepared statement API requires the server to inform the client about the datatype of the column being read before the query is executed and the server can see what datatype the column actually has).
This appears to say that the constraint is that the server needs to determine the type of each column in the data sent to the client. SQL such as:
SELECT x, y, COLUMN_GET(stuff, 'z') FROM t
can't work because the datatype of the 3rd column in "table" returned to the client can't be determined before reading data.
hum... no. at the 3rd column you have a function returning a value, and the function has a known data type. check this: http://dev.mysql.com/doc/refman/5.1/en/udf-calling.html - that's how user-defined functions are implemented in mysql. think about dynamic columns as many functions that handle a string and extract/include information about variables with a defined data type.

for example... when you create a dynamic column with:

-- MariaDB 5.3+:
INSERT INTO tbl SET dyncol_blob=COLUMN_CREATE(1 /*column id*/, "value");
-- MariaDB 10.0.1+:
INSERT INTO tbl SET dyncol_blob=COLUMN_CREATE("column_name", "value");

check: https://mariadb.com/kb/en/mariadb/dynamic-columns/#column_create

you have a STRING as the return of the function COLUMN_CREATE(). The return value is suitable for:

- storing in a table
- further modification with other dynamic columns functions

at the "value" parameter, COLUMN_CREATE knows that we are using a STRING, but it could be any other data type:

> The *as type* part allows one to specify the value type. In most cases, this is redundant because MariaDB will be able to deduce the type of the value. Explicit type specification may be needed when the type of the value is not apparent. For example, a literal '2012-12-01' has a CHAR type by default; one will need to specify '2012-12-01' AS DATE to have it stored as a date. See the Datatypes <https://mariadb.com/kb/en/mariadb/dynamic-columns/#Datatypes> section for further details.
I think I can follow this explanation. But I have no use for SQL like that (the naked COLUMN_GET, i.e. a COLUMN_GET that has no context). I use COLUMN_JSON() to load data into memory. It's a better fit for the AR ORM I use.
I only need COLUMN_GET in the context of a larger expression, such as:
SELECT x, y FROM t WHERE COLUMN_GET(stuff, 'z') = 5
this is an example of why we need indexes =)
In this select, the datatypes for the returned data are clear before reading any data.
hummm, in this example COLUMN_GET will check whether the stuff column has the 'z' key in each row; if it exists, it will get the value/data type, and if it does not exist it will return NULL and compare NULL = 5. check:

COLUMN_GET(dyncol_blob, column_nr as type);
COLUMN_GET(dyncol_blob, column_name as type);

> Get the value of a dynamic column by its name. If no column with the given name exists, NULL will be returned. *column_name as type* requires that one specify the datatype of the dynamic column they are reading. This may seem counter-intuitive: why would one need to specify which datatype they're retrieving? Can't the dynamic columns system figure the datatype from the data being stored? The answer is: SQL is a statically-typed language. The SQL interpreter needs to know the datatypes of all expressions before the query is run (for example, when one is using prepared statements and runs "select COLUMN_GET(...)", the prepared statement API requires the server to inform the client about the datatype of the column being read before the query is executed and the server can see what datatype the column actually has). See the Datatypes <https://mariadb.com/kb/en/mariadb/dynamic-columns/#datatypes> section for more information about datatypes.

THAT'S A BIG PROBLEM FOR EXPRESSION INDEXES! since COLUMN_GET could return STRING or DECIMAL or INTEGER or..., we need one single data type to create the index, instead of many data types + much data, since today an index in mysql/mariadb only accepts one datatype per index. for example, if we only have COLUMN_GET() returning INTEGER, well, let's create an INTEGER index... but if COLUMN_GET() returns STRING and/or INTEGER, what should we do? create a STRING index? i think yes... but note that big STRINGs are a problem to index; we have a limit, and today we can't create indexes over BLOB columns
So why do I have to explicitly cast the dyncol 'z'?
well, you don't need to, but if you want good and deterministic results you should cast
Or what if there is no returned data at all:
DELETE FROM t WHERE COLUMN_GET(stuff, 'z') = 5
COLUMN_GET should return NULL in this case; if the WHERE part returns true, the row will be deleted
If the need to cast is to determine the type of data sent to the client,
yes and no... mysql uses the data type internally too
that argument surely doesn't apply here. It really wouldn't apply to this either:
SELECT x, y, CONCAT('Hello. My name is ', COLUMN_GET(stuff, 'z')) FROM t
in this case, if COLUMN_GET returns INTEGER, the CONCAT function will cast it to string
If I understand correctly (doubtful) and the cast is only necessary for the naked COLUMN_GET, then isn't the cast in the wrong place?
the cast is for the COLUMN_GET return; each dynamic column could have keys/values different from other dynamic columns... think about a field where, in each row/column where you use dynamic columns, you effectively create a table
The cast should be put after the COLUMN_GET function to remove the nakedness.
well, from what i understand, the COLUMN_GET cast, as a parameter of the COLUMN_GET function, is to predict which COLUMN_GET function we should use. think about functions like this:

int function(parameter)
string function(parameter)

which function should be used in these cases?

function()+0 <- the int function(parameter)
function()+" " <- the string function(parameter)

if you don't include the cast, COLUMN_GET will return the data type based on the dynamic column information
The server should throw an SQL error if it encounters a naked COLUMN_GET.
well.. i don't know, but if you want an error, the error should be generated by COLUMN_CREATE, and that's a problem... each row/column has a dynamic column value, and the data type in one row doesn't need to be the same as in another row; in one row/column you could use 'z' as an integer, in another row/column you could use 'z' as a string
In other words, COLUMN_GET can be used only as _part_of_ an expression.
COLUMN_GET is a function, use it as a function =]
Tom
one point... for COLUMN_GET() in the SELECT part of a query, the mysql protocol needs one specific data type to send to the client side. in this example:

SELECT CONCAT('blabla', COLUMN_GET(....)), COLUMN_GET(....)

1st column: CONCAT returns string; column_get isn't a problem here.
2nd column: COLUMN_GET will return what? the mysql protocol must know what data type it will return.

http://dev.mysql.com/doc/internals/en/com-query-response.html http://dev.mysql.com/doc/internals/en/com-query-response.html#packet-Protoco...
Hi, Roberto! On 29.01.15 15:55, Roberto Spadim wrote:
2015-01-29 12:34 GMT-02:00 Tom Worster <fsb@thefsb.org <mailto:fsb@thefsb.org>>:
I have one final technical question. It is for my understanding and does not reflect a problem I have as a user.
The manual says:
> The answer is: SQL is a statically-typed language. The SQL interpreter needs to know the datatypes of all expressions before the query is run (for example, when one is using prepared statements and runs "select COLUMN_GET(...)", the prepared statement API requires the server to inform the client about the datatype of the column being read before the query is executed and the server can see what datatype the column actually has).
This appears to say that the constraint is that the server needs to determine the type of each column in the data sent to the client. SQL such as:
SELECT x, y, COLUMN_GET(stuff, 'z') FROM t
can't work because the datatype of the 3rd column in "table" returned to the client can't be determined before reading data.
hum... no at 3rd column, you have a function returning a value, the function have a know data type, check this: http://dev.mysql.com/doc/refman/5.1/en/udf-calling.html that's how user defined functions are implemented at mysql, think about dynamic column as many functions to handle a string and extract/include information about variables with defined data type
for example... when you create a dynamic column with: -- MariaDB 5.3+: INSERT INTO tbl SET dyncol_blob=COLUMN_CREATE(1 /*column id*/, "value"); -- MariaDB 10.0.1+: INSERT INTO tbl SET dyncol_blob=COLUMN_CREATE("column_name", "value"); check: https://mariadb.com/kb/en/mariadb/dynamic-columns/#column_create you have a STRING as return of function COLUMN_CREATE()
The return value is suitable for
- storing in a table
- further modification with other dynamic columns functions
at "value" parameter, COLUMN_CREATE know that we are using a STRING, but should be any other data type
The *as type* part allows one to specify the value type. In most cases, this is redundant because MariaDB will be able to deduce the type of the value. Explicit type specification may be needed when the type of the value is not apparent. For example, a literal '2012-12-01' has a CHAR type by default; one will need to specify '2012-12-01' AS DATE to have it stored as a date. See the Datatypes <https://mariadb.com/kb/en/mariadb/dynamic-columns/#Datatypes> section for further details.
You are wrong; with a UDF the function result type is known, because it goes something like:

mysql> CREATE FUNCTION metaphon RETURNS STRING SONAME 'udf_example.so';
mysql> CREATE FUNCTION myfunc_double RETURNS REAL SONAME 'udf_example.so';
mysql> CREATE FUNCTION myfunc_int RETURNS INTEGER SONAME 'udf_example.so';

( http://dev.mysql.com/doc/refman/5.1/en/udf-compiling.html )

The type just has to be sent to the client. Some functions require an argument of a certain type, so they ask the corresponding val*() method of the Item object, but the topmost function (or constant or field) determines the type of the column in the SELECT list, and the type should be known. So the previous statement about the 3rd column was very precise. [skip]
Hi Oleksandr!! :)

Hi, Roberto!
at the "value" parameter, COLUMN_CREATE knows that we are using a STRING, but it could be any other data type
The AS type part allows one to specify the value type. In most cases, this is redundant because MariaDB will be able to deduce the type of the value. Explicit type specification may be needed when the type of the value is not apparent. For example, a literal '2012-12-01' has a CHAR type by default; one will need to specify '2012-12-01' AS DATE to have it stored as a date. See the Datatypes <https://mariadb.com/kb/en/mariadb/dynamic-columns/#Datatypes> section for further details.
You are wrong; with a UDF the function result type is known, because it goes something like:
hum, nice, but COLUMN_CREATE should *always* return the string data type, right? and the second parameter of the function, in this case "value", is a parameter with value="value" and data type=string, right? i'm not an internals expert; i'm trying to clear up some doubts. check if my statements are right:

1) in the WHERE part of UPDATE/DELETE/SELECT, the data type is a problem for operators such as "=", because they need a cast before comparing... for example 3="03": the server must know whether to convert the integer to a string or the string to an integer. that's why the cast in COLUMN_GET is important.

2) in the SELECT part, the data type is a problem for the mysql protocol, because we must tell the client what type each column will have. here i don't know how COLUMN_GET should work... does it internally return the data type before reading data? does it read some data and afterwards report the data type being used? or does each row return a data type, and mysql internally buffers the SELECT result and identifies after the SELECT what data type should be used?

3) in UPDATE/INSERT/REPLACE, the data type is important so that storage doesn't truncate, overflow, or cause any other possible data loss
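Point 1 can be sketched as follows (table and column names are hypothetical):

```sql
-- The explicit AS INT tells the server how to compare the dynamic
-- column's value; without a declared type, "=" could not choose between
-- a numeric comparison (3 = 3) and a string comparison ('3' = '03').
SELECT id FROM tbl WHERE COLUMN_GET(dyncol_blob, 'qty' AS INT) = 3;
```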
mysql> CREATE FUNCTION metaphon RETURNS STRING SONAME 'udf_example.so';
mysql> CREATE FUNCTION myfunc_double RETURNS REAL SONAME 'udf_example.so';
mysql> CREATE FUNCTION myfunc_int RETURNS INTEGER SONAME 'udf_example.so';
hum, so if we have a function that could return INT or STRING or REAL, we must have 3 functions, right? i'm trying to find in the mariadb source code how COLUMN_GET is implemented. can an internal function return more than one data type while a UDF cannot? (i'm a beginner with internal functions)
( http://dev.mysql.com/doc/refman/5.1/en/udf-compiling.html )
The type just has to be sent to the client. Some functions require an argument of a certain type, so they ask the corresponding val*() method of the Item object, but the topmost function (or constant or field) determines the type of the column in the SELECT list, and the type should be known.
nice. in this case, CONCAT('abc', any_function()), the any_function could return any data type, but client/server must know what data type CONCAT returns, right?
So previous statement about 3rd column was very precise.
your comments make things easier to understand each day :) many thanks
Hi! On 30.01.15 00:03, Roberto Spadim wrote:
Hi Oleksandr!! :)
Hi, Roberto!
at the "value" parameter, COLUMN_CREATE knows that we are using a STRING, but it could be any other data type
The AS type part allows one to specify the value type. In most cases, this is redundant because MariaDB will be able to deduce the type of the value. Explicit type specification may be needed when the type of the value is not apparent. For example, a literal '2012-12-01' has a CHAR type by default; one will need to specify '2012-12-01' AS DATE to have it stored as a date. See the Datatypes <https://mariadb.com/kb/en/mariadb/dynamic-columns/#Datatypes> section for further details.
You are wrong; with a UDF the function result type is known, because it goes something like:
hum, nice, but COLUMN_CREATE should *always* return the string data type, right? and the second parameter of the function, in this case "value", is a parameter with value="value" and data type=string, right?
i'm not an internals expert; i'm trying to clear up some doubts. check if my statements are right:

1) in the WHERE part of UPDATE/DELETE/SELECT, the data type is a problem for operators such as "=", because they need a cast before comparing... for example 3="03": the server must know whether to convert the integer to a string or the string to an integer. that's why the cast in COLUMN_GET is important.

Yes, there are rules of type casting when the types are not the same, for expressions like this and even for SELECT ... UNION.

2) in the SELECT part, the data type is a problem for the mysql protocol, because we must tell the client what type each column will have. here i don't know how COLUMN_GET should work... does it internally return the data type before reading data? does it read some data and afterwards report the data type being used? or does each row return a data type, and mysql internally buffers the SELECT result and identifies after the SELECT what data type should be used?

Internally, during preparation, the server asks what type the function (or field or constant) returns "naturally".
3) in UPDATE/INSERT/REPLACE, the data type is important so that storage doesn't truncate, overflow, or cause any other possible data loss

It depends on SQL_MODE, so it can be a warning or an error if data can't be converted or would be lost.
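The SQL_MODE behaviour mentioned here can be sketched as (table and column names hypothetical):

```sql
-- In the default non-strict mode, a lossy conversion such as storing
-- 'abc' into an INT column produces a warning and stores 0; with strict
-- mode it becomes an error and the statement fails.
SET SESSION sql_mode = 'STRICT_ALL_TABLES';
INSERT INTO tbl (int_col) VALUES ('abc');  -- now rejected with an error
```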
mysql> CREATE FUNCTION metaphon RETURNS STRING SONAME 'udf_example.so';
mysql> CREATE FUNCTION myfunc_double RETURNS REAL SONAME 'udf_example.so';
mysql> CREATE FUNCTION myfunc_int RETURNS INTEGER SONAME 'udf_example.so';
hum, so if we have a function that could return INT or STRING or REAL, we must have 3 functions, right? i'm trying to find in the mariadb source code how COLUMN_GET is implemented. can an internal function return more than one data type while a UDF cannot? (i'm a beginner with internal functions)

All functions can internally return any internal type (integer (boolean), double, decimal, string (date/time)), but sometimes that causes an error or requires a conversion.
( http://dev.mysql.com/doc/refman/5.1/en/udf-compiling.html )
The type just has to be sent to the client. Some functions require an argument of a certain type, so they ask the corresponding val*() method of the Item object, but the topmost function (or constant or field) determines the type of the column in the SELECT list, and the type should be known.
nice. in this case, CONCAT('abc', any_function()), the any_function could return any data type, but client/server must know what data type CONCAT returns, right?

Yes, and it returns a string. Even more: during preparation it will try to guess the maximum possible length of the string. For example, all string manipulation functions return a string, but concat() can also return numeric types if the string can be converted; the problem is that data loss or errors are possible in this case (it depends on SQL_MODE, as I said).
So previous statement about 3rd column was very precise.
your comments make things easier to understand each day :) many thanks
you are welcome
Just wanted to add, any application that wants to store forms data without having to create a new table every time you create a new form could benefit from dynamic columns.

In the database design world, any design that is of the EAV (Entity-Attribute-Value) style would benefit from dynamic columns. Having to extract and analyze EAV tables, for the developer, is tricky and burdensome. This model seems to show up in the literature first in Medical settings and is used in basic science research as well. (see http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model)

For those that need it, dynamic columns is powerful and makes life much easier. Unfortunately, often those that need dynamic columns do not know it exists. If they did, they would jump for joy.

Adam

On Wed, Jan 28, 2015 at 3:41 PM, Oleksandr Byelkin <sanja@montyprogram.com> wrote:
Hi, Tom!
On 28.01.15 23:01, Tom Worster wrote:
While functions for getting and setting via json might be useful, I think they are peripheral to the core datatype problem here.
The interface to dynamic columns in Maria is SQL. The internal datatypes are SQLish. We can be completely sure of what was set, in SQL terms, by a COLUMN_CREATE(). But if we use a JSON getter then we might get something different because JSON datatypes don't match SQL datatypes. Sometimes it's no problem but not in general -- depends on the datatypes involved and the application's tolerance of differences.
AFAIK, to fit the JSON types we lack only a boolean type and arrays (arrays could be emulated now, but you can't tell an array from an object with numeric keys).
column_create/alter do not require a type (it is detected automatically), so a type can be given there to enforce an encoding you want (store dates as dates, not strings).
column_get() requires the type of the output; if you do not care, use CHAR. This is required by the SQL expression implementation. There is also JSON output, which does not need column types.
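The JSON output mentioned here is the COLUMN_JSON() function; a minimal sketch (table and column names hypothetical):

```sql
-- Returns the whole dynamic-column blob as one JSON document, so no
-- per-column type has to be declared in the query (MariaDB 10.0.1+).
SELECT COLUMN_JSON(dyncol_blob) FROM tbl;
```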
What's missing is a getter that tells us the value and datatype of a dynamic column, a getter that tells us what was set.
What if we had a getter that returns a string like:
COLUMN_GET_SQL(`product`, 'price')
"CAST(123.456 AS DECIMAL(11,3))"
And Maria promises us that we could use that expression to write the value back and the result will be the exact same thing. A sort of idempotency promise (kinda).
This doesn't make the interface any less clunky (on the contrary) but I have resigned myself to a clunky interface. The only alternative is to forget all this and start something different. SQL is an insanely clunky old language and that's why we use some kind of abstraction software to make it nicer. If we're going to use SQL we may as well try to use what's there.
Something like this COLUMN_GET_SQL() would at least allow the application the possibility to get back what it put in.
I am not sure that I understand what you need. Is it the ability to know the internal storage type?
[skip]
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
yeap, i agree. in my opinion... today, without an "expression index" to allow faster search (on attribute or value), it's not possible to use only dynamic columns, because we get a full table scan. today a second attribute-value table would be necessary in this case to get better performance.

what we can do today is a virtual column + index, but this consumes more space (data + index storage, instead of index storage only), and we must rewrite queries to use the virtual column.

2015-01-28 23:20 GMT-02:00 Adam Scott <adam.c.scott@gmail.com>:
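The virtual column + index workaround described above can be sketched like this (table and column names are hypothetical; in these MariaDB versions a computed column must be PERSISTENT, not purely virtual, to be indexed):

```sql
CREATE TABLE products (
  id    INT PRIMARY KEY,
  attrs BLOB,  -- dynamic-columns blob, written with COLUMN_CREATE()/COLUMN_ADD()
  -- materialize one dynamic column so it can be indexed; the value is
  -- stored twice (in attrs and in price), which is the extra space
  -- cost mentioned above
  price DECIMAL(11,3) AS (COLUMN_GET(attrs, 'price' AS DECIMAL(11,3))) PERSISTENT,
  KEY idx_price (price)
);

-- queries must then be rewritten to reference the computed column
-- so that the index is used:
SELECT id FROM products WHERE price < 100;
```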
Just wanted to add, any application that wants to store forms data without having to create a new table every time you create a new form could benefit from dynamic columns.
In the database design world, any design that is of the EAV (Entity-Attribute-Value) style would benefit from dynamic columns. Having to extract and analyze EAV tables, for the developer, is tricky and burdensome. This model seems to show up in the literature first in Medical settings and is used in basic science research as well. (see http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model)
For those that need it, dynamic columns is powerful and makes life much easier. Unfortunately, often those that need dynamic columns do not know it exists. If they did, they would jump for joy.
Adam
-- Roberto Spadim SPAEmpresarial - Software ERP Eng. Automação e Controle
Hi

I agree that EAV is a pain in the ass. Having several very similar tables that represent flavors of the same "parent entity" is a pain too. Dynamic Columns are logically simple.

I think that the problem is that people don't really know them. And if one tries to explore them, he finds a syntax that is not really easy. Also, they are slow unless you index them, but indexing a dynamic column is not easy; it also requires more space than a normal index, because you are supposed to create a persistent column first.

I don't have any suggestions, except that importing data from JSON would be easier and probably useful (something like a JSON_TO_DYNCOL function).

Regards
Federico
Hi Federico,

You identify three problems.

1. Developers are mostly ignorant of dynamic columns or that they can solve this problem. I agree.

2. Maria's interface is not easy. I agree, but my opinion has changed during this discussion. There's a limit to what the server can do about this. Also, the CRUD app developer is unlikely to be writing SQL. So I think we need to look farther up the stack for the solution(s).

3. No indexes. I partly agree. On one hand, if you really want to sell the feature to the broad user community then yes, this is an obvious hole in your marketing story. But the feature is still very useful without it. Many applications will function fine without indexes. The general problem (a type with too many properties) is old and has been solved in the past without indexes. Dynamic columns are a big step forward even without the indexes.

The way I see it now, we are complaining to the wrong people. We have to do some work on the client side (where exactly, I'm not sure) and we need better marketing.

Tom

On 1/29/15, 2:56 AM, "Federico Razzoli" <federico_raz@yahoo.it> wrote:
Hi
I agree that EAV is a pain in the ass. Having several very similar tables that represent flavors of the same "parent entity" is a pain too. Dynamic Columns are logically simple.
I think that the problem is that people don't really know them. And if one tries to explore them, he finds a syntax that is not really easy. Also they are slow, unless you index them, but indexing a dynamic column is not easy - also, requires more space than a normal index, because you are supposed to create a persistent column first.
I don't have any suggestions, except that importing data from JSON would be easier and probably useful (something like a JSON_TO_DYNCOL function).
Regards Federico
2015-01-29 11:38 GMT-02:00 Tom Worster <fsb@thefsb.org>:
Hi Federico,
You identify three problems.
1. Developers are mostly ignorant of dynamic columns or that they can solve this problem. I agree.
hum, i will point out some problems and ideas...

normally developers don't think about how data is saved and handled by the database / storage engine / storage media / cache, etc... that's something that not all developers take care of. today we easily have 1TB HDDs; some years ago 250GB was very much, and some years before that 4GB was very, very big (we couldn't use 32-bit machines with more than 4GB). with HDDs we had a problem with data access... no more: SSDs will save many guys, 250MB/s of random access is very good; some years ago we only had this with HDD+RAID systems.

dynamic columns consume more space but are more flexible. i think the use of dynamic columns is more related to developer knowledge... for example, let's talk about an issue tracker system where we don't know what we will expand. take the mariadb issue tracker example, Jira. when you create a new variable in the Jira system, does Jira use a table with "issue_id / variable / value" columns to save it, or does it use a dynamic column in the "issues" table? both work, but which is better? that's where json/yaml/xml/dynamic columns can help. today i see many, many developers using "issue_id / variable / value" because we don't have an index for fast queries. another point: with a single table (the issues table) we probably need more lock control; constantly updating one row is slower than updating just the table that we need, if you don't tune the database well...

i think users want options to solve problems. dynamic columns are a (very good) option, but without an index i don't think we can increase the use of dynamic columns. users will use dynamic columns, but when they get a table with >1M rows they will think: well, that query with a dynamic column takes 2 seconds to return, how could i optimize it? create a new field? create a table? we can't index it easily... that's the main problem that i see with dynamic columns
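The "issue_id / variable / value" design versus a dynamic column, as discussed above, can be sketched like this (table and column names are hypothetical):

```sql
-- EAV style: one row per attribute; easy to index, tedious to query
CREATE TABLE issue_attrs (
  issue_id INT,
  variable VARCHAR(64),
  value    VARCHAR(255),
  PRIMARY KEY (issue_id, variable)
);

-- dynamic-column style: all attributes in one blob on the issues table
CREATE TABLE issues (
  id    INT PRIMARY KEY,
  attrs BLOB  -- written with COLUMN_CREATE()/COLUMN_ADD()
);
```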
2. Maria's interface is not easy.
I think mariadb/mysql is easier to use than pgsql, oracle, sql server, odbc, and some others
I agree but my opinion has changed during this discussion. There's a limit to what the server can do about this. Also, the CRUD app developer is unlikely to be writing SQL.
yes, that's a big problem in my opinion; i prefer to write each sql query by hand and stress over sql injection tests
So I think we need to look farther up the stack for the solution(s).
that's a nice solution when you can use it :), but think about where to solve this problem... at the client side or at the server side? is your server better than your client (cpu/memory/network/cache/etc.)?
3. No indexes. I partly agree. On one hand, if you really want to sell the feature to the broad user community then yes, this is an obvious hole in your marketing story.
hum... well, many years ago i used postgresql, interbase, sql server and oracle... i prefer mysql because it's easy to use and fast at solving (simple) queries. with dynamic columns i don't see 'simple' queries running fast. if i really needed dynamic columns in a project where i couldn't live without them, i would use another database, or probably a virtual column or something to solve this, and use more disk/cache/buffer than i should
But the feature is still very useful without it.
yes! increasing the number of columns is a problem without online alter table; and when you have more than one dimension inside a dynamic column, a table with id-key-value can't handle it well
Many applications will function fine without indexes.
yeap
The general problem (a type with too many properties) is old and has been solved in the past without indexes. Dynamic columns are a big step forwards even without the indexes.
yes and no... dynamic columns today are nice, because we have space and big machines, but without an index we can't use them with big tables, >1M rows for example...
The way I see it now, we are complaining to the wrong people. We have to do some work on the client side (where exactly, I'm not sure) and we need better marketing.
well, if you want to handle data (convert to/from json, include a column, remove a column), it's probably nice to do something at the client side. but if you want to optimize queries, optimize space, and other things, you will need to improve the database side; an index is a good first step to optimize any table-scan query
Tom
On 1/29/15, 2:56 AM, "Federico Razzoli" <federico_raz@yahoo.it> wrote:
Hi
I agree that EAV is a pain in the ass. Having several very similar tables that represent flavors of the same "parent entity" is a pain too.
Dynamic Columns are logically simple.
yeap, agree
I think that the problem is that people don't really know them. And if one tries to explore them, he finds a syntax that is not really easy. Also they are slow, unless you index them, but indexing a dynamic column is not easy - also, requires more space than a normal index, because you are supposed to create a persistent column first.
yeap, agree
I don't have any suggestions, except that importing data from JSON would be easier and probably useful (something like a JSON_TO_DYNCOL function).
nice, that's an example of data handling, as opposed to storage optimization (a specific field type to save data with less space, or to save data in a form that is faster to query, like a geometry index) or query optimization (index + optimizer changes)
Regards Federico
Hi Adam,

Yes, application developers encounter that problem a lot.

Please read the Motivation section here: https://github.com/tom--/dynamic-ar/blob/master/README.md

If that describes the general problem adequately to you, maybe give it a star?

Tom

From: Adam Scott <adam.c.scott@gmail.com>
Date: Wednesday, January 28, 2015 at 8:20 PM
To: Oleksandr Byelkin <sanja@montyprogram.com>
Cc: Tom Worster <fsb@thefsb.org>, maria-discuss email list <maria-discuss@lists.launchpad.net>
Subject: Re: [Maria-discuss] The future of Dynamic Columns
Hi, Tom!
On 28.01.15 23:01, Tom Worster wrote:
While functions for getting and setting via json might be useful, I think they are peripheral to the core datatype problem here.
The interface to dynamic columns in Maria is SQL. The internal datatypes are SQL-ish. We can be completely sure of what was set, in SQL terms, by a COLUMN_CREATE(). But if we use a JSON getter then we might get something different back, because JSON datatypes don't match SQL datatypes. Sometimes that's no problem, but not in general; it depends on the datatypes involved and the application's tolerance for differences.

AFAIK, to cover the JSON types we lack only a boolean type and arrays (arrays can be emulated now, but then you can't tell an array from an object with numeric keys).
COLUMN_CREATE()/COLUMN_ADD() do not require a type (it is detected automatically), so a type there can be given to enforce a particular encoding you want (e.g. store dates as dates, not as strings).
COLUMN_GET() requires the type of the output; if you do not care, use CHAR. But this is required by the SQL expression implementation. There is also JSON output, which does not need column types.
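The two access styles described above can be illustrated like this (the "item" table and "attrs"/"price" names are invented for the example):

```sql
-- Typed access: COLUMN_GET() needs an AS <type> clause.
SELECT COLUMN_GET(attrs, 'price' AS DECIMAL(11,3)) FROM item;

-- Untyped fallback: CHAR works when you don't care about the SQL type.
SELECT COLUMN_GET(attrs, 'price' AS CHAR) FROM item;

-- JSON output: no per-column type needed, the whole blob is serialized.
SELECT COLUMN_JSON(attrs) FROM item;
```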
What's missing is a getter that tells us the value and datatype of a dynamic column, a getter that tells us what was set.
What if we had a getter that returns a string like:
COLUMN_GET_SQL(`product`, 'price')
"CAST(123.456 AS DECIMAL(11,3))"
And Maria promises us that we could use that expression to write the value back and the result will be the exact same thing. A sort of idempotency promise (kinda).
This doesn't make the interface any less clunky (on the contrary) but I have resigned myself to a clunky interface. The only alternative is to forget all this and start something different. SQL is an insanely clunky old language and that's why we use some kind of abstraction software to make it nicer. If we're going to use SQL we may as well try to use what's there.
Something like this COLUMN_GET_SQL() would at least allow the application the possibility to get back what it put in.
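The proposed round trip might look like this. COLUMN_GET_SQL() is hypothetical — it does not exist in MariaDB; this only sketches the behavior being requested, reusing the `product`/`price` example above:

```sql
-- Hypothetical: COLUMN_GET_SQL() returns an SQL expression string
-- that reproduces the stored value with its exact datatype.
SELECT COLUMN_GET_SQL(product, 'price') FROM item;
-- would return the string:  CAST(123.456 AS DECIMAL(11,3))

-- The promise: writing that expression back changes nothing.
UPDATE item
   SET product = COLUMN_ADD(product, 'price',
                            CAST(123.456 AS DECIMAL(11,3)));
-- leaves the stored value and its type exactly as before
```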
I am not sure that I understand what you need. Is it the ability to know the internal storage type?
[skip]
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
Hi Peter :) I will talk about what I have used and what I consider interesting to have. 2015-01-28 19:25 GMT-02:00 Peter Laursen <peter_laursen@webyog.com>:
@Roberto. A comment to "i think it's a nice human readable format for arrays/objects, nothing less nothing more".
But why will you then need to **store** as JSON?
About "store":
1) It's nice to be able to read data without any tool (function); it's easy to read an object or an array. YAML or another format could work too (I don't like XML because it consumes too much space).
2) Storing JSON could save CPU on the PHP/application server by not decoding the data on every read (if you use JSON a lot, which is not my case).
3) It could be nice for export/import? I don't know; today I use a table dump when I need one, or a small script or a predefined data format.

About JSON implemented internally in the database (storage):
1) Search with an index, JSONPath/XPath or other functions; a JSON/XML-specific field type could consume less space where possible.
2) Today when I use JSON I save it in a BLOB column and apply COMPRESS() when possible; sometimes I don't care about data size and just save it in BLOB columns (or maybe VARCHAR when the data is small).

About using JSON as part of the SQL language:
1) It could be nice to use field->something or {asdf:'abc','dfs':'asd'} to create/handle dynamic columns instead of using functions.

Could not something like "SELECT .. INTO JSON ..." do the trick?
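The BLOB-plus-COMPRESS() approach described above can be sketched like this (the "doc_store" table and "body" column are invented for illustration; COMPRESS()/UNCOMPRESS() are standard MySQL/MariaDB functions):

```sql
-- Store a JSON document compressed in a BLOB; decode on the client.
CREATE TABLE doc_store (
    id   INT PRIMARY KEY,
    body BLOB                        -- COMPRESS()-ed JSON text
);

INSERT INTO doc_store (id, body)
VALUES (1, COMPRESS('{"color": "red", "price": 19.99}'));

-- Reading it back requires UNCOMPRESS(); the server cannot index
-- or query inside the document, which is the limitation discussed here.
SELECT UNCOMPRESS(body) FROM doc_store WHERE id = 1;
```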
Yep, that's not a problem; today I run JSON encode/decode on the client side only when I need it (<10% of all SELECT/INSERT/UPDATE statements). I prefer one value per column; I don't need dynamic columns every time, and I can create more columns with Spider VP / online ALTER TABLE when I really need to. Maybe this (INTO JSON) could be nice if we had an HTTP interface to MySQL?! Or maybe it's nice for someone who uses AJAX all the time, which is not my case.
(with hindsight: "SELECT .. INTO XML ..." could have been implemented in MySQL when the other XML functionality was implemented, but it never was, it seems. You can mysqldump to XML but not "SELECT .. INTO XML ..." - with OUTFILE or without - refer to http://dev.mysql.com/doc/refman/5.6/en/select-into.html)
After HeidiSQL allowed exporting data to HTML and other formats, I stopped using the INTO part of SELECT; it wasn't useful to me. I think INTO JSON is not useful either; maybe if we could import it into Excel or another tool that really uses it, but I think it's only for AJAX applications, or some export method (converting from one database to another?). Maybe a CONNECT table with JSON could be more interesting? I don't know. Today the only thing I see is that I could optimize some queries using dynamic columns if I could index them; that's my big problem today, with CPU use and disk use. Not all rows have the column; in those cases it could be stored as NULL. I see dynamic columns, JSON, YAML and XML as encoding methods with some functions, without options to optimize the use of the encoded data or its storage. For example, could we optimize dynamic columns with a new field type? Could we optimize YAML with a new field type? Today I'm using BLOB + COMPRESS() when I need to.
-- Peter
On Wed, Jan 28, 2015 at 7:29 PM, Roberto Spadim <roberto@spadim.com.br> wrote:
I think a good start is "expression indexes", then JSON syntax in the SQL language, and some new functions to handle JSON etc.
CREATE INDEX name ON table ( some_function(field) ) - maybe dynamic column functions, or JSON functions? Or any deterministic function (for example, NOW() can't be used).
and use it in WHERE clauses:
WHERE some_function(field) = '1234' <- uses index
WHERE some_function(table.field) = '1234'
WHERE some_function(field) IN ( subselect )
WHERE some_function(field) LIKE 'blabla%'
WHERE some_function(field) LIKE '%blabla%'
WHERE some_function(field) LIKE '%blabla'
-- Roberto Spadim SPAEmpresarial - Software ERP Eng. Automação e Controle
Hi, Roberto! On Jan 28, Roberto Spadim wrote:
I think a good start is "expression indexes", then JSON syntax in the SQL language, and some new functions to handle JSON etc.

CREATE INDEX name ON table ( some_function(field) ) - maybe dynamic column functions, or JSON functions? Or any deterministic function (for example, NOW() can't be used).

and use it in WHERE clauses:

WHERE some_function(field) = '1234' <- uses index
WHERE some_function(table.field) = '1234'
WHERE some_function(field) IN ( subselect )
WHERE some_function(field) LIKE 'blabla%'
WHERE some_function(field) LIKE '%blabla%'
WHERE some_function(field) LIKE '%blabla'
See MDEV-6017 and linked issues. Regards, Sergei
Hi, Tom! To sum this thread up:

* Clunky syntax is unfortunate, but that's what SQL is; it'll probably stay that way :(
* A function to query the type of a value for a particular dynamic column - COLUMN_GET_SQL(), as you call it - should be easy to do. I've just created an issue for it: https://mariadb.atlassian.net/browse/MDEV-7535 - feel free to add your comments there.
* Indexes for dynamic columns (even if not "as important"). This is the already existing MDEV-6017, but it's a much larger task than COLUMN_GET_SQL.
* Input JSON directly - an existing task too, MDEV-7524.
* And arrays were mentioned too; that's MDEV-6632.

And while I'm not promising anything, I think that MDEV-7535 (COLUMN_GET_SQL or something similar) has a good chance of being implemented in 10.2.

Regards, Sergei
Note, you can use Flexvie.ws to materialize the COLUMN_GET() output into another table, index that, and use an inexpensive join to retrieve the data from the original table. It isn't a perfect solution, but it is a stop-gap until function-based indexes exist.
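The materialize-and-join workaround can also be sketched by hand, without Flexvie.ws, roughly like this (the "item"/"item_price" tables and "attrs"/"price" names are invented; Flexvie.ws automates the refresh step):

```sql
-- Side table holding the extracted dynamic-column value, indexed.
CREATE TABLE item_price (
    item_id INT PRIMARY KEY,
    price   DECIMAL(10,2),
    KEY idx_price (price)
);

-- Materialize the extracted values (must be refreshed when the
-- base table changes; that is the part Flexvie.ws takes care of).
INSERT INTO item_price (item_id, price)
SELECT id, COLUMN_GET(attrs, 'price' AS DECIMAL(10,2)) FROM item;

-- Indexed lookup on the side table, cheap join back to the base table.
SELECT i.*
FROM item_price p
JOIN item i ON i.id = p.item_id
WHERE p.price < 50;
```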
participants (8): Adam Scott, Federico Razzoli, Justin Swanhart, Oleksandr Byelkin, Peter Laursen, Roberto Spadim, Sergei Golubchik, Tom Worster