[Maria-developers] More enum values in enum fields
Hi guys, i have a problem, that i don't know how to easly solve (ok i know but i want the user easy way) i have a ENUM column with some values (today 248 distinct values) i see that in a near future i will have more than 65535 different values inside this enum column, the first solution is change it to varchar, but my problem will be data size the enum is very optimized in size, a char is 'converted' to a smallint value my second solution is a int column, and a external table with int/text values, but here i need to change apps, i don't want the second one, but i don't want a very big database (char column) how could i change the limit of enum size? maybe a new enum2 type? with a limit of bigint value (64bits), ok maybe less (32bits) today (reading code) i see that big enum is a problem for .frm file that don't allow this size of enums and i need to get back to a second table, but what should be done to allow a very big enum field? there's a "easy" way to change mariadb code to allow a bigger enum size? thanks guys -- Roberto Spadim
Roberto Spadim <roberto@spadim.com.br> writes:
i see that in a near future i will have more than 65535 different values inside this enum column, the first solution is change it to varchar, but my problem will be data size the enum is very optimized in size, a char is 'converted' to a smallint value
Note that the real limit is much less than 65535. See https://www.flamingspork.com/blog/2010/06/29/enum-now-works-properly-in-driz...
my second solution is a int column, and a external table with int/text values, but here i need to change apps, i don't want the second one, but i don't want a very big database (char column)
how could i change the limit of enum size? maybe a new enum2 type? with a limit of bigint value (64bits), ok maybe less (32bits)
It's probably not a good idea as the whole ENUM values need to always be kept in memory, and going to that many is going to mean a *big* chunk of memory that will be always used. It's much better to do the join and be done with it, it'll end up being a lot more efficient. -- Stewart Smith
humm interesting, does mariadb have the same bugs of mysql? or it's updated like drizzle? 2013/7/9 Stewart Smith <stewart@flamingspork.com>
Roberto Spadim <roberto@spadim.com.br> writes:
i see that in a near future i will have more than 65535 different values inside this enum column, the first solution is change it to varchar, but my problem will be data size the enum is very optimized in size, a char is 'converted' to a smallint value
Note that the real limit is much less than 65535. See
https://www.flamingspork.com/blog/2010/06/29/enum-now-works-properly-in-driz...
my second solution is a int column, and a external table with int/text values, but here i need to change apps, i don't want the second one, but i don't want a very big database (char column)
how could i change the limit of enum size? maybe a new enum2 type? with a limit of bigint value (64bits), ok maybe less (32bits)
It's probably not a good idea as the whole ENUM values need to always be kept in memory, and going to that many is going to mean a *big* chunk of memory that will be always used. It's much better to do the join and be done with it, it'll end up being a lot more efficient.
-- Stewart Smith
-- Roberto Spadim SPAEmpresarial
well, another doubts that i had some time ago reading unireg source... 1)why it have a 64k limit? that's a logical limit (a limit in int variable or char or something like it)? or just a "well i think 64k is a good 'constant' limit"? 2)why we "still" with this "old" unireg format? i know that someone must start the new format and it will not be compatible with mysql... but there's any raw idea or a issue in mdev or lauchpad about changing .frm format or creating a new one? maybe something more human readable YAML or JSON, and a space for engine information https://mariadb.atlassian.net/browse/MDEV-3936
i will put this in discussion list, sorry wrong mail list for this topic 2013/7/10 Roberto Spadim <roberto@spadim.com.br>
well, another doubts that i had some time ago reading unireg source...
1)why it have a 64k limit? that's a logical limit (a limit in int variable or char or something like it)? or just a "well i think 64k is a good 'constant' limit"?
2)why we "still" with this "old" unireg format? i know that someone must start the new format and it will not be compatible with mysql... but there's any raw idea or a issue in mdev or lauchpad about changing .frm format or creating a new one?
maybe something more human readable YAML or JSON, and a space for engine information https://mariadb.atlassian.net/browse/MDEV-3936
-- Roberto Spadim SPAEmpresarial
Roberto Spadim <roberto@spadim.com.br> writes:
well, another doubts that i had some time ago reading unireg source...
1)why it have a 64k limit? that's a logical limit (a limit in int variable or char or something like it)? or just a "well i think 64k is a good 'constant' limit"?
2^16 is why there's that limit.
2)why we "still" with this "old" unireg format? i know that someone must start the new format and it will not be compatible with mysql... but there's any raw idea or a issue in mdev or lauchpad about changing .frm format or creating a new one?
maybe something more human readable YAML or JSON, and a space for engine information https://mariadb.atlassian.net/browse/MDEV-3936
See my other post: Drizzle has a protobuf based format. It's non trivial to make the server behave with this though. It appears there's some code in MariaDB to construct server data structures from a SQL statement though, so perhaps that's more usable. -- Stewart Smith
2013/7/10 Stewart Smith <stewart@flamingspork.com>
Roberto Spadim <roberto@spadim.com.br> writes:
well, another doubts that i had some time ago reading unireg source...
1)why it have a 64k limit? that's a logical limit (a limit in int variable or char or something like it)? or just a "well i think 64k is a good 'constant' limit"?
2^16 is why there's that limit.
ok but why 2^16? it's based in some structure with a variable type that can't handler > 2^16 bytes? or just a number like 65555 65554 65553... ?
2)why we "still" with this "old" unireg format? i know that someone must start the new format and it will not be compatible with mysql... but there's any raw idea or a issue in mdev or lauchpad about changing .frm format or creating a new one?
maybe something more human readable YAML or JSON, and a space for engine information https://mariadb.atlassian.net/browse/MDEV-3936
See my other post: Drizzle has a protobuf based format. It's non trivial to make the server behave with this though. It appears there's some code in MariaDB to construct server data structures from a SQL statement though, so perhaps that's more usable.
-- Stewart Smith
humm i read something about protobuf, but is it human readable? i see yaml and json more common to web standards but they don't have a 'xml schema', while protobuf have a optional / required other doubt, does this increase open/close expend time? for example, today frm = 0.1ms, protobuf=1ms, or something like it? or file parse time isn't the main time consume open/close expend time? -- Roberto Spadim SPAEmpresarial
Roberto Spadim <roberto@spadim.com.br> writes:
ok but why 2^16? it's based in some structure with a variable type that can't handler > 2^16 bytes? or just a number like 65555 65554 65553... ?
uses 16 bits to represent it. It's just the way it is.. hysterical raisins we used to call it.
humm i read something about protobuf, but is it human readable?
You can outptu it as human readable fairly easily, it's otherwise a compact binary format that's generally much quicker to parse than human readable. k -- Stewart Smith
Hi, Roberto! On Jul 10, Roberto Spadim wrote:
2)why we "still" with this "old" unireg format?
That was in the agenda for *at least* 10 years. May be 15 already. Like "it's simple, let's do it next month, shouldn't take more than a couple of weeks". Still, never happened.
i know that someone must start the new format and it will not be compatible with mysql... but there's any raw idea or a issue in mdev or lauchpad about changing .frm format or creating a new one?
Yes, create a view and look at its frm file. The code for writing and reading new frm files is already there. Regards, Sergei
hi sergei, well i start thinking about the frm file after i had a crash in my server and a aria table was read as a connect table, i removed the .frm and put another one in place, and i get my data back if i had a human readable file, that could be easier to understand but what mariadb guys think about standards? stewart talked about protobuf (i really like this idea, but don't know how to use), what about json and yaml? i think that the main problem is parse time, maybe a jornaling in this file to avoid crash? and easy implement new features, example in json we can add a new item in object just put ", 'new item':'some value'", i don
ops, i don't know how this happen in protobuf, or other languages could we add this in jira? or lauch pad, just to don't forget this feature
hum... i saw now the view .frm file it's nice... i don't read the internal source code, but is it easy to extend? don't have a limit like today .frm file?
Hi, Roberto! On Jul 10, Roberto Spadim wrote:
hum... i saw now the view .frm file it's nice... i don't read the internal source code, but is it easy to extend? don't have a limit like today .frm file?
Shouldn't have limits, it's just a plain-text CREATE TABLE statement. Regards, Sergei
Roberto Spadim <roberto@spadim.com.br> writes:
hi sergei, well i start thinking about the frm file after i had a crash in my server and a aria table was read as a connect table, i removed the .frm and put another one in place, and i get my data back if i had a human readable file, that could be easier to understand
but what mariadb guys think about standards? stewart talked about protobuf (i really like this idea, but don't know how to use), what about json and yaml? i think that the main problem is parse time, maybe a jornaling in this file to avoid crash? and easy implement new features, example in json we can add a new item in object just put ", 'new item':'some value'", i don
Schemaless isn't necessarily a feature - protobuf has the benefit of enforcing a schema (XML would work too... parsing time was a concern when I wrote the code for Drizzle) -- Stewart Smith
Hi, Maybe a transitional setting that would create/update both the "legacy" frm file and a file with the new format when required could be a solution for backward compatibility. In that case, it would read the new format definition file if present. Another possibility could be that we could choose the default format and only newly created tables would use that format but it wouldnt modify the existing ones (or maybe via an ALTER). It also could be of use to have a tool that could offline convert the files between the two format in case MariaDB would have to be downgraded or replaced by MySQL or another fork that does not support the new definition file format. Regards. Le 10/07/2013 08:37, Sergei Golubchik a écrit :
Hi, Roberto!
On Jul 10, Roberto Spadim wrote:
2)why we "still" with this "old" unireg format? That was in the agenda for *at least* 10 years. May be 15 already.
Like "it's simple, let's do it next month, shouldn't take more than a couple of weeks".
Still, never happened.
i know that someone must start the new format and it will not be compatible with mysql... but there's any raw idea or a issue in mdev or lauchpad about changing .frm format or creating a new one? Yes, create a view and look at its frm file. The code for writing and reading new frm files is already there.
Regards, Sergei
2013/7/10 Jean Weisbuch <jean@phpnet.org>
Hi,
Maybe a transitional setting that would create/update both the "legacy" frm file and a file with the new format when required could be a solution for backward compatibility. In that case, it would read the new format definition file if present.
Another possibility could be that we could choose the default format and only newly created tables would use that format but it wouldnt modify the existing ones (or maybe via an ALTER).
i like the ALTER idea, something like ALTER TABLE xxxx FRM_FORMAT='OLD/JSON/YAML/TEXT/xyz/protobuff/.... etc'
It also could be of use to have a tool that could offline convert the files between the two format in case MariaDB would have to be downgraded or replaced by MySQL or another fork that does not support the new definition file format.
it's nice too, i put a MDEV when my frm file crashed, maybe we could put this idea there: https://mariadb.atlassian.net/browse/MDEV-4637 ----- what format we should use? what format is easy to develop, fast, compact, human compatible, etc...? json,yaml,protobuf,text file like view format? any idea? i think that at least file should have a hash (md5 for example) to easly find crashed files, should be easly readable by humans (json/yaml/view like) and fast to read/write (protobuf for example or some binary format) i was thinking about something like (json example): "{some json definition}\n BINARY FORMAT\n MD5SUM" the MD5SUM is a MD5 value from "{" to end of BINARY FORMAT (before "\n") if we remove the BINARY FORMAT, MD5 will change, if we don't have the BINARY FORMAT mariadb read the json, and save the binary format if BINARY FORMAT exists we should use it, if some user changed the json it should remove binary format and md5 or mysql should report a error "No MD5SUM on frm file %s", "Wrong MD5SUM on frm file %s", when file don't have a BINARY_FORMAT we could log it as a warning "No binary format at frm file %s, creating..." I think it's a easy format to change (add coments) and repair file via "vi","nano","notepad" etc, i don't know how easy it's to implement at source code (json + protobuf/binary format + logs) what you think guys?
i create a MDEV-4783 for a new .frm file format, maybe we could report ideas / patchs there? thanks guys
Jean Weisbuch <jean@phpnet.org> writes:
Maybe a transitional setting that would create/update both the "legacy" frm file and a file with the new format when required could be a solution for backward compatibility. In that case, it would read the new format definition file if present.
For about the 6 months of development of the protobuf based format in Drizzle this is what I did, with a healthy number of asserts() comparing the results of reading both. For MariaDB, it would mean that when you were doing this you had two things that could break and when you finally stopped the legacy format, that would be a very clear break in any on-disk compatibility with MySQL. -- Stewart Smith
Sergei Golubchik <serg@mariadb.org> writes:
Hi, Roberto!
On Jul 10, Roberto Spadim wrote:
2)why we "still" with this "old" unireg format?
That was in the agenda for *at least* 10 years. May be 15 already.
Like "it's simple, let's do it next month, shouldn't take more than a couple of weeks".
Still, never happened.
Happened in Drizzle, IIRC it was about 6 months of work for the bulk of it. -- Stewart Smith
participants (4)
-
Jean Weisbuch
-
Roberto Spadim
-
Sergei Golubchik
-
Stewart Smith