[Maria-developers] [Spatial] On current implementation approach
Hi, I'm going to ask question about how the current Spatial Extensions are implemented. I have spent some time reading the source code in the current trunk (spatial.h|cc, gcal*.h|cc, related Field and Item definitions, etc.), so I have a rough understanding of the overall structure, how the geometry data types are implemented and exposed to SQL, how the spatial functions are defined and registered. I did not looked into details of implementation of geospatial algorithms, but that's too low level for the question I'm going to ask. Initially, I was going to ask very detailed question, listing all the relevant code definitions and asking separately about each of them, but that's not necessary at this stage, I think. Instead, I'm going to simplify and ask about the bigger picture, more about MariaDB extensions API: 1. Is it possible to implement MariaDB extensions like Spatial (custom type + set of functions) without such a tight coupling with the internal implementation of the type system (without messing Field class with geometry types directly, etc.)? 2. Is it possible to implement Spatial using User-Defined Functions (UDF) defined in shared binary? 3. What is the reason behind using Well-Known-Binary (WKB) stream of bytes to transport geometry values into/from functions? Is it due to limitations of MariaDB type system where String is the only universal carrier for complex data? This concern is related to necessity of encoding/decoding WKB when chaining spatial function calls, and possibilities to avoid it. Best regards, -- Mateusz Loskot, http://mateusz.loskot.net "Participation in this whole process is a form of torture" ~~ Szalony
Hi, Mateusz.
1. Is it possible to implement MariaDB extensions like Spatial (custom type + set of functions) without such a tight coupling with the internal implementation of the type system (without messing Field class with geometry types directly, etc.)?
Yes, it is possible. The core algorithms are separated from the Field structure and any other database internals. They are placed in sql/gcalc_slicescan.cc and sql/gcalc_tools.cc files.
2. Is it possible to implement Spatial using User-Defined Functions (UDF) defined in shared binary?
The spatial functions/operations can be implemented with UDF, but that makes query optimization and using Spatial keys problemmatic.
3. What is the reason behind using Well-Known-Binary (WKB) stream of bytes to transport geometry values into/from functions? Is it due to limitations of MariaDB type system where String is the only universal carrier for complex data? This concern is related to necessity of encoding/decoding WKB when chaining spatial function calls, and possibilities to avoid it.
The reason was mostly historical. It was sufficient for the first implementations of the Geometry field types and somewhat convenient as we don't need to perform conversions when we need to import/export features in their WKB representation. But yes, that format is inefficient and difficult to handle properly. I plan to get rid of it internally - only support importing-exporting it. Best regards. HF 23.09.2013 18:46, Mateusz Loskot wrote:
Hi,
I'm going to ask question about how the current Spatial Extensions are implemented. I have spent some time reading the source code in the current trunk (spatial.h|cc, gcal*.h|cc, related Field and Item definitions, etc.), so I have a rough understanding of the overall structure, how the geometry data types are implemented and exposed to SQL, how the spatial functions are defined and registered. I did not looked into details of implementation of geospatial algorithms, but that's too low level for the question I'm going to ask.
Initially, I was going to ask very detailed question, listing all the relevant code definitions and asking separately about each of them, but that's not necessary at this stage, I think.
Instead, I'm going to simplify and ask about the bigger picture, more about MariaDB extensions API:
1. Is it possible to implement MariaDB extensions like Spatial (custom type + set of functions) without such a tight coupling with the internal implementation of the type system (without messing Field class with geometry types directly, etc.)?
2. Is it possible to implement Spatial using User-Defined Functions (UDF) defined in shared binary?
3. What is the reason behind using Well-Known-Binary (WKB) stream of bytes to transport geometry values into/from functions? Is it due to limitations of MariaDB type system where String is the only universal carrier for complex data? This concern is related to necessity of encoding/decoding WKB when chaining spatial function calls, and possibilities to avoid it.
Best regards,
On 23 September 2013 22:10, Alexey Botchkov <holyfoot@askmonty.org> wrote:
1. Is it possible to implement MariaDB extensions like Spatial (custom type + set of functions) without such a tight coupling with the internal implementation of the type system (without messing Field class with geometry types directly, etc.)?
Yes, it is possible. The core algorithms are separated from the Field structure and any other database internals. They are placed in sql/gcalc_slicescan.cc and sql/gcalc_tools.cc files.
Yes, but my question is not really about location of computational geometry bits, but about the data management: SQL data type for geometry objects, input/output routines. Due to my lack of experience with MariaDB/MySQL UDF, I simply assumed that if: 1. Field is the only place that defines GEOMETRY type (and there is no CREATE TYPE support) 2. UDF prototypes will use of GEOMETRY in their prototypes to declare input/output parameters then I couldn't understand how it is possible to remove geometry definitions from Field and other internal definitions. But, I've just found this project [1] with extra spatial UDFs, so I think I understand the UDF protocol regarding I/O arguments would not require explicit GEOMETRY type making it possible to move Spatial Extensions completely out of built-ins (trunk/sql/ files). [1] https://github.com/krandalf75/MySQL-Spatial-UDF
2. Is it possible to implement Spatial using User-Defined Functions (UDF) defined in shared binary?
The spatial functions/operations can be implemented with UDF, but that makes query optimization and using Spatial keys problemmatic.
So, for real use case, the idea I brainstormed above would not make sense. Unless, there is workaround for those problems you mean.
3. What is the reason behind using Well-Known-Binary (WKB) stream of bytes to transport geometry values into/from functions? Is it due to limitations of MariaDB type system where String is the only universal carrier for complex data? This concern is related to necessity of encoding/decoding WKB when chaining spatial function calls, and possibilities to avoid it.
The reason was mostly historical. It was sufficient for the first implementations of the Geometry field types and somewhat convenient as we don't need to perform conversions when we need to import/export features in their WKB representation. But yes, that format is inefficient and difficult to handle properly. I plan to get rid of it internally - only support importing-exporting it.
I roughly understand, but how do you plan to pass geometry data around, in what format? AFAIU, it is not possible to pass user-defined types into/from SQL functions, so geometries would have to be passed as String objects anyway, wouldn't they? IOW, there are only 3 types available (integer, real, string), so String is the only one usable to pass geometry objects around, regardless of actual encoding format, WKB, WKT, any other binary stream... It means, that if I want to pass geometry to my_foo UDF: MSUDF_API char* my_foo(UDF_INIT *initid,UDF_ARGS *args, char *buf, unsigned long *length, char *is_null, char *error); the only option available is to make geometry into a kind of stream of bytes and passed as one of args item. So, a kind of serialising/deserialising is in fact unavoidable. Is my understanding correct? Best regards, -- Mateusz Loskot, http://mateusz.loskot.net "Participation in this whole process is a form of torture" ~~ Szalony
hi again :) i'm not a mariadb team developer, so please consider me as an user/udf developer =] 2013/9/23 Mateusz Loskot <mateusz@loskot.net>
On 23 September 2013 22:10, Alexey Botchkov <holyfoot@askmonty.org> wrote:
1. Is it possible to implement MariaDB extensions like Spatial (custom type + set of functions) without such a tight coupling with the internal implementation of the type system (without messing Field class with geometry types directly, etc.)?
Yes, it is possible. The core algorithms are separated from the Field structure and any other database internals. They are placed in sql/gcalc_slicescan.cc and sql/gcalc_tools.cc files.
Yes, but my question is not really about location of computational geometry bits, but about the data management: SQL data type for geometry objects, input/output routines.
Due to my lack of experience with MariaDB/MySQL UDF, I simply assumed that if: 1. Field is the only place that defines GEOMETRY type (and there is no CREATE TYPE support)
create type probably will be a 10.1 feature: https://mariadb.atlassian.net/browse/MDEV-4912 and maybe you will not have a spatial key optimization in the first version of this feature in my opnion if you start a new udf today with gis, you should use the WKB + a second lib (geos is very good) to handle spatial data geos can use the WKB with a fast "unserialize": GEOSGeomFromWKB_buf
2. UDF prototypes will use of GEOMETRY in their prototypes to declare input/output parameters then I couldn't understand how it is possible to remove geometry definitions from Field and other internal definitions.
But, I've just found this project [1] with extra spatial UDFs, so I think I understand the UDF protocol regarding I/O arguments would not require explicit GEOMETRY type
yes, you don't have a GEOMTRY_TYPE for arg_type[] at udf check your example at your mysql-spatial-udf git project: my_bool msudf_within_init(UDF_INIT *initid,UDF_ARGS *args,char *message) ... * args->arg_type[0] = STRING_RESULT;* ... long long msudf_within(UDF_INIT *initid,UDF_ARGS *args,char *is_null, char *error) ... geom1 = msudf_getGeometry((unsigned char *)args->args[0],args->lengths[0]); set arg_type to STRING_RESULT, and use a cast (unsigned char *) to handle raw geometry data making it possible to move Spatial Extensions completely out of
built-ins (trunk/sql/ files).
mariadb 10.0 have plans about OPENGIS: https://mariadb.com/kb/en/plans-for-10x/#opengis-compliance but i didn't found JIRA report about it, or another worklog or something similar (must check if it's in lauchpad bug track or another lauchpad branch) and i don't know if mariadb will use GEOS... but from what i know, geos is the best opengis lib today, why not use it at mariadb?! =)
2. Is it possible to implement Spatial using User-Defined Functions (UDF) defined in shared binary?
The spatial functions/operations can be implemented with UDF, but that makes query optimization and using Spatial keys problemmatic.
So, for real use case, the idea I brainstormed above would not make sense. Unless, there is workaround for those problems you mean.
well i don't know what problemmatic means at high/low level, but i think it's something like this at sql layer: WHERE udf_function(x) in theory this udf_function() could be optimized with rtree index Y... but it will do a table scan... optimizer don't know how to use index with udf functions yet :( check that some internal functions don't have optimizations too, like: SUBSTRING(indexed_field,1,4)='abcd' could be rewrite as (indexed_field LIKE "abcd%" OR indexed_field='abcd') a workaround about index should be done at application side, could be something like: WHERE udf_function(x) and other_builtin_function_that_use_index(xxxx) with this "other_builtin_function_that_use_index" function (envelop funciont for example) , you could use the spatial index and optimize the query... but it's not the "best" solution at server side, but the only i can think as a udf developer :) well if you know how to code at mysql/mariadb server side... you can patch the optimizer, but i think it's a hard work, optimizer is black magic to me yet =]
3. What is the reason behind using Well-Known-Binary (WKB) stream of bytes to transport geometry values into/from functions? Is it due to limitations of MariaDB type system where String is the only universal carrier for complex data? This concern is related to necessity of encoding/decoding WKB when chaining spatial function calls, and possibilities to avoid it.
The reason was mostly historical. It was sufficient for the first implementations of the Geometry field types and somewhat convenient as we don't need to perform conversions when we need to import/export features in their WKB representation. But yes, that format is inefficient and difficult to handle properly. I plan to get rid of it internally - only support importing-exporting it.
hum... what could be a better format? i think that at disk level (storage) WKB is a small format, not? maybe we could do a better data type? but why? the mysql built-in functions can't handle GIS fast? the RTREE is slow with this datatype? any good point to consider with a new data type?
I roughly understand, but how do you plan to pass geometry data around,
in what format?
AFAIU, it is not possible to pass user-defined types into/from SQL functions, so geometries would have to be passed as String objects anyway, wouldn't they?
well from what i know, mysql "built-in" geometric function use ~wkb format too
IOW, there are only 3 types available (integer, real, string), so String is the only one
usable to pass geometry objects around, regardless of actual encoding
format, WKB, WKT, any other binary stream...
when using GEOMETRIC data type at sql functions, it's always ~WKB from your example at git, internally WKB start at position 4, and SRID is the first 4 bytes, check: GEOSGeom msudf_getGeometry(unsigned char *buf,unsigned int length) { int srid; GEOSGeom geom; geom = GEOSGeomFrom*WKB*_buf(*buf + 4*,length - 4); if (geom != NULL) { srid = *msudf_getInt*(buf); GEOSSetSRID(geom,srid); } return geom; } int *msudf_getInt*(const unsigned char *buf) { int result; unsigned char *p = (unsigned char *)&result; * p[0] = buf[0];* * p[1] = buf[1];* * p[2] = buf[2];* * p[3] = buf[3];* return result; }
It means, that if I want to pass geometry to my_foo UDF:
MSUDF_API char* my_foo(UDF_INIT *initid,UDF_ARGS *args, char *buf, unsigned long *length, char *is_null, char *error);
the only option available is to make geometry into a kind of stream of bytes and passed as one of args item. So, a kind of serialising/deserialising is in fact unavoidable.
unserialize/serialize only if you use different string format than mariadb/mysql source code =) like geos...
Is my understanding correct?
well i think yes :) did you checked that dev.mysql.com have a poll about "What are the top 3 GIS related features that you are most interested in for your new and existing MySQL projects?" maybe we are talking about something in the near future right now... mariadb 10.0 have this 'problem' too check the mariadb 10.0 plans
Best regards, -- Mateusz Loskot, http://mateusz.loskot.net "Participation in this whole process is a form of torture" ~~ Szalony
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
thanks reading this email :) sorry, i can't help a lot here :/ i want some opengis features too, but my apps aren't high read/write yet, parsing the geometry at php (client side) is ok to me well... mariadb team is very good and create many features that mysql don't create in 7 years with just weeks :) like my base64 functions... wait is a option at mariadb world :) at mysql you could forget it hehe :) if you don't have time.... i don't know what to say ='( the best opensource database with many opengis features today is postgresql, sorry mariadb team :( well this project (https://github.com/krandalf75/MySQL-Spatial-UDF) is very interesting, maybe we could port manythings... i will report it at jira to don't forget it -- Roberto Spadim
On 24 September 2013 04:17, Roberto Spadim <roberto@spadim.com.br> wrote:
2013/9/23 Mateusz Loskot <mateusz@loskot.net>
On 23 September 2013 22:10, Alexey Botchkov <holyfoot@askmonty.org> wrote:
1. Is it possible to implement MariaDB extensions like Spatial (custom type + set of functions) without such a tight coupling with the internal implementation of the type system (without messing Field class with geometry types directly, etc.)?
Yes, it is possible. The core algorithms are separated from the Field structure and any other database internals. They are placed in sql/gcalc_slicescan.cc and sql/gcalc_tools.cc files.
Yes, but my question is not really about location of computational geometry bits, but about the data management: SQL data type for geometry objects, input/output routines.
Due to my lack of experience with MariaDB/MySQL UDF, I simply assumed that if: 1. Field is the only place that defines GEOMETRY type (and there is no CREATE TYPE support)
create type probably will be a 10.1 feature: https://mariadb.atlassian.net/browse/MDEV-4912 and maybe you will not have a spatial key optimization in the first version of this feature
This sounds very promising.
in my opnion if you start a new udf today with gis, you should use the WKB + a second lib (geos is very good) to handle spatial data geos can use the WKB with a fast "unserialize": GEOSGeomFromWKB_buf
No, I don't want to go through WKB (Fast? Perhaps, not fast enough, for me :-)), that is the whole point. This is waste of time and will never perform as good as going through structured type straight away. (FYI, I've been part of PostGIS dev team, I've been also member of GEOS dev team, so it's fair to say I'm a bit experienced with spatial database extensions, WKB, coding, encoding).
2. UDF prototypes will use of GEOMETRY in their prototypes to declare input/output parameters then I couldn't understand how it is possible to remove geometry definitions from Field and other internal definitions.
But, I've just found this project [1] with extra spatial UDFs, so I think I understand the UDF protocol regarding I/O arguments would not require explicit GEOMETRY type
yes, you don't have a GEOMTRY_TYPE for arg_type[] at udf check your example at your mysql-spatial-udf git project: [...]
Yes, as I mentioned, I have read through the code of that project, so now it's clear to me what it currently takes to pass geometry into/form spatial functions in MariaDB/MySQL.
making it possible to move Spatial Extensions completely out of built-ins (trunk/sql/ files).
mariadb 10.0 have plans about OPENGIS: https://mariadb.com/kb/en/plans-for-10x/#opengis-compliance
Thanks for this pointer, I'll be watching it.
but i didn't found JIRA report about it, or another worklog or something similar (must check if it's in lauchpad bug track or another lauchpad branch)
I tried to find something on that topic myself too.
and i don't know if mariadb will use GEOS... but from what i know, geos is the best opengis lib today, why not use it at mariadb?! =)
There are alternatives.
2. Is it possible to implement Spatial using User-Defined Functions (UDF) defined in shared binary?
The spatial functions/operations can be implemented with UDF, but that makes query optimization and using Spatial keys problemmatic.
So, for real use case, the idea I brainstormed above would not make sense. Unless, there is workaround for those problems you mean.
well i don't know what problemmatic means at high/low level, but i think it's something like this at sql layer:
WHERE udf_function(x) in theory this udf_function() could be optimized with rtree index Y... but it will do a table scan...
Yes, of course, that's the sensible approach.
optimizer don't know how to use index with udf functions yet :(
Ok, I think I understand it now.
a workaround about index should be done at application side, could be something like: WHERE udf_function(x) and other_builtin_function_that_use_index(xxxx) with this "other_builtin_function_that_use_index" function (envelop funciont for example) , you could use the spatial index and optimize the query... but it's not the "best" solution at server side, but the only i can think as a udf developer :) [...]
Right.
3. What is the reason behind using Well-Known-Binary (WKB) stream of bytes to transport geometry values into/from functions? Is it due to limitations of MariaDB type system where String is the only universal carrier for complex data? This concern is related to necessity of encoding/decoding WKB when chaining spatial function calls, and possibilities to avoid it.
The reason was mostly historical. It was sufficient for the first implementations of the Geometry field types and somewhat convenient as we don't need to perform conversions when we need to import/export features in their WKB representation. But yes, that format is inefficient and difficult to handle properly. I plan to get rid of it internally - only support importing-exporting it.
hum... what could be a better format? i think that at disk level (storage) WKB is a small format, not? [...]
It's not about size, but WKB may lack of some metadata required (i.e. SRID) That's why in PostGIS we have, Extended WKB used internally http://postgis.net/docs/manual-2.0/using_postgis_dbmanagement.html#EWKB_EWKT
IOW, there are only 3 types available (integer, real, string), so String is the only one
usable to pass geometry objects around, regardless of actual encoding format, WKB, WKT, any other binary stream...
when using GEOMETRIC data type at sql functions, it's always ~WKB from your example at git, internally WKB start at position 4, and SRID is the first 4 bytes, check: [...]
Then, this is not WKB format (AFAIU OGC 06-103r3 specification). There is no SRID embedded in OGC WKB and throughout the entire discussion, I have assumed WKB means OGC WKB. Certainly, using custom extended form of OGC WKB solves many problems (see PostGIS manual referred earlier).
It means, that if I want to pass geometry to my_foo UDF:
MSUDF_API char* my_foo(UDF_INIT *initid,UDF_ARGS *args, char *buf, unsigned long *length, char *is_null, char *error);
the only option available is to make geometry into a kind of stream of bytes and passed as one of args item. So, a kind of serialising/deserialising is in fact unavoidable.
unserialize/serialize only if you use different string format than mariadb/mysql source code =) like geos...
Not really, encoding/decoding is necessary whenever you need to access structured geometry objects, means, for every geometry algorithm.
Is my understanding correct?
well i think yes :)
did you checked that dev.mysql.com have a poll about "What are the top 3 GIS related features that you are most interested in for your new and existing MySQL projects?"
http://dev.mysql.com/tech-resources/quickpolls/index.php?pollname=top3-gis-f... Interesting indeed. Best regards, -- Mateusz Loskot, http://mateusz.loskot.net "Participation in this whole process is a form of torture" ~~ Szalony
Hi Mateusz! Well i have many to say but not as a mariadb team, but as a mariadb user, 'hobby' developer, and a DBA and app developer with problems hehe =) Nice to know about geos and postgis devel skills!!! I think there's a big space where you could help mariadb =] Here (mariadb lists) ideas are wellcome (at least some ideas that i posted was accepted or discussed) and patchs are *very* wellcome i send some 'beginner programmer' plugins and they was approved too, for this i used JIRA instead of send patchs to email list... the point about patchs is do a relevant work to community, for example a patch that allow a OTP password login, and not a patch to tell what time is with mysql_connect function (the first example of something that will not be accepted in my mind =] ) I like how mariadb team works, it's better than mysql team (sorry oracle, but mariadb is better here) With your knowledge you could help at opengis plans :) I don't know what you can do, but if you can, please help mariadb =) i really like mysql/mariadb and i don't want a postgresql just for postgis, a better gis feature in mariadb is wellcome in my opnion For example, there is a parallel question about storage plugin, that looks very nice here (mail list) maybe we will have nice features in future, at least a decent discussion is done about what's difficult to implement and what's not Well I will talk like an mariadb user now and report my experience here... Today I'm using JIRA to report feature requests when i think it's relevant, and with a small discuss at maria-discuss mail list After a report a mariadb guy could mark it as won't fix, if anyone read it you can send ONE email to maria-developers list or maybe at IRC channel, asking about your idea and someone will look it and talk "hum nice, relevant feature, let's do it at version xxx, or let's solve it now" i think it's more interesting than a public pool like mysql since the problem start from mariadb users and not from mariadb developers, and developers do the best implementing this features with a good discussion with users, well just my opnion it's the best model to solve software bugs/features Well, i don't know what mariadb team developers will say, and how opengis could help mariadb, but i think you are an important guy here that could help =) don't loose your knowledge, share it with mariadb if possible =) Thanks =) i will leave space to mariadb team, sorry many emails, this topic is interesting to me 2013/9/24 Mateusz Loskot <mateusz@loskot.net>:
On 24 September 2013 04:17, Roberto Spadim <roberto@spadim.com.br> wrote:
2013/9/23 Mateusz Loskot <mateusz@loskot.net>
On 23 September 2013 22:10, Alexey Botchkov <holyfoot@askmonty.org> wrote:
1. Is it possible to implement MariaDB extensions like Spatial (custom type + set of functions) without such a tight coupling with the internal implementation of the type system (without messing Field class with geometry types directly, etc.)?
Yes, it is possible. The core algorithms are separated from the Field structure and any other database internals. They are placed in sql/gcalc_slicescan.cc and sql/gcalc_tools.cc files.
Yes, but my question is not really about location of computational geometry bits, but about the data management: SQL data type for geometry objects, input/output routines.
Due to my lack of experience with MariaDB/MySQL UDF, I simply assumed that if: 1. Field is the only place that defines GEOMETRY type (and there is no CREATE TYPE support)
create type probably will be a 10.1 feature: https://mariadb.atlassian.net/browse/MDEV-4912 and maybe you will not have a spatial key optimization in the first version of this feature
This sounds very promising.
in my opnion if you start a new udf today with gis, you should use the WKB + a second lib (geos is very good) to handle spatial data geos can use the WKB with a fast "unserialize": GEOSGeomFromWKB_buf
No, I don't want to go through WKB (Fast? Perhaps, not fast enough, for me :-)), that is the whole point. This is waste of time and will never perform as good as going through structured type straight away.
(FYI, I've been part of PostGIS dev team, I've been also member of GEOS dev team, so it's fair to say I'm a bit experienced with spatial database extensions, WKB, coding, encoding).
2. UDF prototypes will use of GEOMETRY in their prototypes to declare input/output parameters then I couldn't understand how it is possible to remove geometry definitions from Field and other internal definitions.
But, I've just found this project [1] with extra spatial UDFs, so I think I understand the UDF protocol regarding I/O arguments would not require explicit GEOMETRY type
yes, you don't have a GEOMTRY_TYPE for arg_type[] at udf check your example at your mysql-spatial-udf git project: [...]
Yes, as I mentioned, I have read through the code of that project, so now it's clear to me what it currently takes to pass geometry into/form spatial functions in MariaDB/MySQL.
making it possible to move Spatial Extensions completely out of built-ins (trunk/sql/ files).
mariadb 10.0 have plans about OPENGIS: https://mariadb.com/kb/en/plans-for-10x/#opengis-compliance
Thanks for this pointer, I'll be watching it.
but i didn't found JIRA report about it, or another worklog or something similar (must check if it's in lauchpad bug track or another lauchpad branch)
I tried to find something on that topic myself too.
and i don't know if mariadb will use GEOS... but from what i know, geos is the best opengis lib today, why not use it at mariadb?! =)
There are alternatives.
2. Is it possible to implement Spatial using User-Defined Functions (UDF) defined in shared binary?
The spatial functions/operations can be implemented with UDF, but that makes query optimization and using Spatial keys problemmatic.
So, for real use case, the idea I brainstormed above would not make sense. Unless, there is workaround for those problems you mean.
well i don't know what problemmatic means at high/low level, but i think it's something like this at sql layer:
WHERE udf_function(x) in theory this udf_function() could be optimized with rtree index Y... but it will do a table scan...
Yes, of course, that's the sensible approach.
optimizer don't know how to use index with udf functions yet :(
Ok, I think I understand it now.
a workaround about index should be done at application side, could be something like: WHERE udf_function(x) and other_builtin_function_that_use_index(xxxx) with this "other_builtin_function_that_use_index" function (envelop funciont for example) , you could use the spatial index and optimize the query... but it's not the "best" solution at server side, but the only i can think as a udf developer :) [...]
Right.
3. What is the reason behind using Well-Known-Binary (WKB) stream of bytes to transport geometry values into/from functions? Is it due to limitations of MariaDB type system where String is the only universal carrier for complex data? This concern is related to necessity of encoding/decoding WKB when chaining spatial function calls, and possibilities to avoid it.
The reason was mostly historical. It was sufficient for the first implementations of the Geometry field types and somewhat convenient as we don't need to perform conversions when we need to import/export features in their WKB representation. But yes, that format is inefficient and difficult to handle properly. I plan to get rid of it internally - only support importing-exporting it.
hum... what could be a better format? i think that at disk level (storage) WKB is a small format, not? [...]
It's not about size, but WKB may lack of some metadata required (i.e. SRID) That's why in PostGIS we have, Extended WKB used internally
http://postgis.net/docs/manual-2.0/using_postgis_dbmanagement.html#EWKB_EWKT
IOW, there are only 3 types available (integer, real, string), so String is the only one
usable to pass geometry objects around, regardless of actual encoding format, WKB, WKT, any other binary stream...
when using GEOMETRIC data type at sql functions, it's always ~WKB from your example at git, internally WKB start at position 4, and SRID is the first 4 bytes, check: [...]
Then, this is not WKB format (AFAIU OGC 06-103r3 specification). There is no SRID embedded in OGC WKB and throughout the entire discussion, I have assumed WKB means OGC WKB.
Certainly, using custom extended form of OGC WKB solves many problems (see PostGIS manual referred earlier).
It means, that if I want to pass geometry to my_foo UDF:
MSUDF_API char* my_foo(UDF_INIT *initid,UDF_ARGS *args, char *buf, unsigned long *length, char *is_null, char *error);
the only option available is to make geometry into a kind of stream of bytes and passed as one of args item. So, a kind of serialising/deserialising is in fact unavoidable.
unserialize/serialize only if you use different string format than mariadb/mysql source code =) like geos...
Not really, encoding/decoding is necessary whenever you need to access structured geometry objects, means, for every geometry algorithm.
Is my understanding correct?
well i think yes :)
did you checked that dev.mysql.com have a poll about "What are the top 3 GIS related features that you are most interested in for your new and existing MySQL projects?"
http://dev.mysql.com/tech-resources/quickpolls/index.php?pollname=top3-gis-f...
Interesting indeed.
Best regards, -- Mateusz Loskot, http://mateusz.loskot.net "Participation in this whole process is a form of torture" ~~ Szalony
-- Roberto Spadim SPAEmpresarial
On 24 September 2013 19:28, Roberto Spadim <roberto@spadim.com.br> wrote:
Here (mariadb lists) ideas are wellcome (at least some ideas that i posted was accepted or discussed) and patchs are *very* wellcome
Roberto, Good to know. I will use JIRA if I ever have anything relevant.
[...] I like how mariadb team works, it's better than mysql team (sorry oracle, but mariadb is better here)
I have no idea about differences between the two development models and frankly, as long as the source code is open, I don't really care who drives the upstream wheel.
With your knowledge you could help at opengis plans :) I don't know what you can do, but if you can, please help mariadb =)
For now, I'm just learning, thus I appreciate your comments.
Well I will talk like an mariadb user now and report my experience here... Today I'm using JIRA to report feature requests when i think it's relevant, and with a small discuss at maria-discuss mail list After a report a mariadb guy could mark it as won't fix, if anyone read it you can send ONE email to maria-developers list or maybe at IRC channel, asking about your idea and someone will look it and talk "hum nice, relevant feature, let's do it at version xxx, or let's solve it now" i think it's more interesting than a public pool like mysql since the problem start from mariadb users and not from mariadb developers, and developers do the best implementing this features with a good discussion with users, well just my opnion it's the best model to solve software bugs/features
I understand. After initial confusion, which mailing list to choose, I deliberately decided to target MariaDB list with my questions. I may repeat the same discussion roll on MySQL one, but I wouldn't relate it MariaDB plans. Certainly, as both projects share the same (initial) codebase, it's nothing wrong to ask both camps about their plans regarding features of interest :-)
Thanks =) i will leave space to mariadb team, sorry many emails, this topic is interesting to me
I appreciate your comments Roberto, thanks. Best regards, -- Mateusz Loskot, http://mateusz.loskot.net "Participation in this whole process is a form of torture" ~~ Szalony
Yes, but my question is not really about location of computational geometry bits, but about the data management: SQL data type for geometry objects, input/output routines.
That what i meant. That code do not use any Field data structures.
1. Field is the only place that defines GEOMETRY type (and there is no CREATE TYPE support)
For the Field the GEOMETRY is just the string of bytes.
2. UDF prototypes will use of GEOMETRY in their prototypes to declare input/output parameters then I couldn't understand how it is possible to remove geometry definitions from Field and other internal definitions.
That what i guess i missed in your question. Yes if we remove the GEOMETRY from the server, there's no way to specify the GEOMETRY type for the UDF. Though instead we can specify strings as parameters and treat them as GEOMETRY inside. Best regards. HF 24.09.2013 4:38, Mateusz Loskot wrote:
On 23 September 2013 22:10, Alexey Botchkov <holyfoot@askmonty.org> wrote:
1. Is it possible to implement MariaDB extensions like Spatial (custom type + set of functions) without such a tight coupling with the internal implementation of the type system (without messing Field class with geometry types directly, etc.)?
Yes, it is possible. The core algorithms are separated from the Field structure and any other database internals. They are placed in sql/gcalc_slicescan.cc and sql/gcalc_tools.cc files. Yes, but my question is not really about location of computational geometry bits, but about the data management: SQL data type for geometry objects, input/output routines.
Due to my lack of experience with MariaDB/MySQL UDF, I simply assumed that if: 1. Field is the only place that defines GEOMETRY type (and there is no CREATE TYPE support) 2. UDF prototypes will use of GEOMETRY in their prototypes to declare input/output parameters then I couldn't understand how it is possible to remove geometry definitions from Field and other internal definitions.
But, I've just found this project [1] with extra spatial UDFs, so I think I understand the UDF protocol regarding I/O arguments would not require explicit GEOMETRY type making it possible to move Spatial Extensions completely out of built-ins (trunk/sql/ files).
[1] https://github.com/krandalf75/MySQL-Spatial-UDF
2. Is it possible to implement Spatial using User-Defined Functions (UDF) defined in shared binary?
The spatial functions/operations can be implemented with UDF, but that makes query optimization and using Spatial keys problemmatic. So, for real use case, the idea I brainstormed above would not make sense. Unless, there is workaround for those problems you mean.
3. What is the reason behind using Well-Known-Binary (WKB) stream of bytes to transport geometry values into/from functions? Is it due to limitations of MariaDB type system where String is the only universal carrier for complex data? This concern is related to necessity of encoding/decoding WKB when chaining spatial function calls, and possibilities to avoid it.
The reason was mostly historical. It was sufficient for the first implementations of the Geometry field types and somewhat convenient as we don't need to perform conversions when we need to import/export features in their WKB representation. But yes, that format is inefficient and difficult to handle properly. I plan to get rid of it internally - only support importing-exporting it. I roughly understand, but how do you plan to pass geometry data around, in what format?
AFAIU, it is not possible to pass user-defined types into/from SQL functions, so geometries would have to be passed as String objects anyway, wouldn't they? IOW, there are only 3 types available (integer, real, string), so String is the only one usable to pass geometry objects around, regardless of actual encoding format, WKB, WKT, any other binary stream...
It means, that if I want to pass geometry to my_foo UDF:
MSUDF_API char* my_foo(UDF_INIT *initid,UDF_ARGS *args, char *buf, unsigned long *length, char *is_null, char *error);
the only option available is to make geometry into a kind of stream of bytes and passed as one of args item. So, a kind of serialising/deserialising is in fact unavoidable.
Is my understanding correct?
Best regards,
On 25 September 2013 19:27, Alexey Botchkov <holyfoot@askmonty.org> wrote:
Yes, but my question is not really about location of computational geometry bits, but about the data management: SQL data type for geometry objects, input/output routines.
That what i meant. That code do not use any Field data structures.
Gotcha,
1. Field is the only place that defines GEOMETRY type (and there is no CREATE TYPE support)
For the Field the GEOMETRY is just the string of bytes.
Right.
2. UDF prototypes will use of GEOMETRY in their prototypes to declare input/output parameters then I couldn't understand how it is possible to remove geometry definitions from Field and other internal definitions.
That what i guess i missed in your question. Yes if we remove the GEOMETRY from the server, there's no way to specify the GEOMETRY type for the UDF. Though instead we can specify strings as parameters and treat them as GEOMETRY inside.
That completes the whole picture for me, thanks for the clarifications. Best regards, -- Mateusz Loskot, http://mateusz.loskot.net "Participation in this whole process is a form of torture" ~~ Szalony
Hi mateusz, i'm a user and a hobby developer... i will answer with what i know 2013/9/23 Mateusz Loskot <mateusz@loskot.net>
Hi,
I'm going to ask question about how the current Spatial Extensions are implemented. I have spent some time reading the source code in the current trunk (spatial.h|cc, gcal*.h|cc, related Field and Item definitions, etc.), so I have a rough understanding of the overall structure, how the geometry data types are implemented and exposed to SQL, how the spatial functions are defined and registered. I did not looked into details of implementation of geospatial algorithms, but that's too low level for the question I'm going to ask.
Initially, I was going to ask very detailed question, listing all the relevant code definitions and asking separately about each of them, but that's not necessary at this stage, I think.
Instead, I'm going to simplify and ask about the bigger picture, more about MariaDB extensions API:
1. Is it possible to implement MariaDB extensions like Spatial (custom type + set of functions) without such a tight coupling with the internal implementation of the type system (without messing Field class with geometry types directly, etc.)?
i think this is something interesting, check this idea: https://mariadb.atlassian.net/browse/MDEV-4912 well this MDEV is too far from today (it's an idea to mariadb 10.1 we are at 10.0.5 today...) forget this MDEV... you can extend mariadb with your hands too :)
2. Is it possible to implement Spatial using User-Defined Functions (UDF) defined in shared binary?
yes, the field type probably will be a WKB value and a STRING/BLOB (char) type, must check i never used a UDF for spatial data
3. What is the reason behind using Well-Known-Binary (WKB) stream of bytes to transport geometry values into/from functions? Is it due to limitations of MariaDB type system where String is the only universal carrier for complex data? This concern is related to necessity of encoding/decoding WKB when chaining spatial function calls, and possibilities to avoid it.
internally (at memory / disk) you have a string (for spatial type it's a WKB format string), and at run/compile/design time you have a class (many functions) that handle this type (string in this case) check that at cpu we have only primitive types (java work with something similar to this too), in other words at memory we have string/uint/boolean/structs/etc, there's no 'geometric' type at hardware level... if you have contact with microcontrollers or linux kernel source or gcc source, you will know how a C++ language is 'converted' to a assembler/machine language, and you will understand how a C++ object is implemented at low level... in "low words": a class in any computer language is: "protected" memory + "protected" functions that handle this memory about why use WKB or other type? well i don't know if we have hardware (cpu) optimizations here, but at least we have a small memory footprint (and probably a smaller cpu consume), some languages/cpu have optimizations to handle for example a math function using int*int, check about SIMD at wikipedia (http://en.wikipedia.org/wiki/SIMD), or maybe you remember the pentium with "MMX", it's a hardware optimization for some especific operations, check SSE, SSE2, etc... the goal is a faster execution, it's not a limit at mysql/mariadb source code, but a optimization knowing how a C/C++ language work at low level is the best way to optimize memory and cpu
Best regards, -- Mateusz Loskot, http://mateusz.loskot.net "Participation in this whole process is a form of torture" ~~ Szalony
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
well i don't know if every thing i wrote is ok hehe, but it's a high level guide line :) -- Roberto Spadim
On 24 September 2013 02:49, Roberto Spadim <roberto@spadim.com.br> wrote:
2013/9/23 Mateusz Loskot <mateusz@loskot.net>
I'm going to ask question about how the current Spatial Extensions are implemented. I have spent some time reading the source code in the current trunk (spatial.h|cc, gcal*.h|cc, related Field and Item definitions, etc.), so I have a rough understanding of the overall structure, how the geometry data types are implemented and exposed to SQL, how the spatial functions are defined and registered. I did not looked into details of implementation of geospatial algorithms, but that's too low level for the question I'm going to ask.
Initially, I was going to ask very detailed question, listing all the relevant code definitions and asking separately about each of them, but that's not necessary at this stage, I think.
Instead, I'm going to simplify and ask about the bigger picture, more about MariaDB extensions API:
1. Is it possible to implement MariaDB extensions like Spatial (custom type + set of functions) without such a tight coupling with the internal implementation of the type system (without messing Field class with geometry types directly, etc.)?
i think this is something interesting, check this idea: https://mariadb.atlassian.net/browse/MDEV-4912 well this MDEV is too far from today (it's an idea to mariadb 10.1 we are at 10.0.5 today...) forget this MDEV... you can extend mariadb with your hands too :)
This something interesting indeed.
2. Is it possible to implement Spatial using User-Defined Functions (UDF) defined in shared binary?
yes, the field type probably will be a WKB value and a STRING/BLOB (char) type, must check i never used a UDF for spatial data
That answers my question. So, it also means serialising/deserialising is unavoidable.
3. What is the reason behind using Well-Known-Binary (WKB) stream of bytes to transport geometry values into/from functions? Is it due to limitations of MariaDB type system where String is the only universal carrier for complex data? This concern is related to necessity of encoding/decoding WKB when chaining spatial function calls, and possibilities to avoid it.
internally (at memory / disk) you have a string (for spatial type it's a WKB format string), and at run/compile/design time you have a class (many functions) that handle this type (string in this case) [...]
I do understand those principles. My question however along this lines: is there a type system in MariaDB/MySQL which allows to avoid going through WKB stream as the only input/output data into/from functions? IOW, instead of slinging WKB around the spatial functions, optimally if it was possible to pass ready to use structured objects. See this PostGIS wiki page about LWGEOM which is a working structure for all spatial functions: http://trac.osgeo.org/postgis/wiki/DevWikiPostGISCoding So, instead of (pseudo-code): foo(String s1, String s2) { g1 = decode_geometry_from_wkb(s1) g2 = decode_geometry_from_wkb(s2); g3 = call_fancy_algorithm(g1, g2); return encode_geometry_to_wkb(g3) } optimal protocol is this: foo(MyGeometry g1, MyGeometry g2) { g3 = call_fancy_algorithm(g1, g2); return g3; } Is his feasible in MariaDB extensions? Best regards, -- Mateusz Loskot, http://mateusz.loskot.net "Participation in this whole process is a form of torture" ~~ Szalony
participants (3)
-
Alexey Botchkov
-
Mateusz Loskot
-
Roberto Spadim