Re: [Maria-developers] MDEV-13467 review.
One thing i'd still like you to do differently. + if (g->get_class_info()->m_type_id == Geometry::wkb_multipoint) + { + const char point_size= 4 + WKB_HEADER_SIZE + POINT_DATA_SIZE+1; //1 for the type + char point_temp[point_size]; + memset(point_temp+4, Geometry::wkb_point, 1); + memcpy(point_temp+5, static_cast<const Gis_multi_point *>(g)->get_data_ptr()+5, 4); + memcpy(point_temp+4+WKB_HEADER_SIZE, g->get_data_ptr()+4+WKB_ HEADER_SIZE, + POINT_DATA_SIZE); + point_temp[point_size-1]= '\0'; + Geometry_buffer gbuff; + Geometry *gg= Geometry::construct(&gbuff, point_temp, point_size-1);
Gis_multi_point::geometry_n(uint32 num, String *result) stores the point in String *, but I need Geometry* to use further, so still would need some recalculation.
You could call the ::construct with the string->prr() instead of point_temp. Like str->q_append(0); //SRID g->geometry_n(str, 1); gg= Geometry::construct(&gbuff, str->ptr(), str->length()); But I propose an even simpler method. { Gis_pont p; p.set_data_ptr(g->get_data_ptr()+4+WKB_HEADER_SIZE, POINT_DATA_SIZE); calculate_haversine(&p, radius); }
Regarding radius, Daniel has suggestion: when r=NULL, result= NULL, but r=0, how should that be treated? Here is MySQL example (no error, but warning)
There i see that MySQL returns the error when the radius is 0 or below. Right? I think it's ok for us to do same.
In a single function we are calculating radian for x,y, I think it is faster than to call function 2 times for x and y. I doubt so. Really would like to compare the speed here. Still i'm ok with that single function if you like it better.
Best regards. HF On Tue, Mar 2, 2021 at 1:16 PM Anel Husakovic <anel@mariadb.org> wrote:
Hi Alexey, thanks for your review, hope everything is ok now. Here is the new patch 1f618e36577a2a322807 <https://github.com/MariaDB/server/commit/1f618e36577a2a32280705c640d6f9a74861a3c6> according to your review and comments below.
On Sat, Jan 9, 2021 at 2:09 PM Alexey Botchkov <holyfoot@mariadb.com> wrote:
Hi, Anel!
Sorry for that long delay with this review. Had real difficult weeks of my life :( Now it seems to be over, so getting back to normal. Below is my opinions of the code i see in the branch bb-10.1-anel-MDEV-13467-gis-feature-st_sphere-v2 Hope it's the recent version.
+ Adapter for functions that takes two or three arguments. +*/ + +class Create_func_arg2_or_3 : public Create_func +{ Do you think we need the separate Create_fund_arg2_or_3 class? If you don't see any other class to inherit from it, i'd recommend to leave the Create_func_distance_sphere alone. No need to have two constructors in the Item_func_sphere_distance too. Just use the list argument as it's done for the Item_func_json_contains (in 10.2).
Ack ^ droped the Create_func_arg2_or_3 and inherited from Create_native_function which can use a variable number of parameters. Thanks
Anyway i belive we should get rid of duplicating code about the 'param1' and 'param2'. They can be popped/checked with no if (arg_count == 2) condition.
+ double distance= 0.0; + null_value= (args[0]->null_value || args[1]->null_value); + if (null_value) + { + // Returned item must be set to Null + DBUG_ASSERT(maybe_null); don't think it's needed. + return 0; Should be return 0.0 i belive. Or goto handle_errors. Thet goes to all but one 'return' statements in this function.
+ }
... if (!arg1 || !arg2)
This last condition should never happen since we have not-null_value. So i'd remove that 'if' section completely. And anyway it should do the 'return' in that branch.
+ null_value= args[2]->null_value; + if (null_value) + { + return 0; + }
seems nicer to me this way if (args[2]->null_value) { null_value= true; goto handle_errors; }
Ack. Regarding radius, Daniel has suggestion <https://github.com/MariaDB/server/commit/e42b69136a31cb596e940d1ceafffa5c1163fc19#r45124982>: when r=NULL, result= NULL, but r=0, how should that be treated? Here is MySQL <https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=d467903dce23464df0ebf396220da886>example (no error, but warning)
+ if (!(g1= Geometry::construct(&buffer1, arg1->ptr(), arg1->length())) || + !(g2= Geometry::construct(&buffer2, arg2->ptr(), arg2->length()))) + { + goto handle_errors;
we should launch the ER_GIS_INVALID_DATA here.
ER_GIS_INVALID_DATA is introduced in 10.2, I will change this (and test case) when pushing to 10.2, but I would also push to 10.1 since there may be users using it in 10.1. Also when inspecting this error in 10.2 noted that lot tests have commented this error, so will remove that comment in the same commit.
mysql-test/suite/innodb_gis/t/gis.test
mysql-test/suite/innodb_gis/t/1.test
mysql-test/suite/innodb_gis/t/bug16236208.test
mysql-test/suite/innodb_gis/t/precise.test
mysql-test/suite/innodb_gis/t/rtree.test
+ // Generate error message in case different geometry is used?
Yes, i think the explainig message here is good to have.
+double Item_func_sphere_distance::spherical_distance(Geometry *g1, + Geometry *g2, + const double sphere_radius) +double Item_func_sphere_distance::spherical_distance_points(Geometry *g1, + Geometry *g2, + const double r) Why these two are member static functions, not simply static? If you plan any 'external' use for them, please add comments about what exacly they do.
Changed function to be member function only.
+double Item_func_sphere_distance::spherical_distance(Geometry *g1, + Geometry *g2, + const double sphere_radius) +{ + double res= 0.0; + switch (g1->get_class_info()->m_type_id) + { + case Geometry::wkb_point: + res= spherical_distance_points(g1, g2, sphere_radius); + break; + + case Geometry::wkb_multipoint: + res= spherical_distance_points(g1, g2, sphere_radius); + break;
Why do we need this function? We confirmed already that the type is either wkb_point or wkb_multipolygon, so just can call spherical_distance_points. No?
Indeed, left from earlier implementation (v1).
>
+class Item_func_sphere_distance: public Item_real_func +{ + double sphere_radius= 6370986.0; // Default radius equals Earth radius
Some compilers don't like this syntax, and we didn't use it yet, so i'd recomment to move this to the constructor or even better to the ::val_real().
+double Gis_point::calculate_haversine(const Geometry *g, + const double sphere_radius) +{ + DBUG_ASSERT(sphere_radius > 0); + double x1r= 0.0; + double x2r= 0.0; + double y1r= 0.0; + double y2r= 0.0; That lot of unnecessary initializations seems excessive.
Ack
+ if (g->get_class_info()->m_type_id == Geometry::wkb_multipoint) + { + const char point_size= 4 + WKB_HEADER_SIZE + POINT_DATA_SIZE+1; //1 for the type + char point_temp[point_size]; + memset(point_temp+4, Geometry::wkb_point, 1); + memcpy(point_temp+5, static_cast<const Gis_multi_point *>(g)->get_data_ptr()+5, 4); + memcpy(point_temp+4+WKB_HEADER_SIZE, g->get_data_ptr()+4+WKB_HEADER_SIZE, + POINT_DATA_SIZE); + point_temp[point_size-1]= '\0'; + Geometry_buffer gbuff; + Geometry *gg= Geometry::construct(&gbuff, point_temp, point_size-1);
There is the Gis_multi_point::geometry_n() function, should be used here to get the point.
Gis_multi_point::geometry_n(uint32 num, String *result) stores the point in String *, but I need Geometry* to use further, so still would need some recalculation.
I didn’t find a multipoint API to convert String * to Geometry* so I did my own calculation after debugging from row data.
I don't like that lot of static_cast-s.
Isn't it nicer to have just the function of double calculate_harvesine_points(Geometry *p1, Geometry *p2, double radius) double calculate_harvesine_multipoint_point(Geometry *mp, Geometry *p, double radius) double calculate_harvesine_multipoint_multipoint(Geometry *mp1, Geometry *mp2, double radius) which would call each other?
I would prefer to have 2 static member functions and use static cast where needed, if you don't mind. The same could be done with 3 functions but with the code repeated.
The ::get_xy_radian() method looks unneeded to me. It's enough to have the to_radian(double d) function.
In a single function we are calculating radian for x,y, I think it is faster than to call function 2 times for x and y.
BTW it's possible to do the multipoint/multipoint calculations faster than n^2. Not necessary to do it right now, just not to remember.
Regarding this there are some implementations that don't use Multipoints <https://postgis.net/docs/manual-1.4/ST_Distance_Sphere.html> as both arguments, MySQL <https://dev.mysql.com/doc/refman/8.0/en/spatial-convenience-functions.html>also use only Multipoints,Points combination, not MultiPoints,MultiPoints, so we need to decide what to support. However, there are algorithms to calculate max <http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=CA2AE9B932194B2970659C7F0CA610AB?doi=10.1.1.54.5073&rep=rep1&type=pdf> ,min <https://core.ac.uk/download/pdf/82507217.pdf> distance and O(nlong) using convex hull <https://en.wikipedia.org/wiki/Convex_hull> (which we have implemented <https://mariadb.com/kb/en/st_convexhull/>) and using rotating calipers <https://en.wikipedia.org/wiki/Rotating_calipers> from N points from convex hull it could be O(n).
Also there is suggested usage of iteration <https://github.com/MariaDB/server/commit/e42b69136a31cb596e940d1ceafffa5c1163fc19#r45127877>from Daniel.
<https://github.com/MariaDB/server/commit/e42b69136a31cb596e940d1ceafffa5c1163fc19#r45127877>
Best regards. HF
-- *Anel Husaković, * *Software developer* *MariaDB Foundation*
participants (1)
-
Alexey Botchkov