[Maria-developers] Refactoring in LEX_STRING and LEX_CSTRING
Hello all, Monty is now doing refactoring: 1. Replacing a lot of LEX_STRING to LEX_CSTRING. 2. Fixing functions and methods that have arguments of types: a. pointer "const LEX_STRING *name" and b. reference "const LEX_STRING &name". to the same style. Monty chose passing by pointer (2a). I think that: - #1 is good. - #2 is good, in the sense of changing to the same style. But I think that the choice of "passing by pointer" in #2 is bad. Passing by reference for LEX_STRING/LEX_CSTRING related things would be better. With the "passing by pointer" approach whenever one needs to pass a LEX_STRING to a function accepting LEX_CSTRING, one will have to do: void func1(const LEX_STRING *str) { LEX_CSTRING tmp= {str->str, str->length}; func2(&tmp); } I'd propose two things instead: 1. Add a class: class Lex_cstring: public LEX_CSTRING { public: Lex_cstring() {} /*Uninitialized */ Lex_cstring(const char *str, size_t length) { LEX_STRING::str= str; LEX_STRING::length= length; } Lex_cstring(const LEX_STRING *str) { LEX_STRING::str= str->str; LEX_STRING::length= str->length; } }; 2. Instead of "const LEX_CSTRING *str" we use "const Lex_cstring &str" all around C++ code. So the above example with passing a LEX_STRING to something that expect a LEX_CSTRING would be simplified to void func1(const LEX_STRING *str) { func2(str); } 3. In some classes now we have duplicate methods and constructors like this: class XXX { public: XXX(const char *str, size_t length) { ... } XXX(const LEX_STRING *str) { ... } void method(const char *str, size_t length) {...} void method(const LEX_STRING *str) {...} }; to make it possible to call like this: new XXX(str, length); new XXX(lex_str_ptr) new XXX(&lex_str) method(str, length); method(lex_str_ptr); method(&lex_str); We change this to "passing by reference" and remove duplicates: class XXX { public: XXX(const Lex_cstring &str) { ... } void method(const Lex_cstring &str) {...} }; After this change we can call like this: new XXX(Lex_cstring(str, length)); new XXX(*lex_str_ptr) new XXX(lex_str); method(Lex_cstring(str, length)); method(*lex_str_ptr); method(lex_str); Notice, not needs for separate methods accepting two arguments "const char *str, size_t length".
Hi! I would like to add a few more arguments in favour of using const references and not const pointers. I do not endorse non-const references, that makes the code silly and confusing a lot of the times. When we need output or input-output parameters we should keep using pointers. For strictly mandatory (must never be null) input parameters, const references are the way to go, unless there is a strong reason not to. The guarantee that references enforce is "never null". This is a very important and useful feature of C++. We can completely skip any checks such as: func1(LEX_CSTRING *str) { if (!str) { .. } } Or DBUG_ASSERTS such as: func1(LEX_CSTRING *str) { DBUG_ASSERT(str); ... } The code with *const references* is self documenting in this sense. It makes it very clear that the function does not own the object it is being passed (no questions wether the pointer should be freed or not) and that the reference is valid. Other arguments, not strictly related to this use case, but are in favor of const references: * By limiting our use of references to const references we avoid the ambiguity between regular pass-by-value arguments and non-const reference arguments. You work with them as if they are pass-by-value. Since they are const you can't do anything to them anyway, you're not changing any of the caller's data. * Pointers are unclear wether they point to a buffer array or a single item. References do not allow you to do: ptr[0], ptr[1], ptr[2], etc. * If we chose to make use of STL algorithms or C++11 features down the line, references tend to work easier than pointers. Can't come up with a good example right this moment, but most stl algorithms expect reference-based parameters, ex: std::max(*int_ptr_a, *int_ptr_b) vs std::max(int_a, int_b) * If some legacy code does need a pointer parameter, using the unary & operator on the reference will produce the required pointer type. If we are doing refactoring anyway, I strongly request we reconsider the policy of avoiding references completely. I suggest we at least make use of them as input parameters to functions and methods in new code. Vicențiu On Thu, 20 Apr 2017 at 00:10 Alexander Barkov <bar@mariadb.org> wrote:
Hello all,
Monty is now doing refactoring:
1. Replacing a lot of LEX_STRING to LEX_CSTRING.
2. Fixing functions and methods that have arguments of types: a. pointer "const LEX_STRING *name" and b. reference "const LEX_STRING &name". to the same style. Monty chose passing by pointer (2a).
I think that:
- #1 is good. - #2 is good, in the sense of changing to the same style.
But I think that the choice of "passing by pointer" in #2 is bad. Passing by reference for LEX_STRING/LEX_CSTRING related things would be better.
With the "passing by pointer" approach whenever one needs to pass a LEX_STRING to a function accepting LEX_CSTRING, one will have to do:
void func1(const LEX_STRING *str) { LEX_CSTRING tmp= {str->str, str->length}; func2(&tmp); }
I'd propose two things instead:
1. Add a class:
class Lex_cstring: public LEX_CSTRING { public: Lex_cstring() {} /*Uninitialized */ Lex_cstring(const char *str, size_t length) { LEX_STRING::str= str; LEX_STRING::length= length; } Lex_cstring(const LEX_STRING *str) { LEX_STRING::str= str->str; LEX_STRING::length= str->length; } };
2. Instead of "const LEX_CSTRING *str" we use "const Lex_cstring &str" all around C++ code.
So the above example with passing a LEX_STRING to something that expect a LEX_CSTRING would be simplified to
void func1(const LEX_STRING *str) { func2(str); }
3. In some classes now we have duplicate methods and constructors like this:
class XXX { public: XXX(const char *str, size_t length) { ... } XXX(const LEX_STRING *str) { ... } void method(const char *str, size_t length) {...} void method(const LEX_STRING *str) {...} };
to make it possible to call like this:
new XXX(str, length); new XXX(lex_str_ptr) new XXX(&lex_str) method(str, length); method(lex_str_ptr); method(&lex_str);
We change this to "passing by reference" and remove duplicates:
class XXX { public: XXX(const Lex_cstring &str) { ... } void method(const Lex_cstring &str) {...} };
After this change we can call like this:
new XXX(Lex_cstring(str, length)); new XXX(*lex_str_ptr) new XXX(lex_str); method(Lex_cstring(str, length)); method(*lex_str_ptr); method(lex_str);
Notice, not needs for separate methods accepting two arguments "const char *str, size_t length".
Hi! On 20 Apr 2017 06:10, "Alexander Barkov" <bar@mariadb.org> wrote: Hello all, Monty is now doing refactoring: 1. Replacing a lot of LEX_STRING to LEX_CSTRING. 2. Fixing functions and methods that have arguments of types: a. pointer "const LEX_STRING *name" and b. reference "const LEX_STRING &name". to the same style. Monty chose passing by pointer (2a). I think that: - #1 is good. - #2 is good, in the sense of changing to the same style. But I think that the choice of "passing by pointer" in #2 is bad. Passing by reference for LEX_STRING/LEX_CSTRING related things would be better. With the "passing by pointer" approach whenever one needs to pass a LEX_STRING to a function accepting LEX_CSTRING, one will have to do: void func1(const LEX_STRING *str) { LEX_CSTRING tmp= {str->str, str->length}; func2(&tmp); } The above is not a problem: - We don't have less than 5 places in the code where LEX_STRING is changed to LEX_CSTRING. - In these places one can do this with a simple cast, nothing elaborated. I'd propose two things instead: 1. Add a class: No reason to make a complex solution for a problem that doesn't exist. 2. Instead of "const LEX_CSTRING *str" we use "const Lex_cstring &str" all around C++ code. No, no, no. I do not want to use & anywhere as arguments in the code. It hides things and lends people to do errors like we had before where they declared some functions to take full structure she references in other places, which creates a mess. Regards, Monty
Hi Monty, On 04/20/2017 09:35 AM, Michael Widenius wrote:
Hi!
On 20 Apr 2017 06:10, "Alexander Barkov" <bar@mariadb.org <mailto:bar@mariadb.org>> wrote:
Hello all,
Monty is now doing refactoring:
1. Replacing a lot of LEX_STRING to LEX_CSTRING.
2. Fixing functions and methods that have arguments of types: a. pointer "const LEX_STRING *name" and b. reference "const LEX_STRING &name". to the same style. Monty chose passing by pointer (2a).
I think that:
- #1 is good. - #2 is good, in the sense of changing to the same style.
But I think that the choice of "passing by pointer" in #2 is bad. Passing by reference for LEX_STRING/LEX_CSTRING related things would be better.
With the "passing by pointer" approach whenever one needs to pass a LEX_STRING to a function accepting LEX_CSTRING, one will have to do:
void func1(const LEX_STRING *str) { LEX_CSTRING tmp= {str->str, str->length}; func2(&tmp); }
The above is not a problem:
- We don't have less than 5 places in the code where LEX_STRING is changed to LEX_CSTRING. - In these places one can do this with a simple cast, nothing elaborated.
There are more examples: Item_str_func(THD *thd): Item_func(thd) { decimals=NOT_FIXED_DEC; } Item_str_func(THD *thd, Item *a): Item_func(thd, a) {decimals=NOT_FIXED_DEC; } Item_str_func(THD *thd, Item *a, Item *b): Item_func(thd, a, b) { decimals=NOT_FIXED_DEC; } Item_str_func(THD *thd, Item *a, Item *b, Item *c): Item_func(thd, a, b, c) { decimals=NOT_FIXED_DEC; } Item_str_func(THD *thd, Item *a, Item *b, Item *c, Item *d): Item_func(thd, a, b, c, d) { decimals=NOT_FIXED_DEC; } Item_str_func(THD *thd, Item *a, Item *b, Item *c, Item *d, Item* e): Item_func(thd, a, b, c, d, e) { decimals=NOT_FIXED_DEC; } Item_str_func(THD *thd, List<Item> &list): Item_func(thd, list) { decimals=NOT_FIXED_DEC; } This is terrible. The same thing in Item_int_func, Item_real_func, Item_decimal_func, etc, etc. There is only one constructor needed for all of them. For example, for Item_str_func: Item_str_func(THD *thd, const Item_args &args): Item_func(thd, args) { decimals=NOT_FIXED_DEC; } And when you need to call new, you do: Item *item= new (thd->mem_root) Item_func_somefunc(thd, Item_args(a,b,c)); You cannot do this with the "pass by pointer" approach.
I'd propose two things instead:
1. Add a class:
No reason to make a complex solution for a problem that doesn't exist.
It's a trivial solution. The problem does exists. The current code absolutely is excessive and inconvenient to use.
2. Instead of "const LEX_CSTRING *str" we use "const Lex_cstring &str" all around C++ code.
No, no, no.
I do not want to use & anywhere as arguments in the code. It hides things and lends people to do errors like we had before where they declared some functions to take full structure she references in other places, which creates a mess.
I don't propose to use pass-by-value: func(Some_class arg); I propose to use pass-by-reference: func(const Some_class &) When you do: func(some_class_instance); pass-by-value requires a constructor call, indeed. but pass-by-reference does not need a constructor call! pass-by-reference compiles to exactly the same thing with pass-by-pointer: it puts the address of the object to the stack. Nothing else.
Regards, Monty
Hi, Monty, Marko, Vicentiu, thanks for your replies. I collected all input to: https://jira.mariadb.org/browse/TODO-916 Please have a look. Input from other developers would be very welcome! Thanks. On 04/20/2017 09:57 AM, Alexander Barkov wrote: <cut>
I'd propose two things instead:
1. Add a class:
No reason to make a complex solution for a problem that doesn't exist.
It's a trivial solution.
The problem does exists. The current code absolutely is excessive and inconvenient to use.
We can go without a class actually. As Marko mentioned, there is another safer option, to have conversion functions. See an example to_lex_cstring() in https://jira.mariadb.org/browse/TODO-916 So there is no a need for "class Lex_cstring". Using just "const LEX_CSTING &name" to pass names should be very good. <cut>
Hi Monty, I think that we should follow the modern conventions here. The Google style guide (which the MySQL group at Oracle is trying to follow) says that it is OK to pass references to const objects, but it generally forbids references to non-const: https://google.github.io/styleguide/cppguide.html#Reference_Arguments It also gives examples on when pointers to const could be preferred over references to const. Like Bar noted, it is a well-known fact that using const references allows some compiler optimizations. Here is some discussion: http://stackoverflow.com/questions/13318257/const-reference-to-temporary-vs-... On a related note, I hope that 10.3 can switch the language dialect to C++11, and that we can have a session on modern C++ in the next developer meeting. I would like to contribute some detailed comments below. On Thu, Apr 20, 2017 at 8:35 AM, Michael Widenius < michael.widenius@gmail.com> wrote:
On 20 Apr 2017 06:10, "Alexander Barkov" <bar@mariadb.org> wrote:
With the "passing by pointer" approach whenever one needs to pass a LEX_STRING to a function accepting LEX_CSTRING, one will have to do:
void func1(const LEX_STRING *str) { LEX_CSTRING tmp= {str->str, str->length}; func2(&tmp); }
The above is not a problem:
- We don't have less than 5 places in the code where LEX_STRING is changed to LEX_CSTRING. - In these places one can do this with a simple cast, nothing elaborated.
My preference for cases like this would be an inline wrapper function, even with the same name as the wrapped function (different call signature). Simple inline functions do not bloat the generated code (provided that they do not do any stupid things such as copying large objects), and they also keep the source code tidier. With wrapper functions, the number of type casts can be minimized. Type casts are inherently dangerous, especially the C-style casts. By the way, C++11 introduces the typename{} cast where information cannot be lost, e.g., long a{x} is fine if x can be converted to long without loss, but throws an error if not (say, sizeof(x) > sizeof(a)).' For the "but C is more efficient than C++" crowd, I would recommend watching the CppCon 2016 presentation "Rich code for tiny machines" https://www.youtube.com/watch?v=zBkNBP00wJE and looking at the code generated from https://github.com/lefticus/x86-to-6502/blob/master/examples/pong.cpp Also, you can have a look at the puzzle solvers http://www.iki.fi/~msmakela/software/pentomino/ that I recently converted from C to modern C++. To my surprise, the C++ version compiled by clang -O3 was slightly faster than the C version compiled by GCC. So, lightweight objects do not necessarily add any overhead. While I was doing the conversion to C++, I fixed a bug in the symmetry reduction of the 3D puzzle solver. This demonstrates that breaking the problem into more manageable pieces (abstract data types) can improve code quality. If it is done properly, any run-time overhead can be avoided. we use "const Lex_cstring &str" all around C++ code.
2. Instead of "const LEX_CSTRING *str" No, no, no.
I do not want to use & anywhere as arguments in the code. It hides things and lends people to do errors like we had before where they declared some functions to take full structure she references in other places, which creates a mess.
The Google coding style generally agrees with you, by forbidding references to non-const objects. I cannot see any harm in passing references to const. (Of course, the code must be well-written: the passed object must not be modified, not even via anything that casts the constness away and then modifies the object. const_cast is only OK to use when the object is truly not modified, and C-style casts should be avoided altogether in C++ code.) Best regards, Marko
participants (4)
-
Alexander Barkov
-
Marko Mäkelä
-
Michael Widenius
-
Vicențiu Ciorbaru