Re: [Maria-developers] [Commits] Rev 3353: Performed re-factoring and re-structuring of the code for mwl#248: in file:///home/igor/maria/maria-5.5-mwl248-changes/
Davi Arnaut <davi@twitter.com> writes:
On Wed, Aug 1, 2012 at 12:46 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Davi Arnaut <davi@twitter.com> writes:
On Thu, Jul 26, 2012 at 5:50 PM, Igor Babaev <igor@askmonty.org> wrote:
+ bzero(column_stats, sizeof(Column_statistics) * cnt);
That's not allowed for non-POD types.
Can you elaborate on exactly when bzero() (or memset() or similar) is allowed or not allowed?
Ok, so I got time to research this a bit more. The class in question is like this, the original has more members but this is the essential structure: class Column_statistics { private: static const uint Scale_factor_nulls_ratio= 100000; public: Field *min_value; private: ulong nulls_ratio; public: void set_all_nulls() { ... } }; The question is whether it is ok to allocate this with alloc_root() or similar, and initialise it with bzero(pointer, sizeof(Column_statistics)). (The issue of bzero vs. memset is a separate one, not discussed in this mail). Davi points out that this is not a POD. This is because a POD must be a C++ aggregate, and aggregates can not have private members. Unfortunately, it seems the question of whether bzero is allowed for a non-POD, is more difficult to answer. If we look to the standards, the answer is no. It is not well-defined to bzero() any data except integers. This is what the standard says. However, this is unhelpful for the case at hand, as bzero to initialise a simple struct is a well-established practice in systems programming and commonly accepted to be valid. It seems generally accepted that bzero() of a POD is acceptable. Since the only thing that makes Column_statistics not a POD is the presence of private: members, the question becomes: Should bzero initialisation of classes with private members be disallowed? I could not find any definitive reference to support such disallowance. I am not sure why private members would be disallowed in aggretate, and hence POD. My guess is because aggregates may be initialised with an initialiser list, which makes no sense for private member. Or maybe compilers could re-arrance private members to be placed after public members. Or compilers could optimise based on knowledge that no external code can modify a private member (so it does not need to be spilled over a call to pthread_mutex_lock() or similar). But none of this would give a problem with bzero-initialisation after alloc. So I do not know of a language standard/compiler argument to disallow bzero-initialisation of non-POD. ---- But I would still strongly advice to be very careful with such code, and I think Davi's original concern is similar to mine. The problem is that Column_statistics already looks a lot like a nicely encapsulated proper C++ class. It has private members, accessors, it even defines some constants. One could easily imagine that in the future, there will be the temptation to extend it even more so, like add a virtual function - and then we suddenly have a big problem. bzero() is most certainly _NOT_ ok for such class. So I would personally stay well away from this grey area, and rather - _either_ use only a simple struct with no member functions or private or anything like that - and use bzero/memcpy and such. - _or_ do a full class, but then only initialise it with proper call of contructors (whether explicitly or implicitly defined). That makes it much clearer whether a given struct class is one or the other. But I suppose the optimiser code already heavily uses a different strategy. At the least, the declaration of class Column_statistics should have a comment saying that it is initialised with bzero(), so it can not use features not compatible with this. The nice thing about Davi's suggested rule "no bzero of non-POD" is that it is simple, easy to remember, and easy to state. Which is the primary point of coding style guidelines in the first place: make the rules clear, so there is less wasted time discussing what they are. Hope this helps, - Kristian.
Kristian Nielsen <knielsen@knielsen-hq.org> writes:
Or compilers could optimise based on knowledge that no external code can modify a private member (so it does not need to be spilled over a call to pthread_mutex_lock() or similar). But none of this would give a problem with bzero-initialisation after alloc.
Hm, thinking more - I am not sure I was correct with this statement. Consider the following code: class GenId { private: int current; public: int next() { return current++; } } GenId *p= malloc(sizeof(GenId)); memset(p, 0, sizeof(GenId)); p->next(); What is to prevent the compiler from moving the code around, so that the access to GenId::current inside p->next() happens before the memset()? After all, GenId::current is private, there is no code outside of class GenId that is allowed to change it. So this would seem a "correct" optimisation, and the code will run with an uninitialised value for current. This is precisely the kind of problem we want desparately to avoid: A new compiler on a new platform comes out, and it does this optimisation, and some poor seniour developer has to spend days to find a way to reproduce and figure out why some strange case crashes. So I can only agree more with Davi here: let's avoid this. Make all members public. Or use placement new instead of bzero() to initialise. - Kristian.
participants (1)
-
Kristian Nielsen