New subject: [Maria-developers] [Commits] Rev 3353: Performed re-factoring and re-structuring of the code for mwl#248: in file:///home/igor/maria/maria-5.5-mwl248-changes/

13 Aug 2012

      Davi Arnaut <davi@twitter.com> writes:
...
On Wed, Aug 1, 2012 at 12:46 AM, Kristian Nielsen
<knielsen@knielsen-hq.org> wrote:
...
Davi Arnaut <davi@twitter.com> writes:
...
On Thu, Jul 26, 2012 at 5:50 PM, Igor Babaev <igor@askmonty.org> wrote:
...
+  bzero(column_stats, sizeof(Column_statistics) * cnt);
That's not allowed for non-POD types.
Can you elaborate on exactly when bzero() (or memset() or similar) is allowed
or not allowed?
Ok, so I got time to research this a bit more.

The class in question is like this, the original has more members but this is
the essential structure:

    class Column_statistics
    {
    private:
      static const uint Scale_factor_nulls_ratio= 100000;
    public:
      Field *min_value; 
    private:
      ulong nulls_ratio;
    public:
      void set_all_nulls() { ... }
    };

The question is whether it is ok to allocate this with alloc_root() or
similar, and initialise it with bzero(pointer, sizeof(Column_statistics)).
(The issue of bzero vs. memset is a separate one, not discussed in this mail).

Davi points out that this is not a POD. This is because a POD must be a C++
aggregate, and aggregates can not have private members.

Unfortunately, it seems the question of whether bzero is allowed for a
non-POD, is more difficult to answer.

If we look to the standards, the answer is no. It is not well-defined to
bzero() any data except integers. This is what the standard says. However,
this is unhelpful for the case at hand, as bzero to initialise a simple struct
is a well-established practice in systems programming and commonly accepted to
be valid.

It seems generally accepted that bzero() of a POD is acceptable. Since the
only thing that makes Column_statistics not a POD is the presence of private:
members, the question becomes: Should bzero initialisation of classes with
private members be disallowed?

I could not find any definitive reference to support such disallowance.

I am not sure why private members would be disallowed in aggretate, and hence
POD. My guess is because aggregates may be initialised with an initialiser
list, which makes no sense for private member. Or maybe compilers could
re-arrance private members to be placed after public members. Or compilers
could optimise based on knowledge that no external code can modify a private
member (so it does not need to be spilled over a call to pthread_mutex_lock()
or similar). But none of this would give a problem with bzero-initialisation
after alloc.

So I do not know of a language standard/compiler argument to disallow
bzero-initialisation of non-POD.

----

But I would still strongly advice to be very careful with such code, and I
think Davi's original concern is similar to mine. The problem is that
Column_statistics already looks a lot like a nicely encapsulated proper C++
class. It has private members, accessors, it even defines some constants. One
could easily imagine that in the future, there will be the temptation to
extend it even more so, like add a virtual function - and then we suddenly
have a big problem. bzero() is most certainly _NOT_ ok for such class.

So I would personally stay well away from this grey area, and rather

 - _either_ use only a simple struct with no member functions or private or
   anything like that - and use bzero/memcpy and such.

 - _or_ do a full class, but then only initialise it with proper call of
   contructors (whether explicitly or implicitly defined).

That makes it much clearer whether a given struct class is one or the other.
But I suppose the optimiser code already heavily uses a different strategy.

At the least, the declaration of class Column_statistics should have a comment
saying that it is initialised with bzero(), so it can not use features not
compatible with this.

The nice thing about Davi's suggested rule "no bzero of non-POD" is that it is
simple, easy to remember, and easy to state. Which is the primary point of
coding style guidelines in the first place: make the rules clear, so there is
less wasted time discussing what they are.

Hope this helps,

 - Kristian.

Re: [Maria-developers] [Commits] Rev 3353: Performed re-factoring and re-structuring of the code for mwl#248: in file:///home/igor/maria/maria-5.5-mwl248-changes/

Kristian Nielsen

Kristian Nielsen

tags

participants (1)