Hi Daniel,

On Tue, Nov 26, 2019 at 2:02 AM Daniel Black <daniel@linux.ibm.com> wrote:
On Mon, 25 Nov 2019 11:32:07 +0200 Marko Mäkelä <marko.makela@mariadb.com> wrote:
I also found a claim that POWER8 supports unaligned access,
This is correct for normal cacheable memory. The exception is device-I/O-mapped memory, which is not applicable to MariaDB.
and I seem to remember that the latest version of SPARC introduced support for that as well. (IA-32 and AMD64 have always supported unaligned access, except for some SIMD operations.)
Last, I believe that we could get some performance benefits if include/byte_order_generic.h was rewritten in a suitable way. Ideally, include/byte_order_generic_x86_64.h would be replaced with a portable version of both, and compilers could simply perform the optimizations. I have been told that replacing the + in the macros with | could already be a good start. I would welcome patches in this area.
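For illustration, a minimal sketch of the `|` idea, assuming a 4-byte little-endian field. The helper names load_le32_or() and store_le32() are hypothetical and are not the actual macros in include/byte_order_generic.h. Because the shifted bytes occupy disjoint bit ranges, | and + give the same result here, but | may make it easier for a compiler to recognize the pattern as a single multi-byte load.

```c
#include <stdint.h>

/* Hypothetical helper (not from MariaDB): read a 32-bit little-endian
   value byte by byte, combining the bytes with | rather than +.
   Works for any alignment of p and on any host byte order. */
static inline uint32_t load_le32_or(const unsigned char *p)
{
  return (uint32_t) p[0] |
         ((uint32_t) p[1] << 8) |
         ((uint32_t) p[2] << 16) |
         ((uint32_t) p[3] << 24);
}

/* Hypothetical counterpart: write a 32-bit value in little-endian order. */
static inline void store_le32(unsigned char *p, uint32_t v)
{
  p[0] = (unsigned char) (v & 0xff);
  p[1] = (unsigned char) ((v >> 8) & 0xff);
  p[2] = (unsigned char) ((v >> 16) & 0xff);
  p[3] = (unsigned char) ((v >> 24) & 0xff);
}
```

Whether a given compiler and target actually fold this into one load or store would have to be checked in the generated code.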
I've never managed to find the time to look at these; however, a non-aligned version for the less common architectures seems a better way to model this.
I pushed my micro-optimization to 10.5:
https://github.com/MariaDB/server/commit/25e2a556de2e125784d52a0c7ccda4fa659...

If there really is no compiler flag that would allow any memcpy(), memset(), memcmp() of 2, 4, or 8 bytes to be translated into simple (possibly unaligned) multi-byte instructions, then we might add further MY_ASSUME_ALIGNED() assertions here and there, to allow gcc and clang to generate better code for POWER and ARM.

If the compiler is smart enough, it might suffice to implement an accessor for buf_block_t or buf_block_t::frame that would MY_ASSUME_ALIGNED(frame, 4096). Then the compiler might correctly infer the alignment of (block->frame + some_compile_time_constant) and enable the optimization. I would be unwilling to pepper such hints all over the code.

Marko
--
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation
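
A rough sketch of the accessor idea above, assuming gcc or clang, where __builtin_assume_aligned() is available. The names SKETCH_ASSUME_ALIGNED, sketch_block, sketch_frame() and sketch_read_u32() are hypothetical stand-ins and not the actual MY_ASSUME_ALIGNED or buf_block_t definitions; the offset 16 is an arbitrary compile-time constant.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an alignment hint macro. On compilers
   without the builtin it degrades to a plain pointer. */
#if defined(__GNUC__) || defined(__clang__)
# define SKETCH_ASSUME_ALIGNED(p, n) __builtin_assume_aligned((p), (n))
#else
# define SKETCH_ASSUME_ALIGNED(p, n) (p)
#endif

/* Hypothetical stand-in for a block descriptor with a page frame. */
struct sketch_block
{
  unsigned char *frame; /* must really be aligned to 4096 bytes */
};

/* Single accessor that carries the alignment promise, so that callers
   do not need their own hints. */
static inline unsigned char *sketch_frame(const struct sketch_block *b)
{
  return (unsigned char *) SKETCH_ASSUME_ALIGNED(b->frame, 4096);
}

/* Read 4 bytes at a compile-time constant offset in host byte order.
   With the hint, the compiler may infer that frame + 16 is 4-byte
   aligned and turn the memcpy() into one aligned load. */
static inline uint32_t sketch_read_u32(const struct sketch_block *b)
{
  uint32_t v;
  memcpy(&v, sketch_frame(b) + 16, sizeof v);
  return v;
}
```

The hint is a promise, not a check: if the frame is not in fact aligned to 4096 bytes, the behavior is undefined.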