For what it's worth, unaligned access does come with a performance penalty, typically somewhere in the 1-10% range on x86, depending on the chip generation. It has been _mostly_ mitigated on recent x86 chips, and IIRC Intel's C compiler has an option to align all structs and arrays to a 16-byte boundary.
I would be very interested to see some test data on the cost of unaligned access on various aarch64 chips. On various 32-bit ARM chips (including those >= ARMv6), the unaligned access performance hit was quite dramatic.

On Wed, Nov 27, 2019 at 11:36 AM Marko Mäkelä wrote:
Hi Daniel,

You seem to be right that the compilers are already mostly doing the
right thing. Here is a notable exception where GCC lags behind clang
(unnecessary use of stack):

I created another test program, checking how mach_read_from_4() gets
compiled. It turns out that on Aarch64 and POWER, unaligned reads are
being used by default:

For 32-bit ARM, -march=armv6 seems to enable unaligned reads. For
RISC-V and WebAssembly, the code is rather ugly. :-)
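
For readers without the test program at hand, here is a minimal sketch of the kind of function being discussed. It is not the program referred to above and not the InnoDB source: the names read_be32_bytes and read_be32_memcpy are hypothetical, and the sketch assumes that mach_read_from_4() reads a 4-byte integer stored most significant byte first. The second variant also assumes GCC or clang, because it relies on __BYTE_ORDER__ and __builtin_bswap32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 4-byte big-endian read.
   It only touches single bytes, so it is safe at any alignment. */
static inline uint32_t read_be32_bytes(const unsigned char *b)
{
    return ((uint32_t) b[0] << 24)
         | ((uint32_t) b[1] << 16)
         | ((uint32_t) b[2] << 8)
         |  (uint32_t) b[3];
}

/* Same result via memcpy() plus a byte swap on little-endian hosts.
   memcpy() is the portable way to express a possibly unaligned load;
   whether it becomes a single load instruction is up to the compiler
   and the target. Assumes GCC or clang for the macros and the builtin. */
static inline uint32_t read_be32_memcpy(const unsigned char *b)
{
    uint32_t v;
    memcpy(&v, b, sizeof v);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);
#endif
    return v;
}
```

Compiling something like this with -O2 for each target and comparing the assembly is one way to see whether a given compiler emits a single unaligned load or falls back to byte-wise loads.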

So, indeed, there does not appear to be much to micro-optimize here.

Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

