For what it's worth, unaligned access does come with a performance penalty, typically somewhere in the 1-10% range on x86, depending on the generation of chip used. It has been _mostly_ mitigated on recent x86 chips, and IIRC Intel's C compiler does have an option to align all structs and arrays to a 16 byte boundary.
I would be very interested to see some test data on the cost of unaligned access on various AArch64 chips. On various 32-bit ARM chips (including those >= ARMv6) the unaligned access performance hit was quite dramatic.
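
As an aside, alignment can also be requested per type in standard C11, independent of any compiler-wide option. A minimal sketch, assuming a C11 compiler; the struct name and fields are hypothetical and only for illustration:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical record type. alignas(16) on the first member raises the
   alignment requirement of the whole struct to 16 bytes. */
struct rec16 {
  alignas(16) uint32_t id;
  uint32_t len;
};

/* Compile-time check that the type really is 16-byte aligned. */
static_assert(alignof(struct rec16) == 16, "struct rec16 must be 16-byte aligned");
```

Note that this pads sizeof(struct rec16) up to a multiple of 16, so arrays of such records trade memory for alignment.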

On Wed, Nov 27, 2019 at 11:36 AM Marko Mäkelä <marko.makela@mariadb.com> wrote:
Hi Daniel,

You seem to be right that the compilers are already mostly doing the
right thing. Here is a notable exception where GCC lags behind clang
(unnecessary use of stack):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89804

I created another test program, checking how mach_read_from_4() gets
compiled. It turns out that on AArch64 and POWER, unaligned reads are
being used by default:
https://godbolt.org/z/ZcavM4

For 32-bit ARM, -march=armv6 seems to enable unaligned reads. For
RISC-V and WebAssembly, the code is rather ugly. :-)
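
For readers who want to repeat this kind of check, here is a rough
sketch of a test input, not the actual program behind the link above.
It assumes GCC or clang (for __BYTE_ORDER__ and __builtin_bswap32),
and the function names are hypothetical stand-ins for
mach_read_from_4(), which reads a 32-bit big-endian value:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for mach_read_from_4(): assemble a 32-bit
   big-endian value byte by byte, so alignment never matters. */
static inline uint32_t read_be32_bytes(const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
       | ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
}

/* Same result via memcpy(), which is well-defined for any alignment.
   Assumption: GCC or clang, which define __BYTE_ORDER__ and provide
   __builtin_bswap32(). The swap is needed only on little-endian hosts. */
static inline uint32_t read_be32_memcpy(const unsigned char *p)
{
  uint32_t v;
  memcpy(&v, p, sizeof v);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  v = __builtin_bswap32(v);
#endif
  return v;
}

/* Read from an odd offset inside a byte buffer and check that both
   variants agree on the decoded value. */
void check_reads_agree(void)
{
  const unsigned char buf[8] = {0, 0x01, 0x02, 0x03, 0x04, 0, 0, 0};
  assert(read_be32_bytes(buf + 1) == 0x01020304);
  assert(read_be32_memcpy(buf + 1) == 0x01020304);
}
```

Compiling both functions with -O2 for each target and comparing the
generated code shows whether the compiler emits a single unaligned
load (plus a byte swap where needed) or falls back to four byte loads.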

So, indeed, there does not appear to be much to micro-optimize here.

Marko
--
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation
