For what it's worth, unaligned access does come with a performance penalty, typically somewhere in the 1-10% range on x86, depending on the generation of chip used. It has been _mostly_ mitigated on recent x86 chips, and IIRC Intel's C compiler does have an option to align all structs and arrays to a 16-byte boundary. I would be very interested to see some test data on the cost of unaligned access on various AArch64 chips. On various 32-bit ARM chips (including those >= ARMv6), the unaligned access performance hit was quite dramatic.

On Wed, Nov 27, 2019 at 11:36 AM Marko Mäkelä <marko.makela@mariadb.com> wrote:
Hi Daniel,
You seem to be right that the compilers are already mostly doing the right thing. Here is a notable exception where GCC lags behind clang (unnecessary use of stack): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89804
I created another test program, checking how mach_read_from_4() gets compiled. It turns out that on Aarch64 and POWER, unaligned reads are being used by default: https://godbolt.org/z/ZcavM4
For 32-bit ARM, -march=armv6 seems to enable unaligned reads. For RISC-V and WebAssembly, the code is rather ugly. :-)
So, indeed, there does not appear to be much to micro-optimize here.
Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation
_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss