Hello everyone,
We are evaluating various approaches for horizontal scalabilty of fulltext searching, and we thought about trying out Galera. We set up a 3-node cluster in Google Cloud Compute, and we tried loading a collection of 90k synthetic documents into a table like this one:
CREATE TABLE `documents` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`filename` varchar(255) DEFAULT NULL,
`text` longtext DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=282232 DEFAULT CHARSET=latin1
We followed the recommendation in the MariaDB docs of inserting everything, then creating the FULLTEXT index by running:
ALTER TABLE documents ADD FULLTEXT (text);
While this works fine with a regular single node configuration (without Galera), it is crashing in our Galera setup with a message like this:
Exception in thread "main" java.sql.SQLException: (conn=41) Index for table 'documents' is corrupt; try to repair it
at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.get(ExceptionMapper.java:198)
at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.getException(ExceptionMapper.java:110)
at org.mariadb.jdbc.MariaDbStatement.executeExceptionEpilogue(MariaDbStatement.java:228)
at org.mariadb.jdbc.MariaDbStatement.executeInternal(MariaDbStatement.java:334)
at org.mariadb.jdbc.MariaDbStatement.executeUpdate(MariaDbStatement.java:520)
at uk.ac.aston.erdf.mariadb.MariaDBIndexer.indexFiles(MariaDBIndexer.java:211)
at uk.ac.aston.erdf.mariadb.MariaDBIndexer.runIteration(MariaDBIndexer.java:142)
at uk.ac.aston.erdf.mariadb.MariaDBIndexer.run(MariaDBIndexer.java:117)
at uk.ac.aston.erdf.mariadb.MariaDBIndexer.main(MariaDBIndexer.java:106)
Caused by: java.sql.SQLException: Index for table 'documents' is corrupt; try to repair it
Query is: ALTER TABLE documents ADD FULLTEXT (text);
at org.mariadb.jdbc.internal.util.LogQueryTool.exceptionWithQuery(LogQueryTool.java:119)
at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java:200)
at org.mariadb.jdbc.MariaDbStatement.executeInternal(MariaDbStatement.java:328)
... 5 more
We have tried dropping and recreating the database, and SHOW STATUS lists the 3 nodes just fine:
MariaDB [(none)]> show status like 'wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id | 12 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | 258a45dd-b9e0-11e8-bdd1-e260a7908d0f |
| wsrep_cluster_status | Primary |
+--------------------------+--------------------------------------+
4 rows in set (0.001 sec)
Our Galera section is pretty bog standard: we followed most of the defaults in the Ubuntu MariaDB 10.3 packages, except for disabling bin logs (they take up too much space for our 40GB cloud VMs). Here is our [galera] section:
[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_cluster_address=gcomm://mariadb-g1,mariadb-g2,mariadb-g3
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
#
# Allow server to accept connections on all interfaces.
#
bind-address=0.0.0.0
#
# Optional setting
#wsrep_slave_threads=1
innodb_flush_log_at_trx_commit=0
Are we missing something obvious, or is InnoDB FULLTEXT not yet fully supported on Galera? Should we try Mroonga instead?
Thank you for your help,
Antonio