Hi, Vicențiu! On Dec 11, Vicențiu Ciorbaru wrote:
Hi!
Here is my proposal on extending the storage engine API to provide a functionality for retrieving random rows from tables (those that have indexes). The storage engines for which I plan to implement this are: MyISAM, Aria, Innodb. Possibly RocksDB, TokuDB. ... Maybe I am complicating it too much and a simple random_sample() function is sufficient, kind of how ha_records_in_range does it.
Yeah. My first thought was something like index_random(). That is, index_init(), index_random(), index_random(), ..., index_end(). index_init() does whatever it always needs to do to prepare using an index. and index_random() gets one random row. Not like records_in_range, that takes the index number and does _init/_end internally, because records_in_range is typically done only once or just a few times, while index_random() is more like index_next(), will be repeated continuously. But then I was thinking, why do you need to specify an index at all? Shouldn't it be just "get me a random row"? Index or whatever - that's engine implementation detail. For example, MyISAM with a fixed-size rows can just read from lseek(floor((file_size/row_size)*rand())*row_size). Sergey had a good point about being able to repeat exactly the same sampling by specifying the seed. This should be solved if the engine will get its random numbers from the thd_rnd service. Regards, Sergei