Dear Jan,

Thank you for your response. I have a background in SQL (MS, Sybase, MySQL, PostgreSQL), so luckily I'm not a beginner at query writing. However, I have absolutely no background in how databases work internally (other than hash tables).

The idea here is not really SIMD, but specially designed hardware that performs database functions much faster than the conventional DISK<->RAM<->CACHE<->CPU model. What I'm struggling to figure out is whether the latency of an enterprise-grade SQL script (not very correct or advanced code, but not very low-quality either) is memory-related, arithmetic-related, or both.

Here are some examples of what could be built:

1) A card with 16 DDR3 chips on board and a central FPGA, allowing very fast access to data, with around 100 small CPUs integrated in the FPGA to process a query
   (assuming, however, a query written to take advantage of these features)

2) A card with XDR memory chips and around 32GB of flash chips to achieve very high-speed storage and retrieval (SSD<->RAM), and an FPGA that has many cores inside. A query is executed by the FPGA pulling large chunks of data from flash, putting them in the XDR RAM chips and processing them in parallel.

3) A card with XDR memory chips and an FPGA on board, connected via PCI Express x8, which receives all the data it needs from the host computer, does the parallel processing itself, and returns the results to the host.

4) A card with an FPGA containing 100 small CPUs and a PCI Express interface, which takes commands from the host computer and processes them in parallel.

5) A card with an FPGA that has no small processors inside, but special circuitry that performs lookups, sorting, etc. very fast.


These are a few examples. The differences among them are whether the card has RAM, whether it has cold storage, and whether the FPGA has many CPUs inside or special dedicated circuitry that performs in parallel functions that are difficult for a CPU to handle (imagine adding 300 numbers to another 300 numbers in a single cycle, where a CPU with SIMD would take much longer, or a string processor that processes many string columns with PATINDEX, etc. in parallel, where a CPU would be slow because it cannot handle them in parallel).

So what I'm researching is whether, beneath the SQL pyramid, the work going on is memory-intensive or logic-intensive: what to look for and what to aim for...
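
To make the memory-versus-arithmetic question concrete, here is a rough back-of-envelope sketch in Python. It compares the time needed to move the data with the time needed to compute on it for a query like "SELECT A+B, C FROM TableX WHERE C > 0". The function name bound_estimate and every hardware figure (bandwidth, operation rate, 8-byte values) are assumptions chosen purely for illustration, not measurements of any real system.

```python
# Rough roofline-style estimate for: SELECT A+B, C FROM TableX WHERE C > 0
# All hardware figures below are hypothetical placeholders, not measurements.

def bound_estimate(rows, bytes_per_row, ops_per_row, mem_bytes_per_s, ops_per_s):
    """Return (memory_seconds, compute_seconds, limiting_factor)."""
    memory_s = rows * bytes_per_row / mem_bytes_per_s
    compute_s = rows * ops_per_row / ops_per_s
    limiting = "memory" if memory_s > compute_s else "arithmetic"
    return memory_s, compute_s, limiting

ROWS = 200_000
BYTES_PER_ROW = 3 * 8   # assumption: A, B, C are 8-byte integers, nothing else is read
OPS_PER_ROW = 2         # one comparison (C > 0) and one addition (A + B)
MEM_BW = 10e9           # assumption: 10 GB/s sustained memory bandwidth
OPS_RATE = 1e9          # assumption: 1 billion simple operations per second, no SIMD

memory_s, compute_s, limiting = bound_estimate(
    ROWS, BYTES_PER_ROW, OPS_PER_ROW, MEM_BW, OPS_RATE
)
print(f"memory:  {memory_s * 1e6:.0f} us")
print(f"compute: {compute_s * 1e6:.0f} us")
print(f"limited by: {limiting}")
```

Changing the assumed figures can flip the answer, which is the point of the exercise: the more bytes each row drags in per useful operation, the more the data path matters relative to the arithmetic units.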


Thanks in advance for your help,
Nasser





On Friday, October 24, 2014 7:49 PM, Jan Lindström <jan.lindstrom@mariadb.com> wrote:


Hi,

This idea of SIMD (single instruction, multiple data) processing is not totally new; similarly, the idea of performing SQL operations inside a GPU or FPGA is not new. In traditional relational databases the problem is that, e.g. in your example, TableX contains several columns. Picking columns A and B from pages that reside first on disk, then in main memory and finally in the L1-L3 caches is not cheap, and the values are not in contiguous memory, because a page in cache also contains values for other columns that we do not even need. In a columnar database architecture this would be a lot easier: you just feed the columns containing the values for A and B directly to the SIMD operation, and every page in main memory would contain a lot more values to process than in a traditional relational database, where a page also contains values for columns that are not needed to produce the result set of the query. Anyway, I find the proposal interesting and challenging.
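
The row-versus-column point can be illustrated with a small page-count sketch in Python. It assumes a hypothetical TableX with 8 columns of 8-byte values and a 16 KiB page, and it ignores page headers, row overhead and indexes, so the numbers only show the trend. The function names are made up for this example.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def pages_touched_row_store(rows, total_columns, value_bytes=8, page_bytes=16384):
    """Every page holds whole rows, so unneeded columns are read as well."""
    rows_per_page = page_bytes // (total_columns * value_bytes)
    return ceil_div(rows, rows_per_page)

def pages_touched_column_store(rows, needed_columns, value_bytes=8, page_bytes=16384):
    """Each column lives in its own pages; only the needed columns are read."""
    values_per_page = page_bytes // value_bytes
    return needed_columns * ceil_div(rows, values_per_page)

ROWS = 200_000
row_pages = pages_touched_row_store(ROWS, total_columns=8)     # assumed 8-column table
col_pages = pages_touched_column_store(ROWS, needed_columns=3)  # query reads A, B, C
print(f"row store:    {row_pages} pages")
print(f"column store: {col_pages} pages")
```

Under these assumptions the columnar layout reads fewer pages, and each page it does read consists entirely of values the query needs, stored contiguously.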

R: Jan Lindström

On Thu, Oct 23, 2014 at 3:39 PM, Nasser Ghoseiri <cdmcsd@yahoo.com> wrote:
Dear Serg,

Following our little chat on IRC, I'm writing this email to explain the idea in more detail. My name is Nasser GHOSEIRI (Founder, CTO of Butterfly Labs). I must note that this project is not from Butterfly Labs,
but will be a new company in Europe.

Our idea is to find a way to accelerate SQL query processing by either:

1) Creating an FPGA solution (which will later evolve into an ASIC) that has around 400 processors in itself, allowing distributed calculation of some kind
2) Creating an FPGA/ASIC solution that performs a large number of unrelated tasks (such as addition, multiplication, etc.) in parallel
3) Creating an FPGA/ASIC solution that allows very high-speed access to data with some PRE-PROCESSING involved to accelerate the calculation
4) Creating a very high-speed storage solution, but no pre-processing.

As an example, imagine: "SELECT A+B, C FROM TableX WHERE C > 0". If there are 200,000 records, the processor has to perform up to 200,000 additions (A+B), one per row that passes the filter.
A CPU will handle these additions one by one (or maybe a few at a time if SSE, etc. is used). However, an FPGA/ASIC solution can perform 1000 additions in a single cycle. This results in acceleration.
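
The difference between the two execution styles can be sketched in Python as follows. This is only a model: Python does not run the batched version any faster, and the parameter lanes simply stands in for the assumed 1000 additions per hardware cycle. The function names and the test data are made up for illustration.

```python
from array import array

def query_row_at_a_time(rows):
    """Evaluate the query one (a, b, c) row per step, as a scalar CPU loop would."""
    out = []
    for a, b, c in rows:
        if c > 0:
            out.append((a + b, c))
    return out

def query_batched(col_a, col_b, col_c, lanes=1000):
    """Evaluate the query on column inputs, `lanes` rows per modelled wide step."""
    out = []
    steps = 0
    for start in range(0, len(col_c), lanes):
        stop = start + lanes
        batch_c = col_c[start:stop]
        # One wide step: add every lane, then keep only lanes where C > 0.
        sums = [a + b for a, b in zip(col_a[start:stop], col_b[start:stop])]
        out.extend((s, c) for s, c in zip(sums, batch_c) if c > 0)
        steps += 1
    return out, steps

N = 200_000
col_a = array("q", range(N))
col_b = array("q", range(N))
col_c = array("q", (i % 3 - 1 for i in range(N)))  # made-up values: -1, 0, 1

expected = query_row_at_a_time(zip(col_a, col_b, col_c))
result, steps = query_batched(col_a, col_b, col_c)
assert result == expected
print(f"{N} rows handled in {steps} wide steps")
```

Note that the batched version adds all lanes and discards the filtered ones afterwards, which is how wide hardware typically handles a WHERE clause, and that it only works this simply because A, B and C are available as separate contiguous columns.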

Now, to what extent this can be effective, and what other solutions (maybe string processing?) could be implemented to accelerate SQL processing, is an open question for me right now. But we do have experience in
making extremely fast processors (our Butterfly Labs Monarch chip performs 400 billion double-SHA256 hashes per second, around 20,000 times faster than the best Intel Xeon processor, which can do only twenty
million hashes per second).

The project is in the brainstorming phase, and we are aware that this solution will be useful to large enterprises, or companies that have to deal with 200,000 rows in a single "SELECT" query. The idea is to integrate
some features in hardware, and then rewrite a portion of MariaDB to take advantage of the new resources. Also, it is possible that users/companies will need to rewrite their queries to make them compliant.

You can reach me by mail at cdmcsd@yahoo.com or by phone at +33 6 72 17 26 19 (France).



Best Regards,
Nasser GHOSEIRI





_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp