Dear Jan,
Thank you for your response. I have a background in SQL (MS SQL Server, Sybase, MySQL, PostgreSQL), so luckily I'm not a beginner at query writing; however, I have absolutely no background in how databases work under the hood (other than hash tables).
The idea here is not really SIMD, but specially designed hardware that performs database functions much faster than the conventional DISK<->RAM<->CACHE<->CPU model. What I'm struggling to figure out is whether the latency caused by an enterprise-grade SQL script (not very correct and advanced code, but not very low-quality either) is memory-related, arithmetic-related, or both.
Here are a few examples of what could be built:
1) A card with 16 DDR3 chips on board and a central FPGA, allowing very fast access to data, plus something like 100 small CPUs integrated in the FPGA to process a query (a query written to take advantage of these features, however).
2) A card with XDR memory chips and something like 32 GB of flash chips to achieve very high-speed storage and retrieval (SSD<->RAM), plus an FPGA with many cores inside. A query is executed by the FPGA pulling large chunks of data from the flash chips, putting them in the XDR RAM, and processing them in parallel.
3) A card with XDR memory chips and an FPGA on board, connected to a PCI Express 8x slot; it receives all the data it needs from the host computer, does the parallel processing internally, and returns the result to the host.
4) A card with an FPGA containing 100 small CPUs, using PCI Express to take commands from the host computer and process them in parallel.
5) A card with an FPGA that has no small processors inside, but special circuitry that performs lookups, sorting, etc. very quickly.
These are a few examples. The differences among them are the presence of on-board RAM, the presence of cold storage, and whether the FPGA contains many small CPUs or special dedicated circuitry that performs specific functions in parallel that are difficult for a CPU to handle (imagine adding 300 numbers to another 300 numbers in a single cycle, where a CPU even with SIMD would take much longer; or a string processor that runs PATINDEX over many string columns in parallel, where a CPU would be slow because it cannot handle them all at once).
So what I'm researching is whether, beneath the SQL pyramid, the work going on is memory-intensive or logic-intensive; what to look for and what to aim for...
Thanks in advance for your help,
Nasser
On Friday, October 24, 2014 7:49 PM, Jan Lindström <jan.lindstrom@mariadb.com> wrote:
Hi,
This idea of SIMD (single instruction, multiple data) processing is not a totally new one; similarly, the idea of performing SQL operations inside a GPU or FPGA is not new. In traditional relational databases the problem is that, e.g. in your example, TableX contains several columns: picking columns A and B from pages that reside first on disk, then in main memory, and finally in L1-L3 cache is not cheap, and afterwards the values are not in contiguous memory. This is because a page in the cache would also contain values for other columns that we do not even need. In a columnar database architecture this would be a lot easier: you just feed the columns containing the values for A and B directly to the SIMD operation, and every page in main memory would contain many more values to process, compared to a traditional relational database where a page would also contain values for columns we do not need at all to produce the query's result set. Anyway, I find the proposal interesting and challenging.
R: Jan Lindström