Dear mentors,

I am currently pursuing a Master's degree at the University of Illinois at Urbana-Champaign, USA, and completed my B.Tech at the Indian Institute of Technology Delhi (IIT Delhi).

I am writing regarding GSoC 2015. I am very interested in databases, and I was excited to see the projects listed here. What excites me most is that some of them are genuinely “hard”, in the sense that they have challenged the database community for a long time, so it would be very interesting to tackle some of these challenges as part of GSoC.


I would like to discuss two projects:

A. Indexes on virtual columns

Materializing a virtual column gives us two things:

1. A name for the column, which we can use in queries.
2. A formal "regular" column, which is stored and indexed in the usual way. The disadvantage is the extra storage required for the materialized column.

My initial thoughts on this project are the following:

We still need a name that can be used in queries, so perhaps we could expose a command such as:

create virtual_index <name> on <column_name> <expression>

This would run a regular query that evaluates the expression for each row (as is done for a WHERE clause) and then feed the results into the indexer. The resulting index can then be stored in the usual way, without storing the computed column values themselves.
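To make the idea concrete, here is a minimal Python sketch, purely for illustration and not MariaDB code. It assumes an in-memory list of dicts as the table, a Python callable standing in for the SQL expression, and a sorted list of (value, row_id) pairs standing in for a B-tree index; the names build_virtual_index and lookup are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
IndexEntry = Tuple[Any, int]  # (expression value, row id)


def build_virtual_index(rows: List[Row], expr: Callable[[Row], Any]) -> List[IndexEntry]:
    """Evaluate expr on every row (like a WHERE-clause expression) and return
    the results as sorted (value, row_id) pairs. The computed values exist only
    in the index; the rows themselves are never modified."""
    entries = [(expr(row), row_id) for row_id, row in enumerate(rows)]
    entries.sort()
    return entries


def lookup(index: List[IndexEntry], key: Any) -> List[int]:
    """Return the ids of all rows whose expression value equals key."""
    lo = bisect_left(index, (key,))                 # first entry with value >= key
    hi = bisect_right(index, (key, float("inf")))   # one past the last entry with value == key
    return [row_id for _, row_id in index[lo:hi]]


# Usage: index the virtual expression price * qty without adding a column.
rows = [{"price": 10, "qty": 3}, {"price": 5, "qty": 6}, {"price": 7, "qty": 2}]
idx = build_virtual_index(rows, lambda r: r["price"] * r["qty"])
print(lookup(idx, 30))  # [0, 1]
```

A sorted structure is used here rather than a hash map so that range lookups would also be possible, as with a regular B-tree index.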


B. Having UDFs return an array/set

There are three approaches that I can think of:

1. Support arrays/sets as a native datatype inside MariaDB (like int64, double, etc.). This might be hard, since it touches all levels of the stack.

2. Pass the array/set in serialized form to the parent node in query execution, and use an appropriate deserializer whenever we need to interpret the result. Designing a serialization/deserialization strategy might be tough, and it would also add runtime cost.

3. As I understand it, query execution is organized as a tree in which each node exposes functions such as init(), next(), read(int col_index), etc. Maybe we can use this interface to make the UDF behave like such a node, returning its result one row at a time. I think this is the approach suggested in the project description. I would like some direction from the mentors on this.
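A rough Python sketch of approach 3, under stated assumptions: TableFunctionNode and split_words are hypothetical names, the init()/next()/read() methods only mirror the interface described above (MariaDB's real executor and UDF interfaces are in C/C++ and differ), and the two-column layout (position, value) is an assumption. Each element of the UDF's result becomes one output row, so the parent node never sees an array type or a serialized blob.

```python
from typing import Any, Callable, List


class TableFunctionNode:
    """Hypothetical executor node wrapping a UDF that returns a list.

    init() calls the UDF once, next() advances to the next element, and
    read(col_index) returns one column of the current row:
    column 0 is the element's position, column 1 is its value."""

    def __init__(self, udf: Callable[..., List[Any]], *args: Any) -> None:
        self.udf = udf
        self.args = args
        self.result: List[Any] = []
        self.pos = -1

    def init(self) -> None:
        # Evaluate the UDF once and reset the cursor before the first row.
        self.result = list(self.udf(*self.args))
        self.pos = -1

    def next(self) -> bool:
        # Advance to the next row; return False once the result is exhausted.
        self.pos += 1
        return self.pos < len(self.result)

    def read(self, col_index: int) -> Any:
        if col_index == 0:
            return self.pos
        if col_index == 1:
            return self.result[self.pos]
        raise IndexError(f"no column {col_index}")


def split_words(text: str) -> List[str]:
    """Example set-returning UDF: split a string into words."""
    return text.split()


# Usage: a parent node would drive the UDF exactly like any other child node.
node = TableFunctionNode(split_words, "indexes on virtual columns")
node.init()
rows = []
while node.next():
    rows.append((node.read(0), node.read(1)))
print(rows)  # [(0, 'indexes'), (1, 'on'), (2, 'virtual'), (3, 'columns')]
```

The appeal of this design is that no new datatype is needed (unlike approach 1) and no serialization format has to be agreed on (unlike approach 2); the cost is that the UDF's full result is held in memory between init() and the last next() in this simple version.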

I would like to discuss these ideas and then decide on one of the projects. Am I heading in the right direction? Could you please point me to the next steps?

Thanks
Richa