Re: [Maria-discuss] MariaDB Server 10.3 notes

13 Oct 2016

      On 13.10.2016 17:26, Andrew Hutchings wrote:

> Whilst I agree it would be a nice feature and something I've heard for 
> years, the implementation would likely be more complex than it seems.
>
> For starters you have to consider the possibility of a procedure 
> crashing and taking the whole daemon with it. You would likely need to 
> fork a small worker process pool and have some kind of shared memory 
> or socket communications for safety. In addition the implementation 
> would have to be extremely careful not to add any potential security 
> hole due to a zero-day in PHP or some bad input filtering for example.
>
> I'm not saying it is impossible, but it will likely be a lot of work 
> to get right and the APIs would need to be carefully thought out.
>
> So the question becomes: is it worth spending time developing this 
> over another feature? Or is it something that could be better 
> implemented safely in another layer, such as in a database proxy?
>
> Kind Regards
> Andrew

I think crashing is much easier if one runs native C UDFs, or any kind 
of natively compiled plugins.

For external languages I think an important aspect is often overlooked. 
Namely, how to would access data. You can create a new JDBC connection, 
but it is likely not what you want to do. You just want to run a SELECT  
from the same session you're in, just like you do in SQL stored 
procedures. Exactly this requires a large amount of work in the 
connectors. It is not  just running a JVM  inside the server.  There is 
no API for data access from the running session inside the server,  not 
even for C.

>
> On 13/10/16 07:04, Federico Razzoli wrote:
>> Hi all,
>>
>> So basically everyone would love, love, love to have external 
>> languages for stored procedures, but no one is working on it... so 
>> bad. Please consider something:
>> 1- Some features could be implemented as stored procedures, it's much 
>> easier. This has been done in the past (Flexviews, Securich...) but 
>> SQL is too limited.
>> 2- I am sure that a lot of people would implement procedures 
>> libraries if they could use something like JavaScript or PHP. If we 
>> could use Python, stuff like NumPy and SciPy could be used.
>> 3- SQL limitations could be lifted (namespaces for global objects, 
>> arrays, argv, cursors based on a prepared statement...) but will you 
>> ever do it? Probably not. But if we have external languages, who 
>> cares about procedural SQL limitations.
>>
>> I believe that next releases features are selected based on their 
>> cost (not only that of course). But please consider points 2 and 3, 
>> and try to estimate the cost of features that the community can 
>> develop for you, plus the cost of features that won't be really 
>> needed anymore if external languages are available (arrays in SQL).
>>
>> Cheers
>> Federico
>>
>>
>>
>>
>> --------------------------------------------
>> Mer 12/10/16, Sergei Golubchik <serg@mariadb.org> ha scritto:
>>
>>  Oggetto: Re: [Maria-discuss] MariaDB Server 10.3 notes
>>  A: "Justin Swanhart" <greenlion@gmail.com>
>>  Cc: "Maria Discuss" <maria-discuss@lists.launchpad.net>
>>  Data: Mercoledì 12 ottobre 2016, 14:15
>>
>>  Hi, Justin!
>>
>>  Very good questions,
>>  thanks!
>>  Some answers below:
>>
>>  On Oct 12, Justin Swanhart
>>  wrote:
>>  >
>>  > > *
>>  InnoDB: InnoDB native partitioning - so MySQL 8 InnoDB? But
>>  Monty
>>  > > says there's next to no
>>  changes in InnoDB 8...  Instant add column.
>>  > > New InnoDB deadlock detection (8.0).
>>  New INFORMATION_SCHEMA table
>>  > >
>>  (8.0). Dedicated tablespace for temporary tables (in 5.7 and
>>  merged,
>>  > > check). Lock wait policy
>>  (contribution)
>>  >
>>  >
>>  Monty is *notorious* for low balling estimates.  His famous
>>  phrase is "it
>>  > is trivial".
>>  Everybody knows that if Monty says it is trivial, you can
>>  add
>>  > 10x the work to get it done, or
>>  more.
>>
>>  Yes, I tend to agree
>>  with that. But Monty estimates are not that far off
>>  when applied to *Monty*. They're often too
>>  low when applied to others.
>>
>>  https://www.google.de/search?q=10x+developer
>>
>>  > includes transportable
>>  tablespaces for partitioned tables, and will likely
>>  > support native partitioning by the time 8
>>  rolls.  Native partitioning also
>>  >
>>  entails implementing the changes to the SE.  See:
>>  > http://mysqlserverteam.com/innodb-native-partitioning-early-access/
>>
>>  InnoDB native partitioning is
>>  in 5.7.6:
>>  https://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-6.html
>>
>>  Is it better than upper-layer
>>  partitioning? Why?
>>
>>  >
>>  > * more for window functions (user defined window
>>  functions, MDEV-10855)
>>  > What other
>>  databases have user definable window functions?  Will the
>>  syntax
>>  > be similar?  I can't really
>>  think of a reason to need custom window
>>  >
>>  functions personally, as the existing set is very
>>  comprehensive.  What kind
>>  > of custom
>>  window function do you think people would need to write?
>>  In most
>>  > implementations any aggregate
>>  functions can be used in a window frame
>>  >
>>  context, thus any aggregate SQL functions should satisfy
>>  requirements for
>>  > user defined window
>>  funcs.
>>
>>  Yes, that was a
>>  confusing description.
>>  What it really means
>>  - we will have aggregate SQL functions (MDEV-7773)
>>  and preferrably, it should be possible to make
>>  them window-aware.
>>
>>  Using an
>>  aggregate function as a window function, means applying it
>>  to
>>  every possible position of a sliding
>>  frame, and it has O(N*n) complexity
>>  (where N
>>  is total number of rows, n is the number of rows in the
>>  frame).
>>  If the aggregate function can remove
>>  rows from a group, it will only be O(N).
>>
>>  > > * socket authentication
>>  > Can you explain this.  Is there an
>>  MDEV?
>>
>>  That's using
>>  unix_socket authentication by default for the root user.
>>  It improves security and maintainability, but
>>  is not really new - Debian
>>  is already doing
>>  that for months. This change could be confusing and
>>  break exising user scripts, so it needs to be
>>  done with care.
>>
>>  > > *
>>  X Protocol/Document store - only if people have time or
>>  money will
>>  > > it be done
>>  > If you want people to pay for it, perhaps
>>  you should implement it in
>>  > MaxScale
>>  instead of the server :)
>>
>>  Interesting idea :)
>>
>>  But the MariaDB Meetup is a community event
>>  organized by the MariaDB
>>  Foundation. This
>>  particular session was about MariaDB Server planning.
>>  MariaDB Foundation has nothing to do with
>>  MaxScale, so we could not have
>>  planned
>>  anything for it.
>>
>>  > >*
>>  Peter asks if there is any plans to support other languages
>>  like V8?
>>  > PLEASE PLEASE PLEASE PLEASE
>>  IMPLEMENT WL-820.  External stored procs and
>>  > table functions.  Been sitting there for
>>  the taking for a long time.
>>  > Antony has
>>  tried to get you to get it into the server, but alas, it
>>  has
>>  > never happened, and though there
>>  continues to be wide requests for this
>>  >
>>  feature from the community, they fall on deaf ears.
>>
>>  Frankly speaking, I'd love
>>  it. But this feature never got enough
>>  priority, not in MySQL times nor in MariaDB.
>>  Unfortunately.
>>
>>  Perhaps
>>  we'll be able to sneak it into 10.3, but no promises
>>  here.
>>
>>  > > *
>>  Compressed binary log (from Tencent)
>>  >
>>  Compressing the binary log saves on space on the master, but
>>  it makes
>>  > seeking into binary logs much
>>  more difficult, and searching backwards
>>  >
>>  through them becomes much more difficult (which has
>>  implications for query
>>  >
>>  'rewind')
>>
>>  Tencent
>>  compresses individual events (think
>>  Compressed_query_log_event
>>  type). So seeking
>>  works as before, events are not decompressed on
>>  reading, they are sent compressed to slaves,
>>  etc. But the compression
>>  ratio is worse, of
>>  course.
>>
>>  > > * Fix the
>>  XA transaction bug ( MySQL has fixed it already ) -
>>  MDEV-7974
>>  > Please actually complete XA
>>  support, don't just half fix it like Oracle
>>  > did.  Add full support for XA SUSPEND and
>>  XA RESUME and allow more than one
>>  >
>>  thread to participate in a distributed transaction in the
>>  server.
>>
>>  Right. Still,
>>  MDEV-7974 is an important step, we cannot have proper XA
>>  without it. XA SUSPEND and XA RESUME should be
>>  the next one, I agree.
>>
>>  >
>>  > * Indexes on expressions (this is part of virtual
>>  columns, will it not
>>  > > go into 10.2?
>>  Check with Serg)
>>  > Indexes on expressions
>>  requires parser support.
>>  > create index
>>  expr_idx on some_table(a + b);
>>  > select *
>>  from some_table where a+b > 30 and a+b <= 50 -- (uses
>>  range over
>>  > expr_idx)
>>
>>  Right, but the parser support
>>  is the least of my worries. I don't want
>>  to low ball estmates :) but the *parser* can be
>>  fixed in a few hours.
>>  The most tricky part
>>  will be to fix the optimizer.
>>
>>  > > * Flashback DDL MDEV-10571 (flashback
>>  DML will come in 10.2). It only
>>  > >
>>  works with row based replication. Talk about what to name
>>  it. There's
>>  > > already a MySQL
>>  time machine on github. Many like the name Rewind (but
>>  > > not Monty). Let's do a poll on
>>  the mailing list
>>  > Okay, Flashback query
>>  is VERY complicated. A flashback query is a
>>  > materialized view.  There are two ways
>>  generally to achieve flashback:
>>  > a)
>>  materialize the query as is, and instead of rolling the
>>  query forward
>>  > incrementally, you roll
>>  it backwards.  Flexviews achieves this by computing
>>  > the query as it is now, then computing the
>>  delta from a prior point in time
>>  > until
>>  the transaction in which that computation happened. Then the
>>  delta is
>>  > played back against the query,
>>  but the MONUS of each operation is applied
>>  > which changes insertions to deletions and
>>  vice versa.  This still presents
>>  > a
>>  problem for OUTER JOIN.  No documented asynchronous refresh
>>  algorithms
>>  > exist for outer join that
>>  I'm aware of, that would work with the concept of
>>  > reading row history from serialized
>>  changes.  Flexviews uses the "rolling
>>  > join propagation" algorithm, which
>>  only works with inner joins.
>>
>>  Interesting... I'll continue reading on
>>  that, thanks.
>>  But see below
>>
>>  > b) provide a
>>  point-in-time snapshot of every table used in the query at
>>  the
>>  > desired point in time, and run the
>>  query over those tables (this is a
>>  >
>>  synchronous mechanism which supports OUTER JOIN).  The
>>  problem here is how
>>  > to provide such a
>>  table.  The most straight-forward way is to copy the
>>  > table, and then do the backwards replay
>>  for each (ideally in parallel) but
>>  > this
>>  is obviously undesirable if the query is "select * from
>>  some_big_table
>>  > join another_big_table
>>  on (...)" because you have to fully copy each table
>>  > before you can undo changes.  Otherwise
>>  you need to have the SE display old
>>  >
>>  versions, just like it does for MVCC, but it has to display
>>  versions from
>>  > binary logs, not undo
>>  logs, and this requires a lot of SE and engine
>>  > changes!
>>
>>  This is about correct. There are different
>>  applications for "flashback"
>>  with
>>  different use cases and different implementation
>>  trade-offs.
>>
>>  One is, for
>>  example, to see historical sales numbers, for different
>>  months or years. That is, basically, for some
>>  kind of data analytics.
>>  Lots of SELECT
>>  queries, ad-hoc queries, at different historical time
>>  points.
>>
>>  Another one is "damn, I've made a typo
>>  in a WHERE clause and updated too
>>  much". In this case one doesn't need
>>  joins or materialized views, one
>>  needs to
>>  see the data before the erroneous statement. This is not
>>  used
>>  often.
>>
>>  For the second use case one can afford to copy
>>  tables and
>>  backward-replay the binary log.
>>  For the first use case one would need a
>>  completely different approach, I agree. May be
>>  something like what
>>  Flexviews does.
>>
>>  > If you are going to have
>>  generic flashback, you might as well commit to
>>  > incrementally refreshable materialized
>>  views.
>>
>>  Right, when
>>  we'll have materialized views, than we could think
>>  about incrementally updating them using rolling
>>  join propagation.
>>
>>  > >
>>  * Additional GIS functions to stay compatible. Also it would
>>  be good
>>  > > to have a standalone GIS
>>  library (Georg suggests; wlad isn't too happy
>>  > > with the suggestion). Georg suggests
>>  that calculations should use the
>>  > >
>>  reference systems (Unflatten the world - MariaDB Server GIS
>>  the world
>>  > > looks flat)
>>  >
>>  > GIS functions should
>>  use gdal, just like postgresql does.  Then you can
>>  > also add support for rasters.  Here is an
>>  example of the awesomeness of
>>  >
>>  postgresql and postgis.  It uses SQL aggregate functions,
>>  table functions,
>>  > CTE, GIS raster
>>  functions, sequences, etc:
>>  > 
>> https://github.com/greenlion/osmvox/blob/master/postgresql/combined_schema.sql
>>
>>  May be. On the other hand,
>>  we're more precise that postgis, because our
>>  implementation uses fixed-point math, not
>>  doubles.
>>  And let's not forget that MySQL
>>  uses Boost::Geometry.
>>
>>  >
>>  > * Query rewriting - MDEV-5561
>>  > I
>>  would like you to provide a SQL->DOM function call
>>  instead of just
>>  > providing a DOM to
>>  plugins.  This function could be exposed as a regular
>>  > item function as well so that anything in
>>  the server can parse SQL. I
>>  > suggest you
>>  implement my SQL "shim" interface and just provide
>>  some way to
>>  > get a easy to iterate over
>>  parsed data structure from the SQL, such as a
>>  > nested JSON array of objects.  A nested
>>  array is generated by
>>  > PHP-SQL-Parser
>>  and would provide a good template for such a JSON object.
>>  > https://github.com/greenlion/PHP-SQL-Parser
>>
>>  We actually have an MDEV for
>>  that :)
>>
>>  > > * With
>>  MyRocks coming, should we drop TokuDB (and maybe even
>>  deprecate
>>  > > in 10.2?) - bugs that
>>  MariaDB Corporation reports to Percona don't
>>  > > seem to get fixed. Peter says that
>>  bugs are fixed for customers... and
>>  >
>>  > there is ongoing development to make it better
>>  > It is certainly disheartening that Percona
>>  isn't responsive to MariaDB
>>  > bugs,
>>  but I'm sure you understand that it is hard for a
>>  competitor to fix
>>  > bugs for another
>>  competitor.  MariaDB maintains a forked version of
>>  > TokuDB.  It isn't fair to expect the
>>  upstream vendor to fix bugs that your
>>  >
>>  customers are paying you to support, does it?  Perhaps you
>>  should pay a
>>  > percentage of your support
>>  fees for TokuDB issues to Percona, or come to
>>  > some other support agreement.  Perhaps
>>  YOU should make the bug fixes and
>>  >
>>  submit them to Percona.
>>
>>  And
>>  we do, look for bug reports I've reported on TokuDB to
>>  Percona -
>>  with patches :)
>>
>>  But, really, Percona *is*
>>  fixing TokuDB bugs, it's just that they did a
>>  bit of refactoring after getting TokuDB and it
>>  took time for it to
>>  stabilize. And MariaDB
>>  Server has a vanilla TokuDB with almost no
>>  changes, we have no plans to fork it.
>>
>>  > > * ORDER BY LIMIT
>>  optimizer bugs (MDEV-8306)
>>  > Oh god, this
>>  is a downward spiral every time somebody touches it.
>>  > Fixing it correctly requires rewriting the
>>  parser, something that
>>  > Oracle is
>>  undertaking.
>>
>>  This
>>  didn't have much to do with the parser, and we've
>>  refactored that
>>  part of the parser in 10.2,
>>  so now this has nothing to do with the
>>  parser, it's purely the optimizer issue.
>>
>>  > What about other new
>>  MySQL 8 features?  Are you getting rid of the .FRM
>>  > nightmare?  Are you going to support SET
>>  PERSIST?  etc?
>>
>>  FRM was
>>  made purely optional in 10.0 - every storage engine decides
>>  for
>>  itself whether it uses FRMs or not. May
>>  be InnoDB will use FRMs in 10.3,
>>  may be it
>>  will not. MyISAM most probably will continue use them.
>>
>>  SET PERSIST - and this is my
>>  personal opinion - this needs to be thought
>>  over *very* carefully. There have been quite a
>>  few security
>>  vulnerabilities with my.cnf
>>  stored in the datadir, and that's why since
>>  2005 the server no longer reads my.cnf in the
>>  datadir. But mysql_safe
>>  still does - and
>>  there have been new security vulnerabilities *last
>>  month*, caused by my.cnf in the datadir. And
>>  finally both MySQL (in
>>  5.7?) and MariaDB (in
>>  10.2) stopped reading my.cnf in the datadir for
>>  real.  So, having my security@mariadb.org
>>  hat on, I look at SET PERSIST
>>  with a lot of
>>  suspicion, because it's nothing else than another
>>  attempt
>>  of storing server configuration
>>  information in the datadir and have it
>>  writable by the server itself.
>>
>>  Regards,
>>  Sergei
>>  Chief Architect
>>  MariaDB
>>  and security@mariadb.org
>>
>>  _______________________________________________
>>  Mailing list: https://launchpad.net/~maria-discuss
>>  Post to     : maria-discuss@lists.launchpad.net
>>  Unsubscribe : https://launchpad.net/~maria-discuss
>>  More help   : https://help.launchpad.net/ListHelp
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~maria-discuss
>> Post to     : maria-discuss@lists.launchpad.net
>> Unsubscribe : https://launchpad.net/~maria-discuss
>> More help   : https://help.launchpad.net/ListHelp
>>
>

Re: [Maria-discuss] MariaDB Server 10.3 notes

Vladislav Vaintroub