[Maria-developers] Phone home
Hi.

So, "Phone Home" or "MySQL feedback daemon" or "better name wanted" feature.

It is something that can be installed together with MariaDB, it will gather different statistics about how MariaDB is used and will send this information anonymously to mariadb.org. Not unlike the Uptimes Project or Debian Popularity Contest.

The complete specs will be here: http://askmonty.org/worklog/Server-Sprint/?tid=12

There are basically four questions I'm thinking about.

1. Should that be a MariaDB plugin or a separate executable ?

I tend to prefer a separate executable. There is no need to keep it in memory constantly - a cron job will do. Being separate, its bugs won't affect the server. Being separate, one instance can monitor many MariaDB servers. It can be upgraded separately - and it's not tied to the server release schedule.

The drawback - it won't be able to grab MariaDB internals easily, which means it may not report some data that are worth reporting. But to solve this we can add an I_S table that provides this information. This way there's no "hidden" data to report, everything is available from SQL. Which is good :)

2. How to send the data.

We'll use HTTP. It seems to be the most universally working transport. That's what other projects are using too - Uptimes Project uses UDP or HTTP, Debian Popularity Contest - SMTP or HTTP. We *may* want to add SMTP later, if needed.

3. Auditing.

How can we prove to paranoid users that we only send what we say we send, and no potentially private information?

Possible solution: http sending should support a proxy (to work behind firewalls), so one can install a logging proxy and record all the data sent. On the other hand, we'd like to use SSL too. We can support, besides direct http, a "wget mode" where the data are sent by invoking wget (which supports proxies, SSL and --post-file) and one could easily replace wget with a simple script that logs all the data.

4. What to report.
That's the most interesting part :) Note that not everything from below is collected in MariaDB now, but I describe the ideal case, what would be useful to know to steer MariaDB development in the right direction. The principle I used was not "let's grab as much as we can" but "on a need-to-know basis". For example, we may need to decide whether to optimize huge IN (...) lists or GIS first. Knowing what is used more often would help to make a correct decision.

- hardware: CPU, RAM
- OS (linux distribution, kernel)
- mariadb version, memory usage
- parts of config (e.g. buffer sizes)
- list of installed plugins
- number of databases, max/avg number of tables in a database, max/avg db/table size
- uptime
- something that indicates the load, e.g. average qps
- how much a particular feature is used:
  - Com_ counters from SHOW STATUS
  - plugin usage counters
  - per feature, like GIS, replication, etc.
  - per query parts, like ORDER BY, subquery in the FROM, IN subquery ...
- how useful is query cache (hit ratio?)

What else ?

Regards,
Sergei
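As an illustration of two items from the list above ("average qps" and "query cache hit ratio"), here is a minimal Python sketch of how they might be derived from raw status counters. Nothing here is from the worklog: the function name `derived_metrics` is hypothetical, and the hit-ratio formula (hits divided by hits plus `Com_select`) is a commonly used convention, assumed here rather than specified anywhere in this thread. The counter names `Uptime`, `Questions`, `Qcache_hits` and `Com_select` are the ones reported by SHOW GLOBAL STATUS.

```python
def derived_metrics(status):
    """Derive load and query cache figures from SHOW GLOBAL STATUS values.

    `status` maps counter names to values (strings or ints).
    The hit-ratio formula is an assumed convention, not a spec.
    """
    uptime = int(status["Uptime"])
    questions = int(status["Questions"])
    hits = int(status.get("Qcache_hits", 0))
    selects = int(status.get("Com_select", 0))

    # Average queries per second over the whole uptime.
    avg_qps = questions / uptime if uptime > 0 else 0.0

    # SELECTs answered from the cache are counted in Qcache_hits,
    # not in Com_select, so the sum is the number of cache lookups.
    lookups = hits + selects
    hit_ratio = hits / lookups if lookups > 0 else None

    return {"avg_qps": avg_qps, "qcache_hit_ratio": hit_ratio}
```

Reporting the derived ratio instead of (or next to) the raw counters is a design choice: the ratio is what answers "how useful is the query cache", while the raw counters keep the report auditable.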
So, "Phone Home" or "MySQL feedback daemon" or "better name wanted" feature.
Maybe call it "Butler" ??? Just a thought...
Not unlike the Uptimes Project or Debian Popularity Contest.
Opt-in only with an easy disable option after opting in... correct?
The complete specs will be here: http://askmonty.org/worklog/Server-Sprint/?tid=12
I imagine the following ...
(optionally by user) geographic location
(optionally by user) user information / company name
(optionally by user) Monty Program Ab customer support contract id
won't be shown to everyone, correct? So maybe a filtered public versus unfiltered private view?
1. Should that be a MariaDB plugin or a separate executable ?
A separate executable would probably be the best for the reasons you highlight in your first paragraph. The drawbacks are probably covered by the fact that 1) if a user is having that awful of a time, they are probably able to step through the executing code or 2) the user probably has a support contract with a company that can step through the code and debug the problem. Granted more in depth statistics would be useful, but maybe it would make sense to have a separate project to create a loadable module that would be "more invasive." This tool seems to be oriented towards usage and "usage related" data, not necessarily troubleshooting/fixing.
2. How to send the data.
I imagine if the code is generated with this in mind it should be easy to switch out the "transport" (read transmission method) layer at a later time. Unless the person coding it really ties the data formatting and submission process to the protocol.
3. Auditing.
I think the proxy idea, as well as the "wget mode" are great ideas. If the user isn't paranoid and doesn't want to "sniff traffic" one could also provide a log of all activities and a separate log for all messages.
4. What to report.
hardware: CPU, RAM
maybe disk speeds? and type? (SATA vs SAS vs IDE)
OS (linux distribution, kernel)
any libraries?
number of databases, max/avg number of tables in a database,
the slightly insane might also run multiple instances on a single machine, so what about checking for other installations?
Just a few thoughts, hopefully they're not distracting or useless.
-Adam
Hi, Adam!
On Sep 09, Adam M. Dutko wrote:
So, "Phone Home" or "MySQL feedback daemon" or "better name wanted" feature.
Maybe call it "Butler" ??? Just a thought...
:) Why?
Not unlike the Uptimes Project or Debian Popularity Contest.
Opt-in only with an easy disable option after opting in... correct?
Of course. Sorry, I didn't make it clear enough - the first email was only about questions, unclear moments in this task. Whether it should be opt-in is not one of them :)
The complete specs will be here: http://askmonty.org/worklog/Server-Sprint/?tid=12
I imagine the following ...
(optionally by user) geographic location
(optionally by user) user information / company name
(optionally by user) Monty Program Ab customer support contract id
won't be shown to everyone, correct? So maybe a filtered public versus unfiltered private view?
Of course.
1. Should that be a MariaDB plugin or a separate executable ?
A separate executable would probably be the best for the reasons you highlight in your first paragraph. The drawbacks are probably covered by the fact that 1) if a user is having that awful of a time, they are probably able to step through the executing code or 2) the user probably has a support contract with a company that can step through the code and debug the problem. Granted more in depth statistics would be useful, but maybe it would make sense to have a separate project to create a loadable module that would be "more invasive." This tool seems to be oriented towards usage and "usage related" data, not necessarily troubleshooting/fixing.
Right.
2. How to send the data.
I imagine if the code is generated with this in mind it should be easy to switch out the "transport" (read transmission method) layer at a later time. Unless the person coding it really ties the data formatting and submission process to the protocol.
Right.
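To make the point about a swappable transport concrete, here is a minimal Python sketch that keeps formatting and submission apart. Everything in it is an assumption for illustration: the thread does not specify a report format (JSON is used only as a stand-in), and the names `format_report`, `TRANSPORTS`, `transport` and `submit` are hypothetical.

```python
import json
import urllib.request

def format_report(data):
    """Serialize the collected data; knows nothing about how it is sent."""
    return json.dumps(data, sort_keys=True).encode("utf-8")

# Registry of transports: name -> function(payload_bytes, target).
TRANSPORTS = {}

def transport(name):
    """Decorator that registers a transport under the given name."""
    def register(func):
        TRANSPORTS[name] = func
        return func
    return register

@transport("http")
def send_http(payload, target):
    # Plain HTTP POST of the already formatted payload.
    req = urllib.request.Request(
        target, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as response:
        response.read()

@transport("file")
def send_to_file(payload, target):
    # Append the payload to a local file, e.g. for inspection.
    with open(target, "ab") as out:
        out.write(payload + b"\n")

def submit(data, transport_name, target):
    """Format once, then hand the bytes to whichever transport was chosen."""
    TRANSPORTS[transport_name](format_report(data), target)
```

With this split, adding SMTP later would mean registering one more function, without touching the formatting code.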
3. Auditing.
I think the proxy idea, as well as the "wget mode" are great ideas. If the user isn't paranoid and doesn't want to "sniff traffic" one could also provide a log of all activities and a separate log for all messages.
Yes. I was trying to find something convincing for paranoid users (like me :). Normal users can just look in the log.
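The "replace wget with a simple script that logs all the data" idea could look roughly like the Python sketch below. It is only a sketch under stated assumptions: the function names, the log path and the location of the real wget binary (`/usr/bin/wget`) are hypothetical, and it assumes the feedback tool passes the report via wget's `--post-file` option as described in the first email.

```python
import os

def find_post_file(argv):
    """Return the path given as --post-file=PATH or --post-file PATH, else None."""
    for i, arg in enumerate(argv):
        if arg.startswith("--post-file="):
            return arg.split("=", 1)[1]
        if arg == "--post-file" and i + 1 < len(argv):
            return argv[i + 1]
    return None

def log_payload(argv, log_path):
    """Append the contents of the posted file to an audit log.

    Returns True if a --post-file argument was found and logged.
    """
    path = find_post_file(argv)
    if path is None:
        return False
    with open(path, "rb") as src, open(log_path, "ab") as log:
        log.write(src.read())
        log.write(b"\n")
    return True

def main(argv, log_path="/tmp/feedback-audit.log", real_wget="/usr/bin/wget"):
    """Log what is about to be sent, then hand over to the real wget.

    Both default paths are assumptions for illustration.
    """
    log_payload(argv, log_path)
    # Replace this process with the real wget, passing all arguments through.
    os.execv(real_wget, ["wget"] + list(argv))
```

Installed under the name `wget` earlier in the PATH and invoked as `main(sys.argv[1:])`, such a wrapper would record every report before it leaves the machine, even when the transfer itself goes over SSL.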
4. What to report.
hardware: CPU, RAM
maybe disk speeds? and type? (SATA vs SAS vs IDE)
Good idea. Indeed, it's important. And to know if it's SSD or not.
OS (linux distribution, kernel)
any libraries?
I don't know. As you said it's not to troubleshoot, it's to steer development. I don't know if we may want to optimize for a specific version of a specific library. And if yes - for what library?
number of databases, max/avg number of tables in a database,
the slightly insane might also run multiple instances on a single machine, so what about checking for other installations?
Right.
Just a few thoughts, hopefully they're not distracting or useless.
Not at all! Thanks for sharing them.
Regards,
Sergei
:) Why?
When I think of a Butler I think of someone who monitors various aspects of a household/estate, stashes that information and uses it to improve service.
Good idea. Indeed, it's important. And to know if it's SSD or not.
Last night I was also thinking about network configuration. It might be good to know if people are using the database over the network more often than a standalone with BindAddress 127.0.0.1. It might also be good to know the distribution of NIC speeds (10/100/1000/10000) as it might help when determining where to focus development efforts. That is, if a ton of people are using 10Mbps (unlikely) maybe it might be useful to look at improving compression or other data related parts?
I don't know if we may want to optimize for a specific version of a specific library. And if yes - for what library?
I imagine the MariaDB version will determine what libraries people have installed because of various dependencies, but it might be useful to collect that information as well, or whether they're running custom C libraries versus stock ones, because this might point out problem areas for high-end users. I'm not familiar enough with the code base to know which ones MariaDB might want to monitor, I just thought it might be useful to think about it some more...
Hi, Adam!
On Sep 10, Adam M. Dutko wrote:
Good idea. Indeed, it's important. And to know if it's SSD or not.
Last night I was also thinking about network configuration. It might be good to know if people are using the database over the network more often than a standalone with BindAddress 127.0.0.1. It might also be good to know the distribution of NIC speeds (10/100/1000/10000) as it might help when determining where to focus development efforts. That is, if a ton of people are using 10Mbps (unlikely) maybe it might be useful to look at improving compression or other data related parts?
Noted, thanks!
Regards,
Sergei
Hi.
On Sep 09, Sergei Golubchik wrote:
So, "Phone Home" or "MySQL feedback daemon" or "better name wanted" feature.
My current working name is "MariaDB User Feedback", which isn't catchy at all :(
It is something that can be installed together with MariaDB, it will gather different statistics about how MariaDB is used and will send this information anonymously to mariadb.org.
My current thoughts about the server part.

I'm thinking about an I_S plugin. It will create a table with all the information that is going to be sent. If we have the sending code in the server - Monty wants it for some reason - then this plugin will send the content of the table. But one can also disable the sending and simply use the I_S table as any other I_S table. Or one can send the data manually:

  mysql -e 'select * from i_s.feedback_table'|mail email@mariadb.org

or the same with "wget --post-file" or "curl --data". From a cron job, if desired. Which is not very Windows friendly, but the point is that technically a paranoid user can control what data are sent.

Of course, most users will probably use the built-in sending code.

Regards,
Sergei
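For those who prefer a script over the one-liner, the manual "select from the I_S table and post it" route might be sketched in Python as follows. The assumptions are many and worth stating: the table is taken to have two columns (name and value), the input is the tab-separated batch output that the mysql client produces when piped (header row first), the form-encoded body is just one possible format, and the function names and the URL used in the usage note are placeholders, not real endpoints.

```python
import urllib.parse
import urllib.request

def parse_mysql_batch(output):
    """Parse tab-separated `mysql -e` batch output into (name, value) pairs.

    The first line is assumed to be the column header row and is skipped.
    """
    lines = output.strip().splitlines()
    return [tuple(line.split("\t", 1)) for line in lines[1:] if "\t" in line]

def build_request(rows, url):
    """Build (but do not send) an HTTP POST carrying the rows as form data."""
    body = urllib.parse.urlencode(rows).encode("ascii")
    # Supplying `data` makes urllib issue a POST instead of a GET.
    return urllib.request.Request(url, data=body)
```

A cron job could feed the output of the mysql command above into `parse_mysql_batch`, inspect or log `build_request(rows, "http://feedback.example/post").data`, and only then pass the request to `urllib.request.urlopen`. Keeping "build" and "send" separate is what lets a paranoid user see exactly what would go out.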
participants (2)
- Adam M. Dutko
- Sergei Golubchik