Re: [Maria-developers] [knielsen@knielsen-hq.org] [Commits] Rev 3459: MDEV-232: Remove one fsync() from commit phase. in http://bazaar.launchpad.net/~maria-captains/maria/5.5
Sergei Golubchik <serg@askmonty.org> writes:
Hi, Kristian!
On Jun 22, Kristian Nielsen wrote:
Can you review below patch for mdev-232 (At your convenience, not urgent)?
Here's the review. Mostly questions, I have to confess :)
Thanks a lot for your review! I've tried to answer all your questions below, and for many I have tried to clarify the patch, as noted. Do you want to see the changes for when I implement async commit_checkpoint_request() in InnoDB/XtraDB, before I push it to 10.0?
=== modified file 'sql/slave.cc' --- a/sql/slave.cc 2012-06-22 09:40:40 +0000 +++ b/sql/slave.cc 2012-06-22 10:20:29 +0000 @@ -1751,13 +1751,12 @@ when it try to get the value of TIME_ZON /* Announce MariaDB slave capabilities. */ DBUG_EXECUTE_IF("simulate_slave_capability_none", goto after_set_capability;); { - const char *q= - DBUG_EVALUATE_IF("simulate_slave_capability_old_53", - "SET @mariadb_slave_capability=" - STRINGIFY_ARG(MARIA_SLAVE_CAPABILITY_ANNOTATE), - "SET @mariadb_slave_capability=" - STRINGIFY_ARG(MARIA_SLAVE_CAPABILITY_MINE)); - if (mysql_real_query(mysql, q, strlen(q))) + int rc= DBUG_EVALUATE_IF("simulate_slave_capability_old_53", + mysql_real_query(mysql, STRING_WITH_LEN("SET @mariadb_slave_capability=" + STRINGIFY_ARG(MARIA_SLAVE_CAPABILITY_ANNOTATE))), + mysql_real_query(mysql, STRING_WITH_LEN("SET @mariadb_slave_capability=" + STRINGIFY_ARG(MARIA_SLAVE_CAPABILITY_MINE)))); + if (rc)
Why?
Monty asked me to not use strlen() which I had introduced due to a request from you in review of an earlier worklog. And you wrote that you did not care much one way or the other (neither do I), so I reverted the change back to the original without strlen(). (Maybe I should have done it as a separate changeset, but I took this short cut since MDEV 225/181/232 are kind of tied together and will likely be pushed together.)
=== modified file 'storage/xtradb/handler/ha_innodb.cc' --- a/storage/xtradb/handler/ha_innodb.cc 2012-06-15 12:54:23 +0000 +++ b/storage/xtradb/handler/ha_innodb.cc 2012-06-22 10:20:29 +0000 @@ -485,8 +486,11 @@ static MYSQL_THDVAR_ULONG(lock_wait_time static MYSQL_THDVAR_ULONG(flush_log_at_trx_commit, PLUGIN_VAR_OPCMDARG, "Set to 0 (write and flush once per second)," " 1 (write and flush at each commit)" - " or 2 (write at commit, flush once per second).", - NULL, NULL, 1, 0, 2, 0); + " or 2 (write at commit, flush once per second)." + " Can also be set to 3 to write and flush at each commit, even when this is" + " unnecessary because of binlog crash recovery; should not normally be used" + " due to performance degration, but provided for completeness.",
would you like to improve the description on =1 value? For example, 1 (write and flush on prepare or on commit) 3 (write and flush both on prepare and on commit, usually it is redundant)
Yes, good idea. Here is my suggestion: "Controls the durability/speed trade-off for commits." " Set to 0 (write and flush redo log to disk only once per second)," " 1 (flush to disk at each commit)," " 2 (write to kernel at commit but flush to disk only once per second)" " or 3 (flush to disk twice per commit, usually redundant)." " 1 and 3 guarantees that after a crash, committed transactions will" " not be lost and will be consistent with the binlog and other transactional" " engines. 2 can get inconsistent and lose transactions if there is a" " power failure or kernel crash but not if mysqld crashes. 0 has no" " guarantees in case of crash. 0 and 2 can be faster than 1 or 3.",
/*****************************************************************//** +Handle a commit checkpoint request from server layer. +We simply flush the redo log immediately and do the notify call.*/ +static +void +innobase_checkpoint_request( + handlerton *hton, + void *cookie) +{ + log_buffer_flush_to_disk(); + ha_commit_checkpoint_notify(hton, cookie); +} + +/*****************************************************************//**
What about making innobase_checkpoint_request() implementation that flushed in the background? At least to serve as an example of the correct usage of your API, if not for performance. And to test your code for cases when the notification isn't coming directly from the checkpoint_request method.
it looks like it's very easy to make innodb to invoke ha_commit_checkpoint_notify from the background flusher thread (the one that syncs once a second). you only need to store the cookie in the log_sys, and invoke ha_commit_checkpoint_notify after the flush when this stored cookie isn't NULL.
Yes, I would like to do this, I will try. It might even help performance, I have seen several blog posts mentioning problems with stalls at binlog rotation, and this might help. As you say, this should be easy. I need to store a list of cookies and their associated LSN (not just a single cookie), but this is infrequently executed so just a simple linked list with malloc/free of elements will work fine.
+void +ha_commit_checkpoint_notify(handlerton *hton, void *cookie)
are there other ha_xxx methods that a storage engine calls?
typically server calls methods, like, ha_commit_trans, and they invoke storage engine methods. Not the other way around. It is exactly for this reason I used names like "trans_register_ha" - inverse of ha_register_trans - to emphasize that this method is invoked by the engine to call into the server, not by the server to call into the engine.
Ah, thanks, good point. I suggest "commit_checkpoint_notify_ha()", what do you think?
=== modified file 'sql/log.h' --- a/sql/log.h 2012-06-22 09:46:28 +0000 +++ b/sql/log.h 2012-06-22 10:20:29 +0000 @@ -359,9 +373,14 @@ class MYSQL_QUERY_LOG: public MYSQL_LOG Ids start from BINLOG_COOKIE_START; the value BINLOG_COOKIE_DUMMY is special meaning "no binlog" (we cannot use zero as that is reserved for error return from log_and_order). + The low-order bit is reserved for an error flag. */ -#define BINLOG_COOKIE_DUMMY 1 -#define BINLOG_COOKIE_START 2 +#define BINLOG_COOKIE_ERROR_FLAG 1 +#define BINLOG_COOKIE_ID_INCREMENT (BINLOG_COOKIE_ERROR_FLAG*2) +#define BINLOG_COOKIE_DUMMY (BINLOG_COOKIE_ID_INCREMENT*1)
Is that right? BINLOG_COOKIE_DUMMY == BINLOG_COOKIE_ID_INCREMENT ?
It is "right", but I agree it is incomprehensible. I will improve it. The thing is, I need to store a number of different values in the "cookie" used with log_and_order() and unlog(): - 0, which is the error return value from log_and_order() - A pair (binlog_id, error_flag) - A special dummy binlog_id meaning that nothing was logged to the binlog. I will change it to this, what do you think? : /* We assign each binlog file an internal ID, used to identify them for unlog(). The IDs start from 0 and increment for each new binlog created. In unlog() we need to know the ID of the binlog file that the corresponding transaction was written into. We also need a special value for a corner case where there is no corresponding binlog id (since nothing was logged), as well as an error flag to mark that unlog() must return failure. We use the following macros to pack all of this information into the single ulong available with log_and_order() / unlog(). Note that we cannot use the value 0, as that is reserved as error return value from log_and_order(). */ #define BINLOG_COOKIE_ERROR_RETURN 0 #define BINLOG_COOKIE_DUMMY_ID 1 #define BINLOG_COOKIE_BASE 2 #define BINLOG_COOKIE_DUMMY(error_flag) \ ( (BINLOG_COOKIE_DUMMY_ID<<1) | ((error_flag)&1) ) #define BINLOG_COOKIE_MAKE(id, error_flag) \ ( (((id)+BINLOG_COOKIE_BASE)<<1) | ((error_flag)&1) ) #define BINLOG_COOKIE_GET_ERROR_FLAG(c) ((c) & 1) #define BINLOG_COOKIE_GET_ID(c) ( ((ulong)(c)>>1) - BINLOG_COOKIE_BASE ) #define BINLOG_COOKIE_IS_DUMMY(c) \ ( ((ulong)(c)>>1) == BINLOG_COOKIE_DUMMY_ID ) Then the code in log.cc can use these macros, and not have to depend on strange details on how to pack different values into a single ulong. I think the resulting code is much clearer, now BINLOG_COOKIE_* is only used in log_and_order() and unlog() functions, thanks!
class MYSQL_BIN_LOG: public TC_LOG, private MYSQL_LOG @@ -464,15 +492,6 @@ class MYSQL_BIN_LOG: public TC_LOG, priv uint file_id; uint open_count; // For replication int readers_count; - bool need_start_event; - /* - no_auto_events means we don't want any of these automatic events : - Start/Rotate/Stop. That is, in 4.x when we rotate a relay log, we don't - want a Rotate_log event to be written to the relay log. When we start a - relay log etc. So in 4.x this is 1 for relay logs, 0 for binlogs. - In 5.0 it's 0 for relay logs too! - */ - bool no_auto_events;
Why? Because it's some old code which isn't used nowadays?
Yes. This was dead code.
=== modified file 'sql/log.cc' --- a/sql/log.cc 2012-06-22 09:46:28 +0000 +++ b/sql/log.cc 2012-06-22 10:20:29 +0000 @@ -3597,12 +3623,40 @@ bool MYSQL_BIN_LOG::reset_logs(THD* thd)
+ /* We are going to nuke all binary log files. - So first wait until all pending binlog checkpoints have completed. + Without binlog, we cannot XA recover prepared-but-not-committed + transactions in engines. So force a commit checkpoint first. + + Note that we take and immediately release LOCK_commit_ordered. This has + the effect to ensure that any on-going group commit (in + trx_group_commit_leader()) has completed before we request the checkpoint, + due to the chaining of LOCK_log and LOCK_commit_ordered in that function. + (We are holding LOCK_log, so no new group commit can start). + + Without this, it is possible (though perhaps unlikely) that the RESET + MASTER could run in-between the write to the binlog and the + commit_ordered() in the engine of some transaction, and then a crash + later would leave such transaction not recoverable. */ + mysql_mutex_lock(&LOCK_commit_ordered); + mysql_mutex_unlock(&LOCK_commit_ordered); + + mark_xids_active(current_binlog_id, 1); + do_checkpoint_request(current_binlog_id);
why do_checkpoint_request ?
There may be transactions in the current binlog file that are not yet flushed to disk in the engine. If we proceed in RESET MASTER to delete the binlog file, and then crash, then we could lose such transactions. So we request a checkpoint for the current binlog file (we already did so for any previous binlog files). And then wait for this and any other pending checkpoint to complete. Only then can we safely delete binlog files.
@@ -3731,6 +3780,31 @@ bool MYSQL_BIN_LOG::reset_logs(THD* thd)
+ /* + Remove all entries in the xid_count list except the last. + Normally we will just be deleting all the entries that we waited for to + drop to zero above. But if we fail during RESET MASTER for some reason + then we will not have created any new log file, and we may keep the last + of the old entries.
I'm not sure I understand this comment. What's the point of keeping the last of the old entries, if old binlog files are deleted and binary logging is probably stopped (when RESET MASTER fails it's kinda serious)?
The code in sql/log.cc assumes that binlog_xid_count_list is never empty. So even in the error case we must be sure to leave at least one entry in there. So the comment is to explain why we do not simply delete everything in the list at the start, and then insert the new entry on top of that. If we get an error half-way, error handling is just easier by doing the deletion at the end, where we know what we can delete and what not. It is true that RESET MASTER failure is quite serious, and if we cannot open a new binlog after deleting the old ones we are indeed in trouble. And there are very likely legacy bugs lurking in this error case. But with this code, at least no new error handling issues are introduced, and the rest of the code is clean and can rely on having at least one entry in the list.
If some engine confirms that everything below a certain cookie of a particular binlog_id is flushed. Does it automatically mean that all previous binlog_id's are flushed too in this engine?
Yes. This is the relevant comment in handler.h: The TC will periodically (eg. once per binlog rotation) call commit_checkpoint_request(). When this happens, the engine must arrange for all transaction that have completed commit_ordered() to be durably flushed to disk (this does not include transactions that might be in the middle of executing commit_ordered()). When such flush has completed, the engine must call commit_checkpoint_notify_ha(), passing back the opaque "cookie". commit_ordered() calls complete in the same order as transactions are written in the binlog. And we request a checkpoint for a binlog file only when all transactions in it have completed commit_ordered(). So when an engine has flushed all transactions that completed commit_ordered(), this includes everything in the given binlog, as well as anything in previous binlogs.
@@ -5493,7 +5656,27 @@ int MYSQL_BIN_LOG::rotate(bool force_rot if (!write_incident_already_locked(current_thd)) flush_and_sync(0);
- *check_purge= true; + /* + We failed to rotate - so we have to decrement the xid_count back that + we incremented before attempting the rotate. + */
why you don't use mark_xid_done() here?
We cannot call mark_xid_done() here, as we are holding LOCK_log - and mark_xid_done() also takes (sometimes) LOCK_log.
@@ -5920,22 +6115,29 @@ MYSQL_BIN_LOG::write_binlog_checkpoint_e */ if (!ev.write(&log_file) && !flush_and_sync(0)) { - bool check_purge= false; signal_update(); - rotate(false, &check_purge); - if (check_purge) - purge();
Why did you remove rotate/purge from here?
Binlog rotation triggers the writing of checkpoint event in the first place. So it does not seem right that writing checkpoint event can trigger binlog rotation. This could in principle lead to infinite rotation of new binlogs, though one would need a _very_ small max binlog size to trigger that.
+int TC_LOG_MMAP::unlog(ulong cookie, my_xid xid) +{ + pending_cookies *full_buffer= NULL; + DBUG_ASSERT(*(my_xid *)(data+cookie) == xid); + + /* + Do not delete the entry immediately, as there may be participating storage + engines which implement commit_checkpoint_request(), and thus have not yet + flushed the commit durably to disk. + + Instead put it in a queue - and periodically, we will request a checkpoint + from all engines and delete a whole batch at once. + */ + mysql_mutex_lock(&LOCK_pending_checkpoint); + if (pending_checkpoint == NULL) + { + uint32 size= sizeof(*pending_checkpoint); + if (!(pending_checkpoint= + (pending_cookies *)my_malloc(size, MYF(MY_ZEROFILL)))) + { + my_error(ER_OUTOFMEMORY, MYF(0), size); + mysql_mutex_unlock(&LOCK_pending_checkpoint); + return 1; + } + pending_checkpoint->cookies[pending_checkpoint->count++]= cookie;
Hm, should this be under if (pending_checkpoint == NULL) ?
No :-( That seems completely, utterly broken :-( So this means that I must have not tested this code at all. Which makes me sad. Of course, with PBXT dead, I am not sure how it is even possible to test it... ? Thanks for spotting.
+ /* + We do an extra increment and notify here - this ensures that + things work also if there are no engines at all that support + commit_checkpoint_request. + */ + ++full_buffer->pending_count; + ha_commit_checkpoint_request(full_buffer, mmap_do_checkpoint_callback); + commit_checkpoint_notify(full_buffer);
I'm sorry, I don't understand how you handle the case when some engines don't support commit_checkpoint_request
If an engine supports commit_checkpoint_request, then ha_commit_checkpoint_request() will ask the engine to do a checkpoint; and pending_count will be incremented for each such engine to ensure that we wait for each engine to finish the checkpoint before we delete any entries from the on-disk log. If an engine does not support commit_checkpoint_request, then we will not do anything with that engine before deleting entries. But such engines are required to make their transactions durable already in their commit() method, so deletion is always safe. If _no_ engines support ha_commit_checkpoint_request then we would not get any callback to commit_checkpoint_notify() - so we add one extra such a call ourselves, at the end. Also, the increment of pending_count and decrement in the extra commit_checkpoint_notify() call protects against this race: - We call engine A's commit_checkpoint_request() - Before we have time to call engine B's commit_checkpoint_request(), engine A does commit_checkpoint_notify(), so pending_count drops to zero and we delete entries even though B has not had a chance to do a checkpoint yet - We call engine B's commit_checkpoint_request with a void *cookie pointing to freed memory. Hope this clarifies things.
@@ -7869,6 +8196,11 @@ int TC_LOG_BINLOG::recover(LOG_INFO *lin if (!binlog_checkpoint_found) break; first_round= false; + DBUG_EXECUTE_IF("xa_recover_expect_master_bin_000004", + if (0 != strcmp("./master-bin.000004", binlog_checkpoint_name) && + 0 != strcmp(".\\master-bin.000004", binlog_checkpoint_name))
Are you sure the name is always lowercase even on case-insensitive filesystems? I'd use strcasecmp to avoid the need to think about it :) Something like
if (0 != strcasecmp("master-bin.000004", binlog_checkpoint_name+2))
(right, unsafe if strlen(binlog_checkpoint_name) < 2, but it's just a test case)
I don't understand - this has nothing to do with file system case sensitivity. The string is generated inside log.cc based on what mysql-test-run.pl passes to mysqld for the --log-bin option, I don't see how this could change case. And I thought it was clearer to explicitly show the two possible cases for the full file name (otherwise the offset 2 in the comparison is completely magic). But as you said, this is just a test case, I'm happy to change it if you insists (I just did not understand the reason to change it). BTW, as I remember this was for something that was quite hard to test in mysql-test-run. Since recovery happens early in the server startup, before anything mysqltest can do. So I could not think of anything better than to inject an error insert with a configuration option. - Kristian.
Hi, Kristian! On Aug 17, Kristian Nielsen wrote:
Do you want to see the changes for when I implement async commit_checkpoint_request() in InnoDB/XtraDB, before I push it to 10.0?
Yes, but it can be a separate patch.
=== modified file 'storage/xtradb/handler/ha_innodb.cc' --- a/storage/xtradb/handler/ha_innodb.cc 2012-06-15 12:54:23 +0000 +++ b/storage/xtradb/handler/ha_innodb.cc 2012-06-22 10:20:29 +0000 @@ -485,8 +486,11 @@ static MYSQL_THDVAR_ULONG(lock_wait_time static MYSQL_THDVAR_ULONG(flush_log_at_trx_commit, PLUGIN_VAR_OPCMDARG, "Set to 0 (write and flush once per second)," " 1 (write and flush at each commit)" - " or 2 (write at commit, flush once per second).", - NULL, NULL, 1, 0, 2, 0); + " or 2 (write at commit, flush once per second)." + " Can also be set to 3 to write and flush at each commit, even when this is" + " unnecessary because of binlog crash recovery; should not normally be used" + " due to performance degration, but provided for completeness.",
would you like to improve the description on =1 value? For example, 1 (write and flush on prepare or on commit) 3 (write and flush both on prepare and on commit, usually it is redundant)
Yes, good idea. Here is my suggestion:
"Controls the durability/speed trade-off for commits." " Set to 0 (write and flush redo log to disk only once per second)," " 1 (flush to disk at each commit)," " 2 (write to kernel at commit but flush to disk only once per second)"
I find "write to kernel" very confusing. May be "write to log" ?
" or 3 (flush to disk twice per commit, usually redundant)."
I'd like "flush at prepare and at commit, usually redundant", because otherwise one may ask why to flush twice at commit. It's easy to overlook the prepare step and miss the difference between "twice PER commit" and "twice AT commit" (especially as you write "AT commit" above, for =1).
" 1 and 3 guarantees that after a crash, committed transactions will" " not be lost and will be consistent with the binlog and other transactional" " engines. 2 can get inconsistent and lose transactions if there is a" " power failure or kernel crash but not if mysqld crashes. 0 has no" " guarantees in case of crash. 0 and 2 can be faster than 1 or 3.",
Good explanation!
What about making innobase_checkpoint_request() implementation that flushed in the background? At least to serve as an example of the correct usage of your API, if not for performance. And to test your code for cases when the notification isn't coming directly from the checkpoint_request method.
it looks like it's very easy to make innodb to invoke ha_commit_checkpoint_notify from the background flusher thread (the one that syncs once a second). you only need to store the cookie in the log_sys, and invoke ha_commit_checkpoint_notify after the flush when this stored cookie isn't NULL.
Yes, I would like to do this, I will try. It might even help performance, I have seen several blog posts mentioning problems with stalls at binlog rotation, and this might help.
As you say, this should be easy. I need to store a list of cookies and their associated LSN (not just a single cookie), but this is infrequently executed so just a simple linked list with malloc/free of elements will work fine.
I thought you don't need a list. But simply log_sys->cookie= cookie; in innobase_checkpoint_request() and if (log_sys->cookie) { ha_commit_checkpoint_notify(log_sys->cookie); log_sys->cookie= 0; } in log_buffer_sync_in_background(). There can never be two cookies active, right?
+void +ha_commit_checkpoint_notify(handlerton *hton, void *cookie)
are there other ha_xxx methods that a storage engine calls?
typically server calls methods, like, ha_commit_trans, and they invoke storage engine methods. Not the other way around. It is exactly for this reason I used names like "trans_register_ha" - inverse of ha_register_trans - to emphasize that this method is invoked by the engine to call into the server, not by the server to call into the engine.
Ah, thanks, good point. I suggest "commit_checkpoint_notify_ha()", what do you think?
Looks better, thanks!
@@ -5493,7 +5656,27 @@ int MYSQL_BIN_LOG::rotate(bool force_rot if (!write_incident_already_locked(current_thd)) flush_and_sync(0);
- *check_purge= true; + /* + We failed to rotate - so we have to decrement the xid_count back that + we incremented before attempting the rotate. + */
why you don't use mark_xid_done() here?
We cannot call mark_xid_done() here, as we are holding LOCK_log - and mark_xid_done() also takes (sometimes) LOCK_log.
Ok, so here you only decrement the counter, and mark_xid_done() additionally might log a checkpoint event. It looks like it might sense to extend mark_xid_done() with a second argument -TC_LOG_BINLOG::mark_xid_done(ulong cookie) +TC_LOG_BINLOG::mark_xid_done(ulong cookie, bool write_checkpoint) then you'll be able to use it here and avoid code duplication.
@@ -7869,6 +8196,11 @@ int TC_LOG_BINLOG::recover(LOG_INFO *lin if (!binlog_checkpoint_found) break; first_round= false; + DBUG_EXECUTE_IF("xa_recover_expect_master_bin_000004", + if (0 != strcmp("./master-bin.000004", binlog_checkpoint_name) && + 0 != strcmp(".\\master-bin.000004", binlog_checkpoint_name)) ... Something like ... But as you said, this is just a test case, I'm happy to change it if you insists (I just did not understand the reason to change it).
No, that's okay. Keep it your way. Regards, Sergei
Sergei Golubchik <serg@askmonty.org> writes:
On Aug 17, Kristian Nielsen wrote:
Do you want to see the changes for when I implement async commit_checkpoint_request() in InnoDB/XtraDB, before I push it to 10.0?
Yes, but it can be a separate patch.
Ok, will do.
"Controls the durability/speed trade-off for commits." " Set to 0 (write and flush redo log to disk only once per second)," " 1 (flush to disk at each commit)," " 2 (write to kernel at commit but flush to disk only once per second)"
I find "write to kernel" very confusing. May be "write to log" ?
Ok.
" or 3 (flush to disk twice per commit, usually redundant)."
I'd like "flush at prepare and at commit, usually redundant", because otherwise one may ask why to flush twice at commit. It's easy to overlook the prepare step and miss the difference between "twice PER commit" and "twice AT commit" (especially as you write "AT commit" above, for =1).
Ok. Thanks for the suggestions!
As you say, this should be easy. I need to store a list of cookies and their associated LSN (not just a single cookie), but this is infrequently executed so just a simple linked list with malloc/free of elements will work fine.
I thought you don't need a list. But simply
log_sys->cookie= cookie;
in innobase_checkpoint_request() and
if (log_sys->cookie) { ha_commit_checkpoint_notify(log_sys->cookie); log_sys->cookie= 0; }
in log_buffer_sync_in_background().
There can never be two cookies active, right?
Yes, there can. Why not? Checkpointing is now asynchronous, there is nothing that prevents a new log rotate and a new checkpoint request to happen before the old one finished. In fact, this case is tested in binlog_checkpoint.test. Of course in practise, this is unlikely to happen. It requires that the time between binlog rotations is smaller than the time to complete a checkpoint.
We cannot call mark_xid_done() here, as we are holding LOCK_log - and mark_xid_done() also takes (sometimes) LOCK_log.
Ok, so here you only decrement the counter, and mark_xid_done() additionally might log a checkpoint event. It looks like it might sense to extend mark_xid_done() with a second argument
-TC_LOG_BINLOG::mark_xid_done(ulong cookie) +TC_LOG_BINLOG::mark_xid_done(ulong cookie, bool write_checkpoint)
then you'll be able to use it here and avoid code duplication.
Ok, done. Thanks! - Kristian.
I am using cmake -G"CodeBlocks - Unix Makefiles" to load the 5.5.25 project into CodeBlocks for building and editing. Does anybody use KDevelop4 or Codeblocks to build mariadb? If so, how do you set the parameters up so you can run the server in gdb debug? Is there a way to run mysqld without mysql_safe monitor daemon in front of it? md
-----Original Message----- From: maria-developers- bounces+wlad=montyprogram.com@lists.launchpad.net [mailto:maria- developers-bounces+wlad=montyprogram.com@lists.launchpad.net] On Behalf Of md@rpzdesign.com Sent: Freitag, 7. September 2012 03:00 To: maria-developers@lists.launchpad.net Subject: [Maria-developers] IDE debugging Mariadb linux build
I am using cmake -G"CodeBlocks - Unix Makefiles" to load the 5.5.25 project into CodeBlocks for building and editing.
Does anybody use KDevelop4 or Codeblocks to build mariadb?
I used KDevelop some time ago (half a year or so), and Codeblocks long-long ago (still with vanilla MySQL)
If so, how do you set the parameters up so you can run the server in gdb debug?
Debugging and building are slightly different things. CMake does not have builtin support for IDE-debugging, so you need to use a semi-manual method . I found the below method useful 1. Switch to build directory 2. cd mysql-test 3. perl mysql-test-run.pl --start 1st --manual-debug This will create database directory, config file, and write path for mysqld executable together working directory and args to the console. You can copy/paste it into IDE, where debug parameters for executable are stored (I do not know btw where KDevelop4 has parameters for debugging, I just describe the process I used in Visual Studio and Xcode )
Is there a way to run mysqld without mysql_safe monitor daemon in front of it?
Yes, for example the one above.
Vladislov: Here is an update: I tried CodeBlocks -> Huge Fail. Cannot get debugger properly set up. I tried KDevelop 4.2 -> Huge Fail. Cannot get gdb properly working. Netbeans IDE 7.1.2 under Fedora 14 KDE -> WORKS. Make sure you put the program arguments in the RUN "{$OUTPUT}" "--defaults-file=/mysql/my.cnf" "--port=3306" etc It is not obvious that you would put the arguments under the RUN for the DEBUG, but that is the way it works. Netbeans will import the CMAKE output fairly well. (CMakeLists.txt) I have not tried Eclipse, but do not care since Netbeans works. I can now get back to working on code. md On 9/7/2012 3:55 AM, Vladislav Vaintroub wrote:
-----Original Message----- From: maria-developers- bounces+wlad=montyprogram.com@lists.launchpad.net [mailto:maria- developers-bounces+wlad=montyprogram.com@lists.launchpad.net] On Behalf Of md@rpzdesign.com Sent: Freitag, 7. September 2012 03:00 To: maria-developers@lists.launchpad.net Subject: [Maria-developers] IDE debugging Mariadb linux build
I am using cmake -G"CodeBlocks - Unix Makefiles" to load the 5.5.25 project into CodeBlocks for building and editing.
Does anybody use KDevelop4 or Codeblocks to build mariadb? I used KDevelop some time ago (half a year or so), and Codeblocks long-long ago (still with vanilla MySQL)
If so, how do you set the parameters up so you can run the server in gdb debug? Debugging and building are slightly different things. CMake does not have builtin support for IDE-debugging, so you need to use a semi-manual method .
I found the below method useful 1. Switch to build directory 2. cd mysql-test 3. perl mysql-test-run.pl --start 1st --manual-debug
This will create database directory, config file, and write path for mysqld executable together working directory and args to the console. You can copy/paste it into IDE, where debug parameters for executable are stored (I do not know btw where KDevelop4 has parameters for debugging, I just describe the process I used in Visual Studio and Xcode )
Is there a way to run mysqld without mysql_safe monitor daemon in front of it? Yes, for example the one above.
I can report to all the developers for MariaDB on linux that Netbeans 7.1.2 is working VERY well on Fedora 14 - 64 bit. Vladislav indicated he used only Windows Visual studio and OSX Xcode. It has a great Find It Grep support and the debugger is easy to set breakpoints and run the code. Marco On 9/7/2012 3:55 AM, Vladislav Vaintroub wrote:
-----Original Message----- From: maria-developers- bounces+wlad=montyprogram.com@lists.launchpad.net [mailto:maria- developers-bounces+wlad=montyprogram.com@lists.launchpad.net] On Behalf Of md@rpzdesign.com Sent: Freitag, 7. September 2012 03:00 To: maria-developers@lists.launchpad.net Subject: [Maria-developers] IDE debugging Mariadb linux build
I am using cmake -G"CodeBlocks - Unix Makefiles" to load the 5.5.25 project into CodeBlocks for building and editing.
Does anybody use KDevelop4 or Codeblocks to build mariadb? I used KDevelop some time ago (half a year or so), and Codeblocks long-long ago (still with vanilla MySQL)
If so, how do you set the parameters up so you can run the server in gdb debug? Debugging and building are slightly different things. CMake does not have builtin support for IDE-debugging, so you need to use a semi-manual method .
I found the below method useful 1. Switch to build directory 2. cd mysql-test 3. perl mysql-test-run.pl --start 1st --manual-debug
This will create database directory, config file, and write path for mysqld executable together working directory and args to the console. You can copy/paste it into IDE, where debug parameters for executable are stored (I do not know btw where KDevelop4 has parameters for debugging, I just describe the process I used in Visual Studio and Xcode )
Is there a way to run mysqld without mysql_safe monitor daemon in front of it? Yes, for example the one above.
Does anybody have any thoughts on what would be involved with getting Aria Tables to have an option to be memory only (Fully cached with no disk writing, power off means data loss) ARIA support row locking instead of Memory Tables which only have Table locking. Table locking sucks for anything that is heavy read/write. Right now I am using RAM DISKs to store my memory based ARIA tables so I can get decent speed for heavy read/write sections of the DB. But there is a bit of a operations headache where I have to allocate the RAM disks on system boot and the storage strategy of Aria does not release disk blocks when records are deleted, those spaces are left for future records by setting the deleted flag back to undeleted. So the memory pool is not used very effectively. So deleting records from a RAM disk ARIA table means that I have to vaccuum the tables using ALTER table every so often or I could run out of memory. (Very bad for ramdisks) If the ARIA cache routines could allocate memory dynamically from the global heap and return the memory upon delete operations, that would be a better options. And of course, no data is written to disk, just kept in cache. Cheers all, marco On 9/3/2012 6:40 AM, Kristian Nielsen wrote:
Sergei Golubchik <serg@askmonty.org> writes:
On Aug 17, Kristian Nielsen wrote:
Do you want to see the changes for when I implement async commit_checkpoint_request() in InnoDB/XtraDB, before I push it to 10.0? Yes, but it can be a separate patch. Ok, will do.
"Controls the durability/speed trade-off for commits." " Set to 0 (write and flush redo log to disk only once per second)," " 1 (flush to disk at each commit)," " 2 (write to kernel at commit but flush to disk only once per second)" I find "write to kernel" very confusing. May be "write to log" ? Ok.
" or 3 (flush to disk twice per commit, usually redundant)." I'd like "flush at prepare and at commit, usually redundant", because otherwise one may ask why to flush twice at commit. It's easy to overlook the prepare step and miss the difference between "twice PER commit" and "twice AT commit" (especially as you write "AT commit" above, for =1). Ok. Thanks for the suggestions!
As you say, this should be easy. I need to store a list of cookies and their associated LSN (not just a single cookie), but this is infrequently executed so just a simple linked list with malloc/free of elements will work fine. I thought you don't need a list. But simply
log_sys->cookie= cookie;
in innobase_checkpoint_request() and
if (log_sys->cookie) { ha_commit_checkpoint_notify(log_sys->cookie); log_sys->cookie= 0; }
in log_buffer_sync_in_background().
There can never be two cookies active, right? Yes, there can. Why not?
Checkpointing is now asynchronous, there is nothing that prevents a new log rotate and a new checkpoint request to happen before the old one finished. In fact, this case is tested in binlog_checkpoint.test.
Of course in practise, this is unlikely to happen. It requires that the time between binlog rotations is smaller than the time to complete a checkpoint.
We cannot call mark_xid_done() here, as we are holding LOCK_log - and mark_xid_done() also takes (sometimes) LOCK_log. Ok, so here you only decrement the counter, and mark_xid_done() additionally might log a checkpoint event. It looks like it might sense to extend mark_xid_done() with a second argument
-TC_LOG_BINLOG::mark_xid_done(ulong cookie) +TC_LOG_BINLOG::mark_xid_done(ulong cookie, bool write_checkpoint)
then you'll be able to use it here and avoid code duplication. Ok, done.
Thanks!
- Kristian.
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Hi, md! On Sep 08, md@rpzdesign.com wrote:
Does anybody have any thoughts on what would be involved with getting Aria Tables to have an option to be memory only (Fully cached with no disk writing, power off means data loss)
It should be quite easy, I suppose. One would just need to add some option that would prevent flushing pages of these tables from the page cache down to disk.
ARIA support row locking instead of Memory Tables which only have Table locking. Table locking sucks for anything that is heavy read/write.
Aria does not support row locking.
Right now I am using RAM DISKs to store my memory based ARIA tables so I can get decent speed for heavy read/write sections of the DB.
But there is a bit of a operations headache where I have to allocate the RAM disks on system boot and the storage strategy of Aria does not release disk blocks when records are deleted, those spaces are left for future records by setting the deleted flag back to undeleted. So the memory pool is not used very effectively.
So deleting records from a RAM disk ARIA table means that I have to vaccuum the tables using ALTER table every so often or I could run out of memory. (Very bad for ramdisks)
If the ARIA cache routines could allocate memory dynamically from the global heap and return the memory upon delete operations, that would be a better options. And of course, no data is written to disk, just kept in cache.
Allocating memory dynamically would be more difficult that simply not flushing pages. But anyway, the point is moot, because Aria uses table-level locking. Regards, Sergei
Wow, that is quite a surprise. You are right. Aria 1.5 is current. What a shame. When is Aria 2.0 estimated for release? http://kb.askmonty.org/en/aria-faq/ Looks like I need to begin to try XtraDB on RAM Disk to see what performance I get with that and basically throw out Aria until 2.0. I NEED row level locking on RAM DISK tables for performance reasons. Another research project for today. On 9/9/2012 12:44 AM, Sergei Golubchik wrote:
Hi, md!
Does anybody have any thoughts on what would be involved with getting Aria Tables to have an option to be memory only (Fully cached with no disk writing, power off means data loss) It should be quite easy, I suppose. One would just need to add some
On Sep 08, md@rpzdesign.com wrote: option that would prevent flushing pages of these tables from the page cache down to disk.
ARIA support row locking instead of Memory Tables which only have Table locking. Table locking sucks for anything that is heavy read/write. Aria does not support row locking.
Right now I am using RAM DISKs to store my memory based ARIA tables so I can get decent speed for heavy read/write sections of the DB.
But there is a bit of a operations headache where I have to allocate the RAM disks on system boot and the storage strategy of Aria does not release disk blocks when records are deleted, those spaces are left for future records by setting the deleted flag back to undeleted. So the memory pool is not used very effectively.
So deleting records from a RAM disk ARIA table means that I have to vaccuum the tables using ALTER table every so often or I could run out of memory. (Very bad for ramdisks)
If the ARIA cache routines could allocate memory dynamically from the global heap and return the memory upon delete operations, that would be a better options. And of course, no data is written to disk, just kept in cache. Allocating memory dynamically would be more difficult that simply not flushing pages. But anyway, the point is moot, because Aria uses table-level locking.
Regards, Sergei
participants (4)
-
Kristian Nielsen
-
md@rpzdesign.com
-
Sergei Golubchik
-
Vladislav Vaintroub