Hi, Sachin!

Unfortunately, your comment is rather difficult to understand. What about this one:

Problem:

The command was:

  find $paths -mindepth 1 -regex $cpat -prune -o -exec rm -rf {} \+

It was supposed to work as follows:
 * skip the $paths directories themselves (-mindepth 1)
 * check whether the dir/file name matches $cpat (-regex)
 * if yes, don't descend into the directory, skip it (-prune)
 * otherwise (-o)
 * remove it and everything inside it (-exec)

Now, -exec ... \+ works like this: every newly found path is appended to the end of the command line. When the accumulated command line length reaches the limit (roughly `getconf ARG_MAX`, 2 MB on a typical Linux system), the command is executed, and find continues, appending to a new command line.

What happens here is that find appends some directory to the command line, then descends into it and starts appending files from that directory. At some point the command line fills up, rm -rf gets executed and removes the whole directory. Now find tries to continue scanning the directory that was already removed.

Fix: don't descend into directories that will be recursively removed anyway; use -prune for them. Basically, we should prune both the paths that have matched $cpat and the paths that have not matched it. This is achieved by pruning unconditionally, before the regex is tested:

  find $paths -mindepth 1 -prune -regex $cpat -o -exec rm -rf {} \+

On Dec 19, sachin wrote:
revision-id: 7a7ad82029a6c78d31d6736a562c12d02c4d968c (mariadb-galera-5.5.58-3-g7a7ad82)
parent(s): e6e026ae51a77969749de201d491a176483bbc69
author: Sachin Setiya
committer: Sachin Setiya
timestamp: 2017-12-19 22:30:43 +0530
message:
MDEV-13478 Full SST sync fails because of the error in the cleaning part
Problem:- The problem is in wsrep_sst_xtrabackup-v2.sh, where we use

  find $ib_home_dir $ib_log_dir $ib_undo_dir $DATA -mindepth 1 -regex $cpat -prune -o -exec rm -rfv {} 1>&2 \+

Because the command ends with '\+', find appends every path it finds to the rm -rfv command line. If we have a really large database (many tables with long names), this creates a problem: the command line grows beyond ARG_MAX, so rm -rfv is called two or more times. Non-deterministically, one invocation may end right at a directory name, with the contents of that directory left for later invocations.

For example, consider a folder xyz with lots of files, and run

  find xyz -exec rm -rfv {} \+

With millions of files, the argument list is larger than ARG_MAX, so there are multiple rm invocations. Say the nth invocation is

  rm ....... xyz/

(i.e. its argument list ends at xyz/). This removes the whole xyz directory, and the remaining steps fail, since find is still scanning a directory that no longer exists. Whether this happens depends on where the argument list gets split, so the error is non-deterministic, but it is a real bug.

Solution:- In the above example, if instead of removing each file in xyz we remove xyz itself, we have our solution. We can get there by moving the -prune term in the find expression. Currently find evaluates

  find ( -regex && -prune ) || -exec

which means that if -regex is true (-prune always returns true), -exec (and hence rm) is not run. But if -regex fails, -exec is applied without -prune, so find descends into the directory and rm gets every single file in it instead of just the folder. If we swap -regex and -prune, -prune is always applied, whether -regex is true or not:

  find ( -prune && -regex ) || -exec
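The multiple-invocation behaviour described above can be observed directly. The sketch below is a hypothetical demo, not part of the SST script: it creates a few thousand long-named files in a scratch directory and lets find -exec ... {} + run a small sh -c command that prints how many paths each invocation received, instead of running rm. How many invocations appear depends on the find implementation and the system's argument-size limit; the total always equals the number of files.

```shell
#!/bin/sh
# Sketch: count how many times `find -exec cmd {} +` invokes cmd.
# Assumption: everything happens in a scratch directory from mktemp -d;
# the file count and name length are arbitrary illustration values.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Create many files with long names so the argument list gets large:
# 5000 files x ~200-byte names is roughly 1 MB of arguments.
long=$(printf '%0200d' 0)   # a run of 200 zeros used as a name suffix
i=0
while [ "$i" -lt 5000 ]; do
  : > "$dir/f${i}_$long"
  i=$((i + 1))
done

# Each invocation of the inner sh prints one line: how many paths it got.
counts=$(find "$dir" -mindepth 1 -exec sh -c 'echo "$#"' sh {} +)

batches=$(printf '%s\n' "$counts" | wc -l | tr -d ' ')
total=$(printf '%s\n' "$counts" | awk '{ s += $1 } END { print s }')

echo "getconf ARG_MAX: $(getconf ARG_MAX)"
echo "invocations: $batches, paths passed in total: $total"
```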
Patch Credit:- Serg
---
 scripts/wsrep_sst_xtrabackup-v2.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scripts/wsrep_sst_xtrabackup-v2.sh b/scripts/wsrep_sst_xtrabackup-v2.sh
index 327c92f..00d8fe2 100644
--- a/scripts/wsrep_sst_xtrabackup-v2.sh
+++ b/scripts/wsrep_sst_xtrabackup-v2.sh
@@ -863,7 +863,7 @@ then
 wsrep_log_info "Cleaning the existing datadir and innodb-data/log directories"
- find $ib_home_dir $ib_log_dir $ib_undo_dir $DATA -mindepth 1 -regex $cpat -prune -o -exec rm -rfv {} 1>&2 \+
+ find $ib_home_dir $ib_log_dir $ib_undo_dir $DATA -mindepth 1 -prune -regex $cpat -o -exec rm -rfv {} 1>&2 \+
 tempdir=$(parse_cnf mysqld log-bin "")
 if [[ -n ${tempdir:-} ]];then
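The removed and added lines of the hunk differ only in the order of -regex and -prune. As a hedged sketch (the scratch tree and the pattern are hypothetical, not the script's real directories or $cpat), the following compares which paths each order hands to the action, using -print in place of -exec rm -rfv, and then runs the new order with rm -rf to show what remains.

```shell
#!/bin/sh
# Sketch: compare which paths reach the action with the old and the new
# operator order, then run the new order for real on the same tree.
# Assumptions: the scratch tree and cpat are hypothetical stand-ins for
# the script's real directories and $cpat.
set -eu

dir=$(mktemp -d)
trap 'cd /; rm -rf "$dir"' EXIT
cd "$dir"

mkdir db1
: > db1/t1.ibd
: > db1/t2.ibd
: > grastate.keep                 # hypothetical file that must survive

cpat='.*\.keep$'                  # hypothetical stand-in for $cpat

# Old order: ( -regex && -prune ) || action -> descends into db1
old_list=$(find . -mindepth 1 -regex "$cpat" -prune -o -print | LC_ALL=C sort)
# New order: ( -prune && -regex ) || action -> never descends
new_list=$(find . -mindepth 1 -prune -regex "$cpat" -o -print | LC_ALL=C sort)

printf 'old order passes:\n%s\n' "$old_list"
printf 'new order passes:\n%s\n' "$new_list"

# Run the new order with rm, as the patched script does.
find . -mindepth 1 -prune -regex "$cpat" -o -exec rm -rf {} +
remaining=$(find . -mindepth 1 | LC_ALL=C sort)
printf 'left after cleanup:\n%s\n' "$remaining"
```

With the old order, db1 and both files inside it reach the action, so a directory and its own contents can land in different rm batches; with the new order only ./db1 is passed, and rm -rf takes care of everything below it. Sorting with LC_ALL=C only makes the listings stable across systems.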
Regards,
Sergei

Chief Architect MariaDB
and security@mariadb.org