[GH-ISSUE #28] DBDiff says falsely tables are identical #155

Open
opened 2026-03-07 20:41:39 +03:00 by kerem · 14 comments
Owner

Originally created by @Pachat on GitHub (Oct 1, 2016).
Original GitHub issue: https://github.com/DBDiff/DBDiff/issues/28

Depending on the presence of a third field DBDiff detects or not the difference in the second field.

Step to reproduce
1°) Create the a11_elements and a12_elements tables in two databases
2°) In database2, change a value
3°) Run DBDiff

1°)

CREATE TABLE a11_elements (
id int(6) NOT NULL,
show_in_list_summary int(1) DEFAULT NULL,
filter_exact_match int(1) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO a11_elements (id, show_in_list_summary, filter_exact_match) VALUES(646, 1, NULL);
ALTER TABLE a11_elements ADD PRIMARY KEY (id);

CREATE TABLE a12_elements (
id int(6) NOT NULL,
show_in_list_summary int(1) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO a12_elements (id, show_in_list_summary) VALUES (646, 1);
ALTER TABLE a12_elements ADD PRIMARY KEY (id);

2°)

database2.a11_elements.show_in_list_summary=0
database2.a12_elements.show_in_list_summary=0

3°)

php dbdiff --server1=root:root@localhost:3306 --type=data --include=all server1.database1.a11_elements:server1.database2.a11_elements --output=dbdiff_database1to2.sql

Results :

Now calculating data diff for table a11_elements
Identical resources
Completed

php dbdiff --server1=root:root@localhost:3306 --type=data --include=all server1.database1.a12_elements:server1.database2.a12_elements --output=dbdiff_database1to2.sql

Results :

Now calculating data diff for table a12_elements
Now generating UP migration
Now generating DOWN migration
Writing migration file to dbdiff_database1to2.sql
Completed

---------- UP ----------

UPDATE a12_elements SET show_in_list_summary = '0' WHERE id = '646';

---------- DOWN ----------

UPDATE a12_elements SET show_in_list_summary = '1' WHERE id = '646';

Conclusion: With the same context for the first two fields (646 on both databases and respectively 1 / 0 on each database), only when deleting the third field DBDiff detects the difference in the second field.

Originally created by @Pachat on GitHub (Oct 1, 2016). Original GitHub issue: https://github.com/DBDiff/DBDiff/issues/28 Depending on the presence of a third field DBDiff detects or not the difference in the second field. Step to reproduce 1°) Create the a11_elements and a12_elements tables in two databases 2°) In database2, change a value 3°) Run DBDiff 1°) CREATE TABLE `a11_elements` ( `id` int(6) NOT NULL, `show_in_list_summary` int(1) DEFAULT NULL, **`filter_exact_match` int(1) DEFAULT NULL** ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; INSERT INTO `a11_elements` (`id`, `show_in_list_summary`, `filter_exact_match`) VALUES(646, 1, NULL); ALTER TABLE `a11_elements` ADD PRIMARY KEY (`id`); CREATE TABLE `a12_elements` ( `id` int(6) NOT NULL, `show_in_list_summary` int(1) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; INSERT INTO `a12_elements` (`id`, `show_in_list_summary`) VALUES (646, 1); ALTER TABLE `a12_elements` ADD PRIMARY KEY (`id`); 2°) database2.a11_elements.`show_in_list_summary`=0 database2.a12_elements.`show_in_list_summary`=0 3°) php dbdiff --server1=root:root@localhost:3306 --type=data --include=all server1.database1.a11_elements:server1.database2.a11_elements --output=dbdiff_database1to2.sql Results : > Now calculating data diff for table a11_elements > **Identical resources** > Completed ## php dbdiff --server1=root:root@localhost:3306 --type=data --include=all server1.database1.a12_elements:server1.database2.a12_elements --output=dbdiff_database1to2.sql Results : > Now calculating data diff for table a12_elements > **Now generating UP migration** > **Now generating DOWN migration** > **Writing migration file to dbdiff_database1to2.sql** > Completed # ---------- UP ---------- UPDATE `a12_elements` SET `show_in_list_summary` = '0' WHERE `id` = '646'; # ---------- DOWN ---------- UPDATE `a12_elements` SET `show_in_list_summary` = '1' WHERE `id` = '646'; _Conclusion: With the same context for the first two fields (646 on both databases and respectively 1 / 0 on each database), only when deleting the third field DBDiff detects the difference in the second field._
Author
Owner

@djshen-ponddy commented on GitHub (Oct 5, 2016):

What if filter_exact_match is not NULL ?

<!-- gh-comment-id:251630073 --> @djshen-ponddy commented on GitHub (Oct 5, 2016): What if `filter_exact_match` is not NULL ?
Author
Owner

@Pachat commented on GitHub (Oct 5, 2016):

When filter_exact_match is not NULL, the result is correct.

Step to reproduce

Mydatabase1
CREATE TABLE a13_elements (
id int(6) NOT NULL,
show_in_list_summary int(1) DEFAULT NULL,
filter_exact_match int(1) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO a13_elements (id, show_in_list_summary, filter_exact_match) VALUES(646, 1, 0);
ALTER TABLE a13_elements ADD PRIMARY KEY (id);

Mydatabase2
CREATE TABLE a13_elements (
id int(6) NOT NULL,
show_in_list_summary int(1) DEFAULT NULL,
filter_exact_match int(1) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO a13_elements (id, show_in_list_summary, filter_exact_match) VALUES(646, 0, 0);
ALTER TABLE a13_elements ADD PRIMARY KEY (id);

php dbdiff --server1=root:root@localhost:3306 --type=data --include=all server1.mydatabase1.a13_elements:server1.mydatabase2.a13_elements --output=dbdiff_13.sql

Now calculating data diff for table a13_elements
Now generating UP migration
Now generating DOWN migration
Writing migration file to dbdiff_13.sql
Completed

So, the result that is still incorrect is for a11_elements (see first post) when
filter_exact_match is NULL
(which may happen)

Does this gives a direction for the correction of the bug?

<!-- gh-comment-id:251750130 --> @Pachat commented on GitHub (Oct 5, 2016): When filter_exact_match is not NULL, the result is correct. ## Step to reproduce Mydatabase1 CREATE TABLE a13_elements ( id int(6) NOT NULL, show_in_list_summary int(1) DEFAULT NULL, filter_exact_match int(1) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; INSERT INTO a13_elements (id, show_in_list_summary, filter_exact_match) VALUES(646, **1, 0**); ALTER TABLE a13_elements ADD PRIMARY KEY (id); Mydatabase2 CREATE TABLE a13_elements ( id int(6) NOT NULL, show_in_list_summary int(1) DEFAULT NULL, filter_exact_match int(1) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; INSERT INTO a13_elements (id, show_in_list_summary, filter_exact_match) VALUES(646, **0, 0**); ALTER TABLE a13_elements ADD PRIMARY KEY (id); php dbdiff --server1=root:root@localhost:3306 --type=data --include=all server1.mydatabase1.a13_elements:server1.mydatabase2.a13_elements --output=dbdiff_13.sql > Now calculating data diff for table `a13_elements` > Now generating UP migration > Now generating DOWN migration > Writing migration file to dbdiff_13.sql > Completed So, the result that is still incorrect is for a11_elements (see first post) when **filter_exact_match is NULL** (which may happen) Does this gives a direction for the correction of the bug?
Author
Owner

@djshen-ponddy commented on GitHub (Oct 6, 2016):

I think this bug is the same as #9 .

In getChangeDiff` of src/DB/Data/LocalTableData.php, I replace NULL value by ''.

        $wrapCast = function($arr, $p) {
            return array_map(function($el) use ($p) {
                return "CAST(IFNULL(`{$p}`.`{$el}`, '') AS CHAR CHARACTER SET utf8)";
            }, $arr);
        };
<!-- gh-comment-id:251845351 --> @djshen-ponddy commented on GitHub (Oct 6, 2016): I think this bug is the same as #9 . In getChangeDiff` of src/DB/Data/LocalTableData.php, I replace NULL value by ''. ``` php $wrapCast = function($arr, $p) { return array_map(function($el) use ($p) { return "CAST(IFNULL(`{$p}`.`{$el}`, '') AS CHAR CHARACTER SET utf8)"; }, $arr); }; ```
Author
Owner

@Pachat commented on GitHub (Oct 11, 2016):

@djshen-ponddy
That's an interesting proposal. However, it makes NULL and '' equivalent. Which might not be the case.
A NULL value by default, may become '' when the user decides to leave it blank.
NULL shows that the user did not update the value, '' shows that the user updated to blank.

We may have to use explicitly IS NULL functions to compare?

<!-- gh-comment-id:252960782 --> @Pachat commented on GitHub (Oct 11, 2016): @djshen-ponddy That's an interesting proposal. However, it makes NULL and '' equivalent. Which might not be the case. A NULL value by default, may become '' when the user decides to leave it blank. NULL shows that the user did not update the value, '' shows that the user updated to blank. We may have to use explicitly IS NULL functions to compare?
Author
Owner

@djshen-ponddy commented on GitHub (Oct 12, 2016):

So we have to replace NULL value and use ISNULL as a flag for each field, and then concatenate all fields and flags and calculate md5 hash?

<!-- gh-comment-id:253094113 --> @djshen-ponddy commented on GitHub (Oct 12, 2016): So we have to replace NULL value and use ISNULL as a flag for each field, and then concatenate all fields and flags and calculate md5 hash?
Author
Owner

@Pachat commented on GitHub (Oct 12, 2016):

It looks like the culprit here is MD5 or any other PHP hash function. For example:

$str1 = '';
$str2 = NULL;
echo '
md5='.(md5($str1)===md5($str2)); // answers 1 (for TRUE)
echo '
strcmp='.strcmp($str1, $str2); // answers 0 (for TRUE, as 0 is strcmp's answer when TRUE)

I think that, indeed, we cannot avoid the use of ISNULL to get the difference

<!-- gh-comment-id:253142915 --> @Pachat commented on GitHub (Oct 12, 2016): It looks like the culprit here is MD5 or any other PHP hash function. For example: > $str1 = ''; > $str2 = NULL; > echo '<br />md5='.(md5($str1)===md5($str2)); // answers 1 (for TRUE) > echo '<br />strcmp='.strcmp($str1, $str2); // answers 0 (for TRUE, as 0 is strcmp's answer when TRUE) I think that, indeed, we cannot avoid the use of ISNULL to get the difference
Author
Owner

@jasdeepkhalsa commented on GitHub (Oct 12, 2016):

Oh dear! We do need to use hashes in some way to make it easier to compare the data...

If hashes are the problem I wonder how GIT manages to do it! (Maybe we'll have to ask them for their secret)

<!-- gh-comment-id:253159203 --> @jasdeepkhalsa commented on GitHub (Oct 12, 2016): Oh dear! We do need to use hashes in some way to make it easier to compare the data... If hashes are the problem I wonder how GIT manages to do it! (Maybe we'll have to ask them for their secret)
Author
Owner

@djshen-ponddy commented on GitHub (Oct 12, 2016):

SELECT CONCAT("a", NULL) is NULL and SELECT MD5(NULL) is NULL too.
I think we can still use hashes to compare the data, but we need to handle NULL value to avoid the problem above.

MySQL CONCAT and MD5

<!-- gh-comment-id:253168204 --> @djshen-ponddy commented on GitHub (Oct 12, 2016): `SELECT CONCAT("a", NULL)` is NULL and `SELECT MD5(NULL)` is NULL too. I think we can still use hashes to compare the data, but we need to handle NULL value to avoid the problem above. MySQL [CONCAT](http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_concat) and [MD5](http://dev.mysql.com/doc/refman/5.7/en/encryption-functions.html#function_md5)
Author
Owner

@rekitae commented on GitHub (Oct 13, 2016):

I had the same problem.
There is also a way to solve CONCAT_WS and COALESCE functions.

How COALESCE

$ WrapCast = function ($ arr, $ p) {
             return array_map (function ($ el) use ($ p) {
                 return "COALESCE (CAST (` {$ p} `.` {$ el} `AS CHAR CHARACTER SET utf8),'')";
             }, $ Arr);
         };

How CONCAT_WS

MD5(concat_ws('', $columnsA)) AS hash1,
MD5(concat_ws('', $columnsB)) AS hash2
<!-- gh-comment-id:253420435 --> @rekitae commented on GitHub (Oct 13, 2016): I had the same problem. There is also a way to solve CONCAT_WS and COALESCE functions. How COALESCE ``` php $ WrapCast = function ($ arr, $ p) {              return array_map (function ($ el) use ($ p) {                  return "COALESCE (CAST (` {$ p} `.` {$ el} `AS CHAR CHARACTER SET utf8),'')";              }, $ Arr);          }; ``` How CONCAT_WS ``` MD5(concat_ws('', $columnsA)) AS hash1, MD5(concat_ws('', $columnsB)) AS hash2 ```
Author
Owner

@djshen-ponddy commented on GitHub (Oct 13, 2016):

I found another problem about CONCAT.
The two records, (1, 'abc', 'de') and (1, 'abcde', ''), are both concatenated into '1abcde', so their MD5 hashes are identical.

<!-- gh-comment-id:253425444 --> @djshen-ponddy commented on GitHub (Oct 13, 2016): I found another problem about CONCAT. The two records, (1, 'abc', 'de') and (1, 'abcde', ''), are both concatenated into '1abcde', so their MD5 hashes are identical.
Author
Owner

@rekitae commented on GitHub (Oct 13, 2016):

concat_ws seems to be a good alternative.
The example below

select concat_ws('|', 1, 'abc', 'de'), concat_ws('|', 1, 'abcde');
+--------------------------------+----------------------------+
| concat_ws('|', 1, 'abc', 'de') | concat_ws('|', 1, 'abcde') |
+--------------------------------+----------------------------+
| 1|abc|de                       | 1|abcde                    |
+--------------------------------+----------------------------+
<!-- gh-comment-id:253432826 --> @rekitae commented on GitHub (Oct 13, 2016): concat_ws seems to be a good alternative. The example below ``` select concat_ws('|', 1, 'abc', 'de'), concat_ws('|', 1, 'abcde'); +--------------------------------+----------------------------+ | concat_ws('|', 1, 'abc', 'de') | concat_ws('|', 1, 'abcde') | +--------------------------------+----------------------------+ | 1|abc|de | 1|abcde | +--------------------------------+----------------------------+ ```
Author
Owner

@djshen-ponddy commented on GitHub (Oct 13, 2016):

I think we need a special separator (magic number) to reduce the probability of occurrence of this problem.

select concat_ws('|', 1, 'abc', 'de|'), concat_ws('|', 1, 'abc|de', '');
+---------------------------------+---------------------------------+
| concat_ws('|', 1, 'abc', 'de|') | concat_ws('|', 1, 'abc|de', '') |
+---------------------------------+---------------------------------+
| 1|abc|de|                       | 1|abc|de|                       |
+---------------------------------+---------------------------------+
<!-- gh-comment-id:253435437 --> @djshen-ponddy commented on GitHub (Oct 13, 2016): I think we need a special separator (magic number) to reduce the probability of occurrence of this problem. ``` sql select concat_ws('|', 1, 'abc', 'de|'), concat_ws('|', 1, 'abc|de', ''); +---------------------------------+---------------------------------+ | concat_ws('|', 1, 'abc', 'de|') | concat_ws('|', 1, 'abc|de', '') | +---------------------------------+---------------------------------+ | 1|abc|de| | 1|abc|de| | +---------------------------------+---------------------------------+ ```
Author
Owner

@isgroup-srl commented on GitHub (Jul 30, 2017):

@djshen-ponddy you can add data (a space) to the string before calling md5() to avoid the issue md5() returning null if the data is null. This way you will not have any collision (eg: "NULL" and " " will not have the same MD5 as the string will become " " and " "). With concat_ws() you cannot reliably add a separator that is not in the data itself, btw using a random token of enough entropy will make collisions unlikely.

<!-- gh-comment-id:318901918 --> @isgroup-srl commented on GitHub (Jul 30, 2017): @djshen-ponddy you can add data (a space) to the string before calling md5() to avoid the issue md5() returning null if the data is null. This way you will not have any collision (eg: "NULL" and " " will not have the same MD5 as the string will become " " and " "). With concat_ws() you cannot reliably add a separator that is not in the data itself, btw using a random token of enough entropy will make collisions unlikely.
Author
Owner

@SergioMBS commented on GitHub (Oct 15, 2017):

concat_ws seems to work, why hasn't this been changed in the code?

<!-- gh-comment-id:336721979 --> @SergioMBS commented on GitHub (Oct 15, 2017): concat_ws seems to work, why hasn't this been changed in the code?
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/DBDiff#155
No description provided.