[GH-ISSUE #22] Incomplete Diff #11

Closed
opened 2026-02-25 21:33:01 +03:00 by kerem · 6 comments
Owner

Originally created by @ckinnane on GitHub (Aug 19, 2016).
Original GitHub issue: https://github.com/DBDiff/DBDiff/issues/22

Perhaps you could include an option to prevent ignoring tables that have the same count().

kerem closed this issue 2026-02-25 21:33:01 +03:00

@jasdeepkhalsa commented on GitHub (Aug 22, 2016):

I agree this could be useful in many scenarios, as the first part of the data check is:

> And then for each table, the table storage type (e.g. MyISAM, CSV), the collation (e.g. utf8_general_ci), and number of rows are compared, in that order. If there are any differences they are noted before moving onto the next test.

So the options for `--type=schema`, `data` or `all` could be expanded to be more fine-grained, or alternatively, as you suggest, we could have an `--ignore` flag. I would prefer going for the former, instead of adding too many flags.
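The ordered metadata check described above can be sketched as follows. This is a hypothetical illustration, not DBDiff's actual code: a helper that compares engine, collation, and row count in that order, noting each difference it finds.

```python
# Hypothetical sketch (not DBDiff's implementation) of the comparison
# order described above: engine, then collation, then row count.
def compare_table_meta(name, meta1, meta2):
    """Compare two tables' metadata dicts; return a list of noted differences."""
    diffs = []
    for field in ("engine", "collation", "row_count"):  # checked in this order
        if meta1[field] != meta2[field]:
            diffs.append(f"{name}: {field} differs ({meta1[field]} != {meta2[field]})")
    return diffs

print(compare_table_meta(
    "users",
    {"engine": "MyISAM", "collation": "utf8_general_ci", "row_count": 10},
    {"engine": "InnoDB", "collation": "utf8_general_ci", "row_count": 10},
))
# → ['users: engine differs (MyISAM != InnoDB)']
```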


@ckinnane commented on GitHub (Sep 2, 2016):

Hi Jasdeep.
I do like this utility; it is quite useful when I need to
maintain certain tables with their data while upgrading the structure and
content of other parts of a database.

I will need to find a way to always analyse tables even when they have the
same number of rows though, which I should be able to do.

What confuses me is why the comparison is done this way in the first place.
It seems that a basic comparison of row counts is a way to speed up the
analysis, but at the expense of accuracy.

The first time I used it, it completely skipped the data comparison for
the users table, because even though its contents differed, it had the same
number of rows.

In my opinion, such a good utility should be accurate by default, with
speed optimisations that might compromise integrity available as
options.

I'll see if I can make a patch.

Thanks for the great work.

Corley.



@jasdeepkhalsa commented on GitHub (Sep 2, 2016):

Oh, I thought you meant something else; this should definitely not be happening.

Could you please post your `.dbdiff` file here (with sensitive details removed), along with the command you're using to initiate the comparison?

I'd like to investigate further.


@ckinnane commented on GitHub (Sep 5, 2016):

I had 2 databases, jms and jms2, that were very similar; the user table had the same number of rows, and the migration list seemed to skip the user table completely. I can't give you the exact example now, as the data has changed and it appears to be working normally.

When I read the description, I assumed this was the intended behaviour, but I do see it's a bug to skip a table for this reason alone.

I couldn't find the reference to "count" in the code that would exclude tables, probably because it isn't there. I could only find count being used to acquire the next n rows from a table and shift the offset.

I had to specify the servers on the command line as well as the 2 databases, both from the same server.


@ckinnane commented on GitHub (Sep 5, 2016):

The user table definitely had different data in a few rows, but the same row count.


@jasdeepkhalsa commented on GitHub (Nov 24, 2017):

I'm closing this for now as there's no way to reproduce the original issue of skipped tables.

And certainly, adding a feature to just skip tables with the same row count would, as you say, reduce the accuracy of comparisons, which would not make sense for a diff tool like DBDiff.
