Deduplicator - What criteria does it use to find and remove duplicates?

I’d like to know what criteria the deduplicator tool (in versions 6 and 7) uses to identify duplicate emails within a folder and/or across multiple folders.

I tested it on 10 emails, and eM Client identified them as duplicates even though there were obvious differences in the bodies of the emails. Only the subject line, sender and date stamp were identical.

Since I’ve used it to clean a very large email archive in the past, I am now concerned that I may have lost important emails if it was in fact identifying emails as duplicates even when their bodies were entirely different.

I ran it on a local folder last night, with disastrous results. Fortunately I had a backup with WLM that I was able to restore.

I ran it on a local folder, and it found about 300 duplicates. I clicked “process”, and it deleted all the emails in the folder…not just the duplicates. I found them all in the trash folder, and clicked Edit > Undo. It then deleted them from the trash folder as if it were restoring them, but they never made it back to the original folder. It just deleted them all permanently.

I’ve also discovered that I now have NEW unrelated duplicate emails all over my server since installing and trying EMC, with no real way to safely delete them.

Lovely.

I also noticed that if I first click on the hyperlink of one or more of the listed emails (the one on the right that says “1 duplicate” or whatever, and shows the keep/remove message at the bottom), then it seems to remove only what it should. But going straight to the “Process” button without clicking one of those first just zaps everything.

Hello,
the deduplicator tool definitely compares Subject, Date-time, Sender and Recipients info and Message ID.
Could you please send us an example of different messages that were marked as duplicates even though they have different bodies?
Send these as exported EML files to rust@emclient.com with a link to this forum thread.
Thank you.

Regards,
Olivia
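
For readers wondering how a comparison based on the fields listed above could flag messages whose bodies differ, here is a minimal sketch in Python. It is not eM Client's code; the function names `header_key` and `find_header_duplicates` are made up for illustration, and it assumes the messages are available as raw RFC 822 text (for example, the contents of exported .eml files). It only builds a key from Subject, Date, From, To/Cc and Message-ID, so two messages with the same headers but different bodies end up with the same key.

```python
from email import message_from_string
from email.utils import getaddresses


def header_key(raw: str) -> tuple:
    """Build a comparison key from headers only (hypothetical helper)."""
    msg = message_from_string(raw)
    # Collect To and Cc addresses, lower-cased and sorted so order does not matter.
    recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = tuple(sorted(addr.lower() for _, addr in getaddresses(recipient_headers)))
    return (
        (msg.get("Message-ID") or "").strip(),
        (msg.get("Subject") or "").strip(),
        (msg.get("Date") or "").strip(),
        (msg.get("From") or "").strip().lower(),
        recipients,
    )


def find_header_duplicates(raw_messages):
    """Return (first_index, duplicate_index) pairs for messages sharing a header key."""
    seen = {}
    duplicates = []
    for index, raw in enumerate(raw_messages):
        key = header_key(raw)
        if key in seen:
            duplicates.append((seen[key], index))
        else:
            seen[key] = index
    return duplicates
```

Note that the body never enters the key in this sketch, which is one possible explanation for the behaviour reported in the first post, not a confirmed one.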

Hi Olivia, I no longer have those emails to send. Sorry. I have a very large email backup, well over 15 GB, and I was using the deduplicator to merge two separate backup copies of my inbox when I noticed the problem. By the way, I have my account set up as POP instead of IMAP.

Fortunately for me, I had to use The Bat! to perform all my mailbox cleaning and de-duping, then I exported all the emails as .eml files and imported them back into eM Client.

I just imported 91 .eml files exported from the webmail client Zimbra. When I ran the deduplicator tool on that single folder it found one duplicate. I proceeded to remove the duplicate but eM Client deleted all 91 emails instead. They are nowhere to be found – not even in the Trash bin. This is a big concern!

https://forum.emclient.com/emclient/topics/i-highly-suggest-you-dont-use-the-deduplicator-tool-or-at…

Hello,
the issue is currently only with the ‘Move to Trash’ action and we are working on a fix.
We apologize for any loss caused.

Regards,
Olivia

It would help greatly if people would include which rev of eMC they’re using. I’m on 6.0.249… and have seen no problems with this at all. Did I just get lucky, or is this only a rev 7 problem?

Most of the issues mentioned on the forums lately are with v7, as it is pretty unstable and apparently still needs a lot of work. There don’t seem to be nearly as many issues with v6.

They just released 7.0.27894.0, and the extremely vague release notes say “Fixed some issues in deduplicator”…whatever that means.

I’d recommend going back to 6. It’s been very solid for some years, although I did just do a “repair” with a fresh download of 6 to fix a condition where the app would not launch because a previous eMC process was still lurking in memory, and you can’t run two at once. The band-aid fix for this was to reboot and start over. I won’t know for a few days whether this recent repair fixed that issue. Other than that, I find the deduplicator function is working fine for me, and I used it on my full inbox with, at the time, over 13,000 entries! Yeah, some housekeeping is in order.

I’ve signed up just to add my piece. Deduplicator just doesn’t work. Yes, it finds loads of duplicates among messages that I know to be duplicates, but it also flagged messages as duplicates just because they had the same size, similar names and similar recipients and senders. When you look at the actual content of those messages, such as HTML parts that have different code in them, they are clearly different. Instead of finding only the real duplicates, it called every single message in a folder a duplicate, even though most of them were individual emails.

I like how fast eM Client is, but I can’t use this knowing I have to check absolutely everything to be certain it’s not deleting any files I want to keep.

I’ve had to go through a lot of folders, as I’ve had multiple PC failures from parents etc. and backed stuff up any way I could. Now I’ve got a chance to go through everything to get rid of duplicates, and this is the one app I’ve found that doesn’t do what I was hoping for.

I think eM Client needs to give control to the end user and let us choose what we want checked, so at least that way we know what is being covered.
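
To make the idea of user-selectable criteria concrete, here is a rough Python sketch. It is purely illustrative and not how eM Client works: `dedup_key` and `body_digest` are hypothetical names, and the special `"body"` criterion (a SHA-256 hash of the decoded message parts) is an assumption about what a stricter check could look like. It assumes messages are available as raw text, such as exported .eml contents.

```python
import hashlib
from email import message_from_string
from email.message import Message


def body_digest(msg: Message) -> str:
    """Hash the decoded payload of every non-multipart part (hypothetical helper)."""
    digest = hashlib.sha256()
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        digest.update(part.get_payload(decode=True) or b"")
    return digest.hexdigest()


def dedup_key(raw: str, criteria) -> tuple:
    """Build a comparison key from the header names the user selected.

    The special name "body" adds a hash of the message content.
    """
    msg = message_from_string(raw)
    key = []
    for name in criteria:
        if name == "body":
            key.append(body_digest(msg))
        else:
            key.append((msg.get(name) or "").strip())
    return tuple(key)
```

With criteria such as `("Subject", "From", "Date")`, two messages that differ only in the body get the same key; adding `"body"` to the tuple separates them.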

Oh well, looks like I’m going back to Thunderbird.

I’m considering trying the Deduplicator on emails. Does it work in version 8?

Yes, it works.

Depending on which section of eM Client you use it in, you will have different options for what to do with the duplicates. For messages and events, the options are delete or move, because it is looking for exact duplicates. Contacts are a little different: that section also allows you to merge the duplicates, because it looks for similar contacts.

Ok, I am mostly interested in mail and folder dups. Does it compare folders as well or just at the message level?

It compares messages. You can choose to compare all messages within a single folder, or across a selection of folders.
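
The difference between the two scopes can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not eM Client's implementation: the function `duplicates_by_scope` is hypothetical, folders are modelled as a plain dict of folder name to raw message text, and messages are matched on Message-ID alone, with messages lacking one skipped.

```python
from collections import defaultdict
from email import message_from_string


def message_id(raw: str) -> str:
    """Return the Message-ID header, or an empty string if it is missing."""
    return (message_from_string(raw).get("Message-ID") or "").strip()


def duplicates_by_scope(folders: dict, across_folders: bool) -> list:
    """Group messages that share a Message-ID.

    With across_folders=False, a match only counts inside the same folder.
    Each group is a list of (folder_name, index) locations.
    """
    groups = defaultdict(list)
    for folder, messages in folders.items():
        for index, raw in enumerate(messages):
            mid = message_id(raw)
            if not mid:
                continue  # no Message-ID, cannot be matched in this sketch
            scope_key = mid if across_folders else (folder, mid)
            groups[scope_key].append((folder, index))
    return [locations for locations in groups.values() if len(locations) > 1]
```

A copy of the same message sitting in both Inbox and an archive folder is reported only when `across_folders` is true.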