Deduplicator - What criteria does it use to find and remove duplicates?

I’d like to know what criteria the deduplicator tool (in versions 6 and 7) uses to identify duplicate emails within a folder and/or across multiple folders.

I tested it on 10 emails, and eM Client identified them as duplicates even though there were obvious differences in the bodies of the emails. Only the subject line, sender and date stamp were identical.

Since I’ve used it to clean a very large email archive in the past, I am now concerned that I may have lost important emails if it was in fact flagging emails as duplicates even when their bodies were entirely different.

I ran it on a local folder last night, with disastrous results. Fortunately I had a backup with WLM that I was able to restore.

I ran it on a local folder, and it found about 300 duplicates. I clicked “process”, and it deleted all the emails in the folder…not just the duplicates. I found them all in the trash folder, and clicked Edit > Undo. It then deleted them from the trash folder as if it were restoring them, but they never made it back to the original folder. It just deleted them all permanently.

I’ve also discovered that I now have NEW unrelated duplicate emails all over my server since installing and trying EMC, with no real way to safely delete them.

Lovely.

I also noticed that if I click on the hyperlink of one or more of the listed emails (the one on the right that says “1 duplicate” or whatever, and shows the keep/remove message at the bottom), then it seems to remove only what it should. But going straight to the “Process” button without clicking on one of those first just zaps everything.

Hello,
the deduplicator tool definitely compares the Subject, Date-time, Sender and Recipients info, and the Message ID.
Could you please send us an example of different messages that were marked as duplicates even though they have different bodies?
Send these as exported EML files to [email protected] with a link to this forum thread.
Thank you.

Regards,
Olivia
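
For anyone wondering how a match on header fields alone can flag emails whose bodies differ, here is a minimal Python sketch. It is not eM Client's actual code; it only assumes a comparison based on the fields named in the reply above (Subject, Date, sender, recipients, Message-ID). The function names `header_key`, `body_digest` and `strict_key` are made up for illustration, and the sample messages are invented.

```python
import hashlib
from email import message_from_string
from email.message import Message

def header_key(msg: Message) -> tuple:
    """Key built only from headers; the body is ignored (assumed behaviour)."""
    fields = ("Subject", "Date", "From", "To", "Message-ID")
    return tuple((msg.get(f) or "").strip() for f in fields)

def body_digest(msg: Message) -> str:
    """SHA-256 over the decoded payload of every non-multipart part."""
    h = hashlib.sha256()
    for part in msg.walk():
        if not part.is_multipart():
            h.update(part.get_payload(decode=True) or b"")
    return h.hexdigest()

def strict_key(msg: Message) -> tuple:
    """Header key plus body digest: a stricter, hypothetical check."""
    return header_key(msg) + (body_digest(msg),)

# Two invented messages with identical headers but different bodies.
RAW = """Subject: Invoice
Date: Mon, 01 Feb 2016 10:00:00 +0000
From: a@example.com
To: b@example.com
Message-ID: <1@example.com>

{body}
"""
a = message_from_string(RAW.format(body="Amount due: 100"))
b = message_from_string(RAW.format(body="Amount due: 250"))

print(header_key(a) == header_key(b))  # True: headers alone say "duplicate"
print(strict_key(a) == strict_key(b))  # False: the bodies differ
```

Running something like this over exported .eml files before pressing “Process” would at least show whether the flagged messages really have identical content.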

Hi Olivia, I no longer have those emails to send. Sorry. I have a very large email backup, well over 15 GB, and I was using the deduplicator to merge two separate backup copies of my inbox. That’s when I noticed the problem. By the way, I have my account set up as POP instead of IMAP.

Fortunately for me, I was able to use The Bat! to perform all my mailbox cleaning and de-duping. I then exported all the emails as .eml files and imported them back into eM Client.

I just imported 91 .eml files exported from the webmail client Zimbra. When I ran the deduplicator tool on that single folder it found one duplicate. I proceeded to remove the duplicate but eM Client deleted all 91 emails instead. They are nowhere to be found – not even in the Trash bin. This is a big concern!

https://forum.emclient.com/emclient/topics/i-highly-suggest-you-dont-use-the-deduplicator-tool-or-at…

Hello,
the issue is currently only with the ‘Move to Trash’ action and we are working on a fix.
We apologize for any loss caused.

Regards,
Olivia

It would help greatly if people would include what rev of eMC they’re using. I’m on 6.0.249… and have seen no problems with this at all. Did I just get lucky, or is this only a rev 7 problem?

Most of the issues mentioned on the forums lately are with v7, as it is pretty unstable and apparently still needs a lot of work. There don’t seem to be nearly as many issues with v6.

They just released 7.0.27894.0, and the extremely vague release notes say “Fixed some issues in deduplicator”…whatever that means.

I’d recommend going back to 6. It’s been very solid for some years, although I did just do a “repair” with a fresh download of 6 to fix a condition where the app would not launch because a previous eMC process was still lurking in memory, and you can’t run two at once. The band-aid fix for this was to reboot and start over. I won’t know for a few days whether the repair fixed that issue. Other than that, the deduplicator function is working fine for me, and I used it on my full inbox with, at the time, over 13,000 entries! Yeah, some housekeeping is in order.

I’ve signed up just to add my piece. Deduplicator just doesn’t work. Yes, it finds loads of duplicates among messages that I know to be duplicates, but it also flagged messages as duplicates just because they had the same size, similar names and similar recipients and senders. When you look at the actual files connected to these messages, such as HTML files that have different code in them, it is clear they differ. It failed to pick out only the true duplicates and called every single message in a folder a duplicate, even though most of them were individual emails and not duplicates.

I like how fast eM Client is, but I can’t use this knowing I have to check absolutely everything to be certain it’s not deleting any files I want to keep.

I’ve had to go through a lot of folders, as I’ve had multiple PC failures from parents etc. and backed stuff up any way I can. Now I’ve got a chance to go through everything to get rid of duplicates, and this is the one app I’ve found that doesn’t do what I was hoping for.

I think eM Client needs to give control to the end user and let us choose what we want checked, so at least that way we know what is being covered.

Oh well, looks like I’m going back to Thunderbird.

I’m considering trying the Deduplicator on emails. Does it work in version 8?

Yes, it works.

Depending on which section of eM Client you use it on, you will have different options for what to do with the duplicates. For messages and events, the options are delete or move, because it is looking for exact duplicates. Contacts are a little different: that section also allows you to merge the duplicates, because it looks for similar contacts.

Ok, I am mostly interested in mail and folder dups. Does it compare folders as well or just at the message level?

It compares messages. You can choose to compare all messages within a single folder, or across a selection of folders.

Hi eM’ies :ice_cream:

This thread is quite old, but it seems there has been no further development in this area?
I’m posting here because my question belongs (nearly) to the same topic.

I really love eM Client, but the “Deduplication” extra broke my heart. I can’t believe that this extra was written by the same developers as eM Client. If it was, I’m really scared.

Is there no way to choose which “duplicate” to delete (in search results across several folders)?
Or did I miss something? Or is this feature perhaps only provided in the commercial version?

By the way, in my world “deduplication” is not the same thing as removing duplicates, but anyway.

Thanks in advance for your feedback.

Yes, you missed something.

If you click on a duplicate in the list, where it says 1 duplicates, an additional option opens where you can select which to delete.

The exact same feature is there regardless of which license you have activated.

The term deduplication refers generally to eliminating duplicate or redundant information. :wink:

Thanks Gary,
I would love to see only one duplicate in my results, but as you can imagine, I have more than that.
Otherwise (from my perspective) I would have no need to use this extra feature.

To be short:
I have a huge number of duplicates in the results, and each time eM Client decided to keep the copies in the wrong folder instead of the ones in the folder I wish to keep.
Who says I have to go through hundreds of items to configure each one?
An extra feature? A timesaver? Maybe with fewer than 10 results… that’s my point.

I hope you now have a good idea of what I’m talking about and, more than that, that you have a good solution that I’ve missed so far?

Your example is fine, and yes, it shows exactly my problem.
The only way to configure “keep/remove” is in the results, and only one at a time for EACH result.
There is no way to say:
-> keep all found items in this folder, don’t touch
-> remove from this folder

No, there isn’t.

The Deduplicator is a basic tool provided to assist you in removing duplicates because of mistakes you have made. You will need to go through the list and decide for each one what to do. Then it is done and hopefully you don’t copy messages again into multiple folders.
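
For anyone who still wants the folder-level rule asked for above (“keep everything in this folder, remove from that one”), it could be scripted outside eM Client on exported messages. The following is only a rough Python sketch under assumptions: `plan_removals` is a hypothetical helper, the duplicate keys are whatever you compute yourself (for example from headers), and the folder names and paths are invented. It only builds a plan and deletes nothing, so you can review the list first.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Tuple

# (folder name, duplicate key, reference such as a file path)
Item = Tuple[str, Hashable, str]

def plan_removals(items: Iterable[Item],
                  keep_order: List[str]) -> Tuple[List[str], List[str]]:
    """Split items into (keep, remove), preferring folders earlier in keep_order."""
    rank = {folder: i for i, folder in enumerate(keep_order)}
    groups = defaultdict(list)
    for folder, key, ref in items:
        groups[key].append((folder, ref))

    keep, remove = [], []
    for copies in groups.values():
        # Stable sort: preferred folders first, unlisted folders last.
        copies.sort(key=lambda c: rank.get(c[0], len(rank)))
        keep.append(copies[0][1])                       # best-ranked copy stays
        remove.extend(ref for _, ref in copies[1:])     # the rest are candidates
    return keep, remove

# Invented example: prefer the copy in "Archive" over the one in "Inbox".
items = [
    ("Inbox", "k1", "inbox/1.eml"),
    ("Archive", "k1", "archive/1.eml"),
    ("Archive", "k2", "archive/2.eml"),
    ("Inbox", "k2", "inbox/2.eml"),
    ("Inbox", "k3", "inbox/3.eml"),   # no duplicate, always kept
]
keep, remove = plan_removals(items, keep_order=["Archive", "Inbox"])
print(keep)    # ['archive/1.eml', 'archive/2.eml', 'inbox/3.eml']
print(remove)  # ['inbox/1.eml', 'inbox/2.eml']
```

Keeping the plan separate from the actual deletion is deliberate, given the data loss reported earlier in this thread.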