Many a times we end up in lots of duplicate data in our systems like CRM, and when it becomes difficult to detect the duplicates and clean them up. This blog describes a very simple way to detect duplicates in data without any rocket science knowledge. This process of finding duplicates is not only for CRM data, but any data which can be copied/exported in excel sheet. For CRM, you can retrieve all records of any entity using SDK or other methods and follow the steps below to know all the duplicates.
Step 1: Have your data in excel sheet. I have taken a data set here for easy explanation. My data looks like below –
I have written First Name so that it describes which ones are duplicate and which ones are not. This is to show that we have filtered out all the duplicates at the end of this simple process. Now we need to come up with the criteria on which we want to find duplicates. Lets say our criteria is “If Last name and Email of a record is equal to same of other records then record is duplicate”. You can choose your own criteria.
Step 2: Now Select all the data in Last Name column excluding header. To do this you can select the cell D2, and then press “shift + ctrl + down arrow key”.
Step 3: Do conditional formatting to highlight duplicate values as shown in the screen shot below. (This is optional step and is being done only for easier identification). Do Step2 and Step3 for Email column too. If your criteria contains other columns too, then you will need to do these steps for them also.
After doing step2 and Step 3 for “Last Name” and “Email”, your data will look like below.
Step 4: Now select all the data rows leaving behind headers. To do this, select A2 cell, and press “shift + ctrl + right Arrow+Down Arrow”. Now click on sort and sort this as shown below –
If you had ignored step 3 earlier, then you can ignore first two condition of sorting. If you have chosen more columns in your criteria, then sorting criteria will also contain those. After clicking on ok on ‘sort’ window, our data looks like below –
Now you have all the duplicate records on top. You can just detect it by looking at rows where ‘Last Name’ and ‘Email’ both cells are colored. To further make it more detectable you can perform the following steps –
Step 5: Insert a column at starting, name it as “Is duplicate”. Select A2, and insert formula ‘=IF(AND(E2=E3,F2=F3),"Dup of row below","")’ into this cell, just like shown below. If your criteria contains more columns then this formula will look like ‘‘=IF(AND(E2=E3,F2=F3,X2=X4,...),"Dup of row below","")’.
Step 6: Now copy the similar formula in all the cells of first column. To do this easily, copy the cell A2. Now select all the cells of “Is Duplicate” column A3 onwards, and press “ctrl+v”. Your final result is as below –
Now you know all the duplicates. You just need to locate empty cell followed by “Dup of row Below” in “Is Duplicate” column and you can group duplicates. In this example, first 2nd, 3rd and 4th rows are duplicate to each other and 5th and 6th Rows are duplicates to each other. You are now free to give appropriate treatments to these duplicates.
Note: If you are using SDK of CRM and need to delete duplicate records, it will be easier if you have GUIDs in one column, so that you can delete these by using SDK API.