Wednesday, May 18, 2016

Another simple way to remove duplicate data

Purpose:

In my previous blog, I have explained a way to avoid duplicate records. Here I am with a simple query to achieve the same.

SELECT DISTINCT column1, column2, column3, .......

FROM tablename;

                                                              OR

SELECT column1, column2, column3,....
FROM(
     SELECT COUNT(1) count_value, column1, column2, column3, .......
     FROM tablename
     GROUP BY column1, column2, column3, .......);


                                                              OR

SELECT column1, column2, column3, .......

FROM tablename
GROUP BY column1, column2, column3, .......;


If you notice the sub-query contains COUNT(1), which is not in the main query.

How this query avoids duplicate?

Follow the below approach to avoid to the duplicate in records or delete duplicate records.

Consider the below Table "Employee" which has two duplicate records


EmpId
EmpName
Salary
1
AAAA
20000
2
BBBB
30000
2
BBBB
30000
3
CCCC
25000
4
DDDD
20000
4
DDDD
20000

SELECT COUNT(1) count_value, EmpId, EmpName, Salary
FROM Employee
GROUP BY EmpId, EmpName, Salary;

The above query will return the below result


count_value
EmpId
EmpName
Salary
1
1
AAAA
20000
2
2
BBBB
30000
1
3
CCCC
25000
2
4
DDDD
20000

It is clear from the above query that EmpId 2 and EmpId 4 has duplicate records.

So with the above query, we can able to group the duplicate records.

Now create a temporary table of the same schema and insert data into that temp table using the below query


INSERT into Employee_temp 
SELECT EmpId, EmpName, Salary
FROM(
     SELECT COUNT(1) count_value, EmpId, EmpName, Salary
     FROM Employee
     GROUP BY EmpId, EmpName, Salary);


                                                                     OR
INSERT into Employee_temp 
SELECT EmpId, EmpName, Salary
FROM Employee
GROUP BY EmpId, EmpName, Salary;

                                                                     OR
INSERT into Employee_temp 
SELECT DISTINCT EmpId, EmpName, Salary
FROM Employee;



Now the temp table will have unique data as below



EmpId
EmpName
Salary
1
AAAA
20000
2
BBBB
30000
3
CCCC
25000
4
DDDD
20000

Now drop the Employee table and Rename the Employee_temp table to Employee.

Note: This query will not work for the below records, if you need all the columns.


EmpId
EmpName
Salary
Department
1
AAAA
20000
Dept1
2
BBBB
30000
Dept1
2
BBBB
30000
Dept2
3
CCCC
25000
Dept2
4
DDDD
20000
Dept2
4
DDDD
20000
Dept3

If you note this, the department for EmpId 2 and EmpId 4 are different in both the records.

If you do not need the department column, then this query will work.

Hope this should avoid duplicate records or delete duplicate records.

No comments:

Post a Comment