SQL Tricks - Finding Duplicates
In today’s post I will be looking into few of the commonly used SQL queries and functions. To demonstrate the queries, we will using PostgreSQL database. In Docker-PostgreSQL, I have outlined the steps to run a PostgreSQL container. You can execute the pg_hr_schema.sql to populate data used in these examples. Additionally lets insert few records into locations
.
|
|
Finding Duplicates
|
|
Deleting duplicates
Now that we have found the rows which are duplicate, lets delete the duplicate rows. To achieve this we will use ctid
, the system identifier on each row.
|
|
What is ctid
column ?
The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row’s ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier. The OID, or even better a user-defined serial number, should be used to identify logical rows.