Skip to content


Deleting Duplicate Records in MySQL

I am writing this post to remind myself of a cool trick that I rediscovered today. When you are scrubbing a database table, there is the common problem of removing duplicate data. In my case, I had a table of ZIP codes, city names, and state codes, along with the latitude and longitude values. The problem was that there were a few thousand records where the lat/lon data was slightly off leading to multiple duplicate zip code rows with slightly different data. I have seen solutions for other databases that require a temporary ID field and self joins. It turns out that for MySQL the solution is much simpler. If you use the ALTER IGNORE TABLE command to add a unique index, then your duplicate rows will be silently deleted. Here was the command I used:
ALTER IGNORE TABLE `zip_codes`
ADD UNIQUE INDEX dupes (`region_code`, `city`, `zipcode`);

Finally, just drop the temporary index and you’re done. Here’s the official word from the manual:

IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key, The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.

Posted in How To.

Tagged with .