Removing Duplicates with DISTINCT in SQL

Removing Duplicates with DISTINCT in SQL

When querying data from a database, you may encounter duplicate rows in your result set. The DISTINCT keyword in SQL is used to eliminate duplicate rows, ensuring that the results returned by your query are unique. This is particularly useful when you want to retrieve a list of unique values from a column or a combination of columns.


1. Understanding DISTINCT

The DISTINCT keyword is applied to the SELECT statement to remove duplicate rows from the result set. When you use DISTINCT, SQL scans the result set and removes any duplicate rows, returning only unique entries.

Syntax:

SELECT DISTINCT column1, column2, ...
FROM table_name;
  • column1, column2, ...: The columns you want to retrieve, with duplicates removed.

2. Using DISTINCT with a Single Column

You can use DISTINCT to retrieve unique values from a single column. This is useful when you want to get a list of all distinct values in that column.

Example:

SELECT DISTINCT department 
FROM employees;

In this example:

  • The SELECT DISTINCT statement retrieves a list of unique departments from the employees table.

Result:

In this output:

  • The result contains only one row for each unique department, even if multiple employees work in the same department.

3. Using DISTINCT with Multiple Columns

You can also use DISTINCT to retrieve unique combinations of values from multiple columns. This ensures that the combination of values in the specified columns is unique across the result set.

Example:

SELECT DISTINCT first_name, last_name 
FROM employees;

In this example:

  • The SELECT DISTINCT statement retrieves unique combinations of first_name and last_name from the employees table.

Result:

In this output:

  • The result contains unique combinations of first_name and last_name, even if some first names or last names appear multiple times in the table.

4. Combining DISTINCT with Other Clauses

You can combine DISTINCT with other SQL clauses such as WHERE, ORDER BY, and LIMIT to further refine your query.

Example 1: Using DISTINCT with WHERE

SELECT DISTINCT department 
FROM employees
WHERE salary > 50000;

In this example:

  • The SELECT DISTINCT statement retrieves a list of unique departments where employees have a salary greater than 50,000.

Result:

In this output:

  • Only the departments with employees earning more than 50,000 are included in the result, and duplicates are removed.

Example 2: Using DISTINCT with ORDER BY

SELECT DISTINCT department 
FROM employees
ORDER BY department ASC;

In this example:

  • The SELECT DISTINCT statement retrieves unique departments and sorts them in ascending order.

Result:

In this output:

  • The unique department names are displayed in alphabetical order.

Example 3: Using DISTINCT with LIMIT

SELECT DISTINCT department 
FROM employees
ORDER BY department ASC
LIMIT 3;

In this example:

  • The SELECT DISTINCT statement retrieves unique departments, sorts them in ascending order, and limits the result to the top 3 rows.

Result:

In this output:

  • The top 3 unique departments, sorted alphabetically, are displayed.

5. Handling NULL Values with DISTINCT

When using DISTINCT, SQL treats NULL values as unique. If a column contains NULL values, the DISTINCT keyword will include one NULL in the result set, even if there are multiple NULL values.

Example:

SELECT DISTINCT department 
FROM employees;

If the employees table has the following data:

Result:

In this output:

  • The DISTINCT keyword removes duplicate values, including multiple NULL entries, leaving only one NULL in the result set.

6. Practical Use Case: Fetching Unique Customer Locations

Consider a scenario where you want to retrieve a list of unique cities where your customers are located.

Example:

SELECT DISTINCT city 
FROM customers;

In this example:

  • The SELECT DISTINCT statement retrieves a list of unique cities from the customers table, ensuring that each city appears only once.

Result:

This query helps you identify all the different cities where your customers are based, without duplicate entries.


Conclusion

The DISTINCT keyword in SQL is a powerful tool for ensuring that your query results contain only unique values. Whether you're working with a single column or multiple columns, DISTINCT helps you eliminate duplicates and retrieve clean, unique data. By combining DISTINCT with other SQL clauses, you can create refined and efficient queries that return the exact data you need.

This tutorial is brought to you by codes with pankaj.

Last updated