PostgreSQL DISTINCT Clause
In PostgreSQL, the DISTINCT clause is used with a SELECT statement to remove duplicate rows from a result set. From each group of two or more duplicate rows, it keeps only one. It can be applied to one or more columns to find their distinct values or combinations of values.
SELECT ALL, which keeps every row of the result set including duplicates, is the default: a plain SELECT behaves as SELECT ALL.
SELECT DISTINCT <column_1>, <column_2>, ..., <column_n>
FROM <table_name>;
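The difference between the default SELECT ALL and SELECT DISTINCT can be seen in a minimal sketch. The three rows below are hypothetical sample data supplied inline through a VALUES list (the CTE name weather_reports is reused only for readability), so the query runs without any existing table.

```sql
-- Hypothetical sample data: three rows, two of which share the same city.
WITH weather_reports(city) AS (
    VALUES ('London'), ('Paris'), ('London')
)
SELECT
    (SELECT count(*) FROM (SELECT ALL city FROM weather_reports) AS a)      AS all_rows,      -- duplicates kept
    (SELECT count(*) FROM (SELECT DISTINCT city FROM weather_reports) AS d) AS distinct_rows; -- duplicates removed
-- all_rows = 3, distinct_rows = 2
```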
Let's use a weather_reports table, with city, report_date, and weather columns, to demonstrate the DISTINCT clause.
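The table contents are not reproduced here. To follow along, a hypothetical setup script such as the one below can be used. The column types (text and date) and every row are assumptions invented for illustration, not the article's original data.

```sql
-- Hypothetical table definition: column types are assumed.
CREATE TABLE weather_reports (
    city        text,
    report_date date,
    weather     text
);

-- Invented sample rows: London and Paris appear twice, Tokyo once.
INSERT INTO weather_reports (city, report_date, weather) VALUES
    ('London', '2024-01-01', 'Rainy'),
    ('London', '2024-01-02', 'Cloudy'),
    ('Paris',  '2024-01-01', 'Sunny'),
    ('Paris',  '2024-01-02', 'Sunny'),
    ('Tokyo',  '2024-01-01', 'Cloudy');
```

Against these rows, SELECT DISTINCT city returns three rows (London, Paris, Tokyo), and SELECT DISTINCT city, weather returns four, because the two Paris rows share the same weather.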
To get a list of distinct cities, use SELECT DISTINCT as follows:
SELECT DISTINCT city FROM weather_reports;
The query returns each city exactly once, no matter how many reports exist for it.
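Two details are worth knowing: DISTINCT treats NULL values as equal to each other, so all rows with a NULL city collapse into a single row, and the output order is not guaranteed unless ORDER BY is added. A minimal sketch with hypothetical inline rows:

```sql
-- Hypothetical rows supplied inline; two rows have no city (NULL).
WITH weather_reports(city) AS (
    VALUES ('London'), ('Paris'), ('London'), (NULL), (NULL)
)
SELECT DISTINCT city
FROM weather_reports
ORDER BY city;  -- London, Paris, NULL (the NULLs collapse into one row and sort last in ascending order)
```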
The SELECT DISTINCT clause can also be applied to multiple columns. In that case, duplicates are evaluated on the combination of all listed columns, so each unique combination of values is returned once.
SELECT DISTINCT city, weather FROM weather_reports;
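Rows count as duplicates only when they match in every listed column. In the sketch below (hypothetical inline rows), the two Paris rows collapse because both city and weather are equal, while the two London rows both survive because their weather differs. The report_date column plays no part, since it is not in the select list.

```sql
-- Hypothetical rows supplied inline so the query is self-contained.
WITH weather_reports(city, report_date, weather) AS (
    VALUES ('London', DATE '2024-01-01', 'Rainy'),
           ('London', DATE '2024-01-02', 'Cloudy'),
           ('Paris',  DATE '2024-01-01', 'Sunny'),
           ('Paris',  DATE '2024-01-02', 'Sunny')
)
SELECT DISTINCT city, weather
FROM weather_reports
ORDER BY city, weather;
-- London | Cloudy
-- London | Rainy
-- Paris  | Sunny
```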
DISTINCT ON
Like DISTINCT, the DISTINCT ON clause, a PostgreSQL extension to the SQL standard, can be used with a SELECT statement to remove duplicates from a query result set. The difference is that it compares rows only on the expressions listed in parentheses and keeps the “first row” of each set of rows for which those expressions evaluate to equal.
SELECT DISTINCT ON (<column_1>) <column_1>,
<column_2>, ..., <column_n>
FROM <table_name>
ORDER BY <column_1>, <column_2>, ..., <column_n>;
Without ORDER BY, the order of rows returned by a SELECT statement is unpredictable, so the “first” row of each group is unpredictable as well. It is therefore good practice to use ORDER BY with DISTINCT ON to ensure the desired row appears first. The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).
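The matching rule is enforced: if the leftmost ORDER BY expression differs from the DISTINCT ON expression, PostgreSQL rejects the query. The following sketch (hypothetical inline rows) keeps the earliest report per city and shows the rejected form only as a comment.

```sql
-- Hypothetical rows supplied inline so the query is self-contained.
WITH weather_reports(city, report_date, weather) AS (
    VALUES ('London', DATE '2024-01-01', 'Rainy'),
           ('London', DATE '2024-01-02', 'Cloudy'),
           ('Paris',  DATE '2024-01-01', 'Sunny')
)
-- Valid: DISTINCT ON (city) matches the leftmost ORDER BY expression,
-- and ascending report_date puts the earliest report first in each group.
SELECT DISTINCT ON (city) city, report_date, weather
FROM weather_reports
ORDER BY city, report_date;
-- London | 2024-01-01 | Rainy
-- Paris  | 2024-01-01 | Sunny

-- Invalid: the leftmost ORDER BY expression is report_date, not city, so PostgreSQL
-- raises "SELECT DISTINCT ON expressions must match initial ORDER BY expressions".
-- SELECT DISTINCT ON (city) city, report_date, weather
-- FROM weather_reports
-- ORDER BY report_date DESC;
```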
The DISTINCT ON clause can be used to get the “first” row of every group of duplicates. For example, the following statement fetches the most recent weather report for each city in the weather_reports table: it sorts the rows by city in ascending order and by report_date in descending order, then keeps the first row of each city group.
SELECT DISTINCT ON (city) city, report_date, weather
FROM weather_reports
ORDER BY city, report_date DESC;
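For comparison, the same “latest row per city” result can be written with the standard row_number() window function, which may help when a query must also run on databases that lack DISTINCT ON. This is a swapped-in alternative technique, not part of DISTINCT ON itself, and the inline rows are hypothetical. For these rows it returns the same three rows as the DISTINCT ON query above.

```sql
-- Hypothetical rows supplied inline so the query is self-contained.
WITH weather_reports(city, report_date, weather) AS (
    VALUES ('London', DATE '2024-01-01', 'Rainy'),
           ('London', DATE '2024-01-02', 'Cloudy'),
           ('Paris',  DATE '2024-01-01', 'Sunny'),
           ('Paris',  DATE '2024-01-03', 'Rainy'),
           ('Tokyo',  DATE '2024-01-02', 'Cloudy')
),
-- Number each city's reports from newest (rn = 1) to oldest.
ranked AS (
    SELECT city, report_date, weather,
           row_number() OVER (PARTITION BY city ORDER BY report_date DESC) AS rn
    FROM weather_reports
)
SELECT city, report_date, weather
FROM ranked
WHERE rn = 1          -- keep only the newest report per city
ORDER BY city;
-- London | 2024-01-02 | Cloudy
-- Paris  | 2024-01-03 | Rainy
-- Tokyo  | 2024-01-02 | Cloudy
```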
Without the ORDER BY clause, the query above could return a report from an arbitrary report_date for each city rather than the most recent one.
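ORDER BY alone does not settle ties: if a city has two reports with the same latest report_date, either one may be kept. Adding another column to ORDER BY makes the choice deterministic. In the sketch below the rows are hypothetical, and weather is used as the tie-breaker purely for illustration; a real table would more likely use a unique column such as a primary key.

```sql
-- Hypothetical rows: two London reports share the latest date (a tie).
WITH weather_reports(city, report_date, weather) AS (
    VALUES ('London', DATE '2024-01-02', 'Rainy'),
           ('London', DATE '2024-01-02', 'Cloudy'),
           ('London', DATE '2024-01-01', 'Sunny')
)
SELECT DISTINCT ON (city) city, report_date, weather
FROM weather_reports
ORDER BY city, report_date DESC, weather;  -- weather breaks the tie alphabetically
-- London | 2024-01-02 | Cloudy
```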