How to remove duplicate items from a list in Python?
A list can contain the same element more than once, whereas a set holds each element only once. Hence, if we convert a list to a set, the duplicates are removed. However, the original order of the elements is not guaranteed to survive. The iteration order of a set is decided by its hashing mechanism and may differ from the order in the list. The following code demonstrates this:
>>> mylist=[5,10,15,20,3,15,25,20,30,10,100]
>>> myset=set(mylist)
>>> print(list(myset))
[3, 100, 5, 10, 15, 20, 25, 30]
So, how do we remove the duplicates and still retain the original order?
Append Unique Items to Another List Using a for Loop
A simple approach is to append the first occurrence of each number to another list, as shown below.
>>> uniques=[]
>>> for num in mylist:
...     if num not in uniques:
...         uniques.append(num)
...
>>> print(uniques)
[5, 10, 15, 20, 3, 25, 30, 100]
Using List Comprehension
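We can use a list comprehension to make it a little more concise. Note that the comprehension is used here only for the side effect of append(), so the interpreter echoes a throwaway list of None values.
>>> uniques=[]
>>> [uniques.append(num) for num in mylist if num not in uniques]
[None, None, None, None, None, None, None, None]
>>> print(uniques)
[5, 10, 15, 20, 3, 25, 30, 100]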
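The above approaches are simple to implement but not efficient, especially for a list with a large number of items: every membership test scans the list of uniques built so far, so the total work grows roughly quadratically with the number of items. The following technique removes duplicates fairly efficiently.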
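The cost described above comes from looking items up in a list. As a minimal sketch (the helper name unique_in_order is only illustrative, and the items are assumed to be hashable), the same loop can track the items already seen in a set, where a membership test takes constant time on average:

```python
def unique_in_order(items):
    """Return the items without duplicates, keeping first occurrences in order."""
    seen = set()      # items already emitted; average O(1) membership test
    uniques = []      # result list, preserves the original order
    for item in items:
        if item not in seen:
            seen.add(item)
            uniques.append(item)
    return uniques


mylist = [5, 10, 15, 20, 3, 15, 25, 20, 30, 10, 100]
print(unique_in_order(mylist))  # [5, 10, 15, 20, 3, 25, 30, 100]
```

The price is the extra memory for the set, and the approach does not work for unhashable items such as nested lists.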
Using OrderedDict.fromkeys()
The solution differs slightly between Python versions before 3.7 and later versions. Prior to Python 3.7, a regular dictionary is not guaranteed to preserve the order of insertion. However, OrderedDict from the collections module does so. We use its fromkeys() method to build an ordered dictionary with the list items as keys, each associated with the value None. Since keys are unique, converting the dictionary back to a list yields each item once, in its original order.
>>> mylist=[5,10,15,20,3,15,25,20,30,10,100]
>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(mylist))
[5, 10, 15, 20, 3, 25, 30, 100]
From Python 3.7 onwards, a regular dictionary is guaranteed to remember the insertion order of its keys. Hence, the fromkeys() method of the normal dict class does the same job.
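As a short sketch of the plain dict variant, assuming Python 3.7 or later and hashable list items, no import is needed:

```python
mylist = [5, 10, 15, 20, 3, 15, 25, 20, 30, 10, 100]

# dict.fromkeys() maps every item to None; repeated keys collapse into one
# entry, and the keys keep the order in which they were first inserted.
uniques = list(dict.fromkeys(mylist))
print(uniques)  # [5, 10, 15, 20, 3, 25, 30, 100]
```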
Using the reduce() Function
Another efficient solution to this problem is to use the reduce() function of the functools module.
In the following example, a two-element tuple holding an empty list and an empty set is used as the initializer. The first occurrence of each item in the original list is appended to the list, while the set acts as a look-up table for the items already seen.
>>> from functools import reduce
>>> mylist=[5,10,15,20,3,15,25,20,30,10,100]
>>> tup = (list(), set())  # list to keep order, set as lookup table
>>> def myfunction(temp, item):
...     if item not in temp[1]:
...         temp[0].append(item)
...         temp[1].add(item)
...     return temp
...
>>> uniques=reduce(myfunction, mylist, tup)[0]
>>> print(uniques)
[5, 10, 15, 20, 3, 25, 30, 100]
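Which approach is fastest depends on the data and the Python version, so it is worth measuring rather than assuming. The following is a rough sketch using the timeit module; the function names, the random test data, and the run count are arbitrary choices made for illustration, and dict.fromkeys() assumes Python 3.7 or later.

```python
import random
import timeit
from functools import reduce


def with_list_lookup(items):
    # Membership is tested against the growing result list (linear scan).
    uniques = []
    for item in items:
        if item not in uniques:
            uniques.append(item)
    return uniques


def with_dict_fromkeys(items):
    # Relies on dict keeping insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


def with_reduce(items):
    # Accumulator is (ordered list, lookup set), as in the example above.
    def step(acc, item):
        ordered, seen = acc
        if item not in seen:
            ordered.append(item)
            seen.add(item)
        return acc
    return reduce(step, items, ([], set()))[0]


random.seed(0)  # fixed seed so repeated runs use the same data
data = [random.randrange(2000) for _ in range(5000)]

results = {}
for func in (with_list_lookup, with_dict_fromkeys, with_reduce):
    seconds = timeit.timeit(lambda: func(data), number=3)
    results[func.__name__] = func(data)
    print(f"{func.__name__:>20}: {seconds:.4f} s for 3 runs")
```

All three functions return the same list; only the running time differs, and the absolute numbers will vary from machine to machine.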