This is quite a simple question: “How to remove duplicate items from a list in Python?” There are several methods you can use to do so. Some programmers prefer a quick and simple way to remove items from a list in Python. Others want a more sophisticated and reliable method. It all depends on the requirement and, to some extent, on the personal choice of the programmer.
Searching the web gave me many results, and most of them are confusing or don’t fulfill the requirement. Let’s discuss it.
There are some issues involved in removing duplicate items from a list in Python. Consider them before you choose a method.
Considerations: How to remove duplicate items from a list in Python?
- Are the objects in the list hashable?
- Do they support comparison?
- Do you need to preserve the order of the list after removing the items?
- Are you dealing with a large list object and need a fast method?
1. Objects in the list are hashable or not!
If all objects in the list are hashable, like integers, strings, and floats (keep in mind that objects like set are not hashable), and you don’t need to preserve the order of the list, then one of the simplest methods is to convert the list into a set and then back into a list object, something like list(set(list_obj)). This is a simple yet powerful method, and quite fast too.
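As a minimal sketch of the set round trip, assuming every item is hashable and the order of the result does not matter, the idea looks like this. The name list_obj follows the text; the sample values are made up for illustration.

```python
# Sample data, made up for illustration; every item here is hashable.
list_obj = [3, 1, 2, 3, "a", 1.5, "a", 2]

# Converting to a set drops the duplicates; converting back gives a list.
# The order of the result is not guaranteed to match the original list.
unique_items = list(set(list_obj))

print(len(list_obj), "items before,", len(unique_items), "items after")
```

If the list contains a non-hashable item, such as another list or a set, the set() call raises a TypeError, so this approach only fits the hashable case.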
2. Whether they support comparison!
If the objects in the list are not primitive objects like integers, floats, and strings, there is no simple way to find identical objects in the Python list, so the Python interpreter will not be able to group and remove them straight away. A special comparison method is required to achieve this task. Fortunately, Python has an elegant way to handle such tasks.
The __eq__ method of the class (together with __hash__, if you want to put the objects in a set or dictionary) is the key to success. In Python 2 the __cmp__ method played this role, but Python 3 no longer uses it.
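To illustrate how a class can tell Python when two objects count as identical, here is a small sketch. The Point class and the unique_by_equality helper are hypothetical names invented for this example; it assumes Python 3, where equality is defined with __eq__ and hashing with __hash__.

```python
class Point:
    """Hypothetical example class; two points are equal if x and y match."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Tell Python when two Point objects should be treated as duplicates.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal objects must have equal hashes so that sets and dicts work.
        return hash((self.x, self.y))

    def __repr__(self):
        return f"Point({self.x}, {self.y})"


def unique_by_equality(items):
    """Remove duplicates using only ==, keeping the first occurrence."""
    result = []
    for item in items:
        if item not in result:  # "in" relies on __eq__
            result.append(item)
    return result


points = [Point(1, 2), Point(1, 2), Point(3, 4)]
print(unique_by_equality(points))
print(len(set(points)))
```

The helper only needs __eq__, but it compares every item with every kept item, so it gets slow on long lists. Defining __hash__ as well lets the faster set-based and dictionary-based methods work on the same objects.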
3. Do you need to preserve the order of the list after removing the items?
In most situations, the order doesn’t matter after removing items from a list in Python. Sometimes it is required, and implementing it makes life tedious for programmers. Python doesn’t have a dedicated built-in function for this task, so we have to do some homework and find a solution. The built-in dict.setdefault() method can be used with a list comprehension to achieve this goal.
Here is a simple code snippet you can use in your Python program or modify as you require.
seen = {}
[seen.setdefault(x, x) for x in alist if x not in seen]
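The same order-preserving idea can be written as a plain loop, which some readers may find easier to follow than a list comprehension with side effects. This is only a sketch: unique_in_order is a hypothetical helper name, the sample data is made up, and it assumes all items are hashable.

```python
def unique_in_order(items):
    """Return items without duplicates, keeping the first occurrence of each."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


alist = ["b", "a", "b", "c", "a"]
print(unique_in_order(alist))      # ['b', 'a', 'c']

# On Python 3.7 and later, dictionaries keep insertion order,
# so dict.fromkeys gives the same result in one line.
print(list(dict.fromkeys(alist)))  # ['b', 'a', 'c']
```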
4. Are you dealing with a large Python list object and need a fast method?
This is where expertise is needed. A small Python list object doesn’t need any special consideration. Even moderately sized Python lists don’t need much consideration if they are not used often. If you have large statistical data that involves heavy use of Python lists and you need to remove duplicate items from them, you must pay close attention to this task. Sometimes it becomes a headache and requires special expertise.
Fortunately, Python has the OrderedDict class in the collections module. It has been implemented in C since Python 3.5. If this class fulfills your requirement, then you are in heaven. The OrderedDict class doesn’t support non-hashable objects, such as set, in the list.
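A short sketch of the OrderedDict approach, assuming the items are hashable, might look like this. The variable names and sample values are made up for illustration; the second part shows the TypeError you get with a non-hashable item such as a set.

```python
from collections import OrderedDict

# Made-up sample data with repeated values.
readings = [10, 20, 10, 30, 20]

# fromkeys keeps one key per distinct item, in order of first appearance.
unique_readings = list(OrderedDict.fromkeys(readings))
print(unique_readings)  # [10, 20, 30]

# Non-hashable items, such as sets, cannot be used as keys.
error_message = None
try:
    OrderedDict.fromkeys([{1, 2}, {1, 2}])
except TypeError as exc:
    error_message = str(exc)
    print("Cannot deduplicate:", error_message)
```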
I have also found an external Python library, iteration_utilities. See the iteration_utilities library home page for details. The author of this library claims that it is the fastest library. This library is also implemented in C. Obviously, that gives it an edge, since speed comes in the tradition of the C language. You should give it a try.
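One way to check speed claims on your own data is to time the candidates with the standard timeit module. The sketch here compares only the standard-library approaches discussed in this article; the sample data and the candidates dictionary are made up for illustration, and the external library is left out so the example runs without extra installs.

```python
import timeit
from collections import OrderedDict

# Made-up sample data: 10,000 integers with many repeats.
data = [i % 1000 for i in range(10_000)]

# Each candidate is a zero-argument callable that timeit can run.
candidates = {
    "set": lambda: list(set(data)),
    "dict.fromkeys": lambda: list(dict.fromkeys(data)),
    "OrderedDict.fromkeys": lambda: list(OrderedDict.fromkeys(data)),
}

# Total time in seconds for 20 runs of each candidate.
timings = {
    name: timeit.timeit(func, number=20) for name, func in candidates.items()
}

for name, seconds in sorted(timings.items(), key=lambda pair: pair[1]):
    print(f"{name}: {seconds:.4f} s for 20 runs")
```

The numbers depend on the machine, the Python version, and the shape of the data, so treat them as a rough guide rather than a ranking.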
I have discussed the major issues involved in this task. I have found some resources where a clear solution is provided. You can follow these links and find the right solution for yourself. You should perform extensive tests on your data before you use any of these methods in a production environment. Some useful links you may follow are learnandlearn and codeacademy.