Don’t Run Loops in Python, Instead, Use These!

As data science practitioners, we regularly work with large datasets and often need to modify one or more columns. Writing an explicit loop for that kind of task tends to be slow.

In this blog, I will take you through a few alternative approaches that can be faster than loops in Python.

1. Filter

As the name suggests, filter() filters an iterable for us. We pass the filtering condition as a function, and that function is applied to each element of the iterable; only the elements for which it returns a true value are kept.

Syntax

filter(function, iterable)
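As a minimal sketch of the syntax above, the snippet below keeps the even values of a small list. The helper name is_even and the sample list are made up for illustration and are not part of the benchmark that follows.

```python
# Hypothetical helper for illustration; any function returning True/False works.
def is_even(n):
    return n % 2 == 0

numbers = [1, 2, 3, 4, 5, 6]

# filter() returns a lazy iterator, so wrap it in list() to see the values.
evens = list(filter(is_even, numbers))
print(evens)  # [2, 4, 6]
```

Because the result is an iterator, it can also be passed straight to functions such as sum() without building a list first.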

Now let’s compare the performance of Python’s filter() with a for loop and a while loop.

Here, I’m creating a list of the integers from 1 to 99,999 and then keeping only the even numbers. In the end, we sum these even values.

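Note: I'm running all of this code in Google Colab; depending on your system, these results could differ. The memory figure is derived from ru_maxrss, whose unit depends on the platform (kilobytes on Linux, bytes on macOS), so treat it as a rough indicator rather than an exact megabyte count.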

# Using for loop

import math
import time
import resource
import sys

time_start = time.perf_counter()

sys.set_int_max_str_digits(0)

test_list = list(range(1, 100000))
result = []
# Keep only the even numbers
for i in range(len(test_list)):
    if test_list[i] % 2 == 0:
        result.append(test_list[i])
print(sum(result))

time_elapsed = (time.perf_counter() - time_start)
memMb=resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024.0/1024.0

print ("%f secs %f MByte" % (time_elapsed,memMb))

# Output
2499950000
0.041538 secs 1.082287 MByte

# Using while loop

import math
import time
import resource
import sys

time_start = time.perf_counter()

sys.set_int_max_str_digits(0)

test_list = list(range(1, 100000))
result = []
i = 0
# Keep only the even numbers
while i < len(test_list):
    if test_list[i] % 2 == 0:
        result.append(test_list[i])
    i = i + 1
print(sum(result))

time_elapsed = (time.perf_counter() - time_start)
memMb=resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024.0/1024.0

print ("%f secs %f MByte" % (time_elapsed,memMb))

# Output
2499950000
0.047446 secs 1.085056 MByte

# Using Filter

import math
import time
import resource
import sys

time_start = time.perf_counter()

sys.set_int_max_str_digits(0)

test_list = list(range(1, 100000))
# filter() returns a lazy iterator; sum() consumes it
result = filter(lambda x: x % 2 == 0, test_list)
print(sum(result))

time_elapsed = (time.perf_counter() - time_start)
memMb=resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024.0/1024.0

print ("%f secs %f MByte" % (time_elapsed,memMb))

# Output
2499950000
0.022666 secs 1.087833 MByte

2. Map

This function is really useful when you want to apply a function to each value of an iterable such as a list, a tuple, or even a pandas Series.

Syntax

map(function, iterable)
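As a minimal sketch of the syntax above, the snippet below applies a function to every value of a small list. The helper name square and the sample list are made up for illustration only.

```python
# Hypothetical helper for illustration; map() accepts any one-argument function here.
def square(n):
    return n * n

numbers = [1, 2, 3, 4]

# map() is lazy as well, so list() is used to materialize the results.
squares = list(map(square, numbers))
print(squares)  # [1, 4, 9, 16]
```

The input list is left unchanged; map() produces new values instead of overwriting the originals.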

Let’s test it against a for loop.

# Using for loop
import math
import time
import resource
import sys

time_start = time.perf_counter()

sys.set_int_max_str_digits(0)

def factorial(n):
   return math.factorial(n)

test_list = list(range(1, 10000))
# Replace each value with its factorial, in place
for i in range(len(test_list)):
    test_list[i] = factorial(test_list[i])
print(sum(test_list))

time_elapsed = (time.perf_counter() - time_start)
memMb=resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024.0/1024.0

print ("%f secs %f MByte" % (time_elapsed,memMb))

# Output
284654436382457541 .....................0420940313
12.229430 secs 0.167130 MByte

What about a while loop?

# Using while loop

import math
import time
import resource
import sys

time_start = time.perf_counter()

sys.set_int_max_str_digits(0)

def factorial(n):
   return math.factorial(n)

test_list = list(range(1, 10000))
i = 0
# Replace each value with its factorial, in place
while i < len(test_list):
    test_list[i] = factorial(test_list[i])
    i = i + 1
print(sum(test_list))

time_elapsed = (time.perf_counter() - time_start)
memMb=resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024.0/1024.0

print ("%f secs %f MByte" % (time_elapsed,memMb))

# Output
284654436382457541 .....................0420940313
11.263874 secs 1.013439 MByte

Now let's run the same code using the map function.

# Using Map

import math
import time
import resource
import sys

time_start = time.perf_counter()

sys.set_int_max_str_digits(0)

def factorial(n):
   return math.factorial(n)

test_list = list(range(1, 10000))
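# map() returns a lazy iterator; sum() consumes it
result = map(factorial, test_list)
print(sum(result))

time_elapsed = (time.perf_counter() - time_start)
memMb=resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024.0/1024.0

print ("%f secs %f MByte" % (time_elapsed,memMb))

# Output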
284654436382457541 .....................0420940313
10.069755 secs 1.013439 MByte

3. Reduce

Python offers a function called reduce() that lets you reduce a list in a concise way. This function takes a function and an iterable (a list, tuple, Series, etc.) as arguments and returns a single value as output.

Syntax

reduce(func, iterable)

The reduce() function applies a function of two arguments cumulatively to the items of the list, from left to right, to reduce the list to a single value.

Unlike map() and filter(), reduce() is not a built-in function in Python. It belongs to the functools module.

How the reduce() function works in Python

The function passed as an argument is applied to the first two elements of the iterable.
After this, the function is applied to the previously generated result and the next element in the iterable.
This process continues until all of the iterable items are iterated.
A single value is returned as a result of applying the reduce function on the iterable.

Let’s understand the steps with an illustration:

Suppose we have a list of numbers [1,2,3,4,5] that is reduced by applying the addition function.

The final output will be the sum of all the numbers in the list, 15.

Figure: Python reduce, reducing a list by applying the sum function.
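The steps above can be traced in code. This is a sketch: it uses itertools.accumulate, which is not part of the original walkthrough, only to expose the intermediate results that reduce() carries from one step to the next. The addition function is the same two-argument helper used in the comparison below.

```python
from functools import reduce
from itertools import accumulate

def addition(x, y):
    return x + y

numbers = [1, 2, 3, 4, 5]

# accumulate() yields the running result after each step:
# 1, then 1+2, then 3+3, then 6+4, then 10+5.
steps = list(accumulate(numbers, addition))
print(steps)  # [1, 3, 6, 10, 15]

# reduce() returns only the last of those values.
total = reduce(addition, numbers)
print(total)  # 15
```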

Now, let’s compare the performance of Python’s reduce() function with the for loop and the while loop.

For this comparison, I’m going to use an addition() function that adds two input numbers.

def addition(x,y):
    return x + y

Next, to get the sum of all the numbers in a list, I will apply the addition function with for and while loops. Finally, I will pass this addition function as an argument to the reduce function.

Testing with a for loop

# Using for loop
import math
import time
import resource
import sys

time_start = time.perf_counter()

sys.set_int_max_str_digits(0)

def addition(x,y):
    return x + y

test_list = list(range(1, 1000000))

result = 0
for i in range(len(test_list)):
    result = addition(result, test_list[i])
print(result)

time_elapsed = (time.perf_counter() - time_start)
memMb=resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024.0/1024.0

print ("%f secs %f MByte" % (time_elapsed,memMb))

# Output
499999500000
0.289569 secs 0.167500 MByte

Testing with a while loop

# Using while loop

import math
import time
import resource
import sys

time_start = time.perf_counter()

sys.set_int_max_str_digits(0)

def addition(x,y):
    return x + y

test_list = list(range(1, 1000000))

result = 0
i = 0
while i < len(test_list):
    result = addition(result, test_list[i])
    i = i + 1
print(result)

time_elapsed = (time.perf_counter() - time_start)
memMb=resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024.0/1024.0

print ("%f secs %f MByte" % (time_elapsed,memMb))

# Output
499999500000
0.429485 secs 0.167500 MByte

Testing with the reduce function

# Using Reduce

from functools import reduce
import time
import resource
import sys

time_start = time.perf_counter()

sys.set_int_max_str_digits(0)

def addition(x,y):
    return x + y

test_list = list(range(1, 1000000))

result = reduce(addition, test_list)
print(result)

time_elapsed = (time.perf_counter() - time_start)
memMb=resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024.0/1024.0

print ("%f secs %f MByte" % (time_elapsed,memMb))

# Output
499999500000
0.120981 secs 0.167500 MByte

Here, the difference is only a matter of milliseconds. But with a really large dataset, those milliseconds can add up to seconds or even hours.
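Since single runs like the ones above can vary from one machine to another, here is a rough sketch of how the reduce comparison could be repeated with the standard timeit module. The wrapper names with_for_loop and with_reduce, the smaller list size, and the repeat counts are my own choices for illustration, not part of the tests above.

```python
import timeit
from functools import reduce

def addition(x, y):
    return x + y

# Smaller list than in the tests above so the repeated runs finish quickly.
test_list = list(range(1, 100000))

def with_for_loop():
    result = 0
    for value in test_list:
        result = addition(result, value)
    return result

def with_reduce():
    return reduce(addition, test_list)

# Both approaches must agree before their timings are worth comparing.
assert with_for_loop() == with_reduce()

# Each measurement runs the function 5 times, repeated 3 times;
# the minimum is usually the least noisy figure.
loop_time = min(timeit.repeat(with_for_loop, number=5, repeat=3))
reduce_time = min(timeit.repeat(with_reduce, number=5, repeat=3))
print(f"for loop: {loop_time:.4f} s, reduce: {reduce_time:.4f} s")
```

No expected timings are shown because they depend entirely on the system running the code.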

Conclusion

Thanks for reading! I tried my best to explain how different approaches can perform better, but the right choice will always depend on you: what’s more convenient for you? If you have any feedback, please share it in the comment section; I will be happy to hear it.

If you like this article and wish to connect with me, follow me on LinkedIn.

Be curious, keep learning and stay willing to learn new things until we meet next time!

Anup Das
