The Utter Failure of Async in Python
I’m probably going to have to eat this blog post 2 years from now…. oh well. I still believe that Async has been mostly a failure since it was introduced in Python 3.4. Maybe I should be more specific: there seems to be a failure to adopt Async in the Python community and in major packages at large. Sure, there are glimmers of hope like aiohttp, but for the most part all the Async work and adoption seems to be on the fringes of the Python world. Why is this?
Why has the adoption of Async in the Python community at large failed?
I believe there are a few pretty straightforward reasons for this.
- It isn’t easy to use, or even to understand how to use. It just isn’t Pythonic.
- Its implementation changed drastically across minor Python versions, early in Async’s journey.
- A combination of multithreading and multiprocessing can give the same speed improvements, in a more straightforward way.
Let’s cover each of these.
It isn’t easy to use, or even to understand how to use. It just isn’t Pythonic.
Some of the examples and tutorials you run across might appear simple right away, trying to lure you in. I can almost guarantee that writing a fully async program in the real world won’t be as straightforward and easy the first 100 times as you think it will be. Part of the reason Python has become so popular is that it is so approachable for the average person. It lowers the bar to entry, for better or worse.
Async raises that bar again. Here is an extremely simple example.
import asyncio
from aiohttp import ClientSession
from datetime import datetime

async def fetch(url, session):
    # read the body of a single response
    async with session.get(url) as response:
        return await response.read()

async def run():
    urls = ['http://www.cnn.com',
            'http://www.foxnews.com',
            'http://www.nbcnews.com',
            'http://www.abcnews.com',
            'http://www.usatoday.com/news',
            'https://www.bbc.com/news/world']
    tasks = []
    # share one session across all of the requests
    async with ClientSession() as session:
        for url in urls:
            task = asyncio.ensure_future(fetch(url, session))
            tasks.append(task)
        # run every request concurrently and collect the results
        responses = await asyncio.gather(*tasks)
        return responses

t1 = datetime.now()
loop = asyncio.get_event_loop()
future = asyncio.ensure_future(run())
responses = loop.run_until_complete(future)
for response in responses:
    print(response)
t2 = datetime.now()
timer = t2 - t1
print(f'Process took {timer} seconds')
At first glance this might not look too bad. But look closer and find all the async, await, ensure_future, gather, and so on sprinkled throughout the code. It just isn’t that obvious. All of a sudden, the simple task of pulling content from a handful of websites requires a whole new vocabulary. Granted, it is pretty fast. Process took 0:00:00.896876 seconds.
Writing good async code requires a different thought process and approach from the normal Python routine. You have to identify the blocking parts of your codebase and figure out whether the libraries you are using are async compatible or have async versions. You have to think about futures, awaiting, and event loops. It’s just a different way of thinking, and once you commit to it, your codebase is going to look substantially different. And most likely troubleshooting is going to take on a new meaning.
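To make that concrete, here is a minimal sketch (the worker names are mine, not part of the example above) of the classic trap: a single ordinary blocking call inside a coroutine stalls the whole event loop, while the async-aware equivalent lets everything wait concurrently.

import asyncio
import time

async def bad_worker(n):
    # time.sleep() is a normal blocking call; while it runs, the event
    # loop is frozen and no other coroutine makes progress
    time.sleep(1)
    return n

async def good_worker(n):
    # asyncio.sleep() hands control back to the event loop,
    # so all three workers wait at the same time
    await asyncio.sleep(1)
    return n

async def main():
    t1 = time.perf_counter()
    await asyncio.gather(*(bad_worker(n) for n in range(3)))
    print(f'blocking: {time.perf_counter() - t1:.1f}s')      # roughly 3s

    t1 = time.perf_counter()
    await asyncio.gather(*(good_worker(n) for n in range(3)))
    print(f'non-blocking: {time.perf_counter() - t1:.1f}s')  # roughly 1s

asyncio.run(main())  # Python 3.7+

Both versions look almost identical on the page, which is exactly the problem: nothing about the blocking one fails loudly, it just quietly runs three times slower.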
Its implementation changed drastically across minor Python versions, early in Async’s journey.
When I first started to dabble with and write async Python code, reading the documentation and tutorials made it clear very quickly that it mattered which version of Python I was on. The form and function of writing Async code changed a lot between 3.4 and 3.6+. Enough that code written for one version just wouldn’t work on another.
It was extremely confusing and frustrating at that point. When you have a new paradigm that is as hard to embrace and understand as Async, the last thing you want is a bait and switch. The numerous different ways to express and accomplish the same tasks didn’t help either.
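To show how much the surface changed, here is a rough before-and-after with a trivial coroutine. The 3.4 form won’t even run on modern Python (the decorator was deprecated in 3.8 and removed in 3.11), which is rather the point.

import asyncio

# Python 3.4 style: a coroutine was a generator wearing a decorator
# (deprecated in 3.8, removed entirely in 3.11)
@asyncio.coroutine
def fetch_old():
    yield from asyncio.sleep(1)
    return 'done'

# Python 3.5+ style: dedicated async/await syntax
async def fetch_new():
    await asyncio.sleep(1)
    return 'done'

# Python 3.4-3.6 style: manage the event loop by hand,
# just like the example earlier in this post
loop = asyncio.get_event_loop()
print(loop.run_until_complete(fetch_new()))

# Python 3.7+ style: one call replaces the whole loop dance
print(asyncio.run(fetch_new()))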
A combination of multithreading and multiprocessing can give the same speed improvements, in a more straightforward way.
Let’s rewrite the above example with multithreading.
from concurrent.futures import ThreadPoolExecutor
import requests
from datetime import datetime

urls = ['http://www.cnn.com',
        'http://www.foxnews.com',
        'http://www.nbcnews.com',
        'http://www.abcnews.com',
        'http://www.usatoday.com/news',
        'https://www.bbc.com/news/world']

def get_url(url: str):
    # a plain blocking request; the thread pool supplies the concurrency
    response = requests.get(url)
    return response.content

t1 = datetime.now()
# one worker thread per URL
with ThreadPoolExecutor(max_workers=6) as thready:
    results = thready.map(get_url, urls)
    for result in results:
        print(result)
t2 = datetime.now()
timer = t2 - t1
print(f'Process took {timer} seconds')
Not bad: straightforward code, and by George it’s a little faster. Process took 0:00:00.793653 seconds. This is most likely the problem with Async being adopted into the mainstream Python community as a whole. It’s a hard sell to move from code that is easy to write, maintain, and test to async code that is much more complex and harder to troubleshoot.
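And when the heavy lifting is CPU-bound rather than I/O-bound, where neither threads nor async help because of the GIL, the same pattern only needs ProcessPoolExecutor swapped in. A minimal sketch, with a made-up parse_page() standing in for whatever real work you’d do:

from concurrent.futures import ProcessPoolExecutor

def parse_page(content: bytes) -> int:
    # placeholder for genuinely CPU-bound work (parsing, number crunching)
    return len(content)

if __name__ == '__main__':
    # same map() interface as ThreadPoolExecutor, but each task runs
    # in its own process and sidesteps the GIL
    with ProcessPoolExecutor(max_workers=4) as pool:
        for size in pool.map(parse_page, [b'<html>...</html>'] * 4):
            print(size)

That interchangeability is the point: the executor API lets you trade threads for processes with one import, no rethinking required.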
Don’t get me wrong, I love async and write code that way when I get the chance. I think it’s a good mental exercise, and I wish certain popular packages would support async. Pushing yourself to understand a new way of writing code is always a good exercise. But I have my doubts about whether Async will ever be a “mainstream” way of writing Python code for most programmers, now or in the future.