Adding a concurrency limit to Python’s asyncio.as_completed

Series: asyncio basics, large numbers in parallel, parallel HTTP requests, adding to stdlib

In the previous post I demonstrated how the limited_as_completed function lets us run a very large number of tasks concurrently while capping how many run at any one time, so that we don't exhaust resources such as memory or operating system file handles.

I think this could be a useful addition to the Python standard library, so I have been working on a modification to the existing asyncio.as_completed function. My work so far is here: limited-as_completed.

To validate that the modified standard library version achieves the same goals as before, I ran tests similar to the ones from the last blog post.

I reused the timed script from the previous post unchanged, but updated the other files because I was using a much newer version of aiohttp alongside my custom-built Python.

server looked like:

#!/usr/bin/env python3

from aiohttp import web
import asyncio
import random

async def handle(request):
    await asyncio.sleep(random.randint(0, 3))
    return web.Response(text="Hello, World!")

app = web.Application()
app.router.add_get('/{name}', handle)

web.run_app(app)

client-async-sem needed a custom TCPConnector to get around a new limit on the number of concurrent connections that was added to aiohttp in version 2.0. I also needed to move the ClientSession usage inside a coroutine to avoid a warning:

#!/usr/bin/env python3

from aiohttp import ClientSession, TCPConnector
import asyncio
import sys

limit = 1000

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def bound_fetch(sem, url, session):
    # Getter function with semaphore.
    async with sem:
        await fetch(url, session)

async def run(r):
    async with ClientSession(connector=TCPConnector(limit=limit)) as session:
        url = "http://localhost:8080/{}"
        tasks = []
        # create instance of Semaphore
        sem = asyncio.Semaphore(limit)
        for i in range(r):
            # pass Semaphore and session to every GET request
            task = asyncio.ensure_future(
                bound_fetch(sem, url.format(i), session))
            tasks.append(task)
        responses = asyncio.gather(*tasks)
        await responses

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.ensure_future(run(int(sys.argv[1]))))

My new code, client-async-stdlib, which uses my proposed extension to as_completed, looked like:

#!/usr/bin/env python3

from aiohttp import ClientSession, TCPConnector
import asyncio
import sys

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

limit = 1000

async def print_when_done():
    async with ClientSession(connector=TCPConnector(limit=limit)) as session:
        tasks = (fetch(url.format(i), session) for i in range(r))
        # limit= is the proposed parameter; it needs the patched asyncio
        for res in asyncio.as_completed(tasks, limit=limit):
            await res

r = int(sys.argv[1])
url = "http://localhost:8080/{}"
loop = asyncio.get_event_loop()
loop.run_until_complete(print_when_done())
loop.close()

With these, we get similar behaviour to the previous post:

$ ./timed ./client-async-sem 10000
Memory usage: 73640KB	Time: 19.18 seconds
$ ./timed ./client-async-stdlib 10000
Memory usage: 49332KB	Time: 18.97 seconds

So the implementation I plan to submit to the Python standard library appears to work well: for 10,000 requests it took about the same time as the semaphore version (18.97 vs 19.18 seconds) while using about a third less memory (49,332KB vs 73,640KB). In fact, I think it is better than the one I presented in the previous post, because it uses completion callbacks to notice when futures have finished, which reduces the busy-looping we were doing to check for and yield finished tasks.
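
The completion-callback idea can be illustrated outside the standard library. What follows is a minimal sketch, not the code in the pull request: a hypothetical async generator named limited_as_completed that starts at most `limit` tasks, registers `add_done_callback` on each one so it is pushed onto a queue when it finishes, and starts the next awaitable only when a slot frees up. It assumes Python 3.7 or later and yields results rather than awaitables, so it is consumed with `async for` instead of the `for res in as_completed(...): await res` shape used above.

```python
import asyncio


async def limited_as_completed(aws, limit):
    """Yield results in completion order, running at most `limit` at once.

    Hypothetical helper, not the patch in the pull request. `aws` may be
    any iterable of coroutines or futures (for example a generator); it is
    pulled from lazily, so this helper never has more than `limit` tasks
    in flight.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    aws = iter(aws)
    finished = asyncio.Queue()  # futures, in the order they complete
    running = 0

    def start_next():
        # Start the next awaitable, if any. Its done-callback pushes it onto
        # `finished`, so we never poll futures in a busy loop.
        nonlocal running
        aw = next(aws, None)
        if aw is not None:
            fut = asyncio.ensure_future(aw)
            fut.add_done_callback(finished.put_nowait)
            running += 1

    for _ in range(limit):
        start_next()

    while running:
        fut = await finished.get()  # sleeps until some callback fires
        running -= 1
        start_next()  # refill the slot that just freed up
        yield fut.result()  # re-raises the task's exception, if any
```

On an unpatched Python, the loop in client-async-stdlib could then be written as `async for body in limited_as_completed((fetch(url.format(i), session) for i in range(r)), limit):`. Because the input is pulled with `next()` only when a task finishes, a generator of 10,000 coroutines never turns into 10,000 live tasks.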

The Python issue is bpo-30782 and the pull request is #2424.

Note: at first glance, it looks like aiohttp's own limit on the number of connections (the limit argument of the TCPConnector passed to ClientSession, introduced in version 1.0 and then updated in version 2.0) gives us what we want without any of this extra code. In fact it only limits the number of open connections, not the number of futures we create, so it has the same problem of unbounded memory use as the semaphore-based implementation.
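
The gap between limiting concurrent work and limiting task creation is easy to see without a server. This is a minimal sketch, assuming Python 3.7 or later, with hypothetical names (fake_fetch, semaphore_style, bounded_style); `asyncio.sleep` stands in for the HTTP request and `asyncio.all_tasks()` counts how many unfinished tasks exist. The bounded version uses `asyncio.wait(..., return_when=FIRST_COMPLETED)` rather than the patched as_completed, but it shares the property that matters: new tasks are only created as earlier ones finish.

```python
import asyncio


async def fake_fetch(i):
    # Hypothetical stand-in for an HTTP GET: no network, just a short sleep.
    await asyncio.sleep(0.001)
    return i


async def semaphore_style(n, limit):
    """Semaphore pattern: at most `limit` fetches run at once, but all n
    task objects are created up front. Returns the live-task count."""
    sem = asyncio.Semaphore(limit)

    async def bound_fetch(i):
        async with sem:
            return await fake_fetch(i)

    tasks = [asyncio.ensure_future(bound_fetch(i)) for i in range(n)]
    live = len(asyncio.all_tasks())  # n fetch tasks plus the main task
    await asyncio.gather(*tasks)
    return live


async def bounded_style(n, limit):
    """Bounded pattern: a new task is created only when one finishes.
    Returns the peak live-task count."""
    coros = (fake_fetch(i) for i in range(n))  # nothing created yet
    pending, peak = set(), 0
    while True:
        # Top up to `limit` running tasks from the lazy generator.
        while len(pending) < limit:
            coro = next(coros, None)
            if coro is None:
                break
            pending.add(asyncio.ensure_future(coro))
        peak = max(peak, len(asyncio.all_tasks()))
        if not pending:
            return peak
        _, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
```

Running `asyncio.run(semaphore_style(10_000, 100))` reports more than 10,000 live tasks, while `asyncio.run(bounded_style(10_000, 100))` never exceeds `limit + 1` (the extra one is the main task). A connector limit behaves like the semaphore here: it caps open sockets, not task objects.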

10 thoughts on “Adding a concurrency limit to Python’s asyncio.as_completed”

Alexander Mohr says:

July 28, 2017 at 10:12 pm

I created this to solve this problem more generally: https://gist.github.com/thehesiod/7081ab165b9a0d4de2e07d321cc2391d
Andy Balaam says:

July 29, 2017 at 8:48 am

Hi Alexander, that is cool. In what way is it more general?
Reto Kaiser says:

October 13, 2017 at 6:30 pm

Alexander’s script has the advantage that it allows you to add coroutines after the loop has been started. That can be useful if one async operation should schedule other operations recursively.
Here’s another solution:
https://gist.github.com/njam/e19a497185a9f657dc77429eed3aea07

btw I guess for most use cases setting aiohttp’s connection limit should be sufficient:
http://aiohttp.readthedocs.io/en/stable/client.html#limiting-connection-pool-size
Andy Balaam says:

October 22, 2017 at 12:28 pm

Thanks Reto, good points.
Andrey Paramonov says:

February 20, 2019 at 3:48 pm

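Thanks for the inspiration!
I’m currently using the following generalization of `asyncio.gather` for a similar problem:
```py
import asyncio
from itertools import islice


async def igather(tasks, limit=None):
    tasks = iter(tasks)  # islice must keep consuming the same iterator
    pending = set()
    while True:
        # Top up to `limit` running tasks (or start everything if no limit)
        for task in islice(tasks, limit - len(pending) if limit else None):
            pending.add(asyncio.ensure_future(task))
        if pending:
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                yield task
        else:
            break
```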
Andy Balaam says:

February 21, 2019 at 9:49 am

Nice, thanks Andrey!
Sergey says:

August 22, 2019 at 6:05 am

Maybe the “meant to be” way to limit number of workers is using queues:
https://docs.python.org/3/library/asyncio-queue.html#asyncio-queues
Andy Balaam says:

August 22, 2019 at 8:36 am

Thanks Sergey – yes, maybe we could write some kind of limited-size task pool that pulls from a queue.
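
A limited-size task pool that pulls from a queue, as suggested above, might look something like the following minimal sketch. It assumes Python 3.7 or later, and run_pool and its parameters are hypothetical names. A fixed number of worker tasks pull items from a bounded `asyncio.Queue`, so only `worker_count` worker tasks ever exist, however many items there are.

```python
import asyncio


async def run_pool(items, worker_count, handle):
    """Process `items` with `worker_count` workers; return results in
    completion order (exceptions are returned in place of results).

    The bounded queue applies back-pressure: the producer waits once the
    queue holds `worker_count` items, so a lazy `items` is never
    materialised all at once.
    """
    queue = asyncio.Queue(maxsize=worker_count)
    results = []

    async def worker():
        while True:
            item = await queue.get()
            try:
                results.append(await handle(item))
            except Exception as exc:  # keep the worker alive; record the error
                results.append(exc)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(worker_count)]
    for item in items:
        await queue.put(item)  # waits while the queue is full
    await queue.join()  # returns once every item has been processed
    for w in workers:
        w.cancel()  # workers are idle in queue.get(); stop them
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Errors are recorded rather than allowed to end a worker because, if every worker died, `queue.put()` and `queue.join()` could wait forever.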
Raaga J says:

June 2, 2020 at 12:25 pm

What is the difference between a semaphore limit and a TCP connector limit? Can you please explain in detail with an example? I am a beginner in asyncio. Thanks
