Should you use Django Asynchronous Support?
Like me, you might have considered using Django if you wanted a fully-featured framework for rapid web development.
Unfortunately, Django is not celebrated for performance, perhaps because it serves requests synchronously by default (Sync). This is the mode the official introduction walks through (https://docs.djangoproject.com/en/5.0/intro/tutorial01/).
Sync performs poorly because a server thread can only serve one request at a time: I/O requests block the thread, which leads to low throughput. Throughput can be increased by adding server threads, but thread count then becomes one more factor to tune, complicating server throughput optimization. If your throughput isn't high enough to be worth optimizing, consider a serverless architecture instead: https://nalkhish.medium.com/hosting-very-low-traffic-web-apps-for-cheap-on-aws-3-year-66f748cd04cf
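To put a rough number on blocking: with one synchronous worker, the full round-trip time of a request caps throughput. A back-of-the-envelope sketch (the service time here is an assumed figure, not a measurement from this experiment):

```python
# Rough throughput ceiling for ONE synchronous worker.
# Assumption: the worker stays blocked for a request's full round trip.
service_time = 0.01    # seconds of actual work per request (assumed)
added_latency = 0.2    # seconds of simulated network delay

max_rps = 1 / (service_time + added_latency)
print(round(max_rps, 1))  # roughly 4.8 requests/second
```

Under these assumptions, no amount of request queueing helps; the single thread simply cannot finish more than ~5 requests per second.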
Asynchronous design (Async) was added in Django 3.0 (https://docs.djangoproject.com/en/3.0/topics/async/), and is not mentioned in the introduction. It's an afterthought, appearing as the last topic in "Using Django" (https://docs.djangoproject.com/en/5.0/topics/).
Async can outperform Sync because a single Async server thread can serve multiple requests at a time. When a task awaits a coroutine, it yields control back to the event loop, which can resume other tasks. This voluntary yielding (1) gives asynchronous design the name cooperative multitasking and (2) makes it more efficient than the clock-driven context switching of preemptive multithreading.
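A minimal standalone illustration (plain asyncio, no Django): ten tasks each "wait" 0.1 seconds, but because each await yields to the event loop, the waits overlap instead of stacking up.

```python
import asyncio
import time

async def io_task(i: int) -> int:
    await asyncio.sleep(0.1)  # simulated I/O; the await yields to the event loop
    return i

async def main() -> float:
    start = time.perf_counter()
    await asyncio.gather(*(io_task(i) for i in range(10)))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
# Ten 0.1s waits overlap, so this takes ~0.1s rather than ~1s
print(f"{elapsed:.2f}s")
```

A synchronous version of the same ten waits would take about a full second; this is the whole advantage Async is banking on for I/O-heavy routes.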
This raised a few questions:
- How much more throughput can Django get by using an Asynchronous setup?
- Does it matter what the route/endpoint is doing (CPU-heavy vs IO-heavy)?
- Does it matter whether latency is low (the server is nearby) or high (the server is far away)?
Experiment
So I ran a load-test experiment. You can find the code and details here: https://github.com/nalkhish/Asynchronize-Django. Briefly, this is what I investigated:
For each setup (Sync, Async limited, Async unlimited):
    For each request route (high cpu, high io create, high io read):
        For each added latency (200ms, 100ms, 0ms):
            Emulate load and generate a report
Server Worker Thread Setup
- For Sync, Gunicorn was used to spin up 1 Sync worker (https://docs.gunicorn.org/en/stable/design.html).
- For Async Limited, Gunicorn was used to spin up 1 Uvicorn worker (https://www.uvicorn.org/deployment/)
- For Async Unlimited, Gunicorn was used to spin up 1 Uvicorn worker (https://www.uvicorn.org/deployment/), but Postgres was configured to allow 1000 simultaneous connections (up from the default of 100).
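For reference, the launch commands look roughly like this (the module path "mysite" is a placeholder, not the project's actual name):

```shell
# Sync: 1 Gunicorn worker serving the WSGI app
gunicorn mysite.wsgi:application --workers 1

# Async: 1 Uvicorn worker managed by Gunicorn, serving the ASGI app
gunicorn mysite.asgi:application --workers 1 -k uvicorn.workers.UvicornWorker
```

The only moving parts are the worker class (`-k`) and whether you point Gunicorn at the WSGI or ASGI entry point.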
Routes
3 routes were used:
- cpu: a CPU-heavy route.
from django.http import HttpRequest, JsonResponse

def cpu(request: HttpRequest):
    """High cpu route
    CPU: High
    I/O: low
    """
    ct = 0
    for i in range(10**7):
        ct += i * i
    return JsonResponse({"ct": ct})
- io_create: an I/O-heavy route that asks the database to create an object. Creating a single object is a common pattern for a backend POST request.
# In sync
def io_create(request: HttpRequest):
    """Create 1 post
    CPU: low
    I/O: High
    """
    new_post = Post.objects.create(
        title=request.POST["title"],
        content=request.POST["content"],
    )
    return JsonResponse(new_post.to_dict())
# In async
async def io_create(request: HttpRequest):
    """Create 1 post
    CPU: low
    I/O: High
    """
    new_post = await Post.objects.acreate(
        title=request.POST["title"],
        content=request.POST["content"],
    )
    return JsonResponse(new_post.to_dict())
- io_read: an I/O-heavy route that asks the database to read some rows. This is also common (fetching a page of results).
# In sync
def io_read(request: HttpRequest):
    """Get first 25 posts
    CPU: low
    I/O: High
    """
    posts = Post.objects.all()[:25]
    ct = Post.objects.count()
    return JsonResponse(
        {
            "ct": ct,
            "posts": [post.to_dict() for post in posts],
        }
    )
# In async
async def io_read(request: HttpRequest):
    """Get first 25 posts
    CPU: low
    I/O: High
    """
    posts = []
    async for p in Post.objects.all()[:25]:
        posts.append(p)
    ct = await Post.objects.acount()
    return JsonResponse(
        {
            "ct": ct,
            "posts": [post.to_dict() for post in posts],
        }
    )
Adding Latency
Latency was simulated by adding outbound traffic delay to the Sync/Async servers using the Linux tool tc (https://man7.org/linux/man-pages/man8/tc.8.html):
docker exec <container_name> tc qdisc add dev eth0 root netem delay 100ms
Where <container_name> is the name of the server's container. The delay shown (100ms) was set to 0ms, 100ms, or 200ms depending on the condition.
Load Emulation
Load was emulated using the open-source tool Locust (https://locust.io/). Locust bombards the server with requests: as soon as a simulated 'user' finishes its task, it starts over.
The total load lasted 3 minutes:
- Started with 0 users.
- For the first 100 seconds, users were added at 5 per second, reaching a cap of 500 concurrent users.
- For the last 80 seconds, users were held at 500.
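The ramp above can be written as a simple function of time (a sketch of the load shape, not Locust code):

```python
def users_at(t: float, spawn_rate: float = 5.0, cap: int = 500) -> int:
    """Concurrent users t seconds into the test: ramp at spawn_rate/s, then hold at cap."""
    return min(int(t * spawn_rate), cap)

print(users_at(50))   # 250, mid-ramp
print(users_at(100))  # 500, cap reached
print(users_at(180))  # 500, held until the end of the 3-minute test
```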
For the io_read route, 1 million posts were added to the database on server startup, before the test began.
Findings:
Definitions:
- Max sRPS: the maximum number of successful requests per second, obtained by taking the maximum difference between overall and failed requests per second across the report graphs (see supplementary figures in the repo).
- Total successes: the overall number of successful requests, obtained by subtracting the number of failed requests from the number of total requests (see supplementary figures in the repo).
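In other words (the per-second series here are hypothetical numbers, just to pin the definitions down):

```python
def max_srps(total_per_sec, failed_per_sec):
    """Max successful requests/second across the test."""
    return max(t - f for t, f in zip(total_per_sec, failed_per_sec))

def total_successes(total_requests, failed_requests):
    """Overall successful requests for the whole run."""
    return total_requests - failed_requests

# Hypothetical numbers, not from the experiment:
print(max_srps([100, 120, 90], [10, 50, 5]))  # 90
print(total_successes(310, 65))               # 245
```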
CPU-bound route
Table 1: CPU-bound. Max sRPS and total successes for combinations of setup (Sync, Async limited, Async unlimited) and latency (0ms, 100ms, 200ms). Async actually does poorly.
Async limited vs Async unlimited:
- Why did Async unlimited do better than Async limited? Async unlimited allows more database connections, but the cpu route and middleware setup do not use the database. This needs further investigation.
- In any case, both Async setups have unpredictable dynamics (see supplementary figures on repo https://github.com/nalkhish/Asynchronize-Django).
Sync vs Async.
- Sync had lower max sRPS than Async unlimited. This is probably because async servers handle multiple requests concurrently, so multiple requests happened to finish at the same time. It is surprising, though, because asyncio is not supposed to switch context except at an await statement, and the cpu route contains none. This needs further investigation.
- Sync had predictable dynamics and more total successes than Async. This is sufficient to warrant using Sync for CPU-bound services.
IO-bound routes:
Table 2: I/O-bound DB read. Max sRPS & Overall successes for combinations of Server (Async/Sync) and Latency (0ms, 100ms, 200ms). There is a higher throughput at high latencies when using Async instead of Sync.
Table 3: I/O-bound DB create. Max sRPS & Overall successes for combinations of Server (Async/Sync) and Latency (0ms, 100ms, 200ms). There is a higher throughput at high latencies when using Async.
Async limited vs Async unlimited: Async unlimited had higher max sRPS and overall successes than Async limited.
- For the IO-bound read route, this can likely be attributed to the database being a bottleneck, as it was failing.
- For the IO-bound create route, this needs further investigation, as the database was not failing for Async limited (see supplementary figure).
Sync vs Async:
- For both io_read and io_create routes, Sync had a much lower throughput than Async (for an added latency of 200ms the difference in overall throughput was 40-fold for io_read and 230-fold for io_create).
- This is likely because the server worker thread was waiting for database requests to finish before it could handle the next request. This theory is supported by the inverse relationship between latency and both max sRPS and total successes for Sync.
Conclusion:
The limitations are explored further in the repo README (https://github.com/nalkhish/Asynchronize-Django): the limited set of metrics discussed here, the absence of retry logic, and the absence of inbound latency. Still, this study sufficiently compares asynchronous and synchronous Django serving.
If you use Django and have CPU-bound loads, use Sync. If the loads are I/O-bound, use Async: it will likely more than 10x your server throughput, depending on how much CPU the server can access.
I did not investigate multiple synchronous threads, because that would mean optimizing thread count against additional memory costs and the latency experienced by users.
If you want to see more software engineering ideas, follow my medium!