It’s useful to reuse objects that are expensive to create.
Doing so can increase the efficiency of your programs & make it easier to reason about them.
A resource pool is a construct that allows multiple clients to share the same objects.
I recently needed to pool UdpSockets when following dnsguide.
The pool allowed me to reuse sockets that were already bound to a port & share them concurrently.
The Recipe
A resource pool needs three mechanisms:
- Concurrency control
- Object storage
- Object reuse
Concurrency Control
The pool needs to give exclusive access to the objects it manages; an object should have only one owner at a time.
Object Storage
The pool needs to store objects somewhere so that it knows how to retrieve them when needed.
Object Reuse
The pool needs to recycle objects that are no longer being used.
RAII guards are a great way to implement object reuse. The standard library's Mutex & MutexGuard exemplify this pattern.
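As a minimal illustration of the guard pattern, here is std's Mutex releasing its lock when the guard goes out of scope:

use std::sync::Mutex;

fn main() {
    let counter = Mutex::new(0);
    {
        // lock() returns a MutexGuard; while it lives, access is exclusive
        let mut guard = counter.lock().unwrap();
        *guard += 1;
    } // the guard is dropped here & the lock is released automatically
    assert_eq!(*counter.lock().unwrap(), 1);
}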
Alternatively, a library designer may allow clients to manually add & remove entries from the pool. This is error-prone: a client that forgets to return an object shrinks the pool permanently.
Implementation
This section will describe the implementation of tub, an async pool crate.
Pool is a construct that gives clients exclusive access to the objects it manages.
Clients call Pool::acquire to retrieve a future that resolves to a Guard once an object is available.
Guard is a construct that provides object access and automatic recycling. It uses RAII to ensure that the object is returned to the pool when the guard is dropped.
The Pool is built on top of tokio::sync::Notify and crossbeam_queue::ArrayQueue. These dependencies provide concurrency control and object storage.
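To see what each dependency provides in isolation, here is a small sketch:

use crossbeam_queue::ArrayQueue;
use tokio::sync::Notify;

#[tokio::main]
async fn main() {
    // ArrayQueue: a fixed-capacity, lock-free queue that is safe to
    // share between threads
    let queue = ArrayQueue::new(2);
    queue.push(1).unwrap();
    assert_eq!(queue.pop(), Some(1));

    // Notify: an async signal; notify_one stores a permit if no task is
    // waiting, so the next notified().await resolves immediately
    let notify = Notify::new();
    notify.notify_one();
    notify.notified().await;
}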
Structs
use std::ops::{Deref, DerefMut};
use std::sync::Arc;

use crossbeam_queue::ArrayQueue;
use tokio::sync::Notify;

pub struct Pool<T> {
inner: Arc<PoolInner<T>>,
}
struct PoolInner<T> {
queue: ArrayQueue<T>,
notify: Notify,
}
pub struct Guard<T> {
value: Option<T>,
inner: Arc<PoolInner<T>>,
}
PoolInner is used to hold the queue and notify. We wrap PoolInner in an Arc so that we can clone it and share it between the Pool and Guard.
Impl
You can refer to tub/src/lib.rs for the full implementation.
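Before looking at acquire, note that the pool needs a constructor that pre-fills the queue. The post doesn't show one, so here is a minimal sketch (the name from_vec is my assumption; see tub/src/lib.rs for the real API):

impl<T> Pool<T> {
    // Hypothetical constructor: pre-fills the queue with the pooled objects.
    // Assumes a non-empty Vec, since ArrayQueue panics on zero capacity.
    pub fn from_vec(values: Vec<T>) -> Self {
        let queue = ArrayQueue::new(values.len());
        for value in values {
            // The queue was sized to hold every value, so push cannot fail
            let _ = queue.push(value);
        }
        Pool {
            inner: Arc::new(PoolInner {
                queue,
                notify: Notify::new(),
            }),
        }
    }
}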
Pool
impl<T> Pool<T> {
#[inline]
pub async fn acquire(&self) -> Guard<T> {
        let inner = self.inner.clone();
        loop {
            // Fast path: try to take an object from the lock-free queue
            if let Some(value) = inner.queue.pop() {
                return Guard {
                    value: Some(value),
                    inner,
                };
            }
            // Slow path: sleep until a dropped Guard returns an object
            inner.notify.notified().await;
}
}
}
This block of code returns a future that resolves into a Guard when a shared object is available.
It first checks if there are any available objects. If there are, it returns a Guard with the object. Otherwise, it waits for a notification that an object is available.
ArrayQueue::pop is used to retrieve an object from the queue. This queue is safe in the presence of concurrent access & does not block the current thread.
Notify::notified is awaited to put the task to sleep until an object becomes available.
The method loops until an object is acquired (much like the compare-and-swap loops that appear frequently in concurrent data structures).
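Putting it together, a client might use the pool like this (assuming the hypothetical from_vec constructor sketched earlier):

let pool = Pool::from_vec(vec![1u32, 2, 3]);
let mut guard = pool.acquire().await;
*guard += 1; // access the pooled object through the guard
drop(guard); // the object is returned to the pool & a waiter is notified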
Guard
impl<T> Drop for Guard<T> {
#[inline]
fn drop(&mut self) {
if let Some(value) = self.value.take() {
// Safety: The queue will never be full when a Guard is alive
let _ = self.inner.queue.push(value);
self.inner.notify.notify_one();
}
}
}
impl<T> Deref for Guard<T> {
type Target = T;
fn deref(&self) -> &Self::Target {
// Safety: The value is always Some
self.value.as_ref().unwrap()
}
}
impl<T> DerefMut for Guard<T> {
fn deref_mut(&mut self) -> &mut Self::Target {
// Safety: The value is always Some
self.value.as_mut().unwrap()
}
}
The Guard allows clients to access the object and automatically return it to the pool when it is dropped. The value is wrapped in an Option so that it can be taken out of the Guard when it is dropped.
Object access is achieved by implementing Deref & DerefMut for Guard. The implementation simply calls as_ref & as_mut on the Option and unwraps the result. This is safe because the Option is always Some when the Guard is alive.
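Thanks to deref coercion, a Guard can be passed anywhere a plain reference to the inner value is expected:

fn print_value(value: &u32) {
    println!("{value}");
}

async fn demo(pool: &Pool<u32>) {
    let guard = pool.acquire().await;
    // &Guard<u32> coerces to &u32 via the Deref impl
    print_value(&guard);
}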
Object reclamation is achieved by implementing Drop for Guard. When the Guard is dropped, it will return the object to the pool by calling ArrayQueue::push and Notify::notify_one.
It is safe to call ArrayQueue::push because the queue will never be full when a Guard is alive.
Notify::notify_one is used instead of Notify::notify_waiters because only one client will be able to “win” the race for the recycled object.
Using Notify::notify_waiters would wake every waiting client even though only one of them can actually acquire the object. This causes contention, which wastes CPU cycles and increases latency.
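To see the hand-off in action, here is a small sketch of two tasks contending over a single pooled object (again using the hypothetical from_vec constructor):

#[tokio::main]
async fn main() {
    let pool = std::sync::Arc::new(Pool::from_vec(vec![0u32]));
    let mut handles = Vec::new();
    for _ in 0..2 {
        let pool = pool.clone();
        handles.push(tokio::spawn(async move {
            // Only one task owns the object at a time; the other sleeps
            // on Notify until this guard is dropped
            let mut guard = pool.acquire().await;
            *guard += 1;
        }));
    }
    for handle in handles {
        handle.await.unwrap();
    }
    assert_eq!(*pool.acquire().await, 2);
}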
Benchmarking
It’s time for a showdown!
We will put a few random async pool crates from crates.io to the test.
The crates
The following crates are analyzed:
- tub
- simple-pool
- async-object-pool
- async-resource
The following crates were omitted due to issues when benchmarking:
- deadpool - appears to deadlock using the unmanaged pool. I may have misused the crate.
- qp - appears to deadlock
- lease - heap corruption detected
- async-resource - ran out of resources in comparison benchmark
Methodology
The code used for benchmarking is publicly available.
By following Criterion’s Async Benchmarking Guide, I was able to create a benchmark powered by the Tokio runtime.
The tests exercise resource acquisition and release by spawning a number of tasks that acquire a resource from the pool and then release it.
All benchmarks were written in a similar way, namely:
- Create a pool with 1 object
- Spawn iters tasks that each acquire a resource from the pool
- Wait for all tasks to complete
The following code snippet shows the benchmarking method for tub:
use std::sync::Arc;

use futures::future::join_all;

async fn run_tub(pool: tub::Pool<u32>, iters: usize) {
    let pool = Arc::new(pool);
join_all(
(0..iters)
.map(|_| {
let pool = pool.clone();
tokio::spawn(async move {
// Acquire a resource from the pool
let guard = pool.acquire().await;
// Return the resource to the pool
drop(guard)
})
})
.collect::<Vec<_>>(),
)
.await;
}
This method is run multiple times during the benchmarks. For more information on the benchmarking suite, see Criterion.
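For reference, here is a sketch of how such a method could be wired into Criterion (this assumes criterion's async_tokio feature and the hypothetical from_vec constructor; the real harness lives in the benchmark repository):

use criterion::{criterion_group, criterion_main, Criterion};
use std::time::Instant;
use tokio::runtime::Runtime;

fn bench_tub(c: &mut Criterion) {
    let rt = Runtime::new().unwrap();
    c.bench_function("tub", |b| {
        // iter_custom hands us the iteration count, matching run_tub's signature
        b.to_async(&rt).iter_custom(|iters| async move {
            let pool = tub::Pool::from_vec(vec![0u32]); // hypothetical constructor
            let start = Instant::now();
            run_tub(pool, iters as usize).await;
            start.elapsed()
        });
    });
}

criterion_group!(benches, bench_tub);
criterion_main!(benches);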
Results
The benchmarks help us understand how efficient the underlying mechanisms for concurrency control, object storage, and object reuse are.
Here is a benchmark between tub, simple-pool, and async-object-pool.
It seems that async-object-pool is orders of magnitude slower, so we take a closer look at tub and simple-pool.
100,000 acquires & releases
In the following sections I show benchmarks for running 100,000 acquire & release operations across tasks.
tub
Benchmarks for tub:
async-object-pool
Benchmarks for async-object-pool:
simple-pool
Benchmarks for simple-pool:
async-resource
Benchmarks for async-resource:
Conclusion
In this post we have seen how to create an async pool by building the necessary mechanisms for concurrency control, object storage, and object reuse.
The benchmarks show that tub performs well relative to similar crates.
And just like that, another no-name async-pool crate is born!
Addendum
It's possible that I've used the APIs for these crates incorrectly, causing some performance degradation (I suspect this is the case for async-object-pool).
If I did, feel free to raise a pull request addressing the benchmarks on https://github.com/wcygan/tub/.