By Vincent Driessen
on Wednesday, March 28, 2012

Today, I’m open sourcing a project that I’ve been working on for the last few months. It is a Python library for putting work in the background, typically in a web context. It is designed to be simple to set up and use, and to be of help in almost any modern Python web stack.

Existing solutions

Of course, there already exist a few solutions to this problem. Celery (by the excellent @asksol) is by far the most popular Python framework for working with asynchronous tasks. It is agnostic about the underlying queueing implementation, which is quite powerful, but also poses a learning curve and requires a fair amount of setup.

Don’t get me wrong, I think Celery is a great library. In fact, I’ve contributed to Celery myself in the past. My experience, however, is that as your Python web project grows, there comes a moment where you want to start offloading small pieces of code into the background. Setting up Celery for these cases is a substantial effort that can’t be done quickly and might hold you back.

I wanted something simpler. Something that you’d use in all of your Python web projects, not only the big and serious ones.

Redis as a broker

In many modern web stacks, chances are that you’re already using Redis (by @antirez). Besides being a kick-ass key-value store, Redis also provides the semantics needed to build a perfect queue implementation. The commands RPUSH, LPOP and BLPOP are all it takes.
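
To illustrate why those list commands are enough, here is a minimal sketch of a FIFO queue built on list semantics. FakeRedis is a hypothetical in-memory stand-in so that the example runs without a server; it only mimics the rpush and lpop calls, which redis-py's redis.Redis client exposes under the same names. The key name demo:queue and the push_job and pop_job helpers are made up for illustration and are not part of any library.

```python
from collections import defaultdict, deque


class FakeRedis:
    """Hypothetical in-memory stand-in that mimics two Redis list commands."""

    def __init__(self):
        self._lists = defaultdict(deque)

    def rpush(self, key, *values):
        # RPUSH appends values to the tail of the list and returns its new length.
        self._lists[key].extend(values)
        return len(self._lists[key])

    def lpop(self, key):
        # LPOP removes and returns the head of the list, or None if it is empty.
        items = self._lists[key]
        return items.popleft() if items else None


def push_job(conn, key, payload):
    """Producer side: append work to the tail of the list."""
    return conn.rpush(key, payload)


def pop_job(conn, key):
    """Consumer side: take work from the head, so the oldest item comes out first."""
    return conn.lpop(key)


conn = FakeRedis()
for payload in ("first", "second", "third"):
    push_job(conn, "demo:queue", payload)

drained = [pop_job(conn, "demo:queue") for _ in range(3)]
print(drained)
```

A long-running worker would typically use BLPOP rather than LPOP, so that it blocks until work arrives instead of spinning on an empty list.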

Inspired by Resque (by defunkt) and the simplicity of this Flask snippet (by @mitsuhiko), I’ve challenged myself to imagine just how hard a job queue library really should be.

Introducing RQ

I wanted a solution that was lightweight, easy to adopt, and easy to grasp. So I devised a simple queueing library for Python, and dubbed it RQ. In a nutshell, you define a job like you would any normal Python function.

def myfunc(x, y):
    return x * y

Now, with RQ, it is ridiculously easy to put it in the background like this:

from rq import use_connection, Queue

# Connect to Redis
use_connection()

# Offload the "myfunc" invocation
q = Queue()
q.enqueue(myfunc, 318, 62)

This puts the equivalent of myfunc(318, 62) on the default queue. Now, in another shell, run a separate worker process to perform the actual work:

$ rqworker
12:46:56:
12:46:56: *** Listening on default...
12:47:35: default: mymodule.myfunc(318, 62) (38d9c157-e997-40e2-8d20-574a97ec5a99)
12:47:35: Job OK, result = 19716
12:47:35:
12:47:35: *** Listening on default...
...
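
To make that round trip more concrete, here is a rough sketch of what any job queue has to do between the enqueue call and the worker: describe the call in a form that can travel between processes, then look the function up again on the other side. This is an illustration of the general idea, not a description of RQ's internals. The names serialize_job and perform_job are hypothetical, and operator.mul stands in for myfunc so that the example is importable as-is.

```python
import importlib
import operator
import pickle


def serialize_job(func, *args, **kwargs):
    """Describe a call by module and function name so another process can find it."""
    return pickle.dumps((func.__module__, func.__name__, args, kwargs))


def perform_job(payload):
    """Worker side: import the module, look up the function, and call it."""
    module_name, func_name, args, kwargs = pickle.loads(payload)
    module = importlib.import_module(module_name)
    func = getattr(module, func_name)
    return func(*args, **kwargs)


# operator.mul stands in for myfunc here, since it also computes x * y.
payload = serialize_job(operator.mul, 318, 62)
result = perform_job(payload)
print(result)
```

The sketch also shows why the worker log refers to mymodule.myfunc: under this approach the worker process has to be able to import the function by name.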

To poll for the asynchronous result in the web backend, you can use:

>>> import time
>>> r = q.enqueue(myfunc, 318, 62)
>>> r.return_value
None
>>> time.sleep(2)
>>> r.return_value
19716

I must admit, though, that polling for job results through return_value isn’t very useful and probably won’t be a pattern you’d use in your day-to-day work. (I would certainly recommend against doing that, at least.)

There’s extensive documentation available at: http://nvie.github.com/rq.

Near-zero configuration

RQ was designed so that you can start using it in your Python web projects right away. You only need to pass it a Redis connection to use, because I didn’t want it to create new connections implicitly.

To use the default Redis connection (to localhost:6379), you only have to do this:

from rq import use_connection
use_connection()

You can reuse an existing Redis connection that you are already using and pass it into RQ’s use_connection function:

import redis
from rq import use_connection

my_connection = redis.Redis(host='example.com', port=6379)
use_connection(my_connection)

There are more advanced ways of managing connections, however, so pick whichever you prefer. You can safely mix your own Redis data with RQ’s, as RQ prefixes all of its keys with rq:.

Building your own queueing system

RQ offers functionality to put work on queues. It provides FIFO-semantics per queue, but how many queues you create is up to you. For the simplest cases, simply using the default queue suffices already.

>>> q = Queue()
>>> q.name
'default'

But you can name your queues however you want:

>>> lo = Queue('low')
>>> hi = Queue('high')
>>> lo.enqueue(myfunc, 2, 3)
>>> lo.enqueue(myfunc, 4, 5)
>>> hi.enqueue(myfunc, 6, 7)
>>> lo.count
2
>>> hi.count
1

Both queues are equally important to RQ; neither has a higher priority as far as RQ is concerned. But when you start a worker, you define queue priority by the order of the arguments:

$ rqworker high low
12:47:35:
12:47:35: *** Listening on high, low...
12:47:35: high: mymodule.myfunc(6, 7) (cc183988-a507-4623-b31a-f0338031b613)
12:47:35: Job OK, result = 42
12:47:35:
12:47:35: *** Listening on high, low...
12:47:35: low: mymodule.myfunc(2, 3) (95fe658e-b23d-4aff-9307-a55a0ee55650)
12:47:36: Job OK, result = 6
12:47:36:
12:47:36: *** Listening on high, low...
12:47:36: low: mymodule.myfunc(4, 5) (bfb89229-3ce4-463c-abf8-f19c2808cb7c)
12:47:36: Job OK, result = 20
...

First, all work on the high queue is done (with FIFO semantics), then low is emptied. If work is enqueued on high in the meantime, it takes precedence over the low queue again once the currently running job has finished.
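
The selection rule described here can be sketched in a few lines. This is an in-memory illustration only, with plain deques standing in for Redis lists, and next_job is a hypothetical helper rather than an RQ function. On the Redis side, BLPOP accepts several keys and checks them in the order given, which yields the same behavior.

```python
from collections import deque


def next_job(queues, priority):
    """Return (queue_name, job) from the first non-empty queue in `priority`, or None."""
    for name in priority:
        if queues[name]:
            return name, queues[name].popleft()
    return None


# Same jobs as in the example: two on low, one on high.
queues = {
    "low": deque([(2, 3), (4, 5)]),
    "high": deque([(6, 7)]),
}

processed = []
while True:
    item = next_job(queues, ["high", "low"])
    if item is None:
        break
    processed.append(item)

print(processed)
```

Because the priority list is scanned from the start before every job, anything that lands on high while a low job is running is picked up next.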

No rocket science here, just what you’d expect.

Insight over performance

One of the things I missed most in other queueing systems is to have a decent view of what’s going on in the system. For example:

  • What queues exist?
  • How many messages are on each queue?
  • What workers are listening on what queues?
  • Who’s idle or busy?
  • What actual messages are on the queue?

RQ provides an answer to all of these questions (except for the last one, currently), via the rqinfo tool.

$ rqinfo
high       |██████████████████████████ 20
low        |██████████████ 12
default    |█████████ 8
3 queues, 40 jobs total

Bricktop.19233 idle: low
Bricktop.19232 idle: high, default, low
Bricktop.18349 idle: default
3 workers, 3 queues

Showing only a subset of queues (including empty ones):

$ rqinfo high archive
high       |██████████████████████████ 20
archive    | 0
2 queues, 20 jobs total

Bricktop.19232 idle: high
1 workers, 2 queues

If you want to parse the output of this script, you can specify the --raw flag to disable the fancy drawing. Example:

$ rqinfo --raw
queue high 20
queue low 12
queue default 8
worker Bricktop.19233 idle low
worker Bricktop.19232 idle high,default,low
worker Bricktop.18349 idle default
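
As a rough sketch of what parsing that output could look like, the function below splits the lines into queue counts and worker records. The field layout is inferred only from the sample above, so treat it as an assumption; parse_rqinfo_raw is a hypothetical helper, not something that ships with RQ.

```python
def parse_rqinfo_raw(text):
    """Split `rqinfo --raw` style output into queue counts and worker records."""
    queues = {}
    workers = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "queue" and len(fields) == 3:
            # Assumed layout: queue <name> <job count>
            queues[fields[1]] = int(fields[2])
        elif fields[0] == "worker" and len(fields) == 4:
            # Assumed layout: worker <name> <state> <comma-separated queues>
            workers.append({
                "name": fields[1],
                "state": fields[2],
                "queues": fields[3].split(","),
            })
    return queues, workers


sample = """\
queue high 20
queue low 12
queue default 8
worker Bricktop.19233 idle low
worker Bricktop.19232 idle high,default,low
worker Bricktop.18349 idle default
"""

queues, workers = parse_rqinfo_raw(sample)
print(queues)
print(sum(queues.values()))
```

In practice you would feed it the captured output of the command, for example via the subprocess module.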

You can also group the same data by queue:

$ rqinfo --by-queue
high       |██████████████████████████ 20
low        |██████████████ 12
default    |█████████ 8
3 queues, 40 jobs total

high:    Bricktop.19232 (idle)
low:     Bricktop.19233 (idle), Bricktop.19232 (idle)
default: Bricktop.18349 (idle), Bricktop.19232 (idle)
3 workers, 3 queues

By default, these monitoring commands autorefresh every 2.5 seconds, but you can change the refresh interval if you want to. See the monitoring docs for more info.

Limitations

RQ does not try to solve all of your queueing needs. Its codebase is relatively small and certainly not overly complex, and I think it will be helpful for the most basic queueing needs that you’ll encounter during Python web development.

Of course, with all this also come some limitations:

  • It’s Python-only
  • It’s Redis-only
  • The workers are Unix-only

Please, give feedback

I’ve been using RQ in two and a half web projects over the last few months, and I am now satisfied enough with it to open the curtains to the world. So you’re invited to play with it. I’m very curious to hear your thoughts.

If you’d like to contribute, please go fork me on GitHub.

If you want to get in touch, I'm @nvie on Twitter.