Published: March 28, 2012
Today, I’m open-sourcing a project that I’ve been working on for the last few months. It is a Python library for putting work in the background, of the kind you’d typically use in a web context. It is designed to be simple to set up and use, and to be of help in almost any modern Python web stack.
Of course, there already exist a few solutions to this problem. Celery (by the excellent @asksol) is by far the most popular Python framework for working with asynchronous tasks. It is agnostic about the underlying queueing implementation, which is quite powerful, but also poses a learning curve and requires a fair amount of setup.
Don’t get me wrong—I think Celery is a great library. In fact, I’ve contributed to Celery myself in the past. My experiences are, however, that as your Python web project grows, there comes this moment where you want to start offloading small pieces of code into the background. Setting up Celery for these cases is a substantial effort that isn’t done swiftly and might be holding you back.
I wanted something simpler. Something that you’d use in all of your Python web projects, not only the big and serious ones.
Redis as a broker
In many modern web stacks, chances are that you’re already using Redis (by @antirez). Besides being a kick-ass key-value store, Redis also provides the semantics to build a perfect queue implementation: the RPUSH and BLPOP commands are all it takes.
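The pattern is simple enough to sketch without Redis at all. Below, a `collections.deque` stands in for a Redis list: `append` plays the role of RPUSH and `popleft` the role of a (non-blocking) BLPOP. This is only an illustration of the FIFO semantics, not how RQ actually talks to Redis:

```python
from collections import deque

# A deque standing in for a Redis list: RPUSH appends on the right,
# BLPOP pops from the left, which gives FIFO semantics.
queue = deque()

def rpush(item):
    """Producer side: push a job onto the tail of the queue."""
    queue.append(item)

def lpop():
    """Consumer side: pop the oldest job (BLPOP, minus the blocking)."""
    return queue.popleft() if queue else None

rpush("job-1")
rpush("job-2")
assert lpop() == "job-1"  # the oldest job comes out first
assert lpop() == "job-2"
```

In real Redis, BLPOP additionally blocks the worker until a job arrives, which is what makes the worker loop cheap to run.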
I wanted a solution that was lightweight, easy to adopt, and easy to grasp. So I devised a simple queueing library for Python, and dubbed it RQ. In a nutshell, you define a job like you would any normal Python function.
```python
def myfunc(x, y):
    return x * y
```
Now, with RQ, it is ridiculously easy to put it in the background like this:
```python
from rq import use_connection, Queue

# Connect to Redis
use_connection()

# Offload the "myfunc" invocation
q = Queue()
q.enqueue(myfunc, 318, 62)
```
This puts the equivalent of `myfunc(318, 62)` on the `default` queue. Now, in another shell, run a separate worker process to perform the actual work:
```
$ rqworker
12:46:56:
12:46:56: *** Listening on default...
12:47:35: default: mymodule.myfunc(318, 62) (38d9c157-e997-40e2-8d20-574a97ec5a99)
12:47:35: Job OK, result = 19716
12:47:35:
12:47:35: *** Listening on default...
...
```
To poll for the asynchronous result in the web backend, you can use:
```python
>>> r = q.enqueue(myfunc, 318, 62)
>>> r.return_value
None
>>> time.sleep(2)
>>> r.return_value
19716
```
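Under the hood, polling is just "check, sleep, check again". A hypothetical helper (not part of RQ; the name and signature are made up for illustration) might look like this:

```python
import time

def wait_for_result(job, timeout=5.0, interval=0.1):
    """Poll a job-like object until its return value is set, or give up.

    `job` is anything with a `return_value` attribute that flips from
    None to the actual result once a worker has finished the job.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if job.return_value is not None:
            return job.return_value
        time.sleep(interval)
    raise TimeoutError("job did not finish within %.1f seconds" % timeout)
```

In a real web app you’d rarely block a request like this; you’d sooner push the result to the client, or simply not care about the return value at all.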
Although I must admit that polling for job results from the web backend isn’t particularly useful and probably won’t be a pattern that you’d use in your day-to-day work. (I would certainly recommend against doing that, at least.)
There’s extensive documentation available at: http://nvie.github.com/rq.
RQ was designed to be as easy as possible to start using immediately inside your Python web projects. You only need to pass it a Redis connection to use, because I didn’t want it to create new connections implicitly.
To use the default Redis connection (to `localhost:6379`), you only have to do:

```python
from rq import use_connection
use_connection()
```
You can reuse an existing Redis connection that you are already using by passing it into RQ’s `use_connection()` call:

```python
import redis
from rq import use_connection

my_connection = redis.Redis(host='example.com', port=6379)
use_connection(my_connection)
```
There are more advanced ways of managing connections available, too, so please pick your favorite.
You can safely mix your own Redis data with RQ’s, as RQ prefixes all of its keys with `rq:`.
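Namespacing is what makes sharing a Redis instance safe: conceptually, every RQ-owned key lives under one fixed prefix, so it can never collide with your application’s keys. A sketch of the idea (the helper name is made up; this is not RQ’s actual internals):

```python
RQ_PREFIX = 'rq:'

def rq_key(name):
    """Map a logical name onto the Redis key RQ would own for it.

    Everything under the 'rq:' namespace belongs to RQ; everything
    else in the same Redis database is untouched.
    """
    return RQ_PREFIX + name

assert rq_key('queue:default') == 'rq:queue:default'
```

As long as your own keys don’t start with `rq:`, the two datasets coexist without stepping on each other.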
Building your own queueing system
RQ offers functionality to put work on queues. It provides FIFO-semantics per queue, but how many queues you create is up to you.
For the simplest cases, simply using the `default` queue suffices:

```python
>>> q = Queue()
>>> q.name
'default'
```
But you can name your queues however you want:
```python
>>> lo = Queue('low')
>>> hi = Queue('high')
>>> lo.enqueue(myfunc, 2, 3)
>>> lo.enqueue(myfunc, 4, 5)
>>> hi.enqueue(myfunc, 6, 7)
>>> lo.count
2
>>> hi.count
1
```
Both queues are equal to RQ: neither has a higher priority as far as RQ is concerned. But when you start a worker, you define queue priority by the order of its arguments:
```
$ rqworker high low
12:47:35:
12:47:35: *** Listening on high, low...
12:47:35: high: mymodule.myfunc(6, 7) (cc183988-a507-4623-b31a-f0338031b613)
12:47:35: Job OK, result = 42
12:47:35:
12:47:35: *** Listening on high, low...
12:47:35: low: mymodule.myfunc(2, 3) (95fe658e-b23d-4aff-9307-a55a0ee55650)
12:47:36: Job OK, result = 6
12:47:36:
12:47:36: *** Listening on high, low...
12:47:36: low: mymodule.myfunc(4, 5) (bfb89229-3ce4-463c-abf8-f19c2808cb7c)
12:47:36: Job OK, result = 20
...
```
First, all work on the `high` queue is done (with FIFO semantics), then `low` is emptied. If, meanwhile, work is enqueued on `high`, that work takes precedence over the `low` queue again after the currently running job is finished. No rocket science here, just what you’d expect.
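The dequeue rule the worker follows can be sketched in a few lines: scan the queues in the order they were given and take the first job found, so a higher-priority queue is always drained before a lower one is even looked at. This is a simplified illustration, not RQ’s actual implementation:

```python
from collections import deque

def dequeue_any(queues):
    """Pop the next job, honoring queue order as priority.

    `queues` is an ordered mapping of queue name -> deque of jobs.
    Returns a (queue_name, job) pair, or None if everything is empty.
    """
    for name, q in queues.items():
        if q:  # first non-empty queue in priority order wins
            return name, q.popleft()
    return None

queues = {
    'high': deque(['h1']),
    'low': deque(['l1', 'l2']),
}
assert dequeue_any(queues) == ('high', 'h1')  # 'high' is drained first
assert dequeue_any(queues) == ('low', 'l1')   # then 'low', in FIFO order
```

A real worker would run this in a loop, blocking on Redis when all queues are empty instead of returning None.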
Insight over performance
One of the things I missed most in other queueing systems is to have a decent view of what’s going on in the system. For example:
- What queues exist?
- How many messages are on each queue?
- What workers are listening on what queues?
- Who’s idle or busy?
- What actual messages are on the queue?
RQ provides an answer to all of these questions (except for the last one, currently), via the `rqinfo` utility:

```
$ rqinfo
high       |██████████████████████████ 20
low        |██████████████ 12
default    |█████████ 8
3 queues, 45 jobs total

Bricktop.19233 idle: low
Bricktop.19232 idle: high, default, low
Bricktop.18349 idle: default
3 workers, 3 queues
```
Showing only a subset of queues (including empty ones):
```
$ rqinfo high archive
high       |██████████████████████████ 20
archive    | 0
2 queues, 20 jobs total

Bricktop.19232 idle: high
1 workers, 2 queues
```
If you want to parse the output of this script, you can specify the `--raw` flag to disable the fancy drawing. Example:
```
$ rqinfo --raw
queue high 20
queue low 12
queue default 8
worker Bricktop.19233 idle low
worker Bricktop.19232 idle high,default,low
worker Bricktop.18349 idle default
```
You can also sort the same data by queue:
```
$ rqinfo --by-queue
high       |██████████████████████████ 20
low        |██████████████ 12
default    |█████████ 8
3 queues, 45 jobs total

high: Bricktop.19232 (idle)
low: Bricktop.19233 (idle), Bricktop.19232 (idle)
default: Bricktop.18349 (idle), Bricktop.19232 (idle)
3 workers, 4 queues
```
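The bar view itself holds no magic: it’s just queue counts scaled against the longest queue. Here is a rough sketch of how such a display can be rendered (illustrative only, not `rqinfo`’s actual code):

```python
def render_bars(counts, width=26):
    """Render queue counts as rqinfo-style horizontal bars.

    `counts` maps queue name -> number of jobs; the longest queue
    gets a full-width bar and the others are scaled relative to it.
    """
    biggest = max(counts.values()) or 1  # avoid dividing by zero
    lines = []
    for name, n in counts.items():
        bar = '█' * int(width * n / biggest)
        lines.append('%-8s|%s %d' % (name, bar, n))
    return '\n'.join(lines)

print(render_bars({'high': 20, 'low': 12, 'default': 8}))
```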
By default, these monitoring commands autorefresh every 2.5 seconds, but you can change the refresh interval if you want to. See the monitoring docs for more info.
RQ does not try to solve all of your queueing needs. But its codebase is relatively small and certainly not overly complex. Nonetheless, I think it will be helpful for all of the most basic queueing needs that you’ll encounter during Python web development.
Of course, with all this also come some limitations:
- It’s Python-only
- It’s Redis-only
- The workers are Unix-only
Please, give feedback
I’m using RQ for two and a half web projects I’ve worked on during the last few months, and I am currently at the point where I’m satisfied enough to open the curtains to the world. So you’re invited to play with it. I’m very curious to hear your thoughts about this.
If you’d like to contribute, please go fork me on GitHub.
If you want to get in touch, I'm @nvie on Twitter.