Today, I'm open sourcing a project that I've been working on for the last few months. It is a Python library for putting work in the background, of the kind you'd typically need in a web context. It is designed to be simple to set up and use, and to be of help in almost any modern Python web stack.
Existing solutions
Of course, a few solutions to this problem already exist. Celery (by the excellent @asksol) is by far the most popular Python framework for working with asynchronous tasks. It is agnostic about the underlying queueing implementation, which is quite powerful, but it also poses a learning curve and requires a fair amount of setup.
Don't get me wrong: I think Celery is a great library. In fact, I've contributed to Celery myself in the past. My experience, however, is that as your Python web project grows, there comes a moment when you want to start offloading small pieces of code into the background. Setting up Celery for these cases is a substantial effort that isn't done swiftly and might be holding you back.
I wanted something simpler. Something that you'd use in all of your Python web projects, not only the big and serious ones.
Redis as a broker
In many modern web stacks, chances are that you're already using Redis (by @antirez). Besides being a kick-ass key-value store, Redis also provides the semantics to build a perfect queue implementation: the commands RPUSH, LPOP and BLPOP are all it takes.
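To make that concrete, here is a minimal sketch of such a queue, using the redis-py client directly (the queue name is illustrative, and a Redis server on localhost is assumed):

    import redis

    r = redis.Redis()  # assumes a Redis server on localhost:6379

    # Producer: append a message to the right end of the list
    r.rpush('my-queue', 'hello')

    # Consumer: pop from the left end, blocking until a message arrives
    queue, message = r.blpop('my-queue')
    print(message)  # the message we pushed

RPUSH and BLPOP together give you FIFO behavior, and the blocking pop means workers don't have to busy-wait for new work.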
Inspired by Resque (by defunkt) and the simplicity of this Flask snippet (by @mitsuhiko), I challenged myself to find out just how simple a job queue library could really be.
Introducing RQ
I wanted a solution that was lightweight, easy to adopt, and easy to grasp. So I devised a simple queueing library for Python, and dubbed it RQ. In a nutshell, you define a job like you would any normal Python function.
    def myfunc(x, y):
        return x * y
Now, with RQ, it is ridiculously easy to put it in the background like this:
    from rq import use_connection, Queue

    # Connect to Redis
    use_connection()

    # Offload the "myfunc" invocation
    q = Queue()
    q.enqueue(myfunc, 318, 62)
This puts the equivalent of myfunc(318, 62) on the default queue. Now, in another shell, run a separate worker process to perform the actual work:
    $ rqworker
    12:46:56:
    12:46:56: *** Listening on default...
    12:47:35: default: mymodule.myfunc(318, 62) (38d9c157-e997-40e2-8d20-574a97ec5a99)
    12:47:35: Job OK, result = 19716
    12:47:35:
    12:47:35: *** Listening on default...
    ...
To poll for the asynchronous result in the web backend, you can use:
    >>> r = q.enqueue(myfunc, 318, 62)
    >>> r.return_value
    None
    >>> time.sleep(2)
    >>> r.return_value
    19716
I must admit that polling for job results through return_value isn't all that useful, and it probably won't be a pattern you'd use in your day-to-day work. (I'd certainly recommend against doing that, at least.)
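That said, if you ever do need to wait for a result synchronously (in a one-off script or a test, say), the polling pattern above is easy to wrap up. This is just an illustrative sketch, not part of RQ's API; the wait_for_result helper is hypothetical and assumes return_value stays None until the worker has finished the job:

    import time

    def wait_for_result(job, timeout=10.0, poll_interval=0.1):
        # Hypothetical helper: poll the job's return_value until it is
        # set, or give up after `timeout` seconds.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if job.return_value is not None:
                return job.return_value
            time.sleep(poll_interval)
        raise RuntimeError('job did not finish within %s seconds' % timeout)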
There's extensive documentation available at: http://nvie.github.com/rq.
Near-zero configuration
RQ was designed so that you can start using it immediately inside your Python web projects. The only thing you need to pass it is a Redis connection to use, because I didn't want it to create new connections implicitly.
To use the default Redis connection (to localhost:6379), you only have to do this:
    from rq import use_connection

    use_connection()
Alternatively, you can reuse a Redis connection that your application already has, by passing it into RQ's use_connection function:
    import redis
    from rq import use_connection

    my_connection = redis.Redis(host='example.com', port=6379)
    use_connection(my_connection)
There are more advanced ways of managing connections available, too, so pick your favorite. You can safely mix your own Redis data with RQ's, as RQ prefixes all of its keys with rq:.
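To see that prefixing in action, here is a quick illustrative check with redis-py (the exact key names are RQ internals and may vary between versions):

    import redis

    r = redis.Redis()
    r.set('my-own-key', 'still safe')  # your application data is untouched
    print(r.keys('rq:*'))              # only RQ's bookkeeping keys match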
Building your own queueing system
RQ offers functionality to put work on queues. It provides FIFO semantics per queue, but how many queues you create is up to you. For the simplest cases, the default queue already suffices.
    >>> q = Queue()
    >>> q.name
    'default'
But you can name your queues however you want:
    >>> lo = Queue('low')
    >>> hi = Queue('high')
    >>> lo.enqueue(myfunc, 2, 3)
    >>> lo.enqueue(myfunc, 4, 5)
    >>> hi.enqueue(myfunc, 6, 7)
    >>> lo.count
    2
    >>> hi.count
    1
Both queues are equal in the eyes of RQ; neither has a higher priority as far as RQ is concerned. But when you start a worker, you define the queue priority by the order of its arguments:
    $ rqworker high low
    12:47:35:
    12:47:35: *** Listening on high, low...
    12:47:35: high: mymodule.myfunc(6, 7) (cc183988-a507-4623-b31a-f0338031b613)
    12:47:35: Job OK, result = 42
    12:47:35:
    12:47:35: *** Listening on high, low...
    12:47:35: low: mymodule.myfunc(2, 3) (95fe658e-b23d-4aff-9307-a55a0ee55650)
    12:47:36: Job OK, result = 6
    12:47:36:
    12:47:36: *** Listening on high, low...
    12:47:36: low: mymodule.myfunc(4, 5) (bfb89229-3ce4-463c-abf8-f19c2808cb7c)
    12:47:36: Job OK, result = 20
    ...
First, all work on the high queue is done (with FIFO semantics), then low is emptied. If work is enqueued on high in the meantime, that work takes precedence over the low queue again as soon as the currently running job is finished. No rocket science here, just what you'd expect.
Insight over performance
One of the things I missed most in other queueing systems is a decent view of what's going on in the system. For example:
- What queues exist?
- How many messages are on each queue?
- What workers are listening on what queues?
- Whoβs idle or busy?
- What actual messages are on the queue?
RQ provides an answer to all of these questions (except for the last one, currently) via the rqinfo tool.
    $ rqinfo
    high       |██████████████████████████ 20
    low        |██████████████ 12
    default    |█████████ 8
    3 queues, 45 jobs total

    Bricktop.19233 idle: low
    Bricktop.19232 idle: high, default, low
    Bricktop.18349 idle: default
    3 workers, 3 queues
Showing only a subset of queues (including empty ones):
    $ rqinfo high archive
    high       |██████████████████████████ 20
    archive    | 0
    2 queues, 20 jobs total

    Bricktop.19232 idle: high
    1 workers, 2 queues
If you want to parse the output of this script, you can specify the --raw flag to disable the fancy drawing. Example:
    $ rqinfo --raw
    queue high 20
    queue low 12
    queue default 8
    worker Bricktop.19233 idle low
    worker Bricktop.19232 idle high,default,low
    worker Bricktop.18349 idle default
You can also view the same data, but organized by queue:
    $ rqinfo --by-queue
    high       |██████████████████████████ 20
    low        |██████████████ 12
    default    |█████████ 8
    3 queues, 45 jobs total

    high: Bricktop.19232 (idle)
    low: Bricktop.19233 (idle), Bricktop.19232 (idle)
    default: Bricktop.18349 (idle), Bricktop.19232 (idle)
    3 workers, 4 queues
By default, these monitoring commands autorefresh every 2.5 seconds, but you can change the refresh interval if you want to. See the monitoring docs for more info.
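For example, to refresh every second instead, something like the following should work (the flag name here is my assumption, so double-check the monitoring docs):

    $ rqinfo --interval 1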
Limitations
RQ does not try to solve all of your queueing needs, and its codebase is deliberately small and not overly complex. Nonetheless, I think it will cover the most basic queueing needs that you'll encounter during Python web development.
Of course, with all this also come some limitations:
- Itβs Python-only
- Itβs Redis-only
- The workers are Unix-only
Please, give feedback
I'm using RQ in two and a half web projects I've worked on during the last few months, and I'm currently at the point where I'm satisfied enough to open the curtains to the world. So you're invited to play with it. I'm very curious to hear your thoughts about it.
If you'd like to contribute, please go fork me on GitHub.