15 May 2019

Priority queues in Java and Python

java-python-queue.jpg

Cutting in line is not nice, and people who do it are often jerks, or VIP. Or both. But sometimes the semantics behind this can be useful, and there's a data structure that allows this called the priority queue.

Values you insert into a priority queue don't necessarily go at the end; instead they get inserted in an order-preserving way. So if I insert a 5 into this priority queue:

1 3 4 7 8

It will look like this after insertion:

1 3 4 5 7 8

"So, it's like a list that keeps itself sorted?" the skeptical reader may ask at this point. Yes, and no. What you see above looks from the outside like a list that keeps itself sorted. But internally, we can store things as a heap or as a binary tree, and gain a speed advantage when we insert or access/remove elements. A good way to think of these internal data structures is that they are like different sorting algorithms, but "frozen in time" as data structures. (That's not just poetic; Wikipedia makes the equivalence explicit here.)

Priority queues are good for a number of things. Maybe you're implementing a job queue of future tasks to process, but some jobs should be given priority over others. (These more important jobs are the jerks, or VIPs, cutting in line so that us regular jobs have to wait longer.) The whole thing affords a fair amount of flexibility; you can have one level of urgency, or several.

Or maybe you're building a discrete event simulation, simulating an elevator or a pool game or a SimCity-like world by processing events in chronological order. A priority queue helps by constantly serving up the next event to be processed. The value that the events are being ordered by here is an increasing time coordinate; the beauty of the data structure is that we don't have to constantly re-sort things — the priority queue upholds the ordering for us.

Priority queues are a generalization of regular queues. If we have a priority queue, we can make it behave like a regular queue by making the priority value be an ever-incrementing sequence number, placing all the inserted values at the end. Regular queue; no jerks.

"Priority queue" is the name of the abstract data structure, similar to "list" or "dictionary". The abstract concept doesn't say anything about how things are implemented under the hood. Implementations of a priority queue (concrete data structures) have names like "binary heap" or "balanced binary tree". This distinction between abstract and concrete will be important in what follows.

Java and Python, both object-oriented languages, each have implementations of priority queues. They end up in the same place and you can do the same things with them, but the way they expose these data structures is quite distinct.

Java

The Java Collections Framework is thorough and impressive, and priority queues are no exception. The PriorityQueueclass gives you a priority queue.

Using our running example (and Java 11's var syntax):

var queue = new PriorityQueue<>(List.of(4,  8,  7,  3, 1));
queue.add(5);

We'd use the add method, as above, to insert new elements into the queue. The 5 ends up between the 4 and the 7in the sense that if we start taking elements out, it will be the fourth element coming out.

queue.poll();      //  1
queue.poll();      //  3
queue.poll();      //  4
queue.poll();      //  5

These integers get stored in ascending order, because that's the natural ordering for Integer. We also have the option, when creating the PriorityQueue, to pass in a custom Comparator specifying a preferred ordering.

PriorityQueue implements the interface Queue. If we wanted a regular queue, we would probably go with ArrayDeque. In the code above, if we changed PriorityQueue to ArrayDeque, the code would still work, but the 5would be inserted at the end instead.

An ArrayDeque can add and poll elements in (amortized) constant time. A PriorityQueue needs to uphold the ordering, and so takes logarithmic time for both add and poll.

Finally, if you are sharing the queue across threads, you might want to swap in a PriorityBlockingQueue. (Again, the code above will Just Work if you do that.) With it, you get thread safety thrown into the deal — many simultaneous addand poll calls from different threads won't screw up the order or throw a ConcurrentModificationException. (Instead, the calls will "line up" to use the queue, as it were. A wee bit meta.)

This plug-and-play aspect of the Collections Framework is one of its many underappreciated advantages. There are a few general interfaces (like Queue or Deque), and many implementations of them.

Python

Python has a different take on priority queues. This Python code (run in the Python REPL) is the moral equivalent of the Java PriorityQueue code:

>>> queue = [4, 8, 7, 3, 1]
>>> import heapq
>>> heapq.heapify(queue)
>>> heapq.heappush(queue, 5)

Wait, what? A priority queue is a list in Python!? Well, yes. More exactly, it's a regular list where we promise to uphold the "heap invariant", which expects elements to be sorted to a certain extent. For example, if we print our queueat this point:

>>> queue
[1, 3, 5, 4, 8, 7]

Just as before, when we start popping out elements at the front, the come out in ascending order:

>>> heapq.heappop(queue)
1
>>> heapq.heappop(queue)
3
>>> heapq.heappop(queue)
4
>>> heapq.heappop(queue)
5
>>> queue
[7, 8]

A colleague of mine who reviewed this article looked at the behavior of heapq above and mumbled "there must be some hidden data structure somewhere, keeping track of the order..." But no, it's all done through the list itself. Coming back to the [1, 3, 5, 4, 8, 7] list and why that upholds the invariant, that's best shown through a picture:

 1
|
+----+----+
     |    |
     3    5--------------+
     |                   |
     +---------+----+    7
               |    |
               4    8

The list encodes an implicit tree. An element at index k have children (if they exist) at index k*2+1 and k*2+2. The heap invariant requires that children be greater than or equal to their parents, which is true for the above list. Running heapq.heapify (and after subsequent operations), there's just enough sorting for the heap invariant to be satisfied.

(My colleague took all this in, and said "I prefer the Java approach". Fair enough.)

The Python heapq has a "bring your own data structure" feel compared to Java's PriorityQueue. The pipes are showing; the underlying list is just a regular list, and the heap invariant is yours to screw up. (But if you do, then that's egg on your face; your code won't work as it's supposed to.)

This shows a difference in philosophy and culture between Java and Python. Java exposes a class PriorityQueue which encapsulates and hides all the implementation details; Python exposes a module of static methods, and gives you the algorithm "priority heap" which you can use to model a priority queue.

I was curious, so I went back in the python-dev email archives to see if this approach was ever discussed. I found this quote by Guido van Rossum back in 2002:

[...] a class seems too heavy for this, just like it is overkill for bisect [...]

(Another developer had written an alternative implementation which hid everything in a class. So they had the chance to encapsulate in a class, but chose to expose everything.)

A tale of two pities

In Java, everything is neatly tucked away. In Python, the pipes are showing — you're given the tools, but it's up to you to use them right. There's no right or wrong here, it's just two approaches to library API design.

But it does nicely mirror how both languages (and their surrounding cultures) think about privacy. In Java, you declare your fields private as a matter of course — this is what helps protect your object's invariants, and allows future refactoring. Python has no private keyword and no corresponding idiomatic notion of protecting your object's data — the general expectation is that you store your object data as attributes, and then no consumer of the object abuses the right to read or write them. Java's system is based on enforcement; Python's is based on social convention.

Could this cultural difference be tied to where the languages are (often) used, and the expectations of their communities? Java, found in the enterprise where "signing off on code" is a thing, operates in environments where a colleague not messing with your class's invariants is an overriding concern; Python, used in education, in startups, and in open-source projects, prioritizes openness and "showing your work", and accepts the risk of abuse with the attitude of "don't do that". Almost two decades of heapq usage in the wild show that the informal rules are easy to stick to.

I'm actually not sure which approach I like the best. Java's enapsulation or Python's implicit trust? I can sort of argue either side on that one. It might just be that both have their place in the world.

Related courses

  • Beginning Java

     This course provides an introduction to the Java language. 

    Category: Java
    Duration: 3 days
    Price: 25 900 SEK
  • Beginning Python

    Many people find it helpful to learn a scripting language these days. Python is one of the more mature ones out there, with a strong tradition, a helpful community, and a massive set of ready-to-use modules. Thus, increasing your knowledge about Python means getting a further reach in solving many everyday problems. 

    Category: Other
    Duration: 3 days
    Price: 25 900 SEK
  • Intermediate Java

    This course will present you with some advanced concepts in Java. 

    Category: Java
    Duration: 3 days
    Price: 25 900 SEK
  • Intermediate Python

    This course presents a straightforward, broad and deep introduction to Python, its syntax and semantics, and its module ecosystem. 

    Category: Other
    Duration: 2 days
    Price: 21 500 SEK