From breadth-first search to task scheduling, queues are extremely useful in computer programming. Although many programming languages support them, I think it's important to understand them and know how to implement them yourself. I like to implement data structures on the fly and customize them for the problem I'm trying to solve. A more practical reason to understand queues is that JavaScript doesn't have an efficient built-in queue: shifting an item off the front of an Array is generally O(n).
I'll probably update this part later, but according to my programming techniques syllabus, I'm going to learn how to implement stacks and queues using linked lists. On paper, this makes a lot of sense because the important operations (enqueue, dequeue) have O(1) time complexity. But in reality, with a naive linked list implementation (which I'm 100% sure I will be taught as the de facto standard), every insertion is a heap allocation, which is horrible, especially if your items are small. You can reduce allocations by having each node store multiple items, which is what C++'s std::deque and Python's collections.deque do. A further optimization is to reuse the nodes by putting them in a free list.
You can use the linked list for both the queue and the free list! When you have
large objects, can’t afford amortized time complexity, and don’t have an
advanced memory allocator, a linked list queue implementation is actually
great. But the code will not be pretty, and there are other alternatives that
you should consider.
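To make the free-list idea concrete, here's a minimal sketch of a linked-list queue that recycles popped nodes instead of freeing them. The names (ListQueue, ListQueue_node) and the int item type are my own choices for illustration, not a standard API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Node {
    int item;
    struct Node *next;
} Node;

typedef struct {
    Node *head, *tail; // queue: dequeue from head, enqueue at tail
    Node *free;        // singly linked free list of recycled nodes
} ListQueue;

static Node *ListQueue_node(ListQueue *q) {
    // Reuse a node from the free list if possible; allocate otherwise.
    if (q->free) {
        Node *n = q->free;
        q->free = n->next;
        return n;
    }
    return malloc(sizeof(Node));
}

void ListQueue_push(ListQueue *q, int item) {
    Node *n = ListQueue_node(q);
    n->item = item;
    n->next = NULL;
    if (q->tail) q->tail->next = n;
    else q->head = n;
    q->tail = n;
}

bool ListQueue_pop(ListQueue *q, int *out) {
    if (!q->head) return false; // underflow
    Node *n = q->head;
    q->head = n->next;
    if (!q->head) q->tail = NULL;
    *out = n->item;
    n->next = q->free; // recycle the node instead of free()
    q->free = n;
    return true;
}
```

Once the queue has warmed up, a push that follows a pop costs no allocation at all, which is the whole point of the free list.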
In this article, I’ll introduce a queue implementation that:
- Has dynamic size, with amortized O(1) insertion complexity, and O(1) complexity for everything else, including random access.
- Easy to understand and implement.
- Easy to extend into a deque.
- Highly efficient on modern hardware.
This implementation is nothing new. It’s called a growable ring buffer, and it’s how Rust and many other high-level languages implement queues and deques.
The stack counterpart
In my experience, queues are slightly more complicated than stacks. For example, in the linked list implementation, you need an extra pointer to the last node along with the pointer to the first one. So I think it's easier to show a stack implementation, which is more intuitive, and then extrapolate it to queues.
It's very common to use a dynamic array, e.g., std::vector, for a fast stack implementation. So let's implement that from the ground up. For both the stack and the queue, we just need to implement the push and pop operations.
Fixed stack with overflow
First, let’s remove the complexity of a dynamically sized stack and just implement a static one. When there’s no more space, it will overflow. For this, we need an array and a pointer or index pointing to the top of the stack.
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// `Item` is whatever element type you want to store.
#define STACK_CAP 1024

typedef struct {
    Item data[STACK_CAP];
    uint32_t top;
} Stack;

bool Stack_push(Stack *stack, Item item) {
    if (stack->top == STACK_CAP) return false; // overflow
    stack->data[stack->top++] = item;
    return true;
}

Item *Stack_pop(Stack *stack) {
    if (stack->top == 0) return NULL; // underflow
    return stack->data + (--stack->top);
}
I use a 32-bit unsigned index instead of a pointer because it saves some memory on 64-bit machines. You can implement more features, but that's the core of it and mostly what you need. Currently its size is fixed, though, so let's try to fix that with some heap memory.
Dynamic-array based stack
For the stack to resize dynamically, you need a heap-allocated buffer and a field storing the current capacity. When there's no more space, we can add more by doubling the buffer with realloc.
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    Item *data;

    uint32_t top;
    uint32_t cap;
} Stack;

Stack Stack_new(void) {
    return (Stack) {
        .data = malloc(sizeof(Item)),
        .top = 0,
        .cap = 1,
    };
}

void Stack_destroy(Stack stack) {
    free(stack.data);
}

void Stack_push(Stack *stack, Item item) {
    if (stack->top == stack->cap) {
        stack->data = realloc(stack->data, (stack->cap *= 2) * sizeof(Item));
    }
    stack->data[stack->top++] = item;
}

Item *Stack_pop(Stack *stack) {
    if (stack->top == 0) return NULL; // underflow
    return stack->data + (--stack->top);
}
There's a bit more code required for dynamic resizing, but overall I think the implementation is still easy to understand. Small reallocations are fast and happen frequently, while large reallocations are slow and happen rarely. So on average, pushing onto the stack is extremely fast. This is the idea behind dynamic arrays, and why appending an element to the end has amortized constant time complexity.
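To make the "rare large reallocation" point concrete, here's the push path from above instrumented with a counter (Item is fixed to int and realloc_count is my own addition for measurement). Pushing 1024 items only ever triggers 10 reallocations, one per doubling from capacity 1 to 1024:

```c
#include <stdint.h>
#include <stdlib.h>

typedef int Item; // placeholder element type for this sketch

typedef struct {
    Item *data;
    uint32_t top;
    uint32_t cap;
} Stack;

static int realloc_count; // instrumentation: how often we actually grow

Stack Stack_new(void) {
    return (Stack){ .data = malloc(sizeof(Item)), .top = 0, .cap = 1 };
}

void Stack_push(Stack *stack, Item item) {
    if (stack->top == stack->cap) {
        stack->data = realloc(stack->data, (stack->cap *= 2) * sizeof(Item));
        ++realloc_count; // only hit when the capacity doubles
    }
    stack->data[stack->top++] = item;
}
```

So 1024 pushes do 1024 cheap stores but only log2(1024) = 10 expensive grows, which is exactly what "amortized O(1)" buys you.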
Implementing queues
Now we know how to implement a stack using a fixed array and a pointer, and how to expand the stack at runtime with heap memory. Let’s apply this knowledge to implementing queues. The idea is similar: first we build queues on top of a static array, then we try to grow it.
Fixed queue with circular buffer
Let's first extend the fixed stack and turn it into a queue. To do this, we need a data structure known as a circular buffer. Instead of just having a pointer to the head of the stack, we use another one for the tail. And the pointers cycle back to the start of the array instead of overflowing.
#include <stdint.h>

#define QUEUE_CAP 1024

typedef struct {
    Item data[QUEUE_CAP];

    uint32_t head;
    uint32_t tail;
} Queue;

void Queue_push(Queue *queue, Item item) {
    queue->data[queue->head] = item;
    queue->head = (queue->head + 1) % QUEUE_CAP;
}

Item *Queue_pop(Queue *queue) {
    Item *item = queue->data + queue->tail;
    queue->tail = (queue->tail + 1) % QUEUE_CAP;
    return item;
}
Because the pointers always point to valid memory, we don't have to manually handle overflow and underflow. Currently, the queue overwrites old data when it overflows and returns garbage when it's empty. But that's all memory the queue owns, so it won't crash. Still, it's more convenient if we can detect overflow, and we'll need that to resize the queue later anyway. One way to do this is to keep track of the number of items in the queue.
 #define QUEUE_CAP 1024
 typedef struct {
     Item data[QUEUE_CAP];
     uint32_t head;
     uint32_t tail;
+    uint32_t len;
 } Queue;

-void Queue_push(Queue *queue, Item item) {
+bool Queue_push(Queue *queue, Item item) {
+    if (queue->len == QUEUE_CAP) return false; // overflow
     queue->data[queue->head] = item;
     queue->head = (queue->head + 1) % QUEUE_CAP;
+    ++queue->len;
+    return true;
 }

 Item *Queue_pop(Queue *queue) {
+    if (queue->len == 0) return NULL; // underflow
     Item *item = queue->data + queue->tail;
     queue->tail = (queue->tail + 1) % QUEUE_CAP;
+    --queue->len;
     return item;
 }
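To see the wraparound in action, here's the same fixed queue restated compactly, with the capacity shrunk to 4 and Item assumed to be int so the pointers cycle after only a few operations:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAP 4 // shrunk from 1024 so wraparound happens quickly

typedef int Item; // placeholder element type for this sketch

typedef struct {
    Item data[QUEUE_CAP];
    uint32_t head;
    uint32_t tail;
    uint32_t len;
} Queue;

bool Queue_push(Queue *queue, Item item) {
    if (queue->len == QUEUE_CAP) return false; // overflow
    queue->data[queue->head] = item;
    queue->head = (queue->head + 1) % QUEUE_CAP;
    ++queue->len;
    return true;
}

Item *Queue_pop(Queue *queue) {
    if (queue->len == 0) return NULL; // underflow
    Item *item = queue->data + queue->tail;
    queue->tail = (queue->tail + 1) % QUEUE_CAP;
    --queue->len;
    return item;
}
```

After filling the queue and popping a couple of items, the next pushes land back at index 0, yet items still come out in FIFO order.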
So to turn a stack into a queue, we have to keep track of two more variables. There are ways to get rid of the length and use only the head and tail, e.g., by keeping tail != head, but that doesn't feel very natural to me. The modulo operation that cycles the pointers back to the start is expensive, but we can optimize it by using a power-of-two capacity and a bitwise AND.
bool Queue_push(Queue *queue, Item item) {
    if (queue->len == QUEUE_CAP) return false; // overflow
    queue->data[queue->head] = item;
    queue->head = (queue->head + 1) & (QUEUE_CAP - 1);
    ++queue->len;
    return true;
}
Compilers can usually turn a modulo by a constant power of two into a bitwise AND (at least for unsigned operands), but it's safer to just do it yourself. From now on, I'll assume that you're using a power-of-two capacity and bitwise operations.
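If you want to convince yourself the two forms are equivalent, here's the trick isolated (wrap_mod and wrap_mask are hypothetical helper names, not part of the queue): for a power-of-two cap, x % cap keeps exactly the low log2(cap) bits of x, which is the same as masking with cap - 1.

```c
#include <stdint.h>

// Both functions map any index into [0, cap) when cap is a power of two.
static uint32_t wrap_mod(uint32_t x, uint32_t cap)  { return x % cap; }
static uint32_t wrap_mask(uint32_t x, uint32_t cap) { return x & (cap - 1); }
```

Note that the mask form is only correct when cap is a power of two; for any other capacity you're stuck with the real modulo.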
Resizing the queue
Resizing the stack array is easy: just give it more memory. For queues, however, as the pointers wrap around, we need to unwrap them after we allocate more memory so that the queue doesn’t overwrite itself.
Before:
[ D E F G H A B C ]
            |
tail: ------+
head: ------+
Resize:
[ D E F G H A B C _ _ _ _ _ _ _ _ ]
            |
tail: ------+
head: ------+
Unwrap:
[ _ _ _ _ _ A B C D E F G H _ _ _ ]
            |               |
tail: ------+               |
head: ----------------------+
This can be done easily and efficiently with realloc and memcpy.
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    Item *data;
    uint32_t head;
    uint32_t tail;

    uint32_t len;
    uint32_t cap;
} Queue;

Queue Queue_new(void) {
    return (Queue) {
        .data = malloc(sizeof(Item)),
        .head = 0,
        .tail = 0,

        .len = 0,
        .cap = 1,
    };
}

void Queue_destroy(Queue queue) {
    free(queue.data);
}

void Queue_push(Queue *queue, Item item) {
    if (queue->len == queue->cap) {
        queue->data = realloc(queue->data, (queue->cap <<= 1) * sizeof(Item));
        // Unwrap: copy the wrapped-around prefix past the old buffer's end.
        memcpy(queue->data + queue->len, queue->data, queue->head * sizeof(Item));
        queue->head += queue->len;
    }
    queue->data[queue->head] = item;
    queue->head = (queue->head + 1) & (queue->cap - 1);
    ++queue->len;
}

Item *Queue_pop(Queue *queue) {
    if (queue->len == 0) return NULL; // underflow
    Item *item = queue->data + queue->tail;
    queue->tail = (queue->tail + 1) & (queue->cap - 1);
    --queue->len;
    return item;
}
That's the entire implementation of the queue. It looks quite long, but the core part, the push and pop functions, is less than 20 lines of code. The rest, like the memory layout and initialization, is trivial and easy to remember. Because the data is stored in a contiguous block of memory, the performance is great, and we don't have to aggressively optimize it like the linked list implementation.
Currently, we push items at the head and remove them from the tail, but you can easily do the opposite: just decrement the pointers instead, and do it before dereferencing. If you implement all four operations, you have effectively turned the queue into a double-ended queue! Other than the code size, there's no overhead to doing it.
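Here's one way to sketch all four operations on a fixed-capacity ring (the Deque name and the push_front/pop_back naming are my own; a growable version would add the same resize logic as Queue_push above). Note how the mirrored operations decrement before dereferencing, exactly as described:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEQUE_CAP 1024 // must stay a power of two for the mask trick

typedef int Item; // placeholder element type for this sketch

typedef struct {
    Item data[DEQUE_CAP];
    uint32_t head; // next free slot at the "back"
    uint32_t tail; // oldest item at the "front"
    uint32_t len;
} Deque;

// The two operations we already had: push at head, pop at tail.
bool Deque_push_back(Deque *dq, Item item) {
    if (dq->len == DEQUE_CAP) return false; // overflow
    dq->data[dq->head] = item;
    dq->head = (dq->head + 1) & (DEQUE_CAP - 1);
    ++dq->len;
    return true;
}

Item *Deque_pop_front(Deque *dq) {
    if (dq->len == 0) return NULL; // underflow
    Item *item = dq->data + dq->tail;
    dq->tail = (dq->tail + 1) & (DEQUE_CAP - 1);
    --dq->len;
    return item;
}

// The mirrored operations: decrement first, then dereference.
bool Deque_push_front(Deque *dq, Item item) {
    if (dq->len == DEQUE_CAP) return false; // overflow
    dq->tail = (dq->tail - 1) & (DEQUE_CAP - 1);
    dq->data[dq->tail] = item;
    ++dq->len;
    return true;
}

Item *Deque_pop_back(Deque *dq) {
    if (dq->len == 0) return NULL; // underflow
    dq->head = (dq->head - 1) & (DEQUE_CAP - 1);
    --dq->len;
    return dq->data + dq->head;
}
```

Because the indices are unsigned, decrementing from 0 wraps to a huge value, and the mask folds it straight back to DEQUE_CAP - 1, so no special case is needed.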
JavaScript implementation
At the start of the article, I mentioned the lack of a native, efficient queue implementation in JavaScript. So here’s one using this technique:
class Queue {
    #start = 0
    #end = 0
    #len = 0

    #cap = 1 << 10

    #data = new Array(this.#cap)

    enqueue(item) {
        if (this.#len == this.#cap) {
            this.#data = this.#data.concat(this.#data)
            this.#end += this.#cap
            this.#cap <<= 1
        }
        this.#start = (this.#start - 1) & (this.#cap - 1)
        this.#len++
        this.#data[this.#start] = item
    }

    dequeue() {
        if (this.#len == 0) return null
        this.#end = (this.#end - 1) & (this.#cap - 1)
        this.#len--
        return this.#data[this.#end]
    }
}
It’s the same as the C variant, but I push items backward into the queue because it’s more convenient to return the item when performing dequeue. Also, to resize it, all I have to do is concatenate the buffer with itself. Doing that is very close to a native memory operation and should be really fast.
Conclusion
If you ever need a queue and your programming language doesn’t provide one, then a growable ring buffer is a good choice. It’s fast, easy to implement, and you can extend it into a double-ended queue with no overhead. However, like with dynamic arrays, sometimes you know or can compute the upper bound of how many items can be inside the queue at a given moment; e.g., with BFS, the maximum is the number of nodes in the graph. In that case, it’s better to use a static ring buffer, either stack or heap allocated, instead. You should also consider other stack and queue implementations, even linked list ones, as they might fit your constraints better.
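As an example of that last point, a BFS never enqueues a node more than once, so a fixed ring of exactly N slots can never overflow. Here's a minimal sketch over a small hard-coded adjacency matrix (the graph, the bfs name, and the int item type are all illustrative choices):

```c
#include <stdint.h>

#define N 6 // number of nodes; the queue never holds more than N entries

// Small hard-coded undirected graph: 0-1, 0-2, 1-3, 2-3, 3-4, 4-5.
static const int adj[N][N] = {
    {0,1,1,0,0,0},
    {1,0,0,1,0,0},
    {1,0,0,1,0,0},
    {0,1,1,0,1,0},
    {0,0,0,1,0,1},
    {0,0,0,0,1,0},
};

// BFS distances from `start`, using a stack-allocated ring of exactly N slots.
void bfs(int start, int dist[N]) {
    int queue[N];
    uint32_t head = 0, tail = 0, len = 0;

    for (int i = 0; i < N; ++i) dist[i] = -1; // -1 means unvisited
    dist[start] = 0;
    queue[head] = start; head = (head + 1) % N; ++len;

    while (len > 0) {
        int u = queue[tail]; tail = (tail + 1) % N; --len;
        for (int v = 0; v < N; ++v) {
            if (adj[u][v] && dist[v] == -1) {
                dist[v] = dist[u] + 1;
                // Safe: each node is enqueued at most once, so len < N here.
                queue[head] = v; head = (head + 1) % N; ++len;
            }
        }
    }
}
```

No allocation, no resize branch, no overflow check needed: the problem itself bounds the queue.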