How Redis Implements Its Dictionary

Hash Table

A hash table is an array of size buckets, table[0...size-1]. Hashing an object yields an index in that range; we store the object in table[index]. When a bucket holds multiple objects, they are chained into a linked list. This collision strategy is called separate chaining.
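As a rough illustration of separate chaining, here is a minimal sketch in C. The names (entry, table, hash_str, table_put, table_get) are hypothetical and are not part of Redis; the djb2 string hash is only a simple stand-in, and keys are borrowed pointers that the caller must keep alive.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for illustration; this is not the Redis API. */
typedef struct entry {
    const char *key;
    int value;
    struct entry *next;   /* next entry chained in the same bucket */
} entry;

typedef struct table {
    entry **buckets;      /* array of `size` bucket heads */
    unsigned long size;
    unsigned long used;
} table;

/* djb2 string hash, used here only as a simple stand-in. */
static unsigned long hash_str(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

static table *table_new(unsigned long size) {
    table *t = malloc(sizeof(*t));
    if (!t) return NULL;
    t->buckets = calloc(size, sizeof(entry *));
    if (!t->buckets) { free(t); return NULL; }
    t->size = size;
    t->used = 0;
    return t;
}

/* Insert or overwrite; returns 0 on success, -1 on allocation failure. */
static int table_put(table *t, const char *key, int value) {
    unsigned long idx = hash_str(key) % t->size;
    for (entry *e = t->buckets[idx]; e; e = e->next) {
        if (strcmp(e->key, key) == 0) { e->value = value; return 0; }
    }
    entry *e = malloc(sizeof(*e));
    if (!e) return -1;
    e->key = key;                 /* borrowed pointer, not copied */
    e->value = value;
    e->next = t->buckets[idx];    /* push onto the head of the chain */
    t->buckets[idx] = e;
    t->used++;
    return 0;
}

/* Returns 1 and writes *out if the key is present, 0 otherwise. */
static int table_get(const table *t, const char *key, int *out) {
    unsigned long idx = hash_str(key) % t->size;
    for (const entry *e = t->buckets[idx]; e; e = e->next) {
        if (strcmp(e->key, key) == 0) { *out = e->value; return 1; }
    }
    return 0;
}
```

With only two buckets and three keys, at least two keys must share a bucket, and lookups still succeed by walking the chain.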

Load Factor

If a hash table has size buckets and used stored elements, used / size is the load factor. When loadFactor <= 1, a bucket holds at most one element on average, so expected lookup cost is O(1) as long as the hash function spreads keys evenly. So after each insert we want loadFactor <= 1.

Expand & Incremental Rehash

Once loadFactor reaches 1, we cannot simply append another element: the load factor would exceed 1, chains would keep growing, and lookups would drift away from O(1). We need to grow size so that after the insert, used / size stays at most 1.

How to expand? C++ vector-style doubling is a good model:

Whenever loadFactor == 1, allocate a new bucket array twice as large, then rehash every element from the old array into the new one. Rehashing must recompute each element’s bucket because the mask changes.

The downside is that expansion happens in one big step and can take a long time. The insert that triggers the expansion pays for rehashing every element, so that single insert may stall badly.
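The one-shot doubling described above might look like the following sketch. It assumes unsigned integer keys that hash to themselves and a power-of-two size; the names (node, htab, htab_expand, htab_insert, htab_contains) are hypothetical, and duplicate keys are not checked.

```c
#include <stdlib.h>

/* Hypothetical minimal table of unsigned keys; not the Redis structs. */
typedef struct node { unsigned long key; struct node *next; } node;
typedef struct { node **table; unsigned long size, used; } htab;

/* size must be a power of two so that (size - 1) works as a mask. */
static int htab_init(htab *h, unsigned long size) {
    h->table = calloc(size, sizeof(node *));
    h->size = size;
    h->used = 0;
    return h->table ? 0 : -1;
}

/* One-shot expansion: allocate 2*size buckets and relink every node. */
static int htab_expand(htab *h) {
    unsigned long newsize = h->size * 2;
    node **newtab = calloc(newsize, sizeof(node *));
    if (!newtab) return -1;
    for (unsigned long i = 0; i < h->size; i++) {
        node *n = h->table[i];
        while (n) {
            node *next = n->next;
            unsigned long idx = n->key & (newsize - 1); /* new mask */
            n->next = newtab[idx];
            newtab[idx] = n;
            n = next;
        }
    }
    free(h->table);
    h->table = newtab;
    h->size = newsize;
    return 0;
}

/* Expand first if the table is full, then insert at the chain head. */
static int htab_insert(htab *h, unsigned long key) {
    if (h->used >= h->size && htab_expand(h) != 0) return -1;
    node *n = malloc(sizeof(*n));
    if (!n) return -1;
    unsigned long idx = key & (h->size - 1);
    n->key = key;
    n->next = h->table[idx];
    h->table[idx] = n;
    h->used++;
    return 0;
}

static int htab_contains(const htab *h, unsigned long key) {
    for (const node *n = h->table[key & (h->size - 1)]; n; n = n->next)
        if (n->key == key) return 1;
    return 0;
}
```

Note that htab_expand touches every element in a single call, which is exactly the cost the incremental scheme tries to spread out.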

dict.c in Redis uses two hash tables for resize and migration. When the first table’s loadFactor hits 1 and we insert another key, Redis allocates a second table twice as large, moves one non-empty bucket from the first table into the second, then stores the new key in the second table. Each further insert moves one more non-empty bucket and then stores its key in the second table, until the first table is empty.

That spreads the full migration across many inserts; each incremental step has expected O(1) cost, so no single insert waits for a full rehash.

To see how this works, look at the two structs in dict.h:

typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    int rehashidx; /* rehashing not in progress if rehashidx == -1 */
    int iterators; /* number of iterators currently running */
} dict;

dictht is the bucket array: size is capacity (usually 2^n), sizemask (usually 2^n - 1, i.e. n low bits set) masks hash values, and used counts entries. dict is the dictionary with two bucket arrays; type holds function pointers for hashing and key/value handling.
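To make the role of sizemask concrete, here is a small sketch of mapping a hash value to a bucket. The helper name bucket_index is hypothetical, and it assumes size is a power of two so that sizemask equals size - 1.

```c
#include <assert.h>

/* Hypothetical helper: map a hash value to a bucket index.
 * Assumes size is a power of two so that sizemask == size - 1. */
static unsigned long bucket_index(unsigned long hash, unsigned long size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    unsigned long sizemask = size - 1;
    return hash & sizemask;   /* same result as hash % size */
}
```

For example, the hash value 13 (binary 1101) lands in bucket 1 when size is 4 and in bucket 5 when size is 8. The same hash maps to a different bucket once the table doubles, which is why every entry has to be rehashed during migration.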

d->rehashidx

This field is central:

d->rehashidx decides whether new keys go into bucket array 0 or 1, and which bucket in d->ht[0] will be migrated to d->ht[1] next. When d->rehashidx == -1, no rehash is in progress and new keys go into array 0. When d->rehashidx != -1, Redis migrates the next non-empty bucket from array 0 to array 1 (rehash). Each entry must be rehashed because d->ht[1].sizemask differs from d->ht[0].sizemask. While a rehash is in progress, new keys go only into array 1, so array 0 only shrinks and the larger array 1 absorbs all growth.

When array 0 is empty, free it, point array 0 at array 1, set d->rehashidx = -1, and array 1 becomes unused until the next expansion. When array 0 fills again, allocate a doubled array 1, set d->rehashidx = 0, and new inserts go to array 1.

Deletes, lookups, and updates also check d->rehashidx and may advance rehashing, which speeds migration.

Below is simplified pseudocode for inserting element[1..n] and showing expand/rehash:

// Initialize two hash tables
d->ht[0].size = 4 ; d->ht[0].used = 0 ;  // allocate four empty buckets
d->ht[1].size = 0 ; d->ht[1].used = 0 ;  // initialize an empty table
d->rehashidx = -1 ;                      // no rehash in progress

for(i = 1 ; i <= n ; ++ i){
    if( d->rehashidx != -1 ){
        if( d->ht[0].used != 0 ){
            Move one non-empty bucket from d->ht[0] to d->ht[1] (rehash each entry)
            // After this step:
            // d->ht[0].used -= number of moved entries
            // d->ht[1].used += number of moved entries
            Insert element[i] into d->ht[1];  // d->ht[1].used ++
        }else{
            // Replace bucket array 0 with array 1:
            // free d->ht[0] before the assignment, then reset d->ht[1]
            Free d->ht[0].table;
            d->ht[0] = d->ht[1] ;
            Reset d->ht[1] to an empty table;
            d->rehashidx = -1 ;
            Insert element[i] into d->ht[0];  // d->ht[0].used ++
        }
    }else if( d->ht[0].used >= d->ht[0].size ){
        d->ht[1].table = new bucket[ 2 * d->ht[0].size ];
        d->ht[1].size = 2 * d->ht[0].size ;   // twice d->ht[0].size
        Insert element[i] into d->ht[1];  // d->ht[1].used ++
        d->rehashidx = 0 ;
    }else{
        Insert element[i] into d->ht[0];  // d->ht[0].used ++
    }
}
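The insert loop above can be approximated in runnable C as follows. This is a sketch under stated assumptions, not the real dict.c: keys are unsigned integers that hash to themselves, the names (mnode, mht, mdict, mdict_rehash_step, mdict_insert, mdict_contains) are hypothetical, and the table swap happens at the end of the step that empties table 0 rather than on the following insert.

```c
#include <stdlib.h>

/* Hypothetical miniature of the two-table scheme; not the dict.c API. */
typedef struct mnode { unsigned long key; struct mnode *next; } mnode;
typedef struct { mnode **table; unsigned long size, used; } mht;
typedef struct { mht ht[2]; long rehashidx; } mdict;

static int mdict_init(mdict *d, unsigned long size) { /* power of two */
    d->ht[0].table = calloc(size, sizeof(mnode *));
    d->ht[0].size = size; d->ht[0].used = 0;
    d->ht[1].table = NULL; d->ht[1].size = 0; d->ht[1].used = 0;
    d->rehashidx = -1;
    return d->ht[0].table ? 0 : -1;
}

/* Move one non-empty bucket from ht[0] to ht[1]; swap when ht[0] empties. */
static void mdict_rehash_step(mdict *d) {
    if (d->rehashidx == -1) return;
    if (d->ht[0].used > 0) {
        /* Buckets below rehashidx are already empty, so a non-empty
         * bucket must exist at or after it while used > 0. */
        while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
        mnode *n = d->ht[0].table[d->rehashidx];
        while (n) {
            mnode *next = n->next;
            unsigned long idx = n->key & (d->ht[1].size - 1);
            n->next = d->ht[1].table[idx];
            d->ht[1].table[idx] = n;
            d->ht[0].used--; d->ht[1].used++;
            n = next;
        }
        d->ht[0].table[d->rehashidx++] = NULL;
    }
    if (d->ht[0].used == 0) {             /* migration complete */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        d->ht[1].table = NULL; d->ht[1].size = 0; d->ht[1].used = 0;
        d->rehashidx = -1;
    }
}

static int mdict_insert(mdict *d, unsigned long key) {
    if (d->rehashidx != -1) mdict_rehash_step(d);
    if (d->rehashidx == -1 && d->ht[0].used >= d->ht[0].size) {
        unsigned long newsize = d->ht[0].size * 2;
        d->ht[1].table = calloc(newsize, sizeof(mnode *));
        if (!d->ht[1].table) return -1;
        d->ht[1].size = newsize; d->ht[1].used = 0;
        d->rehashidx = 0;                 /* start incremental rehash */
    }
    /* While rehashing, new keys go only into ht[1]. */
    mht *t = (d->rehashidx != -1) ? &d->ht[1] : &d->ht[0];
    mnode *n = malloc(sizeof(*n));
    if (!n) return -1;
    unsigned long idx = key & (t->size - 1);
    n->key = key; n->next = t->table[idx]; t->table[idx] = n;
    t->used++;
    return 0;
}

/* Lookups must check both tables while a rehash is in progress. */
static int mdict_contains(const mdict *d, unsigned long key) {
    for (int i = 0; i < 2; i++) {
        const mht *t = &d->ht[i];
        if (t->size == 0) continue;
        for (const mnode *n = t->table[key & (t->size - 1)]; n; n = n->next)
            if (n->key == key) return 1;
    }
    return 0;
}
```

Each call to mdict_insert moves at most one bucket, so the work of a full rehash is spread over many inserts, and a key can be found in either table until the migration completes.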

Dictionary Iterators

There are safe iterators and unsafe iterators.

A safe iterator prevents bucket migration between the two hash tables while the iterator is active.

Rehashing breaks naive iteration: if an iterator points at an entry in d->ht[0] and that entry moves to d->ht[1], the iterator cannot tell which entries were already visited. You may visit some entries twice or skip others. A safe iterator pauses rehashing while it is active, so each entry is visited exactly once. With an unsafe iterator, you must not insert or delete during iteration, because those operations may advance the rehash.
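One way to picture how a safe iterator pauses migration is a counter that the rehash step consults before doing any work, matching the iterators field in the dict struct. The sketch below is a simplified assumption, not the actual Redis code: guard_dict, safe_iter_begin, safe_iter_end, and rehash_step_if_allowed are hypothetical names, and steps_done merely stands in for moving a bucket.

```c
/* Hypothetical miniature: only the fields needed to show the guard. */
typedef struct {
    long rehashidx;   /* -1 when no rehash is in progress */
    int iterators;    /* number of safe iterators currently running */
    int steps_done;   /* counts rehash steps, for demonstration only */
} guard_dict;

static void safe_iter_begin(guard_dict *d) { d->iterators++; }
static void safe_iter_end(guard_dict *d)   { d->iterators--; }

/* Called from insert/lookup/delete: migrate one bucket unless paused.
 * Returns 1 if a step was performed, 0 otherwise. */
static int rehash_step_if_allowed(guard_dict *d) {
    if (d->rehashidx == -1) return 0;   /* nothing to migrate */
    if (d->iterators > 0) return 0;     /* paused by a safe iterator */
    d->steps_done++;                    /* stand-in for moving one bucket */
    d->rehashidx++;
    return 1;
}
```

While at least one safe iterator is registered, every attempted step returns 0 and the entries stay where the iterator expects them; migration resumes once the last iterator is released.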