RocksDB源码 - Cuckoo Hashing

Cuckoo Hashing

Cuckoo Hashing是一种用”布谷鸟”的方法解决哈希冲突的算法,其最坏的查找时间是O(1)。
Cuckoo指的是“布谷鸟”,为什么是“布谷鸟”呢?因为有一种布谷鸟会把蛋生在别人的鸟巢里,然后把别人家的蛋推走。Cuckoo Hashing使用类似的方法来解决哈希冲突,如果在插入数据时发现冲突了,就将冲突位置上的数据搬走。

算法

丹麦的两位教授在2001年首先发表论文阐述了该算法。paper
论文中一段代码阐述了算法的核心部分:

插入x时,如果找到就返回;否则进入循环,循环次数最大为MaxLoop。
在循环里,如果T1表中x的位置为空,插入x到T1表返回;否则,将x和当前位置上的指交换;
此时x变成被挤出来的值,尝试同样方法插入到T2;直到找到空余的位置。
如下图,x插入T1后,y被推到T2表,又导致了z从T2被推出来,直到z在T1找到了空的位置。


1
2
3
4
5
6
7
8
9
10
procedure insert(x)
if lookup(x) then return
loop MaxLoop times
if T1[h1(x)] == null then { T1[h1(x)] = x; return}
swap(x, T1[h1(x)])
if T2[h2(x)] == null then { T2[h2(x)] = x; return}
swap(x, T2[h2(x)])
end loop
rehash(); insert(x)
end

实现

RocksDB把HashCuckoo用作Memtable的一个可选算法,其他算法还包括SkipList, HashLinkList, HashSkipList。只有HashCuckoo的最坏查找时间是O(1),在某些使用上有一定的优势。
下面分析Memtable主要的两个接口Get和Insert。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
void HashCuckooRep::Get(const LookupKey& key, void* callback_args,
bool (*callback_func)(void* arg, const char* entry)) {
Slice user_key = key.user_key();
for (unsigned int hid = 0; hid < hash_function_count_; ++hid) {
const char* bucket =
cuckoo_array_[GetHash(user_key, hid)].load(std::memory_order_acquire);
if (bucket != nullptr) {
Slice bucket_user_key = UserKey(bucket);
if (user_key == bucket_user_key) {
callback_func(callback_args, bucket);
break;
}
} else {
// as Put() always stores at the vacant bucket located by the
// hash function with the smallest possible id, when we first
// find a vacant bucket in Get(), that means a miss.
break;
}
}
MemTableRep* backup_table = backup_table_.get();
if (backup_table != nullptr) {
backup_table->Get(key, callback_args, callback_func);
}
}

HashCuckoo使用了10个(hash_functioncount)哈希函数来解决冲突。
搜索key键值,依次遍历这个10个哈希函数,依次判断每一个哈希结果的位置是否为空;如果不为空,判断该位置键值是否与key相同,相同则返回;不同则继续下一次哈希。如果位置为空则退出循环,因为HashCuckoo的插入是从最低的哈希函数开始寻找位置的,如果空说明该键值不存在。退出循环在backup哈希表里进行查找。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
void HashCuckooRep::Insert(KeyHandle handle) {
static const float kMaxFullness = 0.90f;
auto* key = static_cast<char*>(handle);
int initial_hash_id = 0;
size_t cuckoo_path_length = 0;
auto user_key = UserKey(key);
// find cuckoo path
if (FindCuckooPath(key, user_key, cuckoo_path_, &cuckoo_path_length,
initial_hash_id) == false) {
// if true, then we can't find a vacant bucket for this key even we
// have used up all the hash functions. Then use a backup memtable to
// store such key, which will further make this mem-table become
// immutable.
if (backup_table_.get() == nullptr) {
VectorRepFactory factory(10);
backup_table_.reset(
factory.CreateMemTableRep(compare_, allocator_, nullptr, nullptr));
is_nearly_full_ = true;
}
backup_table_->Insert(key);
return;
}
// when reaching this point, means the insert can be done successfully.
occupied_count_++;
if (occupied_count_ >= bucket_count_ * kMaxFullness) {
is_nearly_full_ = true;
}
// perform kickout process if the length of cuckoo path > 1.
if (cuckoo_path_length == 0) return;
// the cuckoo path stores the kickout path in reverse order.
// so the kickout or displacement is actually performed
// in reverse order, which avoids false-negatives on read
// by moving each key involved in the cuckoo path to the new
// location before replacing it.
for (size_t i = 1; i < cuckoo_path_length; ++i) {
int kicked_out_bid = cuckoo_path_[i - 1];
int current_bid = cuckoo_path_[i];
// since we only allow one writer at a time, it is safe to do relaxed read.
cuckoo_array_[kicked_out_bid]
.store(cuckoo_array_[current_bid].load(std::memory_order_relaxed),
std::memory_order_release);
}
int insert_key_bid = cuckoo_path_[cuckoo_path_length - 1];
cuckoo_array_[insert_key_bid].store(key, std::memory_order_release);
}

插入流程相对查找流程较为复杂,前面的论文中以两个哈希函数来解决冲突,RocksDB使用10个哈希函数,所以插入时产生冲突了会出现树形结构的遍历,每个树节点都有10个子节点。FindCuckooPath函数就是使用BFS广度优先搜索在树形结构遍历中查找到空的位置,并记录下查找路径需要推出的位置cuckoopath
FindCuckooPath的实现在下面进行分析。如果FindCuckooPath查找不到空的位置,就尝试在backup_table里面进行插入。如果FindCuckooPath返回true,后面就根据FindCuckooPath填充的cuckoopath依次将路径上的键值往后一个位置移动,最后插入新key。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
bool HashCuckooRep::FindCuckooPath(const char* internal_key,
const Slice& user_key, int* cuckoo_path,
size_t* cuckoo_path_length,
const int initial_hash_id) {
int bucket_ids[HashCuckooRepFactory::kMaxHashCount];
*cuckoo_path_length = 0;
if (QuickInsert(internal_key, user_key, bucket_ids, initial_hash_id)) {
return true;
}
// If this step is reached, then it means:
// 1. no vacant bucket in any of the possible locations of the input key.
// 2. none of the possible locations of the input key has the same user
// key as the input `internal_key`.
// the front and back indices for the step_queue_
step_buffer_.reset();
for (unsigned int hid = initial_hash_id; hid < hash_function_count_; ++hid) {
/// CuckooStep& current_step = step_queue_[front_pos++];
CuckooStep& current_step = step_buffer_.NextWriteBuffer();
current_step.bucket_id_ = bucket_ids[hid];
current_step.prev_step_id_ = CuckooStep::kNullStep;
current_step.depth_ = 1;
}
while (step_buffer_.HasNewWrite()) {
int step_id = step_buffer_.read_index_;
const CuckooStep& step = step_buffer_.ReadNext();
// Since it's a BFS process, then the first step with its depth deeper
// than the maximum allowed depth indicates all the remaining steps
// in the step buffer queue will all exceed the maximum depth.
// Return false immediately indicating we can't find a vacant bucket
// for the input key before the maximum allowed depth.
if (step.depth_ >= cuckoo_path_max_depth_) {
return false;
}
// again, we can perform no barrier load safely here as the current
// thread is the only writer.
Slice bucket_user_key =
UserKey(cuckoo_array_[step.bucket_id_].load(std::memory_order_relaxed));
if (step.prev_step_id_ != CuckooStep::kNullStep) {
if (bucket_user_key == user_key) {
// then there is a loop in the current path, stop discovering this path.
continue;
}
}
// if the current bucket stores at its nth location, then we only consider
// its mth location where m > n. This property makes sure that all reads
// will not miss if we do have data associated to the query key.
//
// The n and m in the above statement is the start_hid and hid in the code.
unsigned int start_hid = hash_function_count_;
for (unsigned int hid = 0; hid < hash_function_count_; ++hid) {
bucket_ids[hid] = GetHash(bucket_user_key, hid);
if (step.bucket_id_ == bucket_ids[hid]) {
start_hid = hid;
}
}
// must found a bucket which is its current "home".
assert(start_hid != hash_function_count_);
// explore all possible next steps from the current step.
for (unsigned int hid = start_hid + 1; hid < hash_function_count_; ++hid) {
CuckooStep& next_step = step_buffer_.NextWriteBuffer();
next_step.bucket_id_ = bucket_ids[hid];
next_step.prev_step_id_ = step_id;
next_step.depth_ = step.depth_ + 1;
// once a vacant bucket is found, trace back all its previous steps
// to generate a cuckoo path.
if (cuckoo_array_[next_step.bucket_id_].load(std::memory_order_relaxed) ==
nullptr) {
// store the last step in the cuckoo path. Note that cuckoo_path
// stores steps in reverse order. This allows us to move keys along
// the cuckoo path by storing each key to the new place first before
// removing it from the old place. This property ensures reads will
// not missed due to moving keys along the cuckoo path.
cuckoo_path[(*cuckoo_path_length)++] = next_step.bucket_id_;
int depth;
for (depth = step.depth_; depth > 0 && step_id != CuckooStep::kNullStep;
depth--) {
const CuckooStep& prev_step = step_buffer_.steps_[step_id];
cuckoo_path[(*cuckoo_path_length)++] = prev_step.bucket_id_;
step_id = prev_step.prev_step_id_;
}
assert(depth == 0 && step_id == CuckooStep::kNullStep);
return true;
}
if (step_buffer_.IsFull()) {
// if true, then it reaches maxinum number of cuckoo search steps.
return false;
}
}
}
// tried all possible paths but still not unable to find a cuckoo path
// which path leads to a vacant bucket.
return false;
}

QuickInsert快速的搜索key对应的10个哈希位置是否有位置可以插入。
如果没有查找到,那么就要开始使用BFS算法进行搜索。其中CuckooStep用来记录可以回退的步骤,stepbuffer则保存了BFS搜索的队列。

1
for (unsigned int hid = initial_hash_id; hid < hash_function_count_; ++hid)

将搜索key的10个哈希位置都入到队列,后面While循环进入BFS搜索,每次从队列头取出一个步骤,如果步骤的深度已经超过cuckoo_path_maxdepth返回。如果该位置的key和搜索的key相同,则出现了循环,跳过该次搜索。以上条件都不满足,将改步骤的其他哈希位置(由于cuckooHash总是从最低位的哈希函数开始插入,我们只需要从大于当前哈希的函数的位置开始搜索)入到队列。
一旦找到了一个空的哈希位置,从stepbuffer中回溯出搜索的路径存到cuckoo_path。