Redis扫描命令你了解多少？

terry 2年前 (2023-09-26) 阅读数 79 #数据库

Redis 是单线程的。因此，在使用一些时间复杂度为 O(N) 的命令时要非常小心。您可能会意外阻止该进程，导致卡顿出现在 Redis 中。

有时我们需要使用一些满足条件的命令，例如删除以test_开头的键。那么如何获得这些密钥呢？在Redis 2.8版本之前，我们可以使用keys命令根据正常匹配来获取我们需要的key。但这个命令有两个缺点：

没有限制，我们只能同时检索所有合适的键。如果有数百万个结果，那么您就有一个“无限”的集合在等着您。
keys命令是一种遍历算法，时间复杂度为O(N)。正如我们刚才所说，这个命令很容易让Redis为卡顿服务。因此，我们应该避免在生产环境中使用该命令。

满足需求和现有的Redis卡顿如何选择？面对这种困境，Redis在2.8版本中为我们提供了一个解决方案——scan命令。

与keys命令相比，scan命令有两个明显的优点：

虽然scan命令的时间复杂度也是O(N)，但它是批量执行的，不会阻塞线程。
scan命令提供了一个limit参数，可以控制每次返回结果的最大数量。

这两个优点帮助我们解决了上述问题，但是scan命令并不完美。它返回的结果可能是重复的，因此客户端必须删除重复的结果。为什么会重复，相信看完本文你就会得到答案。

scan命令的基本使用，可以查看Redis命令详解：keys一文中对SCAN命令的介绍。

今天我们主要从基本结构和源码方面来讨论scanning是如何工作的。

Redis 结构

Redis 采用哈希表作为基本实现，效率高，实现简单。说起哈希表，很多Java程序员的第一反应就是HashMap。没错，Redis的主键存储结构是类似HashMap的字符串+链表结构。第一维数组的大小为2n(n>=0)。每次扩展数组时，数组的长度都会增加一倍。命令

scan 遍历这个一维数组。每次返回的指针值也是该数组的索引。 limit参数表示遍历数组的多少个元素，并返回附加到这些元素的所有符合条件的结果。由于每个元素附加的链表大小不同，因此每次返回的结果数量也不同。

SCAN遍历顺序

至于scan命令的遍历顺序，我们可以用一个小栗子来仔细看看。

127.0.0.1:6379> keys *
1) "db_number"
2) "key1"
3) "myKey"
127.0.0.1:6379> scan 0 MATCH * COUNT 1
1) "2"
2) 1) "db_number"
127.0.0.1:6379> scan 2 MATCH * COUNT 1
1) "1"
2) 1) "myKey"
127.0.0.1:6379> scan 1 MATCH * COUNT 1
1) "3"
2) 1) "key1"
127.0.0.1:6379> scan 3 MATCH * COUNT 1
1) "0"
2) (empty list or set)
复制代码

在Redis中我们有3个键，每次我们只遍历一维数组中的元素。如上图，SCAN命令的遍历顺序是

0->2->1->3

这个顺序看起来有点奇怪。如果我们把它转换成二进制就更容易理解了。

00->10->01->11

我们发现这个序列的高位每次加1，普通的二进制加法是从右到左带进位。并且这个序列是从左到右添加和传输的。 redis源码也证实了这一点。

在文件dict.c的dictScan函数中对指针进行如下处理

v = rev(v);
v++;
v = rev(v);
复制代码

即指针反转，加一，然后再次反转，称为“指针加1”的操作高位”。

这里你可能会有疑问为什么我们要使用这种遍历顺序而不是通常的0,1,2...这是因为我们需要在遍历的情况下考虑字典的扩展和收缩（我有佩服开发者处理问题的全面性）。

我们看看SCAN遍历过程中发生扩容时遍历会怎样，我们原来的数组有4个元素，也就是说索引是2位数字。目前，我们需要将其扩展为3位，并再次重新格式化。

原来放在xx下的所有元素都被分配为0xx和1xx。上图中，当我们即将跨越10时，又重复了该语句。此时，扫描命令会从010、000、100开始交叉（原来00附加的元素），不会再交叉。

我们来看看宫缩情况。假设字典从 3 位数字缩小到 2 位数字。当要遍历110时，dict收缩，扫描会遍历10，此时放置在010以下的元素会被多次遍历，但010之前的元素不会被多次遍历。因此，在缩减过程中，仍然可能会出现一些重复的元素。

Redis重生

重复是一个相对复杂的过程。为了不阻塞Redis进程，它采用了渐进式重新处理机制。

/* 字典 */
typedef struct dict {
    // 类型特定函数
    dictType *type;
    // 私有数据
    void *privdata;
    // 哈希表
    dictht ht[2];
    // rehash 索引
    // 当 rehash 不在进行时，值为 -1
    int rehashidx; /* rehashing not in progress if rehashidx == -1 */
    // 目前正在运行的安全迭代器的数量
    int iterators; /* number of iterators currently running */
} dict;
复制代码

Redis字典结构中有两张哈希表，一张新表和一张旧表。在重新处理过程中，redis逐渐将旧表中的元素迁移到新表中。接下来我们看一下dict变换操作的源码。

/* Performs N steps of incremental rehashing. Returns 1 if there are still
 * keys to move from the old to the new hash table, otherwise 0 is returned.
 *
 * Note that a rehashing step consists in moving a bucket (that may have more
 * than one key as we use chaining) from the old to the new hash table, however
 * since part of the hash table may be composed of empty spaces, it is not
 * guaranteed that this function will rehash even a single bucket, since it
 * will visit at max N*10 empty buckets in total, otherwise the amount of
 * work it does would be unbound and the function may block for a long time. */
int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d)) return 0;

    while(n-- && d->ht[0].used != 0) {
        dictEntry *de, *nextde;

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        while(d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            uint64_t h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    /* Check if we already rehashed the whole table... */
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table);
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = -1;
        return 0;
    }

    /* More to rehash... */
    return 1;
}
复制代码

从评论中我们可以看出，再处理过程是以桶为基本单位进行迁移的。所谓桶，其实就是我们前面提到的一维数组的一个元素。一次迁移一个列表。我们来解释一下这段代码。