Redis Data Structure Source Code Analysis (3): Skip List

Concept

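A skip list works like a multi-level linked list. A search starts at the highest level. Whenever the next node's value is greater than the target, or the next node is NULL, the search drops down one level. By trading space for time, the skip list brings the average cost of a lookup down to O(log n).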

Illustration


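For example, to search for 51:

Start at the first (top) level and move to the second node. The node after it is NULL.

Drop to the second level and move forward to the fourth node. The next node is 61, which is greater than 51.

Drop to the third level and move forward to the sixth node, where the search ends. That is four lookups in total, two fewer than a plain traversal. With large data sets the savings become much more pronounced.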
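The figure's exact node layout is not reproduced in the text, so the minimal sketch below uses its own hypothetical scores and node heights to show the same top-down search. SketchNode, sketch_link, sketch_search and SKETCH_MAXLEVEL are illustrative names, not Redis APIs. The real Redis lookup also compares the member (ele) when scores are equal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed maximum height for this sketch (Redis uses ZSKIPLIST_MAXLEVEL). */
#define SKETCH_MAXLEVEL 4

typedef struct SketchNode {
    int score;                                   /* simplified: integer key only */
    int level;                                   /* node occupies levels 0 .. level-1 */
    struct SketchNode *forward[SKETCH_MAXLEVEL]; /* next node on each level */
} SketchNode;

/* Link nodes that are already sorted by score behind header.
 * The caller chooses each node's level, so the shape is deterministic. */
void sketch_link(SketchNode *header, SketchNode *nodes, int n) {
    SketchNode *last[SKETCH_MAXLEVEL];
    memset(header->forward, 0, sizeof(header->forward));
    for (int i = 0; i < SKETCH_MAXLEVEL; i++)
        last[i] = header;
    for (int k = 0; k < n; k++) {
        memset(nodes[k].forward, 0, sizeof(nodes[k].forward));
        for (int i = 0; i < nodes[k].level; i++) {
            last[i]->forward[i] = &nodes[k];  /* append to the end of level i */
            last[i] = &nodes[k];
        }
    }
}

/* Top-down search: on each level, move right while the next score is smaller
 * than target; once the next node is NULL or not smaller, drop one level.
 * *steps counts the rightward moves. Returns the matching node or NULL. */
SketchNode *sketch_search(SketchNode *header, int target, int *steps) {
    SketchNode *x = header;
    *steps = 0;
    for (int i = SKETCH_MAXLEVEL - 1; i >= 0; i--) {
        while (x->forward[i] && x->forward[i]->score < target) {
            x = x->forward[i];
            (*steps)++;
        }
    }
    x = x->forward[0];  /* first node with score >= target, or NULL */
    return (x && x->score == target) ? x : NULL;
}
```

Take scores 1, 7, 12, 25, 37, 51, 61 at heights 1, 3, 1, 2, 1, 3, 1. Searching for 51 then makes 3 forward moves (to 7, 25 and 37). A scan of the bottom level alone would need 5.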

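Core concepts

- Header: maintains the skip list's node pointers.
- Skip list node: stores the element value and multiple levels.
- Level: holds pointers to other elements. A pointer on a higher level skips over at least as many elements as a pointer on a lower level. For efficient lookups, the program always starts at the highest level and descends as the range of candidate values narrows.
- Tail: consists entirely of NULL pointers and marks the end of the skip list.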

Basic structure

zskiplistNode

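typedef struct zskiplistNode {
    // member (an SDS string)
    sds ele;
    // score used for ordering
    double score;
    // backward pointer: the previous node on the bottom level (level 0)
    struct zskiplistNode *backward;
    struct zskiplistLevel {
        struct zskiplistNode *forward; // next node on this level
        unsigned long span;            // rank distance from this node to the node forward points to
    } level[]; // flexible array member: one entry per level of this node
} zskiplistNode;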

zskiplist

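typedef struct zskiplist {
    struct zskiplistNode *header, *tail; // header node and last node
    unsigned long length; // number of nodes (the header is not counted)
    int level; // current height of the list: the largest node level, at least 1
} zskiplist;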

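The header is a special node of the skip list. Its level array has ZSKIPLIST_MAXLEVEL entries. The header stores no member or score (ele is NULL and score is 0), and it is not counted in the skip list's length. On initialization, every level entry's forward pointer is NULL and every span is 0.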

Creating a skip list

zslCreate

zskiplist *zslCreate(void) {
    int j;
    zskiplist *zsl;

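    zsl = zmalloc(sizeof(*zsl)); // allocate the list structure
    zsl->level = 1; // a new list starts with a height of 1
    zsl->length = 0; // and contains no nodes
    zsl->header = zslCreateNode(ZSKIPLIST_MAXLEVEL,0,NULL); // header with the maximum number of levels
    // initialize all ZSKIPLIST_MAXLEVEL levels of the header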
    for (j = 0; j < ZSKIPLIST_MAXLEVEL; j++) {
        zsl->header->level[j].forward = NULL;
        zsl->header->level[j].span = 0;
    }
    zsl->header->backward = NULL;
    zsl->tail = NULL;
    return zsl;
}

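ZSKIPLIST_MAXLEVEL is the maximum level of a skip list. Its value varies across Redis versions (32 in some, 64 in others). With ZSKIPLIST_P = 1/4, 32 levels are already enough for about 2^64 elements.
zslCreate creates the initial header node with ZSKIPLIST_MAXLEVEL levels, all of which start out empty.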
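The capacity claim can be justified roughly from the promotion probability. Assume p = ZSKIPLIST_P = 1/4 (the value Redis defines). A node reaches level k or higher with probability p^(k-1), so among n nodes the expected count at those levels is n p^(k-1). Levels beyond about 1 + log_{1/p} n are therefore expected to be empty, and for p = 1/4 and n = 2^64 that is about 33. A cap of 32 levels is already in the right range.

```latex
\Pr[\text{level} \ge k] = p^{k-1}, \qquad
\mathbb{E}\big[\#\{\text{nodes with level} \ge k\}\big] = n\,p^{k-1}

n\,p^{k-1} \ge 1 \iff k \le 1 + \log_{1/p} n, \qquad
\log_{4} 2^{64} = 32 \quad \left(p = \tfrac{1}{4}\right)
```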

Creating a node

zslCreateNode

/* Create a skiplist node with the specified number of levels.
 * The SDS string 'ele' is referenced by the node after the call. */
zskiplistNode *zslCreateNode(int level, double score, sds ele) {
    zskiplistNode *zn =
        zmalloc(sizeof(*zn)+level*sizeof(struct zskiplistLevel));
    zn->score = score;
    zn->ele = ele;
    return zn;
}

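As in the other constructors, memory is allocated first and the fields are assigned afterwards. A single zmalloc call covers both the fixed fields and the variable-length level array (level entries of struct zskiplistLevel). The level entries are not initialized here; zslCreate and zslInsert fill them in.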
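To see zslCreateNode and zslCreate together outside the Redis code base, here is a minimal sketch. It assumes plain malloc instead of zmalloc, a score-only node (no sds member) and a hypothetical SKETCH_MAXLEVEL of 32. SkNode, SkList, sk_create_node and sk_create are illustrative names. It shows the flexible array member trick: one allocation of sizeof(*n) plus level array entries.

```c
#include <stdlib.h>

/* Hypothetical stand-in: ZSKIPLIST_MAXLEVEL's real value depends on the Redis version. */
#define SKETCH_MAXLEVEL 32

typedef struct SkNode {
    double score;                      /* sketch: no sds member */
    struct SkNode *backward;
    struct SkLevel {
        struct SkNode *forward;
        unsigned long span;
    } level[];                         /* flexible array member, sized at allocation */
} SkNode;

typedef struct SkList {
    SkNode *header, *tail;
    unsigned long length;
    int level;
} SkList;

/* One allocation holds the fixed fields plus `level` entries of the array.
 * Like zslCreateNode, the level entries are left for the caller to fill. */
SkNode *sk_create_node(int level, double score) {
    SkNode *n = malloc(sizeof(*n) + level * sizeof(struct SkLevel));
    if (n == NULL) return NULL;
    n->score = score;
    return n;
}

/* Mirrors zslCreate: a header with the maximum height, every level empty. */
SkList *sk_create(void) {
    SkList *zsl = malloc(sizeof(*zsl));
    if (zsl == NULL) return NULL;
    zsl->level = 1;
    zsl->length = 0;
    zsl->header = sk_create_node(SKETCH_MAXLEVEL, 0);
    if (zsl->header == NULL) { free(zsl); return NULL; }
    for (int j = 0; j < SKETCH_MAXLEVEL; j++) {
        zsl->header->level[j].forward = NULL;
        zsl->header->level[j].span = 0;
    }
    zsl->header->backward = NULL;
    zsl->tail = NULL;
    return zsl;
}
```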

Random level

zslRandomLevel

int zslRandomLevel(void) {
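    int level = 1; // every node has at least one level
    // keep adding a level with probability ZSKIPLIST_P (0.25 in Redis)
    while ((random()&0xFFFF) < (ZSKIPLIST_P * 0xFFFF))
        level += 1;
    return (level<ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL; // cap at the maximum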
}

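The header node is allocated at the maximum height (ZSKIPLIST_MAXLEVEL), but if every element were that tall, the skip list would be pointless. zslRandomLevel decides the height of a new node and returns a value between 1 and ZSKIPLIST_MAXLEVEL. Each extra level is added with probability ZSKIPLIST_P (1/4 in Redis). A node therefore reaches level k or higher with probability (1/4)^(k-1), and has about 4/3 levels on average.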
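Here is a sketch of the same coin-flipping loop, assuming ZSKIPLIST_P = 0.25 and a hypothetical cap of 32. It swaps random()&0xFFFF for the standard C rand()/RAND_MAX so it compiles without POSIX extensions. sketch_random_level is an illustrative name.

```c
#include <stdlib.h>

/* Assumed values: Redis defines ZSKIPLIST_P as 0.25; the cap is version-dependent. */
#define SKETCH_MAXLEVEL 32
#define SKETCH_P 0.25

/* Same idea as zslRandomLevel: start at 1 and keep adding a level while a
 * biased coin (probability SKETCH_P) comes up heads, then cap the result. */
int sketch_random_level(void) {
    int level = 1;
    while (rand() < SKETCH_P * RAND_MAX)
        level += 1;
    return (level < SKETCH_MAXLEVEL) ? level : SKETCH_MAXLEVEL;
}
```

Over many calls, about a quarter of the results are 2 or more, and the mean is close to 1/(1 - p) = 4/3. Each node therefore carries about 1.33 forward pointers on average, which keeps the memory overhead modest.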

Inserting a node

Insertion takes four steps:

- Find the insertion position
- Adjust the height
- Insert the node
- Adjust the backward pointers

zslInsert

zskiplistNode *zslInsert(zskiplist *zsl, double score, sds ele) {  
    zskiplistNode *update[ZSKIPLIST_MAXLEVEL], *x;
    unsigned int rank[ZSKIPLIST_MAXLEVEL];
    int i, level;

    serverAssert(!isnan(score));
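    // Step 1: find the insertion position. update[i] records the last node
    // on level i that precedes it, and rank[i] the rank of that node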
    x = zsl->header;
    for (i = zsl->level-1; i >= 0; i--) {
        /* store rank that is crossed to reach the insert position */
        rank[i] = i == (zsl->level-1) ? 0 : rank[i+1];
        while (x->level[i].forward &&
                (x->level[i].forward->score < score ||
                    (x->level[i].forward->score == score &&
                    sdscmp(x->level[i].forward->ele,ele) < 0)))
        {
            rank[i] += x->level[i].span;
            x = x->level[i].forward;
        }
        update[i] = x;
    }
  
    // Step 2: pick a random level; levels above the current height start at the header
    level = zslRandomLevel();
    if (level > zsl->level) {
        for (i = zsl->level; i < level; i++) {
            rank[i] = 0;
            update[i] = zsl->header;
            update[i]->level[i].span = zsl->length;
        }
        zsl->level = level;
    }
    x = zslCreateNode(level,score,ele);
    // Step 3: splice x into each of its levels and split the spans
    for (i = 0; i < level; i++) {
        x->level[i].forward = update[i]->level[i].forward;
        update[i]->level[i].forward = x;

        /* update span covered by update[i] as x is inserted here */
        x->level[i].span = update[i]->level[i].span - (rank[0] - rank[i]);
        update[i]->level[i].span = (rank[0] - rank[i]) + 1;
    }

    /* increment span for untouched levels */
    for (i = level; i < zsl->level; i++) {
        update[i]->level[i].span++;
    }

    // Step 4: fix the backward pointers (a node right after the header gets NULL) and the tail
    x->backward = (update[0] == zsl->header) ? NULL : update[0];
    if (x->level[0].forward)
        x->level[0].forward->backward = x;
    else
        zsl->tail = x;
    zsl->length++;
    return x;
}
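The four steps can be exercised in isolation with the simplified sketch below. It assumes double scores only (no sds member, so no sdscmp tie-break), rand() in place of random(), and no allocation-failure handling. All names (SkNode, SkList, sk_create, sk_insert, sk_rank) are hypothetical. sk_rank sums spans along the search path, which is what the span bookkeeping buys: rank lookups also take O(log n) on average.

```c
#include <stdlib.h>

/* Hypothetical simplified sketch: double scores only (no sds/sdscmp tie-break),
 * rand() instead of random(), and no allocation-failure handling. */
#define SK_MAXLEVEL 32
#define SK_P 0.25

typedef struct SkNode {
    double score;
    struct SkNode *backward;
    struct SkLevel { struct SkNode *forward; unsigned long span; } level[];
} SkNode;

typedef struct SkList {
    SkNode *header, *tail;
    unsigned long length;
    int level;
} SkList;

static SkNode *sk_node(int level, double score) {
    SkNode *n = malloc(sizeof(*n) + level * sizeof(struct SkLevel));
    n->score = score;
    n->backward = NULL;
    return n;
}

SkList *sk_create(void) {
    SkList *zsl = malloc(sizeof(*zsl));
    zsl->level = 1;
    zsl->length = 0;
    zsl->tail = NULL;
    zsl->header = sk_node(SK_MAXLEVEL, 0);
    for (int j = 0; j < SK_MAXLEVEL; j++) {
        zsl->header->level[j].forward = NULL;
        zsl->header->level[j].span = 0;
    }
    return zsl;
}

static int sk_random_level(void) {
    int level = 1;
    while (rand() < SK_P * RAND_MAX) level++;
    return level < SK_MAXLEVEL ? level : SK_MAXLEVEL;
}

SkNode *sk_insert(SkList *zsl, double score) {
    SkNode *update[SK_MAXLEVEL], *x = zsl->header;
    unsigned long rank[SK_MAXLEVEL];
    int i, level;

    /* 1. Find the insertion point; rank[i] is the rank of update[i]. */
    for (i = zsl->level - 1; i >= 0; i--) {
        rank[i] = (i == zsl->level - 1) ? 0 : rank[i + 1];
        while (x->level[i].forward && x->level[i].forward->score < score) {
            rank[i] += x->level[i].span;
            x = x->level[i].forward;
        }
        update[i] = x;
    }
    /* 2. Raise the list height if the new node is taller than the list. */
    level = sk_random_level();
    if (level > zsl->level) {
        for (i = zsl->level; i < level; i++) {
            rank[i] = 0;
            update[i] = zsl->header;
            update[i]->level[i].span = zsl->length;
        }
        zsl->level = level;
    }
    /* 3. Splice the node in on each of its levels and split the spans. */
    x = sk_node(level, score);
    for (i = 0; i < level; i++) {
        x->level[i].forward = update[i]->level[i].forward;
        update[i]->level[i].forward = x;
        x->level[i].span = update[i]->level[i].span - (rank[0] - rank[i]);
        update[i]->level[i].span = (rank[0] - rank[i]) + 1;
    }
    for (i = level; i < zsl->level; i++)
        update[i]->level[i].span++;  /* higher links now jump over one more node */
    /* 4. Fix backward pointers and the tail. */
    x->backward = (update[0] == zsl->header) ? NULL : update[0];
    if (x->level[0].forward) x->level[0].forward->backward = x;
    else zsl->tail = x;
    zsl->length++;
    return x;
}

/* 1-based rank of the first node with this score (0 if absent): sum the spans. */
unsigned long sk_rank(const SkList *zsl, double score) {
    const SkNode *x = zsl->header;
    unsigned long rank = 0;
    for (int i = zsl->level - 1; i >= 0; i--)
        while (x->level[i].forward && x->level[i].forward->score < score) {
            rank += x->level[i].span;
            x = x->level[i].forward;
        }
    x = x->level[0].forward;
    return (x && x->score == score) ? rank + 1 : 0;
}
```

A quick way to check the span updates is to insert a permutation of 1..200 and confirm that every value's rank equals the value itself.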

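Deleting a node

Deletion is comparatively simple:
- As with insertion and lookup, first locate the node to delete.
- Unlink the node. This is essentially deleting an element from a linked list on each level, while also adjusting the spans, the next node's backward pointer (or the tail) and the length.
- If the deleted node was the only node on the highest level(s), lower the list's level until the top level is non-empty; otherwise nothing else needs to be done.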
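The three steps can be traced with the simplified sketch below. To stay short and self-contained, it makes several assumptions:
- a hypothetical SkNode with a fixed-size level array and a stored height;
- a test-only helper sk_build that links nodes with preset heights and Redis-style spans;
- scores without sds members.

The real zslDelete also compares ele and can free the node. All names are illustrative.

```c
#include <stddef.h>

/* Hypothetical simplified sketch: fixed-size level array, a stored height,
 * and double scores only (the real zslDelete also compares the sds member). */
#define SK_MAXLEVEL 8

typedef struct SkNode {
    double score;
    int height;                 /* levels this node occupies (sketch only) */
    struct SkNode *backward;
    struct { struct SkNode *forward; unsigned long span; } level[SK_MAXLEVEL];
} SkNode;

typedef struct SkList {
    SkNode *header, *tail;
    unsigned long length;
    int level;
} SkList;

/* Test helper: link nodes sorted by score, using their preset heights.
 * As in Redis, a link to NULL spans the number of nodes after it. */
void sk_build(SkList *zsl, SkNode *header, SkNode *nodes, int n) {
    SkNode *last[SK_MAXLEVEL];
    unsigned long lastRank[SK_MAXLEVEL];
    zsl->header = header;
    zsl->level = 1;
    for (int i = 0; i < SK_MAXLEVEL; i++) {
        last[i] = header;
        lastRank[i] = 0;
        header->level[i].forward = NULL;
        header->level[i].span = 0;
    }
    for (int k = 0; k < n; k++) {
        SkNode *x = &nodes[k];
        x->backward = (k == 0) ? NULL : &nodes[k - 1];
        for (int i = 0; i < x->height; i++) {
            last[i]->level[i].forward = x;
            last[i]->level[i].span = (unsigned long)(k + 1) - lastRank[i];
            last[i] = x;
            lastRank[i] = (unsigned long)(k + 1);
        }
        if (x->height > zsl->level) zsl->level = x->height;
    }
    for (int i = 0; i < zsl->level; i++) {
        last[i]->level[i].forward = NULL;
        last[i]->level[i].span = (unsigned long)n - lastRank[i];
    }
    zsl->tail = (n > 0) ? &nodes[n - 1] : NULL;
    zsl->length = (unsigned long)n;
}

/* Remove the first node with this score; returns 1 if a node was removed. */
int sk_delete(SkList *zsl, double score) {
    SkNode *update[SK_MAXLEVEL], *x = zsl->header;
    int i;
    /* Step 1: locate the predecessor on every level. */
    for (i = zsl->level - 1; i >= 0; i--) {
        while (x->level[i].forward && x->level[i].forward->score < score)
            x = x->level[i].forward;
        update[i] = x;
    }
    x = x->level[0].forward;
    if (x == NULL || x->score != score) return 0;
    /* Step 2: unlink on each level like a linked list, fixing spans. */
    for (i = 0; i < zsl->level; i++) {
        if (update[i]->level[i].forward == x) {
            update[i]->level[i].span += x->level[i].span - 1;
            update[i]->level[i].forward = x->level[i].forward;
        } else {
            update[i]->level[i].span -= 1;  /* x lay underneath this link */
        }
    }
    if (x->level[0].forward) x->level[0].forward->backward = x->backward;
    else zsl->tail = x->backward;
    /* Step 3: drop empty top levels. */
    while (zsl->level > 1 && zsl->header->level[zsl->level - 1].forward == NULL)
        zsl->level--;
    zsl->length--;
    return 1;
}

/* 1-based rank of the first node with this score (0 if absent). */
unsigned long sk_rank(const SkList *zsl, double score) {
    const SkNode *x = zsl->header;
    unsigned long rank = 0;
    for (int i = zsl->level - 1; i >= 0; i--)
        while (x->level[i].forward && x->level[i].forward->score < score) {
            rank += x->level[i].span;
            x = x->level[i].forward;
        }
    x = x->level[0].forward;
    return (x && x->score == score) ? rank + 1 : 0;
}
```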

Miscellaneous

Why not use a red-black tree?

Quoting the original author of Redis:

There are a few reasons: They are not very memory intensive. It’s up to you basically. Changing parameters about the probability of a node to have a given number of levels will make then less memory intensive than btrees.
A sorted set is often target of many ZRANGE or ZREVRANGE operations, that is, traversing the skip list as a linked list. With this operation the cache locality of skip lists is at least as good as with other kind of balanced trees.
They are simpler to implement, debug, and so forth. For instance thanks to the skip list simplicity I received a patch (already in Redis master) with augmented skip lists implementing ZRANK in O(log(N)). It required little changes to the code.

About the Append Only durability & speed, I don’t think it is a good idea to optimize Redis at cost of more code and more complexity for a use case that IMHO should be rare for the Redis target (fsync() at every command). Almost no one is using this feature even with ACID SQL databases, as the performance hint is big anyway.

About threads: our experience shows that Redis is mostly I/O bound. I’m using threads to serve things from Virtual Memory. The long term solution to exploit all the cores, assuming your link is so fast that you can saturate a single core, is running multiple instances of Redis (no locks, almost fully scalable linearly with number of cores), and using the “Redis Cluster” solution that I plan to develop in the future.

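In short:
- Skip lists are not very memory-intensive, and their memory use can be tuned by changing the probability that a node gets additional levels.
- ZRANGE and ZREVRANGE traverse the skip list as a linked list. For that access pattern, a skip list's cache locality is at least as good as a balanced tree's.
- Skip lists are simpler to implement and debug than red-black trees. For example, adding ZRANK in O(log N) took only small code changes.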

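Summary

Redis stores sorted set data in a skip list. Through probabilistic balancing, the skip list achieves access efficiency close to that of a balanced p-ary tree.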
