Using ziplist, Redis's purpose-built compact list
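Posing the problem
Have you used Python lists? The kind of data structure that can hold values of any type and supports random access.
If you haven't, well, there is not much I can do about that.
In principle such a list can be built on top of either an array or a linked list. (CPython's list, for the record, is a dynamic array of object pointers.)
Redis's compact list encoding, however, is neither a classic linked list nor a plain array, and the reasons are easy to see. An array of mixed-size elements is awkward: make it two-dimensional? Use flexible array members? How would you even write it? A linked list does work: make it generic, put a void* in the data field, and you have a list. But its drawback is just as obvious: it tends to fragment memory.
Against that background, and following the guiding principle of "save wherever you can", please design a data structure.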
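Structure design
One thing to note in this figure: the right-hand side does not record "the size of the current element".
The figure is detailed enough that I don't have to explain every field. Nice.
As for the rest, the comment at the top of the file (ziplist.c) explains it clearly.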
/* The ziplist is a specially encoded dually linked list that is designed
 * to be very memory efficient. It stores both strings and integer values,
 * where integers are encoded as actual integers instead of a series of
 * characters. It allows push and pop operations on either side of the list
 * in O(1) time. However, because every operation requires a reallocation of
 * the memory used by the ziplist, the actual complexity is related to the
 * amount of memory used by the ziplist.
 *
 * ----------------------------------------------------------------------------
 *
 * ZIPLIST OVERALL LAYOUT
 * ======================
 *
 * The general layout of the ziplist is as follows:
 *
 * <zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>
 *
 * NOTE: all fields are stored in little endian, if not specified otherwise.
 *
 * <uint32_t zlbytes> is an unsigned integer to hold the number of bytes that
 * the ziplist occupies, including the four bytes of the zlbytes field itself.
 * This value needs to be stored to be able to resize the entire structure
 * without the need to traverse it first.
 *
 * <uint32_t zltail> is the offset to the last entry in the list. This allows
 * a pop operation on the far side of the list without the need for full
 * traversal.
 *
 * <uint16_t zllen> is the number of entries. When there are more than
 * 2^16-2 entries, this value is set to 2^16-1 and we need to traverse the
 * entire list to know how many items it holds.
 *
 * <uint8_t zlend> is a special entry representing the end of the ziplist.
 * Is encoded as a single byte equal to 255. No other normal entry starts
 * with a byte set to the value of 255.
 *
 * ZIPLIST ENTRIES
 * ===============
 *
 * Every entry in the ziplist is prefixed by metadata that contains two pieces
 * of information. First, the length of the previous entry is stored to be
 * able to traverse the list from back to front. Second, the entry encoding is
 * provided. It represents the entry type, integer or string, and in the case
 * of strings it also represents the length of the string payload.
 * So a complete entry is stored like this:
 *
 * <prevlen> <encoding> <entry-data>
 *
 * Sometimes the encoding represents the entry itself, like for small integers
 * as we'll see later. In such a case the <entry-data> part is missing, and we
 * could have just:
 *
 * <prevlen> <encoding>
 *
 * The length of the previous entry, <prevlen>, is encoded in the following way:
 * If this length is smaller than 254 bytes, it will only consume a single
 * byte representing the length as an unsigned 8 bit integer. When the length
 * is greater than or equal to 254, it will consume 5 bytes. The first byte is
 * set to 254 (FE) to indicate a larger value is following. The remaining 4
 * bytes take the length of the previous entry as value.
 *
 * So practically an entry is encoded in the following way:
 *
 * <prevlen from 0 to 253> <encoding> <entry>
 *
 * Or alternatively if the previous entry length is greater than 253 bytes
 * the following encoding is used:
 *
 * 0xFE <4 bytes unsigned little endian prevlen> <encoding> <entry>
 *
 * The encoding field of the entry depends on the content of the
 * entry. When the entry is a string, the first 2 bits of the encoding first
 * byte will hold the type of encoding used to store the length of the string,
 * followed by the actual length of the string. When the entry is an integer
 * the first 2 bits are both set to 1. The following 2 bits are used to specify
 * what kind of integer will be stored after this header. An overview of the
 * different types and encodings is as follows. The first byte is always enough
 * to determine the kind of entry.
 *
 * |00pppppp| - 1 byte
 *      String value with length less than or equal to 63 bytes (6 bits).
 *      "pppppp" represents the unsigned 6 bit length.
 * |01pppppp|qqqqqqqq| - 2 bytes
 *      String value with length less than or equal to 16383 bytes (14 bits).
 *      IMPORTANT: The 14 bit number is stored in big endian.
 * |10000000|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes
 *      String value with length greater than or equal to 16384 bytes.
 *      Only the 4 bytes following the first byte represents the length
 *      up to 2^32-1. The 6 lower bits of the first byte are not used and
 *      are set to zero.
 *      IMPORTANT: The 32 bit number is stored in big endian.
 * |11000000| - 3 bytes
 *      Integer encoded as int16_t (2 bytes).
 * |11010000| - 5 bytes
 *      Integer encoded as int32_t (4 bytes).
 * |11100000| - 9 bytes
 *      Integer encoded as int64_t (8 bytes).
 * |11110000| - 4 bytes
 *      Integer encoded as 24 bit signed (3 bytes).
 * |11111110| - 2 bytes
 *      Integer encoded as 8 bit signed (1 byte).
 * |1111xxxx| - (with xxxx between 0000 and 1101) immediate 4 bit integer.
 *      Unsigned integer from 0 to 12. The encoded value is actually from
 *      1 to 13 because 0000 and 1111 can not be used, so 1 should be
 *      subtracted from the encoded 4 bit value to obtain the right value.
 * |11111111| - End of ziplist special entry.
 *
 * Like for the ziplist header, all the integers are represented in little
 * endian byte order, even when this code is compiled in big endian systems.
 *
 * EXAMPLES OF ACTUAL ZIPLISTS
 * ===========================
 *
 * The following is a ziplist containing the two elements representing
 * the strings "2" and "5". It is composed of 15 bytes, that we visually
 * split into sections:
 *
 *  [0f 00 00 00] [0c 00 00 00] [02 00] [00 f3] [02 f6] [ff]
 *        |             |          |       |       |     |
 *     zlbytes        zltail    entries   "2"     "5"   end
 *
 * The first 4 bytes represent the number 15, that is the number of bytes
 * the whole ziplist is composed of. The second 4 bytes are the offset
 * at which the last ziplist entry is found, that is 12, in fact the
 * last entry, that is "5", is at offset 12 inside the ziplist.
 * The next 16 bit integer represents the number of elements inside the
 * ziplist, its value is 2 since there are just two elements inside.
 * Finally "00 f3" is the first entry representing the number 2. It is
 * composed of the previous entry length, which is zero because this is
 * our first entry, and the byte F3 which corresponds to the encoding
 * |1111xxxx| with xxxx between 0001 and 1101. We need to remove the "F"
 * higher order bits 1111, and subtract 1 from the "3", so the entry value
 * is "2". The next entry has a prevlen of 02, since the first entry is
 * composed of exactly two bytes. The entry itself, F6, is encoded exactly
 * like the first entry, and 6-1 = 5, so the value of the entry is 5.
 * Finally the special entry FF signals the end of the ziplist.
 *
 * Adding another element to the above string with the value "Hello World"
 * allows us to show how the ziplist encodes small strings. We'll just show
 * the hex dump of the entry itself. Imagine the bytes as following the
 * entry that stores "5" in the ziplist above:
 *
 * [02] [0b] [48 65 6c 6c 6f 20 57 6f 72 6c 64]
 *
 * The first byte, 02, is the length of the previous entry. The next
 * byte represents the encoding in the pattern |00pppppp| that means
 * that the entry is a string of length <pppppp>, so 0B means that
 * an 11 bytes string follows. From the third byte (48) to the last (64)
 * there are just the ASCII characters for "Hello World".
 *
 * ----------------------------------------------------------------------------
 *
 * Copyright (c) 2009-2012, Pieter Noordhuis <pcnoordhuis at gmail dot com>
 * Copyright (c) 2009-2017, Salvatore Sanfilippo <antirez at gmail dot com>
 * All rights reserved.
 */
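To make the header layout concrete, here is a minimal sketch that reads the three header fields out of the 15-byte example from the comment. The names zl_header, zl_read_header, read_le32 and read_le16 are my own, made up for illustration; Redis itself uses macros such as ZIPLIST_BYTES and ZIPLIST_TAIL_OFFSET.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read little-endian integers byte by byte, so the result does not depend
 * on the endianness of the machine running the code. */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t read_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical struct for illustration; the real ziplist is a raw byte
 * array and never materialises its header like this. */
typedef struct {
    uint32_t zlbytes; /* total size of the ziplist in bytes */
    uint32_t zltail;  /* byte offset of the last entry */
    uint16_t zllen;   /* number of entries, 65535 means "count by walking" */
} zl_header;

/* The header is 4 + 4 + 2 = 10 bytes, so the first entry sits at offset 10. */
static zl_header zl_read_header(const unsigned char *zl) {
    assert(zl != NULL);
    zl_header h;
    h.zlbytes = read_le32(zl);
    h.zltail = read_le32(zl + 4);
    h.zllen = read_le16(zl + 8);
    return h;
}
```

Fed the bytes 0f 00 00 00 0c 00 00 00 02 00 00 f3 02 f6 ff, this gives zlbytes 15, zltail 12 and zllen 2, which matches the walkthrough in the comment.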
Finished reading? Next come the basic operations. For any data structure they boil down to insert, delete, look up and update.
The actual entry
typedef struct zlentry {
    unsigned int prevrawlensize; /* Bytes used to encode the previous entry len*/
    unsigned int prevrawlen;     /* Previous entry len. */
    unsigned int lensize;        /* Bytes used to encode this entry type/len.
                                    For example strings have a 1, 2 or 5 bytes
                                    header. Integers always use a single byte.*/
    unsigned int len;            /* Bytes used to represent the actual entry.
                                    For strings this is just the string length
                                    while for integers it is 1, 2, 3, 4, 8 or
                                    0 (for 4 bit immediate) depending on the
                                    number range. */
    unsigned int headersize;     /* prevrawlensize + lensize. */
    unsigned char encoding;      /* Set to ZIP_STR_* or ZIP_INT_* depending on
                                    the entry encoding. However for 4 bits
                                    immediate integers this can assume a range
                                    of values and must be range-checked. */
    unsigned char *p;            /* Pointer to the very start of the entry, that
                                    is, this points to prev-entry-len field. */
} zlentry;
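The struct is only a decoded view of the bytes, so it may help to see how its fields could be filled in from a raw entry. The following is a rough sketch based purely on the encoding table in the header comment; entry_info and decode_entry are hypothetical names of mine (the real work is done by zipEntry and its macros), and no bounds checking is attempted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for zlentry, with the same field meanings. */
typedef struct {
    unsigned int prevrawlensize, prevrawlen;
    unsigned int lensize, len;
    unsigned int headersize;
    unsigned char encoding;
} entry_info;

/* Decode the metadata of the entry starting at p.
 * Returns 0 on success, -1 for the end marker or an unknown encoding. */
static int decode_entry(const unsigned char *p, entry_info *e) {
    assert(p != NULL && e != NULL);
    if (p[0] == 0xFF) return -1; /* zlend is not an entry */

    /* <prevlen>: one byte below 254, otherwise 0xFE plus 4 bytes LE. */
    if (p[0] < 0xFE) {
        e->prevrawlensize = 1;
        e->prevrawlen = p[0];
    } else {
        e->prevrawlensize = 5;
        e->prevrawlen = (unsigned int)p[1] | ((unsigned int)p[2] << 8) |
                        ((unsigned int)p[3] << 16) | ((unsigned int)p[4] << 24);
    }

    const unsigned char *q = p + e->prevrawlensize;
    unsigned char b = q[0];
    if ((b & 0xC0) == 0x00) {        /* |00pppppp| */
        e->encoding = 0x00; e->lensize = 1; e->len = b & 0x3F;
    } else if ((b & 0xC0) == 0x40) { /* |01pppppp|qqqqqqqq|, big endian */
        e->encoding = 0x40; e->lensize = 2;
        e->len = ((unsigned int)(b & 0x3F) << 8) | q[1];
    } else if ((b & 0xC0) == 0x80) { /* |10000000| + 4 bytes, big endian */
        e->encoding = 0x80; e->lensize = 5;
        e->len = ((unsigned int)q[1] << 24) | ((unsigned int)q[2] << 16) |
                 ((unsigned int)q[3] << 8) | (unsigned int)q[4];
    } else {                         /* integers: top two bits are 11 */
        e->encoding = b; e->lensize = 1;
        switch (b) {
        case 0xC0: e->len = 2; break; /* int16 */
        case 0xD0: e->len = 4; break; /* int32 */
        case 0xE0: e->len = 8; break; /* int64 */
        case 0xF0: e->len = 3; break; /* 24 bit */
        case 0xFE: e->len = 1; break; /* 8 bit */
        default:
            if (b >= 0xF1 && b <= 0xFD) e->len = 0; /* 4 bit immediate */
            else return -1;
        }
    }
    e->headersize = e->prevrawlensize + e->lensize;
    return 0;
}
```

The raw length of an entry is then headersize + len. For the "Hello World" entry from the comment that is 2 + 11 = 13 bytes; for the two small integers it is 2 + 0 = 2 bytes each.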
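Basic operations
I think this figure is worth showing again:
Note that the right-hand side of the figure does not record "the size of the current element".
Insert
This is the function that does the real insertion:
Honestly, it makes my scalp tingle. So we'll fall back on the usual approach and take it apart step by step.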
/* Insert item at "p". */
unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) {
    size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen;
    unsigned int prevlensize, prevlen = 0;
    size_t offset;
    int nextdiff = 0;
    unsigned char encoding = 0;
    long long value = 123456789; /* initialized to avoid warning. Using a value
                                    that is easy to see if for some reason
                                    we use it uninitialized. */
    zlentry tail;

    /* Find out prevlen for the entry that is inserted. */
    if (p[0] != ZIP_END) {
        ZIP_DECODE_PREVLEN(p, prevlensize, prevlen);
    } else {
        unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);
        if (ptail[0] != ZIP_END) {
            prevlen = zipRawEntryLength(ptail);
        }
    }

    /* See if the entry can be encoded */
    if (zipTryEncoding(s,slen,&value,&encoding)) {
        /* 'encoding' is set to the appropriate integer encoding */
        reqlen = zipIntSize(encoding);
    } else {
        /* 'encoding' is untouched, however zipStoreEntryEncoding will use the
         * string length to figure out how to encode it. */
        reqlen = slen;
    }
    /* We need space for both the length of the previous entry and
     * the length of the payload. */
    reqlen += zipStorePrevEntryLength(NULL,prevlen);
    reqlen += zipStoreEntryEncoding(NULL,encoding,slen);

    /* When the insert position is not equal to the tail, we need to
     * make sure that the next entry can hold this entry's length in
     * its prevlen field. */
    int forcelarge = 0;
    nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0;
    if (nextdiff == -4 && reqlen < 4) {
        nextdiff = 0;
        forcelarge = 1;
    }

    /* Store offset because a realloc may change the address of zl. */
    offset = p-zl;
    zl = ziplistResize(zl,curlen+reqlen+nextdiff);
    p = zl+offset;

    /* Apply memory move when necessary and update tail offset. */
    if (p[0] != ZIP_END) {
        /* Subtract one because of the ZIP_END bytes */
        memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff);

        /* Encode this entry's raw length in the next entry. */
        if (forcelarge)
            zipStorePrevEntryLengthLarge(p+reqlen,reqlen);
        else
            zipStorePrevEntryLength(p+reqlen,reqlen);

        /* Update offset for tail */
        ZIPLIST_TAIL_OFFSET(zl) =
            intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen);

        /* When the tail contains more than one entry, we need to take
         * "nextdiff" in account as well. Otherwise, a change in the
         * size of prevlen doesn't have an effect on the *tail* offset. */
        zipEntry(p+reqlen, &tail);
        if (p[reqlen+tail.headersize+tail.len] != ZIP_END) {
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
        }
    } else {
        /* This element will be the new tail. */
        ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl);
    }

    /* When nextdiff != 0, the raw length of the next entry has changed, so
     * we need to cascade the update throughout the ziplist */
    if (nextdiff != 0) {
        offset = p-zl;
        zl = __ziplistCascadeUpdate(zl,p+reqlen);
        p = zl+offset;
    }

    /* Write the entry */
    p += zipStorePrevEntryLength(p,prevlen);
    p += zipStoreEntryEncoding(p,encoding,slen);
    if (ZIP_IS_STR(encoding)) {
        memcpy(p,s,slen);
    } else {
        zipSaveInteger(p,value,encoding);
    }
    ZIPLIST_INCR_LENGTH(zl,1);
    return zl;
}
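How many steps does inserting into a "linked list" take?
1. Walk to the position
2. Put the node in
3. Stitch the links back together
This "list" is a bit special. Special how? It is compact, and there are really only two data types: integer or string. So what are its steps?
1. Re-encode the data
2. Parse the data and allocate space
3. Write the data in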
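Re-encoding
What does re-encoding mean? To insert an element you have to work out three things and write them together: the size of the previous element, the element's own size, and the element's encoding. That is what gets encoded.
There are only three possible insert positions: head, middle and tail.
Head: the previous-element size is 0, because nothing comes before it.
Middle: the new entry takes over the "previous element size" recorded by the entry currently at the insert position (ZIP_DECODE_PREVLEN). Afterwards the new entry's own size becomes the "previous element size" as seen by that following entry.
Tail: there is no following entry to read the value from, so the three fields of the current last entry have to be added up (zipRawEntryLength).
I won't go through the encoding details here; this post is already long enough.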
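Parsing the data
Next comes parsing the data.
First try to parse the data as an integer. If that works, it is stored using one of the ziplist integer encodings; if it fails, it is stored using a byte-array (string) encoding.
After parsing, the number is in value and the encoding is in encoding. If parsing succeeded, the number of bytes the integer occupies is computed as well (zipIntSize). The variable reqlen first holds the payload size of the new element; adding the sizes of the other two fields (prevlen and encoding) gives the total space this entry needs.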
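The size accounting described above can be sketched as follows. This only covers the arithmetic: it assumes the caller has already decided whether the input parsed as an integer, and the function names (int_payload_size, str_encoding_size, prevlen_size, required_len) are made up for this example rather than taken from ziplist.c. The thresholds come straight from the encoding table in the header comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Payload bytes for an integer value, smallest encoding first. */
static unsigned int int_payload_size(int64_t v) {
    if (v >= 0 && v <= 12) return 0;               /* 4 bit immediate */
    if (v >= INT8_MIN && v <= INT8_MAX) return 1;  /* 8 bit signed */
    if (v >= INT16_MIN && v <= INT16_MAX) return 2;
    if (v >= -8388608 && v <= 8388607) return 3;   /* 24 bit signed */
    if (v >= INT32_MIN && v <= INT32_MAX) return 4;
    return 8;
}

/* Bytes of the <encoding> field for a string payload of slen bytes. */
static unsigned int str_encoding_size(size_t slen) {
    if (slen <= 63) return 1;
    if (slen <= 16383) return 2;
    return 5;
}

/* Bytes of the <prevlen> field. */
static unsigned int prevlen_size(size_t prevlen) {
    return prevlen < 254 ? 1 : 5;
}

/* Total bytes the new entry needs: prevlen + encoding + payload.
 * is_int tells whether the caller managed to parse the input as an
 * integer; value is used only then, slen only otherwise. */
static size_t required_len(size_t prevlen, int is_int, int64_t value,
                           size_t slen) {
    assert(is_int == 0 || is_int == 1);
    size_t payload = is_int ? int_payload_size(value) : slen;
    size_t enc = is_int ? 1 : str_encoding_size(slen);
    return prevlen_size(prevlen) + enc + payload;
}
```

With these rules the first entry of the example ziplist (value 2, no predecessor) needs 1 + 1 + 0 = 2 bytes, and "Hello World" after a 2-byte entry needs 1 + 1 + 11 = 13 bytes, the same numbers as in the hex dumps.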
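Reallocating space
Judging by the comment, what, there might be no room to squeeze it in?
Let's take a look.
The space allocated here is not simply the size of the newly inserted data. If you skimmed the English comment above, now is a good time to go back and read the part about how an entry is laid out, in particular this bit: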
/*
 * The length of the previous entry, <prevlen>, is encoded in the following way:
 * If this length is smaller than 254 bytes, it will only consume a single
 * byte representing the length as an unsigned 8 bit integer. When the length
 * is greater than or equal to 254, it will consume 5 bytes. The first byte is
 * set to 254 (FE) to indicate a larger value is following. The remaining 4
 * bytes take the length of the previous entry as value.
 */
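As a small illustration of that rule, here is a sketch of writing and reading a prevlen field. store_prevlen and load_prevlen are names I made up; passing NULL to store_prevlen only returns the size, imitating the way zipStorePrevEntryLength(NULL, prevlen) is called in the insert function above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write len as a <prevlen> field at p and return the bytes used.
 * With p == NULL nothing is written and only the size is returned. */
static unsigned int store_prevlen(unsigned char *p, uint32_t len) {
    if (len < 254) {
        if (p != NULL) p[0] = (unsigned char)len;
        return 1;
    }
    if (p != NULL) {
        p[0] = 0xFE;                         /* marker: 4 more bytes follow */
        p[1] = (unsigned char)(len & 0xFF);  /* little endian */
        p[2] = (unsigned char)((len >> 8) & 0xFF);
        p[3] = (unsigned char)((len >> 16) & 0xFF);
        p[4] = (unsigned char)((len >> 24) & 0xFF);
    }
    return 5;
}

/* Read a <prevlen> field at p into *len and return the bytes it occupies. */
static unsigned int load_prevlen(const unsigned char *p, uint32_t *len) {
    assert(p != NULL && len != NULL);
    if (p[0] < 0xFE) {
        *len = p[0];
        return 1;
    }
    *len = (uint32_t)p[1] | ((uint32_t)p[2] << 8) |
           ((uint32_t)p[3] << 16) | ((uint32_t)p[4] << 24);
    return 5;
}
```

Note the jump at the boundary: a previous entry of 253 bytes costs one byte to record, one of 254 bytes costs five.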
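So prevlen is the uncertain factor. The prevlen fields of two neighbouring entries might be 1 byte and 1 byte, and after an insert in between one of them has to grow to 5 bytes; or they might have been 1 and 5, or 5 and 1. In short, you cannot tell in advance.
So after inserting at the position of entryX, the prevlen field of the entry that follows may keep its size, grow by four bytes, or shrink by four bytes. Since you cannot know, you measure it. That is all zipPrevLenByteDiff does, and its result is nextdiff.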
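The measurement itself is tiny. Below is a sketch of it, together with the forcelarge special case from the insert function. The function names are mine, not Redis's. My reading of the special case, which the source does not spell out, is that when the new entry is shorter than 4 bytes, shrinking the neighbour's field by 4 would leave the whole ziplist smaller than before the insert, so the code keeps the 5-byte field instead.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes a <prevlen> field needs to hold a given length. */
static int prevlen_field_size(size_t len) {
    return len < 254 ? 1 : 5;
}

/* Size change (0, +4 or -4) of the following entry's prevlen field once
 * the entry in front of it is reqlen bytes long. next points at the
 * first byte of the following entry. */
static int prevlen_byte_diff(const unsigned char *next, size_t reqlen) {
    assert(next != NULL);
    int current = (next[0] < 0xFE) ? 1 : 5;
    return prevlen_field_size(reqlen) - current;
}

/* Same measurement plus the special case from the insert function:
 * a shrink by 4 for a new entry under 4 bytes is cancelled, and
 * *forcelarge is set so the 5-byte field is kept for a small length. */
static int plan_nextdiff(const unsigned char *next, size_t reqlen,
                         int *forcelarge) {
    assert(forcelarge != NULL);
    int diff = prevlen_byte_diff(next, reqlen);
    *forcelarge = 0;
    if (diff == -4 && reqlen < 4) {
        diff = 0;
        *forcelarge = 1;
    }
    return diff;
}
```

A non-zero result is also what triggers __ziplistCascadeUpdate in the insert function, because a neighbour that changed size may in turn change what its own neighbour has to record.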
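Writing the data in
How is the data written in? Bear in mind that this really is not a linked list; it is one contiguous block.
So it is done the array way: resize, shift the tail with memmove, then write. Yes.
Sounds like a hassle? In practice it is not. How often do you insert into the middle of a Redis list? LINSERT exists but is rarely the hot path; most traffic is LPUSH and RPUSH at the two ends, and Redis only uses this encoding while the structure is small, so even a push at the head moves just a handful of bytes.
A digression: when inserting large amounts of data, arrays do not necessarily lose to linked lists. For appends at the tail an array is far ahead of a linked list (only for appends, of course). I ran a series of experiments on this in a blog post two months ago.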
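For readers who have not written "the array way" by hand, here is a generic sketch of inserting bytes into the middle of a heap buffer. It deliberately ignores everything ziplist-specific (prevlen fix-ups, tail offset, header fields); buf_insert is a hypothetical helper, not a Redis function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Insert n bytes from src at offset pos into a heap buffer that currently
 * holds *len bytes. Returns the possibly moved buffer, or NULL if the
 * allocation fails, in which case the original buffer is left untouched. */
static unsigned char *buf_insert(unsigned char *buf, size_t *len, size_t pos,
                                 const unsigned char *src, size_t n) {
    assert(buf != NULL && len != NULL && src != NULL);
    assert(pos <= *len && n > 0);
    unsigned char *grown = realloc(buf, *len + n);
    if (grown == NULL) return NULL;
    /* Shift everything from pos onwards to the right by n bytes.
     * memmove is required because source and destination overlap. */
    memmove(grown + pos + n, grown + pos, *len - pos);
    memcpy(grown + pos, src, n);
    *len += n;
    return grown;
}
```

The cost is proportional to the number of bytes after pos, which is why an append at the tail moves nothing and an insert at the head moves everything.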
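I'll skip delete: it is the inverse of insert, and this series has never covered deletes anyway. Deleting here does unavoidably copy a lot of data, though. (What if we did not really delete and only set a deleted flag? That saves time, but afterwards it fragments memory. One could add a function that tidies up periodically: say, compact once a third of the blocks have been reused, or compact when memory runs short. If I remember correctly, the STL's memory pool works on a similar idea.)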
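To show where the copying comes from, here is a generic sketch of removing a byte range from a heap buffer. As before it leaves out all ziplist bookkeeping, and buf_delete is a hypothetical helper rather than anything from ziplist.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Remove n bytes starting at offset pos from a heap buffer holding *len
 * bytes, then shrink the allocation. Returns the possibly moved buffer. */
static unsigned char *buf_delete(unsigned char *buf, size_t *len, size_t pos,
                                 size_t n) {
    assert(buf != NULL && len != NULL);
    assert(pos <= *len && n <= *len - pos);
    /* Close the gap: everything after the removed range slides left. */
    memmove(buf + pos, buf + pos + n, *len - pos - n);
    *len -= n;
    if (*len == 0) return buf; /* avoid realloc(buf, 0) */
    unsigned char *shrunk = realloc(buf, *len);
    /* If shrinking fails the old block is still valid, so keep it. */
    return shrunk != NULL ? shrunk : buf;
}
```

Every byte behind the removed range has to move, so the closer to the head the deletion happens, the more data is copied.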
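There is not much to say about lookup either. A structure like this is normally looked up by key; the ziplist is just the value, the only difference being that the value is a whole sequence.
So besides forward and backward traversal, range queries are provided as well. Nothing difficult.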
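Backward traversal is where zltail and prevlen pay off, so here is a minimal sketch of it. walk_backward and le32 are names invented for this example, the header size of 10 bytes follows from the layout comment (4 + 4 + 2), and the input is trusted: no validation of offsets is attempted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZL_HEADER_SIZE 10 /* zlbytes (4) + zltail (4) + zllen (2) */

static uint32_t le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk from the last entry to the first, storing each entry's byte offset
 * in out[] (tail first). Visits at most max entries, returns the count. */
static size_t walk_backward(const unsigned char *zl, size_t *out, size_t max) {
    assert(zl != NULL && out != NULL);
    size_t n = 0;
    size_t off = le32(zl + 4);          /* zltail: jump straight to the tail */
    if (zl[off] == 0xFF) return 0;      /* empty list: tail points at zlend */
    while (n < max) {
        out[n++] = off;
        if (off == ZL_HEADER_SIZE) break; /* reached the first entry */
        const unsigned char *p = zl + off;
        /* prevlen is 1 byte, or 0xFE followed by 4 little-endian bytes. */
        uint32_t prevlen = (p[0] < 0xFE) ? p[0] : le32(p + 1);
        off -= prevlen;                 /* step back over the previous entry */
    }
    return n;
}
```

On the 15-byte example this visits offset 12 and then offset 10. A range query can be built on the same stepping, walking forward or backward from whichever end is closer to the requested index.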