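Cache eviction is an important class of algorithms and a common interview question: if you were asked to design an LRU cache, how would you do it while keeping reads and writes as efficient as possible? This post uses the following problem to give a brief introduction to LRU.
Problem: https://oj.leetcode.com/problems/lru-cache/
1. What is LRU, and why use it?
LRU stands for Least Recently Used. Memory is limited, so when it is full and the requested data is not in memory, the least recently used data has to be evicted to make room for the new data. LRU is widely used for eviction in cache servers, where access speed matters a great deal, so the algorithm has to be efficient.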
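To make the eviction rule concrete, here is a small sketch that replays a sequence of key accesses and reports which keys get evicted. The function name lru_evictions and its signature are made up for this illustration, and it is deliberately naive (O(n) per access); it only shows the policy, not an efficient implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Replays `accesses` against an LRU cache that holds at most `capacity` keys
// and returns the keys in the order they were evicted.
// (Hypothetical helper for illustration only; O(n) per access.)
std::vector<int> lru_evictions(std::size_t capacity, const std::vector<int>& accesses) {
    std::vector<int> order;    // front = most recently used, back = least recently used
    std::vector<int> evicted;
    if (capacity == 0) return evicted;  // nothing can be stored, so nothing is evicted
    for (int key : accesses) {
        auto pos = std::find(order.begin(), order.end(), key);
        if (pos != order.end()) {
            order.erase(pos);                  // hit: re-inserted at the front below
        } else if (order.size() == capacity) {
            evicted.push_back(order.back());   // miss on a full cache: evict the LRU key
            order.pop_back();
        }
        order.insert(order.begin(), key);      // the accessed key is now the most recent
    }
    return evicted;
}
```

With a capacity of 2 and the accesses 1, 2, 1, 3, 2, the function returns {2, 1}: key 2 is evicted when 3 arrives because 1 was touched more recently, and key 1 is evicted when 2 comes back.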
2. The problem statement:
Design and implement a data structure for Least Recently Used (LRU) cache. It should support the following operations: get and set.
get(key) - Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
set(key, value) - Set or insert the value if the key is not already present. When the cache reached its capacity, it should invalidate the least recently used item before inserting a new item.
In other words, implement an LRU cache that supports get and set:
get(key) -- return the value for key if it is in the cache; otherwise return -1.
set(key, value) -- update the value if key is already in the cache, otherwise insert it. If the cache is full, evict the least recently used item before inserting the new one.
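3. Algorithm design:
The key requirement is efficiency, efficiency, and more efficiency: insertion and deletion should run in O(1) time wherever possible. What data structure achieves that? Before answering, here are two less efficient implementations.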
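3.1: Pure map implementation (Time Limit Exceeded)
Use a single map keyed by key, whose value is a struct holding the value and a time field. The time field measures distance from 0: a key that was just used has time 0, and the longer a key goes unused, the smaller (more negative) its time becomes.
set(key, value): if the size is below capacity, insert the key, set its time to 0, and decrement the time of every other key by 1. If the size equals capacity, first find the key with the smallest time and delete it.
get(key): on a hit, set the key's time to 0 and decrement the time of every other key by 1.
Why it times out: checking whether a key exists is cheap, O(log n) with map (O(1) on average with unordered_map). But every get and set also has to decrement the time of every other entry, which is O(n), and eviction needs another O(n) scan for the smallest time.
Show the Code: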
    #include <iostream>
    #include <map>
    using namespace std;

    class LRUCache {
    private:
        typedef struct value {
            int val;
            int times;   // 0 = just used; decremented each time another key is used
        } tValue;
        map<int, tValue> mkv;
        map<int, tValue>::iterator it;
        int capacity;

        // mark key as just used and age every other entry by one step: O(n)
        void touch(int key) {
            mkv[key].times = 0;
            for (it = mkv.begin(); it != mkv.end(); ++it) {
                if (it->first != key)
                    it->second.times -= 1;
            }
        }

    public:
        LRUCache(int capacity) {
            this->capacity = capacity;
        }

        int get(int key) {
            if (!is_exist(key))
                return -1;
            touch(key);
            return mkv[key].val;
        }

        void set(int key, int value) {
            if (is_exist(key)) {
                // existing key: overwrite the value and treat it as just used
                mkv[key].val = value;
                touch(key);
                return;
            }
            if (capacity <= 0)
                return;
            if ((int)mkv.size() == capacity) {
                // evict the entry with the smallest times: O(n) scan
                it = mkv.begin();
                int mint = it->second.times;
                int mink = it->first;
                for (++it; it != mkv.end(); ++it) {
                    if (it->second.times < mint) {
                        mint = it->second.times;
                        mink = it->first;
                    }
                }
                mkv.erase(mink);
            }
            mkv[key].val = value;
            touch(key);
        }

        bool is_exist(int key) {
            return mkv.find(key) != mkv.end();
        }

        void print() {
            for (it = mkv.begin(); it != mkv.end(); ++it) {
                cout << "key=" << it->first << ",value=" << it->second.val
                     << ",times=" << it->second.times << endl;
            }
            cout << endl;
        }
    };
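As a side note, the per-access O(n) decrement is not inherent to the timestamp idea. One possible variation (not the approach taken in this post) is to stamp each entry with a counter that only ever increases, so a hit touches a single entry. The sketch below assumes this variation; the class name ClockLRUCache and its members are made up for the example. Eviction still scans every entry, so this alone is not enough to reach O(1) for set.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical variant of the pure-map approach: every entry remembers the
// tick of its last use, taken from a counter that only increases.
class ClockLRUCache {
public:
    explicit ClockLRUCache(int capacity) : capacity_(capacity) {}

    int get(int key) {
        auto it = entries_.find(key);
        if (it == entries_.end()) return -1;
        it->second.last_used = ++clock_;   // O(1): only this entry is touched
        return it->second.value;
    }

    void set(int key, int value) {
        if (capacity_ <= 0) return;
        auto it = entries_.find(key);
        if (it != entries_.end()) {        // existing key: update and refresh
            it->second.value = value;
            it->second.last_used = ++clock_;
            return;
        }
        if (static_cast<int>(entries_.size()) >= capacity_) evict_oldest();
        entries_[key] = Entry{value, ++clock_};
    }

private:
    struct Entry {
        int value;
        std::uint64_t last_used;
    };

    // Still O(n): scans every entry for the smallest tick.
    void evict_oldest() {
        auto oldest = entries_.begin();
        for (auto it = entries_.begin(); it != entries_.end(); ++it) {
            if (it->second.last_used < oldest->second.last_used) oldest = it;
        }
        entries_.erase(oldest);
    }

    int capacity_;
    std::uint64_t clock_ = 0;
    std::unordered_map<int, Entry> entries_;
};
```

The remaining O(n) scan in evict_oldest is exactly the cost that an ordered structure such as a linked list removes.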
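3.2: Singly linked list implementation (Time Limit Exceeded)
get(int key):
{
1. If the key is in the cache, return its value.
1.1 Also unlink that <key,value> node and move it to the first position in the list.
2. If the key is not in the cache, return -1.
}
set(int key, int value)
{
1. If the key is already in the list, replace its value, move the node to the head of the list, and return.
2. If the list is shorter than capacity, insert the new node at the head.
3. If the list length equals capacity, delete the last node, then insert the new node at the head.
}
Why it times out: the order of the list is the recency order, so the tail node is always the least recently used node. Checking whether a key exists takes a full traversal, and so does reaching the tail node; both are O(n).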
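The steps above can be sketched roughly as follows. This is a minimal illustration written to match the description, not the code that was actually submitted; the class name SinglyListLRUCache and the helpers find_prev and move_to_front are made up for the example.

```cpp
#include <cstddef>

// Sketch of the singly linked list approach: the list order is the recency
// order, with the most recently used node right after a dummy head.
class SinglyListLRUCache {
public:
    explicit SinglyListLRUCache(std::size_t capacity)
        : capacity_(capacity), size_(0), head_(new Node(0, 0)) {}

    ~SinglyListLRUCache() {
        while (head_ != nullptr) {
            Node* next = head_->next;
            delete head_;
            head_ = next;
        }
    }
    SinglyListLRUCache(const SinglyListLRUCache&) = delete;
    SinglyListLRUCache& operator=(const SinglyListLRUCache&) = delete;

    int get(int key) {
        Node* prev = find_prev(key);
        if (prev == nullptr) return -1;
        move_to_front(prev);
        return head_->next->value;
    }

    void set(int key, int value) {
        Node* prev = find_prev(key);
        if (prev != nullptr) {             // existing key: update and refresh
            prev->next->value = value;
            move_to_front(prev);
            return;
        }
        if (capacity_ == 0) return;
        if (size_ == capacity_) {
            // Walk to the node in front of the tail and drop the tail: O(n).
            Node* p = head_;
            while (p->next->next != nullptr) p = p->next;
            delete p->next;
            p->next = nullptr;
            --size_;
        }
        Node* node = new Node(key, value);
        node->next = head_->next;
        head_->next = node;
        ++size_;
    }

private:
    struct Node {
        int key;
        int value;
        Node* next;
        Node(int k, int v) : key(k), value(v), next(nullptr) {}
    };

    // Returns the node in front of the one holding `key`, or nullptr: O(n).
    Node* find_prev(int key) const {
        for (Node* p = head_; p->next != nullptr; p = p->next) {
            if (p->next->key == key) return p;
        }
        return nullptr;
    }

    // Unlinks the node after `prev` and re-inserts it right after the dummy head.
    void move_to_front(Node* prev) {
        Node* loc = prev->next;
        prev->next = loc->next;
        loc->next = head_->next;
        head_->next = loc;
    }

    std::size_t capacity_;
    std::size_t size_;
    Node* head_;
};
```

Both find_prev and the walk to the tail are linear in the number of cached items, which is where the time goes.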
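So how can get and set both run in O(1) time? Starting from the second implementation, all that is needed is a way to find an element's position in the list in O(1). We therefore combine a map with a list: the map records where each element is, and the list holds the elements' contents and their order.
3.3: Doubly linked list + map (trading space for time)
Doubly linked list: stores the <key,value> nodes. The tail node can be reached in O(1) and then removed quickly.
map (unordered_map also works): stores, for each key, the value and a pointer to the key's node in the list. It serves two purposes: checking quickly whether a key exists, and, when it does, finding its node so that the node can be unlinked and moved to the head of the list. Note that a map lookup is O(log n); unordered_map brings it down to O(1) on average.
AC code: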
    #include <iostream>
    #include <map>
    using namespace std;

    class LRUCache {
    private:
        int capacity;
        int capacity_cur;
        typedef struct ListNode {
            int key;
            int value;
            struct ListNode *next;
            struct ListNode *pre;
            ListNode(int k, int v) : key(k), value(v), next(NULL), pre(NULL) {}
        } ListNode;
        // dummy head of a circular list: head->next is the most recently
        // used node, head->pre is the least recently used node
        ListNode *head;
        typedef struct Value {
            int val;
            ListNode *p;
            Value(int _v, ListNode *pp) : val(_v), p(pp) {}
            Value() {}   // required by map::operator[]
        } tValue;
        map<int, tValue> mkv;
        map<int, tValue>::iterator it;

    public:
        LRUCache(int capacity) {
            this->capacity = capacity;
            head = new ListNode(0, 0);
            capacity_cur = 0;
        }

        int get(int key) {
            if (is_exist(key)) {
                // unlink the node
                ListNode *loc = mkv[key].p;
                loc->pre->next = loc->next;
                loc->next->pre = loc->pre;
                // re-insert it right after the dummy head
                loc->next = head->next;
                loc->pre = head;
                head->next = loc;
                loc->next->pre = loc;
                return mkv[key].val;
            } else {
                return -1;
            }
        }

        void set(int key, int value) {
            if (is_exist(key)) {
                // update the value in both the map and the list
                mkv[key].val = value;
                ListNode *q = mkv[key].p;
                q->value = value;
                // unlink the node and move it to the head of the list
                q->pre->next = q->next;
                q->next->pre = q->pre;
                q->next = head->next;
                q->pre = head;
                head->next->pre = q;
                head->next = q;
                return;
            }
            if (capacity <= 0)
                return;
            ListNode *tmp = new ListNode(key, value);
            if (capacity_cur < capacity) {
                if (head->next == NULL) {   // the list is empty
                    head->next = tmp;
                    head->pre = tmp;
                    tmp->next = head;
                    tmp->pre = head;
                } else {                    // insert tmp at the front of the list
                    tmp->next = head->next;
                    tmp->pre = head;
                    head->next->pre = tmp;
                    head->next = tmp;
                }
                tValue tv(value, tmp);
                mkv[key] = tv;
                ++capacity_cur;
            } else {
                // remove the LRU node (the tail)
                ListNode *tail = head->pre;
                head->pre = tail->pre;
                tail->pre->next = head;
                mkv.erase(tail->key);
                delete tail;
                // insert the new node at the front
                tmp->next = head->next;
                tmp->pre = head;
                head->next->pre = tmp;
                head->next = tmp;
                tValue tv(value, tmp);
                mkv[key] = tv;
            }
        }

        bool is_exist(int key) {
            return mkv.find(key) != mkv.end();
        }

        void print() {
            ListNode *p = head->next;
            while (p != head) {
                cout << "key = " << p->key << " Value = " << p->value << endl;
                p = p->next;
            }
            cout << endl;
        }
    };
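For comparison, the same doubly linked list + map design can also be written with standard containers instead of hand-rolled nodes. The sketch below assumes std::list for the recency order and std::unordered_map for the index; the class name StdLRUCache and its member names are made up for this example. It relies on std::list::splice, which relinks a node in O(1) without invalidating iterators.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

// Sketch of the list + map design using standard containers.
class StdLRUCache {
public:
    explicit StdLRUCache(std::size_t capacity) : capacity_(capacity) {}

    int get(int key) {
        auto it = index_.find(key);
        if (it == index_.end()) return -1;
        // Move the node to the front in O(1); the stored iterator stays valid.
        items_.splice(items_.begin(), items_, it->second);
        return it->second->second;
    }

    void set(int key, int value) {
        if (capacity_ == 0) return;
        auto it = index_.find(key);
        if (it != index_.end()) {          // existing key: update and refresh
            it->second->second = value;
            items_.splice(items_.begin(), items_, it->second);
            return;
        }
        if (items_.size() == capacity_) {  // full: evict the tail (LRU) item
            index_.erase(items_.back().first);
            items_.pop_back();
        }
        items_.emplace_front(key, value);
        index_[key] = items_.begin();
    }

private:
    using Item = std::pair<int, int>;      // (key, value)
    std::size_t capacity_;
    std::list<Item> items_;                // front = most recently used
    std::unordered_map<int, std::list<Item>::iterator> index_;
};
```

Storing the key inside each list item is what lets the eviction step erase the matching index entry without a search.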
Full test program:
    #include <iostream>
    #include <map>
    using namespace std;

    class LRUCache {
    private:
        int capacity;
        int capacity_cur;
        typedef struct ListNode {
            int key;
            int value;
            struct ListNode *next;
            struct ListNode *pre;
            ListNode(int k, int v) : key(k), value(v), next(NULL), pre(NULL) {}
        } ListNode;
        // dummy head of a circular list: head->next is the most recently
        // used node, head->pre is the least recently used node
        ListNode *head;
        typedef struct Value {
            int val;
            ListNode *p;
            Value(int _v, ListNode *pp) : val(_v), p(pp) {}
            Value() {}   // required by map::operator[]
        } tValue;
        map<int, tValue> mkv;
        map<int, tValue>::iterator it;

    public:
        LRUCache(int capacity) {
            this->capacity = capacity;
            head = new ListNode(0, 0);
            capacity_cur = 0;
        }

        int get(int key) {
            if (is_exist(key)) {
                // unlink the node
                ListNode *loc = mkv[key].p;
                loc->pre->next = loc->next;
                loc->next->pre = loc->pre;
                // re-insert it right after the dummy head
                loc->next = head->next;
                loc->pre = head;
                head->next = loc;
                loc->next->pre = loc;
                return mkv[key].val;
            } else {
                return -1;
            }
        }

        void set(int key, int value) {
            if (is_exist(key)) {
                // update the value in both the map and the list
                mkv[key].val = value;
                ListNode *q = mkv[key].p;
                q->value = value;
                // unlink the node and move it to the head of the list
                q->pre->next = q->next;
                q->next->pre = q->pre;
                q->next = head->next;
                q->pre = head;
                head->next->pre = q;
                head->next = q;
                return;
            }
            if (capacity <= 0)
                return;
            ListNode *tmp = new ListNode(key, value);
            if (capacity_cur < capacity) {
                if (head->next == NULL) {   // the list is empty
                    head->next = tmp;
                    head->pre = tmp;
                    tmp->next = head;
                    tmp->pre = head;
                } else {                    // insert tmp at the front of the list
                    tmp->next = head->next;
                    tmp->pre = head;
                    head->next->pre = tmp;
                    head->next = tmp;
                }
                tValue tv(value, tmp);
                mkv[key] = tv;
                ++capacity_cur;
            } else {
                // remove the LRU node (the tail)
                ListNode *tail = head->pre;
                head->pre = tail->pre;
                tail->pre->next = head;
                mkv.erase(tail->key);
                delete tail;
                // insert the new node at the front
                tmp->next = head->next;
                tmp->pre = head;
                head->next->pre = tmp;
                head->next = tmp;
                tValue tv(value, tmp);
                mkv[key] = tv;
            }
        }

        bool is_exist(int key) {
            return mkv.find(key) != mkv.end();
        }

        void print() {
            ListNode *p = head->next;
            while (p != head) {
                cout << "key = " << p->key << " Value = " << p->value << endl;
                p = p->next;
            }
            cout << endl;
        }
    };

    int main() {
        /*
        LRUCache lru(3);
        lru.set(1,10); lru.print();
        lru.set(2,20); lru.print();
        lru.set(3,30); lru.print();
        cout<<"get key = "<<1<<",Value = "<<lru.get(1)<<endl; lru.print();
        lru.set(4,40); lru.print();
        cout<<"get key = "<<3<<",Value = "<<lru.get(3)<<endl; lru.print();
        lru.set(5,50); lru.print();

        LRUCache lru1(2);
        lru1.set(2,1); lru1.print();
        lru1.set(2,2); lru1.print();
        cout<<"get key = "<<2<<",Value = "<<lru1.get(2)<<endl;
        lru1.set(1,1); lru1.print();
        lru1.set(4,1); lru1.print();
        cout<<"get key = "<<2<<",Value = "<<lru1.get(2)<<endl;
        */
        LRUCache lru1(2);
        lru1.set(2, 1); lru1.print();
        lru1.set(1, 1); lru1.print();
        lru1.set(2, 3); lru1.print();
        lru1.set(4, 1); lru1.print();
        cout << "get key = " << 1 << ",Value = " << lru1.get(1) << endl;
        cout << "get key = " << 2 << ",Value = " << lru1.get(2) << endl;
        return 0;
    }
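The speed bottlenecks in this problem are:
1) how to find a key quickly (in O(1) time);
2) how to keep the keys in an order that satisfies LRU (use a linked list, with insertion and deletion);
3) how to find the tail node, that is, the LRU node, quickly, remove it, and insert the new node at the head (use a doubly linked list with a pointer to the tail node).
Because LRU evicts data that lives in memory, speed matters a great deal, so every get and set should run in O(1) time wherever possible.
Hints:
1) set may be called with a key that is already in the cache but with a different value. In that case the value has to be replaced with the new one and the node moved to the head of the list!
2) The map in the AC code can be replaced with unordered_map, which should be somewhat faster.
Please credit the source: http://blog.csdn.net/lavorange/article/details/41852921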