1. HashMap
Note: JDK 8 reworked the HashMap implementation; this article uses JDK 8 as its baseline.
1.1 HashMap structure
HashMap implements a dictionary data structure using a hash algorithm. Every hash algorithm inevitably faces hash collisions; the common resolution strategies are, roughly, linear probing, quadratic probing, rehashing, and so on. HashMap resolves collisions with separate chaining. The idea is very simple: each slot of the hash table stores not the data itself but the head of a linked list, and when a collision occurs the colliding entries are kept together in that list.
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
    // ... excerpt; remaining source omitted
}
1.2 The hash function
What is a hash function? Simply put, it is an Object -> int mapping that converts any object into an int; a further computation on that int (for example, taking it modulo the array length) quickly locates an array index, enabling efficient put and get operations. HashMap does not actually use the modulo operator; it uses a cleverer trick, described below.
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Explanation: (h >>> 16) shifts the 32-bit int right by 16 bits, and the result is XOR-ed with h = key.hashCode(). The effect is that the high 16 bits stay unchanged while the low 16 bits are XOR-ed with the high 16 bits. Because both halves of the hash code contribute, the result is more widely dispersed.
The array index is then computed as (n - 1) & hash, where n = tab.length and hash is the value computed above. The obvious approach would be to compute the hash and take it modulo n; by XOR-spreading the hash first, the designers scatter the bits more evenly and reduce hash clustering (clustering is precisely what induces collisions). Moreover, because n is always a power of two, (n - 1) & hash gives the same result as a modulo but is cheaper, and it makes HashMap's resize remarkably simple and elegant, as described below.
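The two steps above can be sketched as follows. The hash method mirrors the JDK source quoted earlier; the table length of 16 and the key "apple" are arbitrary choices for illustration:

```java
public class HashIndexDemo {
    // same spreading step as HashMap.hash(): XOR the high 16 bits into the low 16
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                // table length, always a power of two
        int h = hash("apple");
        int index = (n - 1) & h;   // bit mask instead of a modulo
        System.out.println(index);
        // for a power-of-two n, the mask and the (non-negative) modulo agree:
        System.out.println(index == (h & Integer.MAX_VALUE) % n); // prints true
    }
}
```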
1.3 The resize algorithm
HashMap's default capacity is 16 (2^4), and every resize doubles the table, so the capacity is always a power of two. Recall the index formula (n - 1) & hash: when the table grows from 16 to 32, the mask (n - 1) gains exactly one extra high bit, so an entry's new index depends only on that one additional bit of its hash.
Therefore, during a resize the hashes need not be recomputed (hashing has a real cost): it suffices to look at the hash bit just above the old mask. If that bit is 0 the entry keeps its old index; if it is 1 the entry moves to oldIndex + oldCap. For example, with oldCap = 16, an entry whose hash has bit 4 clear stays in place, while one with bit 4 set moves up by 16.
This design is very elegant: it saves the time of rehashing every entry while still keeping the distribution even, since the extra bit is effectively random.
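A minimal sketch of the bit check described above (the keys and oldCap are illustrative; only the hash method mirrors the JDK source):

```java
public class ResizeDemo {
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int oldCap = 16;
        for (String key : new String[] {"apple", "banana", "peach"}) {
            int h = hash(key);
            int oldIndex = (oldCap - 1) & h;
            // the hash bit just above the old mask decides the new slot
            int newIndex = ((h & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            // same result as indexing directly into the doubled table
            System.out.println(key + ": " + (newIndex == ((2 * oldCap - 1) & h)));
        }
    }
}
```

Each line prints true: keeping the low bits and conditionally adding oldCap is exactly indexing with the doubled mask.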
1.4 How put works
(1) The key's class must override hashCode() so that a hash can be computed; the index position is derived from the hash as described above.
(2) If there is no hash collision, the entry is stored at that position in the buckets array as a single-node list.
(3) If a hash collision occurs:
a. if the collision makes the chain longer than TREEIFY_THRESHOLD (8), the linked list is converted into a red-black tree (TreeNode is the red-black tree node), improving lookup efficiency; this conversion only happens once the table capacity has reached MIN_TREEIFY_CAPACITY (64), otherwise the table is resized instead;
b. otherwise the entry is simply added to the list.
(4) Note that equals() is called on the keys to check whether the key already exists; if it does, its value is replaced.
(5) If a new entry was added and the size exceeds loadFactor * capacity, a resize is performed.
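The steps above can be sketched as a toy separate-chaining put. Treeification, resizing, and the hash-spreading step are deliberately omitted, and every name here is illustrative rather than JDK source:

```java
import java.util.Objects;

public class ChainingPutDemo {
    static class Node<K, V> {
        final int hash; final K key; V value; Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    Node<String, String>[] table = new Node[16];
    int size;

    String put(String key, String value) {
        int h = Objects.hashCode(key);
        int i = (table.length - 1) & h;
        // walk the chain: replace the value if an equal key already exists
        for (Node<String, String> n = table[i]; n != null; n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) {
                String old = n.value;
                n.value = value;
                return old;
            }
        }
        // no match: prepend a new node (the JDK appends instead, and would
        // treeify a long chain and resize past loadFactor * capacity)
        table[i] = new Node<>(h, key, value, table[i]);
        size++;
        return null;
    }

    public static void main(String[] args) {
        ChainingPutDemo m = new ChainingPutDemo();
        System.out.println(m.put("a", "1")); // null: new key
        System.out.println(m.put("a", "2")); // "1": value replaced
        System.out.println(m.size);          // 1
    }
}
```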
1.5 How get(Object key) works
Compute the index from the key's hash:
(1) if the bucket is empty, return null;
(2) if the bucket holds a linked list, walk it comparing keys with equals()... O(n) time;
(3) if the bucket holds a red-black tree, search the tree with the same key comparison... O(log n) time.
1.6 Summary
The HashMap Javadoc is actually very thorough, and with the background above it should be easy to follow.
Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
Map m = Collections.synchronizedMap(new HashMap(...));
The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
(For any hash-based data structure, the key's class must override both equals() and hashCode().)
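For example, a key class that honors this contract (Point is a hypothetical class for illustration):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class PointKeyDemo {
    // hypothetical key class: equals() and hashCode() are overridden together
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }

        // consistent with equals(): equal points produce equal hashes
        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    public static void main(String[] args) {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "found");
        // without both overrides this lookup would miss, because two distinct
        // but equal Point instances could land in different buckets
        System.out.println(map.get(new Point(1, 2))); // prints "found"
    }
}
```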
The documentation above makes the following points:
(1) a hash-table implementation of the Map interface that provides all the optional map operations and permits null values and the null key (any number of null values, but at most one null key); the order of the map's elements is not guaranteed;
(2) get() and put() run in constant time and disperse entries well, assuming a good hash function;
(3) iterating over a collection view takes time proportional to the capacity of the HashMap (the number of buckets) plus its size (the number of mappings), not their product;
(4) when creating a HashMap, do not set the initial capacity too high or the load factor too low if iteration performance matters;
(5) two parameters of a HashMap instance are particularly important:
initial capacity: the number of buckets in the hash table;
load factor: a measure of how full the hash table is allowed to get before its capacity is automatically increased.
When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, its internal data structures are rebuilt), roughly doubling the number of buckets.
(6) As a general rule, the default load factor of 0.75 offers a good trade-off between time and space. A higher load factor decreases the space overhead (the table grows more slowly) but increases the lookup cost, which is reflected in most HashMap operations, including get and put. Why does a higher factor hurt get and put? HashMap resolves collisions by chaining, and a higher factor leaves fewer free slots before a resize, so hash clustering and therefore collisions become more likely; once a bucket's chain holds many mappings, the cost of finding and removing entries naturally grows.
(7) If a great many mappings are to be stored, creating the map with a sufficiently large capacity reduces the number of resizes and improves efficiency. Conversely, if the key class's hashCode() implementation is poor, that is, prone to hash clustering, HashMap performance is bound to suffer.
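A quick sketch of presizing, following the Javadoc's sizing advice; the entry count is an arbitrary example:

```java
import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expectedEntries = 10_000;
        float loadFactor = 0.75f;
        // a capacity greater than expected / loadFactor means
        // the threshold is never exceeded, so no rehash ever occurs
        int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor) + 1;
        Map<Integer, Integer> map = new HashMap<>(initialCapacity, loadFactor);
        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, i * i);
        }
        System.out.println(map.size()); // prints 10000
    }
}
```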
(8) HashMap is not thread-safe;
(9) iteration is fail-fast: if, while an iteration is in progress, another thread (or the current thread, other than through the iterator's own remove()) modifies the data with remove() or put(), a ConcurrentModificationException is thrown immediately and the iteration fails fast;
(10) fail-fast behavior is no guarantee of thread safety; if you need concurrent safety, do not use HashMap.
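The fail-fast behavior is easy to demonstrate, even from a single thread:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                // structural modification outside the iterator's remove():
                // adding a new key bumps modCount, so the next iterator
                // step detects the change and throws
                map.put("c", 3);
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast triggered"); // printed
        }
    }
}
```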
2. LinkedHashMap
public class LinkedHashMap<K,V> extends HashMap<K,V> implements Map<K,V>
LinkedHashMap extends HashMap. It differs from HashMap in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration order, which can be either the order in which elements were inserted or the order in which they were accessed.
static class Entry<K,V> extends HashMap.Node<K,V> {
    Entry<K,V> before, after;
    // ...
}
2.1 Constructors
public LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder) {
    super(initialCapacity, loadFactor);
    this.accessOrder = accessOrder;
}
With accessOrder = false, elements are iterated in insertion order; with true, in access order.
Because the entries are now doubly linked, LinkedHashMap overrides several methods, for example:
void reinitialize() {
    super.reinitialize();
    head = tail = null;
}
A small demo:
// 1. insertion order
Map<String, String> map = new LinkedHashMap<String, String>(16, 0.75f, false);
// 2. access order
// Map<String, String> map = new LinkedHashMap<String, String>(16, 0.75f, true);
map.put("apple", "苹果");
map.put("watermelon", "西瓜");
map.put("banana", "香蕉");
map.put("peach", "桃子");
map.get("watermelon");
map.get("peach");
Iterator iter = map.entrySet().iterator();
while (iter.hasNext()) {
    Map.Entry entry = (Map.Entry) iter.next();
    System.out.println(entry.getKey() + "=" + entry.getValue());
}
2.2 Implementation
A brief walk-through based on the source code.
The idea behind maintaining order is very simple: two fields independent of the table, head and tail, record the ends of a list ordered by either access or insertion, and the node structure is extended into a doubly-linked node. Because this doubly-linked list must be kept up to date, the node construction and replacement methods are overridden as follows:
Node<K,V> newNode(int hash, K key, V value, Node<K,V> e) {
    LinkedHashMap.Entry<K,V> p =
        new LinkedHashMap.Entry<K,V>(hash, key, value, e);
    linkNodeLast(p);
    return p;
}

Node<K,V> replacementNode(Node<K,V> p, Node<K,V> next) {
    LinkedHashMap.Entry<K,V> q = (LinkedHashMap.Entry<K,V>)p;
    LinkedHashMap.Entry<K,V> t =
        new LinkedHashMap.Entry<K,V>(q.hash, q.key, q.value, next);
    transferLinks(q, t);
    return t;
}

TreeNode<K,V> newTreeNode(int hash, K key, V value, Node<K,V> next) {
    TreeNode<K,V> p = new TreeNode<K,V>(hash, key, value, next);
    linkNodeLast(p);
    return p;
}

TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
    LinkedHashMap.Entry<K,V> q = (LinkedHashMap.Entry<K,V>)p;
    TreeNode<K,V> t = new TreeNode<K,V>(q.hash, q.key, q.value, next);
    transferLinks(q, t);
    return t;
}
In JDK 1.8, HashMap provides hook methods specifically for LinkedHashMap:
// Callbacks to allow LinkedHashMap post-actions
void afterNodeAccess(Node<K,V> p) { }
void afterNodeInsertion(boolean evict) { }
void afterNodeRemoval(Node<K,V> p) { }
LinkedHashMap implements them as follows:
// after a node is removed, unlink it: both of its list ends are set to null
void afterNodeRemoval(Node<K,V> e) { // unlink
    LinkedHashMap.Entry<K,V> p =
        (LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after;
    p.before = p.after = null;
    if (b == null)
        head = a;
    else
        b.after = a;
    if (a == null)
        tail = b;
    else
        a.before = b;
}

// after an insertion, an element may need to be evicted
void afterNodeInsertion(boolean evict) { // possibly remove eldest
    LinkedHashMap.Entry<K,V> first;
    if (evict && (first = head) != null && removeEldestEntry(first)) {
        K key = first.key;
        removeNode(hash(key), key, null, false, true);
    }
}

// after an element is accessed, reorder the list by moving it to the tail
void afterNodeAccess(Node<K,V> e) { // move node to last
    LinkedHashMap.Entry<K,V> last;
    if (accessOrder && (last = tail) != e) {
        LinkedHashMap.Entry<K,V> p =
            (LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after;
        p.after = null;
        if (b == null)
            head = a;
        else
            b.after = a;
        if (a != null)
            a.before = b;
        else
            last = b;
        if (last == null)
            head = p;
        else {
            p.before = last;
            last.after = p;
        }
        tail = p;
        ++modCount;
    }
}
On reads, the list may be reordered according to the access order:
public V get(Object key) {
    Node<K,V> e;
    if ((e = getNode(hash(key), key)) == null)
        return null;
    if (accessOrder)
        afterNodeAccess(e);
    return e.value;
}

/**
 * {@inheritDoc}
 */
public V getOrDefault(Object key, V defaultValue) {
    Node<K,V> e;
    if ((e = getNode(hash(key), key)) == null)
        return defaultValue;
    if (accessOrder)
        afterNodeAccess(e);
    return e.value;
}
2.3 LinkedHashMap and the LRU algorithm
An LRU cache builds on exactly this idea. LRU stands for Least Recently Used: the cache evicts the least recently used data to make room for newly read data. Since recently read data tends to be the most frequently read, an LRU cache can improve system performance. The code below only demonstrates the implementation idea; in practice an LRU cache is more commonly built on Guava's implementation.
A simple LRUCache implementation can be found at https://github.com/tracylihui/LRUcache-Java. The idea is trivial: override the LinkedHashMap hook method removeEldestEntry.
package ysz.demo;

import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * An LRU cache, based on <code>LinkedHashMap</code>.
 *
 * <p>This cache has a fixed maximum number of elements (<code>cacheSize</code>).
 * If the cache is full and another entry is added, the LRU (least recently
 * used) entry is dropped.
 *
 * <p>This class is thread-safe. All methods of this class are synchronized.
 *
 * <p>Author: Christian d'Heureuse, Inventec Informatik AG, Zurich, Switzerland<br>
 * Multi-licensed: EPL / LGPL / GPL / AL / BSD.
 */
public class LRUCache<K, V> {

    private static final float hashTableLoadFactor = 0.75f;

    private LinkedHashMap<K, V> map;
    private int cacheSize;

    /**
     * Creates a new LRU cache. The <code>true</code> passed to
     * <code>new LinkedHashMap&lt;K,V&gt;(hashTableCapacity, hashTableLoadFactor, true)</code>
     * selects access order.
     *
     * @param cacheSize
     *            the maximum number of entries that will be kept in this cache.
     */
    public LRUCache(int cacheSize) {
        this.cacheSize = cacheSize;
        int hashTableCapacity = (int) Math.ceil(cacheSize / hashTableLoadFactor) + 1;
        map = new LinkedHashMap<K, V>(hashTableCapacity, hashTableLoadFactor, true) {
            // (an anonymous inner class)
            private static final long serialVersionUID = 1;

            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > LRUCache.this.cacheSize;
            }
        };
    }

    /**
     * Retrieves an entry from the cache.<br>
     * The retrieved entry becomes the MRU (most recently used) entry.
     *
     * @param key
     *            the key whose associated value is to be returned.
     * @return the value associated to this key, or null if no value with this
     *         key exists in the cache.
     */
    public synchronized V get(K key) {
        return map.get(key);
    }

    /**
     * Adds an entry to this cache. The new entry becomes the MRU (most recently
     * used) entry. If an entry with the specified key already exists in the
     * cache, it is replaced by the new entry. If the cache is full, the LRU
     * (least recently used) entry is removed from the cache.
     *
     * @param key
     *            the key with which the specified value is to be associated.
     * @param value
     *            a value to be associated with the specified key.
     */
    public synchronized void put(K key, V value) {
        map.put(key, value);
    }

    /** Clears the cache. */
    public synchronized void clear() {
        map.clear();
    }

    /**
     * Returns the number of used entries in the cache.
     *
     * @return the number of entries currently in the cache.
     */
    public synchronized int usedEntries() {
        return map.size();
    }

    /**
     * Returns a <code>Collection</code> that contains a copy of all cache
     * entries.
     *
     * @return a <code>Collection</code> with a copy of the cache content.
     */
    public synchronized Collection<Map.Entry<K, V>> getAll() {
        return new ArrayList<Map.Entry<K, V>>(map.entrySet());
    }

    // Test routine for the LRUCache class.
    public static void main(String[] args) {
        LRUCache<String, String> c = new LRUCache<String, String>(3);
        c.put("1", "one");           // 1
        c.put("2", "two");           // 2 1
        c.put("3", "three");         // 3 2 1
        c.put("4", "four");          // 4 3 2
        if (c.get("2") == null)
            throw new Error();       // 2 4 3
        c.put("5", "five");          // 5 2 4
        c.put("4", "second four");   // 4 5 2

        // Verify cache content.
        if (c.usedEntries() != 3)
            throw new Error();
        if (!c.get("4").equals("second four"))
            throw new Error();
        if (!c.get("5").equals("five"))
            throw new Error();
        if (!c.get("2").equals("two"))
            throw new Error();

        // List cache content.
        for (Map.Entry<String, String> e : c.getAll())
            System.out.println(e.getKey() + " : " + e.getValue());
    }
}