HashMap与TreeMap源码分析

1. 引言

    在红黑树——算法导论(15)中学习了红黑树的原理。本来打算自己来试着实现一下,然而在看了JDK(1.8.0)TreeMap的源码后恍然发现原来它就是利用红黑树实现的(很惭愧学了Java这么久,也写过一些小项目,也使用过TreeMap无数次,但到现在才明白它的实现原理)。因此本着“不要重复造轮子”的思想,就用这篇博客来记录分析TreeMap源码的过程,也顺便瞅一瞅HashMap。

2. 继承结构

(1) 继承结构

下面是HashMap与TreeMap的继承结构:

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {

 

public class TreeMap<K,V> extends AbstractMap<K,V> implements NavigableMap<K,V>, Cloneable, java.io.Serializable

可以看出它们都继承了AbstractMap。而HashMap是直接实现的Map,TreeMap实现的是NavigableMap(Cloneable和Serializable忽略)。

(2) Map接口

Map接口相信大家应该都很熟悉。下面贴出JDK中它的说明:

An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.

This interface takes the place of the Dictionary class, which was a totally abstract class rather than an interface.

The Map interface provides three collection views, which allow a map‘s contents to be viewed as a set of keys, collection of values, or set of key-value mappings. The order of a map is defined as the order in which the iterators on the map‘s collection views return their elements. Some map implementations, like the TreeMap class, make specific guarantees as to their order; others, like the HashMap class, do not.

Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map. A special case of this prohibition is that it is not permissible for a map to contain itself as a key. While it is permissible for a map to contain itself as a value, extreme caution is advised: the equals and hashCode methods are no longer well defined on such a map.

All general-purpose map implementation classes should provide two "standard" constructors: a void (no arguments) constructor which creates an empty map, and a constructor with a single argument of type Map, which creates a new map with the same key-value mappings as its argument. In effect, the latter constructor allows the user to copy any map, producing an equivalent map of the desired class. There is no way to enforce this recommendation (as interfaces cannot contain constructors) but all of the general-purpose map implementations in the JDK comply.

The "destructive" methods contained in this interface, that is, the methods that modify the map on which they operate, are specified to throw UnsupportedOperationException if this map does not support the operation. If this is the case, these methods may, but are not required to, throw an UnsupportedOperationException if the invocation would have no effect on the map. For example, invoking the putAll(Map) method on an unmodifiable map may, but is not required to, throw the exception if the map whose mappings are to be "superimposed" is empty.

Some map implementations have restrictions on the keys and values they may contain. For example, some implementations prohibit null keys and values, and some have restrictions on the types of their keys. Attempting to insert an ineligible key or value throws an unchecked exception, typically NullPointerException or ClassCastException. Attempting to query the presence of an ineligible key or value may throw an exception, or it may simply return false; some implementations will exhibit the former behavior and some will exhibit the latter. More generally, attempting an operation on an ineligible key or value whose completion would not result in the insertion of an ineligible element into the map may throw an exception or it may succeed, at the option of the implementation. Such exceptions are marked as "optional" in the specification for this interface.

Many methods in Collections Framework interfaces are defined in terms of the equals method. For example, the specification for the containsKey(Object key) method says: "returns true if and only if this map contains a mapping for a key k such that (key==null ? k==null : key.equals(k))." This specification should not be construed to imply that invoking Map.containsKey with a non-null argument key will cause key.equals(k) to be invoked for any key k. Implementations are free to implement optimizations whereby the equals invocation is avoided, for example, by first comparing the hash codes of the two keys. (The Object.hashCode() specification guarantees that two objects with unequal hash codes cannot be equal.) More generally, implementations of the various Collections Framework interfaces are free to take advantage of the specified behavior of underlying Object methods wherever the implementor deems it appropriate.

Some map operations which perform recursive traversal of the map may fail with an exception for self-referential instances where the map directly or indirectly contains itself. This includes the clone(), equals(), hashCode() and toString() methods. Implementations may optionally handle the self-referential scenario, however most current implementations do not do so.

This interface is a member of the Java Collections Framework.

    大意就是,map是一个将keys与values映射起来的对象,不能包含重复的key;map提供了3个collection views(集合视图?。就是我们经常使用的public Set<K> keySet(),public Collection<V> values()和public Set<Map.Entry<K,V>> entrySet())有些map有序,有些map无序;有些map对key和value有些特别的要求(如是否允许为null)。

(3) SortedMap接口

那么NavigableMap接口是什么呢?

通过源码可以看出,NavigableMap接口又继承了SortedMap。从SortedMap的名字就可以看出,它要求有序。

下面是JDK中对SortedMap的说明:

A Map that further provides a total ordering on its keys. The map is ordered according to the natural ordering of its keys, or by a Comparator typically provided at sorted map creation time. This order is reflected when iterating over the sorted map‘s collection views (returned by the entrySet, keySet and values methods). Several additional operations are provided to take advantage of the ordering. (This interface is the map analogue of SortedSet.)

All keys inserted into a sorted map must implement the Comparable interface (or be accepted by the specified comparator). Furthermore, all such keys must be mutually comparable: k1.compareTo(k2) (or comparator.compare(k1, k2)) must not throw a ClassCastException for any keys k1 and k2 in the sorted map. Attempts to violate this restriction will cause the offending method or constructor invocation to throw a ClassCastException.

Note that the ordering maintained by a sorted map (whether or not an explicit comparator is provided) must be consistent with equals if the sorted map is to correctly implement the Map interface. (See the Comparable interface or Comparator interface for a precise definition of consistent with equals.) This is so because the Map interface is defined in terms of the equals operation, but a sorted map performs all key comparisons using its compareTo (or compare) method, so two keys that are deemed equal by this method are, from the standpoint of the sorted map, equal. The behavior of a tree map is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Map interface.

All general-purpose sorted map implementation classes should provide four "standard" constructors. It is not possible to enforce this recommendation though as required constructors cannot be specified by interfaces. The expected "standard" constructors for all sorted map implementations are:

  1. A void (no arguments) constructor, which creates an empty sorted map sorted according to the natural ordering of its keys.
  2. A constructor with a single argument of type Comparator, which creates an empty sorted map sorted according to the specified comparator.
  3. A constructor with a single argument of type Map, which creates a new map with the same key-value mappings as its argument, sorted according to the keys‘ natural ordering.
  4. A constructor with a single argument of type SortedMap, which creates a new sorted map with the same key-value mappings and the same ordering as the input sorted map.

Note: several methods return submaps with restricted key ranges. Such ranges are half-open, that is, they include their low endpoint but not their high endpoint (where applicable). If you need a closed range (which includes both endpoints), and the key type allows for calculation of the successor of a given key, merely request the subrange from lowEndpoint to successor(highEndpoint). For example, suppose that m is a map whose keys are strings. The following idiom obtains a view containing all of the key-value mappings in m whose keys are between low and high, inclusive:

   SortedMap<String, V> sub = m.subMap(low, high+"\0");

A similar technique can be used to generate an open range (which contains neither endpoint). The following idiom obtains a view containing all of the key-value mappings in m whose keys are between low and high, exclusive:

   SortedMap<String, V> sub = m.subMap(low+"\0", high);

This interface is a member of the Java Collections Framework.

    大意就是说, SortedMap是一个按key的顺序排序了的Map。它是根据key的“自然顺序”或创建map时提供的Comparator来排序的。相比于普通的map,它还利用排序的特点支持一些其他的集合操作。插入到SortedMap中的key必须实现Comparable接口或者为map提供一个Comparator对象,并且它们的比较结果要和调用equals方法比较的结果一致,这么做是因为Map接口依据equals操作被定义,而SortedMap是依据compareTo方法或compare方法被定义,因此要保证二者的一致性。

    查看SortedMap的outline可以发现它需要实现的额外方法是获取Comparator的Comparator<? super K> comparator()方法,以及一些截取Map的方法,如SortedMap<K,V> subMap(K fromKey, K toKey)等。

(4) NavigableMap接口

再来看看NavigableMap:

A SortedMap extended with navigation methods returning the closest matches for given search targets. Methods lowerEntry, floorEntry, ceilingEntry, and higherEntry return Map.Entry objects associated with keys respectively less than, less than or equal, greater than or equal, and greater than a given key, returning null if there is no such key. Similarly, methods lowerKey, floorKey, ceilingKey, and higherKey return only the associated keys. All of these methods are designed for locating, not traversing entries.

A NavigableMap may be accessed and traversed in either ascending or descending key order. The descendingMap method returns a view of the map with the senses of all relational and directional methods inverted. The performance of ascending operations and views is likely to be faster than that of descending ones. Methods subMap, headMap, and tailMap differ from the like-named SortedMap methods in accepting additional arguments describing whether lower and upper bounds are inclusive versus exclusive. Submaps of any NavigableMap must implement the NavigableMap interface.

This interface additionally defines methods firstEntry, pollFirstEntry, lastEntry, and pollLastEntry that return and/or remove the least and greatest mappings, if any exist, else returning null.

Implementations of entry-returning methods are expected to return Map.Entry pairs representing snapshots of mappings at the time they were produced, and thus generally do not support the optional Entry.setValue method. Note however that it is possible to change mappings in the associated map using method put.

Methods subMap(K, K), headMap(K), and tailMap(K) are specified to return SortedMap to allow existing implementations of SortedMap to be compatibly retrofitted to implement NavigableMap, but extensions and implementations of this interface are encouraged to override these methods to return NavigableMap. Similarly, keySet() can be overriden to return NavigableSet.

This interface is a member of the Java Collections Framework.

Navigable的意思是“可驾驶的,适于航行的”,NavigableMap提供了一些便捷的方法用来搜索map中最匹配的对象,如Map.Entry<K,V> lowerEntry(K key)、Map.Entry<K,V> floorEntry(K key)等方法。

(5) AbstractMap抽象类

    AbstractMap抽象类给出了Map的一些方法的实现。

下面正式开始分析HashMap与TreeMap的实现,我们主要关心的是基本动态集合操作方法的实现。

3. HashMap

    首先分析HashMap。  

① 首先看一看JDK关于HashMap的介绍:

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it‘s very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:

   Map m = Collections.synchronizedMap(new HashMap(...));

The iterators returned by all of this class‘s "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator‘s own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

This class is a member of the Java Collections Framework.

    大意是说:HashMap是基于Hash table实现的Map,它实现了Map中所有的可选的操作,并且允许key或value为null,近似的等价于Hashtable(除了HashMap是非同步并且允许null值);它不保证元素的顺序;如果插入的元素被Hash函数正确的分散在不同的桶(槽,bucket)中,get和put操作都只需要常量时间。而迭代随需时间与map的size和capacity(容量)成正比,因此如果很看重迭代效率,不要将它的容量设的过大,这个效率反映在HashMap的两个参数上,即初始容量(initial capacity)和装载因子(load factor)(至于原因可看散列表(hash table)——算法导论(13))。当HashMap“过载”时,会自动“翻新”,容量大约为之前的两倍。通常,装载因子默认为0.75,它在时间和空间之间的权衡下表现的很好。

②再来看看HashMap的2个静态内部类:

Node:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

    代码很简单,它实现了Map中的一个内部接口Entry。注意到成员变量中有一个next,也是Node类型,我们猜测他的作用是当put进的元素冲突时,利用它把冲突的元素以链表的形式串联起来,究竟是不是这样还要看完后面的分析才知道;我们还注意到它重写了hashCode()方法,其中计算hash值的方法是:用key的hash值“异或”value的hash值。

TreeNode:

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;

        TreeNode(int hash, K key, V val, Node<K,V> next) {
            super(hash, key, val, next);
        }

        ...
}

    由于篇幅有限,上面只贴出了成员变量和构造方法,还有一些对树的操作方法省略。它继承自LinkedHashMap.Entry,而LinkedHashMap.Entry又继承自Map.Node,即TreeNode实际上是Node的“孙子”。从成员变量可以看出,TreeNode很可能是一棵红黑树(关于红黑树介绍见红黑树——算法导论(15))的结点类,做这样封装可能也是为了处理冲突的情况(将冲突的元素以红黑树的形式组织起来)。

    这时我们可能会有疑惑了,为啥它有两种不同数据结构的Node,难道它会根据冲突元素的个数来选择不同的解决策略?

    带着这个疑问,我们在源码中又找到一处说明:

Implementation notes.
This map usually acts as a binned (bucketed) hash table, but when bins get too large, they are transformed into bins of TreeNodes, each structured similarly to those in * java.util.TreeMap.
...

   它验证了我们的猜测:当在一个槽位中冲突的元素较少时,会直接以链表的形式将它们串联起来;但当冲突的元素太多时,它会以红黑树的方式将这些冲突的元素组织起来。

    这样的处理也是很好理解的,因为链表的查找效率较低。

③HashMap有4个构造方法

public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // DEFAULT_LOAD_FACTOR,默认装载因子,值为0.75
}

public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY) // MAXIMUM_CAPACITY = 1<<30;
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
}

public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
}

    无参构造器只是设置了装载因子值为0.75;HashMap(int initialCapacity)构造方法允许指定初始容量,而装载因子设为默认。HashMap(int initialCapacity, float loadFactor)构造方法允许指定初始容量和装载因子;HashMap(Map<? extends K, ? extends V> m)则允许根据另一个Map来构造HashMap。

    HashMap(int initialCapacity)只是调用了HashMap(int initialCapacity, float loadFactor)构造方法,而在HashMap(int initialCapacity, float loadFactor)中,先是检查参数的合法性,最后调用了tableSizeFor方法(返回值赋给了threshold变量);HashMap(Map<? extends K, ? extends V> m)则调用了putMapEntries方法。下面再分析tableSizeFor方法和putMapEntries方法。

tableSizeFor方法

static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

    参数cap表示容量(≥0),该方法的计算结果threshold为大于或等于cap的平方数中最小的平方数(1,2,4,8,16...这样的数被称为平方数),它表示下一次resize时size的值。

putMapEntries方法

final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
        int s = m.size();
        if (s > 0) {
            if (table == null) { // pre-size
                float ft = ((float)s / loadFactor) + 1.0F;
                int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                         (int)ft : MAXIMUM_CAPACITY);
                if (t > threshold)
                    threshold = tableSizeFor(t);
            }
            else if (s > threshold)
                resize();
            for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
                K key = e.getKey();
                V value = e.getValue();
                putVal(hash(key), key, value, false, evict);
            }
        }
}

    这里没对m做非null检查,因此我们使用HashMap(Map<? extends K, ? extends V> m)构造方法时,如果传入的map为null,将引发NullPointerException;如果传入的map是empty,则不做处理;否则,分table是否为null进行处理(简单看一眼table的说明,它是Node<K,V>[]类型,在第一次使用时被初始化,具体是什么先不管):

    a)若table==null,首先根据loadFactor和m的size计算capacity,计算方法就是根据loadfactor的定义式计算,至于为啥加1,结合后面的类型转换和比较操作,这样做实际是在做向上取整,这样既保证了capacity要充分大,而又不能超过MAXIMUM_CAPACITY,很巧妙!),然后如果计算出的capacity(即代码中的t)比下一次resize时size的值(即代码中的threshold)还大,那就和1)中一样,重新计算threshold。

    b)如table不为null,且m.size()大于threshold,那么调用resize()方法;

    执行完上述步骤(或都不执行)后,遍历传入的map,然后调用putVal()方法;

    下面再来分析resize()方法和putVal()方法;

resize方法

final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
//=============================分隔线====================================
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
}

    代码不难,但比较繁琐,因此不一一细讲。

    总体而言,上半部分(分割线以上)的工作是重新计算threshold,和新的capacity(newCap),计算规则是如果之前有初始化过,就将其这两个变量的值扩大为原来的2倍(当然若扩大后capacity大于MAXIMUM_CAPACITY,则threshold = Integer.MAX_VALUE, newCap = MAXIMUM_CAPACITY);否则threshold  = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY),newCap = DEFAULT_INITIAL_CAPACITY。下半部分(分割线以下)的工作是以newCap为长度new一个新的数组(table指向它),然后把旧的table中的值copy(浅copy)到新的table中。由此我们也搞清楚了table的作用,它是数据存放的“载体”(即数据实际上就是放在了table数组中)。

   在散列表(hash table)——算法导论(13)中我们已经分析了hash table 面临的最关键的问题有2个,一是hash 函数的选取,二是冲突的解决。那么JDK的代码中是如何解决这两个问题的呢?

    先看第一个问题。通过newTab[e.hash & (newCap - 1)] = e;这一句我们可以看出,他是通过e.hash & (newCap - 1)来计算槽位的。“与” (newCap - 1)操作实际上就是取“被与数”(e.hash)的低n位(n等于newCap的有效位数减1),这样不仅可以避免ArrayIndexOutOfBounds(这个是很容易看出的),而且可以尽量减少冲突的产生(大致解释是因为hashcode本身具有一定的唯一性,它的低n位也具有一定的唯一性。严格解释还有待论证)。

    再看第二个问题。前面我们就知道了,它会根据冲突元素的个数的多少,选择不同的策略去处理冲突元素,代码中我们也确实看出,它分了e instanceof TreeNode是否成立在处理。

    如果结点类型是TreeNode,则调用TreeNode类中的split方法。该方法根据(e.hash & bit) == 0是否成立会把红黑树一分为二,并把这两部分的元素以链表形式窜起来得到“高”“低”两条链(条件成立的元素构成的是“低链”)。之后把“低链”置于原来的槽位,而“高链”置于下标为index + bit槽位(index 是原来槽位的下标,而bit是扩充前table数组的长度。这样做数组下标肯定是不会越界的,因为我们都是以原先的2倍在扩充)。还需要说明的是,“高”“低”两条链也不是直接以链表的形式置于相应的槽位,而是同样根据链的长短进行判断。这个长短的标准是:若小于或等于UNTREEIFY_THRESHOLD(6),做链表处理;否则做红黑树处理。

put方法

put方法很简单,就调用了一下putVal方法:

public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
}

在调用putVal方法时,调用了hash方法

static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

计算过程还是很简单的,若key为null,则hash为0,否则为key.hashCode()^(key.hashCode()>>>16),但至于为什么要这么计算(或者说这么计算的好处),还不清楚(知道的请求告知我)。

然后看putVal方法:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
}

    同样,具体细节不全讲,只讲大概做了什么和比较关键的细节。

    首先,检查table是否初始化了,没有就resize。

    然后,检查相应的槽位上是否已经有元素,没有就直接插入;有则比较槽位上的元素与待插入的元素是否相等,相等的标准是:(k = e.key) == key || (key != null && key.equals(k)))。若相等,直接e=p(p“指向”该槽位上的元素);若不相等,就根据该槽位上元素的类型做不同的处理。

    如果元素是TreeNode类型,则调用putTreeVal方法将元素“挂载”到红黑树上,此情况结束后e的指向与接下来的情况类似;

    如果元素是普通Node类型,就不断的向后“移动”p(看作一个指针),直到两种情况的出现:一是“移动”到了链表的末尾,此时会把待插入的元素链接在链表的末尾,并且判断链表的长度是否大于或等于TREEIFY_THRESHOLD(8)-1,是,则调用treeifyBin方法把链表转化为红黑树。该情况结束后e=null;二是遇到了相等(相等的标准同上)的元素。该情况结束后e = p;

    接下来,如果e不为null,即遇到了上面的第二种情况,此时e = p,会根据情况做是否覆盖原先的value处理,这个根据是!onlyIfAbsent || oldValue == null如果是true就覆盖,否则就不覆盖。这样就能理解为什么put方法在遇到key相等(相等的标准同上)时,会覆盖之前的value,而putIfAbsent方法则不会(事实上,putIfAbsent方法就一句代码,调用putVal(hash(key), key, value, true, true))。

    最后++modCount;如果size大于了threshold,就resize。

    另外补充一下,从tab[i] = newNode(hash, key, value, null); 一句我们可以看出,实际上Node结点中的hash值是由上面提到的hash(Object key)方法计算得出的;

    至于putTreeVal方法与treeifyBin方法的分析,放在TreeMap的源码分析中(可能方法名不同,但思想是一样的,都是对红黑树进行操作)

    到此为止,put方法便分析完了。

get方法

final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
}

get方法比put方法就简单了许多,就是一个查找过程:首先是根据key索引到相应槽位,若该槽位为null则返回null;若不为null且槽位上的对象的key就“等于”get的key,那么直接返回该对象;否则分红黑树和链表两种情况进行查找。

    基本上分析到这里HashMap的核心就清楚了。TreeMap的分析会另写一篇。

    如有不对,欢迎指正。

时间: 2024-10-18 10:37:11

HashMap与TreeMap源码分析的相关文章

Java集合系列之TreeMap源码分析

一.概述 TreeMap是基于红黑树实现的.由于TreeMap实现了java.util.sortMap接口,集合中的映射关系是具有一定顺序的,该映射根据其键的自然顺序进行排序或者根据创建映射时提供的Comparator进行排序,具体取决于使用的构造方法.另外TreeMap中不允许键对象是null. 1.什么是红黑树? 红黑树是一种特殊的二叉排序树,主要有以下几条基本性质: 每个节点都只能是红色或者黑色 根节点是黑色 每个叶子节点是黑色的 如果一个节点是红色的,则它的两个子节点都是黑色的 从任意一

JDK1.8 HashMap中put源码分析

一.存储结构      在JDK1.8之前,HashMap采用桶+链表实现,本质就是采用数组+单向链表组合型的数据结构.它之所以有相当快的查询速度主要是因为它是通过计算散列码来决定存储的位置.HashMap通过key的hashCode来计算hash值,不同的hash值就存在数组中不同的位置,当多个元素的hash值相同时(所谓hash冲突),就采用链表将它们串联起来(链表解决冲突),放置在该hash值所对应的数组位置上.结构图如下:     图中,紫色部分代表哈希表,也称为哈希数组,数组中每个元素

面试(1)-HashMap原理与源码分析(JDK1.8)

1.HashMap介绍 HashMap为Map接口的一个实现类,实现了Map所有的操作.HashMap除了允许key.value为null值和非线程安全外,其他实现几乎和HashTable一致.HashMap使用散列存储的方式保存kay-value键值对,因此其不支持数据保存的顺序.如果想要使用有序容器可以使用LinkedHashMap.在性能上当HashMap中保存的key的哈希算法能够均匀的分布在每个bucket中的时候,HashMap在基本的get和set操作的的时间复杂度都是O(n).在

死磕 java集合之TreeMap源码分析(四)-内含彩蛋

欢迎关注我的公众号"彤哥读源码",查看更多源码系列文章, 与彤哥一起畅游源码的海洋. 二叉树的遍历 我们知道二叉查找树的遍历有前序遍历.中序遍历.后序遍历. (1)前序遍历,先遍历我,再遍历我的左子节点,最后遍历我的右子节点: (2)中序遍历,先遍历我的左子节点,再遍历我,最后遍历我的右子节点: (3)后序遍历,先遍历我的左子节点,再遍历我的右子节点,最后遍历我: 这里的前中后都是以"我"的顺序为准的,我在前就是前序遍历,我在中就是中序遍历,我在后就是后序遍历. 下

HashMap底层实现(源码分析)

一.数据结构 Map将实际数据存储在Entry类的数组中. 代码片段: Java代码     transient Entry[] table;//HashMap的成员变量,存放数据 static class Entry<K,V> implements Map.Entry<K,V> {//内部类Entry final K key; V value; Entry<K,V> next;//指向下一个数据 final int hash; /** * Creates new en

死磕 java集合之TreeMap源码分析(一)

欢迎关注我的公众号"彤哥读源码",查看更多源码系列文章, 与彤哥一起畅游源码的海洋. 简介 TreeMap使用红黑树存储元素,可以保证元素按key值的大小进行遍历. 继承体系 TreeMap实现了Map.SortedMap.NavigableMap.Cloneable.Serializable等接口. SortedMap规定了元素可以按key的大小来遍历,它定义了一些返回部分map的方法. public interface SortedMap<K,V> extends Ma

死磕 java集合之TreeMap源码分析(三)- 内含红黑树分析全过程

欢迎关注我的公众号"彤哥读源码",查看更多源码系列文章, 与彤哥一起畅游源码的海洋. 删除元素 删除元素本身比较简单,就是采用二叉树的删除规则. (1)如果删除的位置有两个叶子节点,则从其右子树中取最小的元素放到删除的位置,然后把删除位置移到替代元素的位置,进入下一步. (2)如果删除的位置只有一个叶子节点(有可能是经过第一步转换后的删除位置),则把那个叶子节点作为替代元素,放到删除的位置,然后把这个叶子节点删除. (3)如果删除的位置没有叶子节点,则直接把这个删除位置的元素删除即可.

TreeMap源码分析

TreeMap继承关系,和HashMap同继承于AbstractMap,属于兄弟关系. public class TreeMap<K,V> extends AbstractMap<K,V> implements NavigableMap<K,V>, Cloneable, java.io.Serializable -1. 基于 NavigableMap 实现的红黑树 -2. 按 natrual ordering(Comparator==null,默认) 或者 Compar

TreeMap 源码分析

TreeMap 1)TreeMap 是基于红黑树的 NavigableMap 接口实现类. 2)TreeMap 根据键的自然顺序或构造时提供的比较器进行键的排序. 3){@code containsKey}, {@code get}, {@code put} and {@code remove} 方法的时间复杂度为 log(n). 4)TreeMap 不是线程同步的,多线程并发访问 TreeMap 并且至少有一个线程修改了其结构,则它必须在外部实现同步. SortedMap m = Collec