Common Java Collection Classes (1)

1. HashMap

Reference: http://yikun.github.io/2015/04/01/Java-HashMap%E5%B7%A5%E4%BD%9C%E5%8E%9F%E7%90%86%E5%8F%8A%E5%AE%9E%E7%8E%B0/

Note: JDK 8 optimized HashMap substantially; everything below is based on JDK 8.

1.1 HashMap structure

  HashMap uses a hash table to implement the dictionary data structure. Any hashing scheme has to deal with collisions; common strategies include linear probing, quadratic probing, and double hashing. HashMap uses separate chaining: each slot of the table stores not a bare entry but the head of a linked list, and when a collision occurs the colliding entries are kept together on that list.

  

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
        // ... (excerpt from the JDK source)
}

1.2 The hash function

What exactly does the hash do here? In short, it maps Object -> int: any object is reduced to an int value, and that value is processed further (for example, by taking it modulo the table length) to locate an array index quickly, which is what makes put and get efficient. HashMap does not actually use the modulo operator, though; it uses a neater trick, described below.

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Explanation: `h >>> 16` shifts the 32-bit hash code right by 16 bits, and the result is XORed with `h = key.hashCode()` itself. The upper 16 bits stay unchanged while the lower 16 bits become the XOR of the high and low halves. Because both halves now influence the low bits, the resulting values are better dispersed.

 

The bucket index is then computed as `(n - 1) & hash`, where `n = tab.length` and `hash` is the value computed above. The naive approach would be to take the hash modulo the table length, but since `n` is always a power of two, masking with `n - 1` yields the same result more cheaply. Combined with the XOR spreading step, this disperses entries more evenly and reduces hash clustering (clustering is what breeds collisions). The `(n - 1) & hash` scheme also makes HashMap's resize remarkably simple and elegant, as described next.
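A quick sketch (the class and `spread` method names are my own; `spread` mirrors the JDK's hashing step shown above) verifying that for a power-of-two length n, `(n - 1) & hash` selects the same bucket as the remainder operation:

```java
public class IndexDemo {
    // Same spreading as java.util.HashMap.hash(Object)
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table length, always a power of two
        for (String key : new String[] {"apple", "banana", "peach"}) {
            int h = spread(key);
            int index = (n - 1) & h;
            // For power-of-two n, masking equals taking the remainder
            // (Math.floorMod also handles negative hash values correctly).
            if (index != Math.floorMod(h, n)) throw new AssertionError(key);
            System.out.println(key + " -> bucket " + index);
        }
    }
}
```

Note that `hash % n` with Java's `%` would go wrong for negative hashes, which is another reason the mask is preferable.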

1.3 The resize algorithm

HashMap's default table length is 16, i.e. 2^4, and every resize doubles it, so the length is always a power of two. Consider the index formula `(n - 1) & hash` when the table grows from 16 to 32: the mask grows from 0b01111 to 0b11111, so exactly one additional bit of the hash becomes significant.

Therefore, no hash values need to be recomputed during a resize (hashing has a real cost). For each entry, only the newly significant bit `hash & oldCap` matters: if it is 0 the entry keeps its old index, and if it is 1 the entry moves to oldIndex + oldCap.

This design is very elegant: it saves the time of recomputing every hash while still splitting each old bucket evenly between the low and high halves of the new table.
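The rule can be checked mechanically. In this sketch (names are mine), for any hash value the new index is either the old index or the old index plus oldCap, depending only on the bit `hash & oldCap`:

```java
public class ResizeDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        for (int hash : new int[] {5, 21, 37, -123456, 0x7fffffff}) {
            int oldIdx = (oldCap - 1) & hash;
            int newIdx = (newCap - 1) & hash;
            // The new index differs from the old one only in the bit
            // that oldCap contributes, so no rehash is needed.
            int expected = ((hash & oldCap) == 0) ? oldIdx : oldIdx + oldCap;
            if (newIdx != expected) throw new AssertionError("hash=" + hash);
            System.out.println("hash=" + hash + " old=" + oldIdx + " new=" + newIdx);
        }
        System.out.println("resize rule holds");
    }
}
```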

1.4 put(K key, V value): implementation outline

(1) Compute the key's hash (the key class should override hashCode()) and derive the bucket index as described above.

(2) If there is no collision, store the entry at that bucket as the head of a (single-node) list.

(3) If there is a collision:

  a. If the chain has grown longer than TREEIFY_THRESHOLD, convert it from a linked list into a red-black tree (TreeNode is the red-black tree node) to speed up lookups.

  b. Otherwise, simply append the entry to the list.

(4) Along the way, equals() is called on candidate keys; if an equal key already exists, its value is replaced.

(5) If a new entry was added and the size now exceeds loadFactor * capacity, resize.
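The collision and replace-on-equal-key behavior in steps (3) and (4) can be observed directly. The `CollidingKey` class below is a deliberately bad key type of my own that forces every entry into the same bucket, purely for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    // Every instance hashes to the same value, so all entries collide.
    static final class CollidingKey {
        final String name;
        CollidingKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<CollidingKey, Integer> m = new HashMap<>();
        m.put(new CollidingKey("a"), 1);
        m.put(new CollidingKey("b"), 2); // same bucket, different key: chained
        m.put(new CollidingKey("a"), 3); // equal key: value replaced, size unchanged
        if (m.size() != 2) throw new AssertionError();
        if (m.get(new CollidingKey("a")) != 3) throw new AssertionError();
        System.out.println("collisions chained, equal key replaced");
    }
}
```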

1.5 get(Object key): implementation outline

Compute the bucket index from the key's hash code:

  (1) If the bucket is empty, return null.

  (2) If the bucket holds a linked list, walk it, comparing keys with equals(): O(n) in the chain length.

  (3) If the bucket holds a red-black tree, search the tree, again comparing with equals(): O(log n).

1.6 Summary

The HashMap javadoc is in fact very thorough, and with the background above it should be easy to follow:

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
Map m = Collections.synchronizedMap(new HashMap(...));
The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

(For any hash-based data structure, the key class must override both equals() and hashCode().)
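To see why both overrides matter, compare a key class of my own that overrides only equals() with one that overrides both (class names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class KeyContractDemo {
    // Overrides equals() but NOT hashCode(): broken as a hash key.
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    // Overrides both, as the contract requires.
    static final class GoodKey {
        final int id;
        GoodKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id == id;
        }
        @Override public int hashCode() { return Objects.hash(id); }
    }

    public static void main(String[] args) {
        Map<BadKey, String> bad = new HashMap<>();
        bad.put(new BadKey(1), "x");
        // An equal-but-distinct BadKey falls back to the identity hash,
        // so it usually lands in a different bucket and the lookup misses.
        System.out.println(bad.get(new BadKey(1))); // almost always null

        Map<GoodKey, String> good = new HashMap<>();
        good.put(new GoodKey(1), "x");
        if (!"x".equals(good.get(new GoodKey(1)))) throw new AssertionError();
        System.out.println("GoodKey lookup works");
    }
}
```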

The documentation above makes the following points:

(1) A hash-table-based implementation of the Map interface. It provides all the optional map operations and permits null values and the null key (any number of null values, at most one null key). Iteration order is not guaranteed.

(2) get() and put() run in constant time, assuming the hash function disperses entries well.

(3) Iterating over a collection view takes time proportional to the capacity (the number of buckets) plus the size (the number of key-value mappings).

(4) So if iteration performance matters, do not set the initial capacity too high or the load factor too low.

(5) Two parameters are important for a HashMap instance:
  initial capacity: the number of buckets in the hash table at creation time;
  load factor: a measure of how full the table may get before its capacity is automatically increased.
  When the number of entries exceeds the product of the load factor and the current capacity, the table is rehashed (that is, the internal data structures are rebuilt) into roughly twice as many buckets.

(6) As a general rule, the default load factor of 0.75 is a good time/space trade-off. A higher load factor decreases the space overhead (the table grows later) but increases the lookup cost, reflected in most HashMap operations, including get and put. Why? With separate chaining, a higher load factor lets more entries accumulate before a resize, leaving fewer spare slots; clustering and collisions become more likely, per-bucket chains grow longer, and lookups and removals within a bucket cost more.

(7) If many mappings are to be stored, creating the map with a sufficiently large capacity reduces the number of resize operations. Conversely, if the key class's hashCode() is poorly designed, i.e. prone to clustering, HashMap performance will inevitably suffer.

(8) HashMap is not thread-safe.

(9) Its iterators are fail-fast: if, during iteration, the map is structurally modified by another thread, or by the current thread through anything other than the iterator's own remove(), a ConcurrentModificationException is thrown at once and the iteration fails quickly.

(10) Fail-fast behavior is best-effort and does not guarantee thread safety; if you need safe concurrent access, do not use HashMap.
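Points (9) and (10) can be demonstrated in a few lines (class name is mine):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, String> m = new HashMap<>();
        m.put("a", "1");
        m.put("b", "2");
        m.put("c", "3");

        boolean threw = false;
        try {
            for (String key : m.keySet()) {
                m.put("d", "4"); // structural modification during iteration
            }
        } catch (ConcurrentModificationException e) {
            threw = true;        // the iterator failed fast
        }
        if (!threw) throw new AssertionError();

        // Removing through the iterator itself is allowed.
        Iterator<String> it = m.keySet().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove();
        }
        if (!m.isEmpty()) throw new AssertionError();
        System.out.println("fail-fast triggered, iterator removal ok");
    }
}
```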

2. LinkedHashMap

public class LinkedHashMap<K,V>
    extends HashMap<K,V>
    implements Map<K,V>

LinkedHashMap extends HashMap; the difference is that LinkedHashMap additionally maintains a doubly-linked list running through all of its entries. This list defines the iteration order, which can be either the order in which elements were inserted or the order in which they were last accessed.

static class Entry<K,V> extends HashMap.Node<K,V> {
    Entry<K,V> before, after;
    ……
}

2.1 Constructor

public LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder) {
    super(initialCapacity, loadFactor);
    this.accessOrder = accessOrder;
}

With accessOrder = false, iteration follows insertion order; with true, it follows access order.

Because the entries are now doubly linked, reinitialize() is overridden as well:

void reinitialize() {
    super.reinitialize();
    head = tail = null;
}

A small demo:

    // 1. Insertion order
    Map<String, String> map = new LinkedHashMap<String, String>(16, 0.75f, false);
    // 2. Access order
    // Map<String, String> map = new LinkedHashMap<String, String>(16, 0.75f, true);

    map.put("apple", "苹果");
    map.put("watermelon", "西瓜");
    map.put("banana", "香蕉");
    map.put("peach", "桃子");

    map.get("watermelon");
    map.get("peach");

    Iterator<Map.Entry<String, String>> iter = map.entrySet().iterator();
    while (iter.hasNext()) {
        Map.Entry<String, String> entry = iter.next();
        System.out.println(entry.getKey() + "=" + entry.getValue());
    }
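The two modes can be compared side by side; this sketch (wrapped in a small class of my own, with the values simplified) asserts the exact iteration order each mode produces after the two get() calls:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OrderDemo {
    static List<String> keysAfterAccess(boolean accessOrder) {
        Map<String, String> map = new LinkedHashMap<>(16, 0.75f, accessOrder);
        map.put("apple", "apple");
        map.put("watermelon", "watermelon");
        map.put("banana", "banana");
        map.put("peach", "peach");
        map.get("watermelon"); // moves to the tail only in access order
        map.get("peach");
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Insertion order is unaffected by the gets.
        List<String> insertion = keysAfterAccess(false);
        if (!insertion.equals(Arrays.asList("apple", "watermelon", "banana", "peach")))
            throw new AssertionError(insertion);

        // Access order: each get() moves the key to the end of the list.
        List<String> access = keysAfterAccess(true);
        if (!access.equals(Arrays.asList("apple", "banana", "watermelon", "peach")))
            throw new AssertionError(access);

        System.out.println("insertion: " + insertion);
        System.out.println("access:    " + access);
    }
}
```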

2.2 Implementation

A brief walk through the source:

The way LinkedHashMap maintains order is actually very simple: two fields independent of the table itself, head and tail, track the ends of the list (in either access or insertion order), and the node type is extended into a doubly-linked entry. Because this list must be kept up to date, the node construction and replacement methods are overridden:

 Node<K,V> newNode(int hash, K key, V value, Node<K,V> e) {
        LinkedHashMap.Entry<K,V> p =
            new LinkedHashMap.Entry<K,V>(hash, key, value, e);
        linkNodeLast(p);
        return p;
    }

    Node<K,V> replacementNode(Node<K,V> p, Node<K,V> next) {
        LinkedHashMap.Entry<K,V> q = (LinkedHashMap.Entry<K,V>)p;
        LinkedHashMap.Entry<K,V> t =
            new LinkedHashMap.Entry<K,V>(q.hash, q.key, q.value, next);
        transferLinks(q, t);
        return t;
    }

    TreeNode<K,V> newTreeNode(int hash, K key, V value, Node<K,V> next) {
        TreeNode<K,V> p = new TreeNode<K,V>(hash, key, value, next);
        linkNodeLast(p);
        return p;
    }

    TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
        LinkedHashMap.Entry<K,V> q = (LinkedHashMap.Entry<K,V>)p;
        TreeNode<K,V> t = new TreeNode<K,V>(q.hash, q.key, q.value, next);
        transferLinks(q, t);
        return t;
    }

In JDK 8, HashMap defines hook methods specifically for LinkedHashMap:

 // Callbacks to allow LinkedHashMap post-actions
    void afterNodeAccess(Node<K,V> p) { }
    void afterNodeInsertion(boolean evict) { }
    void afterNodeRemoval(Node<K,V> p) { }

LinkedHashMap implements them as follows:

    // After a node is removed, unlink it (detach both of its list ends)
    void afterNodeRemoval(Node<K,V> e) { // unlink
        LinkedHashMap.Entry<K,V> p =
            (LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after;
        p.before = p.after = null;
        if (b == null)
            head = a;
        else
            b.after = a;
        if (a == null)
            tail = b;
        else
            a.before = b;
    }

    // After an insertion, possibly evict the eldest entry
    void afterNodeInsertion(boolean evict) { // possibly remove eldest
        LinkedHashMap.Entry<K,V> first;
        if (evict && (first = head) != null && removeEldestEntry(first)) {
            K key = first.key;
            removeNode(hash(key), key, null, false, true);
        }
    }

   // After an entry is accessed, move it to the tail of the list
    void afterNodeAccess(Node<K,V> e) { // move node to last
        LinkedHashMap.Entry<K,V> last;
        if (accessOrder && (last = tail) != e) {
            LinkedHashMap.Entry<K,V> p =
                (LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after;
            p.after = null;
            if (b == null)
                head = a;
            else
                b.after = a;
            if (a != null)
                a.before = b;
            else
                last = b;
            if (last == null)
                head = p;
            else {
                p.before = last;
                last.after = p;
            }
            tail = p;
            ++modCount;
        }
    }

On reads, the list may be reordered according to access order:

public V get(Object key) {
        Node<K,V> e;
        if ((e = getNode(hash(key), key)) == null)
            return null;
        if (accessOrder)
            afterNodeAccess(e);
        return e.value;
    }

    /**
     * {@inheritDoc}
     */
    public V getOrDefault(Object key, V defaultValue) {
       Node<K,V> e;
       if ((e = getNode(hash(key), key)) == null)
           return defaultValue;
       if (accessOrder)
           afterNodeAccess(e);
       return e.value;
   }

2.3 LinkedHashMap and the LRU algorithm

An LRU cache builds on exactly this idea. LRU stands for Least Recently Used: the cache evicts the entry that has gone unused for the longest time to make room for freshly read data. Since recently used data is also the data most likely to be read again, an LRU cache improves system performance. The code below only demonstrates the idea; in practice, an LRU cache is more often built on the implementation in Guava.

A simple LRUCache along these lines lives at https://github.com/tracylihui/LRUcache-Java. The idea is straightforward: override LinkedHashMap's removeEldestEntry() hook method.

package ysz.demo;

import java.util.LinkedHashMap;
import java.util.Collection;
import java.util.Map;
import java.util.ArrayList;

/**
 * An LRU cache, based on <code>LinkedHashMap</code>.
 *
 * <p>
 * This cache has a fixed maximum number of elements (<code>cacheSize</code>).
 * If the cache is full and another entry is added, the LRU (least recently
 * used) entry is dropped.
 *
 * <p>
 * This class is thread-safe. All methods of this class are synchronized.
 *
 * <p>
 * Author: Christian d'Heureuse, Inventec Informatik AG, Zurich, Switzerland<br>
 
 * Multi-licensed: EPL / LGPL / GPL / AL / BSD.
 */
public class LRUCache<K, V> {
    private static final float hashTableLoadFactor = 0.75f;
    private LinkedHashMap<K, V> map;
    private int cacheSize;

    /**
     * Creates a new LRU cache. Note that passing true as the third argument of
     * new LinkedHashMap<K,V>(hashTableCapacity, hashTableLoadFactor, true)
     * selects access order.
     *
     * @param cacheSize
     *            the maximum number of entries that will be kept in this cache.
     */
    public LRUCache(int cacheSize) {
        this.cacheSize = cacheSize;
        int hashTableCapacity = (int) Math
                .ceil(cacheSize / hashTableLoadFactor) + 1;
        map = new LinkedHashMap<K, V>(hashTableCapacity, hashTableLoadFactor,
                true) {
            // (an anonymous inner class)
            private static final long serialVersionUID = 1;

            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > LRUCache.this.cacheSize;
            }
        };
    }

    /**
     * Retrieves an entry from the cache.<br>
     * The retrieved entry becomes the MRU (most recently used) entry.
     *
     * @param key
     *            the key whose associated value is to be returned.
     * @return the value associated to this key, or null if no value with this
     *         key exists in the cache.
     */
    public synchronized V get(K key) {
        return map.get(key);
    }

    /**
     * Adds an entry to this cache. The new entry becomes the MRU (most recently
     * used) entry. If an entry with the specified key already exists in the
     * cache, it is replaced by the new entry. If the cache is full, the LRU
     * (least recently used) entry is removed from the cache.
     *
     * @param key
     *            the key with which the specified value is to be associated.
     * @param value
     *            a value to be associated with the specified key.
     */
    public synchronized void put(K key, V value) {
        map.put(key, value);
    }

    /**
     * Clears the cache.
     */
    public synchronized void clear() {
        map.clear();
    }

    /**
     * Returns the number of used entries in the cache.
     *
     * @return the number of entries currently in the cache.
     */
    public synchronized int usedEntries() {
        return map.size();
    }

    /**
     * Returns a <code>Collection</code> that contains a copy of all cache
     * entries.
     *
     * @return a <code>Collection</code> with a copy of the cache content.
     */
    public synchronized Collection<Map.Entry<K, V>> getAll() {
        return new ArrayList<Map.Entry<K, V>>(map.entrySet());
    }

    // Test routine for the LRUCache class.
    public static void main(String[] args) {
        LRUCache<String, String> c = new LRUCache<String, String>(3);
        c.put("1", "one"); // 1
        c.put("2", "two"); // 2 1
        c.put("3", "three"); // 3 2 1
        c.put("4", "four"); // 4 3 2
        if (c.get("2") == null)
            throw new Error(); // 2 4 3
        c.put("5", "five"); // 5 2 4
        c.put("4", "second four"); // 4 5 2
        // Verify cache content.
        if (c.usedEntries() != 3)
            throw new Error();
        if (!c.get("4").equals("second four"))
            throw new Error();
        if (!c.get("5").equals("five"))
            throw new Error();
        if (!c.get("2").equals("two"))
            throw new Error();
        // List cache content.
        for (Map.Entry<String, String> e : c.getAll())
            System.out.println(e.getKey() + " : " + e.getValue());
    }
}
Posted: 2024-10-08 17:19:56
