(转)list 、set 、map 粗浅性能对比分析 / 憋错料

本文分开介绍了List、Map、Set：

（测试环境：win7、jdk、4G、i3；文章示例为了节省篇幅，只会列出测试大体形式和遍历次数）

第一部分：List

1.add(E e)

==============================性能测试=================================

（1）表格统计的差异在于new的方式不同：

for (int i = 0; i < 10000; i++) {
List<Integer> list = new LinkedList<Integer>();
for (int j = 0; j < 10000; j++) {
list.add(j);
}
list = null;
}

new方式	消耗时间（毫秒）
new LinkedList<Integer>()	3420
new ArrayList<Integer>()	3048
new ArrayList<Integer>(10000)	2280
new Vector<Integer>()	4045
new Vector<Integer>(10000)	3859

（2）我们变化一下集合的长度和遍历次数：

[java] view plain copy

for (int i = 0; i < 1000000; i++) {
List<Integer> list = new LinkedList<Integer>();
for (int j = 0; j < 100; j++) {
list.add(j);
}
list = null;
}

new方式	消耗时间（毫秒）
new LinkedList<Integer>()	2286
new ArrayList<Integer>()	2832
new ArrayList<Integer>(10000)	1714
new Vector<Integer>()	3937
new Vector<Integer>(10000)	3184

*************************************************源码分析*************************************************

似乎并没有看到期待的差异，若非“冒充下大牛”纠结一下区别：

1、若集合长度比较小（小于百位），LinkedList比ArrayList的插入性能稍胜一筹。

2、但是当长度达到五位数时，ArrayList的插入性能反而会比LinkedList好一些。

3、若你能预知ArrayList的长度，可以完胜LinkedList。（原因后面会说）

4、Vector的作用不在一个纬度上，不过也没有想象中的那么差。

LinkedList.add:

[java] view plain copy

先调用了自身的add：
public boolean add(E e) {
addBefore(e, header);
return true;
}
然后又调用了addBefore，表示插入在谁之前（有点绕，因为是双项链）：
private Entry<E> addBefore(E e, Entry<E> entry) {
Entry<E> newEntry = new Entry<E>(e, entry, entry.previous);
newEntry.previous.next = newEntry;
newEntry.next.previous = newEntry;
size++;
modCount++;
return newEntry;
}
用于维护指定前后关系：
private static class Entry<E> {
E element;
Entry<E> next;
Entry<E> previous;
Entry(E element, Entry<E> next, Entry<E> previous) {
this.element = element;
this.next = next;
this.previous = previous;
}
}

也就是说LinkedList的上下级维护关系是借助了指针，若任意位置插入，则是先查找到before前的entry，然后重指原有位置前后元素的指针。

ArrayList.add：

首先判断下是否超过了最大长度数组长度，若超过会重新扩展拷贝（这也就说明了，为什么当我们在newArrayList时指定合适的长度会提升性能）

[java] view plain copy

相比LinkedList来说就简单了狠多：
public boolean add(E e) {
ensureCapacity(size + 1); // Increments modCount!!
elementData[size++] = e;
return true;
}
//首先判断下是否超过了最大长度数组长度，若超过会重新扩展拷贝（这也就说明了，为什么当我们在newArrayList时指定合适的长度会提升性能）
public void ensureCapacity(int minCapacity) {
modCount++;
int oldCapacity = elementData.length;
if (minCapacity > oldCapacity) {
Object oldData[] = elementData;
int newCapacity = (oldCapacity * 3)/2 + 1;
if (newCapacity < minCapacity)
newCapacity = minCapacity;
// minCapacity is usually close to size, so this is a win:
elementData = Arrays.copyOf(elementData, newCapacity);
}
}
//默认长度10
public ArrayList() {
this(10);
}

通过上面的源码可以理解出为什么当集合的长度会左右LinkedList和ArrayList的插入性能PK。影响ArrayList性能损耗的一大元凶就是数组的不断增长拷贝，LinkedList由于自身的特性当数据插入时需要借助特殊的内部类“Entry”来维护插入数据的上下级链。

2.add(int index, E element)

通过上面的分析出现了一个想法，如果是

随机插入位置呢？

==============================性能测试=================================

(1)

[java] view plain copy

for (int i = 0; i < 100; i++) {
List<Integer> list = new LinkedList<Integer>();
for (int j = 0; j < 10000; j++) {
list.add(list.size() / 2, j);
}
list = null;
}

new方式	消耗时间
new LinkedList<Integer>()	15782
new ArrayList<Integer>()	3513
new ArrayList<Integer>(10000)	3490

(2)我们变化一下集合的长度和遍历次数：

[java] view plain copy

for (int i = 0; i < 100000; i++) {
List<Integer> list = new LinkedList<Integer>();
for (int j = 0; j < 100; j++) {
list.add(list.size() / 2, j);
}
list = null;
}

new方式	消耗时间
new LinkedList<Integer>()	880
new ArrayList<Integer>()	1262
new ArrayList<Integer>(10000)	2308

*************************************************源码分析*************************************************

从纯插入性能上来说，随机插入LikedList要比ArrayList性能好：（至于为什么当LinkedList集合长度为10000时性能变化会如此之大，我们后面的get会有介绍）

LikedList.add(int index, E element) :

大体分为两个步骤

[java] view plain copy

public void add(int index, E element) {
addBefore(element, (index==size ? header : entry(index)));
}
//第一步获取到addBefore需要的插入位置（entry），第二步如上
private Entry<E> entry(int index) {
if (index < 0 || index >= size)
throw new IndexOutOfBoundsException("Index: "+index+
", Size: "+size);
Entry<E> e = header;
if (index < (size >> 1)) {
for (int i = 0; i <= index; i++)
e = e.next;
} else {
for (int i = size; i > index; i--)
e = e.previous;
}
return e;
}

所以从代码上看，LikedList.add(int index, E element) 没有理由在集合长度为10000时会有这么大的差异，除非entry(int index)存在问题（我们后面可能要说的hash也正是为了避免遍历）。

ArrayList.add(int index, E element) :

大体分为三步

第二步暴漏了为什么现在初始指定了集合长度反而不一定是好事。

[java] view plain copy

public void add(int index, E element) {
if (index > size || index < 0)
throw new IndexOutOfBoundsException(
"Index: "+index+", Size: "+size);
//第一步：扩充长度
ensureCapacity(size+1); // Increments modCount!!
//第二步：挪动数据（这一步决定了为什么现在初始指定了集合长度反而不一定是好事）
System.arraycopy(elementData, index, elementData, index + 1,
size - index);
//第三步：插入
elementData[index] = element;
size++;
}

3.get(int index)

==============================性能测试=================================

[java] view plain copy

List<Integer> list = new LinkedList<Integer>();
for (int j = 0; j < 10000; j++) {
list.add(j);
}
for (int i = 0; i < 100; i++) {
for (int j = 0; j < 10000; j++) {
list.get(j);
}
}

new方式	消耗时间
new LinkedList<Integer>()	8349
new ArrayList<Integer>(10000)	15

*************************************************源码分析*************************************************

通过这个表首先能够知道如果直接或间接（如模板语言）调用for循环遍历，LikedList的性能要差很多。

LinkedList.get(int index)：

[java] view plain copy

public E get(int index) {
return entry(index).element;
}
//性能损耗在这里，每次都需要线性搜索（遍历链定位位置）
private Entry<E> entry(int index) {
if (index < 0 || index >= size)
throw new IndexOutOfBoundsException("Index: "+index+
", Size: "+size);
Entry<E> e = header;
if (index < (size >> 1)) {
for (int i = 0; i <= index; i++)
e = e.next;
} else {
for (int i = size; i > index; i--)
e = e.previous;
}
return e;
}

ArrayList.get(int index)：

[java] view plain copy

public E get(int index) {
//检查是否越界
RangeCheck(index);
//直接从数组中取出对应index的数据
return (E) elementData[index];
}

4.迭代器

==============================性能测试=================================

[java] view plain copy

List<Integer> list = new LinkedList<Integer>();
for (int j = 0; j < 100; j++) {
list.add(j);
}
for (int i = 0; i < 1000000; i++) {
Iterator<Integer> it = list.iterator();
while (it.hasNext()) {
it.next();
}
it = null;
}

new方式	消耗时间
new LinkedList<Integer>()	1335
new ArrayList<Integer>();	3940

*************************************************源码分析*************************************************

LinkedList.iterator():

[java] view plain copy

private class ListItr implements ListIterator<E> {
//header为LikedList的成员变量Entry
private Entry<E> lastReturned = header;
private Entry<E> next;
private int nextIndex;
private int expectedModCount = modCount;
//这里我们会默认传入0，
ListItr(int index) {
if (index < 0 || index > size)
throw new IndexOutOfBoundsException("Index: "+index+
", Size: "+size);
//下面的if和else很有意思，为了实现“所有操作都是按照双重链接列表的需要执行的。在列表中编索引的操作将从开头或结尾遍历列表（从靠近指定索引的一端）。”
//size为外部LikedList的成员变量
if (index < (size >> 1)) {
next = header.next;
for (nextIndex=0; nextIndex<index; nextIndex++)
next = next.next;
} else {
next = header;
for (nextIndex=size; nextIndex>index; nextIndex--)
next = next.previous;
}
}
//判断是否已经超过长度
public boolean hasNext() {
return nextIndex != size;
}
//便宜至下一个元素并返回element
public E next() {
checkForComodification();
if (nextIndex == size)
throw new NoSuchElementException();
lastReturned = next;
next = next.next;
nextIndex++;
return lastReturned.element;
}
...
final void checkForComodification() {
if (modCount != expectedModCount)
throw new ConcurrentModificationException();
}
}

也就是说我们的LinkedList的Iterator性能损耗主要在变换指针，所以对比for遍历方式来说性能提升了很多。

ArrayList.iterator():

[java] view plain copy

public Iterator<E> iterator() {
return new Itr();
}
private class Itr implements Iterator<E> {
/**
* Index of element to be returned by subsequent call to next.
*/
int cursor = 0;
/**
* Index of element returned by most recent call to next or
* previous. Reset to -1 if this element is deleted by a call
* to remove.
*/
int lastRet = -1;
/**
* The modCount value that the iterator believes that the backing
* List should have. If this expectation is violated, the iterator
* has detected concurrent modification.
*/
int expectedModCount = modCount;
public boolean hasNext() {
return cursor != size();
}
public E next() {
checkForComodification();
try {
//主要内容在这里，又绕回了ArrayList的获取方式
E next = get(cursor);
lastRet = cursor++;
return next;
} catch (IndexOutOfBoundsException e) {
checkForComodification();
throw new NoSuchElementException();
}
}
}

ArrayList“很懒”的没有做什么，主要借助了父类AbstractList，获取方式采用了模板方法设计模式（Template Method Pattern）。

总结：无论你遍历ArrayList还是LinkedList都可以尽量采用迭代器。

通过add和get的分析，我们最常用的场景就是单页数据获取，然后利用jstl或velocity等遍历展示：

1、如果能确定遍历采用的for循环建议使用ArrayList并采用带参的构造器指定集合长度（也就是每页的个数）；

2、若遍历采用迭代器建议采用LinkedList，因为我们还会有时对dao获取的数据采用缓存策略。

（jstl用的迭代器，可看源码org.apache.taglibs.standard.tag.common.core.ForEachSupport.toForEachIterator方法）

第二部分：Map

1.put(K key, V value)

==============================性能测试=================================

表格统计的差异在于new的方式不同：

[java] view plain copy

for (int i = 0; i < 100000; i++) {
Map<Integer, String> map = new TreeMap<Integer, String>();
for (int j = 0; j < 100; j++) {
map.put(j, "s" + j);
}
map = null;
}

new方式	运行时间
new TreeMap<Integer, String>()	2945
new HashMap<Integer, String>()	2029
new HashMap<Integer, String>(100)	1845

*************************************************源码分析*************************************************

TreeMap.put(K key, V value)：

首先确认一点TreeMap采用的“红黑树”二叉查找方式。（具体算法请自行google）

该映射根据其键的自然顺序进行排序，或者根据创建映射时提供的 Comparator 进行排序，具体取决于使用的构造方法。此实现为 containsKey、get、put 和 remove 操作提供受保证的 log(n) 时间开销。

插入大体分为四步：

1、判断是否为初次；

2、获取指定Comparator，若没有指定获取key的父类的实现方式。

3、利用Comparator确定数据存放位置；

4、树的旋转。

[java] view plain copy

public V put(K key, V value) {
Entry<K,V> t = root;
//判断是否存在根节点，或者集合是否存在了数据
if (t == null) {
root = new Entry<K,V>(key, value, null);
size = 1;
modCount++;
return null;
}
int cmp;
Entry<K,V> parent;
// split comparator and comparable paths
//这点是获取构造器传入的Comparator
Comparator<? super K> cpr = comparator;
if (cpr != null) {
do {
parent = t;
cmp = cpr.compare(key, t.key);
if (cmp < 0)
t = t.left;
else if (cmp > 0)
t = t.right;
else
return t.setValue(value);
} while (t != null);
}
//若没有指定Comparator，查找父类实现
else {
if (key == null)
throw new NullPointerException();
Comparable<? super K> k = (Comparable<? super K>) key;
//遍历放入确切位置
do {
parent = t;
cmp = k.compareTo(t.key);
if (cmp < 0)
t = t.left;
else if (cmp > 0)
t = t.right;
else
return t.setValue(value);
} while (t != null);
}
Entry<K,V> e = new Entry<K,V>(key, value, parent);
if (cmp < 0)
parent.left = e;
else
parent.right = e;
//树的旋转
fixAfterInsertion(e);
size++;
modCount++;
return null;
}

HashMap.put(K key, V value)：

1、HashMap 的实例有两个参数影响其性能：初始容量和加载因子。容量是哈希表中桶的数量，初始容量只是哈希表在创建时的容量。加载因子是哈希表在其容量自动增加之前可以达到多满的一种尺度。当哈希表中的条目数超出了加载因子与当前容量的乘积时，则要对该哈希表进行 rehash 操作（即重建内部数据结构），从而哈希表将具有大约两倍的桶数；

2、key的hashcode至关重要。

[java] view plain copy

//构造一个带指定初始容量和加载因子的空 HashMap
public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
// Find a power of 2 >= initialCapacity
int capacity = 1;
//取大于等于初始容量最近的2的倍数
while (capacity < initialCapacity)
capacity <<= 1;
this.loadFactor = loadFactor;
threshold = (int)(capacity * loadFactor);
table = new Entry[capacity];
init();
}
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
//计算key的hash值
int hash = hash(key.hashCode());
//计算出在第几个“水桶”，计算方式是&与了table.length-1，这应该也是水桶数据是2的倍数的原因
int i = indexFor(hash, table.length);
//通过这步可以理解为什么看到好多文档要求key的hashCode“均匀分布”便于均匀落入各个水桶
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
addEntry(hash, key, value, i);
return null;
}

2.get

==============================性能测试=================================

（1）表格统计的差异在于new的方式不同：

[java] view plain copy

Map<Integer, String> map = new TreeMap<Integer, String>();
for (int j = 0; j < 100; j++) {
map.put(j, "s" + j);
}
for (int i = 0; i < 100000; i++) {
for (int j = 0; j < 100; j++) {
map.get(j);
}
}

new方式	运行时间
new TreeMap<Integer, String>()	542
new HashMap<Integer, String>(100)	280

（2）变化下集合的大小（有时我们使用Map存放上万条乃至几十万条数据用于小数据本地快速缓存）：

[java] view plain copy

Map<Integer, String> map = new HashMap<Integer, String>(300000);
for (int j = 0; j < 300000; j++) {
map.put(j, "sasd");
}
beginTime = System.currentTimeMillis();
for (int i = 0; i < 100; i++) {
for (int j = 0; j < 300000; j++) {
map.get(j);
}
}

new方式	运行时间
new TreeMap<Integer, String>()	5100
new HashMap<Integer, String>(300000)	1400

（3）当hashcode为极端情况下：

[java] view plain copy

Map<HashKeyTest, String> map = new TreeMap<HashKeyTest, String>();
for (int j = 0; j < 10000; j++) {
map.put(new HashKeyTest(j), "sasd");
}
beginTime = System.currentTimeMillis();
for (int i = 0; i < 10; i++) {
for (int j = 0; j < 10000; j++) {
map.get(new HashKeyTest(j));
}
}
public class HashKeyTest implements Comparable<HashKeyTest> {
private int value;
...
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
HashKeyTest other = (HashKeyTest) obj;
if (value != other.value)
return false;
return true;
}
//模拟hashCode的极端情况，返回相同的值
@Override
public int hashCode() {
return 345345435;
}
//用于TreeMap的二叉树对比排序
@Override
public int compareTo(HashKeyTest o) {
int thisVal = this.value;
int anotherVal = o.getValue();
return (thisVal < anotherVal ? -1 : (thisVal == anotherVal ? 0 : 1));
}
}

new方式	消耗时间
new TreeMap<HashKeyTest, String>()	18
new HashMap<HashKeyTest, String>()	3480

*************************************************源码分析*************************************************

TreeMap.get(Object key):

基于红黑树查找

[java] view plain copy

public V get(Object key) {
Entry<K,V> p = getEntry(key);
return (p==null ? null : p.value);
}
final Entry<K,V> getEntry(Object key) {
// Offload comparator-based version for sake of performance
if (comparator != null)
return getEntryUsingComparator(key);
if (key == null)
throw new NullPointerException();
Comparable<? super K> k = (Comparable<? super K>) key;
Entry<K,V> p = root;
while (p != null) {
int cmp = k.compareTo(p.key);
if (cmp < 0)
p = p.left;
else if (cmp > 0)
p = p.right;
else
return p;
}
return null;
}

HashMap.get(Object key)：

代码中很简明的可以看出，如果hashCode不均为出现的问题

[java] view plain copy

public V get(Object key) {
if (key == null)
return getForNullKey();
int hash = hash(key.hashCode());
//若hashCode相同，会导致查找方式就是这里的不停的遍历
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
return e.value;
}
return null;
}

总结：

1、key如果是自定义对象，一定要有效地重载hashCode()，可参考《覆盖equals时总要覆盖hashCode》；

2、尽量保证hashCode“均为分布”，以便于均匀填充“水桶”；

3、当我们确定Map存放的数据比较多且hashCode分配严重不均匀，切记不要使用HashMap，建议使用TreeMap；

4、正常情况下，也就是我们常把表id作为key时，建议使用HashMap。

Set留给同学自己学习吧。（HashSet其实就是使用的HashMap作为容器）

时间： 2024-10-26 14:16:37

(转)list 、set 、map 粗浅性能对比分析

(转)list 、set 、map 粗浅性能对比分析的相关文章

（转）netty、mina性能对比分析

ArrayList和LinkedList的几种循环遍历方式及性能对比分析

ArrayList和LinkedList的几种循环遍历方式及性能对比分析(转)

【转】ArrayList和LinkedList的几种循环遍历方式及性能对比分析

Intel台式机CPU性能对比分析(不定期更新)

常见CPU及显卡性能对比分析(不定期更新)

浅谈C++之冒泡排序、希尔排序、快速排序、插入排序、堆排序、基数排序性能对比分析(好戏在后面，有图有真相)

冒泡排序、希尔排序、快速排序、插入排序、堆排序、基数排序性能对比分析

AMD台式机CPU性能对比分析(不定期更新)