Efficient Counter in Java

Reference:  http://www.programcreek.com/2013/10/efficient-counter-in-java/

You may often need a counter to find the frequency of something (e.g., words) in a database or text file. A counter can easily be implemented with a HashMap in Java. This article compares several ways to implement a counter and concludes with the most efficient one.

UPDATE: Check out the Java 8 counter; writing a counter now takes just two simple lines.
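The Java 8 article itself is not reproduced here. As an illustration only, one common two-line form uses Map.merge(key, value, remappingFunction), which was added in Java 8. Whether the linked article uses exactly this call is an assumption, and the class and method names below are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class Java8Counter {

    // Counts word frequencies with Map.merge: if the word is absent, 1 is stored;
    // otherwise Integer::sum combines the old count with 1.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> counter = new HashMap<>();
        for (String w : words) counter.merge(w, 1, Integer::sum);
        return counter;
    }

    public static void main(String[] args) {
        String[] sArr = "one two three two three three".split(" ");
        Map<String, Integer> counter = count(sArr);
        System.out.println(counter.get("three")); // prints 3
    }
}
```

Note that the counts are still immutable Integer objects, so the boxing concern discussed in the naive counter below still applies.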

1. The Naive Counter

Naively, it can be implemented as follows:

String s = "one two three two three three";
String[] sArr = s.split(" ");

// naive approach
HashMap<String, Integer> counter = new HashMap<String, Integer>();

for (String a : sArr) {
    if (counter.containsKey(a)) {
        int oldValue = counter.get(a);
        counter.put(a, oldValue + 1);
    } else {
        counter.put(a, 1);
    }
}

In each iteration, you check whether the key exists. If it does, increment the old value by 1; if not, set it to 1. This approach is simple and straightforward, but it is not the most efficient, for the following reasons:

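  • When a key already exists, both containsKey() and get() are called, so the map is searched twice just to read the old value (and put() searches it again to store the new one).
  • Since Integer is immutable, every increment produces a new boxed Integer instead of updating the old one (only small values, -128 to 127 by default, come from a cache).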
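To make the repeated searching visible, the sketch below runs the naive loop against a hypothetical LookupCountingMap, a HashMap subclass that only counts calls to containsKey(), get() and put(). It assumes, as in current OpenJDK, that HashMap's own containsKey() and put() do not route through the overridden get().

```java
import java.util.HashMap;

// Hypothetical HashMap subclass that counts calls to containsKey(), get() and put().
class LookupCountingMap<K, V> extends HashMap<K, V> {
    int containsKeyCalls;
    int getCalls;
    int putCalls;

    @Override
    public boolean containsKey(Object key) {
        containsKeyCalls++;
        return super.containsKey(key);
    }

    @Override
    public V get(Object key) {
        getCalls++;
        return super.get(key);
    }

    @Override
    public V put(K key, V value) {
        putCalls++;
        return super.put(key, value);
    }
}

class NaiveLookupDemo {
    // The naive counter from above, run against the instrumented map.
    static LookupCountingMap<String, Integer> count(String[] sArr) {
        LookupCountingMap<String, Integer> counter = new LookupCountingMap<>();
        for (String a : sArr) {
            if (counter.containsKey(a)) {
                int oldValue = counter.get(a);
                counter.put(a, oldValue + 1);
            } else {
                counter.put(a, 1);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        LookupCountingMap<String, Integer> counter =
                count("one two three two three three".split(" "));
        // 6 words, 3 of them repeats: 6 containsKey, 3 get and 6 put calls
        System.out.println(counter.containsKeyCalls + " " + counter.getCalls + " " + counter.putCalls);
        // prints "6 3 6"
    }
}
```

For every repeated word the map is searched three times (containsKey, get, put), which is the cost the later sections try to reduce.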

2. The Better Counter

Naturally we want a mutable integer to avoid creating many Integer objects. A mutable integer class can be defined as follows:

class MutableInteger {

    private int val;

    public MutableInteger(int val) {
        this.val = val;
    }

    public int get() {
        return val;
    }

    public void set(int val) {
        this.val = val;
    }

    // used to print the value conveniently
    @Override
    public String toString() {
        return Integer.toString(val);
    }
}

The counter can then be rewritten as follows:

HashMap<String, MutableInteger> newCounter = new HashMap<String, MutableInteger>();

for (String a : sArr) {
    if (newCounter.containsKey(a)) {
        MutableInteger oldValue = newCounter.get(a);
        oldValue.set(oldValue.get() + 1);
    } else {
        newCounter.put(a, new MutableInteger(1));
    }
}

This is better because it no longer creates a new Integer object for every increment. However, when a key exists, the map is still searched twice in each iteration.
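The saving can be seen by counting allocations. The sketch below assumes a hypothetical CountingMutableInteger, the same shape as MutableInteger plus a static instance counter; with the better counter, only one object is created per distinct key, and later increments update it in place without another put().

```java
import java.util.HashMap;

// Same shape as MutableInteger, plus a hypothetical static counter of created instances.
class CountingMutableInteger {
    static int created = 0;
    private int val;

    CountingMutableInteger(int val) {
        this.val = val;
        created++;
    }

    int get() { return val; }

    void set(int val) { this.val = val; }
}

class BetterCounterDemo {
    // The "better" counter: one object per distinct key, updated in place.
    static HashMap<String, CountingMutableInteger> count(String[] words) {
        HashMap<String, CountingMutableInteger> counter = new HashMap<>();
        for (String a : words) {
            if (counter.containsKey(a)) {
                CountingMutableInteger oldValue = counter.get(a);
                oldValue.set(oldValue.get() + 1); // no put(): the stored object itself changes
            } else {
                counter.put(a, new CountingMutableInteger(1));
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        HashMap<String, CountingMutableInteger> counter =
                count("one two three two three three".split(" "));
        System.out.println(counter.get("three").get());     // prints 3
        System.out.println(CountingMutableInteger.created); // prints 3 (one per distinct word)
    }
}
```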

3. The Efficient Counter

The HashMap.put(key, value) method returns the value previously associated with the key, or null if there was none. This is useful, because we can use the reference to the old value to update the count without searching the map a second time.
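A quick check of this return value, using a throwaway map (the class name is hypothetical):

```java
import java.util.HashMap;

class PutReturnDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        // No previous mapping for "one", so put() returns null.
        Integer first = map.put("one", 1);
        // "one" is now mapped to 1, so put() returns that old value.
        Integer second = map.put("one", 5);
        System.out.println(first);          // prints null
        System.out.println(second);         // prints 1
        System.out.println(map.get("one")); // prints 5
    }
}
```

put() also returns null when a key was explicitly mapped to null. The efficient counter never stores null values, so a null result reliably means the key was absent.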

HashMap<String, MutableInteger> efficientCounter = new HashMap<String, MutableInteger>();

for (String a : sArr) {
    MutableInteger initValue = new MutableInteger(1);
    // put() returns the previous value for this key, or null if there was none
    MutableInteger oldValue = efficientCounter.put(a, initValue);

    if (oldValue != null) {
        initValue.set(oldValue.get() + 1);
    }
}
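One trade-off is worth noting: this version does a single lookup per word, but it allocates a new MutableInteger on every iteration and discards the old one whenever a key repeats. The sketch below counts those allocations with a hypothetical TrackedMutableInteger (MutableInteger plus a static instance counter).

```java
import java.util.HashMap;

// MutableInteger plus a hypothetical static counter of created instances.
class TrackedMutableInteger {
    static int created = 0;
    private int val;

    TrackedMutableInteger(int val) {
        this.val = val;
        created++;
    }

    int get() { return val; }

    void set(int val) { this.val = val; }
}

class EfficientCounterDemo {
    static HashMap<String, TrackedMutableInteger> count(String[] words) {
        HashMap<String, TrackedMutableInteger> counter = new HashMap<>();
        for (String a : words) {
            // A new object is allocated on every iteration, even for keys already present.
            TrackedMutableInteger initValue = new TrackedMutableInteger(1);
            // One lookup: put() stores initValue and returns the previous value, if any.
            TrackedMutableInteger oldValue = counter.put(a, initValue);
            if (oldValue != null) {
                initValue.set(oldValue.get() + 1); // carry the old count forward
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        HashMap<String, TrackedMutableInteger> counter =
                count("one two three two three three".split(" "));
        System.out.println(counter.get("three").get());    // prints 3
        System.out.println(TrackedMutableInteger.created); // prints 6 (one per word)
    }
}
```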

4. Performance Difference

To compare the performance of the three approaches, each counter is run over the sample input 1 million times using the benchmark code below. The raw results, in nanoseconds as measured by System.nanoTime(), are as follows:

Naive Approach : 222796000
Better Approach: 117283000
Efficient Approach: 96374000

The difference is significant: about 223 ms vs. 117 ms vs. 96 ms. The large gap between the naive and better approaches indicates that creating objects is expensive.

String s = "one two three two three three";
String[] sArr = s.split(" ");

long startTime = 0;
long endTime = 0;
long duration = 0;

// naive approach
startTime = System.nanoTime();
HashMap<String, Integer> counter = new HashMap<String, Integer>();

for (int i = 0; i < 1000000; i++) {
    for (String a : sArr) {
        if (counter.containsKey(a)) {
            int oldValue = counter.get(a);
            counter.put(a, oldValue + 1);
        } else {
            counter.put(a, 1);
        }
    }
}

endTime = System.nanoTime();
duration = endTime - startTime;
System.out.println("Naive Approach : " + duration);

// better approach
startTime = System.nanoTime();
HashMap<String, MutableInteger> newCounter = new HashMap<String, MutableInteger>();

for (int i = 0; i < 1000000; i++) {
    for (String a : sArr) {
        if (newCounter.containsKey(a)) {
            MutableInteger oldValue = newCounter.get(a);
            oldValue.set(oldValue.get() + 1);
        } else {
            newCounter.put(a, new MutableInteger(1));
        }
    }
}

endTime = System.nanoTime();
duration = endTime - startTime;
System.out.println("Better Approach: " + duration);

// efficient approach
startTime = System.nanoTime();
HashMap<String, MutableInteger> efficientCounter = new HashMap<String, MutableInteger>();

for (int i = 0; i < 1000000; i++) {
    for (String a : sArr) {
        MutableInteger initValue = new MutableInteger(1);
        MutableInteger oldValue = efficientCounter.put(a, initValue);

        if (oldValue != null) {
            initValue.set(oldValue.get() + 1);
        }
    }
}

endTime = System.nanoTime();
duration = endTime - startTime;
System.out.println("Efficient Approach: " + duration);
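The three timing blocks repeat the same startTime/endTime pattern, and the first approach measured also absorbs JIT warm-up. A possible refactoring, sketched under the assumption that a few untimed warm-up runs are acceptable, moves the timing into a hypothetical timeNanos helper. Warm-up reduces but does not eliminate JIT effects; a dedicated harness such as JMH is more reliable for numbers you intend to publish.

```java
import java.util.HashMap;
import java.util.function.Supplier;

class BenchmarkSketch {

    // Each result is stored here so the timed work has an observable effect.
    static volatile Object sink;

    // Hypothetical helper: runs the task `warmups` times untimed, then times one run.
    static long timeNanos(Supplier<?> task, int warmups) {
        for (int i = 0; i < warmups; i++) {
            sink = task.get();
        }
        long start = System.nanoTime();
        sink = task.get();
        return System.nanoTime() - start;
    }

    // The naive counter from above, repeated `iterations` times over the input.
    static HashMap<String, Integer> naiveCount(String[] sArr, int iterations) {
        HashMap<String, Integer> counter = new HashMap<>();
        for (int i = 0; i < iterations; i++) {
            for (String a : sArr) {
                if (counter.containsKey(a)) {
                    counter.put(a, counter.get(a) + 1);
                } else {
                    counter.put(a, 1);
                }
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        String[] sArr = "one two three two three three".split(" ");
        long duration = timeNanos(() -> naiveCount(sArr, 1_000_000), 3);
        System.out.println("Naive Approach : " + duration);
    }
}
```

The other counters can be timed the same way by passing a different Supplier.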

When you use a counter, you probably also need a way to sort the map by value. You can check out the article on frequently used HashMap methods.
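For reference, one way to sort a counter by value on Java 8 or later is to stream its entries and collect them into a LinkedHashMap, which keeps insertion order. This is a sketch; the class and method names are hypothetical, and it is not necessarily the method described in the linked article.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class SortByValue {
    // Returns a LinkedHashMap whose iteration order is by count, highest first.
    static Map<String, Integer> sortByValueDesc(Map<String, Integer> counter) {
        return counter.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,          // merge function; keys are unique, so it is never used
                        LinkedHashMap::new)); // preserves the sorted order
    }
}
```

Entries with equal counts keep the source map's iteration order, which for a HashMap is unspecified.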

5. Solutions from Keith

Added a couple of tests:
1) Refactored the "better approach" to just call get() instead of containsKey(). Usually the elements you want are already in the HashMap, so this reduces two searches to one.
2) Added a test with AtomicInteger, which michal mentioned.
3) Compared against a singleton int array, which uses less memory according to http://amzn.com/0748614079

I ran the test program three times and took the minimum to reduce variance caused by other programs. Note that you can't repeat the runs within a single program, or the results are affected too much, probably due to GC.

Naive: 201716122
Better Approach: 112259166
Efficient Approach: 93066471
Better Approach (without containsKey): 69578496
Better Approach (without containsKey, with AtomicInteger): 94313287
Better Approach (without containsKey, with int[]): 65877234

Better Approach (without containsKey):

HashMap<String, MutableInteger> efficientCounter2 = new HashMap<String, MutableInteger>();
for (int i = 0; i < NUM_ITERATIONS; i++) {
    for (String a : sArr) {
        // a single get() replaces the containsKey() + get() pair
        MutableInteger value = efficientCounter2.get(a);
        if (value != null) {
            value.set(value.get() + 1);
        } else {
            efficientCounter2.put(a, new MutableInteger(1));
        }
    }
}

Better Approach (without containsKey, with AtomicInteger):

HashMap<String, AtomicInteger> atomicCounter = new HashMap<String, AtomicInteger>();
for (int i = 0; i < NUM_ITERATIONS; i++) {
    for (String a : sArr) {
        AtomicInteger value = atomicCounter.get(a);
        if (value != null) {
            value.incrementAndGet();
        } else {
            atomicCounter.put(a, new AtomicInteger(1));
        }
    }
}
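AtomicInteger is designed for lock-free updates from multiple threads, but in the test above it sits in a plain HashMap, which is not thread-safe, so here it only acts as a mutable box. If a counter really is shared between threads, the ConcurrentHashMap documentation suggests LongAdder values created with computeIfAbsent. The sketch below assumes that use case; the class name, thread count and iteration count are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

class ConcurrentCounter {
    // Counts the words from several threads at once into one shared map.
    static ConcurrentHashMap<String, LongAdder> count(String[] words, int threads, int iterationsPerThread) {
        ConcurrentHashMap<String, LongAdder> freqs = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < iterationsPerThread; i++) {
                    for (String a : words) {
                        // Atomically creates the adder on first use, then increments it.
                        freqs.computeIfAbsent(a, k -> new LongAdder()).increment();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("counting did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return freqs;
    }
}
```

Read a final count with freqs.get(word).sum() once all threads have finished.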

Better Approach (without containsKey, with int[]):

HashMap<String, int[]> intCounter = new HashMap<String, int[]>();

for (int i = 0; i < NUM_ITERATIONS; i++) {
    for (String a : sArr) {
        // a one-element int array serves as a mutable counter
        int[] valueWrapper = intCounter.get(a);
        if (valueWrapper == null) {
            intCounter.put(a, new int[] { 1 });
        } else {
            valueWrapper[0]++;
        }
    }
}
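On Java 8 or later, the int[] approach can be condensed with Map.computeIfAbsent, which returns the existing array or stores and returns a new one. This is a sketch only: it has not been benchmarked against the version above, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class IntArrayCounter {
    static Map<String, int[]> count(String[] words) {
        Map<String, int[]> intCounter = new HashMap<>();
        for (String a : words) {
            // Returns the existing array, or stores and returns a new int[1] (initialized to 0).
            intCounter.computeIfAbsent(a, k -> new int[1])[0]++;
        }
        return intCounter;
    }
}
```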

Guava's Multiset is probably faster still.

6. Conclusion

The winner is the last approach, which uses int arrays.

Time: 2024-10-11 04:19:54
