前段时间，实验室需要开发一个用户日志模块，来对实验室的Web项目监控，获取用户的行为日志。个人首先觉得应该主要使用js来实现相关功能，无奈js水平着实太低，最终采用了servlet的方式来实现。

自己先从github上查询到了一个相关项目，clickstream，我先来介绍一下该项目是怎么实现的。

Clickstream的实现

它首先使用了一个Listener来监听ServletContext和HttpSession，代码如下

public class ClickstreamListener implements ServletContextListener, HttpSessionListener {

	private static final Log log = LogFactory.getLog(ClickstreamListener.class);

	/** The servlet context attribute key. */
	public static final String CLICKSTREAMS_ATTRIBUTE_KEY = "clickstreams";

	/**
	 * The click stream (individual) attribute key: this is
	 * the one inserted into the HttpSession.
	 */
	public static final String SESSION_ATTRIBUTE_KEY = "clickstream";

	/** The current clickstreams, keyed by session ID. */
	private Map<String, Clickstream> clickstreams = new ConcurrentHashMap<String, Clickstream>();

	public ClickstreamListener() {
		log.debug("ClickstreamLogger constructed");
	}

	/**
	 * Notification that the ServletContext has been initialized.
	 *
	 * @param sce The context event
	 */
	public void contextInitialized(ServletContextEvent sce) {
		log.debug("ServletContext initialised");
		sce.getServletContext().setAttribute(CLICKSTREAMS_ATTRIBUTE_KEY, clickstreams);
	}

	/**
	 * Notification that the ServletContext has been destroyed.
	 *
	 * @param sce The context event
	 */
	public void contextDestroyed(ServletContextEvent sce) {
		log.debug("ServletContext destroyed");
		// help gc, but should be already clear except when exception was thrown during sessionDestroyed
		clickstreams.clear();
	}

	/**
	 * Notification that a Session has been created.
	 *
	 * @param hse The session event
	 */
	public void sessionCreated(HttpSessionEvent hse) {
		final HttpSession session = hse.getSession();
		if (log.isDebugEnabled()) {
			log.debug("Session " + session.getId() + " was created, adding a new clickstream.");
		}

		Object attrValue = session.getAttribute(SESSION_ATTRIBUTE_KEY);
		if (attrValue != null) {
			log.warn("Session " + session.getId() + " already has an attribute named " +
					SESSION_ATTRIBUTE_KEY + ": " + attrValue);
		}

		final Clickstream clickstream = new Clickstream();
		session.setAttribute(SESSION_ATTRIBUTE_KEY, clickstream);
		clickstreams.put(session.getId(), clickstream);
	}

	/**
	 * Notification that a session has been destroyed.
	 *
	 * @param hse The session event
	 */
	public void sessionDestroyed(HttpSessionEvent hse) {
		final HttpSession session = hse.getSession();

		// check if the session is not null (expired)
		if (session == null) {
	  	  return;
		}

		if (log.isDebugEnabled()) {
			log.debug("Session " + session.getId() + " was destroyed, logging the clickstream and removing it.");
		}

		final Clickstream stream = clickstreams.get(session.getId());
		if (stream == null) {
			log.warn("Session " + session.getId() + " doesn't have a clickstream.");
			return;
		}

		try {
			if (stream.getSession() != null) {
				ClickstreamLoggerFactory.getLogger().log(stream);
			}
		}
		catch (Exception e) {
			log.error(e.getMessage(), e);
		}
		finally {
			clickstreams.remove(session.getId());
		}
	}
}

在这里，读者应该明白session和request之间的区别，一次session可以对应多个request，而多个request可以封装成一个Clickstream。所以使用了

private Map<String, Clickstream> clickstreams = new ConcurrentHashMap<String, Clickstream>();

来存储session和Clickstream之间的映射。

每次创建一个session的时候，就在session里面绑定一个Clickstream。

Clickstream的定义如下：

public class Clickstream implements Serializable {

	private static final long serialVersionUID = 1;

	/** The stream itself: a list of click events. */
	private List<ClickstreamRequest> clickstream = new CopyOnWriteArrayList<ClickstreamRequest>();

	/** The attributes. */
	private Map<String, Object> attributes = new HashMap<String, Object>();

	/** The host name. */
	private String hostname;

	/** The original referer URL, if any. */
	private String initialReferrer;

	/**  The stream start time. */
	private Date start = new Date();

	/** The time of the last request made on this stream. */
	private Date lastRequest = new Date();

	/** Flag indicating this is a bot surfing the site. */
	private boolean bot = false;

	/**
	 * The session itself.
	 *
	 * Marked as transient so that it does not get serialized when the stream is serialized.
	 * See JIRA issue CLK-14 for details.
	 */
	private transient HttpSession session;

	/**
	 * Adds a new request to the stream of clicks. The HttpServletRequest is converted
	 * to a ClickstreamRequest object and added to the clickstream.
	 *
	 * @param request The serlvet request to be added to the clickstream
	 */
	public void addRequest(HttpServletRequest request) {
		lastRequest = new Date();

		if (hostname == null) {
			hostname = request.getRemoteHost();
			session = request.getSession();
		}

		// if this is the first request in the click stream
		if (clickstream.isEmpty()) {
			// setup initial referrer
			if (request.getHeader("REFERER") != null) {
				initialReferrer = request.getHeader("REFERER");
			}
			else {
				initialReferrer = "";
			}

			// decide whether this is a bot
			bot = BotChecker.isBot(request);
		}

		clickstream.add(new ClickstreamRequest(request, lastRequest));
	}

	/**
	 * Gets an attribute for this clickstream.
	 *
	 * @param name
	 */
	public Object getAttribute(String name) {
		return attributes.get(name);
	}

	/**
	 * Gets the attribute names for this clickstream.
	 */
	public Set<String> getAttributeNames() {
		return attributes.keySet();
	}

	/**
	 * Sets an attribute for this clickstream.
	 *
	 * @param name
	 * @param value
	 */
	public void setAttribute(String name, Object value) {
		attributes.put(name, value);
	}

	/**
	 * Returns the host name that this clickstream relates to.
	 *
	 * @return the host name that the user clicked through
	 */
	public String getHostname() {
		return hostname;
	}

	/**
	 * Returns the bot status.
	 *
	 * @return true if the client is bot or spider
	 */
	public boolean isBot() {
		return bot;
	}

	/**
	 * Returns the HttpSession associated with this clickstream.
	 *
	 * @return the HttpSession associated with this clickstream
	 */
	public HttpSession getSession() {
		return session;
	}

	/**
	 * The URL of the initial referer. This is useful for determining
	 * how the user entered the site.
	 *
	 * @return the URL of the initial referer
	 */
	public String getInitialReferrer() {
		return initialReferrer;
	}

	/**
	 * Returns the Date when the clickstream began.
	 *
	 * @return the Date when the clickstream began
	 */
	public Date getStart() {
		return start;
	}

	/**
	 * Returns the last Date that the clickstream was modified.
	 *
	 * @return the last Date that the clickstream was modified
	 */
	public Date getLastRequest() {
		return lastRequest;
	}

	/**
	 * Returns the actual List of ClickstreamRequest objects.
	 *
	 * @return the actual List of ClickstreamRequest objects
	 */
	public List<ClickstreamRequest> getStream() {
		return clickstream;
	}

ClickstreamRequest是对HttpServletRequest的简化封装，定义如下：

public class ClickstreamRequest implements Serializable {

	private static final long serialVersionUID = 1;

	private final String protocol;
	private final String serverName;
	private final int serverPort;
	private final String requestURI;
	private final String queryString;
	private final String remoteUser;
	private final long timestamp;

	public ClickstreamRequest(HttpServletRequest request, Date timestamp) {
		protocol = request.getProtocol();
		serverName = request.getServerName();
		serverPort = request.getServerPort();
		requestURI = request.getRequestURI();
		queryString = request.getQueryString();
		remoteUser = request.getRemoteUser();
		this.timestamp = timestamp.getTime();
	}

	public String getProtocol() {
		return protocol;
	}

	public String getServerName() {
		return serverName;
	}

	public int getServerPort() {
		return serverPort;
	}

	public String getRequestURI() {
		return requestURI;
	}

	public String getQueryString() {
		return queryString;
	}

	public String getRemoteUser() {
		return remoteUser;
	}

	public Date getTimestamp() {
		return new Date(timestamp);
	}

	/**
	 * Returns a string representation of the HTTP request being tracked.
	 * Example: <b>www.opensymphony.com/some/path.jsp?arg1=foo&arg2=bar</b>
	 *
	 * @return a string representation of the HTTP request being tracked.
	 */
	@Override
	public String toString() {
		return serverName + (serverPort != 80 ? ":" + serverPort : "") + requestURI
				+ (queryString != null ? "?" + queryString : "");
	}
}

所以，当每次有请求时，使用Filter对request进行过滤，完善Clickstream的内容

public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain) throws IOException, ServletException {
		// Ensure that filter is only applied once per request.
		if (req.getAttribute(FILTER_APPLIED) == null) {
			log.debug("Applying clickstream filter to request.");

			req.setAttribute(FILTER_APPLIED, true);

			HttpServletRequest request = (HttpServletRequest)req;

			HttpSession session = request.getSession();
			Clickstream stream = (Clickstream) session.getAttribute(ClickstreamListener.SESSION_ATTRIBUTE_KEY);
			stream.addRequest(request);
		}
		else {
			log.debug("Clickstream filter already applied, ignoring it.");
		}

		// pass the request on
		chain.doFilter(req, res);
	}

当session销毁的时候，把Clickstream持久化即可。

改进

1. Clickstram项目，使用ServletContext来存储Map，意味着只能使用一个web容器，

不然无法保证ClickstreamRequest的顺序性，不利于拓展。所以在集群情况下，

比如tomcat集群，可以使用Redis来存储相关的对象。

把Clickstream拆成三部分：

Redis中的List，每个元素对应着一个序列化之后的ClickstreamRequest
字符串；

Redis中的Hash，存储private
Map<String, Object> attributes = new HashMap<String, Object>();

Redis中的Hash，存储hostname,initialReferrer,start,lastRequest,bot,HttpSession
-id等字段

使用集合session.ids 来存储相关的session-id，每个字符串session:{id}对应着一个ClickStream，

然后就可以完成相关的持久化操作。

如何统计与分析

持久化使用了下面的两个表(简化一下)：

session 会话表：

id ip referer is_bot request_count start_time end_time

referer ：入口页面；

is_bot ：是否是搜索引擎；

request_count :该次会话中，请求的次数，如果为1，则表明用户访问了一个页面，就离开了，可以用于计算跳出率。

request 请求表：

id ip referer uri query is_ajax start_time last_time（持续时间） refresh_count sid

refresh_count ：页面的刷新次数；

List类型的clickstream
中，两个相邻的ClickstreamRequest元素A，B，

B的发生时间减去A的发生时间，可以当做A页面的停留时间；

如果A和B是同一个页面，说明用户在刷新页面A，用户计算request_count
；

想要获知在线人数，查看session_ids中的元素个数即可；

其它功能省略……

如果想要获得用户浏览器、操作系统等，可以从User-Agent中获得

由于没使用js，所以无法获知浏览器分辨率等情况。

使用Servlet获取用户日志

Clickstream的实现

改进

如何统计与分析

相关项目

使用Servlet获取用户日志的相关文章

Servlet获取ajax传递的json值

JSP简单练习-用Servlet获取表单数据

servlet获取参数时，request.getParameter("id")参数获取失败

javaEE servlet获取jsp内置对象

JavaWeb学习记录（八）——servlet获取配置信息

Servlet获取用户请求参数并写在指定的图片上

jsp上传文件，form表单提交数据enctype="multipart/form-data"时servlet获取不到type=“text”的数据问题

[02] Servlet获取请求和页面跳转

JAVA，JSP，Servlet获取当前工程路径-绝对路径