Collecting User Logs with a Servlet

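A while ago our lab needed a user-logging module to monitor its web projects and capture user behavior logs. My first thought was that this should mainly be done in JavaScript, but my JavaScript skills were not up to it, so in the end I implemented it with servlets.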

I first found a related project on GitHub, clickstream, so let me start by describing how that project works.

How Clickstream works

It starts with a Listener that listens to both the ServletContext and the HttpSession. The code is as follows:

public class ClickstreamListener implements ServletContextListener, HttpSessionListener {

	private static final Log log = LogFactory.getLog(ClickstreamListener.class);

	/** The servlet context attribute key. */
	public static final String CLICKSTREAMS_ATTRIBUTE_KEY = "clickstreams";

	/**
	 * The click stream (individual) attribute key: this is
	 * the one inserted into the HttpSession.
	 */
	public static final String SESSION_ATTRIBUTE_KEY = "clickstream";

	/** The current clickstreams, keyed by session ID. */
	private Map<String, Clickstream> clickstreams = new ConcurrentHashMap<String, Clickstream>();

	public ClickstreamListener() {
		log.debug("ClickstreamLogger constructed");
	}

	/**
	 * Notification that the ServletContext has been initialized.
	 *
	 * @param sce The context event
	 */
	public void contextInitialized(ServletContextEvent sce) {
		log.debug("ServletContext initialised");
		sce.getServletContext().setAttribute(CLICKSTREAMS_ATTRIBUTE_KEY, clickstreams);
	}

	/**
	 * Notification that the ServletContext has been destroyed.
	 *
	 * @param sce The context event
	 */
	public void contextDestroyed(ServletContextEvent sce) {
		log.debug("ServletContext destroyed");
		// help gc, but should be already clear except when exception was thrown during sessionDestroyed
		clickstreams.clear();
	}

	/**
	 * Notification that a Session has been created.
	 *
	 * @param hse The session event
	 */
	public void sessionCreated(HttpSessionEvent hse) {
		final HttpSession session = hse.getSession();
		if (log.isDebugEnabled()) {
			log.debug("Session " + session.getId() + " was created, adding a new clickstream.");
		}

		Object attrValue = session.getAttribute(SESSION_ATTRIBUTE_KEY);
		if (attrValue != null) {
			log.warn("Session " + session.getId() + " already has an attribute named " +
					SESSION_ATTRIBUTE_KEY + ": " + attrValue);
		}

		final Clickstream clickstream = new Clickstream();
		session.setAttribute(SESSION_ATTRIBUTE_KEY, clickstream);
		clickstreams.put(session.getId(), clickstream);
	}

	/**
	 * Notification that a session has been destroyed.
	 *
	 * @param hse The session event
	 */
	public void sessionDestroyed(HttpSessionEvent hse) {
		final HttpSession session = hse.getSession();

		// check if the session is not null (expired)
		if (session == null) {
			return;
		}

		if (log.isDebugEnabled()) {
			log.debug("Session " + session.getId() + " was destroyed, logging the clickstream and removing it.");
		}

		final Clickstream stream = clickstreams.get(session.getId());
		if (stream == null) {
			log.warn("Session " + session.getId() + " doesn't have a clickstream.");
			return;
		}

		try {
			if (stream.getSession() != null) {
				ClickstreamLoggerFactory.getLogger().log(stream);
			}
		}
		catch (Exception e) {
			log.error(e.getMessage(), e);
		}
		finally {
			clickstreams.remove(session.getId());
		}
	}
}

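Readers should be clear about the difference between a session and a request here: one session can span many requests, and those requests can be wrapped into a single Clickstream. That is why the listener uses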

private Map<String, Clickstream> clickstreams = new ConcurrentHashMap<String, Clickstream>();

to store the mapping from session ID to Clickstream.

Each time a session is created, a Clickstream is bound to that session.

Clickstream is defined as follows:

public class Clickstream implements Serializable {

	private static final long serialVersionUID = 1;

	/** The stream itself: a list of click events. */
	private List<ClickstreamRequest> clickstream = new CopyOnWriteArrayList<ClickstreamRequest>();

	/** The attributes. */
	private Map<String, Object> attributes = new HashMap<String, Object>();

	/** The host name. */
	private String hostname;

	/** The original referer URL, if any. */
	private String initialReferrer;

	/**  The stream start time. */
	private Date start = new Date();

	/** The time of the last request made on this stream. */
	private Date lastRequest = new Date();

	/** Flag indicating this is a bot surfing the site. */
	private boolean bot = false;

	/**
	 * The session itself.
	 *
	 * Marked as transient so that it does not get serialized when the stream is serialized.
	 * See JIRA issue CLK-14 for details.
	 */
	private transient HttpSession session;

	/**
	 * Adds a new request to the stream of clicks. The HttpServletRequest is converted
	 * to a ClickstreamRequest object and added to the clickstream.
	 *
	 * @param request The serlvet request to be added to the clickstream
	 */
	public void addRequest(HttpServletRequest request) {
		lastRequest = new Date();

		if (hostname == null) {
			hostname = request.getRemoteHost();
			session = request.getSession();
		}

		// if this is the first request in the click stream
		if (clickstream.isEmpty()) {
			// setup initial referrer
			if (request.getHeader("REFERER") != null) {
				initialReferrer = request.getHeader("REFERER");
			}
			else {
				initialReferrer = "";
			}

			// decide whether this is a bot
			bot = BotChecker.isBot(request);
		}

		clickstream.add(new ClickstreamRequest(request, lastRequest));
	}

	/**
	 * Gets an attribute for this clickstream.
	 *
	 * @param name
	 */
	public Object getAttribute(String name) {
		return attributes.get(name);
	}

	/**
	 * Gets the attribute names for this clickstream.
	 */
	public Set<String> getAttributeNames() {
		return attributes.keySet();
	}

	/**
	 * Sets an attribute for this clickstream.
	 *
	 * @param name
	 * @param value
	 */
	public void setAttribute(String name, Object value) {
		attributes.put(name, value);
	}

	/**
	 * Returns the host name that this clickstream relates to.
	 *
	 * @return the host name that the user clicked through
	 */
	public String getHostname() {
		return hostname;
	}

	/**
	 * Returns the bot status.
	 *
	 * @return true if the client is bot or spider
	 */
	public boolean isBot() {
		return bot;
	}

	/**
	 * Returns the HttpSession associated with this clickstream.
	 *
	 * @return the HttpSession associated with this clickstream
	 */
	public HttpSession getSession() {
		return session;
	}

	/**
	 * The URL of the initial referer. This is useful for determining
	 * how the user entered the site.
	 *
	 * @return the URL of the initial referer
	 */
	public String getInitialReferrer() {
		return initialReferrer;
	}

	/**
	 * Returns the Date when the clickstream began.
	 *
	 * @return the Date when the clickstream began
	 */
	public Date getStart() {
		return start;
	}

	/**
	 * Returns the last Date that the clickstream was modified.
	 *
	 * @return the last Date that the clickstream was modified
	 */
	public Date getLastRequest() {
		return lastRequest;
	}

	/**
	 * Returns the actual List of ClickstreamRequest objects.
	 *
	 * @return the actual List of ClickstreamRequest objects
	 */
	public List<ClickstreamRequest> getStream() {
		return clickstream;
	}
}

ClickstreamRequest is a simplified wrapper around HttpServletRequest, defined as follows:

public class ClickstreamRequest implements Serializable {

	private static final long serialVersionUID = 1;

	private final String protocol;
	private final String serverName;
	private final int serverPort;
	private final String requestURI;
	private final String queryString;
	private final String remoteUser;
	private final long timestamp;

	public ClickstreamRequest(HttpServletRequest request, Date timestamp) {
		protocol = request.getProtocol();
		serverName = request.getServerName();
		serverPort = request.getServerPort();
		requestURI = request.getRequestURI();
		queryString = request.getQueryString();
		remoteUser = request.getRemoteUser();
		this.timestamp = timestamp.getTime();
	}

	public String getProtocol() {
		return protocol;
	}

	public String getServerName() {
		return serverName;
	}

	public int getServerPort() {
		return serverPort;
	}

	public String getRequestURI() {
		return requestURI;
	}

	public String getQueryString() {
		return queryString;
	}

	public String getRemoteUser() {
		return remoteUser;
	}

	public Date getTimestamp() {
		return new Date(timestamp);
	}

	/**
	 * Returns a string representation of the HTTP request being tracked.
	 * Example: <b>www.opensymphony.com/some/path.jsp?arg1=foo&arg2=bar</b>
	 *
	 * @return a string representation of the HTTP request being tracked.
	 */
	@Override
	public String toString() {
		return serverName + (serverPort != 80 ? ":" + serverPort : "") + requestURI
				+ (queryString != null ? "?" + queryString : "");
	}
}

So, on every request, a Filter intercepts the request and adds it to the Clickstream:

public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain) throws IOException, ServletException {
	// Ensure that filter is only applied once per request.
	if (req.getAttribute(FILTER_APPLIED) == null) {
		log.debug("Applying clickstream filter to request.");

		req.setAttribute(FILTER_APPLIED, true);

		HttpServletRequest request = (HttpServletRequest)req;

		HttpSession session = request.getSession();
		Clickstream stream = (Clickstream) session.getAttribute(ClickstreamListener.SESSION_ATTRIBUTE_KEY);
		stream.addRequest(request);
	}
	else {
		log.debug("Clickstream filter already applied, ignoring it.");
	}

	// pass the request on
	chain.doFilter(req, res);
}

When the session is destroyed, the Clickstream just needs to be persisted.

Improvements

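1. The Clickstream project keeps its Map in the ServletContext, which means it only works with a single web container; with more than one, the order of the ClickstreamRequest entries can no longer be guaranteed, so the design does not scale out. In a clustered setup, such as a Tomcat cluster, the objects can be stored in Redis instead.

Split the Clickstream into three parts:

A Redis List, in which each element is the string form of one serialized ClickstreamRequest;

A Redis Hash that stores private Map<String, Object> attributes = new HashMap<String, Object>();

A Redis Hash that stores the fields hostname, initialReferrer, start, lastRequest, bot, and the HttpSession id.

Use a set, session.ids, to store the session ids; each string session:{id} corresponds to one Clickstream.

With that in place, the persistence operations can be carried out.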
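
The layout above can be sketched without a Redis client by modeling the three structures with plain Java collections. This is only an illustration under stated assumptions: the class name ClickstreamStore, its method names, and the key suffixes :requests, :attrs, and :meta are made up for this sketch; only the session.ids set and the session:{id} naming come from the text. With a real Redis client, each collection operation would presumably map to the matching Redis command (RPUSH, HSET, SADD, SCARD).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** In-memory stand-in for the Redis layout; this is NOT a Redis client. */
public class ClickstreamStore {

	// Stand-in for the Redis set "session.ids".
	private final Set<String> sessionIds = new HashSet<>();
	// Stand-in for Redis lists, keyed by list name.
	private final Map<String, List<String>> lists = new HashMap<>();
	// Stand-in for Redis hashes, keyed by hash name.
	private final Map<String, Map<String, String>> hashes = new HashMap<>();

	/** Base key for one clickstream, as in the text: session:{id}. */
	static String baseKey(String sessionId) {
		return "session:" + sessionId;
	}

	private Map<String, String> hash(String key) {
		return hashes.computeIfAbsent(key, k -> new HashMap<>());
	}

	/** Registers a session and stores its scalar fields (hypothetical ":meta" hash). */
	public void createSession(String sessionId, String hostname, long startMillis) {
		sessionIds.add(sessionId);                              // like SADD session.ids {id}
		Map<String, String> meta = hash(baseKey(sessionId) + ":meta");
		meta.put("hostname", hostname);                         // like HSET ... hostname ...
		meta.put("start", Long.toString(startMillis));
		meta.put("lastRequest", Long.toString(startMillis));
	}

	/** Appends one serialized ClickstreamRequest (hypothetical ":requests" list). */
	public void appendRequest(String sessionId, String serializedRequest, long timeMillis) {
		lists.computeIfAbsent(baseKey(sessionId) + ":requests", k -> new ArrayList<>())
				.add(serializedRequest);                        // like RPUSH
		hash(baseKey(sessionId) + ":meta").put("lastRequest", Long.toString(timeMillis));
	}

	/** Stores one clickstream attribute (hypothetical ":attrs" hash). */
	public void setAttribute(String sessionId, String name, String value) {
		hash(baseKey(sessionId) + ":attrs").put(name, value);
	}

	/** Returns a copy of the ordered request list for a session. */
	public List<String> requests(String sessionId) {
		return new ArrayList<>(lists.getOrDefault(baseKey(sessionId) + ":requests", new ArrayList<>()));
	}

	/** Number of tracked sessions, i.e. the size of session.ids (like SCARD). */
	public int onlineCount() {
		return sessionIds.size();
	}

	/** Drops everything stored for a session, e.g. after it has been persisted. */
	public void removeSession(String sessionId) {
		sessionIds.remove(sessionId);
		lists.remove(baseKey(sessionId) + ":requests");
		hashes.remove(baseKey(sessionId) + ":attrs");
		hashes.remove(baseKey(sessionId) + ":meta");
	}
}
```

Because every node would append to the same list for a given session, the order of requests no longer depends on which container handled them.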

Statistics and analysis

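Persistence uses the following two tables (simplified):

session table:

id   ip   referer   is_bot   request_count   start_time   end_time

referer: the entry page;

is_bot: whether the client is a search-engine crawler;

request_count: the number of requests in the session. A value of 1 means the user viewed one page and then left, which can be used to compute the bounce rate.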
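
As a rough illustration of the bounce-rate idea, the sketch below takes the request_count values of a set of sessions and returns the fraction that equal 1. The class and method names are hypothetical, and whether bot sessions should be filtered out first is a choice the text does not make.

```java
public class BounceRate {

	/**
	 * Fraction of sessions whose request_count is exactly 1.
	 * Returns 0.0 when there are no sessions at all.
	 */
	public static double bounceRate(int[] requestCounts) {
		if (requestCounts.length == 0) {
			return 0.0;
		}
		int bounced = 0;
		for (int count : requestCounts) {
			if (count == 1) {
				bounced++;
			}
		}
		return (double) bounced / requestCounts.length;
	}
}
```

For example, with request counts 1, 3, 1, and 5, two of the four sessions bounced, so the rate is 0.5.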

request table:

id   ip   referer   uri   query   is_ajax   start_time   last_time (duration)   refresh_count   sid

refresh_count: the number of times the page was refreshed;

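In the List-typed clickstream, take two adjacent ClickstreamRequest elements, A and B.

B's timestamp minus A's timestamp can be treated as the time spent on page A;

if A and B are the same page, the user is refreshing page A, which can be used to compute refresh_count;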
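
One way to read the rule above is sketched below. It is an interpretation, not the module's actual code: Click is a minimal stand-in for ClickstreamRequest, PageView stands in for one row of the request table, and consecutive hits on the same URI are folded into a single row whose refreshCount is incremented. The last page of a session has no following request, so its dwell time is left at -1.

```java
import java.util.ArrayList;
import java.util.List;

public class StreamAnalysis {

	/** Minimal stand-in for ClickstreamRequest: a URI and a timestamp in milliseconds. */
	public static final class Click {
		public final String uri;
		public final long timestamp;

		public Click(String uri, long timestamp) {
			this.uri = uri;
			this.timestamp = timestamp;
		}
	}

	/** One aggregated page view (hypothetical row of the request table). */
	public static final class PageView {
		public final String uri;
		public final long startMillis;
		public long dwellMillis = -1;   // unknown until a different page is requested
		public int refreshCount = 0;

		PageView(String uri, long startMillis) {
			this.uri = uri;
			this.startMillis = startMillis;
		}
	}

	/** Folds an ordered list of clicks into page views with dwell time and refresh count. */
	public static List<PageView> aggregate(List<Click> clicks) {
		List<PageView> views = new ArrayList<>();
		PageView current = null;
		for (Click click : clicks) {
			if (current != null && current.uri.equals(click.uri)) {
				// Same page requested again: count it as a refresh.
				current.refreshCount++;
				continue;
			}
			if (current != null) {
				// A different page: the previous page's stay ends now.
				current.dwellMillis = click.timestamp - current.startMillis;
			}
			current = new PageView(click.uri, click.timestamp);
			views.add(current);
		}
		return views;
	}
}
```

Measuring dwell time from the first hit on a page, rather than from the last refresh, is a design choice of this sketch; either reading is compatible with the description.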

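To find the number of users online, look at the number of elements in session.ids;

other features are omitted here.

The user's browser, operating system, and so on can be obtained from the User-Agent header.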
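
A very rough sketch of that idea follows. In the filter, the header value would come from request.getHeader("User-Agent"); the class below only classifies the resulting string. The class name and the substring rules are assumptions for illustration, and real User-Agent strings are messy enough that a maintained parsing library is probably the better choice in practice.

```java
public class UserAgentSketch {

	/** Guesses the browser family from a User-Agent string using substring checks. */
	public static String browser(String userAgent) {
		if (userAgent == null) {
			return "Unknown";
		}
		// Order matters: Edge and Chrome user agents also contain "Safari/",
		// and Edge user agents also contain "Chrome/".
		if (userAgent.contains("Edg/")) return "Edge";
		if (userAgent.contains("Firefox/")) return "Firefox";
		if (userAgent.contains("Chrome/")) return "Chrome";
		if (userAgent.contains("Safari/")) return "Safari";
		if (userAgent.contains("MSIE") || userAgent.contains("Trident/")) return "Internet Explorer";
		return "Unknown";
	}

	/** Guesses the operating system from a User-Agent string using substring checks. */
	public static String os(String userAgent) {
		if (userAgent == null) {
			return "Unknown";
		}
		// Order matters: Android user agents also contain "Linux",
		// and iOS user agents also contain "Mac OS X".
		if (userAgent.contains("Windows")) return "Windows";
		if (userAgent.contains("Android")) return "Android";
		if (userAgent.contains("iPhone") || userAgent.contains("iPad")) return "iOS";
		if (userAgent.contains("Mac OS X")) return "macOS";
		if (userAgent.contains("Linux")) return "Linux";
		return "Unknown";
	}
}
```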

Because no JavaScript is used, details such as the browser's screen resolution cannot be obtained.

Related projects

Google Analytics

Uses JavaScript; reportedly, to protect user privacy, it does not expose the order in which a user browsed pages.

Piwik

Uses JavaScript, with a PHP backend.

Time: 2024-12-09 11:46:06
