[paper]Real-time recommendation for microblogs / 憋错料

1.Related work

1.1.recommendation strategies

　　1.　　two types of techniques:

　　　　　　(1)the link-based approach

　　　　　　　（C.C. Aggarwal, J.L. Wolf, K.L. Wu, P.S. Yu, Horting hatches an egg: a new graph-theoretic approach to collaborative filtering, in: KDD, 1999, pp. 201–212.）

　　　　　　　（X. Song, B.L. Tseng, C.Y. Lin, M.T. Sun, Personalized recommendation driven by information flow, in: SIGIR, 2006, pp. 509–516）

　　　　　　(2)the content-based approach:

　　　　　　　（M. Balabanovic´ , Y. Shoham, Fab: content-based, collaborative recommendation, Commun. ACM 40 (1997) 66–72.）

　　　　　　　（A.I. Schein, A. Popescul, L.H. Ungar, D.M. Pennock, Methods and metrics for cold-start recommendations, in: SIGIR, 2002, pp. 253–260.）

　　　　　　　（I. Guy, N. Zwerdling, I. Ronen, D. Carmel, E. Uziel, Social media recommendation based on people and tags, in: SIGIR, 2010, pp. 194–201.）

　　　　　　　（D.R. Liu, P.Y. Tsai, P.H. Chiu, Personalized recommendation of popular blog articles for mobile applications, in: Information Sciences, 2011, pp. 1552–1572.）

　　 2.　On the one hand, to recommend blogs to a user u, the collaborative filtering approach finds users whose tastes are similar to u's and then recommends the blogs most liked by these users.

　　　　　　(Y. Koren, Factorization meets the neighborhood: a multifaceted collaborative filtering model, in: KDD, 2008, pp. 426–434.)

　　　　　　(Y. Koren, Factor in the neighbors: scalable and accurate collaborative filtering, in: TKDD, 2010.)

　　　　　　(X. Su, T.M. Khoshgoftaar, A survey of collaborative filtering techniques, in: Advances in Artificial Intelligence, 2009.)

　　　　On the other hand, the content-based approach recommends blogs that are similar to the user's past selections.

　　　　　　(O. Phelan, K. McCarthy, B. Smyth, Using Twitter to recommend real-time topical news, in: RecSys, 2009, p. 385–388.)

　　　　　　(O. Phelan, K. McCarthy, M. Bennett, B. Smyth, On using the real-time web for news recommendation & discovery, in: WWW (Companion Volume),2011a, pp. 103–104.)

　　3.　　some hybrid approaches [15,4,16,22] combine these two techniques to improve the quality of recommendations

　　　　　　(N. Good, J.B. Schafer, J.A. Konstan, A. Borchers, B.M. Sarwar, J.L. Herlocker, J. Riedl, Combining collaborative filtering with personal agents for better recommendations, in: AAAI/IAAI, 1999, p. 439–446.)

　　　　　　(S. Amer-Yahia, J. Huang, C. Yu, Building community-centric information exploration applications on social content sites, in: SIGMOD Conference, 2009,pp. 947–952.)

　　　　　　(Z. Guan, C. Wang, J. Bu, C. Chen, K. Yang, D. Cai, X. He, Document recommendation in social tagging services, in: WWW, 2010, pp. 391–400.)

　　　　　　(Y. Koren, Factor in the neighbors: scalable and accurate collaborative filtering, in: TKDD, 2010.)

　　4.　　the trade-off between accuracy and privacy in social link analysis is also studied

　　　　　　(A. Machanavajjhala, A. Korolova, A.D. Sarma, Personalized social recommendations: accurate or private, Proc. VLDB Endow. (2011).)

　　5.　　a comparative study of different recommendation strategies in social systems

　　　　　　(A. Bellogín, I. Cantador, P. Castells, A comparative study of heterogeneous item recommendations in social systems, in: Information Sciences, 2013, pp.142–169.)

　　6.　　a survey of these systems (G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions, in: TKDE, 2005, pp. 734–749.)

1.2. Recommendations on microblogging systems

　　1.　　exploit the microblogging system to provide various kinds of recommendation services

　　　　　　the URL recommendations

　　　　　　　　 (J. Chen, R. Nairn, L. Nelson, M.S. Bernstein, E.H. Chi, Short and tweet: experiments on recommending content from information streams, in: CHI, 2010,pp. 1185–1194.)

　　　　　　utilizes the content of tweets to score the user's RSS feeds (O. Phelan, K. McCarthy, M. Bennett, B. Smyth, Terms of a feather: content-based news recommendation and discovery using Twitter, in: ECIR, 2011b, pp. 448–459.)

　　　　　　proposes a mention recommendation strategy to expand the diffusion of tweets

　　　　　　　　(B. Wang, C. Wang, J. Bu, C. Chen, W.V. Zhang, D. Cai, X. He, Whom to mention: expand the diffusion of tweets by @ recommendation on micro-blogging systems, in: WWW, 2013, pp. 1331–1340)

　　　　　　based on existing following relationships

　　　　　　　　 (J. Chen, W. Geyer, C. Dugan, M.J. Muller, I. Guy, Make new friends, but keep the old: recommending people on social networking sites, in: CHI, 2009, pp.201–210.)

　　　　　　based on users' relationships and the tweets they published

　　　　　　　　 (J. Hannon, M. Bennett, B. Smyth, Recommending Twitter users to follow using content and collaborative filtering approaches, in: RecSys, 2010, pp.199–206.)

　　　　　　bursty keywords are grouped together for discovering discussion trends

　　　　　　　　(M. Mathioudakis, N. Koudas, Twitter monitor: trend detection over the Twitter stream, in: SIGMOD Conference, 2010, pp. 1155–1158.)

　　2.　　To recommend tweets to users, various factors are integrated in the score function

　　　　　　integrates the influence of implicit social experts

　　　　　　　　(C. Lin, R. Xie, X. Guan, L. Li, T. Li, Personalized news recommendation via implicit social experts, in: Information Sciences, 2014, pp. 1–18.)

　　　　　　builds a score for each tweet from the viewpoint of information diffusion, so that emergency news gets a higher score

　　　　　　　　(A.R. Sun, J. Cheng, D.D. Zeng, A novel recommendation framework for micro-blogging based on information diffusion, in: Proceedings of the 16th workshop on information technologies and systems, 2009.)

　　　　　　combines hashtags, topics, and entities to estimate the importance of a specific microblog to a user

　　　　　　　　(F. Abel, Q. Gao, G.J. Houben, K. Tao, Analyzing user modeling on Twitter for personalized news recommendations, in: UMAP, 2011, pp. 1–12.)

　　　　　　extract weighted tags for users as their interest vector

　　　　　　　　(S. Sen, J. Vig, J. Riedl, Tagommenders: connecting users to items through tags, in: WWW, 2009, pp. 671–680.)

　　　　　　based on collaborative ranking, by conceiving a score function integrating the tweet topic, user social network with other explicit features

　　　　　　　　(K. Chen, T. Chen, G. Zheng, O. Jin, E. Yao, Y. Yu, Y. Yu, Collaborative personalized tweet recommendation, in: SIGIR, 2012, p. 661–670.)

　　　　　　dynamically adapts users' profiles over time

　　　　　　　　(L. Marin, D. Isern, A. Moreno, A. Valls, On-line dynamic adaptation of fuzzy preferences, in: Information Sciences, 2013, pp. 5–21.)

　　　　　　builds a tweet similarity network, a user following network, and a user-tweet network to integrate the popularity and diversification of tweets with author ranking

　　　　　　　　(R. Yan, M. Lapata, X. Li, Tweet recommendation with graph co-ranking, in: ACL (1), 2012, pp. 516–525.)

　　　　　　recommends a set of k tweets within a time period such that the set has maximal overall interestingness

　　　　　　　　(M. Pennacchiotti, F. Silvestri, H. Vahabi, R. Venturini, Making your interests follow you on Twitter, in: CIKM, 2012, pp. 165–174.)

　　　　　　utilizes a set of components to rank tweets independently and merges these lists with a competition algorithm

　　　　　　　　(S.J. Yu, The dynamic competitive recommendation algorithm in social network services, in: Information Sciences, 2012, pp. 1–14.)

　　　They compute the similarities, weights, or scores between every user and every microblog before the final answer set is given

2.Overview

　1.　A push-based recommendation service:

　2. a function f(u, t) denotes the relevance between user u and tweet t

　　　　 (for each tweet)

　　　　 (n tags in the system)

　　3. the content relevance function is a summation of the similarities between the tweet t_m and the tweets published by u_i, which can be converted into the inner product between t_m and the term frequency vector of u_i's historical tweets.

　　　　(K. Chen, T. Chen, G. Zheng, O. Jin, E. Yao, Y. Yu, Y. Yu, Collaborative personalized tweet recommendation, in: SIGIR, 2012, p. 661–670.)

　　4.system architecture

　　　　consists of two parts: the tag buffer and the user buffer.

　　　　The tag buffer has two components: the keyword-tag index and the tag-user index.
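The conversion in item 3 above (a sum of per-tweet similarities becoming a single inner product) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the similarity between two tweets is the dot product of their raw term-frequency vectors; the function names and the whitespace tokenizer are hypothetical and do not come from the paper.

```python
from collections import Counter

def tf_vector(text):
    """Sparse term-frequency vector of a tweet (whitespace tokens, an assumption)."""
    return Counter(text.lower().split())

def dot(a, b):
    """Inner product of two sparse vectors stored as Counters."""
    return sum(weight * b[term] for term, weight in a.items() if term in b)

def relevance_pairwise(tweet, history):
    """Sum the similarity between the new tweet and every historical tweet."""
    t = tf_vector(tweet)
    return sum(dot(t, tf_vector(h)) for h in history)

def relevance_profile(tweet, history):
    """Same value, computed as one inner product with the summed history vector."""
    profile = Counter()
    for h in history:
        profile.update(tf_vector(h))  # accumulate term frequencies once per user
    return dot(tf_vector(tweet), profile)

history = ["new phone release today", "phone battery tips", "football match today"]
tweet = "phone release today"
print(relevance_pairwise(tweet, history), relevance_profile(tweet, history))
```

Because the dot product is linear, both functions return the same value, but the second needs only one precomputed profile vector per user instead of one comparison per historical tweet.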

3.Personal tags

　　3.1 tag retrieval

　　　　1.the term-frequency inverse-user-frequency weighting scheme(TF-IDF)

　　　　　　(J. Chen, R. Nairn, L. Nelson, M.S. Bernstein, E.H. Chi, Short and tweet: experiments on recommending content from information streams, in: CHI, 2010,pp. 1185–1194.)

　　　　2. Other tag retrieval methods can therefore be easily integrated with our recommendation strategy, and other tags, such as geotags, can be added to the tag vector to adjust the recommendation result.

　　　　3. For users with extremely few tags, global frequent tags are picked as their additional tags, so that those users can still receive recommendations in our system.
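The tag retrieval steps above can be sketched as follows. The notes do not give the exact weighting formula, so this sketch assumes the common form tf × log(N / uf), where tf is the term count in a user's tweets, N is the number of users, and uf is the number of users who used the term. The function names, `top_n`, and `min_tags` are hypothetical.

```python
import math
from collections import Counter

def tf_iuf_tags(user_terms, top_n=3):
    """user_terms: dict mapping user -> list of terms from that user's tweets."""
    n_users = len(user_terms)
    user_freq = Counter()
    for terms in user_terms.values():
        user_freq.update(set(terms))  # count each term once per user
    tags = {}
    for user, terms in user_terms.items():
        tf = Counter(terms)
        scores = {t: c * math.log(n_users / user_freq[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        tags[user] = ranked[:top_n]
    return tags

def pad_with_global_tags(tags, user_terms, min_tags=2):
    """Give users with too few tags the globally most frequent tags as extras."""
    global_counts = Counter(t for terms in user_terms.values() for t in terms)
    frequent = [t for t, _ in global_counts.most_common()]
    padded = {}
    for user, own in tags.items():
        extra = [t for t in frequent if t not in own]
        padded[user] = own + extra[: max(0, min_tags - len(own))]
    return padded

user_terms = {
    "alice": ["nba", "nba", "music"],
    "bob": ["music", "movie"],
    "carol": ["nba"],
}
tags = tf_iuf_tags(user_terms)
print(pad_with_global_tags(tags, user_terms))
```

In this toy data, carol has a single tag of her own, so the padding step is what keeps her reachable by recommendations on other popular tags.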

　　3.2 weights of tags

　　　　1. On the other hand, in the social network of microblogging systems, users are connected via the following/followed links.

　　　　3.The iteration algorithm

　　　　4. This iteration algorithm is similar to PageRank　

　　　　　(S. Brin, L. Page, The anatomy of a large-scale hypertextual web search engine, in: Computer Networks and ISDN Systems, 1998, pp. 107–117.)

　　　　5.As the above approach is expensive, tag weights are recomputed offline only when a large number of new users join the network.

　　　　　Before the next computation, a new user u_i uses his/her initial weight vector rather than the aggregated one.
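To make the PageRank analogy concrete, here is a minimal sketch of one possible iteration. The update rule is an assumption, since the notes do not record the paper's formula: each user's aggregated vector is a damped mix of the initial vector and the average of the current vectors of the users he/she follows. The names `propagate_tag_weights`, `damping`, and `iterations` are hypothetical.

```python
def propagate_tag_weights(initial, follows, damping=0.5, iterations=20):
    """initial: user -> {tag: weight}; follows: user -> list of followed users."""
    current = {user: dict(vec) for user, vec in initial.items()}
    for _ in range(iterations):
        updated = {}
        for user, own in initial.items():
            followees = [f for f in follows.get(user, []) if f in current]
            if not followees:
                # Nothing to aggregate: keep the initial vector unchanged.
                updated[user] = dict(own)
                continue
            # Keep a (1 - damping) share of the user's own initial weights.
            vec = {tag: (1 - damping) * w for tag, w in own.items()}
            # Spread the remaining share evenly over the followees' vectors.
            for f in followees:
                for tag, w in current[f].items():
                    vec[tag] = vec.get(tag, 0.0) + damping * w / len(followees)
            updated[user] = vec
        current = updated
    return current

initial = {"a": {"nba": 1.0}, "b": {"music": 1.0}}
follows = {"a": ["b"]}
print(propagate_tag_weights(initial, follows))
```

Under this assumed rule, a vector whose weights sum to 1 still sums to 1 after every iteration, and a user who follows nobody simply keeps the initial vector.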

4.Recommendation algorithm

　4.1 naive approach

　　　　For user u_i and its tag weight vector V_i, the probability that a tweet will be sent to u_i in the naive approach is:

　　4.2 Approximate pruning scheme

　　　　key: To get the top-K results, the probabilities must satisfy the following condition:

　　　　1.critical tags

　　　　　　If a tweet hits only one tag, the set of tags to be registered in the tag-user index, S_initial, is generated as:

　　　　　　multiple tags may be linked to the same tweet:

　　　　　　　　the definition of Critical Tag Set: any tweet containing all tags in a critical tag set is considered a candidate for the top-K recommendations.

　　　　　　2.Algorithm 2 illustrates the basic idea to retrieve the minimal critical tag sets.
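The idea of a minimal critical tag set can be illustrated with a small brute-force sketch. This is not the paper's Algorithm 2. It assumes, for illustration only, that a tweet's score for a user is the sum of the user's weights for the tags the tweet contains, and that a tag set is critical when this sum reaches a given threshold. All names are hypothetical.

```python
from itertools import combinations

def minimal_critical_tag_sets(weights, threshold):
    """Enumerate tag sets whose total weight reaches the threshold and that
    contain no smaller set already reaching it (exponential, for illustration)."""
    tags = sorted(weights)
    found = []
    for size in range(1, len(tags) + 1):
        for combo in combinations(tags, size):
            candidate = frozenset(combo)
            if any(smaller <= candidate for smaller in found):
                continue  # contains a smaller critical set, so it is not minimal
            if sum(weights[t] for t in combo) >= threshold:
                found.append(candidate)
    return found

def is_candidate(tweet_tags, critical_sets):
    """A tweet is a candidate if it contains every tag of some critical set."""
    tweet_tags = set(tweet_tags)
    return any(s <= tweet_tags for s in critical_sets)

weights = {"nba": 0.6, "music": 0.3, "movie": 0.2}
critical = minimal_critical_tag_sets(weights, threshold=0.45)
print(critical)
print(is_candidate(["music", "movie", "cats"], critical))
```

Only the minimal sets need to be registered: any tweet that contains a larger critical set also contains one of the minimal ones, so it is still detected as a candidate.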

　　　　3 tag buffer

　　　　Compared to the naive approach, our APS approach effectively decreases the computational cost, as the size of tag buffer is significantly reduced.

　　　　4 probability estimation

　The probabilities change significantly in subsequent time intervals. Therefore, instead of using the historical probabilities directly, we apply probability ranges.