Experiments on the NYC dataset (updated 3rd Aug)

These experiments use the NYC dataset. Here is the dataset link: https://sites.google.com/site/yangdingqi/home/foursquare-dataset

Forgive me for being lazy and uploading a photo of my handwritten notes on the data preprocessing:

The code is available on GitHub. Here are the links:
Binary Tests

Taking each user's number of check-ins into account
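The two variants differ only in the values stored for each user and venue pair: the binary test stores 1 for any visited pair, while the other variant stores how many times the user checked in there. This is a minimal sketch of that difference, assuming the check-ins arrive as (user, venue) pairs. The function name `build_interactions` and the sample data are hypothetical and are not taken from the repository.

```python
from collections import Counter


def build_interactions(checkins, binary=True):
    """Map each unique (user, venue) pair to an interaction value.

    binary=True  -> every visited pair gets 1.0
    binary=False -> every visited pair gets its check-in count
    """
    counts = Counter(checkins)
    if binary:
        return {pair: 1.0 for pair in counts}
    return {pair: float(n) for pair, n in counts.items()}


# Hypothetical sample: u1 visits v1 twice and v2 once, u2 visits v1 once.
checkins = [("u1", "v1"), ("u1", "v1"), ("u1", "v2"), ("u2", "v1")]

binary_matrix = build_interactions(checkins, binary=True)
count_matrix = build_interactions(checkins, binary=False)

print("unique user&venue combinations:", len(binary_matrix))
print("max in binary matrix:", max(binary_matrix.values()))
print("max in count matrix:", max(count_matrix.values()))
```

In a real run these values would probably go into a sparse matrix before fitting the model, but the dictionary form is enough to show where the two inputs differ.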

And this is the result of running the code on the cluster:

unique user&venue checkin combination in test 18205
unique user&venue checkin combination in test 72819
max num in matrix 1.0
max num in train 1.0
I am beginning to model
model has been fitted
this is the binary model
Time used: 4.789567
Train_auc is 0.999504
Test_aus is 0.654491
/home/s2013258/.local/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)

unique user&venue checkin combination in test 18205
unique user&venue checkin combination in test 72819
max num in matrix 257
max num in train 205
I am beginning to model
model has been fitted
this is the model that consider the checkin times
Time used: 4.782983
Train_auc is 0.999508
Test_aus is 0.655189
/home/s2013258/.local/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)

As for the hybrid model, I have not tried it yet. TBC.
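Both runs above report a train AUC close to 1.0 and a test AUC of about 0.65. As a reminder of what the number means, AUC is the probability that a randomly chosen positive item is scored higher than a randomly chosen negative one. Here is a minimal pure-Python sketch of that definition. The function `pairwise_auc` and the scores are hypothetical, and this is not the evaluation code that produced the logs.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores for one user: two visited venues, two unvisited ones.
example_auc = pairwise_auc([0.9, 0.4], [0.5, 0.1])
print("AUC:", example_auc)
```

A value of 0.5 corresponds to random ranking, so a test AUC of 0.65 is better than chance but far from the near-perfect ranking seen on the training pairs.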

Time: 2024-08-28 13:48:00
