Customer segmentation – LifeCycle Grids with R (repost)

I want to share a very powerful approach to customer segmentation in this post. It is based on the customer's lifecycle, specifically on the frequency and recency of purchases. The idea of using these metrics comes from RFM analysis. Recency and frequency are very important behavioral metrics. We are interested in frequent and recent purchases, because frequency affects a client's lifetime value and recency affects retention. Therefore, these metrics can help us understand the current phase of the client's lifecycle. When we know each client's phase, we can split the customer base into groups (segments) in order to:

  • understand the state of affairs,
  • spend the marketing budget effectively through accurate targeting,
  • use different offers for every group,
  • use email marketing effectively,
  • and, ultimately, increase customers' lifetime value.

For this, we will use a matrix called LifeCycle Grids. We will study how to process the initial (transactional) data into the matrix, how to visualize it, and how to do some in-depth analysis. We will do all of these steps with the R programming language.

Let’s create a data sample with the following code:



# loading libraries
library(dplyr)
library(reshape2)
library(ggplot2)

# creating a data sample
set.seed(10)
data <- data.frame(orderId=sample(c(1:1000), 5000, replace=TRUE),
 product=sample(c('NULL', 'a', 'b', 'c'), 5000, replace=TRUE,
 prob=c(0.15, 0.65, 0.3, 0.15)))
order <- data.frame(orderId=c(1:1000),
 clientId=sample(c(1:300), 1000, replace=TRUE))
gender <- data.frame(clientId=c(1:300),
 gender=sample(c('male', 'female'), 300, replace=TRUE, prob=c(0.40, 0.60)))
date <- data.frame(orderId=c(1:1000),
 orderdate=sample((1:100), 1000, replace=TRUE))
orders <- merge(data, order, by='orderId')
orders <- merge(orders, gender, by='clientId')
orders <- merge(orders, date, by='orderId')
# dropping empty orders and converting the order date to Date format
orders <- orders[orders$product!='NULL', ]
orders$orderdate <- as.Date(orders$orderdate, origin="2012-01-01")
rm(data, date, order, gender)

The head of our data sample looks like:

  orderId clientId product gender orderdate
1   1       254       a    female 2012-04-03
2   1       254       b    female 2012-04-03
3   1       254       c    female 2012-04-03
4   1       254       b    female 2012-04-03
5   2       151       a    female 2012-01-31
6   2       151       b    female 2012-01-31

You can see that the table includes the customer's gender. We will use it later as an example of in-depth analysis. I recommend using any additional features you have when looking for insights: the client's acquisition source, channel, campaign, geo data, and so on.

A few words about LifeCycle Grids. It is a matrix with two dimensions:

  • frequency, which is expressed in the number of purchased items or placed orders,
  • recency, which is expressed in days or months since the last purchase.

The first step is to think about grids that suit your business. It is impossible to work with an infinite number of segments, so we need to define boundaries for frequency and recency that split customers into homogeneous groups (segments). Analyzing the distribution of frequency and recency in our data set, combined with knowledge of the business, helps us find suitable boundaries.

Therefore, we need to calculate two values:

  • the number of orders placed by each client (or, in some cases, the number of items),
  • the time elapsed from the last purchase to the reporting date.

Then, we plot the distributions with the following code:



# reporting date
today <- as.Date('2012-04-11', format='%Y-%m-%d')

# processing data: one row per order with counts of each product
orders <- dcast(orders, orderId + clientId + gender + orderdate ~ product, value.var='product', fun.aggregate=length)

# frequency = number of orders per client, recency = days since the last purchase
orders <- orders %>%
 group_by(clientId) %>%
 mutate(frequency=n(),
 recency=as.numeric(today-orderdate)) %>%
 filter(orderdate==max(orderdate)) %>%
 filter(orderId==max(orderId))

# exploratory analysis
ggplot(orders, aes(x=frequency)) +
 theme_bw() +
 scale_x_continuous(breaks=c(1:10)) +
 geom_bar(alpha=0.6) +
 ggtitle("Distribution by frequency")

ggplot(orders, aes(x=recency)) +
 theme_bw() +
 geom_bar(alpha=0.6) +
 ggtitle("Distribution by recency")
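As a side note (my addition, not from the original post), simple numeric summaries of the same two metrics can suggest candidate cut points before eyeballing the histograms. A minimal sketch using the orders data frame computed above:

# not in the original post: numeric summaries that can suggest boundaries
table(orders$frequency)                        # how many clients placed 1, 2, 3, ... orders
quantile(orders$recency, probs=seq(0, 1, 0.2)) # recency quintiles as candidate cut points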

Early behavior matters most, so finer detail is useful there. Usually, there is a significant difference between customers who bought once and those who bought three times, but is there any real difference between customers who bought 50 times and others who bought 53 times? That is why it makes sense to use narrow boundaries at low values and wider ones at higher values. We will use the following boundaries:

  • for frequency: 1, 2, 3, 4, 5, >5,
  • for recency: 0-6, 7-13, 14-19, 20-45, 46-80, >80 days.

Next, we need to assign each client to a segment based on these boundaries. We will also create a new variable, 'cart', which contains the products from the client's last cart, for in-depth analysis.



orders.segm <- orders %>%
 mutate(segm.freq=ifelse(between(frequency, 1, 1), '1',
 ifelse(between(frequency, 2, 2), '2',
 ifelse(between(frequency, 3, 3), '3',
 ifelse(between(frequency, 4, 4), '4',
 ifelse(between(frequency, 5, 5), '5', '>5')))))) %>%
 mutate(segm.rec=ifelse(between(recency, 0, 6), '0-6 days',
 ifelse(between(recency, 7, 13), '7-13 days',
 ifelse(between(recency, 14, 19), '14-19 days',
 ifelse(between(recency, 20, 45), '20-45 days',
 ifelse(between(recency, 46, 80), '46-80 days', '>80 days')))))) %>%
 # creating the last-cart feature
 mutate(cart=paste(ifelse(a!=0, 'a', ''),
 ifelse(b!=0, 'b', ''),
 ifelse(c!=0, 'c', ''), sep='')) %>%
 arrange(clientId)

# defining the order of boundaries
orders.segm$segm.freq <- factor(orders.segm$segm.freq, levels=c('>5', '5', '4', '3', '2', '1'))
orders.segm$segm.rec <- factor(orders.segm$segm.rec, levels=c('>80 days', '46-80 days', '20-45 days', '14-19 days', '7-13 days', '0-6 days'))
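As a design note (my addition, not from the original post), the nested ifelse() calls can be replaced with cut(), which bins a numeric vector against a set of breaks in one call. Here is a sketch with the same boundaries; segm.freq2 and segm.rec2 are hypothetical column names, and the resulting factor levels would still need to be reversed for the plot layout used below:

# a sketch: equivalent binning with cut() instead of nested ifelse()
orders.segm$segm.freq2 <- cut(orders.segm$frequency, breaks=c(0, 1, 2, 3, 4, 5, Inf),
 labels=c('1', '2', '3', '4', '5', '>5'))
orders.segm$segm.rec2 <- cut(orders.segm$recency, breaks=c(-1, 6, 13, 19, 45, 80, Inf),
 labels=c('0-6 days', '7-13 days', '14-19 days', '20-45 days', '46-80 days', '>80 days'))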

We now have everything we need to create LifeCycle Grids. We combine clients into segments with the following code:



lcg <- orders.segm %>%
 group_by(segm.rec, segm.freq) %>%
 summarise(quantity=n()) %>%
 # dummy x value for plotting
 mutate(client='client') %>%
 ungroup()

The classic matrix can be created with the following code:



lcg.matrix <- dcast(lcg, segm.freq ~ segm.rec, value.var='quantity', fun.aggregate=sum)

However, I think a better visualization can be obtained with the following code:



ggplot(lcg, aes(x=client, y=quantity, fill=quantity)) +
 theme_bw() +
 theme(panel.grid = element_blank()) +
 geom_bar(stat='identity', alpha=0.6) +
 geom_text(aes(y=max(quantity)/2, label=quantity), size=4) +
 facet_grid(segm.freq ~ segm.rec) +
 ggtitle("LifeCycle Grids")

I’ve added colored borders for a better understanding of how to work with this matrix. We have four quadrants:

  • yellow – our best customers, who have placed quite a few orders and made their last purchase recently. They have higher value and higher potential to buy again. We have to take care of them.
  • green – our new clients, who placed a few orders (1-3) recently. Although they have lower value, they have the potential to move into the yellow zone, so we have to help them get there.
  • red – our former best customers. We need to understand why they are former and, maybe, try to reactivate them.
  • blue – our one-time buyers.
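The code for the colored borders mentioned above is not shown in this post. One way to reproduce a similar effect is to draw a full-panel rectangle in each facet, colored by quadrant. This is only a sketch under my own assumptions: the lcg.adv, rec.type, freq.type and customer.type names are hypothetical, and the exact split into "recent"/"frequent" below is an illustrative choice, not the post's definition:

# a sketch: shading each facet background by quadrant type (assumed boundaries)
lcg.adv <- lcg %>%
 mutate(rec.type=ifelse(segm.rec %in% c('>80 days', '46-80 days', '20-45 days'), 'not recent', 'recent'),
 freq.type=ifelse(segm.freq %in% c('>5', '5', '4'), 'frequent', 'infrequent'),
 customer.type=interaction(rec.type, freq.type))

ggplot(lcg.adv, aes(x=client, y=quantity)) +
 theme_bw() +
 theme(panel.grid = element_blank()) +
 # one rectangle per facet, stretched to the panel edges, acts as a colored background
 geom_rect(aes(fill=customer.type), xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf, alpha=0.1, inherit.aes=FALSE) +
 geom_bar(stat='identity', alpha=0.6) +
 geom_text(aes(y=max(quantity)/2, label=quantity), size=4) +
 facet_grid(segm.freq ~ segm.rec) +
 ggtitle("LifeCycle Grids")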

Does it make sense to make the same offer to all of these customers? Certainly it doesn't! It makes sense to create different approaches not only for each quadrant, but for the cells on the borders between quadrants as well.

What I really like about this segmentation model is that it is stable and alive at the same time. It is alive in terms of customer flow: every day, with or without purchases, customers flow from one grid to another. And it is stable in terms of working with segments: each grid contains customers with the same behavior profile, which means you can create suitable campaigns / offers / emails for each grid (or several neighboring grids) and use them constantly.
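To make the idea of flow concrete, the grids can be recomputed at two reporting dates and the transitions between cells counted. This is only a sketch of mine, not part of the original post: it assumes you keep a copy of the order-level data (one row per order with clientId and orderdate, called orders.raw here) before the aggregation above, and the segment_clients helper is a hypothetical name:

# a sketch: counting customer flows between grids across two reporting dates
segment_clients <- function(orders.raw, report.date) {
 orders.raw %>%
  filter(orderdate <= report.date) %>%
  group_by(clientId) %>%
  summarise(frequency=n_distinct(orderId),
   recency=as.numeric(report.date - max(orderdate))) %>%
  mutate(segm.freq=ifelse(frequency > 5, '>5', as.character(frequency)),
   segm.rec=cut(recency, breaks=c(-1, 6, 13, 19, 45, 80, Inf),
    labels=c('0-6 days', '7-13 days', '14-19 days', '20-45 days', '46-80 days', '>80 days')))
}

grid.t1 <- segment_clients(orders.raw, as.Date('2012-03-11'))
grid.t2 <- segment_clients(orders.raw, as.Date('2012-04-11'))

# count how many clients moved from each cell at t1 to each cell at t2
flows <- grid.t1 %>%
 inner_join(grid.t2, by='clientId', suffix=c('.t1', '.t2')) %>%
 count(segm.freq.t1, segm.rec.t1, segm.freq.t2, segm.rec.t2)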

OK, it's time to study how we can do some in-depth analysis. R allows us to create subsegments and visualize them effectively. It can be helpful to split each grid by some feature. For instance, there may be a dependence between behavior and gender. As another example, if our products have different lifecycles, it can be helpful to analyze which products were in the last cart, or we can combine these features. Let's do this with the following code:



lcg.sub <- orders.segm %>%
 group_by(gender, cart, segm.rec, segm.freq) %>%
 summarise(quantity=n()) %>%
 mutate(client='client') %>%
 ungroup()

ggplot(lcg.sub, aes(x=client, y=quantity, fill=gender)) +
 theme_bw() +
 theme(panel.grid = element_blank()) +
 geom_bar(stat='identity', position='fill', alpha=0.6) +
 facet_grid(segm.freq ~ segm.rec) +
 ggtitle("LifeCycle Grids by gender (proportion)")

or even:



ggplot(lcg.sub, aes(x=gender, y=quantity, fill=cart)) +
 theme_bw() +
 theme(panel.grid = element_blank()) +
 geom_bar(stat='identity', position='fill', alpha=0.6) +
 facet_grid(segm.freq ~ segm.rec) +
 ggtitle("LifeCycle Grids by gender and last cart (proportion)")

Therefore, there is a lot of room for creativity. If you want to know much more about LifeCycle Grids and strategies for working with the quadrants, I highly recommend reading Jim Novo's work, e.g. this blog post.

Thank you for reading this!

Reposted from: http://analyzecore.com/2015/02/16/customer-segmentation-lifecycle-grids-with-r/
