Customer segmentation – LifeCycle Grids, CLV and CAC with R (repost)

In the previous post, we studied a very powerful approach to customer segmentation based on the customer’s lifecycle. We used two metrics: frequency and recency. It is also possible, and very helpful, to add monetary value to the segmentation. If you have customer acquisition cost (CAC) and customer lifetime value (CLV), you can easily add these data to the calculations.

We will create the same data sample as in the previous post, but with two added data frames:

  • cac, our expenses for each customer acquisition,
  • gr.margin, gross margin of each product.

# loading libraries
library(dplyr)
library(reshape2)
library(ggplot2)

# creating data sample
set.seed(10)
data <- data.frame(orderId=sample(c(1:1000), 5000, replace=TRUE),
                   product=sample(c('NULL','a','b','c'), 5000, replace=TRUE,
                                  prob=c(0.15, 0.65, 0.3, 0.15)))
order <- data.frame(orderId=c(1:1000),
                    clientId=sample(c(1:300), 1000, replace=TRUE))
gender <- data.frame(clientId=c(1:300),
                     gender=sample(c('male', 'female'), 300, replace=TRUE, prob=c(0.40, 0.60)))
date <- data.frame(orderId=c(1:1000),
                   orderdate=sample((1:100), 1000, replace=TRUE))
orders <- merge(data, order, by='orderId')
orders <- merge(orders, gender, by='clientId')
orders <- merge(orders, date, by='orderId')
orders <- orders[orders$product!='NULL', ]
orders$orderdate <- as.Date(orders$orderdate, origin="2012-01-01")

# creating data frames with CAC and Gross margin
# (the number of CAC values is derived from the data instead of being hard-coded as 289,
# so the code still runs if the sample yields a different number of distinct clients)
cac <- data.frame(clientId=unique(orders$clientId),
                  cac=sample(c(10:15), length(unique(orders$clientId)), replace=TRUE))
gr.margin <- data.frame(product=c('a', 'b', 'c'), grossmarg=c(1, 2, 3))
rm(data, date, order, gender)

Next, we will calculate CLV to date (actual amount that we earned) using gross margin values and orders of the products. We will use the following code:


# reporting date
today <- as.Date('2012-04-11', format='%Y-%m-%d')

# calculating customer lifetime value
orders <- merge(orders, gr.margin, by='product')
clv <- orders %>%
  group_by(clientId) %>%
  summarise(clv=sum(grossmarg))

# processing data
orders <- dcast(orders, orderId + clientId + gender + orderdate ~ product, value.var='product', fun.aggregate=length)
orders <- orders %>%
  group_by(clientId) %>%
  mutate(frequency=n(),
         recency=as.numeric(today-orderdate)) %>%
  filter(orderdate==max(orderdate)) %>%
  filter(orderId==max(orderId))

orders.segm <- orders %>%
  mutate(segm.freq=ifelse(between(frequency, 1, 1), '1',
                   ifelse(between(frequency, 2, 2), '2',
                   ifelse(between(frequency, 3, 3), '3',
                   ifelse(between(frequency, 4, 4), '4',
                   ifelse(between(frequency, 5, 5), '5', '>5')))))) %>%
  mutate(segm.rec=ifelse(between(recency, 0, 6), '0-6 days',
                  ifelse(between(recency, 7, 13), '7-13 days',
                  ifelse(between(recency, 14, 19), '14-19 days',
                  ifelse(between(recency, 20, 45), '20-45 days',
                  ifelse(between(recency, 46, 80), '46-80 days', '>80 days')))))) %>%
  # creating last cart feature
  mutate(cart=paste(ifelse(a!=0, 'a', ''),
                    ifelse(b!=0, 'b', ''),
                    ifelse(c!=0, 'c', ''), sep='')) %>%
  arrange(clientId)

# defining order of boundaries
orders.segm$segm.freq <- factor(orders.segm$segm.freq, levels=c('>5', '5', '4', '3', '2', '1'))
orders.segm$segm.rec <- factor(orders.segm$segm.rec, levels=c('>80 days', '46-80 days', '20-45 days', '14-19 days', '7-13 days', '0-6 days'))
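
As a side note, the nested ifelse() calls that assign the frequency and recency segments could probably also be written with base R's cut(). The following is only a sketch on made-up vectors, assuming that recency and frequency are whole, non-negative numbers; the toy values are illustrative and are not taken from the data sample:

```r
# Toy recency (days) and frequency (number of orders) values; illustrative only
recency <- c(1, 6, 7, 13, 14, 19, 20, 45, 46, 80, 81)
frequency <- c(1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4)

# cut() uses right-closed intervals by default, e.g. (6, 13] covers 7..13
segm.rec <- cut(recency,
                breaks=c(-Inf, 6, 13, 19, 45, 80, Inf),
                labels=c('0-6 days', '7-13 days', '14-19 days',
                         '20-45 days', '46-80 days', '>80 days'))
segm.freq <- cut(frequency,
                 breaks=c(0, 1, 2, 3, 4, 5, Inf),
                 labels=c('1', '2', '3', '4', '5', '>5'))

# reverse the level order so it matches the order of boundaries used for the grid
segm.rec <- factor(segm.rec, levels=rev(levels(segm.rec)))
segm.freq <- factor(segm.freq, levels=rev(levels(segm.freq)))

print(data.frame(recency, segm.rec, frequency, segm.freq))
```

The boundaries live in one breaks vector per metric, which may be easier to adjust than a chain of ifelse() calls.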

Note: if you prefer to use potential/expected/predicted CLV or total CLV (sum of CLV to date and potential CLV) you can adapt this code or find the example in the next post.
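
To make the CLV to date calculation more tangible, here is a tiny worked example. It is a sketch on hypothetical data (toy.orders, toy.margin and toy.clv are illustrative names), and it uses base R's aggregate() in place of the dplyr pipeline so that it runs without extra packages:

```r
# hypothetical toy data: three order lines for client 1, two for client 2
toy.orders <- data.frame(clientId=c(1, 1, 1, 2, 2),
                         product=c('a', 'b', 'c', 'a', 'a'))
toy.margin <- data.frame(product=c('a', 'b', 'c'), grossmarg=c(1, 2, 3))

# attach the gross margin of the product to every order line
toy.orders <- merge(toy.orders, toy.margin, by='product')

# CLV to date = sum of gross margins of everything the client has bought so far
toy.clv <- aggregate(grossmarg ~ clientId, data=toy.orders, FUN=sum)
names(toy.clv)[2] <- 'clv'
print(toy.clv)
```

Client 1 bought one unit each of a, b and c, so the CLV to date is 1 + 2 + 3 = 6; client 2 bought two units of a, which gives 1 + 1 = 2.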

In addition, we need to merge orders.segm with the CAC and CLV data, and combine the data with the segments. We will calculate total CAC and CLV to date, as well as their average with the following code:


orders.segm <- merge(orders.segm, cac, by='clientId')
orders.segm <- merge(orders.segm, clv, by='clientId')

lcg.clv <- orders.segm %>%
  group_by(segm.rec, segm.freq) %>%
  summarise(quantity=n(),
            # calculating cumulative CAC and CLV
            cac=sum(cac),
            clv=sum(clv)) %>%
  ungroup() %>%
  # calculating CAC and CLV per client
  mutate(cac1=round(cac/quantity, 2),
         clv1=round(clv/quantity, 2))

lcg.clv <- melt(lcg.clv, id.vars=c('segm.rec', 'segm.freq', 'quantity'))

Ok, let’s plot two charts: the first one representing the totals and the second one representing the averages:


ggplot(lcg.clv[lcg.clv$variable %in% c('clv', 'cac'), ], aes(x=variable, y=value, fill=variable)) +
  theme_bw() +
  theme(panel.grid = element_blank()) +
  geom_bar(stat='identity', alpha=0.6, aes(width=quantity/max(quantity))) +
  geom_text(aes(y=value, label=value), size=4) +
  facet_grid(segm.freq ~ segm.rec) +
  ggtitle("LifeCycle Grids - CLV vs CAC (total)")

ggplot(lcg.clv[lcg.clv$variable %in% c('clv1', 'cac1'), ], aes(x=variable, y=value, fill=variable)) +
  theme_bw() +
  theme(panel.grid = element_blank()) +
  geom_bar(stat='identity', alpha=0.6, aes(width=quantity/max(quantity))) +
  geom_text(aes(y=value, label=value), size=4) +
  facet_grid(segm.freq ~ segm.rec) +
  ggtitle("LifeCycle Grids - CLV vs CAC (average)")

Note that the width of each bar in the grid depends on the number of customers in that cell. I think these visualizations are very helpful. You can see the difference between CLV to date and CAC and make decisions about paid campaigns or initiatives, such as:

  • does it make sense to spend extra money to reactivate some customers (e.g. those in the “1 order / >80 days” cell or those in the “>5 orders / 20-45 days” cell)?
  • how much money is appropriate to spend?
  • and so on.
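
One way to put numbers on such questions is to compute the difference between CLV to date and CAC for each cell of the grid. The sketch below uses made-up totals for two cells; the data frame seg and the columns net and net1 are illustrative names, and the figures are not results from the data sample:

```r
# hypothetical totals for two cells of the grid (numbers are made up)
seg <- data.frame(segm.rec=c('>80 days', '20-45 days'),
                  segm.freq=c('1', '>5'),
                  quantity=c(20, 5),   # number of customers in the cell
                  cac=c(250, 60),      # total CAC of the cell
                  clv=c(40, 150))      # total CLV to date of the cell

# difference between CLV to date and CAC: total and per customer
seg$net <- seg$clv - seg$cac
seg$net1 <- round(seg$net / seg$quantity, 2)
print(seg)
```

In this toy case the first cell has not yet paid back its acquisition cost (40 - 250 = -210 in total, -10.5 per customer), while the second one has (150 - 60 = 90 in total, 18 per customer). Figures like these can serve as a starting point when deciding how much to spend on reactivation.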

As a result, we have a very interesting visualization. We can analyze and make decisions based on the three customer lifecycle metrics: recency, frequency and monetary value.

Thank you for reading this!

Source: http://analyzecore.com/2015/02/19/customer-segmentation-lifecycle-grids-clv-and-cac-with-r/

Time: 2024-11-02 11:36:01
