Customer Segmentation Using RFM and Tableau

One of the fun parts of consulting is getting to work with people who are willing to explain new concepts. XCentium has a relationship with Dr Craig Miller who works in the Simulation and Modeling arena. I recently got a chance to work with him to use a technique called RFM to demonstrate customer segmentation to one of our clients. This is how we used Tableau for this purpose.

RFM:

RFM is a well-known technique that characterizes a customer’s loyalty, likelihood of customer retention and future profitability. It uses business transactions to do this, such as individual customer purchases. Each customer’s purchases are evaluated in the following ways: Recency is how recently the latest transaction occurred. Frequency is the number of transactions in the period being evaluated. Monetary Value is the value of all purchases in the period. Recency, Frequency and Monetary Value are combined to make a single RFM score.

For each of these measures, customers are conceptually sorted and divided into groups. We used 5 groups, resulting in scores from 1 to 5. These scores are then aggregated into a single rating by some means. We used a weighted sum of scores.

Requirement complications:

One complicating factor was that, rather than dividing the customer list into groups with 1/5 of the customers, we used the mean and standard deviation to identify the quintile into which each customer would fall in a normalized distribution.

Given each customer’s score needed calculation of mean and standard deviation, we might have used SQL to calculate the RFM. In our case, this couldn’t work. The requirements were for individual salespeople and regions to be able to see the RFM score calculated for only their particular customers. A quick look at the data showed RFM scores were not evenly spread across regions.

Our client had many transactions per customer per day. A majority of their customers had a transaction in any given day. Consequently the requirements included de-emphasizing the importance of the Recency measure.

Tableau:

Rather than make one large, monolithic calculation, we built the calculation successively from a number of calculated measures and a table calculation. This approach had the advantage of being understandable, maintainable and more easily tested/debugged.

RValue: was a trivial calculation, used here to document the transition from the client domain into RFM calculation. For Recency it was seconds since 1/1/2000. The other values were number of transactions, and the total sales in the time period.

Z of R: was the Z Score for the measure. That is how many Standard Deviations the customer’s value is from the mean. This required a table calculation because it changed, depending on how we filtered or grouped. This was:

(MAX[RValue] - WINDOW_AVG(MAX([RValue]))) / WINDOW_STDEVP(MAX([RValue])).

RValue was the only measure aggregated using MAX. The other measures are summed. Also, if you try this, it is important to make sure it, or dependent calculated fields, are computed using the relevant dimension(s) for a given visualization.

RScore: reflected the quintile (in our case) in which RValue would lie in a normalized distribution. This was a calculated measure:

IF([ZofR]< -0.84) THEN 1

ELSEIF([ZofR]< -0.25) THEN 2

ELSEIF([ZofR]< 0.25) THEN 3

ELSEIF([ZofR]< .84 ) THEN 4

ELSE 5

END

Those 0.84 and 0.25 values are the fractions of Standard Deviation which delimit the quintiles in a normal distribution. So, for example, 20% of the members of a population with a normal distribution have a Z Score less than Mean – (0.84 * the Standard Deviation).

RWeighting: was a Tableau Parameter used to weight the R Score to control its relative contribution to the final RFM score. As mentioned we de-emphasized R because most recencies were 1 day or less. There were parameter controls to our dashboards to allow users to experiment with the weighting.

Weighted R: was the contribution of the R score to the composite RFM value. The calculation was (RWeighting * RScore).

RFMScore: was the sum of Weighted R, Weighted F and Weighted M. This wasthe primary measure we used for visualization.

Result:

Having a single measure combining several measures of customer value in a meaningful manner allowed us to make visualizations that were powerful, yet visually straightforward. For example, we could map the composite measure in a single map, which I felt was faster to grasp visually than using the 3 component measures individually.

It was helpful to have separate weighted measures. For some visualizations, we showed the contribution of the weighted R, F, and M to the composite score. This answered some follow-up questions in our visualizations.

It was also valuable to be able to alter the weightings of each score. Experimenting with this showed the composite score did not vary in a surprising way as the weightings were changed, within reason.