Activity Stream Engagement Metrics - Part 1
Goal
In this post we’ll be exploring some common web engagement metrics as defined in the book “Measuring the Immeasurable: Visitor Engagement” [1].
First, we’ll be computing 5 of the metrics described there for individual users:
- Interaction Index (without block and delete)
- Negative Interaction Index (only block and delete)
- Session Duration Index
- Recency Index
- Loyalty Index
Then we’ll define an engaged user as one that returns and interacts with Activity Stream at least once per week for at least the past 3 consecutive weeks.
We’ll then try to build a model that can predict whether a user is engaged given their behaviour (the 5 metrics).
Finally, we’ll talk about what can be done with these results and what further exploration is left for the future.
Engagement Metrics
1. Interaction Index Distribution
The following graph shows the distribution of the interaction index for all of our users. In this case, we compute the interaction index for each user as:

interaction index = (sessions with at least one interaction / total sessions) × 100
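As a concrete sketch of the computation (the `sessions` DataFrame and its column names here are hypothetical, not our actual schema), both this index and the negative interaction index in the next section boil down to the same per-user aggregation:

```python
import pandas as pd

def index_over_sessions(sessions: pd.DataFrame, event_col: str) -> pd.Series:
    """Percentage of each user's sessions with at least one event of this kind."""
    flagged = sessions.assign(hit=sessions[event_col] > 0)
    return flagged.groupby("user_id")["hit"].mean() * 100

# interaction index (clicks etc., excluding block/delete):
#   index_over_sessions(sessions, "num_clicks")
# negative interaction index (block/delete only):
#   index_over_sessions(sessions, "num_blocks_deletes")
```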
NOTES:
- The chart generally trends downward, showing that very dedicated users with high interaction rates are rarer
- We can see a peak (~15% of users) with between 5% and 10% of sessions having interactions
- The peak at 100% is likely from users with only a few sessions overall
2. Negative Interaction Index Distribution
The following graph shows the distribution of the negative interaction index for all of our users. In this case, we compute the negative interaction index for each user as:

negative interaction index = (sessions with at least one block or delete / total sessions) × 100
NOTES:
- The vast majority of users (~62%) have a 0% to 5% negative interaction rate, with a decaying trend
- Users are probably liking what they’re seeing, since they click far more than they block or delete
3. Session Duration Index Distribution
The following graph shows the distribution of the session duration index for all of our users. First we choose the overall median session duration (5.5 seconds) as the threshold. Then we compute each user’s session duration index as:

session duration index = (sessions longer than the threshold / total sessions) × 100
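A sketch under the same assumptions as above (a hypothetical `duration_seconds` column on the `sessions` frame):

```python
import pandas as pd

MEDIAN_DURATION = 5.5  # seconds; the overall median across all sessions

def session_duration_index(sessions: pd.DataFrame) -> pd.Series:
    """Percentage of each user's sessions longer than the median threshold."""
    longer = sessions["duration_seconds"] > MEDIAN_DURATION
    return sessions.assign(longer=longer).groupby("user_id")["longer"].mean() * 100
```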
NOTES:
- The shape of the chart (centre peak) is likely due to the choice of median as a threshold
- The peak at 100% is likely once again influenced by users with a small number of sessions. It could also come from users who frequently leave their tabs open and idle.
4. Recency Index Distribution
The following graph shows the distribution of the recency index for all of our users. The recency index is computed for a given visit date as follows:

recency = 1 / (days since the previous visit)

and is then averaged over the total number of days visited (a user’s first visit contributes 0).
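In code, this could look like the following (the input representation is assumed, not our actual schema):

```python
import numpy as np

def recency_index(visit_dates: np.ndarray) -> float:
    """Average of 1/(days since previous visit) over all distinct visit days."""
    days = np.unique(visit_dates.astype("datetime64[D]"))  # sorted, distinct days
    if len(days) < 2:
        return 0.0  # a single visit contributes nothing
    gaps = np.diff(days).astype(int)  # whole days between consecutive visits
    return 100.0 * np.sum(1.0 / gaps) / len(days)
```

A user who visits every other day has gaps of 2 days, so their index approaches 50%, matching the small peak noted below.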
NOTES:
- The peak at 0% accounts both for users with only a single visit and for those whose visits are very sparse/spread out
- There’s also a small peak at 50%, which suggests that visiting every other day is fairly common
5. Loyalty Index Distribution
The following graph shows the distribution of the loyalty index for all of our users. The loyalty index is computed for each user as follows:

loyalty index = ((total visits − 1) / total visits) × 100

where visits are counted over the past 3 weeks.
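Since the formula depends only on the visit count, a sketch is tiny (assuming visits have already been counted per user over the 3-week window):

```python
def loyalty_index(num_visits: int) -> float:
    """(visits - 1) / visits as a percentage; 0 for users with 0 or 1 visits."""
    # loyalty_index(1) == 0.0, loyalty_index(2) == 50.0, loyalty_index(20) == 95.0
    return 100.0 * (num_visits - 1) / num_visits if num_visits > 1 else 0.0
```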
NOTES:
- ~54.3% of all users did not open Activity Stream at all, or opened it only once, over the past 3 weeks
- ~30% of all users returned over the past 3 weeks and were very loyal, with at least 20 visits in total
- Given the nature of the computation, it’s not possible to have a loyalty index greater than 0 and less than 50.
Analyzing Correlations Between Metrics
Now that we’ve seen an overview of user behaviour, we can try to identify relationships between behaviours to better understand our users. Table 1 below shows the Pearson correlation coefficient [2] for all pairs of the 5 metrics above.
A coefficient close to 0 implies no correlation, close to 1 implies a positive correlation, and close to -1 implies a negative correlation.
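Something like the following pandas sketch could generate these pairs (assuming a hypothetical per-user `users` frame holding the five indices and a boolean `engaged` column):

```python
from itertools import combinations
import pandas as pd

METRICS = ["interaction", "negative_interaction", "session_duration",
           "recency", "loyalty"]

def metric_correlations(users: pd.DataFrame) -> list:
    """Pearson correlation for each metric pair, split by engagement."""
    rows = []
    for engaged, group in users.groupby("engaged"):
        for m1, m2 in combinations(METRICS, 2):
            r = group[m1].corr(group[m2])  # Pearson by default
            rows.append((m1, m2, "Engaged" if engaged else "Unengaged", r))
    return sorted(rows, key=lambda row: row[-1], reverse=True)
```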
Table 1. Correlations Between Metrics
Metric 1 | Metric 2 | User Type | Correlation |
---|---|---|---|
Interaction Index | Session Duration Index | Unengaged | 0.430 |
Interaction Index | Session Duration Index | Engaged | 0.392 |
Interaction Index | Recency Index | Engaged | 0.341 |
Interaction Index | Recency Index | Unengaged | 0.250 |
Loyalty Index | Recency Index | Engaged | 0.247 |
Interaction Index | Negative Interaction Index | Engaged | 0.113 |
Interaction Index | Negative Interaction Index | Unengaged | 0.109 |
Session Duration Index | Negative Interaction Index | Engaged | 0.104 |
Session Duration Index | Negative Interaction Index | Unengaged | 0.088 |
Loyalty Index | Recency Index | Unengaged | 0.085 |
Recency Index | Session Duration Index | Unengaged | 0.056 |
Recency Index | Negative Interaction Index | Unengaged | 0.028 |
Recency Index | Negative Interaction Index | Engaged | 0.027 |
Recency Index | Session Duration Index | Engaged | -0.023 |
Loyalty Index | Negative Interaction Index | Engaged | -0.041 |
Loyalty Index | Negative Interaction Index | Unengaged | -0.063 |
Loyalty Index | Session Duration Index | Unengaged | -0.183 |
Loyalty Index | Session Duration Index | Engaged | -0.201 |
Interaction Index | Loyalty Index | Engaged | -0.238 |
Interaction Index | Loyalty Index | Unengaged | -0.287 |
Given these correlations, we can speculate a bit about users’ behaviour:
- Users who have higher interaction rates also have higher session durations
  - this is expected, since it takes time to browse around and click; we can almost call it a sanity check
- A high recency index is also correlated with a high interaction index
  - users who come back frequently are more likely to interact, or vice versa
- Users with high loyalty have low interaction rates and session durations
  - this one is more interesting: these users have many short, non-interactive visits; they probably just open a new tab and quickly navigate away
- Engaged users who have high loyalty also have high recency
  - “recency index” is a bit of a misnomer: unengaged users may not have returned recently, which the recency index doesn’t capture but the loyalty index does
Metric Averages
We can also take a look at some metric averages for engaged vs. unengaged users in Table 2 (all indices are percentages) to see how differently these users behave.
Table 2. Metric Averages for Engaged vs. Unengaged Users
Metric | Engaged | Unengaged |
---|---|---|
Interaction Index | 36.34 | 19.64 |
Negative Interaction Index | 0.90 | 0.87 |
Session Duration Index | 64.46 | 61.40 |
Loyalty Index | 89.21 | 50.04 |
Recency Index | 73.93 | 33.92 |
As expected, all averages are higher for engaged users, with the most drastic differences in loyalty and recency. The interaction index average for unengaged users is still fairly high, though.
Defining Engaged Users
Now that we’ve looked at various user behaviours that tell us a little bit about engagement, let’s define our ultimate engagement goal as bringing users back to interact with Activity Stream at least once per week.
We’ll call a user “engaged” if they’ve interacted with Activity Stream at least once per week over the past 3 weeks, and we’ll look at the ratio of engaged vs. unengaged users over the past 8 weeks in the graph below.
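As a sketch of this labelling rule (the `interactions` frame and its columns are assumed names, not the actual schema):

```python
import pandas as pd

def label_engaged(interactions: pd.DataFrame, as_of: pd.Timestamp) -> pd.Series:
    """True for users with >= 1 interaction in each of the past 3 weeks."""
    window = interactions[(interactions["date"] > as_of - pd.Timedelta(weeks=3))
                          & (interactions["date"] <= as_of)]
    # Bucket each interaction into one of three one-week windows (0, 1, 2).
    weeks = (as_of - window["date"]).dt.days // 7
    active_weeks = window.assign(week=weeks).groupby("user_id")["week"].nunique()
    return active_weeks == 3
```

Users absent from the result had no interactions in the window and would count as unengaged.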
NOTES:
- This is cool! It seems we’ve been keeping a (roughly) steady number of engaged users over the weeks, which means our drop in overall users is mostly coming from the unengaged users.
Classifying Engaged Users
We’ve defined user behaviours and labels for engaged vs. unengaged users. With these, we have enough information to design a classifier. But first, let’s talk about the motivation.
Motivation for an Engagement Classifier
In theory, if we can predict whether a user will become engaged given their behaviour, we can try to encourage the specific behaviours that lead to engagement.
For example, if we find that users who return at least 3 times a week also return every week, then we can put more effort into making users interact more times per week, such as by serving fresh, frequent, relevant recommendations.
On the other hand, if no such relationship exists, the motivation to make users interact more times per week is lower.
So let’s take a look at some basic classifier performance in Table 3:
Table 3. Classifier Performance
User Type | Classification Method | Precision | Recall |
---|---|---|---|
Engaged | Logistic Regression | 0.094 | 0.997 |
Unengaged | Logistic Regression | 0.999 | 0.354 |
Engaged | SVM | 0.186 | 0.459 |
Unengaged | SVM | 0.960 | 0.866 |
Engaged | Decision Tree | 0.249 | 0.902 |
Unengaged | Decision Tree | 0.992 | 0.818 |
Engaged | Random Forest | 0.275 | 0.887 |
Unengaged | Random Forest | 0.991 | 0.843 |
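For reference, here’s a minimal scikit-learn sketch of how such a comparison could be run; the data below is random placeholder data standing in for our actual metrics and labels, and the real feature extraction is omitted:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: five indices per user, ~10% engaged.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(1000, 5))
y = rng.random(1000) < 0.1

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25,
                                                  random_state=0)
for clf in (LogisticRegression(), SVC(), DecisionTreeClassifier(),
            RandomForestClassifier()):
    pred = clf.fit(X_train, y_train).predict(X_val)
    for label, name in ((True, "Engaged"), (False, "Unengaged")):
        print(type(clf).__name__, name,
              precision_score(y_val, pred, pos_label=label, zero_division=0),
              recall_score(y_val, pred, pos_label=label, zero_division=0))
```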
Cool! Our best performing classifier is, unsurprisingly, the Random Forest. It performs reasonably well on these metrics except for engaged-user precision: while ~89% of the truly engaged users in the validation set are correctly classified, only ~27% of the users classified as engaged were indeed engaged. The rest were unengaged.
This doesn’t sound great, but there is generally a tradeoff between precision and recall: if precision were more important to us, we could take a hit on recall and increase precision.
For the purposes of this trial, though, we’re just trying to see whether reasonable classification results are even possible; we haven’t thoroughly designed a solid classifier yet. And it does indeed seem possible!
Conclusions
There are many observations and conclusions that can be drawn from this analysis. The three most important outcomes are:
- Getting a better understanding of our user behaviour
- Seeing that our definition of engaged users returning for several consecutive weeks is reasonable (though it could be improved)
- Seeing that there is a strong distinction between engaged and unengaged users
Next Steps
- Improving our definition of “engaged”. Perhaps one of the reasons engaged precision is low is that our definition of engagement is not quite right. Users might be “engaged” visually, which shows up as a longer session duration, high loyalty, and high recency, but not as much of an interaction rate. The unengaged users also have a fairly high interaction index average (almost 20%!). This idea is worth exploring further.
- Clustering users into various types by behaviour. As we saw, some users open many new tabs for a short time but don’t interact, some interacted a lot and then stopped, and some interact very regularly. Understanding our user types can help us build a better experience for those who seem to lose interest.
- Improving our classification accuracy. There is a lot that can be done here: getting or generating more training data, adding more features (e.g. history size, number of bookmarks, user’s OS), tuning hyperparameters, experimenting with various ensemble methods, and so on.