Identify statistical outliers
Statistical outliers are data points that deviate significantly from the typical pattern in your metrics. While sometimes dismissed as anomalies to be ignored, outliers often contain valuable insights about edge cases or emerging issues. For instance, in a SaaS product where most users complete tasks in 2-5 minutes, a few sessions might last 45+ minutes. These are outliers that deserve attention rather than automatic exclusion.
Several approaches exist for identifying outliers. The simplest is visual inspection: scatter plots and box plots make extreme values immediately apparent. A more rigorous method is a statistical rule, such as flagging values more than 3 standard deviations from the mean. A rule like this applies the same criterion to every dataset, although in small samples a single extreme value inflates the mean and standard deviation enough that it may go unflagged.
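As a rough illustration of the standard deviation rule, here is a minimal Python sketch. The function name flag_outliers, the threshold parameter, and the session durations are all made up for this example; they are assumptions, not data or code from any real product.

```python
import statistics


def flag_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` sample standard
    deviations from the mean. Hypothetical helper for illustration."""
    if len(values) < 2:
        return []  # a standard deviation needs at least two points
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Invented session durations in minutes: most fall in the 2-5 minute
# range, and one session runs for 45 minutes.
durations = [2, 3, 4, 5, 3, 4, 2, 5, 3, 4, 3, 4, 2, 5, 3, 4, 3, 4, 3, 45]

print(flag_outliers(durations))  # [45]
```

With only a handful of sessions the same rule can stay silent: flag_outliers([2, 3, 4, 45]) returns an empty list, because the 45-minute session pulls the mean and standard deviation toward itself. For small samples, a plot is often the safer first check.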
When you discover outliers, investigate them rather than automatically removing them. Consider questions like: Do these users share any characteristics? Could a bug or technical issue explain the extreme values? Is this an early indicator of a broader trend? Tinder offers a good example. The company found a small percentage of users purchasing a disproportionate amount of one-off features. After researching this outlier cohort, it discovered that these were primarily men who traveled frequently (military personnel, salespeople) and wanted to date efficiently while in new locations. Rather than dismissing these outliers, Tinder used the insight to build premium subscription tiers catering specifically to these high-value users.[1]