Statistical significance
Statistical significance tells you whether the difference you're seeing is a real effect or just random chance. Think of it like flipping a coin. Getting heads three times in a row doesn't mean the coin is rigged. Similarly, version B performing better for a day doesn't mean it's actually superior. You need enough data to be confident.
Most teams use a 95% confidence level, meaning they accept at most a 5% chance of mistaking random variation for a real effect. The p-value is how you check this: it's the probability of seeing a difference at least as large as the one you observed if there were actually no difference between variants. When the p-value drops below 0.05, you've reached statistical significance at the 95% level. But 0.05 isn't a magic number. Sometimes 90% confidence makes sense for low-risk changes, while critical features might need 99% confidence.
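To make this concrete, here's a minimal sketch of how a p-value gets computed for a conversion-rate A/B test, using a standard two-proportion z-test. The function name and the traffic numbers are made up for illustration; real tools handle edge cases and one-sided tests differently.

```python
from math import sqrt, erfc

def ab_test_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value from a two-proportion z-test (illustrative sketch)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis (no real difference).
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    return erfc(abs(z) / sqrt(2))

# Hypothetical example: 4,800 visitors per variant.
# A converts 480 (10.0%), B converts 552 (11.5%).
p = ab_test_p_value(480, 4800, 552, 4800)
print(f"p-value: {p:.4f}")        # about 0.018
print("significant at 95%:", p < 0.05)
```

Here the p-value lands around 0.018, below the 0.05 threshold, so this hypothetical test would be called significant at the 95% level but not at 99%.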
Reaching significance too quickly often means something's wrong. With only a handful of users, conversion rates swing wildly, so an early "significant" result is usually a tracking bug or a fluke that fades as data accumulates. If your test shows significant results after 100 users, check your setup. Real differences are usually small, and small differences take hundreds or thousands of users to detect.
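One way to sanity-check how much data you actually need is a rough sample-size estimate before the test starts. The sketch below uses the standard two-proportion sample-size formula; the function name, baseline rate, and lift are hypothetical placeholders.

```python
from statistics import NormalDist

def required_sample_size(base_rate: float, lift: float,
                         confidence: float = 0.95, power: float = 0.80) -> int:
    """Rough per-variant sample size to detect a relative lift in conversion rate."""
    p1 = base_rate
    p2 = base_rate * (1 + lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ~1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)                      # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = ((z_alpha + z_beta) ** 2 * variance) / (p2 - p1) ** 2
    return int(n) + 1

# Hypothetical example: 10% baseline conversion, detecting a 10% relative lift (10% -> 11%).
print(required_sample_size(0.10, 0.10))  # roughly 15,000 users per variant
```

Even a fairly noticeable lift like this needs thousands of users per variant, which is why a "significant" result after 100 users deserves suspicion rather than celebration.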