Correlation Coefficient Calculator
Compute the Pearson correlation coefficient (r) from any pair of numeric lists. Returns r, r², covariance, the line of best fit, and a step-by-step solution with every intermediate sum.
Enter your data
Separate numbers with commas, spaces, or new lines. The two lists must contain matched pairs - the i-th X value is paired with the i-th Y value.
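The input rule above can be sketched in Python. This is only an illustration of the stated rule, not the calculator's actual code; the names `parse_numbers` and `parse_pairs` are hypothetical, and the minimum of two pairs is an assumption.

```python
import re

def parse_numbers(text):
    """Split on commas, spaces, or new lines and convert each token to float."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [float(t) for t in tokens]

def parse_pairs(x_text, y_text):
    """Parse both lists and check that they form matched pairs."""
    xs, ys = parse_numbers(x_text), parse_numbers(y_text)
    if len(xs) != len(ys):
        raise ValueError(f"X has {len(xs)} values but Y has {len(ys)}")
    if len(xs) < 2:
        # Assumed minimum: a single pair cannot define a correlation.
        raise ValueError("need at least two pairs to compute a correlation")
    return xs, ys

# Mixed separators are accepted; the i-th X is paired with the i-th Y.
xs, ys = parse_pairs("1, 2 3\n4", "52,60,65,70")
print(list(zip(xs, ys)))
```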
Step-by-Step Solution
Every intermediate sum is computed from the 10 pairs you entered.
Step 1: Tabulate the paired values
| i | xᵢ | yᵢ | xᵢ·yᵢ | xᵢ² | yᵢ² |
|---|---|---|---|---|---|
| 1 | 1 | 52 | 52.00 | 1.00 | 2704.00 |
| 2 | 2 | 60 | 120.00 | 4.00 | 3600.00 |
| 3 | 3 | 65 | 195.00 | 9.00 | 4225.00 |
| 4 | 4 | 70 | 280.00 | 16.00 | 4900.00 |
| 5 | 5 | 73 | 365.00 | 25.00 | 5329.00 |
| 6 | 6 | 78 | 468.00 | 36.00 | 6084.00 |
| 7 | 7 | 82 | 574.00 | 49.00 | 6724.00 |
| 8 | 8 | 85 | 680.00 | 64.00 | 7225.00 |
| 9 | 9 | 90 | 810.00 | 81.00 | 8100.00 |
| 10 | 10 | 94 | 940.00 | 100.00 | 8836.00 |
| Σ | 55.00 | 749.00 | 4484.00 | 385.00 | 57727.00 |
Step 2: Compute the running sums
- n = 10
- Σx = 55.0000
- Σy = 749.0000
- Σxy = 4484.0000
- Σx² = 385.0000
- Σy² = 57727.0000
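As a rough check, the running sums can be recomputed from the ten pairs in the table. The function name `running_sums` and the dictionary keys are illustrative choices, not part of the calculator.

```python
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [52, 60, 65, 70, 73, 78, 82, 85, 90, 94]

def running_sums(xs, ys):
    """Return n and the five sums used by the computational formula."""
    return {
        "n": len(xs),
        "sum_x": sum(xs),
        "sum_y": sum(ys),
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),
        "sum_x2": sum(x * x for x in xs),
        "sum_y2": sum(y * y for y in ys),
    }

sums = running_sums(xs, ys)
print(sums)
```

The values match the Σ row of the table: 55, 749, 4484, 385 and 57727.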
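Step 3: Write the formula
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$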
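Step 4: Substitute the sums into the formula
$$r = \frac{10(4484) - (55)(749)}{\sqrt{\left[10(385) - 55^2\right]\left[10(57727) - 749^2\right]}}$$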
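Step 5: Simplify the numerator and denominator
$$r = \frac{44840 - 41195}{\sqrt{(3850 - 3025)(577270 - 561001)}} = \frac{3645}{\sqrt{825 \times 16269}} \approx \frac{3645}{3663.59} \approx 0.9949$$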
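Steps 3 to 5 can be sketched in Python as a function of the running sums. It assumes the computational form that the formula section says the calculator uses; the name `pearson_from_sums` is hypothetical.

```python
import math

def pearson_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson's r from the raw sums (computational form)."""
    numerator = n * sum_xy - sum_x * sum_y
    x_term = n * sum_x2 - sum_x ** 2
    y_term = n * sum_y2 - sum_y ** 2
    if x_term <= 0 or y_term <= 0:
        raise ValueError("r is undefined when X or Y has no variation")
    return numerator / math.sqrt(x_term * y_term)

# Sums from Step 2 of the solution above.
r = pearson_from_sums(10, 55, 749, 4484, 385, 57727)
print(round(r, 4), round(r * r, 4))  # 0.9949 0.9899
```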
Step 6: Interpret the result
The data show a very strong positive linear relationship. The coefficient of determination r² = 0.9899 means that approximately 99.0% of the variation in Y can be explained by a straight-line model on X. The remaining 1.0% comes from other factors, measurement error, or non-linear structure that r cannot capture.
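The "variation explained" reading of r² can be checked by fitting the least-squares line and comparing residual to total variation. This is a minimal sketch; the helper names `least_squares_line` and `explained_fraction` are illustrative.

```python
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [52, 60, 65, 70, 73, 78, 82, 85, 90, 94]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def explained_fraction(xs, ys):
    """1 - SS_residual / SS_total for the fitted line."""
    slope, intercept = least_squares_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_resid / ss_total

slope, intercept = least_squares_line(xs, ys)
r_squared = explained_fraction(xs, ys)
print(round(slope, 4), round(intercept, 4), round(r_squared, 4))
```

For these ten pairs the fitted line has a slope of about 4.42 and an intercept of about 50.6, and the explained fraction agrees with r² = 0.9899.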
The Pearson Correlation Formula
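Pearson's r can be written in several equivalent ways. The calculator above uses the computational form, which works directly from the raw sums, so a hand calculation never has to carry a rounded mean through the deviations. In floating-point arithmetic the same form can lose precision to cancellation when the values are large relative to their spread; in that situation the definitional form is the safer choice.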
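Computational form (used by this calculator)
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$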
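Definitional form (deviations from the mean)
$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}$$
This version makes the geometric meaning clearer: the numerator is the sample covariance times (n − 1), and the denominator is the product of the X and Y sample standard deviations times (n − 1). The (n − 1) terms cancel.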
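Covariance form
$$r = \frac{\operatorname{cov}(X, Y)}{s_x \, s_y}$$
Where $s_x$ and $s_y$ are the sample standard deviations of X and Y. This is the cleanest way to remember why r is dimensionless and bounded by ±1: the units of the covariance cancel against the units of the two standard deviations, and the Cauchy–Schwarz inequality caps |cov(X, Y)| at the product of the standard deviations.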
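A small sketch showing that the three forms agree numerically on the ten pairs from the step-by-step solution, and that rescaling X leaves r unchanged. The function names are illustrative, and the sample (n − 1) convention is assumed for the covariance form.

```python
import math

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [52, 60, 65, 70, 73, 78, 82, 85, 90, 94]

def r_computational(xs, ys):
    """Raw-sums form."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2, sy2 = sum(x * x for x in xs), sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

def r_definitional(xs, ys):
    """Deviations-from-the-mean form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

def r_covariance(xs, ys):
    """Sample covariance divided by the product of sample standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)

results = [f(xs, ys) for f in (r_computational, r_definitional, r_covariance)]
print([round(r, 4) for r in results])

# Changing the units of X (here multiplying by 100) does not change r.
r_rescaled = r_covariance([100 * x for x in xs], ys)
print(round(r_rescaled, 4))
```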
How to Interpret the Correlation Coefficient
r condenses an entire scatter plot into a single number, so the interpretation has two pieces: the sign tells you the direction, and the absolute value tells you the strength.
| \|r\| | Strength | Typical interpretation |
|---|---|---|
| 0.00 – 0.19 | Very weak / none | No meaningful linear pattern in the cloud. |
| 0.20 – 0.39 | Weak | Slight trend, swamped by noise. |
| 0.40 – 0.69 | Moderate | Clear trend visible in the scatter; meaningful in most applied fields. |
| 0.70 – 0.89 | Strong | Points cluster tightly around a line; useful for prediction. |
| 0.90 – 1.00 | Very strong | Almost deterministic - common in physics and engineering, rare in human behaviour data. |
These bands are conventional starting points, not laws. A correlation of 0.30 between a marketing nudge and conversion rate can be commercially huge; a correlation of 0.85 between two instrument readings of the same quantity is disappointing. Always interpret r in the context of your field's typical effect sizes.
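The table can be turned into a small lookup. This sketch assumes that values falling between the printed bands (for example 0.195) belong to the lower band, which the table itself does not specify; `describe_correlation` is a hypothetical name.

```python
def describe_correlation(r):
    """Return (strength, direction) using the conventional bands above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    # Assumed cut-offs: each band ends just below the next band's lower bound.
    if size < 0.20:
        strength = "very weak / none"
    elif size < 0.40:
        strength = "weak"
    elif size < 0.70:
        strength = "moderate"
    elif size < 0.90:
        strength = "strong"
    else:
        strength = "very strong"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

print(describe_correlation(0.9949))  # ('very strong', 'positive')
print(describe_correlation(-0.35))   # ('weak', 'negative')
```

As the paragraph above says, the labels are starting points; the function encodes the convention, not a judgement about any particular field.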
Worked Example: Ice Cream Sales vs Temperature
A small kiosk records the daily high temperature (°F) and ice cream cups sold for seven days. Compute the correlation by hand to verify the calculator.
| Day | Temp X (°F) | Cups Y | X · Y | X² | Y² |
|---|---|---|---|---|---|
| Mon | 72 | 84 | 6048 | 5184 | 7056 |
| Tue | 75 | 92 | 6900 | 5625 | 8464 |
| Wed | 78 | 99 | 7722 | 6084 | 9801 |
| Thu | 81 | 110 | 8910 | 6561 | 12100 |
| Fri | 84 | 121 | 10164 | 7056 | 14641 |
| Sat | 88 | 138 | 12144 | 7744 | 19044 |
| Sun | 91 | 150 | 13650 | 8281 | 22500 |
| Σ | 569 | 794 | 65538 | 46535 | 93606 |
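With n = 7, the substitutions are:
$$
\begin{aligned}
r &= \frac{7(65538) - (569)(794)}{\sqrt{\left[7(46535) - 569^2\right]\left[7(93606) - 794^2\right]}} \\
  &= \frac{458766 - 451786}{\sqrt{(325745 - 323761)(655242 - 630436)}} \\
  &= \frac{6980}{\sqrt{1984 \times 24806}} \approx \frac{6980}{7015.35} \approx 0.995
\end{aligned}
$$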
With r ≈ 0.995, ice cream sales and temperature are almost perfectly linearly related across this week. Paste 72, 75, 78, 81, 84, 88, 91 and 84, 92, 99, 110, 121, 138, 150 into the calculator above to reproduce the result.
Where the Correlation Coefficient is Used
Finance & portfolio analysis
Pairwise correlations between asset returns drive diversification. Whenever the correlations are below 1, a portfolio's volatility is lower than the weighted average of the individual volatilities, and the reduction grows as the correlations fall towards zero or turn negative.
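The diversification effect is easiest to see with two assets. The sketch below uses the standard two-asset variance formula; the weights and volatilities are made-up illustrative numbers, and `two_asset_volatility` is a hypothetical name.

```python
import math

def two_asset_volatility(w1, sigma1, sigma2, rho):
    """Standard deviation of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = (w1 * sigma1) ** 2 + (w2 * sigma2) ** 2 \
        + 2 * w1 * w2 * rho * sigma1 * sigma2
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))

# Illustrative inputs: equal weights, volatilities of 20% and 30%.
weighted_average = 0.5 * 0.20 + 0.5 * 0.30
for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    vol = two_asset_volatility(0.5, 0.20, 0.30, rho)
    print(f"rho={rho:+.1f}  volatility={vol:.4f}  weighted average={weighted_average:.4f}")
```

Only at a correlation of exactly 1 does the portfolio volatility equal the weighted average; every lower correlation gives a smaller figure.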
Medical & epidemiological research
Correlating biomarker levels with disease severity is a first-pass screening tool before designing a controlled study or fitting a regression model.
Education & psychometrics
Test–retest reliability, item-total correlation, and the validity of a new instrument against an established one are all reported as Pearson r or Spearman ρ.
Machine learning & feature selection
Correlation matrices flag redundant features and multicollinearity before fitting a linear model, regularised regression, or PCA.
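A minimal sketch of that screening step, written without external libraries. The feature names and values are invented for illustration, and the 0.9 cut-off is an assumed threshold, not a standard.

```python
import math

def pearson(xs, ys):
    """Pearson's r via deviations from the mean."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

def redundant_pairs(columns, threshold=0.9):
    """List feature pairs whose |r| meets or exceeds the threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged

# Hypothetical features: two are the same measurement in different units.
features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_size": [40, 38, 43, 41, 44],
}

pairs = redundant_pairs(features)
for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:.3f}")
```

Here only the two height columns are flagged, which suggests dropping one of them before fitting a linear model.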
Quality control & manufacturing
Correlating an upstream process variable (oven temperature, mixing time) with a downstream defect rate helps locate the source of variation.
A/B testing diagnostics
When two metrics in an experiment move together, knowing the correlation between them prevents double-counting evidence and helps choose a single primary metric.
Common Mistakes to Avoid
- Trusting r without plotting the data. Anscombe's quartet is the canonical demonstration: four datasets with identical r ≈ 0.816 but wildly different shapes - one roughly linear with noise, one curved, one perfectly linear except for a single off-line point, and one where a single high-leverage point creates the entire correlation. Always view the scatter plot before quoting r.
- Confusing correlation with causation. A high r can come from a true causal effect, but also from a shared cause (lurking variable), reverse causation, sampling bias, or chance. r is a description, not an explanation.
- Using Pearson on non-linear data. Pearson's r only measures linear association. A perfect parabolic relationship like y = x², sampled at x values symmetric about zero, gives r = 0. Switch to Spearman when the scatter is curved but still monotonic, and fit a non-linear model when it is not.
- Letting an outlier drag r. One extreme point in a small dataset can move r from 0.2 to 0.8. Robust alternatives (Spearman, Kendall's τ, percentile bootstrap) help, but the right first step is to investigate whether the outlier is a data entry error or a real, informative observation.
- Comparing r across datasets with different ranges. Restricting the range of X (truncated sampling) systematically attenuates r. Two studies of the same underlying relationship can report very different correlations purely because of the sampling design.
- Quoting r without a sample size. r = 0.6 from n = 8 is not statistically significant at the 5% level (two-sided p ≈ 0.12); r = 0.2 from n = 5,000 is highly statistically significant. Always report r alongside n (and ideally a confidence interval).
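Three of the mistakes above can be checked numerically. The datasets are small made-up examples, and the significance check uses the usual t statistic for a correlation with n − 2 degrees of freedom; the helper names are illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson's r via deviations from the mean."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

def t_statistic(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# 1. Non-linear data: a perfect parabola, symmetric about zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
r_parabola = pearson(xs, [x * x for x in xs])
print("parabola:", r_parabola)

# 2. Outlier: one extreme point added to a weakly correlated sample.
base_x, base_y = [1, 2, 3, 4, 5, 6], [3, 1, 4, 2, 5, 3]
r_without = pearson(base_x, base_y)
r_with = pearson(base_x + [20], base_y + [20])
print("without outlier:", round(r_without, 2), "with outlier:", round(r_with, 2))

# 3. Sample size: compare each t statistic with its two-sided 5% critical value
#    (about 2.45 for 6 degrees of freedom, about 1.96 for very large samples).
t_small = t_statistic(0.6, 8)
t_large = t_statistic(0.2, 5000)
print("t for r=0.6, n=8:", round(t_small, 2))
print("t for r=0.2, n=5000:", round(t_large, 2))
```

The parabola yields r = 0, the single added point lifts r from roughly 0.38 to roughly 0.97, and the small-sample t statistic falls short of its critical value while the large-sample one far exceeds it.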
Pearson r vs Spearman ρ vs Kendall τ
| Coefficient | Measures | Use when |
|---|---|---|
| Pearson r | Linear association between two continuous variables. | Data is roughly normal, relationship looks linear in the scatter, no severe outliers. |
| Spearman ρ | Monotonic association on ranks (linear or curved as long as direction is consistent). | Data is ordinal, contains outliers, or the relationship is monotonic but visibly curved. |
| Kendall τ | Probability that pairs are concordant minus probability they are discordant. | Small samples, many tied ranks, or when you need a coefficient with a clean probability interpretation. |
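The difference between the three coefficients shows up on data that are monotonic but curved. The sketch below implements Spearman as Pearson on average ranks and Kendall as the simple tau-a variant (no tie correction); both choices are assumptions made for brevity, and statistical packages often default to tie-corrected versions.

```python
import math

def pearson(xs, ys):
    """Pearson's r via deviations from the mean."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

def average_ranks(values):
    """Rank from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def kendall_tau_a(xs, ys):
    """Tau-a: (concordant - discordant) pairs divided by all pairs."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Monotonic but strongly curved: y = x cubed.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]
print("Pearson :", round(pearson(xs, ys), 3))
print("Spearman:", round(spearman(xs, ys), 3))
print("Kendall :", round(kendall_tau_a(xs, ys), 3))
```

Spearman and Kendall both report a perfect monotonic relationship, while Pearson falls below 1 because the points do not lie on a straight line.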
Frequently Asked Questions
What is a correlation coefficient?
How do you calculate the Pearson correlation coefficient?
What is a good correlation coefficient value?
What is the difference between r and r²?
Does a high correlation mean causation?
What is the difference between Pearson and Spearman correlation?
How many data points do I need for a reliable correlation?
What does a negative correlation coefficient mean?
Can the correlation coefficient be greater than 1?
What is the difference between correlation and covariance?
Should I use sample or population formulas?
References and Further Reading
- Pearson, K. (1895). Notes on regression and inheritance in the case of two parents. Background on the original formulation - Pearson correlation coefficient (Wikipedia).
- Anscombe, F. J. (1973). Graphs in statistical analysis. The classic four-dataset example showing why r alone can mislead - Anscombe's quartet.
- NIST/SEMATECH (2012). e-Handbook of Statistical Methods, section on correlation - NIST handbook: correlation.
- Spearman, C. (1904) for the rank-based alternative when data is ordinal or non-linear - Spearman's rank correlation.
- For the distinction between correlation and causation, see our deeper write-up: Correlation vs causation - a practical guide.