Correlation Coefficient Calculator

Compute the Pearson correlation coefficient (r) from any pair of numeric lists. Returns r, r², covariance, the line of best fit, and a step-by-step solution with every intermediate sum.

Enter your data

10 numeric values parsed
10 numeric values parsed

Separate numbers with commas, spaces, or new lines. The two lists must contain matched pairs - the i-th X value is paired with the i-th Y value.
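The input rule above can be sketched in a few lines of Python. This is only an illustration of the stated behaviour, not the calculator's actual code; the function names `parse_numbers` and `parse_pairs` are hypothetical.

```python
import re


def parse_numbers(text):
    """Split on commas, spaces, or new lines and convert each token to float."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [float(t) for t in tokens]


def parse_pairs(x_text, y_text):
    """Parse both lists and pair the i-th X value with the i-th Y value."""
    xs, ys = parse_numbers(x_text), parse_numbers(y_text)
    if len(xs) != len(ys):
        raise ValueError(f"X has {len(xs)} values but Y has {len(ys)}")
    return list(zip(xs, ys))


# Mixed separators are accepted as long as both lists have the same length.
pairs = parse_pairs("1, 2 3\n4", "52 60,65\n70")
print(pairs)
```

Rejecting mismatched lengths up front is a design choice in this sketch: silently truncating the longer list would pair the wrong values without any warning.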

Pearson correlation coefficient
r = 0.9949
Very strong positive linear relationship
r² = 0.9899
n = 10
Covariance = 40.500
Interactive scatter plot with linear fit

Step-by-Step Solution

Every intermediate sum is computed from the 10 pairs you entered. Expand any step to see the substitution.

Step 1
Tabulate the paired values
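| i | xᵢ | yᵢ | xᵢ·yᵢ | xᵢ² | yᵢ² |
|---|---|---|---|---|---|
| 1 | 1 | 52 | 52.00 | 1.00 | 2704.00 |
| 2 | 2 | 60 | 120.00 | 4.00 | 3600.00 |
| 3 | 3 | 65 | 195.00 | 9.00 | 4225.00 |
| 4 | 4 | 70 | 280.00 | 16.00 | 4900.00 |
| 5 | 5 | 73 | 365.00 | 25.00 | 5329.00 |
| 6 | 6 | 78 | 468.00 | 36.00 | 6084.00 |
| 7 | 7 | 82 | 574.00 | 49.00 | 6724.00 |
| 8 | 8 | 85 | 680.00 | 64.00 | 7225.00 |
| 9 | 9 | 90 | 810.00 | 81.00 | 8100.00 |
| 10 | 10 | 94 | 940.00 | 100.00 | 8836.00 |
| Σ | 55.00 | 749.00 | 4484.00 | 385.00 | 57727.00 |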
Step 2
Compute the running sums
  • n = 10
  • Σx = 55.0000
  • Σy = 749.0000
  • Σxy = 4484.0000
  • Σx² = 385.0000
  • Σy² = 57727.0000
Step 3
Write the formula
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}}$$
Step 4
Substitute the sums into the formula
$$r = \frac{10(4484.00) - (55.00)(749.00)}{\sqrt{[10(385.00) - (55.00)^2]\,[10(57727.00) - (749.00)^2]}}$$
Step 5
Simplify the numerator and denominator
$$r = \frac{3645.0000}{\sqrt{825.0000 \times 16269.0000}} = \frac{3645.0000}{3663.5945}$$
$$r = 0.994925$$
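Steps 2 to 5 can be reproduced with a short Python sketch of the computational form. The function name `pearson_r_computational` is hypothetical and this is not the calculator's own source; the data are the 10 pairs tabulated in Step 1.

```python
from math import sqrt


def pearson_r_computational(xs, ys):
    """Pearson r from the five running sums (computational form)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    x_term = n * sum_x2 - sum_x ** 2
    y_term = n * sum_y2 - sum_y ** 2
    if x_term <= 0 or y_term <= 0:
        raise ValueError("r is undefined when either list is constant")
    return numerator / sqrt(x_term * y_term)


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [52, 60, 65, 70, 73, 78, 82, 85, 90, 94]
r = pearson_r_computational(xs, ys)
print(round(r, 6))  # 0.994925
```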
Step 6
Interpret the result

The data show a very strong positive linear relationship. The coefficient of determination r² = 0.9899 means that approximately 99.0% of the variation in Y can be explained by a straight-line model on X. The remaining 1.0% comes from other factors, measurement error, or non-linear structure that r cannot capture.
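To see where the "explained variation" figure comes from, the sketch below fits the least-squares line to the same 10 pairs and computes r² as one minus the ratio of residual to total sum of squares. The helper names `fit_line` and `r_squared` are hypothetical, and this is an illustration under those assumptions rather than the calculator's implementation.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys):
    """Share of the variation in Y accounted for by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_resid / ss_total


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [52, 60, 65, 70, 73, 78, 82, 85, 90, 94]
slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys)
print(round(slope, 4), round(intercept, 4), round(r2, 4))
```

For a straight-line fit with one predictor, this quantity equals the square of Pearson's r, which is why the two numbers agree.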

The Pearson Correlation Formula

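Pearson's r can be written in several equivalent ways. The calculator above uses the computational form, which works directly from the raw sums and so avoids rounding the means and deviations when you compute by hand. In floating-point code the same form can lose precision through cancellation when the values are large relative to their spread, in which case the definitional form is the safer choice.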

Computational form (used by this calculator)

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}}$$

Definitional form (deviations from the mean)

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$

This version makes the geometric meaning clearer: the numerator is the sample covariance times (n − 1), and the denominator is the product of the X and Y sample standard deviations times (n − 1). The (n − 1) terms cancel.
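A minimal Python sketch of the definitional form, using the 10 pairs from the step-by-step solution. The function name `pearson_r_definitional` is hypothetical. It also prints the sum of deviation products, which should equal the reported covariance (40.5) times n − 1 = 9.

```python
from math import sqrt


def pearson_r_definitional(xs, ys):
    """Pearson r from deviations about the means (definitional form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sxy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sxx = sum(dx * dx for dx in dev_x)
    syy = sum(dy * dy for dy in dev_y)
    return sxy / sqrt(sxx * syy), sxy


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [52, 60, 65, 70, 73, 78, 82, 85, 90, 94]
r, sxy = pearson_r_definitional(xs, ys)
print(round(r, 6), round(sxy, 4))  # sxy is covariance * (n - 1)
```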

Covariance form

$$r = \frac{\operatorname{Cov}(X, Y)}{s_X \, s_Y}$$

Here $s_X$ and $s_Y$ are the sample standard deviations of X and Y. This is the cleanest way to remember why r is dimensionless and bounded by ±1.
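The dimensionless property can be checked numerically. In the sketch below (hypothetical helper names, same 10 pairs as above), rescaling X by a factor of 100, as a change of units would, multiplies the covariance by 100 but leaves r unchanged up to floating-point rounding.

```python
from math import sqrt


def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_std(values):
    """Sample standard deviation: square root of a list's covariance with itself."""
    return sqrt(sample_covariance(values, values))


def pearson_r(xs, ys):
    """Covariance form: Cov(X, Y) / (s_X * s_Y)."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [52, 60, 65, 70, 73, 78, 82, 85, 90, 94]
xs_scaled = [100 * x for x in xs]  # same data in a unit 100 times smaller

cov, cov_scaled = sample_covariance(xs, ys), sample_covariance(xs_scaled, ys)
r, r_scaled = pearson_r(xs, ys), pearson_r(xs_scaled, ys)
print(cov, cov_scaled)  # covariance depends on the units
print(r, r_scaled)      # r does not
```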

How to Interpret the Correlation Coefficient

r condenses an entire scatter plot into a single number, so the interpretation has two pieces: the sign tells you the direction, and the absolute value tells you the strength.

| \|r\| | Strength | Typical interpretation |
|---|---|---|
| 0.00 – 0.19 | Very weak / none | No meaningful linear pattern in the cloud. |
| 0.20 – 0.39 | Weak | Slight trend, swamped by noise. |
| 0.40 – 0.69 | Moderate | Clear trend visible in the scatter; meaningful in most applied fields. |
| 0.70 – 0.89 | Strong | Points cluster tightly around a line; useful for prediction. |
| 0.90 – 1.00 | Very strong | Almost deterministic - common in physics and engineering, rare in human behaviour data. |

These bands are conventional starting points, not laws. A correlation of 0.30 between a marketing nudge and conversion rate can be commercially huge; a correlation of 0.85 between two instrument readings of the same quantity is disappointing. Always interpret r in the context of your field's typical effect sizes.
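As a sketch, the bands in the table can be turned into a small lookup. The function name `describe_r` is hypothetical, and treating each band as a half-open interval (so 0.195 counts as "very weak / none") is an assumption, since the table leaves values between the printed endpoints unassigned.

```python
def describe_r(r):
    """Return (strength, direction) for a correlation using the conventional bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)
    if magnitude < 0.20:
        strength = "very weak / none"
    elif magnitude < 0.40:
        strength = "weak"
    elif magnitude < 0.70:
        strength = "moderate"
    elif magnitude < 0.90:
        strength = "strong"
    else:
        strength = "very strong"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction


print(describe_r(0.9949))  # ('very strong', 'positive')
print(describe_r(-0.30))   # ('weak', 'negative')
```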

Worked Example: Ice Cream Sales vs Temperature

A small kiosk records the daily high temperature (°F) and ice cream cups sold for seven days. Compute the correlation by hand to verify the calculator.

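| Day | Temp X (°F) | Cups Y | X · Y | X² | Y² |
|---|---|---|---|---|---|
| Mon | 72 | 84 | 6048 | 5184 | 7056 |
| Tue | 75 | 92 | 6900 | 5625 | 8464 |
| Wed | 78 | 99 | 7722 | 6084 | 9801 |
| Thu | 81 | 110 | 8910 | 6561 | 12100 |
| Fri | 84 | 121 | 10164 | 7056 | 14641 |
| Sat | 88 | 138 | 12144 | 7744 | 19044 |
| Sun | 91 | 150 | 13650 | 8281 | 22500 |
| Σ | 569 | 794 | 65538 | 46535 | 93606 |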

With n = 7, the substitutions are:

$$r = \frac{7(65538) - (569)(794)}{\sqrt{[7(46535) - 569^2]\,[7(93606) - 794^2]}}$$
$$r = \frac{6980}{\sqrt{1984 \times 24806}} = 0.9950$$

With r ≈ 0.995, ice cream sales and temperature are almost perfectly linearly related across this week. Paste 72, 75, 78, 81, 84, 88, 91 and 84, 92, 99, 110, 121, 138, 150 into the calculator above to reproduce the result.
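The hand calculation can also be checked in Python. Because the inputs are whole numbers, the five sums and the three intermediate terms come out as exact integers, which makes them easy to compare against the table; the variable names are illustrative only.

```python
from math import sqrt

temps = [72, 75, 78, 81, 84, 88, 91]      # X: daily high, °F
cups = [84, 92, 99, 110, 121, 138, 150]   # Y: cups sold

n = len(temps)
sum_x, sum_y = sum(temps), sum(cups)
sum_xy = sum(x * y for x, y in zip(temps, cups))
sum_x2 = sum(x * x for x in temps)
sum_y2 = sum(y * y for y in cups)

numerator = n * sum_xy - sum_x * sum_y   # 6980
x_term = n * sum_x2 - sum_x ** 2         # 1984
y_term = n * sum_y2 - sum_y ** 2         # 24806
r = numerator / sqrt(x_term * y_term)

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(numerator, x_term, y_term, round(r, 4))
```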

Where the Correlation Coefficient is Used

Finance & portfolio analysis

Pairwise correlations between asset returns drive diversification. A portfolio of assets with low or negative correlations has lower variance than the weighted average of the individual variances.
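For two assets the effect can be written out directly: the portfolio variance is w₁²σ₁² + w₂²σ₂² + 2·w₁·w₂·ρ·σ₁·σ₂. The sketch below evaluates it for several correlations; the weights and volatilities are made-up illustrative numbers, and `portfolio_variance` is a hypothetical helper name.

```python
def portfolio_variance(w1, sigma1, sigma2, rho):
    """Variance of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1 - w1
    return (w1 * sigma1) ** 2 + (w2 * sigma2) ** 2 + 2 * w1 * w2 * rho * sigma1 * sigma2


# Hypothetical inputs: equal weights, volatilities of 20% and 30%.
w1, sigma1, sigma2 = 0.5, 0.20, 0.30
weighted_avg_variance = w1 * sigma1 ** 2 + (1 - w1) * sigma2 ** 2

for rho in (1.0, 0.5, 0.0, -0.5):
    variance = portfolio_variance(w1, sigma1, sigma2, rho)
    print(f"rho={rho:+.1f}  portfolio variance={variance:.4f}  "
          f"weighted average of variances={weighted_avg_variance:.4f}")
```

The lower the correlation, the further the portfolio variance falls below the weighted average of the individual variances.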

Medical & epidemiological research

Correlating biomarker levels with disease severity is a first-pass screening tool before designing a controlled study or fitting a regression model.

Education & psychometrics

Test–retest reliability, item-total correlation, and the validity of a new instrument against an established one are all reported as Pearson or Spearman r.

Machine learning & feature selection

Correlation matrices flag redundant features and multicollinearity before fitting a linear model, regularised regression, or PCA.
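A minimal sketch of that screening step in plain Python: compute r for every pair of features and flag pairs whose absolute correlation exceeds a threshold. The feature names, the data, the 0.9 threshold, and the function `redundant_pairs` are all hypothetical choices for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r via deviations from the mean."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def redundant_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for every feature pair with |r| >= threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical data: the same height in two units, plus an unrelated count.
features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "weekly_visits": [3, 1, 4, 1, 5],
}
flagged = redundant_pairs(features)
print(flagged)
```

Here only the two height columns are flagged, since they carry the same information in different units.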

Quality control & manufacturing

Correlating an upstream process variable (oven temperature, mixing time) with a downstream defect rate helps locate the source of variation.

A/B testing diagnostics

When two metrics in an experiment move together, knowing the correlation between them prevents double-counting evidence and helps choose a single primary metric.

Common Mistakes to Avoid

  1. Trusting r without plotting the data. Anscombe's quartet is the canonical demonstration: four datasets with identical r ≈ 0.816 but wildly different shapes. One is roughly linear with scatter, one is curved, one is perfectly linear apart from a single off-line point, and one owes its entire correlation to a single high-leverage outlier. Always view the scatter plot before quoting r.
  2. Confusing correlation with causation. A high r can come from a true causal effect, but also from a shared cause (lurking variable), reverse causation, sampling bias, or chance. r is a description, not an explanation.
  3. Using Pearson on non-linear data. Pearson's r only measures linear association. A perfect parabolic relationship such as y = x², with X values sampled symmetrically around zero, gives r = 0. Switch to Spearman when the scatter is curved but monotonic, or fit a non-linear model otherwise.
  4. Letting an outlier drag r. One extreme point in a small dataset can move r from 0.2 to 0.8. Robust alternatives (Spearman, Kendall's τ, percentile bootstrap) help, but the right first step is to investigate whether the outlier is a data entry error or a real, informative observation.
  5. Comparing r across datasets with different ranges. Restricting the range of X (truncated sampling) systematically attenuates r. Two studies of the same underlying relationship can report very different correlations purely because of the sampling design.
  6. Quoting r without a sample size. r = 0.6 from n = 8 is not statistically significant at the 5% level (two-sided p > 0.1); r = 0.2 from n = 5,000 is highly statistically significant. Always report r alongside n (and ideally a confidence interval).
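The last point can be made concrete with an approximate confidence interval. The sketch below uses the Fisher z-transformation, a standard large-sample approximation that assumes roughly bivariate-normal data; it is one common way to get an interval, not necessarily what this calculator reports. The function name `fisher_ci` is hypothetical.

```python
from math import atanh, sqrt, tanh


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for r via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("the approximation needs n > 3")
    z = atanh(r)                      # transform r to an approximately normal scale
    half_width = z_crit / sqrt(n - 3) # standard error of z is 1 / sqrt(n - 3)
    return tanh(z - half_width), tanh(z + half_width)


small_lo, small_hi = fisher_ci(0.6, 8)
large_lo, large_hi = fisher_ci(0.2, 5000)
print(f"r=0.6, n=8:    ({small_lo:.3f}, {small_hi:.3f})")
print(f"r=0.2, n=5000: ({large_lo:.3f}, {large_hi:.3f})")
```

The interval for the small sample spans zero, while the interval for the large sample is narrow and clearly positive, which is why r should always be read together with n.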

Pearson r vs Spearman ρ vs Kendall τ

| Coefficient | Measures | Use when |
|---|---|---|
| Pearson r | Linear association between two continuous variables. | Data is roughly normal, relationship looks linear in the scatter, no severe outliers. |
| Spearman ρ | Monotonic association on ranks (linear or curved as long as direction is consistent). | Data is ordinal, contains outliers, or the relationship is monotonic but visibly curved. |
| Kendall τ | Probability that pairs are concordant minus probability they are discordant. | Small samples, many tied ranks, or when you need a coefficient with a clean probability interpretation. |
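The difference between the first two rows shows up clearly on data that is monotonic but curved. In this sketch, Spearman's ρ is computed as Pearson's r on average ranks (ties share the mean of the ranks they span); the data y = eˣ and all function names are illustrative assumptions, and Kendall's τ is left out for brevity.

```python
from math import exp, sqrt


def pearson_r(xs, ys):
    """Pearson r via deviations from the mean."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank from 1 upward, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rho: Pearson r computed on the ranks of each list."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = list(range(1, 11))
ys = [exp(x) for x in xs]  # strictly increasing, strongly curved
r, rho = pearson_r(xs, ys), spearman_rho(xs, ys)
print(round(r, 4), round(rho, 4))
```

Because the ranks of X and Y line up exactly, ρ is 1 even though the curvature pulls Pearson's r well below 1.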

Frequently Asked Questions

What is a correlation coefficient?
The Pearson correlation coefficient (r) is a number between −1 and +1 that measures the strength and direction of a linear relationship between two variables. A value of +1 means a perfect positive linear relationship, −1 means a perfect negative linear relationship, and 0 means no linear relationship.
How do you calculate the Pearson correlation coefficient?
Use the formula r = [n·Σxy − Σx·Σy] / √{[n·Σx² − (Σx)²]·[n·Σy² − (Σy)²]}. The calculator on this page computes every intermediate sum (Σx, Σy, Σxy, Σx², Σy²) and substitutes them into the formula so you can verify each step.
What is a good correlation coefficient value?
There is no universal threshold; it depends on the field. In physics or engineering you often expect |r| above 0.9. In psychology or social sciences, |r| above 0.5 is often considered strong. As rough textbook bands: 0.0–0.2 is very weak, 0.2–0.4 weak, 0.4–0.7 moderate, 0.7–0.9 strong, 0.9–1.0 very strong.
What is the difference between r and r²?
r is the correlation coefficient and tells you direction and strength of a linear relationship. r² (coefficient of determination) is r multiplied by itself, expressed between 0 and 1, and tells you the proportion of variance in Y that can be explained by X under a linear model. For example, r = 0.8 gives r² = 0.64, meaning 64% of the variance is explained.
Does a high correlation mean causation?
No. Correlation only measures co-movement of two variables. A high r can arise from a true causal link, reverse causation, a third confounding variable, selection bias, or pure coincidence in small samples. Establishing causation requires experimental design or causal inference techniques, not r alone.
What is the difference between Pearson and Spearman correlation?
Pearson (r) measures linear association between two continuous variables and assumes the relationship is roughly linear. Spearman (ρ) is calculated on ranks instead of raw values, so it captures any monotonic relationship and is robust to outliers and non-linear but monotonic patterns. Use Spearman when your data is ordinal, contains outliers, or the scatter plot shows a curved but consistently increasing or decreasing relationship.
How many data points do I need for a reliable correlation?
Correlations from very small samples are highly unstable. With n < 10, a single point can swing r by 0.3 or more. For a reasonably stable estimate, aim for at least 30 paired observations. For publication-grade work, the required n depends on the effect size you expect - power calculations are the right tool.
What does a negative correlation coefficient mean?
A negative r means that as one variable increases, the other tends to decrease. Examples include hours of exercise vs resting heart rate, or temperature vs heating bill. The closer r is to −1, the more tightly the points cluster around a downward-sloping line.
Can the correlation coefficient be greater than 1?
No. By construction, the Pearson r is bounded between −1 and +1. If your calculation produces a value outside that range, you have an arithmetic error, such as a dropped square root in the denominator, mixed n and n − 1 divisors between the covariance and the standard deviations, or mismatched list lengths between X and Y.
What is the difference between correlation and covariance?
Covariance also measures whether two variables move together, but its value depends on the units of X and Y, so it is not directly comparable across datasets. Correlation is covariance scaled by the product of the two standard deviations, which removes the unit dependence and forces the result into [−1, +1].
Should I use sample or population formulas?
The Pearson r formula does not change between sample and population because both numerator and denominator scale the same way. The distinction matters for variance and standard deviation (n vs n − 1), but for r the answer is identical either way.
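The cancellation can be written out explicitly. In the sketch below, k is a placeholder for the divisor (k = n − 1 for sample statistics, k = n for population statistics); the symbol k is introduced here only for illustration.

```latex
r
= \frac{\frac{1}{k}\sum (x_i - \bar{x})(y_i - \bar{y})}
       {\sqrt{\frac{1}{k}\sum (x_i - \bar{x})^2}\,\sqrt{\frac{1}{k}\sum (y_i - \bar{y})^2}}
= \frac{\frac{1}{k}\sum (x_i - \bar{x})(y_i - \bar{y})}
       {\frac{1}{k}\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}
= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}
       {\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}
```

The factor 1/k appears once in the numerator and once in the denominator, so it drops out whichever divisor is used, provided the same divisor is used for the covariance and both standard deviations.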
