Correlation Coefficient Calculator

Compute the Pearson correlation coefficient (r) from any pair of numeric lists. Returns r, r², covariance, the line of best fit, and a step-by-step solution with every intermediate sum.

Enter your data

Separate numbers with commas, spaces, or new lines. The two lists must contain matched pairs - the i-th X value is paired with the i-th Y value.

Pearson correlation coefficient

r = 0.9949

Very strong positive linear relationship

r²

0.9899

Covariance

40.500

Interactive scatter plot with linear fit

Step-by-Step Solution

Every intermediate sum is computed from the 10 pairs you entered. Expand any step to see the substitution.

Step 1

Tabulate the paired values

i	xᵢ	yᵢ	xᵢ·yᵢ	xᵢ²	yᵢ²
1	1	52	52.00	1.00	2704.00
2	2	60	120.00	4.00	3600.00
3	3	65	195.00	9.00	4225.00
4	4	70	280.00	16.00	4900.00
5	5	73	365.00	25.00	5329.00
6	6	78	468.00	36.00	6084.00
7	7	82	574.00	49.00	6724.00
8	8	85	680.00	64.00	7225.00
9	9	90	810.00	81.00	8100.00
10	10	94	940.00	100.00	8836.00
Σ	55.00	749.00	4484.00	385.00	57727.00

Step 2

Compute the running sums

n = 10
Σx = 55.0000
Σy = 749.0000
Σxy = 4484.0000
Σx² = 385.0000
Σy² = 57727.0000

Step 3

Write the formula

r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}}

Step 4

Substitute the sums into the formula

r = \frac{10(4484.00) - (55.00)(749.00)}{\sqrt{[10(385.00) - (55.00)^2]\,[10(57727.00) - (749.00)^2]}}

Step 5

Simplify the numerator and denominator

r = \frac{3645.0000}{\sqrt{825.0000 \times 16269.0000}} = \frac{3645.0000}{3663.5945}

r = 0.994925

Step 6

Interpret the result

The data show a very strong positive linear relationship. The coefficient of determination $r^2$ = 0.9899 means that approximately 99.0% of the variation in Y can be explained by a straight-line model on X. The remaining 1.0% comes from other factors, measurement error, or non-linear structure that r cannot capture.
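
The six steps above can be sketched in a few lines of Python. This is a minimal illustration using the ten pairs from the table, not the calculator's own source code; the function name `pearson_from_sums` is made up for the example.

```python
import math

# The ten (x, y) pairs from the step-by-step table.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [52, 60, 65, 70, 73, 78, 82, 85, 90, 94]


def pearson_from_sums(xs, ys):
    """Pearson r via the computational (raw sums) formula."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must contain the same number of values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


r = pearson_from_sums(xs, ys)
print(f"r = {r:.6f}, r^2 = {r * r:.4f}")  # r = 0.994925, r^2 = 0.9899
```

The length check and the zero-denominator check cover the two input problems most likely to produce a meaningless result: unmatched lists and a constant variable.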

The Pearson Correlation Formula

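Pearson's r can be written several equivalent ways. The calculator above uses the computational form, which works directly from the raw sums and so avoids computing (and rounding) the means and deviations, a real advantage in hand calculation. In floating-point arithmetic the same form can lose precision through cancellation when the values are large relative to their spread, so software implementations often prefer the definitional form.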

Computational form (used by this calculator)

r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}}

Definitional form (deviations from the mean)

r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}

This version makes the geometric meaning clearer: the numerator is the sample covariance times (n − 1), and the denominator is the product of the X and Y sample standard deviations times (n − 1). The (n − 1) terms cancel.

Covariance form

r = \frac{\operatorname{Cov}(X, Y)}{s_X \, s_Y}

Where $s_X$ and $s_Y$ are the sample standard deviations of X and Y. This is the cleanest way to remember why r is dimensionless and bounded by ±1.
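
To see that the three forms agree, the sketch below evaluates each one on the ten pairs from the step-by-step solution. It is an illustrative check written for this page (variable names are arbitrary) and uses the sample (n − 1) divisor for the covariance form.

```python
import math

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [52, 60, 65, 70, 73, 78, 82, 85, 90, 94]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squared deviations and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Definitional form.
r_definitional = sxy / math.sqrt(sxx * syy)

# Covariance form, with the sample (n - 1) divisor used throughout.
cov_xy = sxy / (n - 1)
s_x = math.sqrt(sxx / (n - 1))
s_y = math.sqrt(syy / (n - 1))
r_covariance = cov_xy / (s_x * s_y)

# Computational form from raw sums.
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
r_computational = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(f"covariance     = {cov_xy:.3f}")
print(f"definitional   = {r_definitional:.6f}")
print(f"covariance form = {r_covariance:.6f}")
print(f"computational  = {r_computational:.6f}")
```

All three values of r match to floating-point precision, and the covariance comes out at 40.5, the figure reported in the results panel.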

How to Interpret the Correlation Coefficient

r condenses an entire scatter plot into a single number, so the interpretation has two pieces: the sign tells you the direction, and the absolute value tells you the strength.

|r|	Strength	Typical interpretation
0.00 – 0.19	Very weak / none	No meaningful linear pattern in the cloud.
0.20 – 0.39	Weak	Slight trend, swamped by noise.
0.40 – 0.69	Moderate	Clear trend visible in the scatter; meaningful in most applied fields.
0.70 – 0.89	Strong	Points cluster tightly around a line; useful for prediction.
0.90 – 1.00	Very strong	Almost deterministic - common in physics and engineering, rare in human behaviour data.

These bands are conventional starting points, not laws. A correlation of 0.30 between a marketing nudge and conversion rate can be commercially huge; a correlation of 0.85 between two instrument readings of the same quantity is disappointing. Always interpret r in the context of your field's typical effect sizes.
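
As a rough illustration, the bands in the table can be encoded as a small lookup. The function name `describe_correlation` and the handling of values that fall between printed bands (for example 0.195) are assumptions of this sketch; the cut-offs themselves are the conventional ones from the table, not fixed rules.

```python
def describe_correlation(r):
    """Map r to the conventional strength bands and a direction."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    magnitude = abs(r)
    if magnitude < 0.20:
        strength = "very weak / none"
    elif magnitude < 0.40:
        strength = "weak"
    elif magnitude < 0.70:
        strength = "moderate"
    elif magnitude < 0.90:
        strength = "strong"
    else:
        strength = "very strong"

    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "no direction"
    return strength, direction


print(describe_correlation(0.9949))  # ('very strong', 'positive')
print(describe_correlation(-0.35))   # ('weak', 'negative')
```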

Worked Example: Ice Cream Sales vs Temperature

A small kiosk records the daily high temperature (°F) and ice cream cups sold for seven days. Compute the correlation by hand to verify the calculator.

Day	Temp X (°F)	Cups Y	X · Y	X²	Y²
Mon	72	84	6048	5184	7056
Tue	75	92	6900	5625	8464
Wed	78	99	7722	6084	9801
Thu	81	110	8910	6561	12100
Fri	84	121	10164	7056	14641
Sat	88	138	12144	7744	19044
Sun	91	150	13650	8281	22500
Σ	569	794	65538	46535	93606

With n = 7, the substitutions are:

r = \frac{7(65538) - (569)(794)}{\sqrt{[7(46535) - 569^2]\,[7(93606) - 794^2]}}

r = \frac{6980}{\sqrt{1984 \times 24806}} = 0.9950

With r ≈ 0.995, ice cream sales and temperature are almost perfectly linearly related across this week. Paste 72, 75, 78, 81, 84, 88, 91 and 84, 92, 99, 110, 121, 138, 150 into the calculator above to reproduce the result.
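
The hand calculation can also be checked independently of the calculator. The short script below is a sketch that recomputes the three bracketed terms with integer arithmetic and then forms r.

```python
import math

temps = [72, 75, 78, 81, 84, 88, 91]       # daily high, degrees F
cups = [84, 92, 99, 110, 121, 138, 150]    # ice cream cups sold
n = len(temps)

sum_x, sum_y = sum(temps), sum(cups)
sum_xy = sum(x * y for x, y in zip(temps, cups))
sum_x2 = sum(x * x for x in temps)
sum_y2 = sum(y * y for y in cups)

numerator = n * sum_xy - sum_x * sum_y   # 6980
x_term = n * sum_x2 - sum_x ** 2         # 1984
y_term = n * sum_y2 - sum_y ** 2         # 24806

r = numerator / math.sqrt(x_term * y_term)
print(numerator, x_term, y_term)
print(f"r = {r:.4f}")  # r = 0.9950
```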

Where the Correlation Coefficient is Used

Finance & portfolio analysis

Pairwise correlations between asset returns drive diversification. For a long-only portfolio, volatility is below the weighted average of the individual volatilities whenever correlations are below +1, and the lower the correlations, the larger the reduction.

Medical & epidemiological research

Correlating biomarker levels with disease severity is a first-pass screening tool before designing a controlled study or fitting a regression model.

Education & psychometrics

Test–retest reliability, item-total correlation, and the validity of a new instrument against an established one are all reported as Pearson r or Spearman ρ.

Machine learning & feature selection

Correlation matrices flag redundant features and multicollinearity before fitting a linear model, regularised regression, or PCA.

Quality control & manufacturing

Correlating an upstream process variable (oven temperature, mixing time) with a downstream defect rate helps locate the source of variation.

A/B testing diagnostics

When two metrics in an experiment move together, knowing the correlation between them prevents double-counting evidence and helps choose a single primary metric.

Common Mistakes to Avoid

Trusting r without plotting the data. Anscombe's quartet is the canonical demonstration: four datasets with nearly identical r ≈ 0.816 but wildly different shapes - one roughly linear with ordinary scatter, one curved, one perfectly linear except for a single off-line point, and one where a single high-leverage point creates the entire correlation. Always view the scatter plot before quoting r.
Confusing correlation with causation. A high r can come from a true causal effect, but also from a shared cause (lurking variable), reverse causation, sampling bias, or chance. r is a description, not an explanation.
Using Pearson on non-linear data. Pearson's r only measures linear association. A perfect parabolic relationship like y = x², with x values spread symmetrically about zero, gives r = 0. Switch to Spearman when the scatter is curved but monotonic, and fit a non-linear model when it is not monotonic.
Letting an outlier drag r. One extreme point in a small dataset can move r from 0.2 to 0.8. Rank-based alternatives (Spearman, Kendall's τ) are less sensitive to extreme values, and a percentile bootstrap interval shows how much r depends on individual points, but the right first step is to investigate whether the outlier is a data entry error or a real, informative observation.
Comparing r across datasets with different ranges. Restricting the range of X (truncated sampling) systematically attenuates r. Two studies of the same underlying relationship can report very different correlations purely because of the sampling design.
Quoting r without a sample size. r = 0.6 from n = 8 is not statistically significant at the 5% level (two-sided p ≈ 0.12); r = 0.2 from n = 5,000 is highly statistically significant. Always report r alongside n (and ideally a confidence interval).
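
Two of these pitfalls are easy to reproduce. The sketch below uses small made-up datasets (they are illustrative, not real measurements): a symmetric parabola, where Pearson's r is zero despite a perfect relationship, and a patternless cloud where one added extreme point manufactures a high r.

```python
import math


def pearson(xs, ys):
    """Pearson r via deviations from the mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Pitfall 1: a perfect but non-linear relationship.
parabola_x = [-3, -2, -1, 0, 1, 2, 3]
parabola_y = [x ** 2 for x in parabola_x]
r_parabola = pearson(parabola_x, parabola_y)
print(f"y = x^2 on symmetric x: r = {r_parabola:.3f}")

# Pitfall 2: a single extreme point dominating a small sample.
cloud_x = [1, 2, 3, 4, 5, 6]
cloud_y = [4, 2, 5, 3, 4, 3]
r_without = pearson(cloud_x, cloud_y)
r_with = pearson(cloud_x + [20], cloud_y + [15])  # add one outlier
print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

In the second case the six original points show essentially no linear trend, yet adding the single point (20, 15) pushes r above 0.9. A scatter plot would expose both problems immediately.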

Pearson r vs Spearman ρ vs Kendall τ

Coefficient	Measures	Use when
Pearson r	Linear association between two continuous variables.	Data is roughly normal, relationship looks linear in the scatter, no severe outliers.
Spearman ρ	Monotonic association on ranks (linear or curved as long as direction is consistent).	Data is ordinal, contains outliers, or the relationship is monotonic but visibly curved.
Kendall τ	Probability that pairs are concordant minus probability they are discordant.	Small samples, many tied ranks, or when you need a coefficient with a clean probability interpretation.
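
The difference between the three coefficients shows up clearly on data that is monotonic but strongly curved. The sketch below implements each from scratch on a made-up exponential series. The helper names are hypothetical, and the Kendall function is the simple tau-a variant, which applies no correction for ties.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rho: Pearson r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2 ** x for x in xs]  # monotonic but far from linear

print(f"Pearson  r   = {pearson(xs, ys):.3f}")
print(f"Spearman rho = {spearman(xs, ys):.3f}")
print(f"Kendall  tau = {kendall_tau_a(xs, ys):.3f}")
```

Because y always rises with x, both rank-based coefficients equal 1, while Pearson's r is noticeably lower (about 0.85) because the points do not lie on a straight line.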

Frequently Asked Questions

What is a correlation coefficient?

The Pearson correlation coefficient (r) is a number between −1 and +1 that measures the strength and direction of a linear relationship between two variables. A value of +1 means a perfect positive linear relationship, −1 means a perfect negative linear relationship, and 0 means no linear relationship.

How do you calculate the Pearson correlation coefficient?

Use the formula r = [n·Σxy − Σx·Σy] / √{[n·Σx² − (Σx)²]·[n·Σy² − (Σy)²]}. The calculator on this page computes every intermediate sum (Σx, Σy, Σxy, Σx², Σy²) and substitutes them into the formula so you can verify each step.

What is a good correlation coefficient value?

There is no universal threshold; it depends on the field. In physics or engineering you often expect |r| above 0.9. In psychology or social sciences, |r| above 0.5 is often considered strong. As rough textbook bands: 0.0–0.2 is very weak, 0.2–0.4 weak, 0.4–0.7 moderate, 0.7–0.9 strong, 0.9–1.0 very strong.

What is the difference between r and r²?

r is the correlation coefficient and tells you direction and strength of a linear relationship. r² (coefficient of determination) is r multiplied by itself, expressed between 0 and 1, and tells you the proportion of variance in Y that can be explained by X under a linear model. For example, r = 0.8 gives r² = 0.64, meaning 64% of the variance is explained.
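
The "proportion of variance explained" reading can be checked numerically. The following sketch fits the least-squares line to the ten pairs from the step-by-step solution and compares 1 − SS_res / SS_tot with r². It assumes simple linear regression of Y on X, and the variable names are illustrative.

```python
import math

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [52, 60, 65, 70, 73, 78, 82, 85, 90, 94]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Least-squares line of best fit: y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual and total sums of squares.
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = syy

r = sxy / math.sqrt(sxx * syy)
r_squared_from_fit = 1 - ss_res / ss_tot

print(f"line of best fit: y = {intercept:.2f} + {slope:.4f} x")
print(f"r^2 from r   = {r * r:.4f}")
print(f"r^2 from fit = {r_squared_from_fit:.4f}")
```

The two r² values agree, which is why r² is described as the share of the variation in Y captured by the fitted line.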

Does a high correlation mean causation?

No. Correlation only measures co-movement of two variables. A high r can arise from a true causal link, reverse causation, a third confounding variable, selection bias, or pure coincidence in small samples. Establishing causation requires experimental design or causal inference techniques, not r alone.

What is the difference between Pearson and Spearman correlation?

Pearson (r) measures linear association between two continuous variables and assumes the relationship is roughly linear. Spearman (ρ) is calculated on ranks instead of raw values, so it captures any monotonic relationship and is robust to outliers and non-linear but monotonic patterns. Use Spearman when your data is ordinal, contains outliers, or the scatter plot shows a curved but consistently increasing or decreasing relationship.

How many data points do I need for a reliable correlation?

Correlations from very small samples are highly unstable. With n < 10, a single point can swing r by 0.3 or more. For a reasonably stable estimate, aim for at least 30 paired observations. For publication-grade work, the required n depends on the effect size you expect - power calculations are the right tool.
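
One common way to quantify this instability is an approximate confidence interval based on the Fisher z-transformation. The sketch below is a hedged illustration: it assumes roughly bivariate-normal data and independent pairs, and the function name `fisher_confidence_interval` is invented for this example.

```python
import math


def fisher_confidence_interval(r, n, z_crit=1.959964):
    """Approximate CI for r via the Fisher z-transformation.

    z_crit defaults to the normal quantile for a 95% interval.
    """
    if n <= 3:
        raise ValueError("need more than 3 pairs")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and +1")
    z = math.atanh(r)                 # Fisher transform of r
    standard_error = 1 / math.sqrt(n - 3)
    lower = math.tanh(z - z_crit * standard_error)
    upper = math.tanh(z + z_crit * standard_error)
    return lower, upper


small_lower, small_upper = fisher_confidence_interval(0.6, 8)
large_lower, large_upper = fisher_confidence_interval(0.2, 5000)
print(f"r = 0.6, n = 8:    [{small_lower:.2f}, {small_upper:.2f}]")
print(f"r = 0.2, n = 5000: [{large_lower:.2f}, {large_upper:.2f}]")
```

With n = 8 the interval for r = 0.6 runs from roughly −0.18 to 0.92, so it includes zero. With n = 5,000 the interval for r = 0.2 is roughly 0.17 to 0.23.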

What does a negative correlation coefficient mean?

A negative r means that as one variable increases, the other tends to decrease. Examples include hours of exercise vs resting heart rate, or temperature vs heating bill. The closer r is to −1, the more tightly the points cluster around a downward-sloping line.

Can the correlation coefficient be greater than 1?

No. By construction, the Pearson r is bounded between −1 and +1. If your calculation produces a value outside that range, you have an arithmetic error - most commonly forgetting to take square roots in the denominator, or mismatched list lengths between X and Y.

What is the difference between correlation and covariance?

Covariance also measures whether two variables move together, but its value depends on the units of X and Y, so it is not directly comparable across datasets. Correlation is covariance scaled by the product of the two standard deviations, which removes the unit dependence and forces the result into [−1, +1].
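
A quick way to see the unit dependence is to change the units of one variable. The sketch below reuses the ice cream data from the worked example and converts the temperatures from °F to °C; the helper names are illustrative.

```python
import math

temps_f = [72, 75, 78, 81, 84, 88, 91]
cups = [84, 92, 99, 110, 121, 138, 150]
temps_c = [(f - 32) * 5 / 9 for f in temps_f]  # same days, in Celsius


def sample_covariance(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson(xs, ys):
    # Covariance scaled by the product of the two standard deviations.
    return sample_covariance(xs, ys) / math.sqrt(
        sample_covariance(xs, xs) * sample_covariance(ys, ys)
    )


cov_f = sample_covariance(temps_f, cups)
cov_c = sample_covariance(temps_c, cups)
r_f = pearson(temps_f, cups)
r_c = pearson(temps_c, cups)

print(f"covariance: {cov_f:.2f} (F) vs {cov_c:.2f} (C)")
print(f"r:          {r_f:.4f} (F) vs {r_c:.4f} (C)")
```

The covariance shrinks by the factor 5/9 when the temperatures are expressed in °C, while r is unchanged.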

Should I use sample or population formulas?

The Pearson r formula does not change between sample and population, provided the same divisor (n or n − 1) is used for the covariance and both standard deviations: the divisor then cancels between numerator and denominator. The distinction matters for variance and standard deviation on their own, but for r the answer is identical either way.
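
This cancellation is easy to confirm numerically. The sketch below computes r twice on the ten pairs from the step-by-step solution, once with the sample divisor (n − 1) and once with the population divisor (n); the function name `r_with_divisor` is made up for the illustration.

```python
import math

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [52, 60, 65, 70, 73, 78, 82, 85, 90, 94]


def r_with_divisor(xs, ys, divisor):
    """r from covariance and standard deviations sharing one divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / divisor
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / divisor)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / divisor)
    return cov / (sd_x * sd_y)


n = len(xs)
r_sample = r_with_divisor(xs, ys, n - 1)   # sample formulas
r_population = r_with_divisor(xs, ys, n)   # population formulas

print(f"sample divisor:     r = {r_sample:.6f}")
print(f"population divisor: r = {r_population:.6f}")
```

Mixing divisors, for example a sample covariance with population standard deviations, would break the cancellation and inflate r by a factor of n / (n − 1).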

References and Further Reading

Pearson, K. (1895). Note on regression and inheritance in the case of two parents. Background on the original formulation - Pearson correlation coefficient (Wikipedia).
Anscombe, F. J. (1973). Graphs in statistical analysis. The classic four-dataset example showing why r alone can mislead - Anscombe's quartet.
NIST/SEMATECH (2012). e-Handbook of Statistical Methods, section on correlation - NIST handbook: correlation.
Spearman, C. (1904). The rank-based alternative for data that is ordinal or monotonic but non-linear - Spearman's rank correlation.
For the distinction between correlation and causation, see our deeper write-up: Correlation vs causation - a practical guide.

Related Calculators on this Site

Standard Deviation Calculator

Sample and population SD with worked steps and a bell-curve overlay.

Linear Regression Calculator

Slope, intercept, R², residuals, and the line of best fit.

Scatter Plot Maker

Plot your X-Y data, customise it, and export PNG or SVG.

Heatmap Maker

Visualise a full correlation matrix as a colour-coded grid.

Histogram Maker

Check the distribution of a single variable before correlating.

Blog: Correlation vs Causation

Why a high r does not prove cause and effect, with real examples.