Scatter Plot Maker & Calculator

Plot your X-Y data, fit a line of best fit, and instantly get the regression equation, R², correlation coefficient, residuals and outliers - free, in your browser, no sign-up.

Regression Analysis & Descriptive Statistics

5 data points
Line of Best Fit
y = 0.0562x + 1.6305
Weak positive correlation (r = 0.1836)

Each card below shows its step-by-step calculation, with the formula and your input values substituted in.

Slope (m)
0.0562
Change in y per unit x
Formula
$$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$
Intermediate sums
  • n = 5
  • Σx = 93.4000
  • Σy = 13.4000
  • Σxy = 257.0600
  • Σx² = 1864.8200
Substitute
$$m = \frac{5(257.0600) - (93.4000)(13.4000)}{5(1864.8200) - (93.4000)^2}$$
Simplify
$$m = \frac{33.7400}{600.5400} = 0.056183$$
Y-Intercept (b)
1.6305
Predicted y when x = 0
Formula
$$b = \bar{y} - m\bar{x}$$
Means
  • x̄ = Σx / n = 93.4000 / 5 = 18.680000
  • ȳ = Σy / n = 13.4000 / 5 = 2.680000
Substitute
$$b = 2.680000 - (0.056183)(18.680000)$$
$$b = 1.630506$$
(Evaluated with the unrounded slope; substituting the rounded 0.056183 gives 1.630502.)
R² (Determination)
0.0337
3.4% variance explained
Definition

R² is the square of the correlation coefficient r. It is the proportion of the variance in Y explained by X under the linear model.

$$R^2 = r^2$$
Substitute
$$R^2 = (0.183591)^2 = 0.033706$$
Interpretation

About 3.4% of the variation in Y can be predicted from X using this line. The remaining 96.6% is unexplained variation.

r (Correlation)
0.1836
Pearson correlation (-1 to 1)
Formula
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}}$$
Intermediate sums
  • Σy² = 47.1600
  • n·Σx² − (Σx)² = 600.5400
  • n·Σy² − (Σy)² = 56.2400
  • n·Σxy − Σx·Σy = 33.7400
Substitute
$$r = \frac{33.7400}{\sqrt{600.5400 \times 56.2400}}$$
$$r = \frac{33.7400}{183.7780} = 0.183591$$
Interpretation

Weak positive linear relationship. r is bounded by −1 and +1; values near 0 indicate no linear association.

RMSE
1.4744
Root mean square error
Formula
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$
Compute SSE (sum of squared residuals)

For each point, compute the residual $y_i - (mx_i + b)$, then square and sum.

$$\sum (y_i - \hat{y}_i)^2 = 10.868879$$
Substitute
$$\text{RMSE} = \sqrt{\frac{10.868879}{5}} = 1.474373$$
Interpretation

Typical prediction error is about 1.47 units of Y. Lower is better; combine with R² for a full picture of fit quality.

Outliers Detected
0
Points with |z-residual| > 2
Method

Each residual is standardized by subtracting the mean residual and dividing by the residual standard deviation. Points whose standardized residual exceeds ±2 are flagged.

$$z_i = \frac{e_i - \bar{e}}{s_e}, \quad \text{outlier if } |z_i| > 2$$
Residual statistics
  • Mean residual ē = 0.000000
  • Residual SD s_e = 1.648399
  • Threshold = ±3.296798 from ē
Result

0 of 5 points flagged. Outliers render in red on the scatter plot above.

Mean X
18.6800
Average of X
Formula
$$\bar{x} = \frac{\sum x_i}{n}$$
Substitute
$$\bar{x} = \frac{93.4000}{5} = 18.680000$$
Mean Y
2.6800
Average of Y
Formula
$$\bar{y} = \frac{\sum y_i}{n}$$
Substitute
$$\bar{y} = \frac{13.4000}{5} = 2.680000$$
Std Dev X
5.4797
Sample SD of X
Formula
$$s_x = \sqrt{\frac{\sum x_i^2 - n\bar{x}^2}{n - 1}}$$
Substitute
$$s_x = \sqrt{\frac{1864.8200 - 5(18.680000)^2}{4}}$$
$$s_x = \sqrt{\frac{120.108000}{4}} = 5.479690$$
Std Dev Y
1.6769
Sample SD of Y
Formula
$$s_y = \sqrt{\frac{\sum y_i^2 - n\bar{y}^2}{n - 1}}$$
Substitute
$$s_y = \sqrt{\frac{47.1600 - 5(2.680000)^2}{4}}$$
$$s_y = \sqrt{\frac{11.248000}{4}} = 1.676902$$

Slope and intercept are computed via the least-squares method. R² is the proportion of variance in Y explained by X. Outliers are flagged when the standardized residual exceeds ±2.

How to Use the Scatter Plot Maker & Calculator

Six quick steps to plot your X-Y data, fit a regression line, read the R² and correlation coefficient, and export a publication-ready chart.

  1. Enter your X values

    Type or paste your X-axis values into the X Values field in the Data Entry panel. The scatter plot calculator accepts comma-separated, space-separated, tab-separated, and Excel copy-paste formats - pick whichever matches your source. The data preview table updates in real time so you can confirm the values parsed correctly.

  2. Enter the matching Y values

    Add the corresponding Y values in the Y Values field, one for each X. The point counter underneath the input shows how many valid (X, Y) pairs were detected. If the counts don't match, the scatter plot maker pairs the first N values, where N is the length of the shorter list.

  3. Enable the line of best fit

    Open the Trendline section and toggle Show Trendline. The calculator computes the least-squares regression line and renders it on top of your scatter plot. The headline equation y = mx + b appears in the Regression Analysis panel, alongside the slope, y-intercept, R², correlation coefficient, and RMSE.

  4. Highlight outliers and inspect residuals

    Switch on Highlight Outliers in the Regression Analysis section to flag any point whose standardized residual exceeds ±2 - these will render in red. Turn on Show Residual Plot to display a separate (y − ŷ) vs x chart underneath, which makes it easy to spot non-linear patterns or heteroscedasticity.

  5. Customize the appearance

    Set descriptive axis labels with units (for example, “Hours studied” and “Test score (%)”), pick a color theme, and adjust marker size for accessibility. Use the Title & Subtitle section to add a chart title and dataset description. All settings apply live to the chart preview.

  6. Export your scatter plot

    Use the Export Chart panel to download the scatter plot as PNG (best for presentations), JPEG/JPG (lossy compression, typically small files), or SVG (vector format, ideal for print and LaTeX). Nothing is uploaded - every download is generated locally in your browser, so your data stays on your device.
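Steps 1 and 2 describe input that can be separated by commas, spaces, tabs, or newlines, with unequal lists paired up to the length of the shorter one. A minimal Python sketch of that behavior follows. The function names `parse_values` and `pair_values` are hypothetical, and the tool's actual parser (which runs in the browser) is not shown in this page, so treat this as an illustration only. It does not handle invalid tokens, which would raise a `ValueError` here.

```python
import re

def parse_values(text):
    """Split on commas and any whitespace (spaces, tabs, newlines); return floats."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [float(t) for t in tokens]

def pair_values(x_text, y_text):
    """Pair X and Y values position by position."""
    xs = parse_values(x_text)
    ys = parse_values(y_text)
    # zip stops at the shorter list, so only the first N pairs are kept.
    return list(zip(xs, ys))

# Five X values but only four Y values: the fifth X is dropped.
pairs = pair_values("1, 2, 3, 4, 5", "52\t60\t68\n73")
print(pairs)  # [(1.0, 52.0), (2.0, 60.0), (3.0, 68.0), (4.0, 73.0)]
```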

Understanding Scatter Plots & Regression Analysis

Scatter plots aren't just dots on a graph - they're powerful tools for spotting patterns, identifying outliers, and figuring out how two variables connect. Whether you're a student, researcher, or just curious about data, here's the math that powers every scatter plot creator.

1. The Line of Best Fit (Linear Regression)

Ever wondered what that trendline on your scatter plot actually means? That's a regression line - the single straight line that minimizes the sum of squared vertical distances to your data points. Here's the classic equation you'll see:

$$y = mx + b$$

What each part means:

  • $y$ - The predicted value (what you're trying to figure out)
  • $m$ - The slope (how steep your line climbs or falls)
  • $x$ - Your input value
  • $b$ - The y-intercept (where your line hits the y-axis)

Real world example:

Say you're charting study hours against test scores. If your equation turns out to be $y = 8x + 40$, that tells you each extra hour of study bumps your score by 8 points. Even with zero study time, you'd still score around 40 (maybe from paying attention in class!).

2. Calculating the Slope

The slope is where it gets interesting. It tells you how much y changes, on average, when x increases by one unit. Positive slope? Both variables rise together. Negative slope? One goes up while the other drops - like ice cream sales vs. sweater purchases.

$$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$

Breaking it down:

$n$ - Number of data points

$\sum xy$ - Sum of each x times its paired y

$\sum x$ and $\sum y$ - Sum of all x values and sum of all y values

$\sum x^2$ - Sum of each x value squared

Don't stress about memorizing this formula - our scatter plot creator handles all the heavy lifting automatically when you toggle on trendlines.
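If you want to check the slope yourself, the formula translates almost directly into code. The Python sketch below plugs in the intermediate sums shown in the calculator's Slope card (n = 5, Σx = 93.4, Σy = 13.4, Σxy = 257.06, Σx² = 1864.82). The function name `slope_from_sums` is hypothetical and chosen only for this illustration.

```python
def slope_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least-squares slope m computed from the five summary sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # Happens when every x value is the same: the line would be vertical.
        raise ValueError("all x values are identical; slope is undefined")
    return numerator / denominator

m = slope_from_sums(n=5, sum_x=93.4, sum_y=13.4, sum_xy=257.06, sum_x2=1864.82)
print(round(m, 6))  # 0.056183
```

The guard on the denominator is a design choice for this sketch; how the calculator itself reports identical x values is not stated on this page.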

3. Finding the Y-Intercept

Got your slope? Finding the y-intercept is the easy part. It's simply the average y-value minus the slope multiplied by the average x-value:

$$b = \bar{y} - m\bar{x}$$

Here, $\bar{x}$ and $\bar{y}$ are just fancy notation for the averages of your x and y values. Think of the y-intercept as your starting point - what the line predicts for y when x is zero.

Real-world example: if you're plotting ad spend vs. sales revenue, the y-intercept shows your baseline sales before spending a single dollar on advertising.
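As a quick check of the intercept formula, the Python sketch below reuses the figures from the calculator panel above: Σx = 93.4, Σy = 13.4, and the slope as the unrounded fraction 33.74 / 600.54. The function name `intercept` is hypothetical.

```python
def intercept(mean_x, mean_y, slope):
    """Y-intercept b = mean(y) - m * mean(x)."""
    return mean_y - slope * mean_x

n = 5
mean_x = 93.4 / n        # 18.68
mean_y = 13.4 / n        # 2.68
slope = 33.74 / 600.54   # unrounded slope from the Slope card's simplified fraction

b = intercept(mean_x, mean_y, slope)
print(round(b, 6))  # 1.630506

# Rounding the slope first shifts the last digits of the result.
print(round(intercept(mean_x, mean_y, 0.056183), 6))  # 1.630502
```

The second print shows why it is worth carrying full precision through the calculation and rounding only at the end.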

4. Correlation Coefficient (r)

Here's the real MVP of scatter plot analysis. The correlation coefficient (r) tells you how closely your two variables follow a straight-line relationship - and whether that relationship is positive or negative:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}}$$

$r = +1$

Perfect positive correlation. Every point lies exactly on a line that rises as x increases.

$r = 0$

No linear correlation. The variables have no linear relationship, though a curved relationship may still exist.

$r = -1$

Perfect negative correlation. Every point lies exactly on a line that falls as x increases.

Practical interpretation:

  • |r| > 0.7 - Strong relationship
  • 0.4 ≤ |r| ≤ 0.7 - Moderate relationship
  • |r| < 0.4 - Weak relationship
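The correlation formula and the strength bands above can be sketched in a few lines of Python. The example uses the sums from the calculator's r card (Σy² = 47.16 plus the sums used for the slope). Both function names are hypothetical, and treating the exact boundary values 0.4 and 0.7 as "moderate" is an assumption of this sketch.

```python
import math

def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson correlation coefficient from summary sums."""
    numerator = n * sum_xy - sum_x * sum_y
    x_term = n * sum_x2 - sum_x ** 2
    y_term = n * sum_y2 - sum_y ** 2
    if x_term <= 0 or y_term <= 0:
        raise ValueError("r is undefined when x or y has no spread")
    return numerator / math.sqrt(x_term * y_term)

def describe_strength(r):
    """Map |r| to the rough bands listed above (boundaries count as moderate)."""
    size = abs(r)
    if size > 0.7:
        return "strong"
    if size >= 0.4:
        return "moderate"
    return "weak"

r = pearson_r_from_sums(5, 93.4, 13.4, 257.06, 1864.82, 47.16)
print(round(r, 4), describe_strength(r))  # 0.1836 weak
```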

5. R-Squared (Coefficient of Determination)

R-squared is exactly what it sounds like: the correlation coefficient, squared. But here's why it's so useful - it tells you what percentage of y's variation can be explained by x:

$$R^2 = r^2$$

Why this matters:

When you see R² = 0.85, that means 85% of the variation in your y-values is accounted for by the fitted line. The other 15%? That's unexplained variation - noise, or other factors you're not tracking.

Quick rule: R² above about 0.5 (|r| above about 0.7) suggests your trendline captures most of the pattern. Lower values call for caution, especially with only a few data points, where an apparent trend can arise by chance.
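For a straight-line least-squares fit with an intercept, R² can be reached two ways: by squaring r, or as one minus the ratio of unexplained to total variation (1 − SSE/SST). The Python sketch below checks that both routes agree on the calculator panel's numbers: r = 0.183591, SSE = 10.868879, and SST = Σy² − nȳ² = 11.248. The function names are hypothetical, and the inputs are the rounded values displayed in the panel, so agreement is only to about six decimals.

```python
def r_squared_from_r(r):
    """R^2 as the square of the correlation coefficient."""
    return r ** 2

def r_squared_from_errors(sse, sst):
    """R^2 as 1 - SSE/SST, where SST is the total sum of squares of y about its mean."""
    if sst == 0:
        raise ValueError("R^2 is undefined when y has no variation")
    return 1 - sse / sst

r2_a = r_squared_from_r(0.183591)
r2_b = r_squared_from_errors(sse=10.868879, sst=11.248)
print(round(r2_a, 4), round(r2_b, 4))  # 0.0337 0.0337
```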

6. Covariance & Standard Deviation

Under the hood, both the slope and the correlation coefficient can be expressed in terms of two more fundamental quantities: the sample standard deviation of each variable, and the covariance between them.

$$s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}, \quad s_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}}$$
$$\text{Cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

With those, the slope and correlation collapse to elegant forms:

$$m = \frac{\text{Cov}(x,y)}{s_x^2}, \quad r = \frac{\text{Cov}(x,y)}{s_x \cdot s_y}$$

The scatter plot calculator above reports $s_x$ and $s_y$ in the Descriptive Statistics panel so you can sanity-check spread before interpreting the slope.
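To see the covariance forms in action, here is a small Python sketch using the hours-and-scores dataset from the worked example further down this page. It relies only on `mean` and `stdev` from Python's standard `statistics` module; `sample_covariance` is a hypothetical helper written out by hand so the n − 1 divisor stays visible.

```python
from statistics import mean, stdev

def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor, matching the formula above."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

hours = [1, 2, 3, 4, 5]
scores = [52, 60, 68, 73, 82]

cov = sample_covariance(hours, scores)
s_x, s_y = stdev(hours), stdev(scores)   # sample standard deviations

slope = cov / s_x ** 2
r = cov / (s_x * s_y)
print(round(cov, 2), round(slope, 4), round(r, 4))  # 18.25 7.3 0.9971
```

The slope and r match the worked example's values, which were computed from the raw-sum formulas instead.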

7. RMSE (Root Mean Square Error)

R² tells you the proportion of variance explained, but it doesn't tell you how big the typical prediction error is in the original units of Y. That's what RMSE does.

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$

$\hat{y}_i$ is the predicted value from the regression line and $y_i$ is the observed value. RMSE is in the same units as Y, so if you're predicting test scores and RMSE = 4, your line is off by about 4 points on a typical prediction. Lower is better; combine it with R² for a full picture of fit quality.
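A minimal Python sketch of the RMSE formula follows, applied to the hours-and-scores dataset and the fitted line y = 7.3x + 45.1 from the worked example further down this page. The function name `rmse` is hypothetical. It divides by n, as in the formula above; some textbooks divide by n − 2 instead, which gives a slightly larger figure.

```python
import math

def rmse(observed, predicted):
    """Root mean square error with an n divisor."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return math.sqrt(sse / len(observed))

hours = [1, 2, 3, 4, 5]
scores = [52, 60, 68, 73, 82]
predicted = [7.3 * x + 45.1 for x in hours]   # line from the worked example

print(round(rmse(scores, predicted), 4))  # 0.7874
```

So on that dataset the line misses a typical score by a bit under one point.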

8. Residuals & Outlier Detection

A residual is the vertical distance between an observed Y value and the regression line's prediction at the same X:

$$e_i = y_i - \hat{y}_i = y_i - (mx_i + b)$$

Residuals are the diagnostic engine of regression. If a residual plot looks like random noise around zero, the linear model is appropriate. If it shows a pattern - a curve, a fan shape, or clusters - a different model would fit better.

For outlier detection, the calculator standardizes each residual and flags points where:

$$\left| \frac{e_i - \bar{e}}{s_e} \right| > 2$$

That's the rule behind the red points when Highlight Outliers is enabled - anything more than two residual standard deviations from the regression line gets flagged for review.
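The flagging rule can be sketched in Python as below. The data is synthetic (an exact line y = 2x + 1 with one point pushed up by 10), and `fit_line` and `flag_outliers` are hypothetical names. The residual standard deviation uses the n − 1 divisor, an assumption that is consistent with the panel figures above (SSE = 10.868879 with n = 5 gives s_e = 1.648399).

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    m = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    b = sy / n - m * sx / n
    return m, b

def flag_outliers(xs, ys, threshold=2.0):
    """Return indices whose standardized residual exceeds the threshold in absolute value."""
    m, b = fit_line(xs, ys)
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
    e_bar = mean(residuals)
    s_e = stdev(residuals)   # sample SD (n - 1 divisor), an assumption
    if s_e == 0:
        return []            # all residuals identical: nothing to standardize
    return [i for i, e in enumerate(residuals) if abs((e - e_bar) / s_e) > threshold]

# Synthetic data: y = 2x + 1, with the point at x = 5 pushed up by 10.
xs = list(range(1, 11))
ys = [2 * x + 1 for x in xs]
ys[4] += 10

print(flag_outliers(xs, ys))  # [4]
```

One consequence of standardizing this way: a standardized residual can never exceed (n − 1)/√n, which is about 1.79 for n = 5. Under that assumption, a five-point dataset cannot trip the ±2 rule, and at least six points are needed before any point can be flagged.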

Quick Tips for Better Scatter Plots

Do:

  • Label your axes with units (e.g., "Revenue (USD)")
  • Use transparency when points overlap
  • Check R² before trusting a trendline
  • Look for outliers - they tell interesting stories

Avoid:

  • Assuming correlation means causation
  • Forcing trendlines on random-looking data
  • Using rainbow color schemes (accessibility issue)
  • Overcrowding with too many data points

Worked Example: Computing the Regression Equation by Hand

To make the formulas concrete, here's a complete walkthrough on a small dataset. Imagine surveying five students on hours studied (X) and test score (Y), then fitting a line of best fit.

Step 1 - The dataset

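| Student | X (hours) | Y (score) | X · Y | X² |
|---|---|---|---|---|
| 1 | 1 | 52 | 52 | 1 |
| 2 | 2 | 60 | 120 | 4 |
| 3 | 3 | 68 | 204 | 9 |
| 4 | 4 | 73 | 292 | 16 |
| 5 | 5 | 82 | 410 | 25 |
| Σ | 15 | 335 | 1078 | 55 |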

Step 2 - Means

$$\bar{x} = \frac{15}{5} = 3.00, \quad \bar{y} = \frac{335}{5} = 67.00$$

Step 3 - Slope (m)

$$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$
$$m = \frac{5(1078) - (15)(335)}{5(55) - (15)^2} = \frac{365}{50} = 7.3000$$

Each additional hour of study is associated with about 7.30 additional test points.

Step 4 - Y-Intercept (b)

$$b = \bar{y} - m\bar{x}$$
$$b = 67.00 - (7.3000)(3.00) = 45.1000$$

Step 5 - Final Regression Equation, r and R²

$$y = 7.30x + 45.10$$
Correlation coefficient
r = 0.9971
Strong positive linear correlation.
Coefficient of determination
R² = 0.9942
99.4% of score variation explained by hours.

Want to verify? Paste 1, 2, 3, 4, 5 into the X field and 52, 60, 68, 73, 82 into the Y field of the calculator above - you'll see exactly the same slope, intercept, R², and r.
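You can also reproduce the whole walkthrough offline. The short Python script below recomputes the column sums, slope, intercept, r, and R² for the five students using the same formulas as Steps 1 to 5. It is an independent check written for this example, not the calculator's own code.

```python
import math

hours = [1, 2, 3, 4, 5]
scores = [52, 60, 68, 73, 82]

n = len(hours)
sum_x, sum_y = sum(hours), sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in scores)

# Slope and intercept from the least-squares formulas.
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = sum_y / n - m * sum_x / n

# Pearson correlation coefficient.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(sum_x, sum_y, sum_xy, sum_x2)    # 15 335 1078 55
print(round(m, 4), round(b, 4))        # 7.3 45.1
print(round(r, 4), round(r ** 2, 4))   # 0.9971 0.9942
```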

Who Uses the Scatter Plot Maker & Calculator

A free, browser-based scatter plot calculator with built-in regression analysis is useful any time you need to spot a pattern between two variables - fast, without spreadsheet friction.

Students & academics

Lab reports for biology, chemistry, physics, psychology, and economics - generate the regression line and R² in seconds for a methods section.

Business analytics

Visualize price-vs-demand, ad spend vs revenue, conversion funnels. Outlier detection catches anomalies before they skew strategy.

Research & data science

Quick exploratory analysis before pulling out a heavier stats package. Residual plots guide model selection.

Teaching & presentations

Live-update plots in front of students or stakeholders. SVG export keeps charts crisp in slides and printed handouts.

Data journalism

Communicate correlations to a general audience with a clean, branded scatter plot - no software install needed.

Engineering & QA

Calibration curves, sensor drift, tolerance studies - RMSE quantifies how tight the linear approximation actually is.

Frequently Asked Questions About the Scatter Plot Maker & Calculator

How do I create a scatter plot with a regression line?

Paste your X and Y values into the Data Entry panel - comma-separated, space-separated, or copied straight from Excel all work. Toggle Show Trendline in the Trendline section, and the scatter plot maker instantly draws the line of best fit and surfaces the regression equation y = mx + b along with slope, y-intercept, R², the correlation coefficient r, and RMSE in the Regression Analysis panel.

How do I find the regression equation from a data table?

Enter the X column and the Y column from your data table into the two input fields. The scatter plot calculator computes the slope and y-intercept using the least-squares method and displays the regression equation in the form y = mx + b. The numerical answer updates in real time as you edit the data - no manual computation, and no spreadsheet formulas required.

What does R-squared mean and how do I interpret it?

R² (the coefficient of determination) is the proportion of the variation in Y that is explained by X under the linear model. R² = 0.85 means 85% of the variation in Y can be predicted from X, with the remaining 15% attributable to other factors or noise. As a rough guide: R² above 0.7 is a strong fit, 0.4–0.7 is moderate, and below 0.4 means the linear trend may be unreliable for prediction. These cutoffs are conventions that vary by field, so read them alongside your sample size and the residual plot.

What is the correlation coefficient (r) and how is it different from R²?

The correlation coefficient r measures both the strength and direction of the linear relationship between X and Y, ranging from −1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive). R² is simply r squared, so it discards the sign and only conveys strength. Use r to describe direction, R² to describe explanatory power.

What are residuals and why are they useful?

A residual is the vertical distance between an observed Y value and the value the regression line predicts for that X. Residuals reveal whether your linear model is appropriate: if residuals are randomly scattered around zero, the linear fit is sound. If they form a curve, fan out, or cluster, a non-linear model - or a transformation - is likely a better choice. Toggle Show Residual Plot above to inspect them visually.

How does the calculator detect and highlight outliers?

When Highlight Outliers is enabled, the scatter plot calculator standardizes each residual (subtract the mean residual, divide by the residual standard deviation) and flags any point whose standardized residual exceeds ±2. Flagged points render in red so you can investigate them - they may indicate data-entry typos, unusual cases, or genuine anomalies that you may want to exclude before re-running the regression.

Can the calculator handle non-linear or multiple regression?

This tool is designed for simple linear regression with one independent variable (X) and one dependent variable (Y). Polynomial, exponential, logarithmic, and multiple-regression models are not currently supported. For non-linear data, you can sometimes apply a transformation (e.g. log Y) and fit a linear model in the transformed space.
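As an illustration of the log-transform idea, the Python sketch below fits a straight line to (x, ln y) and converts the result back to an exponential curve y = a·e^(kx). The data is synthetic and noise-free (generated from y = 2·e^(0.5x)), and `fit_line` is a hypothetical helper using the same least-squares formulas as the calculator section. The approach assumes every y value is positive, since the logarithm is undefined otherwise.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    m = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    b = sy / n - m * sx / n
    return m, b

# Synthetic, noise-free data following y = 2 * exp(0.5 * x).
xs = [0, 1, 2, 3, 4]
ys = [2 * math.exp(0.5 * x) for x in xs]

# Fit a line in the transformed space: ln(y) = k * x + ln(a).
log_ys = [math.log(y) for y in ys]
slope, intercept = fit_line(xs, log_ys)

growth_rate = slope            # k
scale = math.exp(intercept)    # a
print(round(growth_rate, 4), round(scale, 4))  # 0.5 2.0
```

In the calculator you would do the same thing by pasting the ln y values into the Y field. The reported R² and RMSE then describe the fit in log units, not in the original units of Y.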

How do I enter data into the scatter plot maker?

Type your X values into the X Values field and your Y values into the Y Values field. Both comma-separated (e.g. 1, 2, 3) and space-separated (e.g. 1 2 3) formats are accepted, and you can paste directly from Excel or Google Sheets - the parser handles tabs and newlines too. The data preview table underneath the inputs shows each parsed pair so you can verify alignment before plotting.

What file formats can I download my scatter plot in?

Four export options are available: PNG (lossless raster, best for slides and web), JPEG and JPG (the same lossy raster format under two file extensions; smaller files, good for emailing), and SVG (scalable vector format, perfect for print, LaTeX, and large displays). SVG is recommended whenever you need to scale the chart up without quality loss.

Is my data uploaded to a server?

No. All computation - plotting, the regression equation, R², residuals, and outlier detection - happens locally in your browser using JavaScript. Your data is never transmitted, stored, or logged. That's also why no sign-up is required: there's nothing for us to store on your behalf.

Is the scatter plot maker and calculator free?

Yes. The scatter plot maker and calculator is 100% free, browser-based, and unrestricted - no sign-up, no watermark, no usage caps, and no paid tier. Export as many charts as you need, in any of the four supported formats.

Can I customize the appearance of my scatter plot?

Extensively. The Style and Series & Color sections let you change marker color and size, background color, text color, trendline color, and legend position. The Animation and Grid sections control hover effects, gridlines, tooltip theming, and animation speed. You can produce a chart that matches your brand, journal style guide, or presentation theme.

Choosing an Export Format

The scatter plot maker exports to PNG, JPEG/JPG, and SVG so the chart fits whatever document or platform you're using.

  • PNG - Lossless raster. Best default for slides, web, and reports where you need transparency or crisp text.
  • JPEG / JPG - Compressed raster. Smaller file size for email attachments and image-heavy documents.
  • SVG - Scalable vector. Perfect for print, LaTeX, posters, and any context where the chart will be resized.