5 Scatter Plot Mistakes That Make Your Data Look Terrible

I still remember the first scatter plot I showed to my manager. It was a mess. Points everywhere, colors that clashed, and a title that literally said "Chart 1". She looked at it for about three seconds before asking, "What am I supposed to see here?"

That moment taught me something important: a chart is only as good as the story it tells. Over the years, I have stumbled my way through countless visualization fails. Here are the five mistakes I kept making and how I eventually fixed them.

Mistake 1: Too Many Data Points Without Any Transparency

This one gets everyone. You have 500 data points, you plot them all, and suddenly your chart looks like someone spilled blue paint on it. You literally cannot see anything.

The problem is called overplotting, and it happens when points stack on top of each other.

The fix? Add transparency using the alpha parameter. In most tools, setting alpha to somewhere between 0.3 and 0.6 lets overlapping points blend together, revealing density patterns you could not see before.


python
import matplotlib.pyplot as plt
import numpy as np

# Simulating 1000 clustered points
np.random.seed(42)
x = np.concatenate([np.random.normal(10, 2, 500), np.random.normal(20, 3, 500)])
y = np.concatenate([np.random.normal(15, 2, 500), np.random.normal(25, 3, 500)])

# Without alpha - just a blob
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.scatter(x, y, c='steelblue')
plt.title("Without Transparency")

# With alpha - now you can see two clusters
plt.subplot(1, 2, 2)
plt.scatter(x, y, c='steelblue', alpha=0.4)
plt.title("With Alpha = 0.4")

plt.tight_layout()
plt.show()

Try this with your own data. The difference is night and day.

Mistake 2: Using Crazy Rainbow Color Schemes

Okay, I get it. When you discover that you can color each point by a third variable, it feels like unlocking a superpower. But then you end up with a chart that looks like a bag of Skittles exploded.

The real problem is not just aesthetics. Rainbow color schemes are genuinely bad for data. Because their brightness does not change evenly from one end to the other, they create false boundaries between colors that do not exist in the data, and they are hard for many colorblind viewers to interpret.

Instead, use sequential or diverging color palettes. Sequential palettes like blues or greens work great for continuous variables. Diverging palettes like coolwarm are perfect when you have a meaningful center point like zero.

I personally stick with viridis for most things. It is perceptually uniform, works in grayscale, and is colorblind-friendly. Took me years to stop fighting this.
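Here is a minimal sketch of what that looks like in matplotlib. The data is randomly generated, and the function name and the "third variable" are made up purely for illustration, so swap in your own columns.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_with_viridis(seed=0):
    """Scatter plot colored by a continuous third variable using viridis."""
    # Made-up data for illustration only
    rng = np.random.default_rng(seed)
    x = rng.normal(50, 10, 300)
    y = 0.5 * x + rng.normal(0, 5, 300)
    z = x + y  # hypothetical third variable

    fig, ax = plt.subplots(figsize=(6, 4))
    # c=z maps each point's value to a color; cmap picks the palette
    points = ax.scatter(x, y, c=z, cmap="viridis", alpha=0.7)
    # The colorbar tells readers what the colors mean
    fig.colorbar(points, ax=ax, label="Third variable (arbitrary units)")
    ax.set_xlabel("x (arbitrary units)")
    ax.set_ylabel("y (arbitrary units)")
    ax.set_title("Colored with viridis")
    return fig, points


if __name__ == "__main__":
    plot_with_viridis()
    plt.show()
```

If your variable has a meaningful midpoint, changing cmap to "coolwarm" gives you a diverging palette instead.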

Mistake 3: No Context in Axis Labels

Here is a chart title I actually wrote once: "Monthly Data"

That is it. No units. No explanation of what the axes represent. Looking back, I have no idea what that chart was about, and I made it.

Your axis labels need to tell people what they are looking at and in what units. "Revenue (USD in thousands)" beats "Revenue" every single time. "Distance (km)" is infinitely clearer than "Distance".

This seems obvious, but go look at your last five charts. I bet at least one of them is missing units somewhere.
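For what it is worth, adding units takes two lines. This is a small sketch with a hypothetical helper and invented numbers, not data from a real project:

```python
import matplotlib.pyplot as plt


def labeled_scatter(distance_km, revenue_kusd):
    """Scatter plot whose axis labels say what is plotted and in what units."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(distance_km, revenue_kusd, alpha=0.6)
    # Name the quantity AND its unit on each axis
    ax.set_xlabel("Distance (km)")
    ax.set_ylabel("Revenue (USD in thousands)")
    # A title that says what is being compared (hypothetical example)
    ax.set_title("Revenue vs. Distance")
    return ax


if __name__ == "__main__":
    # Invented values, just to have something to draw
    labeled_scatter([5, 12, 20, 33, 41], [120, 95, 80, 60, 42])
    plt.show()
```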

Mistake 4: Making Every Point the Same Size

When every dot is the same size, you are leaving money on the table. Size can encode a third variable in a way that is immediately intuitive to human brains.

Bigger dot? More important. Smaller dot? Less significant. This is called a bubble chart, and it turns a 2D visualization into something that feels 3D without the headaches of actual 3D charts.


python
# Example: City populations
import matplotlib.pyplot as plt

cities = ['NYC', 'LA', 'Chicago', 'Houston', 'Phoenix']
latitude = [40.7, 34.0, 41.9, 29.8, 33.4]
longitude = [-74.0, -118.2, -87.6, -95.4, -112.1]
population = [8.3, 4.0, 2.7, 2.3, 1.6]  # in millions

plt.figure(figsize=(8, 6))
# s sets marker area (in points squared), so area scales with population
plt.scatter(longitude, latitude, s=[p * 100 for p in population],
            alpha=0.6, c='coral', edgecolors='darkred', linewidth=1)

# Label each bubble just above its center
for city, lon, lat in zip(cities, longitude, latitude):
    plt.annotate(city, (lon, lat),
                 textcoords="offset points", xytext=(0, 10), ha='center')

plt.xlabel("Longitude (degrees)")
plt.ylabel("Latitude (degrees)")
plt.title("US Cities by Population")
plt.show()

Now your scatter plot conveys three dimensions of information at a glance.

Mistake 5: Forcing a Trendline When There Is No Trend

I used to slap a trendline on every scatter plot I made. It felt professional, like adding a finishing touch.

But here is the thing: if your data has no clear linear relationship, that trendline is lying. It suggests a pattern where none exists. Worse, it gives your audience a false sense of confidence in a nonexistent correlation.

Before adding a trendline, look at the R-squared value of the fit. As a rough rule of thumb (the right cutoff depends on your field and your data), if it is below 0.3, seriously reconsider whether that line belongs. Sometimes the honest answer is "there is no relationship here," and that is valuable information too.
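One way to build that check into your plotting code is sketched below. The function names are hypothetical, the fit is an ordinary least-squares line from np.polyfit, and the 0.3 default is just the rule of thumb from this post, not a statistical standard.

```python
import matplotlib.pyplot as plt
import numpy as np


def linear_r_squared(x, y):
    """Fit a least-squares line and return (slope, intercept, r_squared)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)        # variation the line fails to explain
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation in y
    return slope, intercept, 1.0 - ss_res / ss_tot


def scatter_with_optional_trendline(x, y, min_r_squared=0.3):
    """Draw the scatter; add the line only if R-squared clears the cutoff."""
    slope, intercept, r2 = linear_r_squared(x, y)
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(x, y, alpha=0.5)
    if r2 >= min_r_squared:
        xs = np.array([np.min(x), np.max(x)])
        ax.plot(xs, slope * xs + intercept, color="darkred",
                label=f"Linear fit (R-squared = {r2:.2f})")
        ax.legend()
    else:
        # Be upfront that no line was drawn, and why
        ax.set_title(f"No trendline drawn (R-squared = {r2:.2f})")
    return ax, r2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    scatter_with_optional_trendline(x, 2 * x + rng.normal(size=200))
    scatter_with_optional_trendline(x, rng.normal(size=200))
    plt.show()
```

R-squared only measures how well a straight line fits, so it is still worth looking at the raw points before trusting the number.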

The Takeaway

Good scatter plots are not about being fancy. They are about being clear. Every choice you make, from colors to labels to point sizes, should serve the story you are trying to tell.

Next time you build a chart, run through this checklist:

Can I see individual clusters despite overlapping points?
Are my colors accessible and meaningful?
Do my labels include units?
Am I using size intentionally?
Does my trendline actually represent reality?

If you can answer yes to all five, you are already ahead of most people. Trust me on this one.
