🤿 Lost in a Sea of Datapoints? Do this.

👋 Hi!

While working on my ggplot2 uncharted project this week, I revisited many old posts from the R Graph Gallery. It reminded me of a classic issue in data visualization, and how to fix it.

It has probably happened to you already. It can lead to misleading conclusions. In the worst cases, it can make a chart completely unreadable.

The good news is: the fix is easy once you know it. So you won't need to worry about the FBI knocking on your door for producing a bad chart next time you face the problem! 🙃

The problem: overplotting

Overplotting happens when you have too many data points, so many that they start overlapping.

This often happens in scatterplots. When it does, the figure becomes messy, unclear, and sometimes downright misleading.

Do we have three groups here? Are there denser regions than others?

Hard to tell. Impossible, actually.

The solution: 2D density charts

2D density charts are a powerful, and often underused, way to deal with overplotting.

The idea is simple:
- split the 2D space of your chart into small areas,
- count how many points fall into each, and
- color those areas based on that count.
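
To make those three steps concrete, here is a minimal sketch in plain Python (standard library only). Everything in it is an assumption made for illustration: the `bin_2d` helper is hypothetical, the two-cluster dataset is made up, and the "coloring" is a crude text rendering rather than a real chart.

```python
import random
from collections import Counter

def bin_2d(points, x_range, y_range, n_bins):
    """Count how many (x, y) points fall into each cell of an n_bins x n_bins grid."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    cell_w = (x_max - x_min) / n_bins
    cell_h = (y_max - y_min) / n_bins
    counts = Counter()
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the plotting area
        # min(...) keeps points lying exactly on the upper edge in the last cell
        col = min(int((x - x_min) / cell_w), n_bins - 1)
        row = min(int((y - y_min) / cell_h), n_bins - 1)
        counts[(row, col)] += 1
    return counts

random.seed(0)
# Hypothetical data: two overlapping Gaussian clusters
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5000)]
points += [(random.gauss(3, 0.5), random.gauss(3, 0.5)) for _ in range(5000)]

# Steps 1 and 2: split the space into a 10 x 10 grid and count points per cell
counts = bin_2d(points, x_range=(-4, 6), y_range=(-4, 6), n_bins=10)

# Step 3: "color" each cell by its count (a denser character means a denser cell)
shades = " .:-=+*#%@"
max_count = max(counts.values())
for row in reversed(range(10)):  # print the top row first
    print("".join(shades[counts[(row, col)] * 9 // max_count] for col in range(10)))
```

A plotting library would do the same counting for you and map the counts to a color scale; the sketch only shows what happens under the hood.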

If the areas are squares, it's called a 2D histogram. Some people say "2D heatmap," but I don’t like that term much.

If you use hexagons instead of squares, it's a hexbin chart.

You can also go one step further and compute a smoothed density estimate, just as a regular density plot is the smoothed counterpart of a histogram. If you draw lines connecting areas of equal density, you get a contour plot.
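
As a rough illustration of what "smoothed" means here, the sketch below computes a 2D kernel density estimate by hand. It assumes a Gaussian kernel with a single bandwidth shared by both axes, which is only one common choice; the `gaussian_kde_2d` helper and the sample data are hypothetical.

```python
import math
import random

def gaussian_kde_2d(points, x, y, bandwidth):
    """Estimate the density at (x, y) as the average of Gaussian bumps centred on each point."""
    # Normalisation of an isotropic 2D Gaussian, divided by the number of points
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(points))
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-d2 / (2 * bandwidth ** 2))
    return norm * total

random.seed(0)
# Hypothetical data: one cluster centred on (0, 0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2000)]

# The estimate is high in the middle of the cluster and close to zero far away
print("density at (0, 0):", gaussian_kde_2d(points, 0, 0, bandwidth=0.5))
print("density at (4, 4):", gaussian_kde_2d(points, 4, 4, bandwidth=0.5))
```

Evaluating this function on a regular grid gives the values a smoothed density chart colors by, and contour lines join grid locations where the estimate takes the same value.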

Which one should you use?

Honestly, I don’t think there’s a strict rule.

If your data has strong patterns or clusters like in the first example, smoothed density plots and contour lines can make them very clear. But if your data is more evenly spread, these plots might add more confusion than clarity.

What really matters is this: play with the parameters.

Just like you’d adjust the bin size in a histogram to avoid missing key patterns, you should do the same here. Adjust the granularity of the grid, the bandwidth of the density estimator, or the opacity of your scatterplot points. It makes a big difference.
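
To see why the grid granularity matters, here is a small experiment in plain Python. It is a sketch on made-up data: the `grid_counts` helper, the 0 to 16 plotting range, and the two tight clusters are all assumptions chosen so the effect is easy to spot.

```python
import random
from collections import Counter

def grid_counts(points, n_bins, lo=0.0, hi=16.0):
    """Bin (x, y) points into an n_bins x n_bins grid covering [lo, hi] on both axes."""
    cell = (hi - lo) / n_bins
    counts = Counter()
    for x, y in points:
        if lo <= x <= hi and lo <= y <= hi:
            # min(...) keeps points lying exactly on the upper edge in the last cell
            col = min(int((x - lo) / cell), n_bins - 1)
            row = min(int((y - lo) / cell), n_bins - 1)
            counts[(row, col)] += 1
    return counts

random.seed(1)
# Hypothetical data: two tight clusters, 1000 points each
points = [(random.gauss(2.5, 0.1), random.gauss(2.5, 0.1)) for _ in range(1000)]
points += [(random.gauss(5.5, 0.1), random.gauss(5.5, 0.1)) for _ in range(1000)]

# Same data, three grid resolutions
for n_bins in (2, 16, 160):
    counts = grid_counts(points, n_bins)
    print(f"{n_bins:>3} bins per axis: {len(counts)} non-empty cells, "
          f"busiest cell holds {max(counts.values())} points")
```

With the coarsest grid both clusters land in the same cell, so the chart would show a single blob. With the finest grid the points are scattered over many small cells, each holding only a few of them, and the picture gets noisy again. The useful setting sits somewhere in between, which is why it pays to try several.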

That’s it! I hope this added a small boost to your dataviz toolbox!

And if you want to make these charts yourself, I’ve got tons of examples ready in R, Python, D3.js, and React.

The wind's about to pick up here, so it’s time for me to go wing-foiling.

See you next week!

Yan

PS: I'm making good progress on a new tool that lets you create a portfolio like mine in just a few minutes. I’ve got 2 tester seats left. Hit reply if you’d like early access!

Yan Holtz

Find me on X, LinkedIn, or check my Homepage

👋 By the way, here is how I can help!

Master R: Join my productive R workflow online course, already helping hundreds to excel in R, Quarto, and GitHub.
Team Training: Hire me to train your team on Data Visualization and Programming.
Engaging Talks: Book me for short, impactful talks on Data Visualization and Programming.

Check yan-holtz.com or hit reply any time! I love hearing from you.
