Skills Earth Sciences

Scatter plots

Scatter plots are the most basic type of graph. They display observations for two variables as a collection of points on the axes of variation. Scatter plots can be used to study the relationship or correlation between two quantitative variables. Measurements of x-axis variables do not have a consecutive order. If your data does have one, you are better off using a line graph (see Line graphs).

When you use a scatter plot to study the correlation between two variables, you can add a trend line or line of best fit to assess the strength of the correlation. Trend lines can be linear or follow another formula (power, quadratic, polynomial, etc.), depending on the type of relationship you expect and observe. However, always be aware that a correlation does not automatically indicate a causal link between your variables: an unknown third factor might play a role as well.

Always display the formula and the R-squared coefficient alongside your trend line. Furthermore, gridlines are most often left out of a scatter plot as it’s the scale of one variable that is of interest.
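As an illustration of where the formula and R-squared value come from, here is a minimal sketch of an ordinary least-squares fit for a linear trend line. The function name `linear_trend`, the variable names and the data values are made up for this example and are not taken from any real dataset. Most plotting software calculates these numbers for you.

```python
def linear_trend(xs, ys):
    """Least-squares line y = slope * x + intercept, plus R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - (residual variation / total variation).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Made-up illustrative values, not real measurements.
depth_m = [1.0, 2.0, 3.0, 4.0, 5.0]
temperature_c = [14.1, 12.2, 9.8, 8.1, 5.9]

slope, intercept, r2 = linear_trend(depth_m, temperature_c)
# Text you could place next to the trend line in the plot.
label = f"y = {slope:.2f}x + {intercept:.2f}, R² = {r2:.3f}"
print(label)
```

An R-squared value close to 1 means the line describes most of the variation in y. It says nothing about whether x causes y.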

Especially when you are dealing with large datasets, your data points may overlap. To visualise your data as accurately as possible, keep all your data points visible. You can do this by making the outline of your data points bold, keeping them small, or adding uncertainty bars.
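One possible way to keep overlapping points visible is sketched below with matplotlib, assuming that library is available. It draws small, unfilled markers with a clear outline. The data are randomly generated for illustration only, and the marker size and line width are arbitrary choices you may need to adjust.

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt

# Randomly generated example data (not real measurements).
random.seed(0)
x = [random.gauss(0, 1) for _ in range(500)]
y = [xi + random.gauss(0, 0.5) for xi in x]

fig, ax = plt.subplots()
# Small markers (s is the marker area in points squared), no fill,
# and a visible outline so that overlapping points stay distinguishable.
ax.scatter(x, y, s=12, facecolors="none", edgecolors="black", linewidths=1.0)
ax.set_xlabel("x variable (unit)")
ax.set_ylabel("y variable (unit)")
```

The figure can then be written to a file with `fig.savefig(...)`.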

Independent variables

Scatter plots differ from line graphs in that, for scatter plots, it is often unclear whether one of the variables depends on the other, and if so, which one. If you know which of your variables is the independent variable, plot this one on the x-axis.

If you want to plot the same variables for more than one dataset in one plot, e.g., data from different study sites, you can easily do this by giving the datasets different colours or symbols, or by using multiple windows/facets. Make sure to always label the datasets so your audience can easily see which data is from which dataset. When there is enough space in your graph, it is best to label your datasets in the plot instead of adding a separate legend.
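The following matplotlib sketch shows one way to do this, assuming that library is available: each dataset gets its own symbol and colour, and is labelled directly in the plot rather than in a separate legend. The site names and values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt

# Hypothetical study sites with made-up values.
sites = {
    "Site A": {"x": [1, 2, 3, 4], "y": [2.1, 3.9, 6.2, 7.8], "marker": "o"},
    "Site B": {"x": [1, 2, 3, 4], "y": [1.0, 1.8, 3.1, 4.2], "marker": "s"},
}

fig, ax = plt.subplots()
for name, d in sites.items():
    # Each call to scatter takes the next colour from the default colour cycle.
    ax.scatter(d["x"], d["y"], marker=d["marker"])
    # Label the dataset next to its last point instead of using a legend.
    ax.annotate(name, (d["x"][-1], d["y"][-1]),
                xytext=(5, 0), textcoords="offset points")
ax.set_xlabel("x variable (unit)")
ax.set_ylabel("y variable (unit)")
```

Using different symbols as well as different colours keeps the datasets distinguishable in greyscale prints and for colour-blind readers.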

For an example of a scatter plot with multiple windows and trend lines, see Fig. 5 from Marret et al. (2001).

There are three main ways to add a third quantitative variable (i.e. a third axis) to a scatter plot:

  1. The third variable can be represented by a colour gradient. However, be aware that it is easier for the human eye to interpret separate axes than colour spectra or sizes.
  2. You can also use different sizes of the data points to add a third variable. Your scatter plot is then called a bubble chart. However, your audience will need to interpret surface area instead of a location on an axis. Interpreting surface areas is less intuitive, so this may make understanding your graph more complicated than it needs to be.
  3. Use multiple windows with separate plots. This is often best because these are easier to interpret.
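The three options above can be sketched with matplotlib as follows, assuming that library is available. The values of `x`, `y` and the third variable `z` are made up, and the class boundary of 30 used to split the data into windows is an arbitrary choice for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt

# Made-up values; z is the third quantitative variable.
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 2.9, 4.2, 4.8, 6.1, 7.0]
z = [10, 25, 15, 40, 30, 50]

fig, (ax_colour, ax_bubble) = plt.subplots(1, 2, figsize=(10, 4))

# 1. Colour gradient: z is mapped onto a colour map, explained by a colour bar.
points = ax_colour.scatter(x, y, c=z, cmap="viridis")
fig.colorbar(points, ax=ax_colour, label="z (unit)")
ax_colour.set_title("Colour gradient")

# 2. Bubble chart: the marker area (s, in points squared) scales with z.
ax_bubble.scatter(x, y, s=[5 * v for v in z], alpha=0.6, edgecolors="black")
ax_bubble.set_title("Bubble chart")

# 3. Multiple windows: one panel per class of z, with shared axes so the
#    panels can be compared directly.
fig2, facet_axes = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(8, 4))
classes = {"z < 30": lambda v: v < 30, "z >= 30": lambda v: v >= 30}
for ax, (title, in_class) in zip(facet_axes, classes.items()):
    xs = [xi for xi, zi in zip(x, z) if in_class(zi)]
    ys = [yi for yi, zi in zip(y, z) if in_class(zi)]
    ax.scatter(xs, ys)
    ax.set_title(title)
```

Note that splitting into windows turns the continuous third variable into classes, so the class boundaries should be chosen deliberately and stated in the panel titles.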