Skills Earth Sciences

Creating figures (graphs) using programming languages

Using a programming language to create your figures might initially take more time than creating them in Excel, especially if you do not have any programming experience. However, making your graphs with a programming language has several advantages over MS Excel. It is worth the time to learn, especially if you want to make more complicated figures, e.g., during your MSc thesis or when you are writing a manuscript for publication. Note that some supervisors are fine with Excel figures, but others are not, so discuss this with your supervisor. They might also have tips on which programs and packages work best for your data and research.

The main advantages of using a programming language for creating figures are:

  • Using a programming language will give you a lot more freedom in the layout and format of your figure.
  • Using code to make your graphs allows you to save a script containing the commands that produce a graph. This makes your workflow more transparent. Moreover, you can reuse the same script later when you want to make a similar figure with a different dataset, which saves you time.
  • A lot of packages and libraries for specific scientific fields and graph types are available on the internet, including help pages.
  • Some languages allow you to create 3D or animated figures and maps.
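As an illustration of the reusable-script advantage, here is a minimal sketch in Python using the Matplotlib library. The function name `plot_depth_profile`, the example values, and the output file names are all hypothetical and only serve to show the idea: the plotting commands are written once and then applied to two different datasets.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: saves files without opening a window
import matplotlib.pyplot as plt


def plot_depth_profile(depths_m, values, value_label, output_path):
    """Plot a measured variable against depth and save the figure to a file.

    Hypothetical helper for illustration; adapt the labels and layout to your data.
    """
    output_path = Path(output_path)
    fig, ax = plt.subplots(figsize=(4, 6))
    ax.plot(values, depths_m, marker="o")
    ax.invert_yaxis()  # draw depth increasing downwards
    ax.set_xlabel(value_label)
    ax.set_ylabel("Depth (m)")
    fig.savefig(output_path, dpi=300, bbox_inches="tight")
    plt.close(fig)  # free the memory used by the figure
    return output_path


# Made-up example data: the same function produces both figures
depths = [0, 5, 10, 15, 20]
plot_depth_profile(depths, [12.1, 11.4, 9.8, 8.9, 8.5],
                   "Temperature (°C)", "temperature_profile.png")
plot_depth_profile(depths, [34.2, 34.5, 34.9, 35.0, 35.1],
                   "Salinity (PSU)", "salinity_profile.png")
```

Because the layout lives in one function, a change such as a different font size or figure size only has to be made in one place to update every figure made with it.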

Several programming languages exist, and people prefer different ones. Each language has its own focus and advantages. Useful packages for the three most used languages at the Earth Sciences department at Utrecht University are discussed below.

Python does not have its own built-in function for data visualisation. There are, however, several plotting libraries available that allow you to create high-quality figures. This gives you many possibilities for creating graphs. An extensive library with example plots and corresponding code is available here. When you use the Python software (see Programming module), the website also introduces the Python syntax to produce such figures yourself.

Libraries:

  • The Matplotlib library has functions for 2D and 3D plotting, and for interactive figures. It provides a lot of freedom for data visualisation. Various other data visualisation libraries are built on Matplotlib.
  • Pandas is developed for data manipulation and data cleaning, but also has some visualisation features. Its plotting functions are built on Matplotlib but have a more user-friendly interface, so you need to write less code to make your graphs.
  • Seaborn is suitable for advanced statistical data visualisation. It has many built-in functions and can be useful for visualising time series. It also has some built-in styling formats.
  • Ggplot for Python is based on the ggplot2 package for R, which is built on the grammar of graphics. This approach makes sure your graph contains some basic elements and simplifies your code.
  • Plotly is an online platform for data visualisation, but also has libraries for Python and R. It allows you to create interactive plots and 3D graphs.
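To make the difference between Matplotlib and pandas plotting more concrete, the sketch below draws the same bar chart twice. The rainfall values, column names, and output file name are made up for illustration. It assumes that both Matplotlib and pandas are installed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: saves files without opening a window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly rainfall data, for illustration only
rainfall = pd.DataFrame(
    {"month": ["Jan", "Feb", "Mar", "Apr"], "rainfall_mm": [68, 53, 44, 41]}
)

# One figure with two panels side by side
fig, (ax_mpl, ax_pd) = plt.subplots(1, 2, figsize=(8, 3))

# Matplotlib: pass the columns explicitly and set each element yourself
ax_mpl.bar(rainfall["month"], rainfall["rainfall_mm"])
ax_mpl.set_ylabel("Rainfall (mm)")
ax_mpl.set_title("Matplotlib")

# pandas: one call on the DataFrame draws the bars on a Matplotlib Axes
rainfall.plot.bar(x="month", y="rainfall_mm", ax=ax_pd, legend=False)
ax_pd.set_ylabel("Rainfall (mm)")
ax_pd.set_title("pandas")

fig.tight_layout()
fig.savefig("rainfall_comparison.png", dpi=300)
```

Since the pandas plot is drawn on a regular Matplotlib Axes, you can still fine-tune it with Matplotlib commands afterwards, as is done here for the y-axis label and title.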

Python and the libraries mentioned above are all open source. For more information on making figures in Python, you can take a look at the Programming module. Many free tutorials, including complete beginner courses, are available online.

R is a programming language focused on statistical computing and data analysis. It is often used through RStudio, a program that offers a more user-friendly interface for working with R. R has several built-in functions for data visualisation, and various additional data visualisation packages are available.

Packages:

  • ggplot2 is one of the most popular visualisation packages. It is based on the grammar of graphics (R4DS). This package makes creating custom plots fairly easy, because it takes care of small details so that you do not have to write commands for those.
  • lattice is particularly useful for visualising multivariate data. It allows you to create plots with different variables in panels.
  • analogue is developed to work with paleoecological data. It is particularly useful for creating species composition diagrams as part of stratigraphic plots.
  • RColorBrewer helps you to easily manipulate colours in your graphs. It has several sequential, diverging, and qualitative colour palettes.
  • Plotly is an online platform for data visualisation, but also has libraries for Python and R. It allows you to create interactive plots and 3D graphs.

R, RStudio and all packages mentioned above are open source. Many free tutorials and books are available online.

Tutorials/books:

  • The website ‘R for data science’ offers a good, free e-book with step-by-step tutorials that teach you the most important data science tools in R, including a chapter on basic visualisation techniques using ggplot2.
  • The website ‘R for Paleolimnology’ offers similar step-by-step tutorials for visualising paleolimnological data, including creating stratigraphic charts.

MATLAB is mainly used for data analysis and modelling. The MATLAB visualisation app allows you to visualise data, and to view and explore it using interactive visualisations of MATLAB plots.

In contrast to Python and R, MATLAB is not open source and you need a licence to use it. As a UU student, you can use it for free through SolisWorkspace or on UU computers. You can also download it for free by using your @students.uu.nl email address. You can find more information on this here.