Skills Earth Sciences

A short remark on data transformation

A major part of data visualisation occurs prior to the visualisation itself: to visualise your raw data as a graph or map, you will usually need to transform it into a suitable form first. Data transformation is beyond the scope of this module, as it could fill a complete module of its own, but it is important to think about it whenever you want to visualise data.

Data transformation can consist of various steps, e.g., importing data, renaming variables, reordering observations, filtering and selecting data, and grouping. Each of these steps requires you to make decisions. As these decisions affect the way your audience sees and interprets your data (and, in the case of exploratory graphs, the way you interpret your own data), you will have to think about them carefully to avoid misleading your audience and yourself. Data visualisation is therefore strongly connected to scientific integrity. Additionally, it is easy to lose track of the changes you have made during data transformation. It is therefore important to log every change you make to your raw data, and to always keep a copy of the original data. Keep this in mind when you are making figures yourself.
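To make the steps listed above more concrete, here is a minimal sketch in Python, using only the standard library. Everything specific in it is an assumption made for illustration: the data set, the column names (`site`, `T_degC`, `depth_m`) and the use of -999 as a missing-value code are all hypothetical. The point is only to show one possible way of transforming data while leaving the raw data untouched and logging each change.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw data (made up for this example). In practice this would be
# read from a file that is never overwritten.
RAW_CSV = """site,T_degC,depth_m
B,11.5,20
A,12.0,5
A,14.0,10
B,-999,15
"""

log = []  # plain-text record of every change made to the raw data

# 1. Import: read the raw data into a list of dictionaries.
raw = list(csv.DictReader(io.StringIO(RAW_CSV)))
log.append(f"imported {len(raw)} rows")

# 2. Rename and select: build new rows, so that `raw` itself stays unchanged.
data = [{"site": r["site"], "temperature": float(r["T_degC"])} for r in raw]
log.append("renamed T_degC to temperature; dropped depth_m")

# 3. Filter: treating -999 as a missing-value code is an assumption about
#    this made-up data set, and exactly the kind of decision worth logging.
kept = [r for r in data if r["temperature"] != -999]
log.append(f"removed {len(data) - len(kept)} rows with temperature == -999")

# 4. Reorder: sort observations by site, then by temperature.
kept.sort(key=lambda r: (r["site"], r["temperature"]))
log.append("sorted by site, then temperature")

# 5. Group: collect temperatures per site and average them.
groups = defaultdict(list)
for r in kept:
    groups[r["site"]].append(r["temperature"])
site_means = {site: mean(values) for site, values in groups.items()}
log.append("grouped by site and averaged temperature")

print(site_means)
print("\n".join(log))
```

Note that the choice made in step 3 changes the mean for site B. A reader who only sees the final graph cannot know this unless the decision has been recorded, which is why the log is kept alongside the unchanged raw data.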

If you want to learn more about data transformation, we recommend chapter 5, "Data transformation", of the e-book R for Data Science by Hadley Wickham and Garrett Grolemund.