As a student, you may need to collect data for your university assignments. This requires good data practice.
Good data practice involves organized, professionally informed, ethical and legal collection, storage and management of data.
Having a good data practice enhances the transparency and credibility of your academic work. It makes it easier for you to locate your data and reduces the risk of data loss and unauthorized access to your information.
Data practice is also known as data management.
Good data practice requires you to consider how you will plan, collect, organise, document, store, and delete data when it is no longer needed. Additionally, you must familiarise yourself with any regulations and guidelines that may apply when working with data.
In the following sections, you will learn more about what this entails and find tips for developing a good data practice.
During the planning phase, you should reflect on how you will collect, structure, organise, store, and delete data. This is essential for ensuring that you manage your data effectively and for protecting the data and informants involved.
It may be beneficial to document your decisions in a logbook or similar format.
The type of data you collect will depend on your academic discipline, the subject you are investigating, and the methods you are using. Common categorisations of data include:
It is possible to find open data that you can use for your assignments. These are often published by public institutions, organisations, or researchers.
You can find data in various repositories:
You can search for repositories at Re3data.org, a database of available data repositories. Here, you can filter repositories based on criteria relevant to your assignment.
Once you have collected data, it is important to think about how you will organise it. This includes how you will name your files and structure your folders, which makes your dataset easier to navigate.
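As a minimal sketch, assuming you work in Python, the snippet below creates a simple, numbered folder layout for an assignment. The folder names in `FOLDERS` and the function `create_project_structure` are hypothetical; adapt them to your own project.

```python
from pathlib import Path

# Hypothetical layout: adapt the folder names to your own assignment.
FOLDERS = [
    "01_planning",
    "02_data/raw",        # original, untouched data
    "02_data/processed",  # cleaned or pseudonymised versions
    "03_analysis",
    "04_documentation",
]


def create_project_structure(root):
    """Create the folder layout under `root` and return the folder paths."""
    base = Path(root)
    created = []
    for folder in FOLDERS:
        path = base / folder
        # parents=True creates intermediate folders; exist_ok=True makes
        # the function safe to run more than once.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_project_structure("my_assignment")` creates the folders under `my_assignment`. Numbering the top-level folders keeps them in a fixed order when they are sorted by name.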
Data is stored in files, which come in various formats. A good practice is to save your files in formats that can be opened by anyone, regardless of whether they have access to specific software. This approach not only benefits others but also serves your future needs.
Examples of open file formats include plain text files (.txt), PNG files (.png), and CSV files (.csv). In contrast, proprietary formats, such as Word documents (.doc or .docx) and Excel spreadsheets (.xlsx), may limit accessibility.
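If your data starts out in a spreadsheet or a script, you can save it in an open format. The sketch below, assuming Python and its built-in `csv` module, writes a list of records to a CSV file; the example records and the helper name `save_as_csv` are hypothetical.

```python
import csv

# Hypothetical survey responses; replace with your own data.
rows = [
    {"respondent": "R001", "age_group": "20-29", "answer": "Agree"},
    {"respondent": "R002", "age_group": "30-39", "answer": "Disagree"},
]


def save_as_csv(records, path):
    """Write a list of dicts to a UTF-8 CSV file with a header row."""
    fieldnames = list(records[0].keys())
    # newline="" is recommended by the csv module documentation so that
    # line endings are handled correctly on all operating systems.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
```

The resulting file can be opened in any text editor or spreadsheet program, including free ones.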
Aarhus University provides Microsoft 365 to all students. There are also free, open-source alternatives to commercial software, such as LibreOffice.
Keep your files organised by using meaningful file names, a logical folder structure, and a consistent file naming system.
By maintaining consistency in your file naming, you increase the chances of easily locating the correct file when you or others need it.
Computers typically sort files in File Explorer (PC) or Finder (Mac) either alphabetically or numerically. Therefore, it’s advisable to place the most important information at the beginning of the file name.
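The sketch below illustrates why the order of information matters. It assumes a hypothetical convention of `date_project_description_version` and builds names with a helper, `make_file_name`, which is not part of any standard. Writing dates as YYYY-MM-DD and zero-padding version numbers makes alphabetical sorting match chronological order.

```python
from datetime import date


def make_file_name(day, project, description, version, ext):
    """Build a name such as 2024-03-05_thesis_interview_v02.txt (hypothetical convention)."""
    # isoformat() gives YYYY-MM-DD; :02d pads the version to two digits (v02, v10).
    return f"{day.isoformat()}_{project}_{description}_v{version:02d}.{ext}"


names = [
    make_file_name(date(2024, 3, 5), "thesis", "interview", 2, "txt"),
    make_file_name(date(2023, 11, 20), "thesis", "interview", 1, "txt"),
    make_file_name(date(2024, 1, 15), "thesis", "survey", 10, "csv"),
]

for name in sorted(names):
    print(name)
# 2023-11-20_thesis_interview_v01.txt
# 2024-01-15_thesis_survey_v10.csv
# 2024-03-05_thesis_interview_v02.txt

# By contrast, day-first dates such as 05-03-2024, 20-11-2023 and 15-01-2024
# sort as 05-03-2024, 15-01-2024, 20-11-2023, which is not chronological.
```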
Examples of naming conventions:
Documenting data means providing sufficient information about your dataset. This ensures that you or others can understand, interpret, and utilize the data at a later point in time.
Depending on the context and the type of research you are conducting, you may include the following information:
Documentation can be recorded in various formats, including:
The above section is based on the data management section from the University of Copenhagen’s learning resources for digital literacy, 2023. CC-BY-NC-SA.
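The documentation described above can be as simple as a plain-text file stored next to the data. As a minimal sketch, assuming Python, the function below writes a `README.txt` for a dataset. The fields it records (title, creator, date, description, variables) and the name `write_readme` are illustrative assumptions; include whatever information your discipline and data require.

```python
from datetime import date
from pathlib import Path


def write_readme(folder, title, creator, description, variables):
    """Write a plain-text README.txt describing a dataset (hypothetical fields)."""
    lines = [
        f"Title: {title}",
        f"Creator: {creator}",
        f"Date: {date.today().isoformat()}",
        f"Description: {description}",
        "",
        "Variables:",
    ]
    # One line per variable: its name followed by a short explanation.
    for name, meaning in variables.items():
        lines.append(f"  {name}: {meaning}")
    path = Path(folder) / "README.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Because the README is plain text, it remains readable without any particular software, in line with the advice on open file formats.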
When you collect data, you must also decide where to store it. There are various options available, such as:
It is your responsibility to ensure that the data you are working with is stored securely to prevent data loss or unauthorized access. Depending on the types of data you are handling and how they are classified, different levels of security may be required for storage.
Aarhus University offers OneDrive as a secure location for data storage. You can store all types of data there, as long as the data is pseudonymised or anonymised.
Read more about data storage on Aarhus University’s website about processing personal data.
It is also important to remember to back up your data or ensure that the storage infrastructure you are using does so for you. For instance, if you store your data on OneDrive, backups of files and folders are created automatically, allowing you to access them even if you lose your computer. Conversely, if your data is only stored on your computer’s hard drive and it is lost, the data will be irretrievable.
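If you make backups yourself, for instance to an external drive, it is worth checking that each copy is complete. The sketch below, assuming Python's standard `shutil` and `hashlib` modules, copies a file to a backup folder and compares SHA-256 checksums of the original and the copy. The function names `sha256_of` and `backup_file` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source, backup_dir):
    """Copy `source` into `backup_dir` and confirm the copy is identical."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    # Identical checksums mean the copy has exactly the same content.
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"Backup of {source} does not match the original")
    return target
```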
A golden rule for backups is known as the 3-2-1 rule. This principle states that you should keep 3 copies of your data on 2 different types of storage media, with 1 copy stored off-site, for example in a cloud solution like OneDrive.
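The rule can be expressed as a simple checklist. The sketch below is an illustration only: the `Copy` class and the `follows_3_2_1` function are hypothetical, and how you describe your storage media is up to you.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    location: str   # e.g. "laptop", "USB drive", "OneDrive"
    medium: str     # e.g. "internal disk", "external disk", "cloud"
    offsite: bool   # True if stored away from the other copies


def follows_3_2_1(copies):
    """Check the 3-2-1 rule: 3 copies, 2 types of media, 1 copy off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    one_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and one_offsite


my_copies = [
    Copy("laptop", "internal disk", False),
    Copy("USB drive", "external disk", False),
    Copy("OneDrive", "cloud", True),
]
print(follows_3_2_1(my_copies))  # True
```

Three copies on the same laptop would fail the check, because they share one medium and none of them is off-site.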
As a student, you are responsible for your own data management, which means it is your duty to comply with the law and the university's guidelines.
Depending on the type of data you collect, it may be necessary to familiarise yourself with various legal frameworks.
If your data includes recognisable living individuals (or individuals who have passed away within the last 10 years), it contains personally identifiable information, and you must comply with the EU’s General Data Protection Regulation (GDPR).
Note that multiple regulations may apply simultaneously.
For example, it is possible to have an image that is both copyrighted and contains recognisable living individuals (or individuals who have passed away within the last 10 years).
If your dataset includes works created by others, such as newspaper articles or photographs, these works may be protected by copyright.
If a work is protected by copyright and no agreement or license grants you permission to use it, you must obtain permission from the rights holder yourself.
As a student, you may use data that is not protected by copyright, or data that has been published in an open repository. If a creator has released a work under a Creative Commons license, you can use that work in accordance with the terms of the CC license.
There are many types of data that may contain personally identifiable information, such as interviews, surveys, images, and more.
Personally identifiable information is information that can be used to identify a specific individual, either directly (e.g., a name) or indirectly (e.g., a combination of workplace, age and gender).
If your data contains personally identifiable information about living individuals (or individuals who have passed away within the last 10 years), you must comply with the EU’s GDPR.
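Some direct identifiers follow predictable patterns and can be searched for automatically. The sketch below, assuming Python's `re` module, flags email addresses and strings shaped like Danish CPR numbers (DDMMYY-XXXX). The patterns and the function `find_identifiers` are hypothetical and deliberately simple: they will not find names, addresses or indirect identifiers, and they may flag numbers that are not CPR numbers, so a manual review is still needed.

```python
import re

# Hypothetical, simplified patterns for two kinds of direct identifiers.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "cpr_number": re.compile(r"\b\d{6}-?\d{4}\b"),  # shape of DDMMYY-XXXX
}


def find_identifiers(text):
    """Return (kind, matched_text) pairs for every pattern match in `text`."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group()))
    return hits


print(find_identifiers("Contact anna@example.com, CPR 010190-1234."))
```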
Different types of personal data require varying levels of security for storage. Read more about Aarhus University’s classification of data.
It is legally required and a central part of good academic practice to obtain informed consent when collecting data about identifiable individuals (e.g., interview materials, images, and observations). You must be able to document this consent in writing. Aarhus University provides a template for a consent form that you can use.
You can choose to pseudonymise or anonymise your data.
When you pseudonymise your data, you create a confidential record that allows you to identify individuals again, for instance, by assigning each person a numerical code or a pseudonym. It is not sufficient to simply assign a numerical code or pseudonym to each individual; you must also remove any identifying characteristics from the dataset that could allow individuals to be identified without your record.
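As a minimal sketch, assuming your data is a list of Python dictionaries, the function below replaces each person's name with a code and drops other direct identifiers. The field names and the function `pseudonymise` are hypothetical. The returned `key` is the confidential record; store it separately from the dataset and with appropriate security.

```python
def pseudonymise(records, name_field, identifying_fields):
    """Replace each name with a code and drop other direct identifiers.

    Returns the pseudonymised records and a key (code -> name). The key is
    the confidential record and must be stored apart from the dataset.
    """
    key = {}
    codes_by_name = {}
    cleaned = []
    for record in records:
        name = record[name_field]
        # The same person always receives the same code.
        if name not in codes_by_name:
            code = f"P{len(codes_by_name) + 1:03d}"
            codes_by_name[name] = code
            key[code] = name
        row = {k: v for k, v in record.items()
               if k != name_field and k not in identifying_fields}
        row["participant"] = codes_by_name[name]
        cleaned.append(row)
    return cleaned, key
```

Sequential codes such as P001 reveal the order in which people were recorded; random codes avoid this. Free-text fields, such as interview answers, may still mention names or places and must be checked by hand.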
When you anonymise data, you delete all information that could be used to identify individuals. This means you do not create a record that allows you to identify the individuals again. Proper anonymisation is therefore irreversible.
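Anonymisation often combines deleting direct identifiers with generalising details that could identify someone in combination, such as an exact age. The sketch below, assuming records stored as Python dictionaries, does both and keeps no key. The function names and the 10-year age bands are hypothetical choices; running code like this does not by itself guarantee that the result is anonymous, since the remaining attributes may still single out individuals.

```python
def age_band(age):
    """Generalise an exact age into a 10-year band, e.g. 27 -> '20-29'."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


def anonymise(records, direct_identifiers):
    """Drop direct identifiers and generalise age; no key is created."""
    result = []
    for record in records:
        row = {k: v for k, v in record.items() if k not in direct_identifiers}
        if "age" in row:
            row["age"] = age_band(row["age"])
        result.append(row)
    return result


data = [{"name": "Anna", "email": "anna@example.com", "age": 27, "answer": "Agree"}]
print(anonymise(data, {"name", "email"}))  # [{'age': '20-29', 'answer': 'Agree'}]
```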
Once data has been irreversibly anonymised, it is no longer considered personal data and is not subject to data protection laws.
In some cases, anonymising your data may therefore be advantageous.
Aarhus University offers various tips on how to pseudonymise and anonymise data correctly.
Of course, you are also welcome to contact your local AU Library.
AU Library offers courses and workshops on computer programs and tools designed to support and motivate all students, researchers, and instructors at AU in their work with data.
These include tools such as R, Python, Whisper, Transcriber, NVivo, Voyant, VOSviewer, LSEG Workspace, Orbis, and more.
Additionally, the library hosts both open and tailored courses on good data practice for students who wish to work with data in a systematic, conscious, and structured way. All courses are open to all students, regardless of their faculty affiliation.