Good data practices

As a student, you may need to collect data for your university assignments. This requires good data practice.

Good data practice means collecting, storing and managing data in a way that is organised, professionally informed, ethical and legal.

Good data practice enhances the transparency and credibility of your academic work. It makes it easier for you to locate your data and reduces the risk of data loss and unauthorised access to your information.

Data practice is also known as data management.

Good data practice

Good data practice requires you to consider how you will plan, collect, organise, document, and store data, and how you will delete it when it is no longer needed. You must also familiarise yourself with any regulations and guidelines that apply when working with data.

In the following sections, you will learn more about what this entails and find tips for developing good data practice.

Planning data collection

During the planning phase, you should reflect on how you will collect, structure, organise, store, and delete data. This is essential for ensuring that you manage your data effectively and for protecting the data and informants involved.

It may be beneficial to document your decisions in a log book or similar format.

Data collection

The type of data you collect will depend on your academic discipline, the subject you are investigating, and the methods you are using. Common categorisations of data include:

Primary and secondary data

Primary data refers to information collected directly from the source, such as through observations, surveys, interviews, or experiments.
Conversely, secondary data consists of information gathered in a context other than the current study. This data is often processed and organised in a meaningful way and can be sourced from books, articles, reports, statistics, or public databases.

Qualitative and quantitative data

Qualitative data describes the characteristics of a person, object, or situation. This can include feelings, attitudes, opinions, experiences, or descriptions. Qualitative data is usually not expressed in numbers; it typically takes the form of text, images, or audio. When collecting qualitative data, for example through interviews, you may need a voice recorder. AU Library lends voice recorders to students.
Quantitative data, on the other hand, describes characteristics that can be counted or measured and expressed as numbers. Examples include counts, lengths, weights, temperatures, and times.

Finding data collected by others

It is possible to find open data that you can use for your assignments. These are often published by public institutions, organisations, or researchers.

You can find data in various repositories:

Institutional repositories contain data from researchers at their respective universities.
Field-specific repositories include data from one or a few related fields but may be open to all.
General repositories cater to everyone, regardless of institution, discipline, or data type.

You can search for repositories at Re3data.org, a database of available data repositories. Here, you can filter repositories based on criteria relevant to your assignment.

A widely used general repository is Zenodo.

Organising data

Once you have collected data, it is important to think about how you will organise it. This includes how you will name and structure your files and folders, which makes it easier to navigate your dataset.

Data formats

Data is stored in files, which come in various formats. A good practice is to save your files in formats that can be opened by anyone, regardless of whether they have access to specific software. This approach not only benefits others but also serves your future needs.

Examples of open file formats include plain text files (.txt), PNG files (.png), and CSV files (.csv). In contrast, proprietary formats, such as Word documents (.doc or .docx) and Excel spreadsheets (.xlsx), may limit accessibility because they are tied to specific software.
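As an illustration of saving data in an open format, the following is a minimal Python sketch that turns a small table into CSV text using only the standard library. The survey rows, the column names, and the function names `to_csv_text` and `save_csv` are made up for this example; adapt them to your own data.

```python
import csv
import io

# Hypothetical survey results; the column names and values are invented.
rows = [
    {"respondent": "R001", "age": 24, "answer": "agree"},
    {"respondent": "R002", "age": 31, "answer": "disagree"},
]


def to_csv_text(records):
    """Return the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def save_csv(records, path):
    """Write the records to a UTF-8 encoded .csv file at the given path."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(to_csv_text(records))


print(to_csv_text(rows))
```

Calling `save_csv(rows, "survey_results.csv")` would produce a file that any text editor or spreadsheet program can open, without requiring a specific software licence.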

Aarhus University provides Microsoft 365 to all students. There are also alternatives to commercial software.

File structure and naming conventions

Keep your files organised by using meaningful names, logical folder structures, and a consistent file naming system.

Consistent file names make it more likely that you or others can quickly locate the correct file when it is needed.

Computers typically sort files in File Explorer (PC) or Finder (Mac) either alphabetically or numerically. Therefore, it’s advisable to place the most important information at the beginning of the file name.

Examples of naming conventions:

[initials]_[method]_[topic]_[YYYYMMDD]_[version]_[xxx]
[project#]_[method]_[version]_[YYYYMMDD]_[xxx]
[filetype]_[initials]_[date]_[xxx]
[YYYYMMDD]_[chapter name]_[document name]_[version]
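A naming convention is easier to follow consistently if the names are generated rather than typed by hand. The sketch below builds a name in the style of the first convention above. The function name `build_file_name`, the `v02` version format, and the use of hyphens instead of spaces are assumptions made for this example, and the trailing [xxx] element is left out because its meaning depends on your own convention.

```python
from datetime import date


def build_file_name(initials, method, topic, day, version, extension):
    """Build a name like [initials]_[method]_[topic]_[YYYYMMDD]_[version]."""
    parts = [initials, method, topic, day.strftime("%Y%m%d"), f"v{version:02d}"]
    # Replace spaces inside each part so that underscores only separate parts.
    cleaned = [part.strip().replace(" ", "-") for part in parts]
    return "_".join(cleaned) + "." + extension


name = build_file_name("AB", "interview", "student housing", date(2025, 3, 14), 2, "txt")
print(name)

# Names that start with a YYYYMMDD date sort chronologically
# when they are sorted alphabetically.
dated_names = ["20250314_notes.txt", "20241201_notes.txt", "20250102_notes.txt"]
print(sorted(dated_names))
```

Writing the date as YYYYMMDD and padding version numbers with a leading zero keeps alphabetical and chronological order the same, which fits how file managers sort files.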

Data documentation

Documenting data means providing sufficient information about your dataset. This ensures that you or others can understand, interpret, and use the data at a later point in time.

Depending on the context and the type of research you are conducting, you may include the following information:

Details about the equipment used, such as brand and model, settings, and calibration information
Information about the methods or theories applied
Text for questionnaires, interview templates, discussion guides, etc.
Information about who collected the data and when
Key features of the methodology, such as sampling techniques, whether the experiment was blinded, and how participants were identified and grouped
Legal and ethical agreements regarding the data, such as consent forms, data licenses, and approval documents
References to any secondary data you have used
Details about file formats
Information about the software used to generate or process data, including version numbers and platforms

Documentation can be recorded in various formats, including:

A README file: A structured text file where you describe your dataset and explain how it was collected and analysed
A log book: A place to record your observations, interpretations, and empirical data
A code book: A document where you define the variables you are using, their relationships, units of measurement, and how you note any gaps in the dataset

The above section is based on the data management section from the University of Copenhagen’s learning resources for digital literacy, 2023. CC-BY-NC-SA.
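A code book of the kind described above can also be kept in a machine-readable form, which makes it possible to check that every column in a dataset is documented. The following is a minimal sketch and not part of the material cited above; the variables, units, the missing-value code `-9`, and the function name `undocumented_columns` are all invented for illustration.

```python
# Hypothetical code book: one entry per variable in the dataset.
CODE_BOOK = [
    {"variable": "age", "description": "Age of respondent",
     "unit": "years", "missing_code": "-9"},
    {"variable": "height_cm", "description": "Self-reported height",
     "unit": "cm", "missing_code": "-9"},
    {"variable": "satisfaction", "description": "Satisfaction with housing",
     "unit": "scale from 1 (low) to 5 (high)", "missing_code": "-9"},
]


def undocumented_columns(data_columns, code_book):
    """Return the data columns that have no entry in the code book."""
    documented = {entry["variable"] for entry in code_book}
    return [column for column in data_columns if column not in documented]


# Column names as they might appear in the header of a data file.
columns_in_data = ["age", "height_cm", "income"]
print(undocumented_columns(columns_in_data, CODE_BOOK))
```

Here the column `income` is reported because it has no code book entry, which is a prompt to document it before sharing or archiving the data.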

Data storage

When you collect data, you must also decide where to store it. There are various options available, such as:

Hard drives on computers
External hard drives
USB flash drives
Servers
Cloud solutions

It is your responsibility to ensure that the data you are working with is stored securely to prevent data loss or unauthorised access. Depending on the types of data you are handling and how they are classified, different levels of security may be required for storage.

Aarhus University offers OneDrive as a secure location for data storage. You can store all types of data there, as long as the data is pseudonymised or anonymised.

Backup

It is also important to back up your data, or to make sure that the storage service you use does so for you. For instance, if you store your data on OneDrive, backups of files and folders are created automatically, so you can still access them if you lose your computer. Conversely, if your data is stored only on your computer’s hard drive and the computer is lost, the data cannot be recovered.

A golden rule for backups is known as the 3-2-1 rule. This principle states that you should keep 3 copies of your data on 2 different types of storage media, with 1 copy kept off-site, for example in a cloud solution like OneDrive.
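The 3-2-1 rule can be written down as a simple check on a backup plan. The sketch below is only an illustration: the `DataCopy` class, its fields, and the example plan are hypothetical, and the cloud copy is modelled as a copy flagged as stored away from your own devices.

```python
from dataclasses import dataclass


@dataclass
class DataCopy:
    location: str   # where the copy lives, e.g. "laptop" (hypothetical label)
    medium: str     # type of storage, e.g. "internal disk", "external drive", "cloud"
    off_site: bool  # True if kept away from your own devices, e.g. in OneDrive


def satisfies_3_2_1(copies):
    """Check for 3 copies, on 2 different media, with at least 1 kept off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({copy.medium for copy in copies}) >= 2
    one_off_site = any(copy.off_site for copy in copies)
    return enough_copies and enough_media and one_off_site


plan = [
    DataCopy("laptop", "internal disk", False),
    DataCopy("desk drawer", "external drive", False),
    DataCopy("OneDrive", "cloud", True),
]
print(satisfies_3_2_1(plan))
```

A plan with only the laptop and the external drive would fail the check, both because it has two copies and because nothing is stored away from your own devices.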

Rules and guidelines

As a student, you are responsible for your own data management, which means it is your duty to comply with the law and the university's guidelines.

Depending on the type of data you collect, it may be necessary to familiarise yourself with various legal frameworks.

If your data includes recognisable living individuals (or individuals who have passed away within the last 10 years), it contains personally identifiable information, and you must comply with the EU’s General Data Protection Regulation (GDPR).

Note that multiple regulations may apply simultaneously.

For example, it is possible to have an image that is both copyrighted and contains recognisable living individuals (or individuals who have passed away within the last 10 years).

Data and copyright

If your dataset includes works created by others, such as newspaper articles or photographs, these works may be protected by copyright.

If a work is protected by copyright and there is no agreement or licence that already permits your use, you must obtain permission from the rights holder yourself.

As a student, you may use data that is not protected by copyright, or data that has been published in an open repository. If a creator has released a work under a Creative Commons license, you can use that work in accordance with the terms of the CC license.

GDPR regulations

Many types of data may contain personally identifiable information, such as interviews, surveys, and images.

Personally identifiable information is information that can be used to identify a specific individual.

If your data contains personally identifiable information about living individuals (or individuals who have passed away within the last 10 years), you must comply with the GDPR.

Aarhus University has clear guidelines on handling personally identifiable information in assignments, which you are required to follow.

Different types of personal data require varying levels of security for storage. Read more about Aarhus University’s classification of data.

Consent forms

It is legally required and a central part of good academic practice to obtain informed consent when collecting data about identifiable individuals (e.g., interview materials, images, and observations). You must be able to document this consent in writing. Aarhus University provides a template for a consent form that you can use.

Pseudonymisation and anonymisation

You can choose to pseudonymise or anonymise your data.

When you pseudonymise your data, you replace identifying details with, for instance, a numerical code or a pseudonym for each person, and you keep a confidential record that allows you to identify the individuals again. Assigning codes or pseudonyms is not sufficient on its own; you must also remove any identifying characteristics from the dataset that could allow individuals to be identified without your record.

When you anonymise data, you delete all information that could be used to identify individuals. This means you do not create a record that allows you to identify the individuals again. Proper anonymisation is therefore irreversible.

Once data has been irreversibly anonymised, it is no longer considered personal data and is not subject to data protection laws.

In some cases, anonymising your data may therefore be advantageous.

Aarhus University offers various tips on how to pseudonymise and anonymise data correctly.
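To make the idea of pseudonymisation concrete, here is a minimal Python sketch that replaces names with codes and returns the confidential record separately. It is an illustration under stated assumptions, not Aarhus University's recommended procedure: the function name `pseudonymise`, the code format `P001`, and the fictitious interview records are all invented for this example.

```python
def pseudonymise(records, name_field="name"):
    """Replace names with codes; return (pseudonymised records, key)."""
    key = {}      # confidential record: real name -> code
    cleaned = []
    for record in records:
        name = record[name_field]
        if name not in key:
            # The same person always gets the same code.
            key[name] = f"P{len(key) + 1:03d}"
        # Copy every field except the name, then add the code.
        new_record = {k: v for k, v in record.items() if k != name_field}
        new_record["participant"] = key[name]
        cleaned.append(new_record)
    return cleaned, key


# Fictitious interview data; the names and answers are made up.
interviews = [
    {"name": "Anna Jensen", "answer": "I study in the library."},
    {"name": "Bo Larsen", "answer": "I prefer to study at home."},
    {"name": "Anna Jensen", "answer": "Group rooms are hard to book."},
]

data, key = pseudonymise(interviews)
print(data)
```

The returned `key` is the confidential record and should be stored separately from the pseudonymised data. The sketch only removes the name field, so identifying details inside free-text answers would still have to be removed by hand.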

Courses from the library

AU Library offers courses and workshops on computer programs and tools designed to support and motivate all students, researchers, and instructors at AU in their work with data.

These include tools such as R, Python, Whisper, Transcriber, NVivo, Voyant, VOSviewer, LSEG Workspace, Orbis, and more.

Additionally, the library hosts both open and tailored courses on good data practices, aimed at students who wish to work with data in a systematic, deliberate, and structured way. All courses are open to all students, regardless of their faculty affiliation.

See course calendar

See more about Arts Datalab (in Danish)

See more about BSS Datalab

Revised 15.06.2026
