Step-by-step guidance for implementing FAIR data principles in research organizations, from metadata standards to repository selection.
The FAIR principles (Findable, Accessible, Interoperable, Reusable) were published in 2016 to address a simple problem: most research data, even when technically "available," is practically impossible for others to find, access, understand, or reuse. An estimated 80% of research data is never reused because it lacks the metadata, documentation, or accessibility needed to make it useful beyond the original research team.
FAIR is not a standard or a specification. It is a set of guiding principles that can be implemented in many ways. This flexibility is both its strength and the source of confusion about how to actually do it.
Data that exists but cannot be found is effectively invisible. Making data findable requires three things: persistent identifiers, rich metadata, and registration in searchable catalogs.
Every dataset needs a globally unique, persistent identifier. Digital Object Identifiers (DOIs) are the most widely adopted option. Most institutional and discipline-specific data repositories assign DOIs automatically upon deposit.
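Do not use ordinary web URLs as identifiers. URLs change when servers move, domains expire, or directory structures are reorganized. A DOI is resolved through a central service (https://doi.org/) and, as long as the repository keeps its registration current, points to the data's current location regardless of where the files physically live.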
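As a rough illustration of treating the DOI, not the landing-page URL, as the identifier, the sketch below normalizes a DOI written in a few common forms into its resolver URL. The pattern is a loose, pragmatic check rather than a full validation of DOI syntax, and the DOI `10.1234/example-dataset.v1` is a made-up placeholder.

```python
import re

# Loose pattern: "10.", a 4-9 digit registrant prefix, "/", then a non-empty suffix.
# An assumption for illustration, not a complete DOI syntax check.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes people commonly write in front of a DOI.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(raw: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI given in a common form."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return f"https://doi.org/{doi}"


# Hypothetical DOI, used only to show the normalization.
print(doi_to_url("doi:10.1234/example-dataset.v1"))
# prints https://doi.org/10.1234/example-dataset.v1
```

Storing the bare DOI and deriving the link on demand keeps citations stable even if the resolver address or the repository's landing pages change.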
Metadata makes data findable through search. At minimum, every dataset should have:
Use a standard metadata schema. Schema.org, Dublin Core, and DataCite Metadata Schema provide well-supported options. Discipline-specific schemas (DDI for social science, ISA for life sciences) add domain-relevant fields.
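One way to enforce a minimum before deposit is a simple completeness check. The sketch below assumes the six properties that the DataCite Metadata Schema (version 4) marks as mandatory. The lowercase field names are simplified stand-ins chosen for this example, not the schema's own element names, and the record contents are invented.

```python
# Simplified stand-ins for the DataCite mandatory properties (an assumption made
# for illustration; check the schema version your repository actually uses).
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Invented example record.
record = {
    "identifier": "10.1234/example-dataset.v1",  # hypothetical DOI
    "creators": ["Example, Alex"],
    "title": "Example survey responses, 2023 wave",
    "publisher": "Example University Repository",
    "publication_year": 2024,
    "resource_type": "Dataset",
}

print(missing_fields(record))                      # prints []
print(missing_fields({"title": "Untitled data"}))  # lists every required field except "title"
```

A check like this only confirms that fields are present. Whether a title or description is actually informative still needs human review.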
Metadata should be registered in searchable catalogs. Options include:
Findable data that cannot be accessed is a tease. Accessibility is about clear, standardized retrieval mechanisms.
Data should be retrievable through standard, open protocols. In practice, this usually means HTTPS. Repository APIs provide programmatic access: REST interfaces for records and files, and OAI-PMH for harvesting metadata.
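To make the programmatic route concrete, here is a minimal sketch that builds an OAI-PMH `ListRecords` request. The verb and the `metadataPrefix` and `from` arguments come from the OAI-PMH protocol. The base URL is a hypothetical endpoint, and the sketch only constructs the URL without sending a request.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint; real repositories publish their own OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed on or after this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL, from_date="2024-01-01"))
# prints https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2024-01-01
```

The response is XML and may be split across pages with a resumption token, which a real harvester would need to follow.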
Important distinction: FAIR does not require data to be open access. Sensitive data (patient records, commercially confidential data) can be FAIR while maintaining appropriate access controls. The key is that the access conditions are clearly stated and the mechanism for requesting access is documented.
Even when data itself is restricted, metadata should be openly accessible. This allows potential users to discover the dataset and understand how to request access.
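The split between open metadata and restricted data can be sketched as a filter over a dataset record. Everything here is an assumption for illustration: the field names, the `access_level` values, and the example record are hypothetical, and a real repository would enforce this on the server side.

```python
# Fields that stay visible even when the data files are restricted.
# These names are hypothetical, chosen only for this sketch.
PUBLIC_FIELDS = {
    "identifier",
    "title",
    "description",
    "access_level",
    "access_conditions",
    "access_contact",
}


def public_view(record: dict) -> dict:
    """Return the openly publishable part of a dataset record."""
    view = {key: value for key, value in record.items() if key in PUBLIC_FIELDS}
    # Only open datasets expose a direct download location.
    if record.get("access_level") == "open" and "download_url" in record:
        view["download_url"] = record["download_url"]
    return view


# Invented example of a restricted dataset.
restricted = {
    "identifier": "10.1234/example-clinical.v2",  # hypothetical DOI
    "title": "Example clinical cohort, baseline visit",
    "description": "De-identified baseline measurements.",
    "access_level": "restricted",
    "access_conditions": "Approval by the data access committee is required.",
    "access_contact": "data-access@example.org",
    "download_url": "https://repository.example.org/files/cohort.csv",
}

print("download_url" in public_view(restricted))  # prints False
print(public_view(restricted)["access_conditions"])
```

The public view still tells a prospective user what the dataset is, under what conditions it can be obtained, and whom to contact.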
For restricted data, implement clear access procedures:
Data becomes far more valuable when it can be combined with other datasets. Interoperability requires shared languages and formats.
Use community-adopted controlled vocabularies and ontologies for describing your data:
Using standard terms rather than ad hoc descriptions ensures that your "blood glucose concentration" means the same thing as another researcher's "blood glucose concentration."
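In practice this often means attaching a term identifier to each variable instead of relying on the column label alone. The sketch below maps local labels to term IRIs and reports the labels that have no mapping yet. The `vocab.example.org` IRIs are placeholders, not real ontology identifiers, and the mapping table is invented.

```python
# Hypothetical mapping from local column labels to controlled-vocabulary term IRIs.
# The vocab.example.org IRIs are placeholders, not real ontology identifiers.
TERM_MAP = {
    "blood glucose": "https://vocab.example.org/terms/blood-glucose-concentration",
    "glucose (blood)": "https://vocab.example.org/terms/blood-glucose-concentration",
    "bmi": "https://vocab.example.org/terms/body-mass-index",
}


def annotate_columns(columns):
    """Split column labels into a label-to-IRI mapping and a list of unmapped labels."""
    mapped, unmapped = {}, []
    for label in columns:
        key = label.strip().lower()  # match labels case-insensitively
        if key in TERM_MAP:
            mapped[label] = TERM_MAP[key]
        else:
            unmapped.append(label)
    return mapped, unmapped


mapped, unmapped = annotate_columns(["Blood Glucose", "Glucose (blood)", "Notes"])
print(mapped["Blood Glucose"] == mapped["Glucose (blood)"])  # prints True
print(unmapped)                                              # prints ['Notes']
```

Two differently labeled columns resolve to the same term, which is what lets datasets from different teams be merged without guessing.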
Prefer open, well-documented file formats over proprietary ones:
If proprietary formats are unavoidable (instrument-specific files), provide a parallel export in an open format.
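A parallel export can be as simple as writing the parsed values to CSV alongside the original file. The sketch below assumes the instrument file has already been read into a list of dictionaries by vendor software or a parser. The column names and readings are invented.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize parsed records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented readings standing in for values parsed from a proprietary file.
readings = [
    {"sample_id": "S1", "wavelength_nm": 450, "absorbance": 0.132},
    {"sample_id": "S2", "wavelength_nm": 450, "absorbance": 0.287},
]

print(rows_to_csv(readings, ["sample_id", "wavelength_nm", "absorbance"]))
```

Putting units in the column names, as `wavelength_nm` does here, keeps the export interpretable without the vendor's software.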
Where practical, use linked data approaches to connect your data to related resources. RDF (Resource Description Framework) and JSON-LD enable machine-readable connections between datasets, publications, samples, and concepts.
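As a small illustration of a machine-readable link, the sketch below builds a JSON-LD description of a dataset using schema.org terms and connects it to a related article through the `citation` property. Both DOIs are made-up placeholders, and the license URL is only an example value, not a recommendation.

```python
import json


def dataset_jsonld(dataset_doi: str, name: str, article_doi: str, license_url: str) -> dict:
    """Describe a dataset and its related article as schema.org JSON-LD."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "@id": f"https://doi.org/{dataset_doi}",
        "name": name,
        "license": license_url,
        # Link to the publication that describes or uses the dataset.
        "citation": {
            "@type": "ScholarlyArticle",
            "@id": f"https://doi.org/{article_doi}",
        },
    }


# Hypothetical DOIs and an example license, used only for illustration.
doc = dataset_jsonld(
    dataset_doi="10.1234/example-dataset.v1",
    name="Example survey responses, 2023 wave",
    article_doi="10.1234/example-article",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(json.dumps(doc, indent=2))
```

Because both ends of the link are identifiers rather than free text, an indexer can follow the connection without parsing prose.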
This is where many FAIR implementations fall short. Data can be findable, accessible, and interoperable but still unusable because the recipient does not have enough context to interpret it correctly.
Every dataset needs an explicit license. Without one, potential reusers face legal uncertainty. Common options:
State the license in the metadata. Do not make people guess.
Document how the data was generated:
This documentation enables others to evaluate the data's fitness for their purpose and to replicate the processing if needed.
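Part of this documentation can be captured automatically at processing time. The following sketch records a checksum of the data, the processing steps, and the Python version used. The record layout and step descriptions are assumptions for illustration, not an established provenance standard.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def provenance_record(data: bytes, steps: list[str]) -> dict:
    """Build a small provenance record for one processed data file."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),  # lets others verify file integrity
        "size_bytes": len(data),
        "processing_steps": list(steps),
        "python_version": platform.python_version(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Invented file contents and steps.
data = b"sample_id,absorbance\nS1,0.132\n"
record = provenance_record(
    data,
    ["Exported raw readings from instrument", "Removed calibration blanks"],
)
print(json.dumps(record, indent=2))
```

Storing the record next to the data file gives a reuser both a way to confirm they have the same bytes and a trail of what was done to produce them.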
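Follow discipline-specific data standards where they exist. Minimum Information checklists define what information must accompany specific data types: MIAME, for example, covers microarray experiments, and the MIBBI project (Minimum Information for Biological and Biomedical Investigations) catalogs comparable checklists across the life sciences.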
Bottom line: FAIR principles are about making research data a durable, reusable asset rather than a disposable byproduct of a single study. Start with persistent identifiers and rich metadata, use open formats and standard vocabularies, document everything, and license explicitly. Perfection is not the goal; meaningful improvement from the current state is.