General search engines
There are search engines that allow you to search several repositories:
- DataCite Commons: set up by DataCite, the global provider of DOIs (Digital Object Identifiers) for scientific data, this search engine lists nearly 20 million datasets from nearly 2,000 data repositories
- Dataset Search: offered by Google, this tool indexes 25 million datasets. To be found by the search engine, datasets must come from sites that describe them using the schema.org structured format
- OpenAIRE: a platform that indexes both publications and research data, mainly from EU-funded projects. More than 100 repositories are harvested. Searches can be filtered by project name, funder, type of data, etc.
- Mendeley Data: created by the publisher Elsevier, it is both a search engine indexing more than 20 million data items and a repository that accepts deposits once you have created an account
- Data Citation Index: provides a single point of access to quality research data from worldwide repositories across all disciplines, as well as citation metrics. The data is linked to scientific articles in Web of Science
- Dimensions.ai and Lens.org: publication aggregators, each offering a ‘dataset’ filter that restricts the search to datasets among the different types of harvested publications
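To illustrate what a description in the schema.org structured format can look like, here is a minimal sketch in Python that builds a JSON-LD record of type `Dataset`. The function name `build_dataset_jsonld`, the example URL and the example DOI are hypothetical and chosen only for illustration. The properties used (`name`, `description`, `url`, `identifier`, `license`) are standard schema.org `Dataset` properties, but each search engine sets its own requirements, so treat this as an outline rather than a complete record.

```python
import json


def build_dataset_jsonld(name, description, url, doi=None, license_url=None):
    """Return a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration only; it is not part of any library.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    # Optional properties are added only when a value is supplied.
    if doi:
        record["identifier"] = f"https://doi.org/{doi}"
    if license_url:
        record["license"] = license_url
    return json.dumps(record, indent=2)


if __name__ == "__main__":
    # Placeholder values, not a real dataset.
    print(
        build_dataset_jsonld(
            name="Example measurements",
            description="Placeholder description of an example dataset.",
            url="https://example.org/datasets/42",
            doi="10.1234/example",
            license_url="https://creativecommons.org/licenses/by/4.0/",
        )
    )
```

A repository would typically embed the resulting JSON in the landing page of the dataset, inside a `<script type="application/ld+json">` element, so that crawlers can read it.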
Repositories
Many repositories allow you both to search for and to share data. However, a significant number only allow you to consult data unless you are affiliated with the institution or project. Data repositories always include search engines with filters to facilitate access to the data they contain. The Registry of Research Data Repositories (re3data.org) lists more than 3,000 repositories, which can be filtered by subject, country, etc., and provides a detailed description of each one.
There are different types of repositories:
Generic
National
By institution/organisation
Thematic
- PANGAEA (Environmental science)
- Materials Cloud Archive (Materials science)
- PubChem (Chemistry)
- Nakala (Social sciences)
- Software Heritage (Source code and software)
- GenBank (Medical science – biology)
Data Papers
A data paper (or data article) is a peer-reviewed scientific publication whose main purpose is to describe one or more datasets. In Web of Science, you can restrict the results to data papers by selecting that document type. Here is a list of generic and thematic data journals (Chemistry, Physics and related disciplines):
Topic page: Data Paper
Supplementary Materials
In supplementary materials, which are an integral part of a scientific article, the author explains their method and calculations, and may append additional data in the form of tables, diagrams, etc. However, the fragmented nature of supplementary materials, the differences in requirements from one journal to another, and the limited volume of data that can be added are likely to hinder the discovery of useful content.
Text and Data Mining
To facilitate data exploration and as part of the VisaTM project, INIST has set up a catalogue of tools for text mining.