The choice of file formats (identified by the extension in the file name) is particularly important as it will determine how easily project members can process, share and disseminate data throughout the project lifecycle.
Open, non-proprietary formats that are widely used by the relevant research community should be preferred:
- this avoids the rapid obsolescence of files,
- this ensures that data can be accessed over a relatively long period of time using open source software that implements the format,
- this ensures that the data is both reusable and sustainable.
Which file formats are suitable depends heavily on the scientific community concerned. Below are a few examples; the list is not exhaustive (see the Wikipedia page on open file formats):
- Text documents: txt, odt, pdf
- Structured data: csv, ods, xml, json
- Data in binary format (experimental data, simulation data, etc.): hdf5, netcdf
- Images: png, gif
- Multimedia: mp3, mp4
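As an illustration, the examples above can be turned into a small lookup that flags files whose extension is not on the list. This is only a sketch in Python: the `OPEN_FORMATS` table and the `classify` function are hypothetical names, the extensions `h5` and `nc` are assumed here as common extensions for HDF5 and NetCDF files, and a file extension says nothing about whether the content is actually valid.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup table built from the examples listed above.
# "h5" and "nc" are assumptions: common extensions for HDF5 and NetCDF files.
OPEN_FORMATS = {
    "text document": {"txt", "odt", "pdf"},
    "structured data": {"csv", "ods", "xml", "json"},
    "binary data": {"hdf5", "h5", "nc"},
    "image": {"png", "gif"},
    "multimedia": {"mp3", "mp4"},
}


def classify(filename: str) -> Optional[str]:
    """Return the category of a listed extension, or None if it is not listed."""
    # Only the last extension is considered, compared case-insensitively.
    ext = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in OPEN_FORMATS.items():
        if ext in extensions:
            return category
    return None


if __name__ == "__main__":
    for name in ["results.CSV", "run42.h5", "report.docx"]:
        print(name, "->", classify(name) or "not in the list")
```

A project would normally extend the table with the formats recommended by its own community rather than rely on this short list.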
Useful resources
- Open or closed format? (in French) on the Doranum website.
- General Interoperability Framework (Référentiel général d'interopérabilité, RGI; in French): recommendations from the Directorate-General for State Modernisation.
- Information on file formats for archiving (in French) on the CINES website.
- The CINES tool “FACILE” (in French) for testing the validity of a file in open format.
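For text-based formats, a first local check can be made before a file is submitted to a dedicated validation service. The sketch below is not related to FACILE and does not reproduce its checks: it only tests whether a string parses as JSON or XML using the Python standard library, and the function name `is_well_formed` is hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def is_well_formed(text: str, fmt: str) -> bool:
    """Return True if `text` parses as the given format ("json" or "xml").

    This is a well-formedness check only; it does not validate the content
    against a schema or against any archiving criteria.
    """
    try:
        if fmt == "json":
            json.loads(text)
        elif fmt == "xml":
            ET.fromstring(text)
        else:
            # Unsupported formats are reported to the caller, not hidden.
            raise ValueError(f"unsupported format: {fmt}")
    except (json.JSONDecodeError, ET.ParseError):
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed('{"sample": "A1"}', "json"))
    print(is_well_formed("<data><v>1</data>", "xml"))
```

A file that passes this kind of check may still be rejected by an archive, which typically applies stricter, format-specific rules.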
In France, the concept of an open standard is legally defined by Law No. 2004-575 of 21 June 2004 on ‘confidence in the digital economy’ (translated from the French): “An open standard means any communication, interconnection or exchange protocol, and any interoperable data format, whose technical specifications are public and free of restrictions on access or implementation.”