The format you use to store your data is one of the main factors determining whether someone else will be able to use it in the future.
Eventually, much of the hardware and software we use today will likely become obsolete, so it’s important to make the effort now to use formats that are most likely to remain accessible in the future.
It’s also important to use formats that save as much of the original information as possible.
Guidelines for choosing appropriate file formats:
- Choose formats that are non-proprietary. Example: Instead of saving your spreadsheet as an Excel file (.xls), save it as a comma-separated values file (.csv).
- Use formats that do not apply lossy compression to your data. Example: For images, use an uncompressed or losslessly compressed TIFF rather than a JPEG, whose lossy compression permanently discards information in your file. GIF compresses losslessly but limits each image to 256 colors, so it can also lose information.
- Use formats that are in common use by the research community. Example: HTML and CSV are widely adopted and readable by a wide variety of software.
- Use formats that are unencrypted. Data security is important, but an encrypted dataset whose key has been lost (for example, because the password was forgotten) is no dataset at all.
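One way to check whether a storage step preserves the original information is a round trip: encode the data, decode it again, and compare the result with the input. The sketch below is illustrative only. The helper `survives_round_trip` and the toy `quantize` function are hypothetical names introduced here, and `quantize` merely stands in for a lossy encoder; it is not a real image codec.

```python
import zlib


def survives_round_trip(original: bytes, encode, decode) -> bool:
    """Return True if decoding the encoded data gives back the exact input."""
    return decode(encode(original)) == original


def quantize(data: bytes) -> bytes:
    """Toy lossy step (assumption for illustration): drop the low 4 bits of each byte."""
    return bytes(b & 0xF0 for b in data)


# Sample data covering every possible byte value.
samples = bytes(range(256)) * 4

# Lossless compression: zlib restores every byte exactly.
lossless_ok = survives_round_trip(samples, zlib.compress, zlib.decompress)

# Lossy reduction: the discarded bits cannot be recovered afterwards.
lossy_ok = survives_round_trip(samples, quantize, lambda d: d)

print("lossless round trip preserved data:", lossless_ok)
print("lossy round trip preserved data:", lossy_ok)
```

The same comparison can be applied to any conversion step in a workflow, for instance by comparing checksums of the data before and after a format change.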
Preferred formats by data type:
- Image: JPEG, JPEG 2000, PNG, TIFF
- Text: HTML, XML, PDF/A, plain text (UTF-8 or ASCII encoded)
- Audio: AIFF, WAVE
- Spatial: GeoTIFF (raster), shapefile (vector)
- Multidimensional: NetCDF
- Containers: TAR, GZIP, ZIP
- Databases: prefer XML or CSV to native binary formats, if possible
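As a minimal sketch of the database and container recommendations, the example below exports a table from a native binary database to CSV and bundles the result in a gzip-compressed TAR archive, using only the Python standard library. The table name `measurements`, its columns, and the sample rows are assumptions made up for illustration; a real export would read from an existing database file rather than an in-memory one.

```python
import csv
import io
import sqlite3
import tarfile

# Hypothetical example database, held in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER, site TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [(1, "A", 12.5), (2, "B", 13.0)],
)

# Export the table to CSV text, with a header row taken from the column names.
cursor = conn.execute("SELECT * FROM measurements ORDER BY id")
header = [column[0] for column in cursor.description]
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(cursor)
csv_text = buffer.getvalue()
conn.close()

# Bundle the CSV in a gzip-compressed TAR archive (lossless compression).
payload = csv_text.encode("utf-8")
archive = io.BytesIO()
with tarfile.open(fileobj=archive, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="measurements.csv")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Read the archive back to confirm the CSV survived unchanged.
archive.seek(0)
with tarfile.open(fileobj=archive, mode="r:gz") as tar:
    restored = tar.extractfile("measurements.csv").read().decode("utf-8")

print(csv_text)
print("archive round trip intact:", restored == csv_text)
```

Writing the text explicitly as UTF-8 keeps the encoding unambiguous for whoever opens the file later. Note that a CSV export carries only the values and column names; column types and constraints would need to be documented separately.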