GloBI status page

Screenshot of GloBI status page. 24 January 2017.

Just like organisms, datasets are born, grow up, reproduce, and die. GloBI's mission is to help increase the productivity (or reuse) of datasets, and to extend their lifespan, before they meet their maker.

To monitor the life stage of datasets, the GloBI "status" page was introduced in January 2017. This page shows the state of the federated collection of species interaction datasets that make up GloBI. Rather than treating datasets as static entities, GloBI takes a dynamic approach: it revisits datasets regularly to incorporate changes and to refresh the links to naming authorities (e.g. itis.gov), gazetteers (e.g. geonames.org) and other relevant external data services. The status page contains several monitors that indicate whether a dataset can be read, searched or cited. In addition, some statistics are provided to point to known issues and to describe the properties of the datasets (e.g. number of interactions, number of names, match rate of names). With this, a de facto publication process is outlined, along with quality control measures that show the life stage of a dataset.
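
To make the monitors and statistics described above more concrete, here is a minimal sketch in Python of how one dataset's entry on such a page could be modelled. It is not GloBI's implementation: the class `DatasetStatus`, its field names, and the example values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetStatus:
    """Hypothetical summary of one dataset's entry on a status page."""
    name: str
    readable: bool      # monitor: can the dataset be read?
    searchable: bool    # monitor: can the dataset be searched?
    citable: bool       # monitor: can the dataset be cited?
    names_total: int    # number of names found in the dataset
    names_matched: int  # names that could be linked to a naming authority

    def name_match_rate(self) -> float:
        """Fraction of names that were matched (0.0 for an empty dataset)."""
        if self.names_total == 0:
            return 0.0
        return self.names_matched / self.names_total

    def failing_monitors(self) -> List[str]:
        """Labels of the monitors that are currently 'red'."""
        checks = {
            "readable": self.readable,
            "searchable": self.searchable,
            "citable": self.citable,
        }
        return [label for label, ok in checks.items() if not ok]


# Illustrative values only; these are not taken from any real dataset.
example = DatasetStatus(
    name="example-dataset",
    readable=True,
    searchable=True,
    citable=False,
    names_total=200,
    names_matched=150,
)
print(f"{example.name}: match rate {example.name_match_rate():.0%}, "
      f"failing monitors: {example.failing_monitors()}")
```

In this sketch a dataset with no failing monitors and a high name match rate would count as healthy, while a dataset with several failing monitors would be a candidate for closer inspection.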

Continuous Integration

GloBI continuously discovers species interaction datasets through GitHub and Zenodo. In addition, automated review processes can be configured on Travis CI to check on the "health" of datasets. For more information, see the data management wiki page.

At the time of writing, the status page provides a wealth of information. For instance, the state of the dataset provided by http://kelpforest.ucsc.edu (Beas-Luna et al. 2014) indicates that the dataset is close to dying or already dead: nearly all indicators are red and unresolved issues exist. In contrast, the chance of survival of a dataset like the Africa Tree Database (Selzer et al. 2015) looks promising: the indicators are green, and the name match rate is 94% across 1.9k names and 7.7k interactions. In addition, the Africa Tree Database has been deposited with Zenodo, a service that is designed to provide "permanent" data availability through digital object identifiers.

Over the last couple of weeks, I've used the page to discover and resolve various dataset issues. For example, an obscure GloBI bug that prevented the integration of some interaction records provided by the Avian Diet Database was found and resolved (see hurlbertlab/dietdatabase#40 and jhpoelen/eol-globi-data#276). I am curious to see how the status page will evolve in the coming months.

For more information about the status page, please visit the data management wiki page. The development of the status page was supported by the Encyclopedia of Life.