A ‘How-To’ Guide for Extracting GloBI Data 😃

GloBI has a ton of useful data in it and can be used in many different ways. However, it can also be a bit overwhelming with so much data. This page offers some helpful links, hacks, and instructions for how to extract the information available in a useful format.

Do you have web programming skills? Extra time on your hands? Want to help build easy-to-use buttons, search boxes, or other web interface improvements for GloBI? PLEASE contact Jorrit! We’d love the extra help!


Introduction to using GloBI

Did you know there is a video tutorial on how to use GloBI?
A Practical Exploration of Biotic Interaction Data Management and Information Retrieval through TPT and GloBI (video)

There are also detailed step-by-step instructions from the Species Interaction Data Workshop on how to extract information from GloBI.

  1. Getting Interaction Data
  2. Working with the Whole Dataset
  3. Exploring Ixodes (tick) Records By Pointing and Clicking
  4. Data Sources: Interaction Data Record Review
  5. Data Sources: Taxonomic Name Review





Pre-compiled Datasets

Did you know, GloBI has a number of datasets and files pre-compiled and ready to download?!

Just go to the GloBI Sources page, find the dataset for a collection or group you are interested in, and click the “review” button on the left with the green checkmark. Then copy the link to the file you want and paste it into your web browser’s address bar. The file will start to download automatically 😃

Here are some pre-compiled, multi-collection datasets you may find useful:

Terrestrial Parasite Tracker (TPT) data

Terrestrial Parasite Tracker is an NSF-funded project that aims to digitize natural history collection records related to parasites and their vertebrate hosts.

SCAN data

Symbiota Collections of Arthropods Network (SCAN) (https://scan-bugs.org) is “A Data Portal Built to Visualize, Manipulate, and Export Species Occurrences.”

Big-Bee data

The Big Bee project (https://big-bee.net/) aims to “Extend Anthophila research through image and trait digitization.”

Big-Bee publishes a quarterly report of global bee interactions indexed by GloBI that includes additional data curation, such as the removal of duplicate records. This publication includes interactions from museum specimens, journal publications, and observations, in both comma- and tab-delimited formats.

Katja C. Seltmann. (2022). Global Bee Interaction Data (v1.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6564738 on May 19, 2022





General Searches

Search from home page

Search from browse page

To search from the GloBI browse page:





Using R

For those who are comfortable using R, install and use the rglobi package to build more precise, filtered datasets.





Using the Command Line

General datasets

If you are comfortable using command-line scripts and code (i.e., in a terminal, shell, etc.), you may find the following “Big Data Cheatsheet” useful.

After you download the dataset you need (see Pre-compiled Datasets, General Searches, or the GloBI data page), you can modify the following code to fit your dataset needs:

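# Fungi (source) interacting with oaks (target genus Quercus):
# keep only the two name columns and write them to a new file.
$ time cat data/stable/interactions.csv.gz \
  | gunzip \
  | mlr --csv filter '$sourceTaxonKingdomName == "Fungi"' \
  | mlr --csv filter '$targetTaxonGenusName == "Quercus"' \
  | mlr --csv cut -f sourceTaxonName,targetTaxonName \
  > data/oakfungi.csv
# Same search with the roles swapped (Quercus as source, Fungi as target).
# tail -n+2 drops the header row so it is not repeated when appending (>>).
$ time cat data/stable/interactions.csv.gz \
  | gunzip \
  | mlr --csv filter '$targetTaxonKingdomName == "Fungi"' \
  | mlr --csv filter '$sourceTaxonGenusName == "Quercus"' \
  | mlr --csv cut -f targetTaxonName,sourceTaxonName \
  | tail -n+2 \
  >> data/oakfungi.csv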
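
Before adapting the filters above, it can help to check which column names a file actually contains. The following is only a sketch: it builds a tiny made-up `sample-interactions.csv.gz` (not real GloBI data) so that it runs anywhere, then lists the header columns one per line with their position. With a real download, point `file` at that file instead. It assumes the header names themselves contain no commas.

```shell
# Build a tiny, made-up stand-in for a GloBI download (illustration only).
printf '%s\n' \
  'sourceTaxonName,sourceTaxonKingdomName,targetTaxonName,targetTaxonGenusName' \
  'Example fungus,Fungi,Quercus alba,Quercus' \
  | gzip > sample-interactions.csv.gz

file=sample-interactions.csv.gz

# Read only the header line, put each column name on its own line,
# and number the lines so each column's position is visible.
gunzip -c "$file" | head -n1 | tr ',' '\n' | cat -n > columns.txt
cat columns.txt
```

The names listed this way are the ones to use after the `$` in the `mlr filter` expressions and in the `mlr cut -f` list.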

You can also load a dataset from GloBI into a sqlite3 database on your personal computer by using/modifying the following code (after downloading the dataset):

$ cat interactions.csv.gz\
| gunzip\
| sqlite3 -csv globi.db '.import /dev/stdin interactions'

Or for SCAN specific data:

$ cat interactions.csv.gz\
| gunzip\
| grep "globalbioticinteractions/scan" > globi-scan.csv

or, to only count the matching records:

$ cat interactions.csv.gz\
| gunzip\
| grep "globalbioticinteractions/scan"\
| wc -l
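
Note that `grep` keeps only the matching lines, so the header row is dropped from the output unless it happens to match. A minimal sketch of one way to keep it, using `awk` to pass the first line through along with any line containing the search text. The sample rows, the `namespace` column name, and the file names are made up so the example is self-contained.

```shell
# Made-up sample standing in for interactions.csv.gz (illustration only;
# "namespace" is a placeholder column name).
printf '%s\n' \
  'sourceTaxonName,targetTaxonName,namespace' \
  'Ixodes scapularis,Peromyscus leucopus,globalbioticinteractions/scan' \
  'Apis mellifera,Trifolium repens,example/other-dataset' \
  | gzip > sample.csv.gz

# NR == 1 keeps the header row; index() does a plain-text (non-regex) match.
gunzip -c sample.csv.gz \
  | awk 'NR == 1 || index($0, "globalbioticinteractions/scan")' \
  > sample-scan.csv

cat sample-scan.csv
```

Keeping the header makes the result easier to open in a spreadsheet or to pass on to `mlr`, which expects column names in the first row.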

To reduce the size of a sqlite3 (or other) database, you can drop columns before importing them using power tools like cut or mlr (Miller). See the importing csv files to sqlite page.

If you can provide further step-by-step instructions on how to use these scripts, please add them to the working guide and I will add them to this page. I’m not a command line person, so any help adding to this section would be much appreciated!
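
As a rough sketch of dropping columns with `cut`, the example below assumes a tab-separated file, because `cut` splits on every delimiter and would mis-split comma-separated values that contain quoted commas (for CSV files, `mlr --csv cut` as used above is the safer choice). The sample file and its rows are made up for illustration.

```shell
# Made-up, tab-separated sample standing in for a real export (illustration only).
printf 'sourceTaxonName\tinteractionTypeName\ttargetTaxonName\tsourceCitation\n' > sample.tsv
printf 'Ixodes scapularis\tparasiteOf\tPeromyscus leucopus\tExample citation\n' >> sample.tsv

# Look up the positions of the columns to keep from the header line.
keep=$(head -n1 sample.tsv | tr '\t' '\n' \
  | grep -n -x -E 'sourceTaxonName|interactionTypeName|targetTaxonName' \
  | cut -d: -f1 | paste -sd, -)

# cut splits on tabs by default; keep only the selected column positions.
cut -f "$keep" sample.tsv > slim.tsv
cat slim.tsv
```

Looking the positions up by name avoids counting columns by hand, which is error-prone in wide files.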


Custom Taxon List Search

Want association information for multiple taxa without searching each name individually? Want to be able to download a csv with this data? Well, now you can!

  1. Create a simple text (.txt) file with the names you want to find associations for.
    • File should have a single column of names
    • For species names that have spaces in them, replace all spaces with “%20”. You can do this with a find-and-replace-all procedure. Example: Cremnops montrealensis → Cremnops%20montrealensis
    • Save file with extension .txt
  2. Open a command line terminal on your computer (on Macs there is a built-in one called “Terminal”).
  3. Navigate to the folder you want the resulting files to be saved to. Example for Macs:
    cd YourDocuments/YourFolderName/ProjectsFolder
    
  4. Once in the folder you want your results to save to, run the following code in terminal:
    • This will produce and save a csv file with all associations in GloBI for the taxa in the list you made as well as the records the associations came from.
      cat YourNameList.txt\
      | sed 's+^+https://api.globalbioticinteractions.org/interaction.csv?includeObservations=true\&sourceTaxon=+g'\
      | xargs -L1 curl\
      > results-YourNameList.csv
      

      Where:

    sed 's+^+https://api.globalbioticinteractions.org/interaction.csv?includeObservations=true\&sourceTaxon=+g'

    • Turns each name into a url request for individual records that involve the specified taxon.

    xargs -L1 curl

    • Executes one request at a time using “curl” (a command-line tool for fetching URLs)

    > results-...csv

    • Saves the results in a file called “results-[something].csv”
  5. If you only want the taxon level information (not information about the individual records the associations came from) you can modify the code by omitting the includeObservations=true. For example:
    cat YourTaxaList.txt\
    | sed 's+^+https://api.globalbioticinteractions.org/interaction.csv?sourceTaxon=+g'\
    | xargs -L1 curl\
    > results-YourInteractions.csv
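
The find-and-replace from step 1 and a dry run of step 4 can also be done in the terminal. A small sketch, assuming a hypothetical `YourNameList-raw.txt` with one unencoded name per line (the second name is just an example): the first `sed` replaces spaces with `%20`, and the second builds the request URLs without sending them, so they can be checked before running `curl`. Only spaces are handled here; other special characters would need their own encoding.

```shell
# Hypothetical input: one taxon name per line, spaces not yet encoded.
printf '%s\n' 'Cremnops montrealensis' 'Apis mellifera' > YourNameList-raw.txt

# Step 1: replace every space with %20.
sed 's/ /%20/g' YourNameList-raw.txt > YourNameList.txt

# Dry run of step 4: build the URLs but only save and print them (no curl, no download).
sed 's+^+https://api.globalbioticinteractions.org/interaction.csv?includeObservations=true\&sourceTaxon=+' \
  YourNameList.txt > urls-preview.txt

cat urls-preview.txt
```

If the previewed URLs look right, the same list can be passed to `xargs -L1 curl` as in step 4.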
    

Things to note about this procedure:




GloBI Hacks

No-download data viewing

You can view datasets from GloBI (or any other .csv/.tsv files online) without actually downloading them! To do this, copy the link to a file from the GloBI Sources page, remove the “.gz” extension from the end of the link, and paste the link into the following formula in a Google Sheets cell:

=IMPORTDATA("YOUR FILE NAME")

Example:

=IMPORTDATA("https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/scan/indexed-interactions-sample.tsv")
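
If you are copying links in a terminal anyway, the `.gz` suffix can be trimmed with shell parameter expansion. This is only a sketch: the `.gz` link below is an assumed example formed by appending `.gz` to the sample URL above, not a link confirmed to exist.

```shell
# Assumed example link: the sample URL from this page with ".gz" appended.
gz_url='https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/scan/indexed-interactions-sample.tsv.gz'

# ${var%.gz} removes a trailing ".gz" if present and leaves other links unchanged.
plain_url=${gz_url%.gz}

# Write out a formula ready to paste into a Google Sheets cell.
printf '=IMPORTDATA("%s")\n' "$plain_url" > formula.txt
cat formula.txt
```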




Name Alignment Tool

The Name Alignment Tool allows us to compare a list of names we want to review (e.g., the association names we’ve shared with GloBI, or the list of names in a collection’s database) with a different, “accepted”, or preexisting list of taxonomic names (e.g., ITIS, GBIF, Catalogue of Life, TPT’s taxonomy). There are several ways to use the Alignment Tool, and Jorrit has automated much of the process, so we can just click a button to download or view different name alignment file(s) for comparison. There are also a number of ways to customize the name comparisons relatively easily with GitHub and some simple command line prompts, if you are so inclined 🙂

Here are some different ways to use the Alignment Tool:

TPT Group Name Matches

To review all the name matches in one place for collections involved in the TPT project, follow these instructions (super easy!):

Individual Collection/Dataset Name Matches

To review YOUR specific collection or dataset name matches, follow these instructions (super easy!):

TPT Collections
For Collections in the TPT project:

Big-Bee Collections
For Collections in the Big-Bee project:

All Collections & Datasets
For all other collections and datasets, this button is still being developed. Find your collection or dataset and see what options are available by going to the GloBI datasets page.

Customized Name List Comparisons

The Name Alignment Tool is set up so that it can be customized to align and compare any list of taxon names (which must be appropriately formatted) to a number of different name catalogs, including but not limited to: ITIS, NCBI, Discover Life, GBIF, COL, Open Tree of Life, and the GloBI taxon graph. There is a detailed workshop tutorial on how to customize name list alignments for comparison, provided by the Big-Bee TCN (thanks Big-Bee!). The workshop has step-by-step instructions with lots of screenshots on how to customize both lists of names to compare.

A GitHub account (free!) and a minimal amount of copying, pasting, and uploading files to GitHub will be necessary for these customizations, but everything is explained in great detail on the workshop website so that anyone can successfully customize their list comparisons - even with no previous experience using GitHub, coding, or a terminal.

More name matching with Nomer

Want more name alignment options? Check out the tool Nomer.





GloBI and the GloBI How-to page can be used in so many interesting projects and applications!

Current Projects

Here are just some of the projects that have used GloBI or the GloBI How-to page:

Add Your Project

Do you have a project that used GloBI, is related to GloBI, or used the GloBI How-to page (this page)? Let us know! We love learning about new projects or related works and would like to list your project here! Just email Jorrit or Erika, fill out the form below, or submit an issue and we’ll add you to this page. So many ways to share and be included 😃





Hopefully, this page helped you navigate GloBI!!

💡 Have something helpful to add to this page?

➡️ Please add it to the working guide we are creating to help pull data out of GloBI.

💡 Have a problem or something we need to add?

➡️ Please submit a request on the issue page!


🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧

Need additional help that the working guide or issue page doesn’t cover? Contact page editor Erika Tucker.

This page is supported in part by the Terrestrial Parasite Tracker group’s efforts to produce sustainable, open access, digitization methods and related research tools (NSF award# 1901932).