A ‘How-To’ Guide for Extracting GloBI Data 😃
GloBI has a ton of useful data in it and can be used in many different ways. However, it can also be a bit overwhelming with so much data. This page offers some helpful links, hacks, and instructions for how to extract the information available in a useful format.
Do you have web programming skills? Extra time on your hands? Want to help build easy-to-use buttons, search boxes, or other web interface improvements for GloBI? PLEASE contact Jorrit! We’d love the extra help!
Contents
- Introduction to using GloBI
- Pre-compiled Datasets
- General Searches
- Using R
- Using the Command Line
- GloBI Hacks
❗ ⚠️ Page under construction ⚠️ ❗
Introduction to using GloBI
Did you know there is a video tutorial on how to use GloBI?
A Practical Exploration of Biotic Interaction Data Management and Information Retrieval through TPT and GloBI (video)
There are also detailed step-by-step instructions from the Species Interaction Data Workshop on how to extract information from GloBI.
- Getting Interaction Data
- Working with the Whole Dataset
- Exploring Ixodes (tick) Records By Pointing and Clicking
- Data Sources: Interaction Data Record Review
- Data Sources: Taxonomic Name Review
Pre-compiled Datasets
Did you know, GloBI has a number of datasets and files pre-compiled and ready to download?!
Just go to the GloBI Sources page, find a collection or group’s dataset you are interested in, and click the “review” button on the left with the green checkmark. Then copy the link to the file you want and paste it into your web browser’s address bar. The file will start downloading automatically 😃
Here are some multiple collection precompiled datasets you may find useful:
Terrestrial Parasite Tracker (TPT) data
Terrestrial Parasite Tracker is an NSF-funded project that aims to digitize natural history collection records related to parasites and their vertebrate hosts.
- List of all data files available
- .tsv file of some interactions from TPT data
- .tsv file of ALL interactions from TPT
Note: this is a LARGE file, so it is compressed and will need to be unzipped. See below for ways to download the unzipped or compressed file…
- .tsv file of interactions for each collection in TPT
- List of collections in the TPT group on GloBI
SCAN data
Symbiota Collections of Arthropods Network (SCAN) (https://scan-bugs.org) is “A Data Portal Built to Visualize, Manipulate, and Export Species Occurrences.”
- List of all data files available
- .csv file of some interactions from SCAN data
- .csv file of ALL interactions from SCAN
Note: this is a LARGE file, so it is compressed and will need to be unzipped. See below for ways to download the unzipped or compressed file…
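For the large compressed (.gz) files noted above, a quick look from the command line can save unpacking the whole file first. The following is a minimal sketch, assuming the file has already been downloaded from a review page. The function names `preview_gz` and `count_rows_gz`, and the file name in the usage comment, are made up for illustration and are not part of GloBI.

```shell
# Print the header plus the first N data rows of a .gz file (N defaults to 5).
preview_gz() {
  gunzip -c "$1" | head -n "$(( ${2:-5} + 1 ))"
}

# Count the data rows in a .gz file (every line except the header).
count_rows_gz() {
  gunzip -c "$1" | tail -n +2 | wc -l | tr -d ' '
}

# Hypothetical usage (replace the file name with the one you downloaded):
#   preview_gz interactions.tsv.gz 10
#   count_rows_gz interactions.tsv.gz
```

Both functions stream the file through `gunzip -c`, so nothing is written to disk and the compressed download stays untouched.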
Big-Bee data
The Big Bee project (https://big-bee.net/) aims to “Extend Anthophila research through image and trait digitization.”
Big-Bee publishes a quarterly report of global bee interactions indexed by GloBI that includes additional data curation such as the removal of duplicate records. This publication includes interactions from museum specimens, journal publications, and observations in both comma and tab-delimited formats.
Katja C. Seltmann. (2022). Global Bee Interaction Data (v1.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6564738 on May 19, 2022
General Searches
Search from home page
- Navigate to GloBI’s home page
- Enter an organism name in one or both search boxes
- Hit search
- You can browse the results in the resulting list, or, for a downloadable list, click the “Open results in interaction browser” link
- This will bring up a map, two different interactive interaction graphs, and a list of interactions on the left.
- To download the list of interactions, click the “csv sample” link at the top of the list for a partial list, or, for a full dataset, click “all data” and select the file you want.
- Copy and paste the file name into your browser address box to download, or use the hack below to view the dataset without downloading
Search from browse page
To search from the GloBI browse page:
- Clear the name “Chelonia mydas” in green by clicking the little “x” next to it (unless of course this is the organism you are interested in!)
- Type in the scientific name of the organism you want to look up and select it from the dropdown menu when it comes up (you need to actually select it from the list, not just type it in)
- Examine the preview list automatically produced
- Click “download csv data sample” or “access full dataset” at the top of the list depending on the dataset you need
- If your browser cuts off some of the search boxes strangely like mine does, click the little grey box-within-a-box icon on the top left of the page (next to the question mark; it says “maximize/minimize” if you hover your mouse over it for a bit). This will hide the maps and make it easier to see the list and search boxes.
Using R
For those that are comfortable using R, install and use rglobi for more precise and filtered datasets.
- rglobi CRAN (download) page
- rglobi GitHub installation instructions
- Detailed instructions on how to use rglobi
- How to use GloBI data in bipartite interaction analyses
Using the Command Line
General datasets
If you are comfortable with command line scripts (i.e., in a terminal, shell, etc.), you may find the following “Big Data Cheatsheet” useful.
After you download the dataset you need (see Pre-compiled Datasets, General Searches, or the GloBI data page), you can modify the following code to fit your dataset needs:
$ time cat data/stable/interactions.csv.gz\
| gunzip\
| mlr --csv filter '$sourceTaxonKingdomName == "Fungi"'\
| mlr --csv filter '$targetTaxonGenusName == "Quercus"'\
| mlr --csv cut -f sourceTaxonName,targetTaxonName\
> data/oakfungi.csv
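# Second pass: oaks as source, fungi as target. The -o flag keeps the columns in the order listed (fungus first) so the rows line up with the first pass.
$ time cat data/stable/interactions.csv.gz\
| gunzip\
| mlr --csv filter '$targetTaxonKingdomName == "Fungi"'\
| mlr --csv filter '$sourceTaxonGenusName == "Quercus"'\
| mlr --csv cut -o -f targetTaxonName,sourceTaxonName\
| tail -n+2\
>> data/oakfungi.csv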
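Before adapting the filters above to your own needs, it can help to see which column names a dataset actually contains. The following is a small sketch using only standard tools. The function name `list_columns` is made up for illustration, and it assumes the header row contains no quoted delimiters.

```shell
# List the column names of a gzip-compressed delimited file, one per line, numbered.
# $1 = path to the .gz file, $2 = delimiter (defaults to a comma; pass a tab for .tsv files).
list_columns() {
  gunzip -c "$1" | head -n 1 | tr "${2:-,}" '\n' | nl
}

# Hypothetical usage:
#   list_columns data/stable/interactions.csv.gz
#   list_columns interactions.tsv.gz "$(printf '\t')"
```

The numbered names can then be copied into the `filter` and `cut -f` expressions.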
You can also load a dataset from GloBI into a sqlite3 database on your personal computer by using/modifying the following code (after downloading the dataset):
$ cat interactions.csv.gz\
| gunzip\
| sqlite3 -csv globi.db '.import /dev/stdin interactions'
Or for SCAN specific data:
$ cat interactions.csv.gz\
| gunzip\
| grep "globalbioticinteractions/scan" > globi-scan.csv
or, to count the matching records:
$ cat interactions.csv.gz\
| gunzip\
| grep "globalbioticinteractions/scan"\
| wc -l
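One thing to keep in mind with the `grep` approach above: `grep` keeps only matching lines, so the header row is dropped from the output file. The following is a minimal sketch of a variant that keeps the header, using `awk` instead of `grep`. The function name `filter_by_namespace` is made up for illustration.

```shell
# Keep the header row plus every row that mentions the given dataset name.
# $1 = path to the .gz file, $2 = text to look for (e.g. globalbioticinteractions/scan)
filter_by_namespace() {
  gunzip -c "$1" | awk -v ns="$2" 'NR == 1 || index($0, ns) > 0'
}

# Hypothetical usage:
#   filter_by_namespace interactions.csv.gz "globalbioticinteractions/scan" > globi-scan.csv
```

`index` does a plain text match (not a regular expression), so the slash in the dataset name needs no escaping.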
To reduce the size of a sqlite3 (or other) database, you can drop columns before importing the data using power tools like cut or mlr (Miller). See the importing csv files to sqlite page.
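As a rough sketch of that idea, the functions below keep only two columns while importing and then run a simple query. This assumes both `mlr` (Miller) and `sqlite3` are installed. The function names `import_trimmed` and `top_pairs` are made up for illustration, and the column names are the ones used in the examples above.

```shell
# Import only the source and target name columns into a sqlite3 database.
# $1 = path to the .gz file, $2 = database file to create or add to
import_trimmed() {
  gunzip -c "$1" \
    | mlr --csv cut -f sourceTaxonName,targetTaxonName \
    | sqlite3 -csv "$2" '.import /dev/stdin interactions'
}

# Show the most frequently recorded source/target pairs.
# $1 = database file, $2 = number of pairs to show (defaults to 10)
top_pairs() {
  sqlite3 -csv "$1" "SELECT sourceTaxonName, targetTaxonName, COUNT(*) AS n
    FROM interactions GROUP BY 1, 2 ORDER BY n DESC LIMIT ${2:-10};"
}

# Hypothetical usage:
#   import_trimmed interactions.csv.gz globi-small.db
#   top_pairs globi-small.db 20
```

Dropping columns before the import, rather than after, means the unused data never reaches the database file.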
If you can provide further step by step instructions on how to use these scripts, please add it to the working guide and I will add to this page. I’m not a command line person, so any help adding to this section would be much appreciated!
Custom Taxon List Search
Want association information for multiple taxa without searching each name individually? Want to be able to download a csv with this data? Well, now you can!
- Create a simple text (.txt) file with the names you want to find associations for.
- File should have a single column of names
- For species names that have spaces in them, replace all spaces with “%20”. You can do this with a find and replace all procedure. Example:
Cremnops montrealensis
⇒Cremnops%20montrealensis
- Save the file with the extension .txt
- Open a command line terminal on your computer (on Macs there is a built in one called “Terminal”)
- Navigate to the folder you want the resulting files to be saved to. Example for macs:
cd YourDocuments/YourFolderName/ProjectsFolder
- Once in the folder you want your results to save to, run the following code in terminal:
cat YourNameList.txt\
| sed 's+^+https://api.globalbioticinteractions.org/interaction.csv?includeObservations=true\&sourceTaxon=+g'\
| xargs -L1 curl\
> results-YourNameList.csv
- This will produce and save a csv file with all associations in GloBI for the taxa in the list you made as well as the records the associations came from.
Where:
sed 's+^+https://api.globalbioticinteractions.org/interaction.csv?includeObservations=true\&sourceTaxon=+g'
- Turns each name into a url request for individual records that involve the specified taxon.
xargs -L1 curl
- Executes one request at a time using “curl” (a command-line tool for fetching URLs)
> results-...csv
- Saves the results in a file called “results-[something].csv”
- If you only want the taxon level information (not information about the individual records the associations came from) you can modify the code by omitting includeObservations=true. For example:
cat YourTaxaList.txt\
| sed 's+^+https://api.globalbioticinteractions.org/interaction.csv?sourceTaxon=+g'\
| xargs -L1 curl\
> results-YourInteractions.csv
Things to note about this procedure:
- It may not work well for lists longer than 100 names
- These results are unstable (they change every time GloBI data changes) and are not in an easily citable format for research purposes.
For a test dataset to practice with, and to read the conversation as this procedure was figured out, check out GitHub issue #782
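The find-and-replace step and the URL building described in this procedure can also be scripted, so the name list can be kept with ordinary spaces. The following is a minimal sketch under a few assumptions: the function names `names_to_urls` and `drop_repeated_headers` are made up for illustration, and the second function assumes every response begins with the same header line.

```shell
# Turn a plain list of names (one per line, spaces allowed) into GloBI API request URLs.
# Strips Windows line endings, drops blank lines, and encodes spaces as %20.
names_to_urls() {
  tr -d '\r' < "$1" \
    | sed -e '/^[[:space:]]*$/d' -e 's/ /%20/g' \
    | awk -v base='https://api.globalbioticinteractions.org/interaction.csv?includeObservations=true&sourceTaxon=' \
        '{ print base $0 }'
}

# Keep the first header line of a combined results file and drop any later repeats of it.
drop_repeated_headers() {
  awk 'NR == 1 { header = $0; print; next } $0 != header' "$1"
}

# Hypothetical usage:
#   names_to_urls YourNameList.txt | xargs -L1 curl > results-YourNameList.csv
#   drop_repeated_headers results-YourNameList.csv > results-clean.csv
```

Because each name is fetched with its own request, the combined file may contain one header row per name. The second function is one way to tidy that up before opening the file in a spreadsheet.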
GloBI Hacks
No-download data viewing
You can view datasets from GloBI (or any other .csv/.tsv files online) without actually downloading them! This is possible by using Google Sheets and removing the “.gz” extension of a file from the GloBI Sources page.
- Open a new blank Google Spreadsheet in one browser tab/window
- In another browser tab/window, pick the dataset you want to view
- On the GloBI Sources page, pick a collection or group
- On the left of the items listed, click the “review” button with a green checkmark next to it
- From the list of files that opens on a new page, copy the file name of the file you want to view (excluding the .gz part if it has it)
- Go back to your blank Google Spreadsheet tab
- In the first cell type the formula:
=IMPORTDATA("YOUR FILE NAME")
Example:
=IMPORTDATA("https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/scan/indexed-interactions-sample.tsv")
- Don’t forget the quotation marks in the formula!
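If you are already working in a terminal, the formula can be generated from a copied file link so that the “.gz” part and the quotation marks are handled for you. This is just a small convenience sketch. The function name `importdata_formula` is made up for illustration.

```shell
# Build a Google Sheets IMPORTDATA formula from a file link,
# removing a trailing .gz if the link has one.
importdata_formula() {
  printf '=IMPORTDATA("%s")\n' "${1%.gz}"
}

# Hypothetical usage (paste the link you copied from the review page):
#   importdata_formula "PASTE-THE-FILE-LINK-HERE"
```

The printed line can be pasted straight into the first cell of the blank spreadsheet.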
Name matching with other databases
To match or cross-reference names in GloBI to names in other databases such as ITIS or NCBI, check out the tool Nomer…
Hopefully, this page had some helpful content to help you navigate GloBI!!
💡 Have something helpful to add to this page?
➡️ Please add it to the working guide we are creating to help pull data out of GloBI.
💡 Have a problem or something we need to add?
➡️ Please submit a request on the issue page!
🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧 🚧
Need additional help that the working guide or issue page doesn’t cover? Contact page editor Erika Tucker
This page is supported in part by the Terrestrial Parasite Tracker group’s efforts to produce sustainable, open access, digitization methods and related research tools (NSF award# 1901932).