This lesson is in the early stages of development (Alpha version)

Species Interaction Data Workshop

Getting Interaction Data


Teaching: 5 min
Exercises: 5 min
  • Where can I download interaction data?

  • In what formats are the data available?

  • Understand the available ways to get interaction data

  • Understand the columns of the downloaded data

Getting oriented with GloBI interaction data

Let’s get oriented with the interaction data found on GloBI. GloBI data products are interpreted in order to bring together disparate data sources (literature, observations, collections, etc.). This means that GloBI data products are opinionated and may be incomplete compared to the original data sources. Original data can be found from the citations and these should still be considered the detailed building blocks for the entire interaction dataset.

Some of the different sources of data include natural history collection records, observations extracted from the literature, interaction and network datasets, observations from community science programs and other, larger aggregated datasets.

Where to find data

Navigate to the GloBI Data Products page and explore the Original Data Sources.

What is what?

stable versions of the data are archived in the GloBI data publication, doi:10.5281/zenodo.3950589. A new version is released about every six months.

snapshot versions are the most recent, live data, so they may change daily! Great for exploration and preliminary analysis.

How many records are in the GloBI dataset? It is a lot!

wc -l interactions.csv
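Note that wc -l counts lines, so the header row is included in the total. A minimal sketch for counting only the data records, assuming the file has a single header line and no line breaks inside quoted fields. The function name count_records and the tiny sample file are made up for illustration:

```shell
# count_records FILE: number of lines in FILE minus one header line.
count_records() {
  total=$(wc -l < "$1")
  echo $(( total - 1 ))
}

# Tiny stand-in for interactions.csv so the sketch runs anywhere.
sample=$(mktemp)
printf '%s\n' \
  'sourceTaxonName,interactionTypeName,targetTaxonName' \
  'Ixodes ricinus,parasiteOf,Homo sapiens' \
  'Ixodes scapularis,parasiteOf,Odocoileus virginianus' > "$sample"

count_records "$sample"   # prints 2
rm -f "$sample"
```

On the real dataset you would call count_records interactions.csv instead.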

Data publication

For research or other data-intensive projects, we suggest using GloBI’s stable, versioned, integrated data published via doi:10.5281/zenodo.3950589, or creating a new data publication that contains the data you are using.

Other ways of accessing GloBI data

Exploratory, interactive queries can be executed through SPARQL and Cypher endpoints, GloBI Search/Browse pages, or by using the REST-y GloBI Web API. For those who use R, rglobi is available to explore interaction data. rglobi can also be used to execute Cypher queries. However, it is best to consider these as methods for exploring data rather than data access points. If you are doing research, download the full dataset and create a versioned copy of it.

Discussion: Why is it important to version the GloBI data in research?

Take a moment to discuss as a group why it is important to version, publish, or archive a copy of the GloBI dataset you use for research. What are some ways to archive datasets?

Next Up: Reviewing the Reviews

If you’d like to follow along while working with the entire dataset, please jump to Working with the Whole Dataset.

If you would like to explore GloBI data through the GloBI webpage, please visit lesson episode Point and Click.

Key Points

  • GloBI has interaction data that can be accessed as a full dataset

  • For research, it is better to use the full dataset than the APIs

Working with the Whole Dataset


Teaching: 20 min
Exercises: 10 min
  • How can I get started with the whole interaction dataset?

  • Is this the only way to manage the interaction dataset?

  • Trim the dataset to only records that contain Ixodes

  • Put dataset into R and poke around

  • Add dataset to sqlite database

  • Learn from each other and ask questions!


The whole interaction dataset on GloBI consists of over 6 million interaction records. There are many ways to approach a large dataset, and this exercise demonstrates one approach using shell and R. We are not going to work through introductory shell or R tutorials in this workshop, but the Carpentries have a few nice ones to get you started, including Introduction to shell and Introduction to R.

These exercises can be followed along using R and shell, but it is not necessary. If you would like to follow along, please go ahead and open RStudio and your shell window.

Getting started

At the end of this time we will regroup and report back to the other workshop participants about what we did in this breakout group. Who would like to be the person/s who report back for the breakout group?

Let’s collaboratively take notes in the Google Document. The link to the document is in the chat.

Find all of the records in the dataset based on a taxon name

We are interested in finding all of the records in the interactions.csv dataset that deal with Ixodes and we are interested in reducing the size of the data so it is easier to manage. One quick way to do this is via the shell.

How many records are in the GloBI dataset? It is a lot!

wc -l interactions.csv

One of the first things we might want to do is trim the dataset to only those taxa we are interested in analysing. In this case, we will look for all Ixodes records. To do so, we will use a simple shell script to extract all of the rows that contain the word Ixodes and create a new file. This process will help reduce the size of the dataset so we can use R for our analysis. The shell script will take ~ 4 minutes and 12 seconds to complete!

When we examine the code in the script we see that it is using grep, which is “a Unix command used to search files for the occurrence of a string of characters that matches a specified pattern”. Grep matches on the whole row and does not specify in which column Ixodes is found. We then sort the records and keep only exact, unique versions of the records.

echo Creating headers
head -1 ../data/interactions.csv > ../data/Ixodes_data.csv

echo Finding all Ixodes
cat ../data/interactions.csv | grep -w "Ixodes" >> ../data/Ixodes_data.csv
wc -l ../data/Ixodes_data.csv

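echo Sorting unique records
# Keep the header as the first line, then sort and de-duplicate only the data rows
head -1 ../data/Ixodes_data.csv > ../data/Ixodes_data_unique.csv
tail -n +2 ../data/Ixodes_data.csv | sort -r | uniq >> ../data/Ixodes_data_unique.csv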
wc -l ../data/Ixodes_data_unique.csv
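Because grep matches anywhere in the row, a row is also kept when Ixodes appears only in, say, the target taxon or a citation. Here is a minimal sketch of a column-specific filter with awk. It assumes a tab-separated copy of the data in which no field contains a tab (splitting a CSV on commas would break on quoted fields). The function name filter_by_column and the three-column sample are made up for illustration:

```shell
# filter_by_column FILE COLUMN VALUE
# Print the header plus every row whose COLUMN equals VALUE exactly.
filter_by_column() {
  awk -F'\t' -v col="$2" -v val="$3" '
    NR == 1 {
      # Locate the requested column by its header name.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print
      next
    }
    $idx == val
  ' "$1"
}

# Tiny tab-separated stand-in for the real dataset.
sample=$(mktemp)
printf 'sourceTaxonGenusName\tinteractionTypeName\ttargetTaxonName\n' > "$sample"
printf 'Ixodes\tparasiteOf\tHomo sapiens\n' >> "$sample"
printf 'Ixodiphagus\tparasitoidOf\tIxodes ricinus\n' >> "$sample"

grep -c -w "Ixodes" "$sample"                            # counts 2 rows
filter_by_column "$sample" sourceTaxonGenusName Ixodes   # header plus 1 row
rm -f "$sample"
```

Whether you want the broader grep match or the stricter column match depends on your question: the grep version also keeps records where Ixodes is the target of the interaction.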

If you want to find several taxa and combine the datasets, you could create files for multiple taxa and combine the output into a single dataset using cat. An example of this can be found here. This example takes all of the files in the Data folder whose names match the pattern unique.tsv and creates a new file called all_data.txt.

cat ../Data/*unique.tsv >> ../Data/all_data.txt
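If each per-taxon file starts with its own header row, plain cat repeats that header once per file. A minimal sketch that writes the header only once, assuming all input files share the same header. The function name combine_with_header and the sample files are made up for illustration:

```shell
# combine_with_header OUT FILE...
# Write the header of the first FILE once, then the data rows of every FILE.
combine_with_header() {
  out=$1
  shift
  head -n 1 "$1" > "$out"
  for f in "$@"; do
    tail -n +2 "$f" >> "$out"
  done
}

# Two tiny stand-ins for per-taxon files.
workdir=$(mktemp -d)
printf 'name\ttype\nIxodes ricinus\tparasiteOf\n' > "$workdir/Ixodes_unique.tsv"
printf 'name\ttype\nAmblyomma americanum\tparasiteOf\n' > "$workdir/Amblyomma_unique.tsv"

combine_with_header "$workdir/all_data.txt" "$workdir"/*unique.tsv
cat "$workdir/all_data.txt"   # one header line followed by two data rows
rm -rf "$workdir"
```

Unlike the >> in the example above, this sketch overwrites the output file on each run, so re-running it does not append the same rows twice.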

Now let’s compare the new datasets. How many records are in the trimmed GloBI datasets? Is there a difference between the unique and non-unique files?

wc -l Ixodes_data.csv

wc -l Ixodes_data_unique.csv

Let’s do something in R

Load the trimmed dataset into R using RStudio. We will start by stepping through some R code and discussing the results. The R code we are using can be downloaded to follow along or you can see an html preview of the code.

We will start by finding the columns and creating a subset of the data to import into Google Sheets. Time permitting, we will talk about some of the interesting data issues we are finding in the dataset.

Exercise 1: What do the columns mean?

There are 88 columns in the interactions data file. In this exercise, we will find the columns and pick out which ones are commonly useful in research data. You can create your own list or use this Google Sheet with the first 100 rows of the Ixodes_data_unique.csv file.

  1. Obtain a list of all of the column names.
  2. How many of them deal with taxon names?
  3. What column/s include the citation information?
  4. What column/s contain the interaction information?
  5. What is the difference between the source and target columns?
  6. Describe one other important column.
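If you prefer the shell over R or Google Sheets for the first step, the column names can be read straight from the header row. A minimal sketch, assuming the header itself contains no quoted commas. The function name list_columns and the four-column sample are made up for illustration; the real file has many more columns:

```shell
# list_columns FILE: print the column names of a CSV FILE, numbered, one per line.
list_columns() {
  head -n 1 "$1" | tr ',' '\n' | awk '{ print NR ": " $0 }'
}

# Tiny stand-in for Ixodes_data_unique.csv.
sample=$(mktemp)
printf '%s\n' \
  'sourceTaxonName,interactionTypeName,targetTaxonName,referenceCitation' \
  'Ixodes ricinus,parasiteOf,Homo sapiens,example citation' > "$sample"

list_columns "$sample"
# Narrow the list down, for example to columns with "Taxon" in their name.
list_columns "$sample" | grep 'Taxon'
rm -f "$sample"
```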

Import into a sqlite database

Databases are a great way to manage large datasets and handle data filtering, sorting and grouping. SQLite is commonly used with R because SQLite databases are easy to query from R code. We are not going to learn SQLite today, but there are some great Carpentries tutorials to get you started, including the Introduction to sqlite. Let’s step through a few commands to see how easy it is to take a CSV file and create a SQLite database.

sqlite3 globi.db

.mode csv

.import Ixodes_data_unique.csv interactions

PRAGMA table_info(interactions);

SELECT sourceTaxonGenusName, count(sourceTaxonGenusName) FROM interactions group by sourceTaxonGenusName;

SELECT interactionTypeName, count(interactionTypeName) FROM interactions group by interactionTypeName;
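The commands above are typed at the interactive sqlite3 prompt. The same steps can also be scripted from the shell, which makes them easier to share and re-run. A minimal sketch, assuming the sqlite3 command-line tool is installed. The function names load_csv and count_by_column and the small sample file are made up for illustration:

```shell
# load_csv DB CSV TABLE: import CSV into TABLE. When TABLE does not exist yet,
# sqlite3 uses the first CSV row as the column names.
load_csv() {
  sqlite3 "$1" <<EOF
.mode csv
.import "$2" $3
EOF
}

# count_by_column DB TABLE COLUMN: number of rows per distinct COLUMN value.
count_by_column() {
  sqlite3 "$1" "SELECT $3, count($3) FROM $2 GROUP BY $3 ORDER BY $3;"
}

if command -v sqlite3 >/dev/null 2>&1; then
  workdir=$(mktemp -d)
  # Tiny stand-in for Ixodes_data_unique.csv.
  printf '%s\n' \
    'sourceTaxonGenusName,interactionTypeName' \
    'Ixodes,parasiteOf' \
    'Ixodes,parasiteOf' \
    'Ixodes,interactsWith' > "$workdir/sample.csv"
  load_csv "$workdir/globi.db" "$workdir/sample.csv" interactions
  count_by_column "$workdir/globi.db" interactions interactionTypeName
  rm -rf "$workdir"
else
  echo "sqlite3 not found; skipping the demo"
fi
```

The table and column names are pasted directly into the SQL text, so this is only suitable for names you typed yourself, not for untrusted input.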


Key Points

  • There is a lot of interaction data available and shell is one helpful tool to reduce the size of the dataset.

  • Sharing code helps everyone.

Exploring Ixodes (tick) Records By Pointing and Clicking


Teaching: 15 min
Exercises: 15 min
  • How can I explore indexed Ixodes interaction records?

  • When should I use GloBI’s webtools?

  • How can I help point out suspicious, or missing, data?

  • How can I make suggestions?

  • Become familiar with existing web tools

  • Load sample Ixodes interaction csv datafiles into a spreadsheet

  • Articulate limitations of Web APIs

  • Locate the species interaction data sources

Getting started

At the end of this time we will regroup and report back to the other workshop participants about what we did in this breakout group. Who would like to be the person/s who report back for the breakout group?

Let’s collaboratively take notes in the Google Document. The link to the document is in the chat.

Global Biotic Interactions offers basic web tools to help explore available species interaction datasets. These web tools help to answer questions like: “Which organisms do Ixodes interact with?”, “Which datasets describe Ixodes interactions?”, and “How can I suggest improvements or point out suspicious data?”

GloBI’s interaction search page helps search for specific interactions by (taxonomic) name, interaction type (e.g., eats, parasite of, pollinates), and data source. Similar to general purpose search engines, only a subset of matching results is shown, and more can be requested if desired. The results include a description of the interacting taxa as well as the authority and data source that is said to support (or refute!) the interaction claim.

Exercise 1. Searching for interactions

Use GloBI’s interaction search page to:

  • first, search for interactions that involve ticks (Ixodes)
  • then, narrow the search to include only tick (Ixodes) - mammal (Mammalia) interactions
  • finally, select only parasitic tick-mammal interactions

For each narrowed search, describe some of the data sources and references. What kind of references are they? Which data sources support the interaction claims?

Context of Interaction Results

Now that we’ve explored ways to select specific questions, let’s have a closer look at the links embedded in the search results. One of the goals of the web search tool is to provide a minimal, easy-to-use way to get a sense of the wealth of interaction data that is already openly available.

Exercise 2. A closer look at interaction search results

  1. Various icons and links appear in search results. Make screenshots of some search results and describe what these icons and links do.

  2. Various verbs (e.g., interacts with, parasite of) are used to categorize interactions. List a few of these verbs (aka interaction terms) and describe what you think they mean in your own words. Compare the description with the web resources the verbs link to.

  3. Describe how you imagine your colleagues would use the species interaction search web tool. Discuss current limitations and improvement opportunities.

Point-and-click Tool 2: Interaction Browser

Another search tool, the interaction browser, uses network and bundle diagrams to help visualize a sample of the selected interaction data. Also, geospatial constraints can be specified to select a specific area of interest. Finally, a sample csv file with supporting interaction data records can be downloaded for review.

Exercise 3. A more visual exploration of interaction data

First, open the GloBI Browser. By default, interactions of the green turtle (Chelonia mydas) are shown.

Now, familiarize yourself with the four panels by clicking around and exploring their interactive features:

  • Can you update the search criteria to only select tick (Ixodes) - mammal (Mammalia) interactions?
  • What happens when you select only North America in the geospatial selector?
  • How would you share your resulting diagram with others?

Finally, click on the “download csv data sample” to download the related interactions.csv file.

Open the interactions csv file in your favorite editor and describe each of the columns in your own words. Which columns need further explanation?

(Extra Credit) Make a list of all the distinct mammalian hosts and Ixodes names that were included in the csv sample.

Key Points

  • Web tools are for exploring indexed data and providing feedback

  • Web tools facilitate communication within biodiversity data community

  • Web tools are dynamic and subject to change

Working with Data Sources


Teaching: 10 min
Exercises: 10 min
  • How does GloBI discover interaction data?

  • How does GloBI integrate interaction data?

  • Is GloBI a data repository or a search index?

  • Understand where to find GloBI data sources

  • Understand where to find GloBI data reviews

Data Sources: GloBI’s Building Blocks

Global Biotic Interactions (GloBI) relies on existing species interaction datasets, or data sources. These data sources are regularly re-indexed by GloBI to include recent updates. So, rather than a data management system or repository, GloBI is more like a search engine that helps find biotic interaction data in openly available datasets.

The kinds of data sources indexed by GloBI are pretty diverse: some datasets come from professionally managed natural history collections or specialized data portals, whereas others consist of interaction records manually transcribed from the literature, or observation records provided by citizen scientists.

A list of GloBI data sources can be found on the GloBI sources page.

Exercise 1: Find Data Sources

Visit the GloBI sources page and locate:

  • the USNM Ixodes Collection,

  • Seltmann’s Tick Interaction Database, and

  • iNaturalist observation records.

For each data source, click on the badge to explore some of their indexed interactions.

Describe one interaction for each data source in the collaborative notes.

GloBI builds the search index in the following steps:

  1. Find registered interaction data in GitHub and Zenodo.
  2. Access/download and version digital datasets.
  3. Integrate interaction records using translation tables into a knowledge graph.
  4. Allow for Reuse by publishing integrated data products and services.

These steps are repeated regularly, often many times a week, to include new additions or other updates.

Want to Learn More about Data Indexing Steps?

Visit and learn more about the GloBI indexing process and the tools.

Data Source Reviews

To help better understand how GloBI interprets data sources, automated data reviews are made available for each data source, or dataset.

These dataset specific reviews include:

Exercise 2: Parasite Tracker Data Source: CAS

Visit GloBI’s Parasite Tracker project page and locate the California Academy of Sciences / Entomology Collection.

Click on each of the buttons and describe the function of the “review”, “GloBI”, “config”, “issues” and “names” badges.

Also, in the review log, note how many interactions GloBI found in the California Academy of Sciences Entomology collection.

In this lesson episode, you’ve learned that GloBI is a search index that helps to explore interaction data in existing data sources.

Also, you found the list of GloBI data sources and discovered the search-by-datasource, review and configuration links.

Next Up: Reviewing the Reviews

If you’d like to learn more about what a data review is, please jump to Reviewing Interaction Records.

If you are specifically interested in how GloBI links to taxonomic names, please visit lesson episode Reviewing Taxonomic Names.

Key Points

  • GloBI is built using existing data sources

  • Data sources are continuously and automatically indexed by GloBI

  • GloBI provides automated reviews of data sources

Data Sources: Taxonomic Name Review


Teaching: 10 min
Exercises: 10 min
  • How does GloBI interpret taxonomic names?

  • How can I find taxonomic names that GloBI didn’t understand?

  • Does GloBI use a single taxonomic backbone?

  • Understand how GloBI links provided names to existing taxonomic naming schemes

  • Know where to go to find, and discuss, suspicious taxonomic names

Getting started

At the end of this time we will regroup and report back to the other workshop participants about what we did in this breakout group. Who would like to be the person/s who report back for the breakout group?

Let’s collaboratively take notes in the Google Document. The link to the document is in the chat.


The goal of this lesson is to understand how, and why, GloBI indexes and links taxonomic names.

Taxonomic names are used in literature and datasets to classify living organisms. To increase the discoverability of species interaction data, GloBI links and indexes provided taxonomic names to enable queries like: which mammals (Mammalia) are known to host ticks (Ixodes)? Also, taxonomic name linking enables retrieval of key images and common names to help provide context and make it easier to interpret an interaction claim.

Taxonomic Name Linking Challenges

However, significant challenges exist to interpret and link taxonomic names, especially when dealing with datasets from different eras, authors, and disciplines.

These taxonomic name interpretation challenges include, but are not limited to:

  1. Use of common names instead of scientific names (e.g., “kip” (Dutch) or “chicken” (English) vs. “Gallus gallus domesticus”)
  2. Typos (e.g., “Gallvs gallus” instead of “Gallus gallus”)
  3. Ambiguous names (e.g., “Anura” is a genus of flowering plants in the daisy family as well as an order of frogs).
  4. Outdated/disputed names (e.g., taxonomic revisions re-interpret classifications and re-assign names)
  5. Incomplete hierarchies (e.g., data sources provide species name, but no higher order taxonomic ranks)

GloBI uses existing taxonomic name parsing and resolving tools to help find reasonable links between provided (or verbatim) names from data sources and existing taxonomic name lists. Rather than using a single taxonomic backbone, a variety of name sources is used. These name sources include, but are not limited to: Integrated Taxonomic Information System (ITIS), World Register of Marine Species (WoRMS), NCBI Taxonomy, Wikidata Taxonomy, Encyclopedia of Life (EOL) species pages, FishBase, SeaLifeBase, iNaturalist Taxonomy, GBIF Backbone Taxonomy, and Plazi’s TreatmentBank.

Note the icons representing the various taxonomic name schemes in the screenshot of an interaction claim of a sheep tick (Ixodes ricinus) eating a human (Homo sapiens).

Click on the image to try and reproduce the results. Then, discover the taxonomic linkages using the icons below the image. Note which project provides the images for the taxa. Also, record some examples of the urls pointing to the taxonomic name resources by right-clicking on the icon and copy-pasting the link using “copy link location” (or similar). What do you notice?

GloBI’s Taxonomic Name Linking Process

The process to index and resolve taxonomic names currently consists of two phases.

Phase 1. Create a Taxonomic Name Map (aka GloBI’s Taxon Graph)

a. extract taxonomic names/ids from data sources

b. pre-process and parse names/ids

c. match names/ids against existing name resolvers/name lists

d. version and publish (updated) taxon name map/graph

Phase 2. Create Search Index

a. load specific version of taxon name map/graph

b. on encountering a mapped taxonomic name/id, add extra information to the index

c. on encountering an unmapped name/id, tag the name with “no:match”

d. version and publish interaction search index
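The tagging step in Phase 2 can be pictured as a lookup against the name map. The sketch below is only an illustration of that idea, not GloBI’s actual implementation: it assumes a hypothetical two-column, tab-separated map file (provided name, resolved id) and a file with one provided name per line. The function name tag_names and the file layout are assumptions:

```shell
# tag_names MAP NAMES
# MAP:   tab-separated lines "provided name<TAB>resolved id" (must not be empty)
# NAMES: one provided (verbatim) name per line
# Prints each name with its mapped id, or "no:match" when the map has no entry.
tag_names() {
  awk -F'\t' '
    NR == FNR { map[$1] = $2; next }    # first file: load the name map
    { print $0 "\t" (($0 in map) ? map[$0] : "no:match") }
  ' "$1" "$2"
}

workdir=$(mktemp -d)
# Illustrative map entries and names, including the typo example used earlier.
printf 'Gallus gallus\tNCBI:9031\nHomo sapiens\tNCBI:9606\n' > "$workdir/name_map.tsv"
printf 'Gallus gallus\nGallvs gallus\n' > "$workdir/names.txt"

tag_names "$workdir/name_map.tsv" "$workdir/names.txt"
rm -rf "$workdir"
```

Here the correctly spelled name receives its id, while the misspelled “Gallvs gallus” is tagged with no:match.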

Want to Learn More about GloBI's Taxonomic Name Matching?

Visit to learn more about taxonomic name matching. Also, you might want to have a look at a recent publication of GloBI’s Taxon Graph at .

Finding Suspicious or New Names

If GloBI encountered a name that has not yet been successfully mapped in GloBI’s Taxon Graph, the name is labeled with “no:match” in the search engine. Now, we can use this label to find interaction records that include names that are new to GloBI or failed to match to supported taxonomic name schemes.

So, “no:match” names might include names that contain typos, but may also include names that are valid, but have not yet been included in GloBI’s Taxon Graph.

Exercise 2. Finding Suspicious Names

Show three ways you can find suspicious names using GloBI tools.

Feedback Loops

Exercise 3. Reporting a suspicious name or starting a discussion

Add examples of how to reach out to peers to discuss suspicious names or records.

Key Points

  • Taxonomic name linking facilitates discovery, review, and interpretation, of interaction records

  • GloBI uses a versioned taxonomic name map to map verbatim names into known taxonomic schemes

  • GloBI attempts to provide reasonable links using a controlled and iterative process

  • GloBI’s taxonomic name linking process is likely imperfect and subjective

Data Sources: Interaction Data Record Review


Teaching: 15 min
Exercises: 5 min
  • How can I understand how GloBI interpreted my dataset?

  • Can GloBI help me improve my dataset?

  • Find out where GloBI reviews are located.

  • Find out how to explore GloBI reviews


The goal of this lesson is to introduce you to data reviews in GloBI.

Getting started

At the end of this time we will regroup and report back to the other workshop participants about what we did in this breakout group. Who would like to be the person/s who report back for the breakout group?

Let’s collaboratively take notes in the Google Document. The link to the document is in the chat.

What is a review

A review in GloBI is an output that lets us know how GloBI interpreted or viewed the data being indexed. It is an opportunity to see if you agree with the interpretation or find issues in the data that can be corrected. It also provides some cool statistics about the number of interaction records indexed from a particular dataset. Reviews are done by dataset only. Reviews are useful for people who are submitting data, data curators, or anyone who wants to know more about a particular dataset.

Finding the reviews

Recall from Working with Data Sources that all GloBI data sources are listed on the sources page. In addition, providers for Terrestrial Parasite Tracker have their own webpage.

Let’s examine a GloBI data review. Visit and locate the ucsb-izc collection. Click on the badge.

What is what? An overview of the data.

   _____ _       ____ _____   _____            _                
  / ____| |     |  _ \_   _| |  __ \          (_)               
 | |  __| | ___ | |_) || |   | |__) |_____   ___  _____      __ 
 | | |_ | |/ _ \|  _ < | |   |  _  // _ \ \ / / |/ _ \ \ /\ / / 
 | |__| | | (_) | |_) || |_  | | \ \  __/\ V /| |  __/\ V  V /  
  \_____|_|\___/|____/_____| |_|  \_\___| \_/ |_|\___| \_/\_/   
 | |           |  ____| | |                                     
 | |__  _   _  | |__  | | |_ ___  _ __                          
 | '_ \| | | | |  __| | | __/ _ \| '_ \                         
 | |_) | |_| | | |____| | || (_) | | | |                        
 |_.__/ \__, | |______|_|\__\___/|_| |_|                        
         __/ |                                                  

Review of [globalbioticinteractions/ucsb-izc] started at [2021-04-26T06:04:32+02:00].

Review of [globalbioticinteractions/ucsb-izc] included:
  - 1440 interaction(s)
  - 25 note(s)
  - 1442 info(s)

[globalbioticinteractions/ucsb-izc] has 25 reviewer note(s):
      7 found unsupported interaction type with name: [Visiting]
      6 source taxon name missing: using institutionCode/collectionCode/collectionId/catalogNumber/occurrenceId as placeholder
      3 found unsupported interaction type with name: [Sitting on]
      3 found unsupported interaction type with name: [Hovering over]
      2 found unsupported interaction type with name: [Feeding on]
      1 found unsupported interaction type with name: [Visitng]
      1 found unsupported interaction type with name: [visiting]
      1 found unsupported interaction type with name: [Tended by]
      1 found unsupported interaction type with name: [Next to]

1440 interaction(s) is the number of interactions indexed by GloBI from this dataset.

25 note(s) are rows in the dataset that are flagged and might be interesting to have a look at.

1442 info(s) are messages about the biotic interaction indexing process. These are not flagged records, but more like comments or data logging.

6 source taxon name missing notes flag records where the taxon name field is blank or empty.

unsupported interaction type indicates that no mapping is defined by the data source for that interaction type name. Terms are defined by linking to the Relations Ontology.
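The layout of the note summary (a count followed by the message, most frequent first) is the kind of tally that sort and uniq produce in the shell. A minimal sketch, assuming a hypothetical text file with one note message per line; this is not the actual format of the GloBI review files, and the function name tally_notes is made up for illustration:

```shell
# tally_notes FILE: count identical lines in FILE, most frequent first.
tally_notes() {
  sort "$1" | uniq -c | sort -rn
}

# A few note messages copied from the summary above, one per line.
notes=$(mktemp)
printf '%s\n' \
  'found unsupported interaction type with name: [Visiting]' \
  'found unsupported interaction type with name: [Sitting on]' \
  'found unsupported interaction type with name: [Visiting]' \
  'found unsupported interaction type with name: [Visitng]' > "$notes"

tally_notes "$notes"
rm -f "$notes"
```

A tally like this makes near-duplicates such as [Visiting], [visiting] and [Visitng] easy to spot, which is often a good starting point for cleaning up a dataset.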

What is what? I want more information.

GloBI data reviews are packaged in downloadable text files (tab and comma delimited). The review-sample files are small enough to quickly view.

This review generated the following resources:
  - review.svg (review badge)
  - review.tsv.gz (data review)
  - review-sample.tsv (data review sample tab-separated)
  - review-sample.json (data review sample json)
  - review-sample.csv (data review sample csv)
  - indexed-interactions.tsv.gz (indexed interactions)
  - indexed-interactions.csv.gz (indexed interactions)
  - indexed-interactions-sample.tsv (indexed interactions sample)
  - indexed-interactions-sample.csv (indexed interactions sample)
  - indexed-names.tsv.gz (indexed names)
  - indexed-names.csv.gz (indexed names)
  - indexed-names-sample.tsv (indexed names sample)
  - indexed-names-sample.csv (indexed names sample)
  - indexed-citations.tsv.gz (indexed citations)
  - indexed-citations.csv.gz (indexed citations)
  - nanopub.ttl.gz (interactions nanopubs)
  - nanopub-sample.ttl (interactions nanopub sample)
  - (review archive)

Get a glimpse of the review in Google Sheets

  1. Import a review-sample file into Google Sheets
  2. Download the full review archive
  3. Import the indexed-interactions-sample.tsv into Google Sheets

Exercise 1: Find Data Sources

Visit the GloBI sources page and locate the USNM Ixodes Collection, Seltmann’s Tick Interaction Database, or iNaturalist observation records. Click on the badge to open the review page. For one of the datasets:

  1. Find out how many interactions have been indexed for that dataset.
  2. Import the indexed-interactions-sample.tsv into Google Sheets.

Key Points

  • There are many ways to access GloBI reviews.

  • GloBI reviews can help data managers better understand their data and how GloBI interprets it.