Each occurrence record contains taxonomic information and information
about the observation itself, like its location and the date of
observation. These pieces of information are recorded and categorised
into respective fields. When you import data using galah, columns of the
resulting tibble correspond to these fields.
Data fields are important because they provide a means to narrow
queries so that they return only the information you need, and no more.
Consequently, much of galah has been designed to make narrowing as
simple as possible, using the following functions:
galah_identify
galah_filter
galah_select
galah_group_by
galah_geolocate
galah_down_to
These names have been chosen to echo comparable functions from dplyr,
namely filter, select and group_by. With the exception of
galah_geolocate, they also use dplyr's tidy evaluation and syntax. This
means that the way you use dplyr functions is also the way you use
galah_ functions.
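As a rough sketch of that parallel, the code below applies the same condition twice: once with dplyr::filter() on a small local data frame, and once with galah_filter() on a query to the atlas. The data frame `records` and its values are made up for illustration, and the galah query needs an internet connection.

```r
library(dplyr)
library(galah)

# A small, made-up data frame standing in for downloaded records
records <- data.frame(
  species = c("Pogona barbata", "Pogona vitticeps", "Pogona minor"),
  year = c(1998, 2005, 2021)
)

# dplyr: keep rows of a local data frame where year is after 2000
local_result <- records |>
  filter(year > 2000)

# galah: the same expression narrows a query sent to the atlas
remote_result <- galah_call() |>
  galah_filter(year > 2000) |>
  atlas_counts()
```

Only the object receiving the expression differs: dplyr filters rows already in memory, whereas galah passes the condition to the atlas so that only matching records are counted or downloaded.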
Perhaps unsurprisingly, search_taxa searches for taxonomic information.
It uses fuzzy matching to work a lot like the search bar on the Atlas of
Living Australia website, and you can use it to search for taxa by their
scientific name. Finding your desired taxon with search_taxa is an
important step to using this taxonomic information to download data
with galah.
For example, to search for reptiles, we first check whether our query matches the correct taxon:
search_taxa("Reptilia")
## # A tibble: 1 × 9
## search_term scientific_name taxon_concept_id rank match…¹ kingdom phylum class issues
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Reptilia REPTILIA https://biodiversity.org.au/afd/taxa/682e1228-5b3c-45ff-833b… class exactM… Animal… Chord… Rept… noIss…
## # … with abbreviated variable name ¹match_type
To be more specific, you can provide search_taxa with a data.frame
containing more levels of the taxonomic hierarchy:
search_taxa(data.frame(genus = "Eolophus", class = "Aves"))
## # A tibble: 1 × 13
## search_term scientific_name scientific_name_authorship taxon_conce…¹ rank match…² kingdom phylum class order family genus issues
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Eolophus_Aves Eolophus Bonaparte, 1854 https://biod… genus exactM… Animal… Chord… Aves Psit… Cacat… Eolo… noIss…
## # … with abbreviated variable names ¹taxon_concept_id, ²match_type
Once we know that our search matches the correct taxon or taxa, we can
use galah_identify to narrow the results of our queries:
galah_call() |>
galah_identify("Reptilia") |>
atlas_counts()
## # A tibble: 1 × 1
## count
## <int>
## 1 1474321
taxa <- search_taxa(data.frame(genus = "Eolophus", class = "Aves"))
galah_call() |>
galah_identify(taxa) |>
atlas_counts()
## # A tibble: 1 × 1
## count
## <int>
## 1 994158
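galah_identify() also accepts several search terms at once. As a sketch, assuming each term resolves to a single taxon, the query below counts reptile and amphibian records together and splits the totals by class.

```r
library(galah)

# Count records for two classes in one query, grouped by class
herps <- galah_call() |>
  galah_identify("Reptilia", "Amphibia") |>
  galah_group_by(class) |>
  atlas_counts()

herps
```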
If you're using an international atlas, search_taxa will automatically
switch to using the local name-matching service. For example, Spain uses
the GBIF taxonomic backbone, but integrates seamlessly with the standard
galah workflow.
galah_config(atlas = "Spain")
galah_call() |>
galah_identify("Lepus") |>
galah_group_by(species) |>
atlas_counts()
## # A tibble: 5 × 2
## species count
## <chr> <int>
## 1 Lepus granatensis 9000
## 2 Lepus europaeus 2884
## 3 Lepus castroviejoi 153
## 4 Lepus capensis 2
## 5 Lepus californicus 1
By contrast, the UK's National Biodiversity Network (NBN) has its own taxonomic backbone, but is supported by the same function calls.
galah_config(atlas = "United Kingdom")
galah_call() |>
galah_identify("Bufo") |>
galah_group_by(species) |>
atlas_counts()
## # A tibble: 2 × 2
## species count
## <chr> <int>
## 1 Bufo bufo 75165
## 2 Bufo spinosus 1
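galah_config() changes the atlas for the rest of the R session, so the ALA examples that follow assume you have switched back. A minimal sketch of doing so; show_all(atlases) is included to list the atlases galah currently supports.

```r
library(galah)

# List the atlases that galah can connect to
atlases <- show_all(atlases)

# Return to the Atlas of Living Australia for the remaining examples
galah_config(atlas = "Australia")
```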
Perhaps the most important function in galah is galah_filter, which is
used to filter the rows of queries:
# Get total record count for years after 2000
galah_call() |>
galah_filter(year > 2000) |>
atlas_counts()
## # A tibble: 1 × 1
## count
## <int>
## 1 73800220
# Get total record count for iNaturalist Australia for years after 2000
galah_call() |>
galah_filter(
year > 2000,
dataResourceName == "iNaturalist Australia"
) |>
atlas_counts()
## # A tibble: 1 × 1
## count
## <int>
## 1 3628323
To find available fields and corresponding valid values, use the field
lookup functions show_all(fields), search_all(fields) and show_values().
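As a sketch, the code below lists every field, searches for fields matching a term, and then shows the values recorded for a field. "basisOfRecord" is only an example search term; if search_all() returns more than one match, show_values() is assumed to use the first row.

```r
library(galah)

# Every field available in the currently configured atlas
all_fields <- show_all(fields)

# Fields whose name or description match a search term
basis_fields <- search_all(fields, "basisOfRecord")

# Values recorded for the matching field
basis_values <- basis_fields |> show_values()
basis_values
```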
Finally, a special use of galah_filter is to build taxonomic queries
that are more complex than search_taxa allows. By using the
taxonConceptID field, for example, it is possible to build queries that
exclude certain taxa. This can be useful for paraphyletic concepts such
as invertebrates:
galah_call() |>
galah_filter(
taxonConceptID == search_taxa("Animalia")$taxon_concept_id,
taxonConceptID != search_taxa("Chordata")$taxon_concept_id
) |>
galah_group_by(class) |>
atlas_counts()
## # A tibble: 83 × 2
## class count
## <chr> <int>
## 1 Insecta 3954876
## 2 Gastropoda 878641
## 3 Malacostraca 573921
## 4 Arachnida 553923
## 5 Maxillopoda 473377
## 6 Polychaeta 258910
## 7 Bivalvia 215372
## 8 Anthozoa 169309
## 9 Demospongiae 113286
## 10 Ostracoda 58699
## # … with 73 more rows
When working with the ALA, a notable feature is the ability to specify a
data quality profile with galah_apply_profile, which removes records
that are suspect in some way:
galah_call() |>
galah_filter(year > 2000) |>
galah_apply_profile(ALA) |>
atlas_counts()
## # A tibble: 1 × 1
## count
## <int>
## 1 66776476
To see a full list of data quality profiles, use show_all(profiles).
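A sketch of looking up profiles and measuring a profile's effect. search_all(profiles, "ALA") narrows the full list to profiles matching a search term, and the two counts show how many records galah_apply_profile(ALA) removed. With the counts printed in this section, that difference was 73,800,220 minus 66,776,476, or 7,023,744 records, at the time they were produced; your numbers will differ as the atlas grows.

```r
library(galah)

# All data quality profiles offered by the current atlas
profiles <- show_all(profiles)

# Profiles matching a search term
ala_profiles <- search_all(profiles, "ALA")

# Counts for the same period without and with the ALA profile
unfiltered <- galah_call() |>
  galah_filter(year > 2000) |>
  atlas_counts()

filtered <- galah_call() |>
  galah_filter(year > 2000) |>
  galah_apply_profile(ALA) |>
  atlas_counts()

# Number of records excluded by the profile
removed <- unfiltered$count - filtered$count
removed
```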
Use galah_group_by to group and summarise record counts by specified
fields:
# Get record counts from 2016 to 2020, grouped by year and basis of record
galah_call() |>
galah_filter(year > 2015 & year <= 2020) |>
galah_group_by(year, basisOfRecord) |>
atlas_counts()
## # A tibble: 25 × 3
## basisOfRecord year count
## <chr> <chr> <int>
## 1 HUMAN_OBSERVATION 2020 6293285
## 2 HUMAN_OBSERVATION 2019 5517709
## 3 HUMAN_OBSERVATION 2018 5237933
## 4 HUMAN_OBSERVATION 2017 4347304
## 5 HUMAN_OBSERVATION 2016 3512645
## 6 OCCURRENCE 2016 165997
## 7 OCCURRENCE 2018 116242
## 8 OCCURRENCE 2017 102206
## 9 OCCURRENCE 2019 91640
## 10 OCCURRENCE 2020 39429
## # … with 15 more rows
Use galah_select to choose which columns are returned when downloading
records:
# Get *Reptilia* records from 1930, but only 'eventDate' and 'kingdom' columns
occurrences <- galah_call() |>
galah_identify("reptilia") |>
galah_filter(year == 1930) |>
galah_select(eventDate, kingdom) |>
atlas_occurrences()
occurrences
## # A tibble: 54 × 2
## eventDate kingdom
## <chr> <chr>
## 1 1929-12-31T14:00:00Z Animalia
## 2 1929-12-31T14:00:00Z Animalia
## 3 1929-12-31T14:00:00Z Animalia
## 4 1929-12-31T14:00:00Z Animalia
## 5 1929-12-31T14:00:00Z Animalia
## 6 1929-12-31T14:00:00Z Animalia
## 7 1929-12-31T14:00:00Z Animalia
## 8 1929-12-31T14:00:00Z Animalia
## 9 1929-12-31T14:00:00Z Animalia
## 10 1929-12-31T14:00:00Z Animalia
## # … with 44 more rows
You can also use the selection helpers that work with dplyr::select(),
such as starts_with() and ends_with(), inside galah_select():
occurrences <- galah_call() |>
galah_identify("reptilia") |>
galah_filter(year == 1930) |>
galah_select(starts_with("elev") & ends_with("n")) |>
atlas_occurrences()
occurrences
## # A tibble: 54 × 55
## recordID catal…¹ taxon…² verba…³ raw_v…⁴ scien…⁵ taxon…⁶ verna…⁷ kingdom phylum class order family genus species subsp…⁸ dataR…⁹
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 050d4cec-… "J3729" https:… Oxyura… "coast… Oxyura… species "Taipa… Animal… Chord… Rept… Squa… Elapi… "Oxy… "Oxyur… "" dr1132
## 2 0aee0e05-… "39139… https:… Tympan… "Lined… Tympan… species "Grass… Animal… Chord… Rept… Squa… Agami… "Tym… "Tympa… "" dr361
## 3 0cbfa7cb-… "R3683… https:… Natrix… "" COLUBR… family "" Animal… Chord… Rept… Squa… Colub… "" "" "" dr346
## 4 0fb28f26-… "" https:… Notech… "easte… Notech… species "Tiger… Animal… Chord… Rept… Squa… Elapi… "Not… "Notec… "" dr1132
## 5 15e65c09-… "34102" https:… Emydur… "Murra… Emydur… subspe… "Macqu… Animal… Chord… Rept… Test… Cheli… "Emy… "Emydu… "Emydu… dr1132
## 6 170fbb84-… "77798" https:… Deniso… "ornam… Deniso… species "Ornam… Animal… Chord… Rept… Squa… Elapi… "Den… "Denis… "" dr1132
## 7 19851ac1-… "" https:… Pogona… "beard… Pogona… species "Commo… Animal… Chord… Rept… Squa… Agami… "Pog… "Pogon… "" dr1132
## 8 1d52ab81-… "" https:… Bellat… "land … Bellat… species "Land … Animal… Chord… Rept… Squa… Scinc… "Bel… "Bella… "" dr1132
## 9 1ef9fad5-… "R.124… https:… Egerni… "" Egerni… species "Cunni… Animal… Chord… Rept… Squa… Scinc… "Ege… "Egern… "" dr340
## 10 23da98ca-… "" https:… Antare… "spott… Antare… species "Spott… Animal… Chord… Rept… Squa… Pytho… "Ant… "Antar… "" dr1132
## # … with 44 more rows, 38 more variables: institutionUid <chr>, institutionName <chr>, collectionUid <chr>, collectionName <chr>,
## # dctermsLicense <chr>, institutionCode <chr>, collectionCode <chr>, locality <chr>, verbatimLatitude <dbl>,
## # verbatimLongitude <dbl>, verbatimCoordinateSystem <chr>, decimalLatitude <dbl>, decimalLongitude <dbl>,
## # coordinatePrecision <dbl>, coordinateUncertaintyInMeters <dbl>, country <chr>, stateProvince <chr>, cl959 <chr>, cl21 <chr>,
## # cl1048 <chr>, minimumElevationInMeters <lgl>, maximumElevationInMeters <lgl>, minimumDepthInMeters <lgl>,
## # maximumDepthInMeters <lgl>, individualCount <int>, recordedBy <chr>, year <int>, month <int>, day <int>, eventDate <chr>,
## # verbatimBasisOfRecord <chr>, basisOfRecord <chr>, occurrenceStatus <chr>, raw_sex <chr>, preparations <chr>, …
Use galah_geolocate to specify a geographic area or region to limit your
search:
# Get list of Perameles species only in the area specified
# (Note: the area can also be specified by a shapefile)
wkt <- "POLYGON((131.36328125 -22.506468769126,135.23046875 -23.396716654542,134.17578125 -27.287832521411,127.40820312499 -26.661206402316,128.111328125 -21.037340349154,131.36328125 -22.506468769126))"
galah_call() |>
galah_identify("perameles") |>
galah_geolocate(wkt) |>
atlas_species()
## # A tibble: 2 × 10
## kingdom phylum class order family genus species author species_guid verna…¹
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Animalia Chordata Mammalia Peramelemorphia Peramelidae Perameles Perameles eremiana Spencer, 1897 https://biodi… Desert…
## 2 Animalia Chordata Mammalia Peramelemorphia Peramelidae Perameles Perameles bougainville Quoy & Gaimard, 1824 https://biodi… Shark …
## # … with abbreviated variable name ¹vernacular_name
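Typing WKT by hand is error-prone. As a sketch, a polygon string like the one above can be assembled from vectors of longitudes and latitudes in base R. `make_wkt_polygon()` is a hypothetical helper written for this example, not part of galah. WKT lists each vertex as "longitude latitude", and a polygon ring must end on its first point, so the helper closes the ring when needed.

```r
# Hypothetical helper (not part of galah): build a WKT POLYGON string
make_wkt_polygon <- function(lon, lat) {
  stopifnot(length(lon) == length(lat), length(lon) >= 3)
  # Close the ring by repeating the first vertex if it is not already closed
  n <- length(lon)
  if (lon[1] != lon[n] || lat[1] != lat[n]) {
    lon <- c(lon, lon[1])
    lat <- c(lat, lat[1])
  }
  # Each vertex is "lon lat"; vertices are separated by commas
  vertices <- paste(lon, lat, collapse = ",")
  paste0("POLYGON((", vertices, "))")
}

# A simple rectangle in central Australia, traced east, south, west, north
box <- make_wkt_polygon(
  lon = c(128, 135, 135, 128),
  lat = c(-22, -22, -27, -27)
)
box
```

The resulting string can be passed to galah_geolocate() in the same way as `wkt` above.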
Use galah_down_to to specify the lowest taxonomic level to include when
constructing a taxonomic tree:
galah_call() |>
galah_identify("fungi") |>
galah_down_to(phylum) |>
atlas_taxonomy()
## levelName
## 1 Fungi
## 2 ¦--Dikarya
## 3 ¦ °--Entorrhizomycota
## 4 ¦--Ascomycota
## 5 ¦--Basidiomycota
## 6 ¦--Blastocladiomycota
## 7 ¦--Chytridiomycota
## 8 ¦--Cryptomycota
## 9 ¦--Glomeromycota
## 10 ¦--Microspora
## 11 ¦--Microsporidia
## 12 ¦--Mucoromycota
## 13 ¦--Neocallimastigomycota
## 14 ¦--Zoopagomycota
## 15 °--Zygomycota