Starting with version 1.4.0, there have been three major improvements
to how galah
works:
galah_call
Below we discuss each of these changes in turn. Please note, however,
that these changes are by no means set in stone - it is absolutely
possible to change syntax in future versions of galah
if
alternative names are easier to use and understand. We would appreciate
any feedback from users about what works or what doesn’t work. It is our
goal to create a package that is as easy and intuitive for users as
possible!
dplyr
galah_
functions now evaluate arguments just like
dplyr
. To see what we mean, let’s look at an example of how
dplyr::filter()
works. Notice how
dplyr::filter
and galah_filter
both require
logical arguments to be added by using the ==
sign:
library(dplyr)
%>%
mtcars filter(mpg == 21)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
galah_call() %>%
galah_filter(year == 2021) %>%
atlas_counts()
## # A tibble: 1 × 1
## count
## <int>
## 1 7116810
As another example, notice how galah_group_by()
+
atlas_counts()
works very similarly to
dplyr::group_by()
+ dplyr::count()
:
%>%
mtcars group_by(vs) %>%
count()
## # A tibble: 2 × 2
## # Groups: vs [2]
## vs n
## <dbl> <int>
## 1 0 18
## 2 1 14
galah_call() %>%
galah_group_by(biome) %>%
atlas_counts()
## # A tibble: 2 × 2
## biome count
## <chr> <int>
## 1 TERRESTRIAL 105579552
## 2 MARINE 3526706
We made this move towards tidy evaluation to make it possible to use
piping for building queries to the Atlas of Living Australia. In
practice, this means that data queries can be filtered just like how you
might filter a data.frame
with the tidyverse
suite of functions.
Prior to version 1.4.0, galah
naming conventions had two
major problems:
ala
, but could
actually query many living atlases (not just ALA)select
, but
this is reserved in dplyr
(and elsewhere) for operations
specifically on columnsTo address these concerns (and other smaller points), we have completed a rewrite of our function names to increase clarity (see table below). Deprecated function names will now return a warning message when used, suggesting to users that they switch to the new syntax.
galah 1.3.1 and earlier | galah 1.4.0 |
---|---|
galah_call | |
select_taxa | galah_identify |
select_filters | galah_filter |
select_columns | galah_select |
select_locations | galah_geolocate |
galah_group_by | |
galah_down_to | |
ala_counts | atlas_counts |
ala_occurrences | atlas_occurrences |
ala_species | atlas_species |
ala_media | atlas_media |
ala_taxonomy | atlas_taxonomy |
ala_citation | atlas_citation |
select_taxa | search_taxa, search_identifiers |
search_fields | search_fields |
show_all_fields | |
find_profiles | show_all_profiles |
find_ranks | show_all_ranks |
find_atlases | show_all_atlases |
find_reasons | show_all_reasons |
find_cached_files | show_all_cached_files |
find_field_values | search_field_values |
find_profile_attributes | search_profile_attributes |
galah_call()
Perhaps the largest change from galah
1.4.0 is the
implementation of piping using galah_call()
.
Beginning a query with galah_call()
(be sure to add the
parentheses!) tells galah
that you will be using pipes to
construct your query. Follow this with your preferred pipe
(|>
from base
or %>%
from
magrittr
). You can then narrow your query line-by-line
using galah_
functions. Finally, end with an
atlas_
function to identify what type of data you want from
your query.
Unlike old function names, which will be removed from future
versions, we do intend to continue supported un-piped syntax in future,
although piping only works with revamped function names. If you’re new
to piping, here’s a comparison against code from previous versions of
galah
.
Previously, if you wanted to look up the number of records of each bandicoot species every year from 2010 to 2021, you’d have had to do something like this:
library(purrr)
library(dplyr)
<- ala_species(taxa = select_taxa("perameles"))$species
taxa <- select_filters(year = seq(2010:2021))
years
%>%
taxa map_dfr( ~ ala_counts(
taxa = select_taxa(list(species = .x)),
filters = years,
group_by = "year")
Not very easy because you had to use multiple atlas_
functions and you had to use loops. However, now with piping you can do
it like this:
galah_call() %>%
galah_identify("perameles") %>%
galah_filter(year > 2010) %>%
galah_group_by(species, year) %>%
atlas_counts()
And a second example, if you wanted to download occurrence records of bandicoots in 2021, and also to include information on which records had zero coordinates, previously you would have had to do this:
atlas_occurrences(taxa = select_taxa("perameles"),
filters = select_filters(year = 2021),
columns = select_columns(group = "basic", "ZERO_COORDINATE"))
Now with piping:
galah_call() |>
galah_identify("perameles") |>
galah_filter(year == 2021) |>
galah_select(group = "basic", ZERO_COORDINATE) |>
atlas_occurrences()