Access locations and times of Sandhill Crane encounters

For this challenge, you will use a database called the Global Biodiversity Information Facility (GBIF). GBIF compiles species observation data from all over the world, and includes everything from museum specimens to photos taken by citizen scientists in their backyards.

Explore GBIF: Before you get started, go to the GBIF occurrences search page and explore the data.


Contribute to open data

You can get your own observations added to GBIF using iNaturalist!

Set up your code to prepare for download

We access GBIF data with a package called pygbif, which may not be included in your environment. The cells below check whether pygbif is installed, install the project package (landmapyr) from the parent directory in editable mode, and import its helper functions:

Code
conda list pygbif
Code
%pip install -q -e ..
Code
from landmapyr.initial import create_data_dir, robust_code
Code
from landmapyr.gbif import gbif_credentials, gbif_species_key
from landmapyr.gbif import download_gbif, load_gbif, gbif_monthly
from landmapyr.gbif import ecoregions, join_ecoregions_monthly
from landmapyr.gbif import count_by_ecoregions
from landmapyr.gbif import simplify_ecoregions_gdf, join_occurrence

Import packages: In the imports cell, we’ve included some packages that you will need. Add imports for packages that will help you:

  • Work with reproducible file paths
  • Work with tabular data
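One way to complete the imports task above is sketched here. It assumes pathlib for file paths and pandas for tabular data; the notebook's own choices for these imports are not shown.

```python
# Reproducible, OS-independent file paths
from pathlib import Path

# Tabular data (DataFrames)
import pandas as pd

# Paths built with the / operator work the same on macOS, Linux, and Windows
species_dir = Path.home() / 'earth-analytics' / 'data' / 'species'
print(species_dir)
```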

The GBIF helper functions used below come from the landmapyr.gbif module imported above; they started out as a standalone gbif.py script.

Code
robust_code()
data_dir = create_data_dir('species')
gbif_dir = create_data_dir('species/gbif_sandhill')
gbif_dir
'/Users/brianyandell/earth-analytics/data/species/gbif_sandhill'
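The printed path suggests that create_data_dir() builds a folder under ~/earth-analytics/data and returns it as a string. We have not seen its source, so the helper below (make_data_dir, a hypothetical name) is only a minimal sketch of that behavior.

```python
import os


def make_data_dir(subdir, base=None):
    """Hypothetical stand-in for create_data_dir(): create <base>/<subdir>
    if needed and return it. base defaults to ~/earth-analytics/data."""
    if base is None:
        base = os.path.join(os.path.expanduser('~'), 'earth-analytics', 'data')
    path = os.path.join(base, *subdir.split('/'))
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    return path
```

Calling make_data_dir('species/gbif_sandhill') twice is safe: exist_ok=True makes the second call a no-op that returns the same path.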

Register and log in to GBIF

You will need a GBIF account to complete this challenge. You can use your GitHub account to authenticate with GBIF. Then, run the following code to save your credentials on your computer.

Warning

Your email address must match the email you used to sign up for GBIF!

Tip

If you enter your credentials incorrectly, rerun the cell with reset_credentials=True instead of reset_credentials=False.

Code
gbif_credentials(False)
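pygbif's download functions read the GBIF_USER, GBIF_PWD, and GBIF_EMAIL environment variables when credentials are not passed explicitly. The sketch below (set_gbif_credentials is a hypothetical name, not the landmapyr function) prompts only for missing values unless reset_credentials is True. Unlike gbif_credentials(), which saves credentials on your computer, it stores them only for the current Python session.

```python
import os
from getpass import getpass


def set_gbif_credentials(reset_credentials=False):
    """Hypothetical helper: prompt for GBIF credentials and store them in
    the environment variables pygbif reads for downloads (session only)."""
    prompts = {
        'GBIF_USER': (input, 'GBIF username: '),
        'GBIF_PWD': (getpass, 'GBIF password: '),   # hidden while typing
        'GBIF_EMAIL': (input, 'GBIF email: '),
    }
    for env_var, (ask, message) in prompts.items():
        # Only ask when the value is missing or a reset was requested
        if reset_credentials or env_var not in os.environ:
            os.environ[env_var] = ask(message)
```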

Get the species key

Your task

  1. Replace 'grus canadensis' in the code with the scientific name of the species you want to look up
  2. Run the code to get the species key
Code
species_name, species_key = gbif_species_key('grus canadensis')
species_name, species_key
('Antigone canadensis', 2474953)
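The lookup for 'grus canadensis' returns 'Antigone canadensis' because GBIF treats Grus canadensis as a synonym and reports the accepted species. A minimal sketch of the last step, assuming the match is a dict shaped like GBIF's species-match response with 'species' and 'speciesKey' fields (species_key_from_match is a hypothetical name; gbif_species_key may work differently):

```python
def species_key_from_match(match):
    """Return (accepted species name, species key) from a GBIF
    backbone match dict (assumed 'species'/'speciesKey' fields)."""
    if 'speciesKey' not in match:
        # e.g. no match at all, or a match above species rank
        raise ValueError(f"No species-level match: {match.get('matchType')}")
    return match['species'], match['speciesKey']
```

With pygbif, the match dict would come from something like species.name_backbone(name='grus canadensis'); check the pygbif documentation for the exact signature in your installed version.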

Download data from GBIF

Code
gbif_path = download_gbif(gbif_dir, species_key)
gbif_path
'/Users/brianyandell/earth-analytics/data/species/gbif_sandhill/0012336-260423192947929.zip'

Download key: 0020917-241007104925546
Citation: GBIF.org (17 October 2024) GBIF Occurrence Download https://doi.org/10.15468/dl.4d3k48
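A GBIF occurrence download is requested with a JSON "predicate" describing the filter. The sketch below builds one for a species and year, keeping only records with coordinates. The year 2023 and the coordinate filter are assumptions (every record shown is from 2023); we do not know the exact filters download_gbif() uses. pygbif's occurrences.download() submits such requests and occurrences.download_get() fetches the resulting zip.

```python
import json


def gbif_download_predicate(species_key, year):
    """Build a GBIF occurrence-download predicate for one taxon and year,
    restricted to georeferenced records."""
    return {
        'type': 'and',
        'predicates': [
            {'type': 'equals', 'key': 'TAXON_KEY', 'value': str(species_key)},
            {'type': 'equals', 'key': 'YEAR', 'value': str(year)},
            {'type': 'equals', 'key': 'HAS_COORDINATE', 'value': 'true'},
        ],
    }


print(json.dumps(gbif_download_predicate(2474953, 2023), indent=2))
```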

Load the GBIF data into Python

Load GBIF data:

  • Look at the beginning of the file you downloaded using the code below. What do you think the delimiter is?
  • Run the following code cell. What happens?
  • If you load the file yourself instead of with load_gbif(), modify the parameters of pd.read_csv() until your data loads successfully and you have only the columns you want.

You can use the following code to look at the beginning of your file:
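A minimal sketch that prints the first lines of the data file inside the downloaded zip, assuming the archive contains a single .csv file as GBIF simple downloads do (peek_gbif is a hypothetical name):

```python
import zipfile


def peek_gbif(zip_path, n_lines=3):
    """Return the first n_lines of the first .csv file inside a zip."""
    with zipfile.ZipFile(zip_path) as zf:
        csv_name = next(n for n in zf.namelist() if n.endswith('.csv'))
        with zf.open(csv_name) as f:
            return [f.readline().decode('utf-8').rstrip('\n')
                    for _ in range(n_lines)]


# for line in peek_gbif(gbif_path):
#     print(repr(line))  # repr() makes tab characters (\t) visible
```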

The code below was copied from Lauren Alexandra and Lauren Gleason.

Code
gbif_df = load_gbif(gbif_path)
gbif_df.head()
countryCode stateProvince decimalLatitude decimalLongitude month year
gbifID
4103735033 US NaN 40.6437 -98.8950 3 2023
4953718975 US NaN 31.5633 -109.7160 11 2023
4135517029 US NaN 42.3291 -84.2403 5 2023
4953718976 US NaN 32.0840 -109.0510 1 2023
4953718225 US NaN 32.0840 -109.0510 3 2023
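GBIF "simple" downloads are tab-delimited even though the file inside the zip ends in .csv. A minimal sketch of a loader that produces a table like the one above, assuming these seven columns are the ones wanted (load_gbif_sketch is hypothetical; load_gbif may differ):

```python
import zipfile

import pandas as pd

# Columns kept in the output above; gbifID becomes the index
GBIF_COLUMNS = ['gbifID', 'countryCode', 'stateProvince',
                'decimalLatitude', 'decimalLongitude', 'month', 'year']


def load_gbif_sketch(zip_path):
    """Read the tab-delimited GBIF file from the zip, keeping GBIF_COLUMNS."""
    with zipfile.ZipFile(zip_path) as zf:
        csv_name = next(n for n in zf.namelist() if n.endswith('.csv'))
        with zf.open(csv_name) as f:
            return pd.read_csv(f, sep='\t', usecols=GBIF_COLUMNS,
                               index_col='gbifID')
```

Empty stateProvince fields are read as NaN, which is why the stateProvince column above shows NaN.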

Canada Breeding Locations

Code
ac_CA = gbif_df.loc[gbif_df['countryCode'] == 'CA']
ac_CA.value_counts()
countryCode  stateProvince     decimalLatitude  decimalLongitude  month  year
CA           Ontario           41.955400        -82.514000        5      2023    395
                               42.038967        -82.509125        5      2023    217
             British Columbia  49.100000        -123.185000       1      2023    183
             Ontario           44.350597        -79.883620        4      2023    168
             British Columbia  49.100000        -123.185000       12     2023    159
                                                                                ... 
             Ontario           43.349340        -80.209730        5      2023      1
                               43.349964        -80.375656        4      2023      1
                               43.353832        -79.860730        4      2023      1
                               43.354767        -81.499330        4      2023      1
             Yukon Territory   69.596910        -140.185060       6      2023      1
Name: count, Length: 15708, dtype: int64
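value_counts() on the whole subset counts identical rows (same province, coordinates, month, and year), so its top entries are frequently revisited spots rather than province totals. A sketch of per-province and province-by-month tallies (both function names are hypothetical):

```python
import pandas as pd


def counts_by_province(gbif_df, country_code):
    """Number of observations per province/state for one country."""
    subset = gbif_df.loc[gbif_df['countryCode'] == country_code]
    # groupby drops rows whose stateProvince is missing (NaN) by default
    return subset.groupby('stateProvince').size().sort_values(ascending=False)


def counts_by_province_month(gbif_df, country_code):
    """Province-by-month table of observation counts (months as columns)."""
    subset = gbif_df.loc[gbif_df['countryCode'] == country_code]
    return pd.crosstab(subset['stateProvince'], subset['month'])
```

For example, counts_by_province(gbif_df, 'CA') would total the Ontario rows regardless of coordinates.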

US Breeding Locations

Code
ac_US = gbif_df.loc[gbif_df['countryCode'] == 'US']
ac_US.value_counts()
countryCode  stateProvince  decimalLatitude  decimalLongitude  month  year
US           Ohio           41.627710        -83.191890        5      2023    1576
                            41.645070        -83.263720        5      2023     505
             Wisconsin      43.033360        -89.351380        4      2023     336
             Arizona        31.561499        -109.720020       1      2023     321
             Ohio           41.626520        -83.188970        5      2023     312
                                                                              ... 
             Michigan       42.253153        -85.857013        8      2023       1
                            42.253130        -85.988390        5      2023       1
                            42.253110        -83.695220        9      2023       1
                            42.253010        -85.302475        3      2023       1
             Wyoming (WY)   44.610549        -110.220852       7      2023       1
Name: count, Length: 103892, dtype: int64
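Labels such as 'Wyoming (WY)' suggest that stateProvince values are not fully standardized, so a plain 'Wyoming' label may also exist and split the counts. A small sketch that strips a trailing parenthetical before grouping (clean_state_names is a hypothetical helper):

```python
import pandas as pd


def clean_state_names(names):
    """Remove a trailing parenthetical, e.g. 'Wyoming (WY)' -> 'Wyoming'.
    Missing values stay missing."""
    return names.str.replace(r'\s*\([^)]*\)\s*$', '', regex=True).str.strip()
```

Usage: ac_US['stateProvince'] = clean_state_names(ac_US['stateProvince']) before counting (call .copy() on the filtered frame first to avoid a SettingWithCopyWarning).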

Convert GBIF data to a GeoDataFrame by Month

Code
monthly_gdf = gbif_monthly(gbif_df)
monthly_gdf
year month geometry
gbifID
4103735033 2023 3 POINT (-98.895 40.6437)
4953718975 2023 11 POINT (-109.716 31.5633)
4135517029 2023 5 POINT (-84.2403 42.3291)
4953718976 2023 1 POINT (-109.051 32.084)
4953718225 2023 3 POINT (-109.051 32.084)
... ... ... ...
4159135015 2023 5 POINT (-82.54347 41.98161)
4884744706 2023 7 POINT (-86.14826 42.53785)
4159295666 2023 5 POINT (-80.4592 42.5761)
4408876179 2023 7 POINT (-151.51888 59.63738)
4409335777 2023 8 POINT (-151.52234 59.64154)

313542 rows × 3 columns
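A minimal sketch of the conversion, assuming WGS84 coordinates (EPSG:4326) as GBIF supplies them; to_monthly_gdf is a hypothetical name, not the gbif_monthly source. Note that longitude is x and latitude is y, matching the POINT (-98.895 40.6437) output above.

```python
import geopandas as gpd


def to_monthly_gdf(gbif_df):
    """Keep year and month and add point geometries (x=longitude, y=latitude)."""
    return gpd.GeoDataFrame(
        gbif_df[['year', 'month']],
        geometry=gpd.points_from_xy(gbif_df['decimalLongitude'],
                                    gbif_df['decimalLatitude']),
        crs='EPSG:4326',
    )
```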

Download and save ecoregion boundaries

Ecoregions represent boundaries formed by biotic and abiotic conditions: geology, landforms, soils, vegetation, land use, wildlife, climate, and hydrology.

Code
ecoregions_gdf = ecoregions(data_dir)
ecoregions_gdf.plot(edgecolor='black', color='skyblue')
Figure 1
Code
%%bash
find ~/earth-analytics/data/species -name '*.shp'
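The find cell checks whether the ecoregion shapefile is already on disk. A download-once pattern like that can be sketched as follows; cached_shapefile is a hypothetical helper, we have not confirmed that ecoregions() works this way, and url stands for whatever link the ecoregion dataset is downloaded from.

```python
import glob
import os
import urllib.request
import zipfile


def cached_shapefile(url, dest_dir):
    """Return a .shp path under dest_dir, downloading and unzipping url
    only when no shapefile is present yet."""
    def find_shp():
        pattern = os.path.join(dest_dir, '**', '*.shp')
        return sorted(glob.glob(pattern, recursive=True))

    found = find_shp()
    if not found:
        os.makedirs(dest_dir, exist_ok=True)
        zip_path = os.path.join(dest_dir, 'download.zip')
        urllib.request.urlretrieve(url, zip_path)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(dest_dir)
        found = find_shp()
    if not found:
        raise FileNotFoundError(f'No .shp file found in {dest_dir}')
    return found[0]
```

The returned path can then be read with geopandas.read_file(). Later runs skip the download entirely.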
Code
%store ecoregions_gdf monthly_gdf
Stored 'ecoregions_gdf' (GeoDataFrame)
Stored 'monthly_gdf' (GeoDataFrame)

Identify the ecoregion for each observation

Code
gbif_ecoregion_gdf = join_ecoregions_monthly(ecoregions_gdf, monthly_gdf)
gbif_ecoregion_gdf
year month name
ecoregion
4 2023 8 Ahklun and Kilbuck Upland Tundra
4 2023 7 Ahklun and Kilbuck Upland Tundra
4 2023 7 Ahklun and Kilbuck Upland Tundra
4 2023 7 Ahklun and Kilbuck Upland Tundra
4 2023 7 Ahklun and Kilbuck Upland Tundra
... ... ... ...
833 2023 4 Northern Rockies conifer forests
833 2023 5 Northern Rockies conifer forests
833 2023 5 Northern Rockies conifer forests
833 2023 6 Northern Rockies conifer forests
833 2023 5 Northern Rockies conifer forests

307693 rows × 3 columns
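Assigning each observation to an ecoregion is a point-in-polygon spatial join. The row count falls from 313,542 points to 307,693 rows, which is consistent with an inner join dropping the 5,849 points (for example, offshore sightings) that fall inside no ecoregion polygon. A minimal sketch with geopandas.sjoin, assuming the ecoregion table has 'ecoregion' and 'name' columns (join_points_to_regions is hypothetical):

```python
import geopandas as gpd


def join_points_to_regions(points_gdf, regions_gdf):
    """Tag each point with the region polygon containing it; points outside
    every polygon are dropped by the inner join."""
    # Both layers must share a CRS before a spatial join
    regions = regions_gdf[['ecoregion', 'name', 'geometry']].to_crs(points_gdf.crs)
    joined = gpd.sjoin(points_gdf, regions, how='inner', predicate='within')
    return joined.set_index('ecoregion')[['year', 'month', 'name']]
```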

Count the observations in each ecoregion each month

Code
occurrence_df = count_by_ecoregions(gbif_ecoregion_gdf, 'ecoregion', 'name', 'month')
occurrence_df
occurrences norm_occurrences
ecoregion month
4 7 5 0.004427
9 5 3 0.000745
6 2 0.001061
8 8 0.004741
9 13 0.007170
... ... ... ...
833 7 169 0.004581
8 173 0.004080
9 131 0.002875
10 95 0.001874
11 25 0.000438

788 rows × 2 columns
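Raw counts mostly reflect how many observers visit an ecoregion and how active they are in a given month. We have not confirmed the normalization count_by_ecoregions() applies. One common choice, sketched below with a hypothetical function name, drops sparse cells and then divides each count by the mean count for its ecoregion and by the mean count for its month.

```python
import pandas as pd


def count_and_normalize(df, region_col='ecoregion', count_col='name',
                        month_col='month', min_count=1):
    """Count rows per (region, month), drop cells with <= min_count rows,
    and normalize by the region mean and the month mean."""
    counts = (df.groupby([region_col, month_col])
                .agg(occurrences=(count_col, 'count')))
    counts = counts[counts['occurrences'] > min_count].copy()

    mean_by_region = counts.groupby(level=region_col)['occurrences'].mean()
    mean_by_month = counts.groupby(level=month_col)['occurrences'].mean()

    # Look up each row's region mean and month mean, then divide
    region_idx = counts.index.get_level_values(region_col)
    month_idx = counts.index.get_level_values(month_col)
    counts['norm_occurrences'] = (
        counts['occurrences']
        / mean_by_region.reindex(region_idx).to_numpy()
        / mean_by_month.reindex(month_idx).to_numpy()
    )
    return counts
```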

Code
# Plot to check distributions
occurrence_df.reset_index().plot.scatter(
    x='month', y='norm_occurrences', c='ecoregion',
    logy=True
)
Figure 2

Create a simplified GeoDataFrame for plot

Code
ecoregions_gdf = simplify_ecoregions_gdf(ecoregions_gdf)
ecoregions_gdf
name area geometry
ecoregion
0 Adelie Land tundra 0.038948 MULTIPOLYGON EMPTY
1 Admiralty Islands lowland rain forests 0.170599 POLYGON ((16411777.375 -229101.376, 16384825.7...
2 Aegean and Western Turkey sclerophyllous and m... 13.844952 MULTIPOLYGON (((3391149.749 4336064.109, 33846...
3 Afghan Mountains semi-desert 1.355536 MULTIPOLYGON (((7369001.698 4093509.259, 73168...
4 Ahklun and Kilbuck Upland Tundra 8.196573 MULTIPOLYGON (((-17930832.005 8046779.358, -17...
... ... ... ...
842 Sulawesi lowland rain forests 9.422097 MULTIPOLYGON (((14113374.546 501721.962, 14128...
843 East African montane forests 5.010930 MULTIPOLYGON (((4298787.669 -137583.786, 42727...
844 Eastern Arc forests 0.890325 MULTIPOLYGON (((4267432.68 -493759.165, 428533...
845 Borneo montane rain forests 9.358407 MULTIPOLYGON (((13126956.393 539092.917, 13136...
846 Kinabalu montane alpine meadows 0.352694 POLYGON ((12981819.186 696445.445, 12997053.80...

847 rows × 3 columns
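The geometry coordinates above are in metres (Web Mercator), and the first row, Adelie Land tundra, is MULTIPOLYGON EMPTY: simplification can collapse small regions to nothing. A minimal sketch of the same idea that also drops such empty shapes. The 1000 m tolerance and the decision to drop empties are assumptions; simplify_ecoregions_gdf keeps all 847 rows.

```python
import geopandas as gpd


def simplify_regions(regions_gdf, tolerance=1000):
    """Project to Web Mercator (EPSG:3857, metres), simplify outlines with
    the given tolerance in metres, and drop geometries that become empty."""
    projected = regions_gdf.to_crs(epsg=3857)
    projected['geometry'] = projected.geometry.simplify(tolerance)
    return projected[~projected.geometry.is_empty]
```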

Code
%store gbif_path
%who
Stored 'gbif_path' (str)
ac_CA    ac_US   count_by_ecoregions     create_data_dir     data_dir    download_gbif   ecoregions  ecoregions_gdf  gbif_credentials    
gbif_df  gbif_dir    gbif_ecoregion_gdf  gbif_monthly    gbif_path   gbif_species_key    join_ecoregions_monthly     join_occurrence     load_gbif   
monthly_gdf  occurrence_df   ojs_define  robust_code     simplify_ecoregions_gdf     species_key     species_name    

Mapping monthly distribution

Code
occurrence_gdf = join_occurrence(ecoregions_gdf, occurrence_df)
occurrence_gdf
name area geometry norm_occurrences
ecoregion month
4 7 Ahklun and Kilbuck Upland Tundra 8.196573 MULTIPOLYGON (((-17930832.005 8046779.358, -17... 0.004427
9 5 Alaska-St. Elias Range tundra 28.388010 MULTIPOLYGON (((-16886232.729 9049093.235, -16... 0.000745
6 Alaska-St. Elias Range tundra 28.388010 MULTIPOLYGON (((-16886232.729 9049093.235, -16... 0.001061
8 Alaska-St. Elias Range tundra 28.388010 MULTIPOLYGON (((-16886232.729 9049093.235, -16... 0.004741
9 Alaska-St. Elias Range tundra 28.388010 MULTIPOLYGON (((-16886232.729 9049093.235, -16... 0.007170
... ... ... ... ... ...
833 7 Northern Rockies conifer forests 35.905513 POLYGON ((-13358313.218 7236575.932, -13331349... 0.004581
8 Northern Rockies conifer forests 35.905513 POLYGON ((-13358313.218 7236575.932, -13331349... 0.004080
9 Northern Rockies conifer forests 35.905513 POLYGON ((-13358313.218 7236575.932, -13331349... 0.002875
10 Northern Rockies conifer forests 35.905513 POLYGON ((-13358313.218 7236575.932, -13331349... 0.001874
11 Northern Rockies conifer forests 35.905513 POLYGON ((-13358313.218 7236575.932, -13331349... 0.000438

788 rows × 4 columns
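The joined table repeats each ecoregion's name, area, and geometry once per month with observations. A minimal sketch of such a join with pandas merge, assuming the ecoregion index is named 'ecoregion' (attach_region_info is hypothetical). When the left side is a GeoDataFrame, merge returns a GeoDataFrame, so the result stays mappable.

```python
import pandas as pd


def attach_region_info(regions, occurrences):
    """Join region attributes onto per-month occurrence rows, indexed by
    (ecoregion, month); regions without occurrences drop out."""
    merged = regions.reset_index().merge(
        occurrences.reset_index()[['ecoregion', 'month', 'norm_occurrences']],
        on='ecoregion', how='inner')
    return merged.set_index(['ecoregion', 'month'])
```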

Code
%store occurrence_gdf
Stored 'occurrence_gdf' (GeoDataFrame)

Plot monthly distribution

Static Plot

Code
from landmapyr.plots import plot_occurrence
plot_occurrence(occurrence_gdf)
Figure 3

Optional Dynamic Plot

Code
from landmapyr.hv_plots import hvplot_occurrence
occurrence_hvplot = hvplot_occurrence(occurrence_gdf)
# Save the plot
occurrence_hvplot.save('sandhill-crane-migration.html', embed=True)
WARNING:W-1005 (FIXED_SIZING_MODE): 'fixed' sizing mode requires width and height to be set: figure(id='d086bcb4-198f-4b50-aa52-af1d87f6dd8f', ...)
Code
occurrence_hvplot
Figure 4

April Observations

Code
occurrence_gdf_complete = occurrence_gdf.reset_index()

# April rows only, sorted from highest to lowest normalized occurrence
april_occ = (
    occurrence_gdf_complete
    .loc[occurrence_gdf_complete['month'] == 4]
    .sort_values(by='norm_occurrences', ascending=False)
)

april_occ_top_5 = april_occ.head(5)
april_occ_bottom_5 = april_occ.tail(5)
Code
# Top Five Ecoregions

april_occ_top_5
ecoregion month name area geometry norm_occurrences
115 81 4 British Columbia coastal conifer forests 14.653986 MULTIPOLYGON (((-14364688.43 7420408.623, -143... 0.007219
225 149 4 Central Tallgrass prairie 36.779324 POLYGON ((-10534926.556 5619565.277, -10517878... 0.005960
557 546 4 Palouse prairie 9.866972 MULTIPOLYGON (((-12951912.056 5827151.995, -12... 0.005675
257 173 4 Colorado Rockies forests 15.113154 MULTIPOLYGON (((-12173003.318 4534115.934, -12... 0.005394
487 471 4 New England-Acadian forests 38.509900 MULTIPOLYGON (((-7182650.847 5741141.666, -715... 0.004976
Code
# Bottom Five Ecoregions

april_occ_bottom_5
ecoregion month name area geometry norm_occurrences
611 639 4 Sonoran desert 21.416224 MULTIPOLYGON (((-12499491.62 3383569.444, -124... 0.000087
136 88 4 California coastal sage and chaparral 3.172258 MULTIPOLYGON (((-12820829.454 3243992.707, -12... 0.000087
237 162 4 Chihuahuan desert 46.807295 MULTIPOLYGON (((-12343440.455 3790837.437, -12... 0.000079
711 783 4 Western Gulf coastal grasslands 8.340400 POLYGON ((-10826974.582 3185079.852, -10843709... 0.000029
127 87 4 California Central Valley grasslands 4.727694 POLYGON ((-13595834.408 4868653.384, -13554815... 0.000023
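The April comparison can be extended to every month by picking, for each month, the ecoregion with the highest normalized occurrence. A minimal pandas sketch (top_region_each_month is a hypothetical name); any extra columns such as name come along in the result.

```python
import pandas as pd


def top_region_each_month(occ):
    """For each month, return the row with the highest norm_occurrences."""
    flat = occ.reset_index()
    best_rows = flat.groupby('month')['norm_occurrences'].idxmax()
    return flat.loc[best_rows].set_index('month')


# top_region_each_month(occurrence_gdf)[['ecoregion', 'name', 'norm_occurrences']]
```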