Access locations and times of Siberian Crane encounters

For this challenge, you will use a database called the Global Biodiversity Information Facility (GBIF). GBIF is compiled from species observation data from all over the world and includes everything from museum specimens to photos taken by citizen scientists in their backyards.

Explore GBIF: Before you get started, go to the GBIF occurrences search page and explore the data.

See also:

Contribute to open data

You can get your own observations added to GBIF using iNaturalist!

Set up your code to prepare for download

We need a package called pygbif to access GBIF data, and it may not be included in your environment. The first cell below checks whether pygbif is already installed. The second cell installs the package in the parent directory in editable mode (here, presumably landmapyr, which supplies the helper functions imported next).

Code
conda list pygbif
Code
%pip install -q -e ..
Code
from landmapyr.initial import create_data_dir, robust_code
from landmapyr.gbif import gbif_credentials, gbif_species_key
from landmapyr.gbif import download_gbif, load_gbif, gbif_monthly
from landmapyr.gbif import ecoregions, join_ecoregions_monthly
from landmapyr.gbif import count_by_ecoregions
from landmapyr.gbif import simplify_ecoregions_gdf, join_occurrence

Import packages: In the imports cell, we’ve included some packages that you will need. Add imports for packages that will help you:

  • Work with reproducible file paths
  • Work with tabular data
Code
robust_code()
data_dir = create_data_dir('species')
gbif_dir = create_data_dir('species/gbif_siberian')
gbif_dir
'/Users/brianyandell/earth-analytics/data/species/gbif_siberian'

Register and log in to GBIF

You will need a GBIF account to complete this challenge. You can use your GitHub account to authenticate with GBIF. Then, run the following code to save your credentials on your computer.

Warning

Your email address must match the email you used to sign up for GBIF!

Tip

If you accidentally enter your credentials incorrectly, set reset_credentials=True instead of reset_credentials=False and rerun the cell. Look at the top of the screen for the credential prompt.

Code
gbif_credentials(False)
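The gbif_credentials helper itself is not shown on this page. As a rough sketch of the idea, assuming the credentials are kept in the environment variables GBIF_USER, GBIF_PWD, and GBIF_EMAIL (the names pygbif looks for when requesting downloads), a hypothetical version might look like the following. The function name and the injectable prompt arguments are illustrative and are not part of landmapyr.

```python
import os
from getpass import getpass


def set_gbif_credentials(reset=False, prompt=input, secret_prompt=getpass):
    """Store GBIF credentials in environment variables (hypothetical helper).

    Existing values are kept unless reset=True, mirroring the
    reset_credentials flag described in the tip above.
    """
    # Map each environment variable to the function used to ask for it.
    # The password goes through getpass so it is not echoed to the screen.
    fields = {
        "GBIF_USER": prompt,
        "GBIF_PWD": secret_prompt,
        "GBIF_EMAIL": prompt,
    }
    for env_name, ask in fields.items():
        if reset or env_name not in os.environ:
            os.environ[env_name] = ask(f"{env_name}: ")


# Example (interactive, so left commented out):
# set_gbif_credentials(reset=False)
```

Keeping the prompts as arguments makes the function easy to exercise without typing anything, and leaving existing values alone means rerunning the notebook does not ask again.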

Get the species key

Your task

  1. Replace the species_name with the name of the species you want to look up
  2. Run the code to get the species key
Code
species_name, species_key = gbif_species_key('grus leucogeranus')
species_name, species_key
('Grus leucogeranus', 2474961)
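How gbif_species_key finds the key is not shown here. A plausible sketch, assuming it asks the GBIF backbone taxonomy for a match and reads the usageKey and canonicalName fields from the response, is below. The function name is hypothetical, and the stub dictionary only mirrors the output above so the example runs offline.

```python
def key_from_backbone_match(match):
    """Return (canonical name, usage key) from a GBIF backbone match dict."""
    # GBIF reports matchType "NONE" when it cannot place the name.
    if match.get("matchType") == "NONE" or "usageKey" not in match:
        raise ValueError("No GBIF backbone match found for this name")
    return match["canonicalName"], match["usageKey"]


# Stub response shaped like the GBIF species match API; the values mirror
# the output shown above rather than a live query.
example_match = {
    "usageKey": 2474961,
    "canonicalName": "Grus leucogeranus",
    "rank": "SPECIES",
    "matchType": "EXACT",
}
print(key_from_backbone_match(example_match))

# With network access, the dict would come from pygbif (assumed call):
# from pygbif import species
# match = species.name_backbone(name="grus leucogeranus")
```

Checking matchType before using the key guards against a misspelled name silently returning a key for the wrong taxon or no key at all.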

Download data from GBIF

Code
gbif_path = download_gbif(gbif_dir, species_key, year=None)
gbif_path
'/Users/brianyandell/earth-analytics/data/species/gbif_siberian/0001177-250227182400228.zip'
INFO:Your download key is 0001177-250227182400228
INFO:Download file size: 171492 bytes
INFO:On disk at /Users/brianyandell/earth-analytics/data/species/gbif_siberian/0001177-250227182400228.zip

Load the GBIF data into Python

Load GBIF data:

  • Look at the beginning of the file you downloaded using the code below. What do you think the delimiter is?
  • Run the following code cell. What happens?
  • Uncomment and modify the parameters of pd.read_csv() below until your data loads successfully and you have only the columns you want.

You can use the following code to look at the beginning of your file:
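The cell for peeking at the file does not appear on this page. A minimal sketch, assuming the download is a zip archive whose first member is a delimited text file, could read the first lines directly from the archive. The helper name and the tiny demo archive are made up for illustration.

```python
import tempfile
import zipfile
from pathlib import Path


def peek_zip(zip_path, n_lines=2):
    """Return the first n_lines of the first file inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        first_member = archive.namelist()[0]
        with archive.open(first_member) as member:
            return [
                member.readline().decode("utf-8").rstrip("\n")
                for _ in range(n_lines)
            ]


# Demo on a tiny stand-in archive (made-up file name, two columns only).
demo_dir = Path(tempfile.mkdtemp())
demo_zip = demo_dir / "demo.zip"
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("demo.csv", "gbifID\tcountryCode\n985829831\tIN\n")

for line in peek_zip(demo_zip):
    print(repr(line))  # repr makes tab characters visible as \t
```

Printing with repr is the useful trick here: a tab-delimited file and a space-padded one look the same on screen, but repr shows the tabs explicitly.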

I copied the code below from Lauren Alexandra and Lauren Gleason.

Code
gbif_df = load_gbif(gbif_path)
print(gbif_df.head())
          countryCode stateProvince  decimalLatitude  decimalLongitude  month  \
gbifID                                                                          
985829831          IN     Rajasthan        27.161905         77.522800    2.0   
979229641          CN       Jiangxi        28.870571        116.433170   11.0   
978902062          IR    Mazandaran        36.667110         52.550186   11.0   
978782158          IN     Rajasthan        27.161905         77.522800    1.0   
977810003          IN     Rajasthan        27.161905         77.522800    1.0   

             year  
gbifID             
985829831  1991.0  
979229641  1988.0  
978902062  2011.0  
978782158  1991.0  
977810003  1992.0  
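The body of load_gbif is not shown. A sketch of the kind of pd.read_csv() call the task describes, assuming a tab-delimited file and the six columns in the output above, is given here on an in-memory sample so it runs without the download. The sample rows reuse values from that output.

```python
import io

import pandas as pd

# Three rows in a tab-delimited layout, using values from the output above;
# a real GBIF file has many more columns.
sample_tsv = (
    "gbifID\tcountryCode\tstateProvince\tdecimalLatitude\t"
    "decimalLongitude\tmonth\tyear\n"
    "985829831\tIN\tRajasthan\t27.161905\t77.522800\t2\t1991\n"
    "979229641\tCN\tJiangxi\t28.870571\t116.433170\t11\t1988\n"
    "978902062\tIR\tMazandaran\t36.667110\t52.550186\t11\t2011\n"
)

gbif_sample_df = pd.read_csv(
    io.StringIO(sample_tsv),  # a path to the file could be passed instead
    delimiter="\t",           # columns are separated by tabs
    index_col="gbifID",       # one unique ID per observation
    usecols=[
        "gbifID",
        "countryCode",
        "stateProvince",
        "decimalLatitude",
        "decimalLongitude",
        "month",
        "year",
    ],
)
print(gbif_sample_df.head())
```

Restricting usecols keeps memory use down and avoids parsing problems in free-text columns that the analysis never touches.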

Convert GBIF data to a GeoDataFrame by Month

Code
monthly_gdf = gbif_monthly(gbif_df)
monthly_gdf
gbifID      year    month  geometry
985829831   1991.0  2.0    POINT (77.5228 27.1619)
979229641   1988.0  11.0   POINT (116.43317 28.87057)
978902062   2011.0  11.0   POINT (52.55019 36.66711)
978782158   1991.0  1.0    POINT (77.5228 27.1619)
977810003   1992.0  1.0    POINT (77.5228 27.1619)
...         ...     ...    ...
1019036144  1983.0  6.0    POINT (-90 43.75)
1019036117  1983.0  6.0    POINT (-90 43.75)
1019036092  1983.0  6.0    POINT (-90 43.75)
1019036069  1983.0  6.0    POINT (-90 43.75)
1019035937  1983.0  6.0    POINT (-90 43.75)

2936 rows × 3 columns
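The gbif_monthly helper is not shown. One common way to get from latitude and longitude columns to point geometries, offered as a sketch and not as the helper's actual code, uses geopandas.points_from_xy. The two sample rows reuse values from the output above, and the EPSG:4326 coordinate reference system is an assumption based on GBIF reporting decimal degrees.

```python
import geopandas as gpd
import pandas as pd

occurrences = pd.DataFrame(
    {
        "decimalLatitude": [27.161905, 28.870571],
        "decimalLongitude": [77.522800, 116.433170],
        "month": [2.0, 11.0],
        "year": [1991.0, 1988.0],
    },
    index=pd.Index([985829831, 979229641], name="gbifID"),
)

points_gdf = gpd.GeoDataFrame(
    occurrences[["year", "month"]],
    # Longitude is x and latitude is y; swapping them is a common mistake.
    geometry=gpd.points_from_xy(
        occurrences.decimalLongitude, occurrences.decimalLatitude
    ),
    crs="EPSG:4326",
)
print(points_gdf)
```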

Download and save ecoregion boundaries

Ecoregions represent boundaries formed by biotic and abiotic conditions: geology, landforms, soils, vegetation, land use, wildlife, climate, and hydrology.

Code
ecoregions_gdf = ecoregions(data_dir)
ecoregions_gdf.plot(edgecolor='black', color='skyblue')
Figure 1
Code
%%bash
find ~/earth-analytics/data/species -name '*.shp'
Code
%store ecoregions_gdf monthly_gdf
Stored 'ecoregions_gdf' (GeoDataFrame)
Stored 'monthly_gdf' (GeoDataFrame)

Identify the ecoregion for each observation

Code
gbif_ecoregion_gdf = join_ecoregions_monthly(ecoregions_gdf, monthly_gdf)
gbif_ecoregion_gdf
ecoregion  year    month  name
5          2015.0  3.0    Al-Hajar foothill xeric woodlands and shrublands
5          2015.0  3.0    Al-Hajar foothill xeric woodlands and shrublands
5          2014.0  7.0    Al-Hajar foothill xeric woodlands and shrublands
5          2017.0  12.0   Al-Hajar foothill xeric woodlands and shrublands
8          NaN     NaN    Alashan Plateau semi-desert
...        ...     ...    ...
802        2023.0  1.0    Yellow Sea saline meadow
802        2018.0  1.0    Yellow Sea saline meadow
802        2015.0  2.0    Yellow Sea saline meadow
802        2018.0  1.0    Yellow Sea saline meadow
802        2015.0  1.0    Yellow Sea saline meadow

2269 rows × 3 columns
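The row count drops from 2936 observations to 2269 joined rows, which is what an inner spatial join does when some points fall outside every polygon. The sketch below illustrates that behavior with geopandas sjoin on made-up boxes and points; it is not the code of join_ecoregions_monthly, and all names and coordinates are invented for the example.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two toy "ecoregions" as adjacent boxes.
regions = gpd.GeoDataFrame(
    {"name": ["West box", "East box"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs="EPSG:4326",
)
regions.index.name = "ecoregion"

# Three toy observations; the last one lies outside both boxes.
points = gpd.GeoDataFrame(
    {"year": [2015.0, 2018.0, 2020.0], "month": [3.0, 1.0, 6.0]},
    geometry=[Point(5, 5), Point(15, 5), Point(50, 50)],
    crs="EPSG:4326",
)

# Keep one row per (region, point) pair where the region contains the point.
joined = regions.sjoin(points, how="inner", predicate="contains")[
    ["year", "month", "name"]
]
print(joined)
```

Putting the polygons on the left keeps the region index on the result, so each observation row is labeled by the region it fell in.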

Count the observations in each ecoregion each year and month

Code
occurrence_month_df = count_by_ecoregions(gbif_ecoregion_gdf,
                        'ecoregion', 'name', 'month')
occurrence_month_df
ecoregion  month  occurrences  norm_occurrences
5          3.0    2            0.098214
24         5.0    6            0.156250
           9.0    2            0.142857
53         3.0    9            0.098214
74         1.0    3            0.016181
...        ...    ...          ...
758        5.0    20           0.132275
           6.0    16           0.066253
802        1.0    4            0.021575
           2.0    3            0.025840
           12.0   2            0.013605

78 rows × 2 columns

Code
occurrence_year_df = count_by_ecoregions(gbif_ecoregion_gdf,
                        'ecoregion', 'name', 'year')
occurrence_year_df
ecoregion  year    occurrences  norm_occurrences
5          2015.0  2            0.194444
24         2014.0  2            0.156250
           2017.0  2            0.065868
           2024.0  2            0.092593
53         2020.0  4            0.059259
...        ...     ...          ...
758        1996.0  3            0.084746
802        2014.0  2            0.125000
           2015.0  3            0.233333
           2018.0  3            0.038462
           2023.0  2            0.032520

140 rows × 2 columns
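The norm_occurrences column is not explained on this page. One plausible reading, which is an assumption and not the confirmed implementation of count_by_ecoregions, is that each count is divided by the mean count for its ecoregion and by the mean count for its time step. The monthly table is at least consistent with this: ecoregions 5 and 53 each show a single row, both in month 3, and they share the value 0.098214 despite different counts. The sketch below implements that reading on made-up data; the function name and the minimum-count filter are hypothetical.

```python
import pandas as pd


def count_and_normalize(df, region_col, time_col, min_count=2):
    """Count rows per (region, time) and normalize by both group means.

    Assumed scheme, not the confirmed landmapyr implementation:
    norm = count / mean count for the region / mean count for the time step.
    """
    counts = (
        df.groupby([region_col, time_col])
        .size()
        .rename("occurrences")
        .to_frame()
    )
    # Drop sparse combinations (the threshold is an assumption).
    counts = counts[counts["occurrences"] >= min_count].copy()

    region_mean = counts.groupby(level=region_col)["occurrences"].transform("mean")
    time_mean = counts.groupby(level=time_col)["occurrences"].transform("mean")
    counts["norm_occurrences"] = counts["occurrences"] / region_mean / time_mean
    return counts


# Made-up observations: one row per sighting.
toy_df = pd.DataFrame(
    {
        "ecoregion": [1] * 6 + [2] * 7,
        "month": [1.0] * 2 + [2.0] * 4 + [1.0] * 6 + [3.0],
    }
)
toy_counts = count_and_normalize(toy_df, "ecoregion", "month")
print(toy_counts)
```

Dividing by both means is meant to separate two sampling effects: some regions are simply visited more often, and some months or years have more observers overall.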

Code
# Plot to check distributions
occurrence_year_df.reset_index().plot.scatter(
    x='year', y='occurrences', c='ecoregion',
    logy=True
)
Figure 2

Create a simplified GeoDataFrame for plotting

Code
ecoregions_gdf = simplify_ecoregions_gdf(ecoregions_gdf)
ecoregions_gdf
ecoregion  name                                               area       geometry
0          Adelie Land tundra                                 0.038948   MULTIPOLYGON EMPTY
1          Admiralty Islands lowland rain forests             0.170599   POLYGON ((16411777.375 -229101.376, 16384825.7...
2          Aegean and Western Turkey sclerophyllous and m...  13.844952  MULTIPOLYGON (((3391149.749 4336064.109, 33846...
3          Afghan Mountains semi-desert                       1.355536   MULTIPOLYGON (((7369001.698 4093509.259, 73168...
4          Ahklun and Kilbuck Upland Tundra                   8.196573   MULTIPOLYGON (((-17930832.005 8046779.358, -17...
...        ...                                                ...        ...
842        Sulawesi lowland rain forests                      9.422097   MULTIPOLYGON (((14113374.546 501721.962, 14128...
843        East African montane forests                       5.010930   MULTIPOLYGON (((4298787.669 -137583.786, 42727...
844        Eastern Arc forests                                0.890325   MULTIPOLYGON (((4267432.68 -493759.165, 428533...
845        Borneo montane rain forests                        9.358407   MULTIPOLYGON (((13126956.393 539092.917, 13136...
846        Kinabalu montane alpine meadows                    0.352694   POLYGON ((12981819.186 696445.445, 12997053.80...

847 rows × 3 columns
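The coordinates in this output look like projected metres and not degrees, possibly Web Mercator (EPSG:3857), though that is a guess from the numbers alone. A generic sketch of simplifying polygons for faster plotting, assuming a reprojection followed by GeoSeries.simplify, is shown below. It is not the code of simplify_ecoregions_gdf, and the toy polygon and the 1,000 m tolerance are invented for the example.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# A toy polygon with one nearly collinear vertex along its bottom edge.
detailed = gpd.GeoDataFrame(
    {"name": ["Zigzag region"]},
    geometry=[Polygon([(0, 0), (1, 0.001), (2, 0), (2, 2), (0, 2)])],
    crs="EPSG:4326",
)

# Reproject to metres so the tolerance has a physical meaning.
projected = detailed.to_crs(epsg=3857)

# Remove vertices that deviate less than the tolerance from a straight line.
simplified = projected.copy()
simplified["geometry"] = projected.geometry.simplify(tolerance=1_000)

before = len(projected.geometry.iloc[0].exterior.coords)
after = len(simplified.geometry.iloc[0].exterior.coords)
print(before, after)
```

A larger tolerance gives smaller files and faster interactive plots at the cost of coarser boundaries, so it is worth checking the map visually after changing it.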

Mapping yearly distribution

Code
occurrence_gdf = join_occurrence(ecoregions_gdf, occurrence_year_df)
occurrence_gdf
ecoregion  year    name                                              area       geometry                                           norm_occurrences
5          2015.0  Al-Hajar foothill xeric woodlands and shrublands  4.099668   POLYGON ((6264504.021 2842331.306, 6336024.085...  0.194444
24         2014.0  Amur meadow steppe                                15.118769  MULTIPOLYGON (((15067649.194 6001589.024, 1503...  0.156250
           2017.0  Amur meadow steppe                                15.118769  MULTIPOLYGON (((15067649.194 6001589.024, 1503...  0.065868
           2024.0  Amur meadow steppe                                15.118769  MULTIPOLYGON (((15067649.194 6001589.024, 1503...  0.092593
53         2020.0  Azerbaijan shrub desert and steppe                6.794797   POLYGON ((5427403.54 5089371.081, 5512543.361 ...  0.059259
...        ...     ...                                               ...        ...                                                ...
758        1996.0  Upper Midwest US forest-savanna transition        15.481685  MULTIPOLYGON (((-9686382.157 5638236.966, -973...  0.084746
802        2014.0  Yellow Sea saline meadow                          0.517810   POLYGON ((13451648.07 3834357.593, 13303152.21...  0.125000
           2015.0  Yellow Sea saline meadow                          0.517810   POLYGON ((13451648.07 3834357.593, 13303152.21...  0.233333
           2018.0  Yellow Sea saline meadow                          0.517810   POLYGON ((13451648.07 3834357.593, 13303152.21...  0.038462
           2023.0  Yellow Sea saline meadow                          0.517810   POLYGON ((13451648.07 3834357.593, 13303152.21...  0.032520

140 rows × 4 columns
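The result keeps the 140 (ecoregion, year) rows of the yearly counts and repeats each ecoregion's name, area, and geometry on every matching row. A sketch of how such a join can be written in pandas, assuming the two tables share the ecoregion index level, is below. It is not the code of join_occurrence, and the geometry column is left out so the example needs only pandas. The values are taken from the output above.

```python
import pandas as pd

# One row per ecoregion, indexed by ecoregion.
regions_df = pd.DataFrame(
    {
        "name": ["Amur meadow steppe", "Yellow Sea saline meadow"],
        "area": [15.118769, 0.517810],
    },
    index=pd.Index([24, 802], name="ecoregion"),
)

# One row per (ecoregion, year) pair.
yearly_df = pd.DataFrame(
    {"norm_occurrences": [0.156250, 0.065868, 0.125000]},
    index=pd.MultiIndex.from_tuples(
        [(24, 2014.0), (24, 2017.0), (802, 2014.0)],
        names=["ecoregion", "year"],
    ),
)

# pandas matches the single index name against the MultiIndex level name,
# so each region row is repeated once per year it appears in.
joined_df = regions_df.join(yearly_df[["norm_occurrences"]], how="inner")
print(joined_df)
```

An inner join drops ecoregions with no counted observations, which keeps the map layer limited to regions that have something to show.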

Static Plot

Code
from landmapyr.plots import plot_occurrence
plot_occurrence(occurrence_gdf, 'year')
Figure 3

Optional Dynamic Plot

Code
from landmapyr.hv_plots import hvplot_occurrence
occurrence_hvplot = hvplot_occurrence(occurrence_gdf, 'year')
# Save the plot
occurrence_hvplot.save('siberian-crane-years.html', embed=True)
WARNING:W-1005 (FIXED_SIZING_MODE): 'fixed' sizing mode requires width and height to be set: figure(id='886de1ba-7a8b-4f14-b823-837eb07d6c2b', ...)
Code
occurrence_hvplot