conda list pygbif
For this challenge, you will use a database called the Global Biodiversity Information Facility (GBIF). GBIF compiles species observation data from all over the world, and includes everything from museum specimens to photos taken by citizen scientists in their backyards.
Explore GBIF: Before you get started, go to the GBIF occurrences search page and explore the data.
See also:
Contribute to open data
You can get your own observations added to GBIF using iNaturalist!
We will be getting data from a source called GBIF (Global Biodiversity Information Facility). We need a package called pygbif to access the data, which may not be included in your environment. Install it by running the cell below:
conda list pygbif
%pip install -q -e ..
from landmapyr.initial import create_data_dir, robust_code
from landmapyr.gbif import gbif_credentials, gbif_species_key
from landmapyr.gbif import download_gbif, load_gbif, gbif_monthly
from landmapyr.gbif import ecoregions, join_ecoregions_monthly
from landmapyr.gbif import count_by_ecoregions
from landmapyr.gbif import simplify_ecoregions_gdf, join_occurrence
Import packages: In the imports cell, we’ve included some packages that you will need. Add imports for packages that will help you:
robust_code()
data_dir = create_data_dir('species')
gbif_dir = create_data_dir('species/gbif_siberian')
gbif_dir
'/Users/brianyandell/earth-analytics/data/species/gbif_siberian'
You will need a GBIF account to complete this challenge. You can use your GitHub account to authenticate with GBIF. Then, run the following code to save your credentials on your computer.
Warning
Your email address must match the email you used to sign up for GBIF!
Tip
If you accidentally enter your credentials wrong, you can set reset_credentials=True instead of reset_credentials=False. Look to the top of the screen for the credentials prompt.
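The internals of gbif_credentials are not shown here. As a rough sketch of what a helper like it might do, the code below stores credentials in the GBIF_USER, GBIF_PWD and GBIF_EMAIL environment variables, which pygbif's download functions read, and prompts only when a value is missing or a reset is requested. The function name set_gbif_credentials and its prompt and env parameters are hypothetical.

```python
import os
from getpass import getpass

# Environment variable names that pygbif's download functions read.
GBIF_VARS = ("GBIF_USER", "GBIF_PWD", "GBIF_EMAIL")


def set_gbif_credentials(reset=False, prompt=None, env=os.environ):
    """Store GBIF credentials in environment variables (hypothetical helper).

    Prompts only for values that are missing, unless reset is True.
    Returns the names of the variables that were prompted for.
    """
    asked = []
    for name in GBIF_VARS:
        if reset or not env.get(name):
            # Hide the password while typing; echo the other fields.
            ask = prompt or (getpass if name == "GBIF_PWD" else input)
            env[name] = ask(f"{name}: ")
            asked.append(name)
    return asked
```

Calling set_gbif_credentials(reset=True) would prompt again for all three values, which mirrors the reset_credentials=True tip above.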
gbif_credentials(False)
**Your task**
- Replace the species_name with the name of the species you want to look up
- Run the code to get the species key
species_name, species_key = gbif_species_key('grus leucogeranus')
species_name, species_key
('Grus leucogeranus', 2474961)
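For readers curious how a name becomes a key, here is a minimal sketch that queries GBIF's public species match endpoint directly with the standard library. It is not the landmapyr implementation; the function names fetch_json and species_key_from_name are hypothetical, and the fetch parameter exists only so the lookup can be exercised without a network connection.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# GBIF's public name-matching endpoint.
MATCH_URL = "https://api.gbif.org/v1/species/match"


def fetch_json(url):
    """Download a URL and decode the JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def species_key_from_name(name, fetch=fetch_json):
    """Return (canonical name, usage key) for a species name (hypothetical helper)."""
    match = fetch(f"{MATCH_URL}?{urlencode({'name': name})}")
    if "usageKey" not in match:
        # GBIF returns a response without a usageKey when nothing matches.
        raise LookupError(f"GBIF found no match for {name!r}")
    return match.get("canonicalName", name), match["usageKey"]
```

With network access, species_key_from_name('grus leucogeranus') would perform the same kind of lookup as the cell above.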
gbif_path = download_gbif(gbif_dir, species_key, year=None)
gbif_path
'/Users/brianyandell/earth-analytics/data/species/gbif_siberian/0001177-250227182400228.zip'
INFO:Your download key is 0001177-250227182400228
INFO:Download file size: 171492 bytes
INFO:On disk at /Users/brianyandell/earth-analytics/data/species/gbif_siberian/0001177-250227182400228.zip
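GBIF prepares downloads asynchronously, so it is worth reusing a file that is already on disk rather than requesting it again on every run. The sketch below shows that caching pattern under the assumption that the download directory holds at most one archive per species; cached_download and its fetch argument are hypothetical names, and fetch stands in for whatever actually requests the archive.

```python
from pathlib import Path


def cached_download(download_dir, fetch):
    """Return an existing .zip in download_dir, or call fetch() to create one.

    fetch is a hypothetical callable that takes the directory, downloads the
    archive into it, and returns the path of the new file.
    """
    download_dir = Path(download_dir)
    download_dir.mkdir(parents=True, exist_ok=True)
    existing = sorted(download_dir.glob("*.zip"))
    if existing:
        # Reuse the most recent archive by name instead of downloading again.
        return existing[-1]
    return Path(fetch(download_dir))
```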
Load GBIF data: Work out what the delimiter is, then modify the parameters of pd.read_csv() below until your data loads successfully and you have only the columns you want.
You can use the following code to look at the beginning of your file:
I copied from Lauren Alexandra and Lauren Gleason
gbif_df = load_gbif(gbif_path)
print(gbif_df.head())
| gbifID | countryCode | stateProvince | decimalLatitude | decimalLongitude | month | year |
|---|---|---|---|---|---|---|
| 985829831 | IN | Rajasthan | 27.161905 | 77.522800 | 2.0 | 1991.0 |
| 979229641 | CN | Jiangxi | 28.870571 | 116.433170 | 11.0 | 1988.0 |
| 978902062 | IR | Mazandaran | 36.667110 | 52.550186 | 11.0 | 2011.0 |
| 978782158 | IN | Rajasthan | 27.161905 | 77.522800 | 1.0 | 1991.0 |
| 977810003 | IN | Rajasthan | 27.161905 | 77.522800 | 1.0 | 1992.0 |
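The delimiter question can also be answered in code. GBIF's simple CSV downloads are tab-delimited, and the sketch below checks that by counting candidate separators in the header line, then keeps only the requested columns. It uses the standard library instead of pd.read_csv() so that it stays dependency-free; guess_delimiter and read_columns are hypothetical helpers, not part of landmapyr, and the code assumes the archive contains a single data file.

```python
import csv
import io
import zipfile


def guess_delimiter(header_line, candidates="\t,;|"):
    """Pick the candidate character that occurs most often in the header."""
    return max(candidates, key=header_line.count)


def read_columns(zip_path, columns, index_col="gbifID"):
    """Read selected columns from the first file inside a zip archive.

    Returns a dict keyed by index_col; every value is kept as a string.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(archive.namelist()[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            header = text.readline()
            delimiter = guess_delimiter(header)
            names = header.rstrip("\r\n").split(delimiter)
            reader = csv.DictReader(
                text,
                fieldnames=names,
                delimiter=delimiter,
                quoting=csv.QUOTE_NONE,  # treat quote characters as plain text
            )
            return {row[index_col]: {c: row[c] for c in columns} for row in reader}
```

In pandas, the same choices correspond to the delimiter, index_col and usecols arguments of pd.read_csv().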
monthly_gdf = gbif_monthly(gbif_df)
monthly_gdf
| gbifID | year | month | geometry |
|---|---|---|---|
| 985829831 | 1991.0 | 2.0 | POINT (77.5228 27.1619) |
| 979229641 | 1988.0 | 11.0 | POINT (116.43317 28.87057) |
| 978902062 | 2011.0 | 11.0 | POINT (52.55019 36.66711) |
| 978782158 | 1991.0 | 1.0 | POINT (77.5228 27.1619) |
| 977810003 | 1992.0 | 1.0 | POINT (77.5228 27.1619) |
| ... | ... | ... | ... |
| 1019036144 | 1983.0 | 6.0 | POINT (-90 43.75) |
| 1019036117 | 1983.0 | 6.0 | POINT (-90 43.75) |
| 1019036092 | 1983.0 | 6.0 | POINT (-90 43.75) |
| 1019036069 | 1983.0 | 6.0 | POINT (-90 43.75) |
| 1019035937 | 1983.0 | 6.0 | POINT (-90 43.75) |
2936 rows × 3 columns
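Note the coordinate order in the geometry column: POINT (77.5228 27.1619) is longitude first, then latitude. The sketch below makes that ordering explicit by building GeoJSON-style point features from GBIF-style records. It is only an illustration of the conversion step, not the gbif_monthly implementation; to_point_features is a hypothetical name, and records with missing or unparseable coordinates are skipped by assumption.

```python
def to_point_features(records):
    """Turn GBIF-style records into GeoJSON-like point features.

    records maps a gbifID to a dict holding decimalLatitude,
    decimalLongitude, year and month.
    """
    features = []
    for gbif_id, rec in records.items():
        try:
            lon = float(rec["decimalLongitude"])
            lat = float(rec["decimalLatitude"])
        except (KeyError, TypeError, ValueError):
            continue  # skip records without usable coordinates
        features.append({
            "type": "Feature",
            "id": gbif_id,
            # GeoJSON, like the POINT output above, puts x (longitude) first.
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"year": rec.get("year"), "month": rec.get("month")},
        })
    return features
```

With geopandas, the equivalent step is usually geopandas.points_from_xy() with longitude as x and latitude as y.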
Ecoregions represent boundaries formed by biotic and abiotic conditions: geology, landforms, soils, vegetation, land use, wildlife, climate, and hydrology.
ecoregions_gdf = ecoregions(data_dir)
ecoregions_gdf.plot(edgecolor='black', color='skyblue')
%%bash
find ~/earth-analytics/data/species -name '*.shp'
%store ecoregions_gdf monthly_gdf
Stored 'ecoregions_gdf' (GeoDataFrame)
Stored 'monthly_gdf' (GeoDataFrame)
Identify the ecoregion for each observation
gbif_ecoregion_gdf = join_ecoregions_monthly(ecoregions_gdf, monthly_gdf)
gbif_ecoregion_gdf
| ecoregion | year | month | name |
|---|---|---|---|
| 5 | 2015.0 | 3.0 | Al-Hajar foothill xeric woodlands and shrublands |
| 5 | 2015.0 | 3.0 | Al-Hajar foothill xeric woodlands and shrublands |
| 5 | 2014.0 | 7.0 | Al-Hajar foothill xeric woodlands and shrublands |
| 5 | 2017.0 | 12.0 | Al-Hajar foothill xeric woodlands and shrublands |
| 8 | NaN | NaN | Alashan Plateau semi-desert |
| ... | ... | ... | ... |
| 802 | 2023.0 | 1.0 | Yellow Sea saline meadow |
| 802 | 2018.0 | 1.0 | Yellow Sea saline meadow |
| 802 | 2015.0 | 2.0 | Yellow Sea saline meadow |
| 802 | 2018.0 | 1.0 | Yellow Sea saline meadow |
| 802 | 2015.0 | 1.0 | Yellow Sea saline meadow |
2269 rows × 3 columns
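Conceptually, the join asks which ecoregion polygon contains each observation point. The sketch below shows that test with a plain ray-casting check on simple rings; it is a teaching illustration only, since a library spatial join also handles holes, multipolygons and spatial indexing. The names point_in_ring and assign_regions are hypothetical. Points that fall in no region are dropped, which is one plausible reason the joined table has fewer rows (2269) than the observations table (2936).

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many ring edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_regions(points, regions):
    """Map each point id to the first region whose ring contains it.

    points maps id -> (x, y); regions maps region id -> list of (x, y) vertices.
    Points outside every region are left out, like an inner join.
    """
    matches = {}
    for point_id, (x, y) in points.items():
        for region_id, ring in regions.items():
            if point_in_ring(x, y, ring):
                matches[point_id] = region_id
                break
    return matches
```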
Count the observations in each ecoregion each year and month
occurrence_month_df = count_by_ecoregions(gbif_ecoregion_gdf,
    'ecoregion', 'name', 'month')
occurrence_month_df
| ecoregion | month | occurrences | norm_occurrences |
|---|---|---|---|
| 5 | 3.0 | 2 | 0.098214 |
| 24 | 5.0 | 6 | 0.156250 |
| | 9.0 | 2 | 0.142857 |
| 53 | 3.0 | 9 | 0.098214 |
| 74 | 1.0 | 3 | 0.016181 |
| ... | ... | ... | ... |
| 758 | 5.0 | 20 | 0.132275 |
| | 6.0 | 16 | 0.066253 |
| 802 | 1.0 | 4 | 0.021575 |
| | 2.0 | 3 | 0.025840 |
| | 12.0 | 2 | 0.013605 |
78 rows × 2 columns
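The norm_occurrences column is not explained here. One normalization that is consistent with the numbers shown is to divide each count by the mean count for its ecoregion and by the mean count for its month: ecoregions 5 and 53 each have a single row, in month 3, and they share the same normalized value (0.098214) despite different counts. The sketch below implements that idea as an assumption, not as the documented behavior of count_by_ecoregions; the min_count filter is also an assumption, suggested by the fact that no displayed row has fewer than 2 occurrences.

```python
from collections import Counter, defaultdict


def normalized_counts(observations, min_count=2):
    """Count (ecoregion, period) pairs and normalize the counts.

    observations is an iterable of (ecoregion, period) pairs, where period
    is a month or a year. Returns {(ecoregion, period): (count, normalized)}.
    """
    counts = {
        key: n for key, n in Counter(observations).items() if n >= min_count
    }

    # Collect the retained counts per ecoregion and per period.
    by_region, by_period = defaultdict(list), defaultdict(list)
    for (region, period), n in counts.items():
        by_region[region].append(n)
        by_period[period].append(n)

    def mean(values):
        return sum(values) / len(values)

    # Dividing by both means removes sampling effort differences between
    # ecoregions and between periods.
    return {
        (region, period): (n, n / mean(by_region[region]) / mean(by_period[period]))
        for (region, period), n in counts.items()
    }
```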
occurrence_year_df = count_by_ecoregions(gbif_ecoregion_gdf,
    'ecoregion', 'name', 'year')
occurrence_year_df
| ecoregion | year | occurrences | norm_occurrences |
|---|---|---|---|
| 5 | 2015.0 | 2 | 0.194444 |
| 24 | 2014.0 | 2 | 0.156250 |
| | 2017.0 | 2 | 0.065868 |
| | 2024.0 | 2 | 0.092593 |
| 53 | 2020.0 | 4 | 0.059259 |
| ... | ... | ... | ... |
| 758 | 1996.0 | 3 | 0.084746 |
| 802 | 2014.0 | 2 | 0.125000 |
| | 2015.0 | 3 | 0.233333 |
| | 2018.0 | 3 | 0.038462 |
| | 2023.0 | 2 | 0.032520 |
140 rows × 2 columns
# Plot to check distributions
occurrence_year_df.reset_index().plot.scatter(
    x='year', y='occurrences', c='ecoregion',
    logy=True
)
Create a simplified GeoDataFrame for plotting
ecoregions_gdf = simplify_ecoregions_gdf(ecoregions_gdf)
ecoregions_gdf
| ecoregion | name | area | geometry |
|---|---|---|---|
| 0 | Adelie Land tundra | 0.038948 | MULTIPOLYGON EMPTY |
| 1 | Admiralty Islands lowland rain forests | 0.170599 | POLYGON ((16411777.375 -229101.376, 16384825.7... |
| 2 | Aegean and Western Turkey sclerophyllous and m... | 13.844952 | MULTIPOLYGON (((3391149.749 4336064.109, 33846... |
| 3 | Afghan Mountains semi-desert | 1.355536 | MULTIPOLYGON (((7369001.698 4093509.259, 73168... |
| 4 | Ahklun and Kilbuck Upland Tundra | 8.196573 | MULTIPOLYGON (((-17930832.005 8046779.358, -17... |
| ... | ... | ... | ... |
| 842 | Sulawesi lowland rain forests | 9.422097 | MULTIPOLYGON (((14113374.546 501721.962, 14128... |
| 843 | East African montane forests | 5.010930 | MULTIPOLYGON (((4298787.669 -137583.786, 42727... |
| 844 | Eastern Arc forests | 0.890325 | MULTIPOLYGON (((4267432.68 -493759.165, 428533... |
| 845 | Borneo montane rain forests | 9.358407 | MULTIPOLYGON (((13126956.393 539092.917, 13136... |
| 846 | Kinabalu montane alpine meadows | 0.352694 | POLYGON ((12981819.186 696445.445, 12997053.80... |
847 rows × 3 columns
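How simplify_ecoregions_gdf thins the boundaries is not shown. A standard technique for this kind of simplification is the Douglas-Peucker algorithm, which drops vertices that lie within a tolerance of the straight line between their neighbors. The sketch below applies it to a plain list of (x, y) vertices and is offered only as an illustration of the idea; simplify_line and its helper are hypothetical names, and the tolerance is in the same units as the coordinates (which look like meters in the table above).

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify_line(points, tolerance):
    """Douglas-Peucker simplification of a list of (x, y) vertices."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the chord joining the end points.
    index, max_distance = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > max_distance:
            index, max_distance = i, d
    if max_distance <= tolerance:
        # Every interior vertex is close enough to the chord: keep the ends only.
        return [points[0], points[-1]]
    # Otherwise keep the farthest vertex and simplify both halves.
    left = simplify_line(points[: index + 1], tolerance)
    right = simplify_line(points[index:], tolerance)
    return left[:-1] + right
```

A larger tolerance removes more vertices, which makes plotting faster at the cost of coarser boundaries.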
occurrence_gdf = join_occurrence(ecoregions_gdf, occurrence_year_df)
occurrence_gdf
| ecoregion | year | name | area | geometry | norm_occurrences |
|---|---|---|---|---|---|
| 5 | 2015.0 | Al-Hajar foothill xeric woodlands and shrublands | 4.099668 | POLYGON ((6264504.021 2842331.306, 6336024.085... | 0.194444 |
| 24 | 2014.0 | Amur meadow steppe | 15.118769 | MULTIPOLYGON (((15067649.194 6001589.024, 1503... | 0.156250 |
| | 2017.0 | Amur meadow steppe | 15.118769 | MULTIPOLYGON (((15067649.194 6001589.024, 1503... | 0.065868 |
| | 2024.0 | Amur meadow steppe | 15.118769 | MULTIPOLYGON (((15067649.194 6001589.024, 1503... | 0.092593 |
| 53 | 2020.0 | Azerbaijan shrub desert and steppe | 6.794797 | POLYGON ((5427403.54 5089371.081, 5512543.361 ... | 0.059259 |
| ... | ... | ... | ... | ... | ... |
| 758 | 1996.0 | Upper Midwest US forest-savanna transition | 15.481685 | MULTIPOLYGON (((-9686382.157 5638236.966, -973... | 0.084746 |
| 802 | 2014.0 | Yellow Sea saline meadow | 0.517810 | POLYGON ((13451648.07 3834357.593, 13303152.21... | 0.125000 |
| | 2015.0 | Yellow Sea saline meadow | 0.517810 | POLYGON ((13451648.07 3834357.593, 13303152.21... | 0.233333 |
| | 2018.0 | Yellow Sea saline meadow | 0.517810 | POLYGON ((13451648.07 3834357.593, 13303152.21... | 0.038462 |
| | 2023.0 | Yellow Sea saline meadow | 0.517810 | POLYGON ((13451648.07 3834357.593, 13303152.21... | 0.032520 |
140 rows × 4 columns
from landmapyr.plots import plot_occurrence
plot_occurrence(occurrence_gdf, 'year')
from landmapyr.hv_plots import hvplot_occurrence
occurrence_hvplot = hvplot_occurrence(occurrence_gdf, 'year')
# Save the plot
occurrence_hvplot.save('siberian-crane-years.html', embed=True)
WARNING:W-1005 (FIXED_SIZING_MODE): 'fixed' sizing mode requires width and height to be set: figure(id='886de1ba-7a8b-4f14-b823-837eb07d6c2b', ...)
occurrence_hvplot