For this challenge, you will use a database called the Global Biodiversity Information Facility (GBIF). GBIF is compiled from species observation data from all over the world, and includes everything from museum specimens to photos taken by citizen scientists in their backyards.
Explore GBIF: Before you get started, go to the GBIF occurrences search page and explore the data.
See also:
Contribute to open data
You can get your own observations added to GBIF using iNaturalist!
We need a package called pygbif to access GBIF data, and it may not be included in your environment. Check for it and install it by running the cell below:
conda list pygbif
%pip install -q -e ..

Import packages: In the imports cell, we've included some packages that you will need. Add imports for any other packages that will help you.

from landmapyr.initial import create_data_dir, robust_code
from landmapyr.gbif import gbif_credentials, gbif_species_key
from landmapyr.gbif import download_gbif, load_gbif, gbif_monthly
from landmapyr.gbif import ecoregions, join_ecoregions_monthly
from landmapyr.gbif import count_by_ecoregions
from landmapyr.gbif import simplify_ecoregions_gdf, join_occurrence
For now, run gbif.py; it will later be incorporated into the landmapyr package.
robust_code()
data_dir = create_data_dir('species')
gbif_dir = create_data_dir('species/gbif_sandhill')
gbif_dir
'/Users/brianyandell/earth-analytics/data/species/gbif_sandhill'
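The helper create_data_dir is not shown in this notebook. Judging from the printed path, it builds a folder under ~/earth-analytics/data and makes sure it exists. The following is a minimal sketch under that assumption; make_data_dir and its base parameter are hypothetical names, not part of landmapyr.

```python
from pathlib import Path


def make_data_dir(subdir, base=None):
    """Hypothetical stand-in for create_data_dir (assumed behavior).

    Builds <base>/<subdir>, creating any missing parent folders, and
    returns the path as a string, like the gbif_dir output above.
    """
    # Assumed default location, inferred from the printed gbif_dir path
    base_path = Path(base) if base is not None else Path.home() / "earth-analytics" / "data"
    path = base_path / subdir
    # parents=True creates intermediate folders; exist_ok=True makes re-runs safe
    path.mkdir(parents=True, exist_ok=True)
    return str(path)


# Usage: make_data_dir("species/gbif_sandhill")
```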
You will need a GBIF account to complete this challenge. You can use your GitHub account to authenticate with GBIF. Then, run the following code to save your credentials on your computer.
Warning
Your email address must match the email you used to sign up for GBIF!
Tip
If you accidentally enter your credentials wrong, you can set reset_credentials=True instead of reset_credentials=False.
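pygbif's download functions can read your GBIF username, password, and email from the environment variables GBIF_USER, GBIF_PWD, and GBIF_EMAIL. A helper like gbif_credentials may do something along these lines, but that is an assumption. The sketch below (set_gbif_credentials is a hypothetical name) only prompts for values that are missing, or for all of them when reset_credentials is True, and hides the password with getpass.

```python
import os
from getpass import getpass

# Environment variables that pygbif's download functions read
PROMPTS = {
    "GBIF_USER": "GBIF username: ",
    "GBIF_PWD": "GBIF password: ",
    "GBIF_EMAIL": "GBIF email: ",
}


def set_gbif_credentials(reset_credentials=False, prompt=input, secret_prompt=getpass):
    """Hypothetical helper: store GBIF credentials in environment variables.

    Only prompts for values that are missing, unless reset_credentials=True.
    """
    for var, message in PROMPTS.items():
        if reset_credentials or not os.environ.get(var):
            # Use getpass for the password so it is not echoed to the screen
            ask = secret_prompt if var == "GBIF_PWD" else prompt
            os.environ[var] = ask(message)
    # Return the non-secret values so the caller can check them
    return {var: os.environ[var] for var in PROMPTS if var != "GBIF_PWD"}
```

Environment variables set this way last only for the current Python session; a helper that saves credentials on your computer would also need to write them somewhere persistent.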
gbif_credentials(False)

**Your task**
- Replace 'grus canadensis' with the name of the species you want to look up.
- Run the code to get the species key.
species_name, species_key = gbif_species_key('grus canadensis')
species_name, species_key
('Antigone canadensis', 2474953)
GBIF matches grus canadensis to its accepted name in the GBIF backbone, Antigone canadensis.
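The species lookup can be sketched with pygbif's species.name_backbone, which matches a name against the GBIF Backbone Taxonomy. This is a hedged sketch, not the landmapyr implementation: lookup_species_key is a hypothetical name, and it assumes the flat match response with matchType, usageKey, speciesKey, species, and canonicalName fields (newer GBIF match APIs may structure the response differently, so check your pygbif version). Passing in a match function lets it run offline.

```python
def lookup_species_key(name, match=None):
    """Hypothetical sketch: return (accepted species name, species key).

    `match` defaults to pygbif's species.name_backbone; pass a stand-in
    function to run without network access.
    """
    if match is None:
        from pygbif import species  # imported lazily; needs network at call time
        match = species.name_backbone
    result = match(name=name)
    if result.get("matchType", "NONE") == "NONE":
        raise ValueError(f"No GBIF backbone match for {name!r}")
    # Assumed: for a synonym, speciesKey/species describe the accepted species;
    # fall back to the matched usage itself if those fields are missing
    key = result.get("speciesKey", result.get("usageKey"))
    accepted_name = result.get("species", result.get("canonicalName"))
    return accepted_name, key


# Usage: lookup_species_key("grus canadensis")
```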
gbif_path = download_gbif(gbif_dir, species_key)
gbif_path
'/Users/brianyandell/earth-analytics/data/species/gbif_sandhill/0012336-260423192947929.zip'
download key is 0020917-241007104925546 GBIF.org (17 October 2024) GBIF Occurrence Download https://doi.org/10.15468/dl.4d3k48
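GBIF prepares occurrence downloads asynchronously, so a download helper has to request a download, wait until GBIF reports it is finished, and then fetch the zip file. The waiting step might look like the sketch below. wait_for_download is a hypothetical helper; the commented pygbif calls show one way to wire it up, the query predicates there are illustrative, and the [0] indexing assumes download() returns the download key first.

```python
import time

# GBIF download states that mean the file will never become available
FAILED_STATES = {"FAILED", "KILLED", "CANCELLED", "FILE_ERASED"}


def wait_for_download(key, get_status, poll_seconds=10.0, max_polls=360, sleep=time.sleep):
    """Hypothetical helper: poll a download's status until it succeeds."""
    for _ in range(max_polls):
        status = get_status(key)
        if status == "SUCCEEDED":
            return status
        if status in FAILED_STATES:
            raise RuntimeError(f"GBIF download {key} ended with status {status}")
        # Still PREPARING or RUNNING: wait before asking again
        sleep(poll_seconds)
    raise TimeoutError(f"GBIF download {key} not ready after {max_polls} checks")


# Example wiring with pygbif (requires credentials and network access):
# from pygbif import occurrences as occ
# key = occ.download([f"taxonKey = {species_key}", "hasCoordinate = TRUE"])[0]
# wait_for_download(key, lambda k: occ.download_meta(k)["status"])
# occ.download_get(key, path=gbif_dir)
```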
Load GBIF data:
What is the delimiter? Modify the parameters of pd.read_csv() until your data loads successfully and you have only the columns you want. You can use the following code to look at the beginning of your file:
I copied this code from Lauren Alexandra and Lauren Gleason.
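GBIF's SIMPLE_CSV downloads are zip archives containing a single tab-delimited file, despite the .csv name, so the delimiter to pass to pd.read_csv() is a tab. The sketch below peeks at the first lines inside the zip and then reads only the columns that gbif_df.head() shows. peek_gbif_zip and read_gbif_zip are hypothetical names; load_gbif may work differently.

```python
import io
import zipfile
from itertools import islice

import pandas as pd

# Columns shown in gbif_df.head(); gbifID becomes the index
GBIF_COLUMNS = [
    "gbifID", "countryCode", "stateProvince",
    "decimalLatitude", "decimalLongitude", "month", "year",
]


def peek_gbif_zip(zip_path, n_lines=3):
    """Return the first n_lines of the (single) file inside a GBIF zip."""
    with zipfile.ZipFile(zip_path) as zf:
        member = zf.namelist()[0]
        with zf.open(member) as raw, io.TextIOWrapper(raw, encoding="utf-8") as text:
            return [line.rstrip("\n") for line in islice(text, n_lines)]


def read_gbif_zip(zip_path, columns=GBIF_COLUMNS):
    """Read a GBIF SIMPLE_CSV zip: tab-delimited, selected columns only."""
    # pandas infers zip compression from the .zip extension
    return pd.read_csv(zip_path, sep="\t", usecols=columns, index_col="gbifID")


# Usage: peek_gbif_zip(gbif_path); read_gbif_zip(gbif_path).head()
```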
gbif_df = load_gbif(gbif_path)
gbif_df.head()
| gbifID | countryCode | stateProvince | decimalLatitude | decimalLongitude | month | year |
|---|---|---|---|---|---|---|
| 4103735033 | US | NaN | 40.6437 | -98.8950 | 3 | 2023 |
| 4953718975 | US | NaN | 31.5633 | -109.7160 | 11 | 2023 |
| 4135517029 | US | NaN | 42.3291 | -84.2403 | 5 | 2023 |
| 4953718976 | US | NaN | 32.0840 | -109.0510 | 1 | 2023 |
| 4953718225 | US | NaN | 32.0840 | -109.0510 | 3 | 2023 |
ac_CA = gbif_df.loc[gbif_df['countryCode'] == 'CA']
ac_CA.value_counts()
countryCode stateProvince decimalLatitude decimalLongitude month year
CA Ontario 41.955400 -82.514000 5 2023 395
42.038967 -82.509125 5 2023 217
British Columbia 49.100000 -123.185000 1 2023 183
Ontario 44.350597 -79.883620 4 2023 168
British Columbia 49.100000 -123.185000 12 2023 159
...
Ontario 43.349340 -80.209730 5 2023 1
43.349964 -80.375656 4 2023 1
43.353832 -79.860730 4 2023 1
43.354767 -81.499330 4 2023 1
Yukon Territory 69.596910 -140.185060 6 2023 1
Name: count, Length: 15708, dtype: int64
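Called on a whole DataFrame, value_counts() counts how many rows share each unique combination of column values; here, repeated reports of the same coordinates in the same month. It defaults to dropna=True, which drops any row with a missing value in any column, so rows with a NaN stateProvince (like those in gbif_df.head()) are left out of these counts. A small sketch for checking this; summarize_by_province is a hypothetical helper.

```python
import pandas as pd


def summarize_by_province(df):
    """Hypothetical helper: compare DataFrame.value_counts() coverage with len(df)."""
    # Series.value_counts(dropna=False) keeps missing provinces as their own row
    per_province = df["stateProvince"].value_counts(dropna=False)
    # DataFrame.value_counts() defaults to dropna=True, skipping any row that
    # has a missing value in any column
    rows_counted = int(df.value_counts().sum())
    return per_province, rows_counted, len(df)


# Usage: per_province, counted, total = summarize_by_province(ac_CA)
```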
ac_US = gbif_df.loc[gbif_df['countryCode'] == 'US']
ac_US.value_counts()
countryCode stateProvince decimalLatitude decimalLongitude month year
US Ohio 41.627710 -83.191890 5 2023 1576
41.645070 -83.263720 5 2023 505
Wisconsin 43.033360 -89.351380 4 2023 336
Arizona 31.561499 -109.720020 1 2023 321
Ohio 41.626520 -83.188970 5 2023 312
...
Michigan 42.253153 -85.857013 8 2023 1
42.253130 -85.988390 5 2023 1
42.253110 -83.695220 9 2023 1
42.253010 -85.302475 3 2023 1
Wyoming (WY) 44.610549 -110.220852 7 2023 1
Name: count, Length: 103892, dtype: int64
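stateProvince is free text supplied by data publishers, and values such as 'Wyoming (WY)' and 'Yukon Territory' suggest the names are not standardized. Before counting by state or province, you could normalize them. This is a sketch: clean_province and the alias mapping are hypothetical, and you would need to build the alias list from the values you actually see.

```python
import pandas as pd


def clean_province(names, aliases=None):
    """Hypothetical helper: normalize free-text stateProvince values."""
    # Drop a trailing parenthetical such as " (WY)" and surrounding whitespace
    cleaned = names.str.replace(r"\s*\([^)]*\)\s*$", "", regex=True).str.strip()
    # Optionally map remaining variants onto one spelling, e.g. {"Yukon Territory": "Yukon"}
    if aliases:
        cleaned = cleaned.replace(aliases)
    return cleaned


# Usage: clean_province(ac_US["stateProvince"]).value_counts()
```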
monthly_gdf = gbif_monthly(gbif_df)
monthly_gdf
| gbifID | year | month | geometry |
|---|---|---|---|
| 4103735033 | 2023 | 3 | POINT (-98.895 40.6437) |
| 4953718975 | 2023 | 11 | POINT (-109.716 31.5633) |
| 4135517029 | 2023 | 5 | POINT (-84.2403 42.3291) |
| 4953718976 | 2023 | 1 | POINT (-109.051 32.084) |
| 4953718225 | 2023 | 3 | POINT (-109.051 32.084) |
| ... | ... | ... | ... |
| 4159135015 | 2023 | 5 | POINT (-82.54347 41.98161) |
| 4884744706 | 2023 | 7 | POINT (-86.14826 42.53785) |
| 4159295666 | 2023 | 5 | POINT (-80.4592 42.5761) |
| 4408876179 | 2023 | 7 | POINT (-151.51888 59.63738) |
| 4409335777 | 2023 | 8 | POINT (-151.52234 59.64154) |
313542 rows × 3 columns
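gbif_monthly returns a GeoDataFrame with one point per observation. A minimal way to build such a table with geopandas is shown below; it assumes, without confirmation from landmapyr, that points are made from decimalLongitude (x) and decimalLatitude (y) in WGS 84 (EPSG:4326), the coordinate system GBIF uses for decimal coordinates. to_monthly_points is a hypothetical name.

```python
import geopandas as gpd


def to_monthly_points(df):
    """Hypothetical sketch: one point per observation, keeping year and month."""
    # Skip records without coordinates; they cannot be mapped
    located = df.dropna(subset=["decimalLatitude", "decimalLongitude"])
    return gpd.GeoDataFrame(
        located[["year", "month"]],
        # x is longitude and y is latitude
        geometry=gpd.points_from_xy(located["decimalLongitude"], located["decimalLatitude"]),
        crs="EPSG:4326",
    )


# Usage: to_monthly_points(gbif_df)
```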
Ecoregions represent boundaries formed by biotic and abiotic conditions: geology, landforms, soils, vegetation, land use, wildlife, climate, and hydrology.
ecoregions_gdf = ecoregions(data_dir)
ecoregions_gdf.plot(edgecolor='black', color='skyblue')

%%bash
find ~/earth-analytics/data/species -name '*.shp'

%store ecoregions_gdf monthly_gdf
Stored 'ecoregions_gdf' (GeoDataFrame)
Stored 'monthly_gdf' (GeoDataFrame)
Identify the ecoregion for each observation
gbif_ecoregion_gdf = join_ecoregions_monthly(ecoregions_gdf, monthly_gdf)
gbif_ecoregion_gdf
| ecoregion | year | month | name |
|---|---|---|---|
| 4 | 2023 | 8 | Ahklun and Kilbuck Upland Tundra |
| 4 | 2023 | 7 | Ahklun and Kilbuck Upland Tundra |
| 4 | 2023 | 7 | Ahklun and Kilbuck Upland Tundra |
| 4 | 2023 | 7 | Ahklun and Kilbuck Upland Tundra |
| 4 | 2023 | 7 | Ahklun and Kilbuck Upland Tundra |
| ... | ... | ... | ... |
| 833 | 2023 | 4 | Northern Rockies conifer forests |
| 833 | 2023 | 5 | Northern Rockies conifer forests |
| 833 | 2023 | 5 | Northern Rockies conifer forests |
| 833 | 2023 | 6 | Northern Rockies conifer forests |
| 833 | 2023 | 5 | Northern Rockies conifer forests |
307693 rows × 3 columns
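Assigning each observation to an ecoregion is a point-in-polygon spatial join. The sketch below uses geopandas.sjoin with predicate="within" and assumes ecoregion IDs live in the index of the ecoregion table, as in the outputs here; assign_ecoregion is a hypothetical name, not necessarily how join_ecoregions_monthly works. An inner join drops points that fall inside no polygon, which may explain why the joined table has fewer rows (307693) than monthly_gdf (313542).

```python
import geopandas as gpd
import pandas as pd


def assign_ecoregion(points, regions, name_col="name"):
    """Hypothetical sketch: tag each point with the ecoregion that contains it."""
    # Both layers must share a CRS before a spatial join
    if points.crs != regions.crs:
        points = points.to_crs(regions.crs)
    # Move the ecoregion ID from the index into a regular column
    region_table = regions.rename_axis("ecoregion").reset_index()[["ecoregion", name_col, "geometry"]]
    joined = gpd.sjoin(points, region_table, how="inner", predicate="within")
    # Drop the geometry and sjoin's positional bookkeeping column, index by ecoregion
    table = pd.DataFrame(joined.drop(columns=["geometry", "index_right"], errors="ignore"))
    return table.set_index("ecoregion")


# Usage: assign_ecoregion(monthly_gdf, ecoregions_gdf)
```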
Count the observations in each ecoregion each month
occurrence_df = count_by_ecoregions(gbif_ecoregion_gdf, 'ecoregion', 'name', 'month')
occurrence_df
| ecoregion | month | occurrences | norm_occurrences |
|---|---|---|---|
| 4 | 7 | 5 | 0.004427 |
| 9 | 5 | 3 | 0.000745 |
| 9 | 6 | 2 | 0.001061 |
| 9 | 8 | 8 | 0.004741 |
| 9 | 9 | 13 | 0.007170 |
| ... | ... | ... | ... |
| 833 | 7 | 169 | 0.004581 |
| 833 | 8 | 173 | 0.004080 |
| 833 | 9 | 131 | 0.002875 |
| 833 | 10 | 95 | 0.001874 |
| 833 | 11 | 25 | 0.000438 |
788 rows × 2 columns
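Raw counts mostly reflect where and when people go birding, so some normalization is needed before comparing ecoregions and months. The notebook does not show how count_by_ecoregions computes norm_occurrences. The sketch below illustrates one common option, assumed here rather than taken from landmapyr: divide each ecoregion-month count by that ecoregion's mean monthly count and by that month's mean count across ecoregions. count_and_normalize is a hypothetical name.

```python
import pandas as pd


def count_and_normalize(df, region_col="ecoregion", month_col="month"):
    """Hypothetical sketch: count observations and normalize for sampling effort."""
    # One row per (region, month) with the number of observations;
    # region_col/month_col may be columns or index level names
    counts = df.groupby([region_col, month_col]).size().rename("occurrences").to_frame()
    # Mean monthly count for each region, broadcast back to every row
    region_mean = counts.groupby(level=region_col)["occurrences"].transform("mean")
    # Mean count across regions for each month, broadcast back to every row
    month_mean = counts.groupby(level=month_col)["occurrences"].transform("mean")
    counts["norm_occurrences"] = counts["occurrences"] / region_mean / month_mean
    return counts


# Usage: count_and_normalize(gbif_ecoregion_gdf)
```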
# Plot to check distributions
occurrence_df.reset_index().plot.scatter(
    x='month', y='norm_occurrences', c='ecoregion',
    logy=True
)

Create a simplified GeoDataFrame for plotting
ecoregions_gdf = simplify_ecoregions_gdf(ecoregions_gdf)
ecoregions_gdf
| ecoregion | name | area | geometry |
|---|---|---|---|
| 0 | Adelie Land tundra | 0.038948 | MULTIPOLYGON EMPTY |
| 1 | Admiralty Islands lowland rain forests | 0.170599 | POLYGON ((16411777.375 -229101.376, 16384825.7... |
| 2 | Aegean and Western Turkey sclerophyllous and m... | 13.844952 | MULTIPOLYGON (((3391149.749 4336064.109, 33846... |
| 3 | Afghan Mountains semi-desert | 1.355536 | MULTIPOLYGON (((7369001.698 4093509.259, 73168... |
| 4 | Ahklun and Kilbuck Upland Tundra | 8.196573 | MULTIPOLYGON (((-17930832.005 8046779.358, -17... |
| ... | ... | ... | ... |
| 842 | Sulawesi lowland rain forests | 9.422097 | MULTIPOLYGON (((14113374.546 501721.962, 14128... |
| 843 | East African montane forests | 5.010930 | MULTIPOLYGON (((4298787.669 -137583.786, 42727... |
| 844 | Eastern Arc forests | 0.890325 | MULTIPOLYGON (((4267432.68 -493759.165, 428533... |
| 845 | Borneo montane rain forests | 9.358407 | MULTIPOLYGON (((13126956.393 539092.917, 13136... |
| 846 | Kinabalu montane alpine meadows | 0.352694 | POLYGON ((12981819.186 696445.445, 12997053.80... |
847 rows × 3 columns
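Simplifying polygons reduces the number of vertices, which keeps interactive maps responsive. Below is a sketch of one approach, assuming a projected CRS in meters (Web Mercator, EPSG:3857, here) so that the tolerance is in meters; simplify_for_plot and the 1000 m default are hypothetical choices, not the settings used by simplify_ecoregions_gdf. Very small polygons can collapse during simplification, which may be why the Adelie Land tundra row shows MULTIPOLYGON EMPTY.

```python
import geopandas as gpd


def simplify_for_plot(gdf, tolerance_m=1000.0, epsg=3857):
    """Hypothetical sketch: project to meters, then simplify each polygon."""
    projected = gdf.to_crs(epsg=epsg)
    # preserve_topology=True avoids self-intersecting output polygons;
    # assumes the geometry column is named "geometry", as read_file produces
    projected["geometry"] = projected.geometry.simplify(tolerance_m, preserve_topology=True)
    return projected


# Usage: simplify_for_plot(ecoregions_gdf).plot()
```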
%store gbif_path
%who
Stored 'gbif_path' (str)
ac_CA ac_US count_by_ecoregions create_data_dir data_dir download_gbif ecoregions ecoregions_gdf gbif_credentials
gbif_df gbif_dir gbif_ecoregion_gdf gbif_monthly gbif_path gbif_species_key join_ecoregions_monthly join_occurrence load_gbif
monthly_gdf occurrence_df ojs_define robust_code simplify_ecoregions_gdf species_key species_name
Mapping monthly distribution
occurrence_gdf = join_occurrence(ecoregions_gdf, occurrence_df)
occurrence_gdf
| ecoregion | month | name | area | geometry | norm_occurrences |
|---|---|---|---|---|---|
| 4 | 7 | Ahklun and Kilbuck Upland Tundra | 8.196573 | MULTIPOLYGON (((-17930832.005 8046779.358, -17... | 0.004427 |
| 9 | 5 | Alaska-St. Elias Range tundra | 28.388010 | MULTIPOLYGON (((-16886232.729 9049093.235, -16... | 0.000745 |
| 9 | 6 | Alaska-St. Elias Range tundra | 28.388010 | MULTIPOLYGON (((-16886232.729 9049093.235, -16... | 0.001061 |
| 9 | 8 | Alaska-St. Elias Range tundra | 28.388010 | MULTIPOLYGON (((-16886232.729 9049093.235, -16... | 0.004741 |
| 9 | 9 | Alaska-St. Elias Range tundra | 28.388010 | MULTIPOLYGON (((-16886232.729 9049093.235, -16... | 0.007170 |
| ... | ... | ... | ... | ... | ... |
| 833 | 7 | Northern Rockies conifer forests | 35.905513 | POLYGON ((-13358313.218 7236575.932, -13331349... | 0.004581 |
| 833 | 8 | Northern Rockies conifer forests | 35.905513 | POLYGON ((-13358313.218 7236575.932, -13331349... | 0.004080 |
| 833 | 9 | Northern Rockies conifer forests | 35.905513 | POLYGON ((-13358313.218 7236575.932, -13331349... | 0.002875 |
| 833 | 10 | Northern Rockies conifer forests | 35.905513 | POLYGON ((-13358313.218 7236575.932, -13331349... | 0.001874 |
| 833 | 11 | Northern Rockies conifer forests | 35.905513 | POLYGON ((-13358313.218 7236575.932, -13331349... | 0.000438 |
788 rows × 4 columns
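Attaching the monthly values to the polygons is an attribute join on the ecoregion ID. A pandas sketch, assuming ecoregions_gdf is indexed by ecoregion and occurrence_df by (ecoregion, month) as in the outputs above; attach_occurrences is a hypothetical name, not necessarily how join_occurrence works. When the left table is a GeoDataFrame, merge returns a GeoDataFrame, so the geometry column comes along.

```python
import pandas as pd


def attach_occurrences(regions, occurrences, value_col="norm_occurrences"):
    """Hypothetical sketch: one row per (ecoregion, month) with region attributes."""
    # Turn the ecoregion index into a column so both tables share a join key
    region_table = regions.rename_axis("ecoregion").reset_index()
    occ_table = occurrences[[value_col]].reset_index()
    # Inner join keeps only ecoregion-months that have observations
    merged = region_table.merge(occ_table, on="ecoregion", how="inner")
    return merged.set_index(["ecoregion", "month"])


# Usage: attach_occurrences(ecoregions_gdf, occurrence_df)
```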
%store occurrence_gdf
Stored 'occurrence_gdf' (GeoDataFrame)
from landmapyr.plots import plot_occurrence
plot_occurrence(occurrence_gdf)

from landmapyr.hv_plots import hvplot_occurrence
occurrence_hvplot = hvplot_occurrence(occurrence_gdf)
# Save the plot as a standalone HTML file
occurrence_hvplot.save('sandhill-crane-migration.html', embed=True)
WARNING:W-1005 (FIXED_SIZING_MODE): 'fixed' sizing mode requires width and height to be set: figure(id='d086bcb4-198f-4b50-aa52-af1d87f6dd8f', ...)

occurrence_hvplot

occurrence_gdf_complete = occurrence_gdf.reset_index()
# April rows, sorted from highest to lowest norm_occurrences
april_occ = occurrence_gdf_complete.loc[occurrence_gdf_complete['month'] == 4].sort_values(by=['norm_occurrences'], ascending=False)
april_occ_top_5 = april_occ.head(5)
april_occ_bottom_5 = april_occ.tail(5)

# Top Five Ecoregions
april_occ_top_5
| | ecoregion | month | name | area | geometry | norm_occurrences |
|---|---|---|---|---|---|---|
| 115 | 81 | 4 | British Columbia coastal conifer forests | 14.653986 | MULTIPOLYGON (((-14364688.43 7420408.623, -143... | 0.007219 |
| 225 | 149 | 4 | Central Tallgrass prairie | 36.779324 | POLYGON ((-10534926.556 5619565.277, -10517878... | 0.005960 |
| 557 | 546 | 4 | Palouse prairie | 9.866972 | MULTIPOLYGON (((-12951912.056 5827151.995, -12... | 0.005675 |
| 257 | 173 | 4 | Colorado Rockies forests | 15.113154 | MULTIPOLYGON (((-12173003.318 4534115.934, -12... | 0.005394 |
| 487 | 471 | 4 | New England-Acadian forests | 38.509900 | MULTIPOLYGON (((-7182650.847 5741141.666, -715... | 0.004976 |
# Bottom Five Ecoregions
april_occ_bottom_5
| | ecoregion | month | name | area | geometry | norm_occurrences |
|---|---|---|---|---|---|---|
| 611 | 639 | 4 | Sonoran desert | 21.416224 | MULTIPOLYGON (((-12499491.62 3383569.444, -124... | 0.000087 |
| 136 | 88 | 4 | California coastal sage and chaparral | 3.172258 | MULTIPOLYGON (((-12820829.454 3243992.707, -12... | 0.000087 |
| 237 | 162 | 4 | Chihuahuan desert | 46.807295 | MULTIPOLYGON (((-12343440.455 3790837.437, -12... | 0.000079 |
| 711 | 783 | 4 | Western Gulf coastal grasslands | 8.340400 | POLYGON ((-10826974.582 3185079.852, -10843709... | 0.000029 |
| 127 | 87 | 4 | California Central Valley grasslands | 4.727694 | POLYGON ((-13595834.408 4868653.384, -13554815... | 0.000023 |
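The April ranking can be extended to every month at once by sorting once and taking the first and last rows within each month group. A sketch; top_and_bottom_by_month is a hypothetical helper that expects a flat table like occurrence_gdf_complete.

```python
import pandas as pd


def top_and_bottom_by_month(df, value_col="norm_occurrences", n=5):
    """Hypothetical helper: n highest and n lowest rows for every month."""
    # Sort once; groupby().head/tail keep this order within each month
    ranked = df.sort_values(value_col, ascending=False)
    top = ranked.groupby("month").head(n)
    bottom = ranked.groupby("month").tail(n)
    return top, bottom


# Usage: top, bottom = top_and_bottom_by_month(occurrence_gdf_complete)
```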