Understanding Climate Change Using Data Science


9  Arctic and Antarctic Ice

_DALL·E 3 prompt: a scene illustrating the devastating effects of climate change on the Arctic landscape, with melting ice and a stranded polar bear._

In this chapter, let us try to understand how ice over the Arctic Circle is changing over time.

But first, why is Arctic Ice important and why should we care about it?

According to the EPA,

“The extent of area covered by Arctic sea ice is an important indicator of changes in global climate because warmer air and water temperatures are reducing the amount of sea ice present. Because sea ice is light-colored, it reflects more sunlight (solar energy) back to space than liquid water, thereby playing an important role in maintaining the Earth’s energy balance and helping to keep polar regions cool.”

Learning Objectives: Climate

The learning objectives are

  1. Understand how the extent of Arctic ice is changing over time

  2. Is the extent of Arctic ice increasing over the years?

  3. Is the extent of Arctic ice increasing or decreasing in the winter months?

  4. In which months does Arctic ice usually increase, and in which months does it decrease?

Climate Terms of Interest
  1. Arctic Ice
  2. Sea ice
  3. Polar regions

Let us first look at some images of the extent of ice in the Arctic and Antarctic regions.

Figure 9.1: Arctic ice in December 1978 vs December 2022

While it is hard to tell from the pictures, the sea ice extent in December 2022 (11.9 million square km) is 1.8 million square km less than in December 1978 (13.7 million square km).

Figure 9.2: Antarctic ice in December 1978 vs December 2022

Similarly, the Antarctic sea ice extent in December 2022 (8.8 million square km) is 1.6 million square km lower than in December 1978 (10.4 million square km).

We chose December 1978 and December 2022 as they are the earliest and latest data available as of our writing this chapter.

It seems that the total reduction in the extent of sea ice from 1978 to 2022 is 3.4 million square km (1.8 million in the Arctic plus 1.6 million in the Antarctic). Is that small or large?

  • One way to put this in context: the landmass of the United States is 9.15 million square kilometers (source: World Bank). Put another way,
  • the loss of sea ice amounts to more than a third of the U.S. landmass, or three Texases and three Californias put together!
  • The loss is even bigger than India, the country with the seventh-largest land area at 2.97 million square km!

You can find more fun facts on Glaciers.

Let us dig a bit deeper into the actual data for the extent of sea ice in the Arctic and Antarctic regions.

9.0.1 Extent of Sea Ice Data

We can get the data on millions of square kilometers of ice in the Arctic circle for a given month and year.

The north directory contains all of the Arctic data and images, and the south directory contains all of the Antarctic data.

The data is from NOAA.

This is what the directories look like:

Figure 9.3: NOAA directory structure of Arctic and Antarctic monthly files

From looking at the list of files in the directory, we can observe

  1. the data is updated as of December 2023.
  2. there is a file for each month.

Fortunately, the format of all the files is the same. It is always a good idea to open a file in the browser or Excel and check how the data and variables are displayed. It is also a good idea to read the notes about the data, as the notes can contain valuable information about the variables and how they are constructed, whether there are any issues with the variables, the sources of the data, etc.
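For example, here is a minimal sketch (using the NOAA URL pattern from the code below) that reads only the first few rows of one monthly file so we can inspect the column names and layout before downloading all twelve; the ssl workaround mirrors the one we use below:

import pandas as pd
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

# Peek at the January file: nrows=5 fetches just enough rows to inspect the layout
url = "https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/north/monthly/data/N_01_extent_v3.0.csv"
preview = pd.read_csv(url, nrows=5)
print(preview.columns.tolist())
print(preview)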

Let us write the code to get all the datasets programmatically, i.e., instead of downloading the data and saving it on our own computer (where it may not remain the latest available data), we can run the code and get the latest version of the data available as of the date we run the code.

import plotly.express as px 
import pandas as pd
import os
import io
import zipfile

# we need to take care of the unverified HTTPS error: either run Install Certificates on a Mac, or do the following
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

# let us get the data for all the available years
# The ASCII data files tabulate extent and area (in millions of square kilometers) by year for a given month

# create an empty data frame
df_north = pd.DataFrame()

# Loop over months 1 to 12; range(1, 13) stops one short of the end number
for month in range(1,13): 

    # If the month is between 1 and 9, pad it with a leading zero. For example, 1 becomes '01'.
    if month < 10: 
        month = '0' + str(month) 

    url = f"https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/north/monthly/data/N_{month}_extent_v3.0.csv"
    df_north_month = pd.read_csv(url)

    # let us concatenate the monthly files

    # we are adding all the files together because there are separate files for each month.

    df_north = pd.concat([df_north, df_north_month], axis=0)

#rename variables
df_north.columns = df_north.columns.str.replace(' ', '')
df_north.rename({'mo': 'month'}, axis = 1, inplace=True)

If you get an error with the code above, you may be working on Windows or a different OS than the Mac that we were using.

You can try slight modifications to the code as listed below

import plotly.express as px 
import pandas as pd
import os
import io
import zipfile
import urllib.request ## NEW LINE 1 ##

# we need to take care of the unverified HTTPS error: either run Install Certificates on a Mac, or do the following
import ssl
# ssl._create_default_https_context = ssl._create_unverified_context  ## COMMENTED OUT 1 ##
context = ssl._create_unverified_context()  ## NEW LINE 2 ##

# let us get the data for all the available years
# The ASCII data files tabulate extent and area (in millions of square kilometers) by year for a given month

# create an empty data frame
df_north = pd.DataFrame()

# Loop over months 1 to 12; range(1, 13) stops one short of the end number
for month in range(1,13): 

    # If the month is between 1 and 9, pad it with a leading zero. For example, 1 becomes '01'.
    if month < 10: 
        month = '0' + str(month) 

    url = f"https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/north/monthly/data/N_{month}_extent_v3.0.csv"
    #df_north_month = pd.read_csv(url)   ## COMMENTED OUT 2 ##
    temp_data = urllib.request.urlopen(url, context=context) ## NEW LINE 3 ##
    df_north_month = pd.read_table(temp_data, sep=',')  ## NEW LINE 4 ##

    # let us concatenate the monthly files

    # we are adding all the files together because there are separate files for each month.

    df_north = pd.concat([df_north,df_north_month], axis=0)  

#rename variables
df_north.columns = df_north.columns.str.replace(' ', '')
df_north.rename({'mo': 'month'}, axis = 1, inplace=True)

Let us unpack the code step-by-step.

  1. First, in lines 1-5, we imported the various packages that we need, including some that we don't regularly use, such as os, io, and zipfile
  2. We may not need lines 8-9, but sometimes we get an SSL certificate error on a Mac from the website. So, we import ssl and execute the command to address it
  3. On line 15, we created an empty data frame to hold the data
  4. On line 18, we start a for loop over the months of the year, from 1 to 12
  5. By looking at the directory structure, we note that we need to add a '0' to the months which are single digits. So, lines 21 and 22 pad the months from 1-9 with a '0' so that we have two-digit months (a more idiomatic alternative is sketched right after this list)
  6. We define the base URL on line 24. Note that we included the {month} variable so that every time the for loop changes the value of the month, the URL changes, corresponding to the files in the directory
  7. On line 25, we read the CSV file into a dataframe df_north_month to hold the data for that month
  8. Note that we are still in the loop. We need to add this monthly file to the empty dataset that we created. That is, when the value of the month is 1, line 31 adds the data for January to the empty dataset so that the df_north dataset now has January data
  9. When the month takes the value 2, the February data is stacked below the January data (which is now in the df_north dataset), and so forth
  10. On lines 34 and 35, we rename the variables. On line 34, we remove the empty spaces in the column names. On line 35, we rename the variable "mo" to "month" so that it is more descriptive
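As an aside, a more idiomatic way to zero-pad the month is to format it inside the f-string, which leaves the loop variable as an integer. A minimal sketch with the same URL pattern:

# Alternative to the if-block above: {month:02d} pads single digits with a leading zero
for month in range(1, 13):
    url = f"https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/north/monthly/data/N_{month:02d}_extent_v3.0.csv"
    print(url)  # e.g., ...N_01_extent_v3.0.csv for January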

Before we start graphing and answering our questions, let us check that the data is imported and concatenated correctly. It is always good practice to check the dataset before we start analyzing or graphing it.

nrows, ncols = df_north.shape
print(f'the data set has {nrows} observations and {ncols} columns or variables')
the data set has 551 observations and 6 columns or variables

Let us check the variable names and the type of the variables (numbers, strings etc.)

df_north.dtypes
year           int64
month          int64
data-type     object
region        object
extent       float64
area         float64
dtype: object

What are the variables in this dataset? It is always a good idea to check the documentation for any dataset. The documentation for this NOAA dataset is available at Arctic Ice Documentation.

Page 16 of the user documentation describes the variables:

  1. Year: 4-digit year
  2. mo (note: we renamed it to month): 1- or 2-digit month
  3. data_type: the input data set, either Goddard (Sea Ice Concentrations from Nimbus-7 SMMR and DMSP SSM/I-SSMIS Passive Microwave Data) or NRTSI-G (Near-Real-Time DMSP SSMIS Daily Polar Gridded Sea Ice Concentrations)
  4. region: hemisphere that this data covers, either N (Northern) or S (Southern)
  5. extent: sea ice extent in millions of square km
  6. area: sea ice area in millions of square km

As we can see, we have year and month, both integers as they should be. The other relevant variables are region="N", and the extent and area of the ice in millions of square km.
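As a quick sanity check, a one-line sketch to confirm that the region column contains only "N" in this Northern Hemisphere data (if the values carry stray spaces from the CSV, a .str.strip() would tidy them):

# Every row of the north data should carry the region code "N"
print(df_north['region'].unique())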

We can describe the data to get an overview of the variables. Note that this will only give descriptive statistics for the numerical variables in the dataset.

df_north.describe()
year month extent area
count 551.000000 551.000000 551.000000 551.000000
mean 2001.292196 6.493648 -24.996261 -45.239365
std 13.270059 3.455096 602.558501 737.150626
min 1978.000000 1.000000 -9999.000000 -9999.000000
25% 1990.000000 3.500000 8.465000 6.135000
50% 2001.000000 6.000000 12.020000 9.910000
75% 2013.000000 9.000000 14.245000 12.285000
max 2024.000000 12.000000 16.340000 13.900000

The data starts in 1978 and continues into 2024 (as can be seen from the min and max of the variable year).

Notice that the minimum for extent is -9999. We can infer that -9999 marks cases where no observations were made. We need to replace it with a missing value; otherwise, it could mess up our averages and also the graphs.

import numpy as np
df_north = df_north.replace(-9999, np.nan)
df_north.describe() 
year month extent area
count 551.000000 551.000000 549.000000 548.000000
mean 2001.292196 6.493648 11.338907 9.252026
std 13.270059 3.455096 3.279964 3.267431
min 1978.000000 1.000000 3.570000 2.410000
25% 1990.000000 3.500000 8.480000 6.182500
50% 2001.000000 6.000000 12.030000 9.950000
75% 2013.000000 9.000000 14.270000 12.292500
max 2024.000000 12.000000 16.340000 13.900000

Notice that the variable area originally had two extra spaces in its name. If we were to graph the variable area, we would always have to include the two spaces. To make it easier, let us remove the spaces from all column names (we already did this right after the loop, and running it again is harmless).

df_north.columns = df_north.columns.str.replace(' ', '')

To start graphing, let us look at the months that have the most and the least extent of ice over time.

We can't use the variable area, because area is not measured consistently across the record, whereas extent is.

# We put the number 5 because we want to see the top 5 
df_north.nlargest(5, 'extent')
year month data-type region extent area
0 1979 3 Goddard N 16.34 13.21
0 1979 2 Goddard N 16.18 13.18
4 1983 3 Goddard N 16.09 12.93
8 1987 2 Goddard N 16.05 13.02
1 1980 3 Goddard N 16.04 12.99

We can immediately notice that, though the data runs from 1978 to 2024, the maximum extent of ice coverage was in 1979. The latest year that ranks in the top 5 for extent of ice is 1987.

We can also see that the months with the largest extent of ice were 3 (March) and 2 (February). We expect that there are months when ice accumulates and months when it melts within a given year; we are more concerned with how the extent of ice varies across years.

We can graph the extent of the ice for February and March over the years later. Now let’s check for the smallest:

# We put the number 5 because we want to see the smallest 5
df_north.nsmallest(5, 'extent')
year month data-type region extent area
33 2012 9 Goddard N 3.57 2.41
41 2020 9 Goddard N 4.00 2.83
28 2007 9 Goddard N 4.27 2.82
40 2019 9 Goddard N 4.36 3.17
44 2023 9 NRTSI-G N 4.37 2.79

The month with the smallest extent is month 9 (September) in all five of the lowest spots.

df_north.nsmallest(10, 'extent')
year month data-type region extent area
33 2012 9 Goddard N 3.57 2.41
41 2020 9 Goddard N 4.00 2.83
28 2007 9 Goddard N 4.27 2.82
40 2019 9 Goddard N 4.36 3.17
44 2023 9 NRTSI-G N 4.37 2.79
45 2024 9 NRTSI-G N 4.38 2.89
37 2016 9 Goddard N 4.53 2.91
32 2011 9 Goddard N 4.56 3.21
36 2015 9 Goddard N 4.62 3.42
29 2008 9 Goddard N 4.69 3.26

It seems that all ten of the lowest ice extents in the Arctic occurred in September, with 2012 the lowest of all. We can plot this over time as well.

Can we say that the extent of ice in the Arctic is significantly smaller than it was in the 1980s? Maybe, but not from this table alone. We need to compare apples to apples, comparing the same month over time. Alternatively, we can take averages over the years.

Since we now have the information that we need, let’s start graphing.

fig = px.scatter(df_north.loc[df_north.month==3], x='year', y='extent') 
fig.show()
fig.write_image("../../images/ArcticIce/fig1.png")
[Scatter plot: Arctic sea ice extent in March, by year]

We saved the image that we generated to a file using fig.write_image. We can display the saved image now.

[Image: fig1.png, the saved scatter plot]

Why did we use loc? We used loc because we are specifying that we only want observations where the month has a value of 3. We only want to look at the extent from month 3 over the years so that we can compare the extent of the ice better.
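As an aside, the same filter can also be written with the query method, which some readers find easier to scan; a minimal sketch:

# Two equivalent ways of keeping only the March observations
march = df_north.loc[df_north.month == 3]
march_alt = df_north.query('month == 3')
print(march.equals(march_alt))  # True: both keep the same rows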

We can see from the scatter plot that the extent of Arctic ice for the month of March is decreasing over the years. The graph starts in 1979, when the extent of Arctic ice was 16.34 million square km, and ends in 2023, when the extent is 14.44 million square km. Almost 2 million square km of ice extent has been lost over the years.

What if we plot the data for the entire year?

fig = px.line(df_north, x='year', y='extent')
fig.show()
[Line plot: extent vs. year, all months overlaid]

We can't make much sense of this graph: the line sweeps through all twelve months of every year, so we can't see where the extent is decreasing or increasing. Let's specify a month.
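(As an aside, a sketch of another option: map month to color so that each month becomes its own line. We cast month to a string so plotly assigns one discrete color per month.)

# One line per calendar month, so each month's trajectory is visible separately
df_plot = df_north.copy()
df_plot['month'] = df_plot['month'].astype(str)  # discrete colors, one trace per month
fig = px.line(df_plot.sort_values('year'), x='year', y='extent', color='month')
fig.show()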

fig = px.line(df_north.loc[df_north.month==3], x='year', y='extent', title="Extent of Arctic Ice in March")
fig.show()
[Line plot: Extent of Arctic Ice in March]

Now let's plot February.

fig = px.line(df_north.loc[df_north.month==2], x='year', y='extent', title="Extent of Arctic Ice in February")
fig.show()
[Line plot: Extent of Arctic Ice in February]

We can see that the extent of Arctic ice in February shows steeper decreases than in March. Overall, February and March are the months with the greatest extent of Arctic ice, and it has been slowly declining over the years.

Let's graph the month with the smallest extent. We only have to graph September.

fig = px.line(df_north.loc[df_north.month==9], x='year', y='extent', title="Extent of Arctic Ice in September")
fig.show()
[Line plot: Extent of Arctic Ice in September]

September is the month with the smallest extent of Arctic ice. Sadly, it too is decreasing significantly over the years.

Let us check the average for each year and plot it over the years.

df_north.groupby(['year'])['extent'].mean().head(5)
year
1978    12.660000
1979    12.350000
1980    12.348333
1981    12.146667
1982    12.467500
Name: extent, dtype: float64

We can see the averages for each year. We see a general decline, but it appears that the last year has the highest ice extent? Is this correct? What are we missing? Notice that the last year of data is incomplete: the months with the highest ice extent are February and March, while the lowest come in the second half of the year, so a partial year biases the average upward. It is better to keep only years with full data for graphing purposes.
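A quick sketch to confirm which years are incomplete: count the number of monthly observations per year and flag any year with fewer than 12.

# Full years should have 12 monthly observations; 1978 and the latest partial year will not
months_per_year = df_north.groupby('year')['month'].count()
print(months_per_year[months_per_year < 12])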

df_year = df_north.groupby('year').agg({'extent': ['mean']}).reset_index()

# we need to remove the multiindex after the group by.
# see https://stackoverflow.com/questions/44023770/pandas-getting-rid-of-the-multiindex
df_year.columns = df_year.columns.map(''.join)

We can see that the mean values are stored in a column "extentmean" that we can use.
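(As an aside, a sketch: pandas' named aggregation produces a flat column name directly, avoiding the MultiIndex cleanup step.)

# Named aggregation yields the flat column name 'extentmean' in one step
df_year_alt = df_north.groupby('year', as_index=False).agg(extentmean=('extent', 'mean'))
print(df_year_alt.head())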

fig = px.line(df_year, x='year', y='extentmean', title="Annual Mean Extent of Arctic Ice (N)")
fig.show()
[Line plot: Annual Mean Extent of Arctic Ice (N)]

From the graph, the sharp uptick at the end reminds us that we only have partial-year data for 2024. We need to remove it for the annual average graph:

fig = px.line(df_year[df_year['year']!=2024], x='year', y='extentmean', title="Annual Mean Extent of Arctic Ice (N) 1979-2023")
fig.show()
[Line plot: Annual Mean Extent of Arctic Ice (N), 1979-2023]

We can clearly see a decrease in the extent of Arctic ice over the last four decades. The annual mean for 1978 is 12.66 million square km (by the way, we may need to drop 1978 as well; we should check whether we have full data for 1978) and the average for 2022 is 10.69 million square km: a decrease of almost 2 million square km, or (10.69−12.66)/12.66, a 15.56% decrease. While this is a cause for concern and may be due to climate change, we need to be cautious in drawing conclusions as the data covers only a few decades. However, there are measurements over a longer time period, based on Antarctic ice cores, that suggest a cause for alarm.

We can also use similar code to compute the minimum and maximum over the years and plot them together.

df_year = df_north.groupby('year').agg({'extent': ['mean', 'min', 'max']}).reset_index()
df_year.columns = df_year.columns.map(''.join)
fig = px.line(df_year.loc[df_year.year!=2024], x='year', y=['extentmean', 'extentmin', 'extentmax'], title="Annual Mean, Min, and Max Extent of Arctic Ice (N)")
fig.show()
[Line plot: annual mean, min, and max extent of Arctic ice (N)]

Notice that we use .loc to filter the data as before, and that we pass three y variables. More importantly, this graph illustrates a couple of points.

  1. Note that extentmean seems much flatter than in the graph before. However, if you hover over the graph, you can see that the values for each year's average are the same as before, i.e., 12.66 million sq km for 1978 and 10.69 million sq km for 2022. How come the graphs look so different, then? If you look more carefully, you will see that the y-axis starts much lower here, around 4 million sq km as opposed to 10.5 million sq km in the previous graph, because we included the per-year minimum in the same graph. This illustrates that one needs to be careful with how graphs are presented, as one can easily shrink or exaggerate differences visually by changing the scale of the graphs.

  2. On the other hand, we need to be careful about interpreting the minimum, mean, and maximum together, as they may occur at different times of the year.
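If we want to control that visual impression ourselves, a small sketch: pin the y-axis range explicitly instead of letting plotly auto-scale it (the range values here are our own choice, not from the original):

# Replot the annual mean alone with a fixed y-axis range, so the apparent slope
# is a deliberate choice rather than an artifact of auto-scaling
fig = px.line(df_year.loc[df_year.year != 2024], x='year', y='extentmean')
fig.update_yaxes(range=[0, 17])  # e.g., start at zero for a conservative baseline
fig.show()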

9.1 South Data

So far we have analyzed the data for the North pole / Arctic Ice.

Let's analyze the South Pole, or Antarctic, ice data. The code should be very similar, though we have to make a few changes.

import plotly.express as px 
import pandas as pd
import os
import io
import zipfile

# we need to take care of the unverified HTTPS error: either run Install Certificates on a Mac, or do the following
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

# let us get the data for all the available years
# The ASCII data files tabulate extent and area (in millions of square kilometers) by year for a given month

# create an empty data frame
df_south = pd.DataFrame()

# Loop over months 1 to 12; range(1, 13) stops one short of the end number
for month in range(1,13): 

    # If the month is between 1 and 9, pad it with a leading zero. For example, 1 becomes '01'.
    if month < 10: 
        month = '0' + str(month) 

    url = f"https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/south/monthly/data/S_{month}_extent_v3.0.csv"
    df_south_month = pd.read_csv(url)

    # let us concatenate the monthly files

    # we are adding all the files together because there are separate files for each month.

    df_south = pd.concat([df_south, df_south_month], axis=0)

#rename variables
df_south.columns = df_south.columns.str.replace(' ', '')
df_south.rename({'mo': 'month'}, axis = 1, inplace=True)

Let us use Copilot to generate documentation for the code that we wrote above.

  • The provided code snippet is written in Python and is used to fetch, process, and concatenate monthly sea ice extent data for the Southern Hemisphere from the National Snow and Ice Data Center (NSIDC). Let’s break down the code step by step:

    • First, the necessary libraries are imported. plotly.express is used for data visualization, pandas for data manipulation, os and io for handling file operations, and zipfile for working with zip files.
    • Additionally, the ssl library is imported to handle HTTPS context settings, which is necessary to avoid unverified HTTPS errors when fetching data from the web.
    • The line ssl._create_default_https_context = ssl._create_unverified_context ensures that the HTTPS context is set to an unverified mode, allowing the script to bypass SSL certificate verification.
  • An empty DataFrame named df_south is created to store the concatenated data for all months.

  • The script then iterates over a range of months from 1 to 12 using a for loop. For each month, if the month number is less than 10, a leading zero is added to ensure the month is formatted as a two-digit string (e.g., ‘01’ for January).

  • Within the loop, a URL is constructed for each month’s data file using an f-string. The URL points to the CSV file containing the sea ice extent data for the specified month.

  • The pd.read_csv(url) function is used to read the CSV file from the URL into a DataFrame named df_south_month.

  • The script then concatenates the monthly DataFrame df_south_month with the main DataFrame df_south using the pd.concat function.

  • This process is repeated for each month, resulting in a single DataFrame that contains data for all months from January to December.

  • After all the monthly data has been concatenated, the script cleans up the DataFrame by removing any spaces from the column names using df_south.columns.str.replace(' ', ''). Additionally, the column named 'mo' is renamed to 'month' using the df_south.rename function with the axis=1 parameter to specify that the renaming should be applied to the columns.

Overall, this code efficiently fetches and processes monthly sea ice extent data from the NSIDC, concatenates it into a single DataFrame, and cleans up the column names for further analysis or visualization.

Wow! The documentation is correct, and it seems to be better than the documentation that we wrote for the North part of the data. We first wrote the North data analysis a couple of years ago, when we didn't have access to ChatGPT or GitHub Copilot.

In fact, while revising the book, we realized that we could write some of the code in this book more efficiently. But we didn't change it, as that is how we learned, and we wanted to showcase the process of our learning. We did start using Copilot for documentation, though, as it gives more detailed documentation than we normally write (and better language too, though a bit more verbose than ours)!

If the above code doesn’t work for you because you are using a different OS, try the following code.

import plotly.express as px 
import pandas as pd
import os
import io
import zipfile
import urllib.request ## NEW LINE 1 ##

# we need to take care of the unverified HTTPS error: either run Install Certificates on a Mac, or do the following
import ssl
# ssl._create_default_https_context = ssl._create_unverified_context  ## COMMENTED OUT 1 ##
context = ssl._create_unverified_context()  ## NEW LINE 2 ##

# let us get the data for all the available years
# The ASCII data files tabulate extent and area (in millions of square kilometers) by year for a given month

# create an empty data frame
df_south = pd.DataFrame()

# Loop over months 1 to 12; range(1, 13) stops one short of the end number
for month in range(1,13): 

    # If the month is between 1 and 9, pad it with a leading zero. For example, 1 becomes '01'.
    if month < 10: 
        month = '0' + str(month) 

    url = f"https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/south/monthly/data/S_{month}_extent_v3.0.csv"
    #df_south_month = pd.read_csv(url)   ## COMMENTED OUT 2 ##
    temp_data = urllib.request.urlopen(url, context=context) ## NEW LINE 3 ##
    df_south_month = pd.read_table(temp_data, sep=',')  ## NEW LINE 4 ##
    # let us concatenate the monthly files

    # we are adding all the files together because there are separate files for each month.

    df_south = pd.concat([df_south,df_south_month], axis=0)  

#rename variables
df_south.columns = df_south.columns.str.replace(' ', '')
df_south.rename({'mo': 'month'}, axis = 1, inplace=True)

Let us look at the data more closely:

df_south.dtypes

df_south.describe()
year month extent area
count 551.000000 551.000000 551.000000 551.00000
mean 2001.292196 6.493648 -24.825118 -27.59755
std 13.270059 3.455096 602.585746 602.40990
min 1978.000000 1.000000 -9999.000000 -9999.00000
25% 1990.000000 3.500000 5.745000 4.01500
50% 2001.000000 6.000000 11.850000 9.03000
75% 2013.000000 9.000000 16.790000 13.31000
max 2024.000000 12.000000 19.760000 15.75000

Notice that we have the same problem here that we saw in the North dataset: the minimum for extent is -9999. So, once again, let's replace it.

import numpy as np
df_south = df_south.replace(-9999, np.nan)
df_south.describe() 
year month extent area
count 551.000000 551.000000 549.000000 549.000000
mean 2001.292196 6.493648 11.510674 8.728142
std 13.270059 3.455096 5.586368 4.592586
min 1978.000000 1.000000 1.910000 1.150000
25% 1990.000000 3.500000 5.840000 4.070000
50% 2001.000000 6.000000 11.930000 9.100000
75% 2013.000000 9.000000 16.800000 13.310000
max 2024.000000 12.000000 19.760000 15.750000

As before, we replace all occurrences of the value -9999 in the DataFrame df_south with np.nan, which represents a missing value (NaN stands for “Not a Number”).

This is a common data cleaning step, as -9999 is often used as a placeholder for missing or invalid data in datasets.

By converting these placeholders to np.nan, subsequent data analysis and processing can handle these missing values appropriately.
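(As an aside, a sketch: pd.read_csv can mark the placeholder as missing at load time via its na_values parameter, which would let us skip the separate replace step. The exact strings to list depend on how -9999 is formatted in the file.)

# Treat the -9999 placeholder as missing while reading a monthly file
url = "https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/south/monthly/data/S_01_extent_v3.0.csv"
df_month = pd.read_csv(url, na_values=[-9999, '-9999', '-9999.00'])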

Learning from the North data, let us create a new variable that gives us monthly data.

df_south['yearmonth'] = df_south['year']*100 + df_south['month']

The new yearmonth column is generated by combining the values from the year and month columns. Specifically, it multiplies the year value by 100 and then adds the month value. This operation effectively concatenates the year and month into a single integer. For example, if the year is 2023 and the month is 10, the resulting yearmonth value would be 202310.

This approach is useful for creating a unique identifier for each year-month combination, which can be particularly helpful for time series analysis or when you need to group data by specific periods. By converting the year and month into a single integer, it simplifies sorting and filtering operations based on these combined time periods.
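(As an aside, a sketch: building a real datetime column gives plotly a properly spaced time axis, unlike the integer yearmonth used below. We work on a copy so df_south itself is unchanged.)

# Assemble a datetime from the year and month columns (day set to 1)
df_dates = df_south.copy()
df_dates['day'] = 1
df_dates['date'] = pd.to_datetime(df_dates[['year', 'month', 'day']])
# px.line(df_dates.sort_values('date'), x='date', y='extent') would then space points correctly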

fig = px.line(df_south, x="year", y="area")
fig.show()
[Line plot: area vs. year (South)]

This doesn’t look that good!

fig = px.bar(df_south, x="yearmonth", y="area")
fig.show()
[Bar chart: area by yearmonth (South)]

The variable area doesn't have two extra spaces here as it did in the North dataset. This again shows the importance of looking at the data and cleaning the data. In the real world, 80% of the time is spent on cleaning the data before using it :(

Now we basically run the same code as for the North dataset so that we can compare.

df_south.nlargest(5, 'extent')
year month data-type region extent area yearmonth
35 2014 9 Goddard S 19.76 15.75 201409
34 2013 9 Goddard S 19.39 15.28 201309
33 2012 9 Goddard S 19.21 15.41 201209
27 2006 9 Goddard S 19.09 15.18 200609
34 2013 10 Goddard S 19.02 15.13 201310
df_south.nsmallest(5, 'extent')
year month data-type region extent area yearmonth
44 2023 2 NRTSI-G S 1.91 1.15 202302
45 2024 2 NRTSI-G S 2.14 1.51 202402
43 2022 2 Goddard S 2.21 1.38 202202
38 2017 2 Goddard S 2.29 1.64 201702
39 2018 2 Goddard S 2.33 1.63 201802

Oh, this is interesting. The month with the smallest extent in the North dataset is September, while September is the month with the largest extent in the South dataset. Conversely, February and March have the largest extent in the North, and February has the smallest extent in the South. This makes sense: the seasons in the two hemispheres are reversed.

fig = px.line(df_south.loc[df_south.month==9], x='year', y='extent', title="Extent of Antarctic Ice in September (South)")
fig.show()

fig = px.line(df_south.loc[df_south.month==2], x='year', y='extent', title="Extent of Antarctic Ice in February (South)")
fig.show()
[Line plot: Extent of Antarctic Ice in September (South)]
[Line plot: Extent of Antarctic Ice in February (South)]

In the first graph, we can see that the extent of Antarctic ice has recently been decreasing by a significant amount. In the second graph, there are numerous sharp increases and decreases, and as of now it is still decreasing.

9.1.1 Annual Data Analysis

We can use the annual data from NOAA, as the National Snow and Ice Data Center computes it more accurately. Specifically, the documentation mentions that "…the Excel workbook contains four spreadsheets with monthly average sea ice extent and area, plus weighted average monthly extent and area for each year, from 1979 to the most recent month, for both the Northern Hemisphere (NH) and the Southern Hemisphere (SH), in millions of square kilometers…"

For more details see

  • Fetterer, F., K. Knowles, W. N. Meier, M. Savoie, and A. K. Windnagel. (2017). Sea Ice Index, Version 3 [Data Set]. Boulder, Colorado USA. National Snow and Ice Data Center. https://doi.org/10.7265/N5K072F8. Date Accessed 12-25-2023.

  • the Sea Ice Index product landing page (https://nsidc.org/data/g02135)

This will also give us some experience working with reading Excel sheets into a pandas DataFrame.

import pandas as pd

# we need to take care of the unverified HTTPS error: either run Install Certificates on a Mac, or do the following
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

seaice_year = pd.ExcelFile("https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/seaice_analysis/Sea_Ice_Index_Monthly_Data_by_Year_G02135_v3.0.xlsx")

If the above code doesn’t work due to differences in OS, try instead:

import pandas as pd
import urllib.request ## NEW LINE 1 ##

# we need to take care of the unverified HTTPS error: either run Install Certificates on a Mac, or do the following
import ssl
# ssl._create_default_https_context = ssl._create_unverified_context  ## COMMENTED OUT 1 ##
context = ssl._create_unverified_context()  ## NEW LINE 2 ##

# seaice_year = pd.ExcelFile("https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/seaice_analysis/Sea_Ice_Index_Monthly_Data_by_Year_G02135_v3.0.xlsx") ## COMMENTED OUT 2 ##
temp_data = urllib.request.urlopen("https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/seaice_analysis/Sea_Ice_Index_Monthly_Data_by_Year_G02135_v3.0.xlsx"
                        , context=context).read() ## NEW LINE 3 ##
seaice_year = pd.ExcelFile(temp_data) ## NEW LINE 4##

It is always a good idea to check the sheets in the Excel file.

print(seaice_year.sheet_names)
['NH-Extent', 'NH-Area', 'SH-Extent', 'SH-Area', 'Documentation']

It looks like there are five sheets within this Excel spreadsheet. From the names, we can tell that they hold northern hemisphere sea ice extent, northern hemisphere sea ice area, southern hemisphere sea ice extent, and southern hemisphere sea ice area, along with a documentation sheet.

If we open the sheets, we can see that the first column is year, then there is a separate column for each month from January to December, and then an annual column.

We can list the columns directly

seaice_nh_df = pd.read_excel(seaice_year, sheet_name='NH-Extent')
print(seaice_nh_df.columns.tolist())
['Unnamed: 0', 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'Unnamed: 13', 'Annual']

As we can see from the spreadsheet, the 'Unnamed: 0' column is the year. Our interest is in the year and Annual columns.

seaice_nh_df = seaice_nh_df.iloc[:,[0, 14]]  # keep the year column (0) and the Annual column (14)
seaice_nh_df.rename(columns={'Unnamed: 0' : 'year', 'Annual': 'Annual Area'}, inplace=True)  # note: the sheet holds extent; we keep the loose label 'Annual Area'

We can do the same for the Southern Hemisphere.

seaice_sh_df = pd.read_excel(seaice_year, sheet_name='SH-Extent')
seaice_sh_df = seaice_sh_df.iloc[:,[0, 14]]
seaice_sh_df.rename(columns={'Unnamed: 0' : 'year', 'Annual': 'Annual Area'}, inplace=True)

We can follow the same sea ice extent analysis as before

import plotly.express as px
fig = px.line(seaice_sh_df, x= 'year', y= 'Annual Area', title="Extent of Antarctic Ice: Annual")
fig.show()
[Line plot: Extent of Antarctic Ice: Annual]
import plotly.express as px
fig = px.line(seaice_nh_df, x= 'year', y= 'Annual Area', title="Extent of Arctic Ice: Annual")
fig.show()
[Line plot: Extent of Arctic Ice: Annual]

By the way, do you see any sudden increases or drops towards the end? We always need to be careful to see if the data is available for the entire year. Otherwise, as some months can have a lower or higher value, the graphs can be misleading!

Let us drop 2024 as we don’t have full data for the entire year.

import plotly.express as px
fig = px.line(seaice_sh_df.loc[seaice_sh_df.year!=2024], x= 'year', y= 'Annual Area', title="Extent of Antarctic Ice: Annual")
fig.show()
[Line plot: Extent of Antarctic Ice: Annual, excluding 2024]

The pattern remains the same as before for the Antarctic Ice.

How about Arctic Ice?

import plotly.express as px
fig = px.line(seaice_nh_df.loc[seaice_nh_df.year!=2024], x= 'year', y= 'Annual Area', title="Extent of Arctic Ice: Annual")
fig.show()
[Line plot: Extent of Arctic Ice: Annual, excluding 2024]

We see that the upward tick at the end is removed once we consider only full years of data (that is, we stop at 2023).

As before, with the refined annual sea ice extent, we still see a significant decrease in the extent of sea ice, both in the southern and the northern hemisphere.

9.2 Ice and Snow Indicators from EPA

Recently, the EPA has started putting together a number of climate indicators, including some on snow and ice.

Let us make use of the data underlying the EPA charts.

9.2.1 Arctic Ice

Arctic Sea Ice Melt Season

We can create the graph ourselves using the data from the EPA website.

```{python}
import pandas as pd

url = "https://www.epa.gov/system/files/other-files/2023-11/arctic-sea-ice_fig-3.csv"

df = pd.read_csv(url)
```

Unfortunately, we get an HTTP Error 403: Forbidden. It seems that the server doesn't want to accept our request.

We have two options: we can download the data to a local CSV file and then read it, or we can find a programmatic solution. Since we would like to update the data whenever it gets updated on EPA, we prefer the programmatic route.

We turn to our best friends Google and Stack Overflow. Luckily, someone has addressed this issue before. See Stack Overflow.

We can adapt their code.

from io import StringIO
import pandas as pd

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0",
}
url = "https://www.epa.gov/system/files/other-files/2023-11/arctic-sea-ice_fig-3.csv"

f = StringIO(requests.get(url, headers=headers).text)

df_arcticice = pd.read_csv(f,  skiprows=6)
df_arcticice.head()
Year Start date Duration End date
0 1979 169.275 88.412 257.687
1 1980 166.438 91.913 258.351
2 1981 164.909 91.975 256.884
3 1982 168.775 87.800 256.575
4 1983 169.446 87.883 257.329

It seems we need to:

  1. use requests (with a browser User-Agent header) to get the CSV file
  2. use StringIO to wrap the response text so that pandas can read it
  3. skip the first 6 rows, as they contain text notes about the columns

fig = px.bar(df_arcticice, x = 'Year', y = 'Duration', color = 'Year')
fig.show()

fig = px.line(df_arcticice, x ='Year', y = 'Duration', 
    title = 'Arctic Sea Ice Duration',
)
fig.update_layout(title={'x':0.75})
fig.show()
[Bar chart: melt season duration by year, colored by year]
[Line plot: Arctic Sea Ice Duration]

By the way, we positioned the title using fig.update_layout(title={'x': 0.75}); a value of 0.5 would center it. Try a few values to see the effect.

Let us look at the largest values of the duration of Arctic ice melting.

df_arcticice.nlargest(5, 'Duration')
Year Start date Duration End date
28 2007 157.275 131.019 288.294
37 2016 159.160 128.902 288.063
33 2012 157.354 128.453 285.807
40 2019 158.310 127.937 286.247
41 2020 162.057 126.871 288.928

Duration measures how long the Arctic ice melt season lasts each year. As we can see from the output above, the longest melt season was in 2007. It has come down somewhat in the years since, but not by a great amount. We can also see this in the graph: through the 1980s and 1990s the duration was fairly stable, oscillating up and down, but around the 2000s it started climbing.

Using this graph and what we know, we can make a connection to the other dataset: as the melt season duration increases over the years, the extent or area of the Arctic ice decreases.

9.2.2 Antarctic Ice

EPA shows a nice figure on Antarctic ice, similar to what we have shown from the NSIDC data before.

EPA Antarctic Ice

Let us chart it ourselves using the CSV data provided by EPA. It is not that difficult, as it is just reading the data and plotting it.

from io import StringIO
import pandas as pd

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0",
}
url = "https://www.epa.gov/system/files/other-files/2022-07/antarctic-sea-ice-fig-1.csv"

f = StringIO(requests.get(url, headers=headers).text)

df_antice = pd.read_csv(f,  skiprows=6)
df_antice.head()
Year February September
0 1979 1.212361 7.027059
1 1980 1.088808 7.266443
2 1981 1.108113 7.181500
3 1982 1.208500 7.084975
4 1983 1.185334 7.177639
fig = px.line(df_antice, x='Year', y=['February', 'September'], title = "February and September Monthly Average Antarctic Sea Ice Extent, 1979–2021").update_layout(yaxis_title="Sea Ice Extent (mn sq miles)", legend={'title_text':''})

fig.show()
[Line plot: February and September Monthly Average Antarctic Sea Ice Extent, 1979–2021]

We don't seem to see much change in the sea ice extent in the peak month (September) or the lowest month (February). One thing to note: in our graphs using the data directly from NSIDC, we plotted the data in millions of square km, while the EPA data are in millions of square miles.
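(A sketch, in case we want the EPA series on the same footing as our NSIDC plots: convert from millions of square miles to millions of square kilometers; one square mile is about 2.59 square kilometers.)

# Convert the EPA columns from millions of sq miles to millions of sq km
SQ_MILES_TO_SQ_KM = 2.59
df_antice_km = df_antice.copy()
df_antice_km[['February', 'September']] = df_antice_km[['February', 'September']] * SQ_MILES_TO_SQ_KM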

The annual averages and the length of the melting season inform us more than the data for any single month.

We can see that in recent years, both the Arctic and Antarctic sea ice extent have been decreasing when measured over the whole year.

9.2.3 Ice Sheets

As noted by NSIDC,

  • Earth has just two ice sheets, one covers most of Greenland, the largest island in the world, and the other spans across the Antarctic continent.

  • Together, the Antarctic and Greenland Ice Sheets contain more than 99 percent of the freshwater ice on Earth. The amount of fresh water stored in both the Antarctic and Greenland Ice Sheets totals over 68 percent of the fresh water on Earth.

  • The Antarctic Ice Sheet extends almost 14 million square kilometers (5.4 million square miles), roughly the area of the contiguous United States and Mexico combined.

  • The Greenland Ice Sheet extends about 1.7 million square kilometers (656,000 square miles), covering 80 percent of the world’s largest island, and is equivalent to about three times the size of Texas.

  • Both ice sheets measure on average about 3.2 kilometers (2 miles) thick.

We next move on to another indicator, Ice Sheets.

Ice Sheets

We are not going to walk through it in detail, as it repeats code we have used before. To reproduce it, we need to:

  1. read the CSV data, making sure that we don't get a 403 error
  2. plot the data and adjust the legends and labels as necessary

from io import StringIO
import pandas as pd

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0",
}
url = "https://www.epa.gov/sites/default/files/2021-04/ice_sheets_fig-1.csv"

f = StringIO(requests.get(url, headers=headers).text)

df_icesheets = pd.read_csv(f, skiprows = 6)

df_icesheets
Year NASA - Antarctica land ice mass NASA - Greenland land ice mass IMBIE - Antarctica cumulative ice mass change IMBIE - Antarctica cumulative ice mass change uncertainty IMBIE - Greenland cumulative ice mass change IMBIE - Greenland cumulative ice mass change uncertainty
0 1992.000000 NaN NaN 418.310350 7.221207 395.365526 16.281278
1 1992.083333 NaN NaN 425.377021 50.029991 401.965526 23.025204
2 1992.166667 NaN NaN 433.552016 53.064819 408.565526 28.200000
3 1992.250000 NaN NaN 424.496837 59.023007 415.165526 32.562555
4 1992.333333 NaN NaN 423.362721 65.012377 421.765526 36.406043
... ... ... ... ... ... ... ...
512 2020.710000 -2437.63 -4996.08 NaN NaN NaN NaN
513 2020.790000 -2537.64 -4928.75 NaN NaN NaN NaN
514 2020.870000 -2587.65 -4922.26 NaN NaN NaN NaN
515 2020.960000 -2746.00 -4899.46 NaN NaN NaN NaN
516 NaN NaN NaN NaN NaN NaN NaN

517 rows × 7 columns

#listcol = df_icesheets.columns.tolist()
#fig = px.line(df_icesheets, x='Year', y=listcol)

#y = ['IMBIE - Antarctica cumulative ice mass change', 'IMBIE - Antarctica cumulative ice mass change uncertainty', 'IMBIE - Greenland cumulative ice mass change', 'IMBIE - Greenland cumulative ice mass change uncertainty']

fig = px.line(df_icesheets, x = 'Year', y = ['IMBIE - Antarctica cumulative ice mass change',  'IMBIE - Greenland cumulative ice mass change'])

fig.update_layout(
    yaxis_title='Cumulative Mass Change (Billions of Tonnes)',
    title='Cumulative Mass Balance of Antarctica and Greenland, 1992-2020 ',
    hovermode="x",
    legend=dict(x=0.1,y=0.2)
)
fig.update_layout(legend={'title_text':''})
fig.show()
[Line plot: Cumulative Mass Balance of Antarctica and Greenland, 1992-2020]

Why doesn't the graph look similar to the graph that we downloaded from EPA directly?

Real-world data is never clean, and we need to understand the changes. By browsing through the file, it appears that around 2002, once the NASA data becomes available, the frequency of the data changes and there are missing values.

nrows, ncols = df_icesheets.shape
print(f'the data set has {nrows} observations and {ncols} columns or variables')
the data set has 517 observations and 7 columns or variables

What did we do? Copilot to the rescue :)

We first extracted the number of rows and columns from the DataFrame df_icesheets. The shape attribute of a DataFrame returns a tuple containing two values: the number of rows and the number of columns. By assigning df_icesheets.shape to nrows, ncols, the code unpacks this tuple into two separate variables: nrows for the number of rows and ncols for the number of columns.

Let us remove the missing values and plot again

ycols = ['IMBIE - Antarctica cumulative ice mass change',  'IMBIE - Greenland cumulative ice mass change']
df = df_icesheets.dropna(subset=ycols)
nrows, ncols = df.shape
print(f'the data set has {nrows} observations and {ncols} columns or variables')
the data set has 306 observations and 7 columns or variables

We can see that around 211 observations were dropped.

fig = px.line(df, x='Year', y=ycols, title='Cumulative Mass Balance of Antarctica and Greenland, 1992-2020')
fig.update_layout(
    yaxis_title='Cumulative Mass Change (Billions of Tonnes)',
    hovermode="x",
    legend=dict(x=0.1, y=0.2, title_text='')
)
fig.show()
[Line plot: Cumulative Mass Balance of Antarctica and Greenland, 1992-2020, missing values dropped]

By the way, we can go back to Berkeley Earth and highlight a figure that they describe as

A representation of the length of time during which sea ice levels in the Arctic Ocean were at 15% or less, allowing for clear passage through the Arctic Ocean.

Arctic Passable. Source: Berkeley Earth https://berkeleyearth.org/dv/periods-when-the-arctic-ocean-has-been-passable/

9.2.4 Glaciers

As noted by NSIDC

A glacier is an accumulation of ice and snow that slowly flows over land. At higher elevations, more snow typically falls than melts, adding to its mass. Eventually, the surplus of built-up ice begins to flow downhill. At lower elevations, there is usually a higher rate of melt or icebergs break off that removes ice mass.

There are two broad categories of glaciers: alpine glaciers and ice sheets.

Alpine glaciers are frozen rivers of ice, slowly flowing under their own weight down mountainsides and into valleys.

More information on Glaciers is available at https://wgms.ch/global-glacier-state/

Glaciologists assess the state of a glacier by measuring its annual mass balance as the combined results of snow accumulation (mass gain) and melt (mass loss) during a given year. The mass balance reflects the atmospheric conditions over a (hydrological) year and, if measured over a long period and displayed in a cumulative way, trends in mass balance are an indicator of climate change. Seasonal melt contributes to runoff and the annual balance (i.e. the net change of glacier mass) contributes to sea level change.

Glacier Mass

The data is available at http://wgms.ch/data/faq/mb_ref.csv

from io import StringIO
import pandas as pd
import requests

def url_data_request(url, skiprows):
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0",
    }
    f = StringIO(requests.get(url, headers=headers).text)
    df_url = pd.read_csv(f, skiprows = skiprows)
    return df_url

url = "http://wgms.ch/data/faq/mb_ref.csv"
df = url_data_request(url,0)
df
Year MB_REF_count REF_regionAVG REF_regionAVG_cum-rel-1970
0 1950 5 -1142 5392
1 1951 4 -345 5047
2 1952 4 -561 4486
3 1953 8 -562 3924
4 1954 7 -420 3504
... ... ... ... ...
69 2019 61 -994 -22144
70 2020 61 -883 -23027
71 2021 61 -677 -23704
72 2022 60 -1090 -24794
73 2023 53 -1229 -26023

74 rows × 4 columns

The dataset has 4 columns. Let us plot the cumulative mass change of glaciers where 1970 is set as zero.

import plotly.express as px
fig = px.line(df, x= 'Year', y = 'REF_regionAVG_cum-rel-1970', 
    labels={"REF_regionAVG_cum-rel-1970": "Cumulative Mass Change in mm w.e."},  # the WGMS series is in mm of water equivalent
    title = 'Global Cumulative Change of Glaciers'
    )
fig.show()
[Line plot: Global Cumulative Change of Glaciers]


Some useful references to get further information are:

  1. https://wgms.ch/global-glacier-state/

  2. WGMS (2021, updated, and earlier reports). Global Glacier Change Bulletin No. 4 (2018-2019). Zemp, M., Nussbaumer, S. U., Gärtner-Roer, I., Bannwart, J., Paul, F., and Hoelzle, M. (eds.), ISC(WDS)/IUGG(IACS)/UNEP/UNESCO/WMO, World Glacier Monitoring Service, Zurich, Switzerland, 278 pp., publication based on database version: doi:10.5904/wgms-fog-2021-05.

They state that

“…In the hydrological years 2020/21 and 2021/22, observed reference glaciers experienced an ice loss of 0.8 m w.e. and of 1.2 m w.e., respectively. With this, eight out of the ten most negative mass balance years were recorded after 2010….”

The observed glaciers were close to steady states during the 1960s followed by increasingly strong ice loss until present.

Glaciers across the world

We can also look at glaciers using the EPA data.

Glaciers

As EPA states, "The line on the upper graph represents the average of all the glaciers that were measured. Negative values indicate a net loss of ice and snow compared with the base year of 1956. For consistency, measurements are in meters of water equivalent, which represent changes in the average thickness of a glacier."

We can also plot the data from EPA ourselves.

from io import StringIO
import pandas as pd

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0",
}

url = "https://www.epa.gov/system/files/other-files/2024-05/glaciers_fig-1.csv"

f = StringIO(requests.get(url, headers=headers).text)

df_glaciers = pd.read_csv(f, skiprows = 6)

df_glaciers = df_glaciers.iloc[1:, :]  # drop the first row of the file (not part of the data we want to plot)

Let’s plot the data:

fig = px.line(df_glaciers, x='Year', y='Mean cumulative mass balance')

fig.update_layout(
    title = "Average Cumulative Mass Balance of 'Reference' Glaciers Worldwide, 1956–2023",
    yaxis_title="Cumulative Mass Balance (meters of water equivalent)",
    title_x = 0.5
    )

fig.show()
[Line plot: Average Cumulative Mass Balance of 'Reference' Glaciers Worldwide, 1956–2023]

Note that we wrapped 'Reference' in single quotes inside the title, since the title string itself is enclosed in double quotes.

It seems that there is a significant loss of mass, measured in water equivalents, since the base year of 1956.

Per the EPA explanation for the figure,

On average, glaciers worldwide have been losing mass since at least the 1970s (see Figure 1), which in turn has contributed to observed changes in sea level (see the Sea Level indicator). A longer measurement record from a smaller number of glaciers suggests that they have been shrinking since the 1950s. The rate at which glaciers are losing mass appears to have accelerated over roughly the last decade.

9.2.5 Why do we care about Glaciers?

OK. Glaciers are losing significant mass, as measured in water equivalents. That's a lot of water! But why do we care? Is it something that we need to worry about? Is there a link from loss of mass in glaciers to sea level rise?

Let us check out the information from https://wgms.ch/sea-level-rise/

Melting ice sheets in Greenland and the Antarctic as well as ice melt from glaciers all over the world are causing sea levels to rise. For the IPCC special report on “Ocean and Cryosphere in a Changing Climate“, and AR6 WG-I, an international research team combined glaciological field observations with geodetic satellite measurements – as available from the WGMS – to reconstruct annual mass changes of more than 19’000 glaciers worldwide. This new assessment shows that glaciers alone lost more than 9,000 billion tons of ice between 1961/62 and 2015/16, raising water levels by 27 millimetres. This global glacier mass loss corresponds to an ice cube with the area of Germany and a thickness of 30 metres. The largest contributors were glaciers in Alaska, followed by the melting ice fields in Patagonia and glaciers in the Arctic regions. Glaciers in the European Alps, the Caucasus mountain range and New Zealand were also subject to significant ice loss; however, due to their relatively small glacierized areas they played only a minor role when it comes to the rising global sea levels.

Glacier-Sea Level Rise

Let us get some more data about glaciers from the EPA.

from io import StringIO
import pandas as pd

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0",
}

url = "https://www.epa.gov/system/files/other-files/2024-05/glaciers_fig-2.csv"

f = StringIO(requests.get(url, headers=headers).text)

df = pd.read_csv(f,  skiprows=6)
df.head()
Year South Cascade Glacier Gulkana Glacier Wolverine Glacier Lemon Creek Glacier
0 1952 NaN NaN NaN 1.29
1 1953 NaN NaN NaN 0.80
2 1954 NaN NaN NaN 0.69
3 1955 NaN NaN NaN 1.88
4 1956 NaN NaN NaN 1.31

Also note that we got a bit lazy here and started using df instead of naming the dataset more appropriately, say df_glaciers_epa. But we are not reusing datasets from before, so it is easier to use the short df and overwrite the data frames. It becomes more of an issue when we need to reuse an old dataframe or merge with another data frame.

We see there are a lot of NaN values, as some of the measurements may have started later than 1952.
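pandas can confirm this quickly; isna().sum() counts the missing values in each column of the df we just loaded:

```{python}
# How many values are missing in each column?
print(df.isna().sum())
```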

fig = px.line(df, x="Year", y=['South Cascade Glacier', 'Gulkana Glacier', 'Wolverine Glacier', 'Lemon Creek Glacier'])

fig.update_layout(
    title = "Cumulative Mass Balance of Four U.S. Glaciers, 1952–2023",
    yaxis_title="Cumulative Mass Balance (meters of water equivalent)",
    title_x = 0.5,
    legend=dict(
        x=0.05,
        y=0.5
    )
)
fig.show()
[Figure: Cumulative Mass Balance of Four U.S. Glaciers, 1952–2023; one line per glacier; x-axis: Year, y-axis: Cumulative Mass Balance (meters of water equivalent).]

We have used similar commands before, but two items are worth noting:

  1. y=['South Cascade Glacier', 'Gulkana Glacier', 'Wolverine Glacier', 'Lemon Creek Glacier']. Here, instead of one column, the y parameter is a list of column names from the DataFrame. These columns contain the data points for each glacier, which are plotted on the y-axis.
  2. We used fig.update_layout to set the main title and the y-axis title. We also centered the title with title_x = 0.5. To keep the chart area larger, we moved the legend inside the chart, choosing x = 0.05 and y = 0.5 based on the empty space in the chart.

OK. Enough with the code. What is the figure telling us?

As per the EPA explanation,

This figure shows the cumulative mass balance of four U.S. reference glaciers since measurements began in the 1950s or 1960s. For each glacier, the mass balance is set at zero for the base year of 1965. Negative values indicate a net loss of ice and snow compared with the base year. For consistency, measurements are in meters of water equivalent, which represent changes in the average thickness of a glacier.

The four U.S. reference glaciers have shown an overall decline in mass balance since the 1950s and 1960s and an accelerated rate of decline in recent years. Year-to-year trends vary, with some glaciers gaining mass in certain years (for example, Wolverine Glacier during the 1980s), but the measurements clearly indicate a loss of glacier mass over time.

Trends for the four U.S. reference glaciers are consistent with the retreat of glaciers observed throughout the western United States, Alaska, and other parts of the world. Observations of glaciers losing mass are also consistent with warming trends in U.S. and global temperatures during this time period (see the U.S. and Global Temperature indicator).

The images from the EPA website also tell a clear story.

McCall Glacier 1958

McCall Glacier 2003

We can see stark differences!

9.3 Arctic Glaciers

Let us move on to Arctic Glaciers from the EPA website

Cumulative Mass Balance of Eight Arctic Glaciers, 1945–2020

As EPA states,

This figure shows the cumulative change in mass balance of a set of eight glaciers located at a latitude of 66°N or higher with data beginning in 1945. Each time series has been adjusted to make 1970 a common zero point because this was the first year for which mass balance data are available for all eight glaciers. Negative values indicate a net loss of ice and snow compared with the base year of 1970. For consistency, measurements are in meters of water equivalent, which reflects the mass loss of the glacier by representing changes in the glacier’s average thickness.

As before, we can download the data from EPA.

url = "https://www.epa.gov/system/files/other-files/2022-07/arctic-glaciers-fig-1_0.csv"

df_arcticg = pd.read_csv(url, skiprows = 6)

df_arcticg
Year Meighen White Austre Midtre Devon Melville Storglaciaeren Engabreen Average
0 1945 NaN NaN NaN NaN NaN NaN 11.14 NaN 9.25
1 1946 NaN NaN NaN NaN NaN NaN 9.26 NaN 7.37
2 1947 NaN NaN NaN NaN NaN NaN 8.97 NaN 7.08
3 1948 NaN NaN NaN NaN NaN NaN 9.56 NaN 7.67
4 1949 NaN NaN NaN NaN NaN NaN 8.81 NaN 6.92
... ... ... ... ... ... ... ... ... ... ...
71 2016 -9.12 -10.35 -23.61 -18.69 -7.33 -16.31 -5.57 2.10 -11.19
72 2017 -8.92 -10.34 -24.49 -19.46 -7.28 -16.09 -7.17 0.47 -11.74
73 2018 -9.75 NaN -25.20 -20.02 -7.80 NaN -7.48 1.26 -12.10
74 2019 -10.55 NaN -26.94 -21.61 -8.30 NaN -7.62 2.43 -12.70
75 2020 NaN NaN NaN NaN NaN NaN NaN NaN NaN

76 rows × 10 columns

We can see that from 1945 to 1957 they didn’t record anything for the first 6 glaciers. Even after 1957 they didn’t record anything for two of the glaciers. So, let’s just graph the average first.

Let us plot the average:

fig = px.line(df_arcticg, x="Year", y="Average")


fig.update_layout(
    title="Cumulative Mass Balance of Eight Arctic Glaciers, 1945–2019",
    title_x = 0.5
)

fig.show()
[Figure: Cumulative Mass Balance of Eight Arctic Glaciers, 1945–2019; x-axis: Year, y-axis: Average cumulative mass balance.]

We can definitely see that the average mass balance is going down over time and is in negative territory. Note that negative values indicate a net loss of ice and snow compared with the base year of 1970.

We only plotted the average. There are many missing values (NaN) for the other glaciers in the initial years. The nice thing about NaN is that Plotly simply skips it and plots the data where it is available.

Let us list the columns in the data frame

df_arcticg.columns
Index(['Year', 'Meighen', 'White ', 'Austre', 'Midtre', 'Devon', 'Melville',
       'Storglaciaeren', 'Engabreen', 'Average'],
      dtype='object')

We can see that there are many glaciers that we can plot.
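Before plotting them all, here is a small sketch to see exactly when each glacier's record begins; first_valid_index() returns the index of the first non-NaN entry (note the trailing space in 'White ', exactly as the column appears in the data):

```{python}
glaciers = ['Meighen', 'White ', 'Austre', 'Midtre', 'Devon', 'Melville',
            'Storglaciaeren', 'Engabreen']

for col in glaciers:
    idx = df_arcticg[col].first_valid_index()   # index label of the first non-NaN entry
    print(f"{col.strip()}: record begins in {df_arcticg.loc[idx, 'Year']}")
```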

fig = px.line(df_arcticg, x="Year", y=['Meighen', 'White ', 'Austre', 'Midtre', 'Devon', 'Melville',
       'Storglaciaeren', 'Engabreen'])


fig.update_layout(
    title=" Cumulative Mass Balance of Eight Arctic Glaciers, 1945–2020",
    title_x = 0.5,
    yaxis_title="Cumulative Mass Balance (meters of water equivalent)",
    legend=dict(
        x=0.05,
        y=0.05
    )
)
fig.show()
[Figure: Cumulative Mass Balance of Eight Arctic Glaciers, 1945–2020; one line per glacier; x-axis: Year, y-axis: Cumulative Mass Balance (meters of water equivalent).]

What do we take away from the figures? EPA’s key insights for the figures mention

Since 1945, the eight Arctic glaciers in Figure 1 have experienced a decline in mass balance overall. One exception to this pattern is Engabreen, near the subpolar zone along the coast of Norway, which gained mass over the period of record. This trend is anomalous and does not reflect the significant global loss of land ice as measured and catalogued from numerous glaciers elsewhere in the Arctic and around the world in the 20th century. Engabreen is more strongly influenced by precipitation than glaciers elsewhere in the Arctic.

The loss of land-based ice in the Arctic has accelerated in recent decades. The Arctic is the largest contributing source of land ice to global sea level rise.

Let's do one more thing, since we have eight glaciers: compute the correlations among them.

subset_columns = ['Meighen', 'White ', 'Austre', 'Midtre', 'Devon', 'Melville','Storglaciaeren', 'Engabreen']

correlation_matrix = df_arcticg[subset_columns].corr()

print(correlation_matrix.round(3))
                Meighen  White   Austre  Midtre  Devon  Melville  \
Meighen           1.000   0.984   0.917   0.917  0.994     0.976   
White             0.984   1.000   0.950   0.949  0.991     0.990   
Austre            0.917   0.950   1.000   1.000  0.938     0.967   
Midtre            0.917   0.949   1.000   1.000  0.936     0.965   
Devon             0.994   0.991   0.938   0.936  1.000     0.985   
Melville          0.976   0.990   0.967   0.965  0.985     1.000   
Storglaciaeren    0.870   0.823   0.790   0.778  0.877     0.817   
Engabreen         0.374   0.248   0.120   0.115  0.365     0.192   

                Storglaciaeren  Engabreen  
Meighen                  0.870      0.374  
White                    0.823      0.248  
Austre                   0.790      0.120  
Midtre                   0.778      0.115  
Devon                    0.877      0.365  
Melville                 0.817      0.192  
Storglaciaeren           1.000      0.668  
Engabreen                0.668      1.000  

What is the correlation matrix?

  • A correlation matrix is a table showing correlation coefficients between variables. Each cell in the table shows the correlation between two variables.
  • The value ranges from -1 to 1, where:
    • 1 indicates a perfect positive correlation.
    • -1 indicates a perfect negative correlation.
    • 0 indicates no correlation.

The correlation_matrix is typically a DataFrame or a 2D array that contains correlation coefficients between different variables.

As we can see, the correlations are all positive, indicating that the glaciers' mass balances move together over time. Except for Engabreen, the correlation among the glaciers is very high, more than 0.8 on average.
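We can check that claim directly. A minimal sketch, using numpy to mask out the diagonal (which is always 1.0) before averaging:

```{python}
import numpy as np

# Correlations among the seven glaciers other than Engabreen
core = correlation_matrix.drop(index='Engabreen', columns='Engabreen')

# Average the off-diagonal entries only
off_diag = core.values[~np.eye(len(core), dtype=bool)]
print(f"Average pairwise correlation: {off_diag.mean():.3f}")
```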

We can also plot the correlation matrix

fig = px.imshow(correlation_matrix)
fig.show()
[Figure: Heatmap of the correlation matrix for the eight Arctic glaciers; color scale runs from roughly 0.2 to 1.]

The imshow function is designed to display image-like data, making it suitable for visualizing matrices. In this context, it will generate a heatmap where the color intensity represents the strength and direction of the correlations.

We can see that most of the plot is yellow, where the correlation is greater than 0.8. Engabreen is the only one with a different color, indicating that its mass loss is not strongly correlated with the other seven glaciers. As the EPA states, Engabreen is more strongly influenced by precipitation than glaciers elsewhere in the Arctic.
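If we want the coefficients printed on the heatmap itself, px.imshow accepts a text_auto argument (in recent Plotly versions) along with an explicit color scale; a quick sketch:

```{python}
fig = px.imshow(correlation_matrix,
                text_auto='.2f',                   # print each coefficient in its cell
                color_continuous_scale='RdBu_r',   # diverging color scale
                zmin=-1, zmax=1)                   # fix the color range to [-1, 1]
fig.show()
```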

9.4 Glaciers in Glacier National Park

There are a few other figures on the EPA website that take a closer look at the glaciers in Glacier National Park.

Total Glacier Surface Area in Glacier National Park, 1966–2015

We can follow the same steps as before:

  1. get the data from the EPA website
  2. make sure that we read the data correctly
  3. plot the figure

Why don’t you try it as an exercise?

Can you modify the following code?

```{python}
url = "https://www.epa.gov/sites/default/files/2021-03/glacier-park_fig-1.csv"

df_totalSA = pd.read_csv(url, skiprows = 6, encoding = "ISO-8859-1")

df_totalSA
```

How can you add titles and legends to the code below? See our code before.

```{python}
fig = px.bar(df_totalSA, x="Year", y='Total glacier area')

fig.show()
```

What does this figure convey?

The total surface area of the 37 named glaciers in Glacier National Park decreased by about 34 percent between 1966 and 2015.

Change in Glacier Surface Area in Glacier National Park, 1966–2015

We don’t have all the data to plot this figure. But the EPA notes that

Every glacier’s surface area was smaller in 2015 than it was in 1966. Three glaciers temporarily gained some area during part of the overall time period.

The overall trend of shrinking surface area of glaciers in Glacier National Park is consistent with the retreat of glaciers observed in the United States and worldwide.

9.5 Great Lakes

From the EPA website,

The Great Lakes—Lake Superior, Lake Michigan, Lake Huron, Lake Erie, and Lake Ontario—form the largest group of freshwater lakes on Earth. These lakes support a variety of ecosystems and play a vital role in the economy of the eight neighboring states and the Canadian province of Ontario, providing drinking water, shipping lanes, fisheries, recreational opportunities, and more.

Great Lakes Ice Cover

These graphs show the percentage of surface area in each of the Great Lakes that froze on the maximum day (the “most frozen” day) of each year since 1973. The graph at the lower right is an area-weighted average of all five lakes.

import pandas as pd
url = "https://www.epa.gov/sites/default/files/2021-04/great-lakes-ice_fig-1.csv"

df_lakes = pd.read_csv(url, skiprows = 6)

df_lakes
Year Erie Huron Michigan Ontario Superior All Lakes
0 1973 95.7 66.70 33.0 62.6 69.8 60.6
1 1974 88.5 61.40 39.4 24.5 73.8 58.7
2 1975 80.1 55.10 28.1 14.9 64.9 50.2
3 1976 95.4 52.50 29.5 15.2 49.9 39.9
4 1977 99.8 95.00 93.1 46.7 96.0 89.9
5 1978 100.0 95.80 66.6 57.7 92.5 83.0
6 1979 100.0 96.40 92.3 86.2 97.1 94.7
7 1980 93.4 70.30 38.6 30.6 77.2 62.8
8 1981 96.0 93.80 53.8 47.6 84.7 71.3
9 1982 99.1 93.30 60.2 50.7 85.3 74.8
10 1983 40.8 31.80 23.6 12.5 20.2 22.1
11 1984 95.7 77.60 43.3 40.7 88.2 65.2
12 1985 96.0 87.40 41.3 38.2 81.9 69.8
13 1986 95.5 73.50 66.8 43.7 90.7 76.4
14 1987 88.0 43.50 20.0 8.4 14.5 27.3
15 1988 91.5 60.10 32.7 21.1 67.3 53.6
16 1989 91.6 76.30 30.9 15.5 82.3 59.2
17 1990 72.8 73.80 32.4 29.5 78.1 59.7
18 1991 35.1 43.80 21.5 11.6 89.6 44.3
19 1992 89.8 69.90 32.8 17.5 73.3 45.5
20 1993 94.3 78.30 32.2 29.0 77.2 64.4
21 1994 96.7 96.90 82.7 55.7 96.1 89.0
22 1995 94.0 41.70 21.6 18.8 28.8 32.0
23 1996 100.0 98.20 75.0 45.1 100.0 81.7
24 1997 99.6 65.30 37.8 25.6 89.0 58.4
25 1998 5.4 28.60 15.1 6.2 11.1 14.0
26 1999 74.8 36.00 23.0 17.9 18.3 26.6
27 2000 90.7 41.90 27.2 22.3 33.5 34.8
28 2001 94.0 45.70 29.5 17.9 46.7 34.4
29 2002 14.4 26.10 12.4 4.0 10.3 11.8
30 2003 95.7 96.20 48.0 49.6 95.5 80.2
31 2004 95.4 64.50 36.4 38.5 52.2 51.9
32 2005 93.0 58.90 29.4 37.8 54.5 44.2
33 2006 21.9 32.17 16.2 14.3 16.7 20.4
34 2007 95.8 71.39 37.2 23.8 51.8 55.2
35 2008 93.4 59.51 33.5 14.0 62.7 53.7
36 2009 95.6 85.11 52.3 25.9 93.6 73.9
37 2010 93.1 36.95 23.5 13.1 27.3 27.5
38 2011 95.9 63.79 29.4 31.1 33.6 40.1
39 2012 13.9 23.06 16.7 1.9 8.2 12.9
40 2013 83.7 47.75 24.4 15.6 38.6 38.4
41 2014 96.1 96.10 93.1 60.8 95.8 92.3
42 2015 98.1 96.40 72.9 82.4 95.7 88.8
43 2016 78.7 47.98 26.8 23.5 22.7 33.8
44 2017 35.5 35.35 18.2 5.8 18.7 19.4
45 2018 95.1 81.39 51.3 24.7 77.2 69.0
46 2019 94.3 95.70 55.8 39.8 94.9 80.9
47 2020 15.9 32.10 17.2 10.7 22.6 19.5

We can see that there are columns for 'Erie', 'Huron', 'Michigan', 'Ontario', 'Superior', and 'All Lakes', along with 'Year'.

We can plot

  • lake by lake
  • all the lakes in one plot (sketched right after this list)
  • lake by lake subplots and facet them into one graph
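As a quick sketch of the second option, passing a list of columns to y overlays all the lakes in one figure:

```{python}
fig = px.line(df_lakes, x='Year',
              y=['Erie', 'Huron', 'Michigan', 'Ontario', 'Superior', 'All Lakes'])

fig.update_layout(title="Maximum Ice-Covered Area for the Great Lakes, 1973–2020",
                  yaxis_title="Percent Ice Cover",
                  title_x=0.5)
fig.show()
```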

Let's do one lake first.

import plotly.express as px
fig = px.line(df_lakes, x='Year', y='Erie')
fig.update_layout(title = "Maximum Ice-Covered Area for Lake Erie, 1973–2020",
    yaxis_title = "Percent Ice Cover",
    title_x = 0.5)

fig.show()
[Figure: Maximum Ice-Covered Area for Lake Erie, 1973–2020; x-axis: Year, y-axis: Percent Ice Cover.]

We haven’t used functions in a while. So, let us get some more practice with functions. We can see that all we need to change is the lake name or the column for the y-axis and the title.

import plotly.express as px

def plot_lake_ice_cover(lake_name):
    fig = px.line(df_lakes, x='Year', y=lake_name)
    fig.update_layout(
        title=f"Maximum Ice-Covered Area for Lake {lake_name}, 1973–2020",
        yaxis_title="Percent Ice Cover",
        title_x=0.5
    )
    fig.show()

What did we do?

  1. The function, plot_lake_ice_cover(), takes a single parameter, lake_name, which represents the name of the lake for which the ice cover data will be visualized.
  2. Within the function, a line plot is created using the Plotly px.line method. This method takes a DataFrame, df_lakes, and plots the data with ‘Year’ on the x-axis and the specified lake_name on the y-axis.
  3. After creating the initial plot, the function customizes the layout of the figure using the update_layout method.
  4. The title of the plot is set to “Maximum Ice-Covered Area for Lake {lake_name}, 1973–2020”, dynamically incorporating the lake_name parameter.
  5. The y-axis is labeled “Percent Ice Cover” to indicate the measurement being plotted.
  6. Additionally, the title is centered horizontally by setting title_x to 0.5.
  7. Finally, the function calls fig.show() to display the plot.

Let us try calling it with a different lake name:

plot_lake_ice_cover('Huron')
[Figure: Maximum Ice-Covered Area for Lake Huron, 1973–2020; x-axis: Year, y-axis: Percent Ice Cover.]

We can see that the relevant lake's data (Huron in this case) is used, along with dynamically including the lake name in the title!

How about Lake Michigan? Again, all we need to do is call the function with a different lake name.

plot_lake_ice_cover('Michigan')
[Figure: Maximum Ice-Covered Area for Lake Michigan, 1973–2020; x-axis: Year, y-axis: Percent Ice Cover.]

It beats writing out repetitive blocks of code like this:

```{python}
fig = px.line(df_lakes, x='Year', y='Huron')

fig.show()

fig = px.line(df_lakes, x='Year', y='Michigan')

fig.show()

fig = px.line(df_lakes, x='Year', y='Ontario')

fig.show()

fig = px.line(df_lakes, x='Year', y='Superior')

fig.show()

fig = px.line(df_lakes, x='Year', y='All Lakes')

fig.show()
```

And that is without even including titles and other formatting!

Let's replicate the EPA figure exactly by faceting all these plots into a single figure.

We already did this for the Seasonal Temperature graphs, and we can adapt the same approach here.

import plotly.graph_objects as go
from plotly.subplots import make_subplots

fig = make_subplots(rows=3, cols=2, subplot_titles=('Erie', 'Huron', 'Michigan', 'Ontario', 'Superior', 'All Lakes'),
        x_title= 'Year',
        y_title = 'Percent Ice Cover')

fig.update_layout(title = "Maximum Ice-Covered Area for Great Lakes, 1973–2020", 
    title_x = 0.5)

fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Erie'], name = 'Erie'), row=1, col=1)
[Figure: Maximum Ice-Covered Area for Great Lakes, 1973–2020; a 3x2 grid of panels (Erie, Huron, Michigan, Ontario, Superior, All Lakes), with only the Erie panel populated so far.]

What do we see? We set up space for six figures (two columns by three rows), and we have only added Lake Erie so far.

Let us add the rest

fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Huron'], name = 'Huron'), row=1, col=2)
fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Michigan'], name = 'Michigan'), row=2, col=1)
fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Ontario'], name = 'Ontario'), row=2, col=2)
fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Superior'], name = 'Superior'), row=3, col=1)
fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['All Lakes'], name = 'All Lakes'), row=3, col=2)
[Figure: Maximum Ice-Covered Area for Great Lakes, 1973–2020; all six panels now populated.]
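Adding six traces by hand is repetitive and easy to get wrong; copying a line and forgetting to change the column name is exactly how bugs creep in. A loop over the lake names is less error-prone. Here is a sketch that builds the same figure:

```{python}
lakes = ['Erie', 'Huron', 'Michigan', 'Ontario', 'Superior', 'All Lakes']

fig = make_subplots(rows=3, cols=2, subplot_titles=lakes,
                    x_title='Year', y_title='Percent Ice Cover')

for i, lake in enumerate(lakes):
    row, col = i // 2 + 1, i % 2 + 1   # fill the 3x2 grid left to right, top to bottom
    fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes[lake], name=lake),
                  row=row, col=col)

fig.update_layout(title="Maximum Ice-Covered Area for Great Lakes, 1973–2020", title_x=0.5)
fig.show()
```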

What are the key takeaways based on the graphs? Per EPA,

Since 1973, the maximum area covered in ice on the Great Lakes has varied considerably from year to year. Adding the five Great Lakes together, maximum frozen area has ranged from less than 20 percent in some years to more than 90 percent in others (see Figure 1). All five lakes have experienced some degree of long-term decrease, but the decrease is only statistically significant in one lake (Superior). Years with much-lower-than-normal ice cover appear to have become more frequent during the past two decades, especially in lakes like Erie and Superior that have a history of freezing almost completely.

9.5.0.1 Duration of Ice Cover

Let us look at the duration of ice cover, using data from https://www.epa.gov/climate-indicators/climate-change-indicators-great-lakes-ice-cover

Duration of Ice Cover

These graphs show the number of frozen days per year in each of the Great Lakes since 1973. A frozen day is defined here as a day when at least 5 percent of the lake’s surface area was covered with ice.

We need to follow the same recipe as before:

  1. download data
  2. make subplots
import pandas as pd
url = "https://www.epa.gov/sites/default/files/2021-04/great-lakes-ice_fig-2.csv"

df_lakes = pd.read_csv(url, skiprows = 6)

df_lakes
Year Erie Huron Michigan Ontario Superior
0 1973 67 108 85 67 97
1 1974 76 120 106 76 119
2 1975 65 105 97 32 95
3 1976 91 119 108 74 108
4 1977 117 136 124 118 133
5 1978 124 130 113 106 123
6 1979 96 120 117 71 137
7 1980 98 106 90 60 91
8 1981 108 113 104 67 110
9 1982 105 129 105 86 121
10 1983 47 85 64 34 86
11 1984 123 128 113 94 128
12 1985 101 116 94 85 107
13 1986 118 125 114 89 115
14 1987 56 82 72 19 48
15 1988 102 105 89 81 93
16 1989 77 124 107 43 120
17 1990 96 126 108 95 117
18 1991 64 102 81 56 93
19 1992 64 121 81 68 127
20 1993 83 121 89 62 99
21 1994 113 123 107 99 125
22 1995 80 93 61 39 79
23 1996 132 146 123 80 134
24 1997 85 128 113 75 122
25 1998 1 86 56 10 34
26 1999 80 103 80 49 54
27 2000 69 88 65 43 50
28 2001 123 130 111 80 105
29 2002 20 102 59 0 33
30 2003 105 119 102 81 103
31 2004 91 106 89 68 86
32 2005 103 123 104 70 86
33 2006 55 120 68 18 31
34 2007 70 88 69 51 67
35 2008 88 122 92 45 69
36 2009 106 132 106 65 109
37 2010 86 86 78 23 45
38 2011 113 124 102 54 84
39 2012 12 75 52 0 13
40 2013 56 105 84 25 75
41 2014 134 153 145 98 153
42 2015 104 126 109 90 126
43 2016 40 72 62 17 43
44 2017 42 101 80 4 51
45 2018 104 143 99 61 124
46 2019 87 120 93 73 97
47 2020 6 88 72 7 54

Note that we reused the name df_lakes for this new data frame, overwriting the previous one. The data is not the same as before: we still see the five lakes, but there is no column for 'All Lakes'.

We can use the same code as before, removing 'All Lakes':

import plotly.graph_objects as go
from plotly.subplots import make_subplots

fig = make_subplots(rows=3, cols=2, subplot_titles=('Erie', 'Huron', 'Michigan', 'Ontario', 'Superior'),
        x_title= 'Year',
        y_title = 'Duration of Ice Cover')

fig.update_layout(
    title="Duration of Ice Cover in the Great Lakes, 1973–2020", 
    title_x=0.5,
    legend=dict(x=0.7, y=-0.1)
)

fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Erie'], name = 'Erie'), row=1, col=1)
fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Huron'], name = 'Huron'), row=1, col=2)
fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Michigan'], name = 'Michigan'), row=2, col=1)
fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Ontario'], name = 'Ontario'), row=2, col=2)
fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Superior'], name = 'Superior'), row=3, col=1)
[Figure: Duration of Ice Cover in the Great Lakes, 1973–2020; five panels (Erie, Huron, Michigan, Ontario, Superior); x-axis: Year, y-axis: Duration of Ice Cover (days).]

The only minor modifications that we made are:

  1. removed 'All Lakes' from the subplot_titles
  2. changed the y_title to 'Duration of Ice Cover'
  3. updated the title in fig.update_layout
  4. moved the legend toward the empty slot in the grid, since we only have five figures

What do we learn from these figures? EPA's key insights mention

Since 1973, the number of frozen days (duration) on all five Great Lakes has declined. Ice duration on these lakes has decreased at rates ranging from approximately one-fifth of a day per year in Lake Huron to almost a full day per year in Lakes Ontario and Superior (see Figure 2). Overall, this means the Great Lakes are frozen for eight to 46 fewer days now than they were in the early 1970s. The decreases in Lakes Ontario and Superior are statistically significant.

There is also a detailed map below (one that we don't yet know how to recreate) showing that many areas of the Great Lakes have experienced significant decreases in ice cover duration, while other parts of the lakes have not changed significantly. Duration has decreased the most in areas near the shore.

Change in Duration of Ice Cover

9.6 Summary

9.6.1 Some Python concepts that we learned in this Chapter

Let us collect the Python programming concepts covered in this chapter:

  1. Concatenating: creating a new string or object that merges two things together. In simple terms, we can create a new string variable by combining two strings, usually with the + symbol. For example, "Climate " + "Change" gives us "Climate Change".
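The exercises below also ask about pd.concat and axis = 0, so here is a tiny illustration of both kinds of concatenation (the DataFrames use made-up numbers):

```{python}
import pandas as pd

# String concatenation with +
phrase = "Climate " + "Change"
print(phrase)                                             # Climate Change

# Row-wise concatenation of DataFrames with pd.concat and axis=0
df_a = pd.DataFrame({'Year': [2019], 'Extent': [4.2]})    # made-up numbers
df_b = pd.DataFrame({'Year': [2020], 'Extent': [3.9]})    # made-up numbers
print(pd.concat([df_a, df_b], axis=0, ignore_index=True))
```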

9.6.2 Some Climate Insights that we learned in this Chapter

  1. Both the extent and thickness of Arctic sea ice have declined rapidly over the last several decades.
  2. The four U.S. reference glaciers have shown an overall decline in mass balance since the 1950s and 1960s, with an accelerated rate of decline in recent years.
  3. Both the duration of ice cover and the number of frozen days on the Great Lakes are declining.

9.7 Exercises / Check your knowledge of Python

In the code that we used in this chapter, there are some learning opportunities to think through. For example,

  1. why is the index at 13?
  2. what does for loop do?
  3. what is the value of {month}
  4. what is concatenate doing?
  5. why do we need axis = 0

9.8 Data Sources

Some of the data sources that we have used are:

  1. https://nsidc.org/data/g02135/versions/3
  2. https://nsidc.org/data/explore-data
  3. https://www.epa.gov/climate-indicators/climate-change-indicators-arctic-sea-ice
  4. https://www.epa.gov/climate-indicators/climate-change-indicators-antarctic-sea-ice
  5. https://climate.rutgers.edu/snowcover/