Understanding Climate Change Using Data Science


9  Arctic and Antarctic Ice

_DALL·E 3 prompt: a scene illustrating the devastating effects of climate change on the Arctic landscape, with melting ice and a stranded polar bear._

In this chapter, let us try to understand how ice over the Arctic Circle is changing over time.

But first, why is Arctic Ice important and why should we care about it?

According to the EPA,

“The extent of area covered by Arctic sea ice is an important indicator of changes in global climate because warmer air and water temperatures are reducing the amount of sea ice present. Because sea ice is light-colored, it reflects more sunlight (solar energy) back to space than liquid water, thereby playing an important role in maintaining the Earth’s energy balance and helping to keep polar regions cool.”

Learning Objectives: Climate

The learning objectives are

  1. Understand how the extent of Arctic ice is changing over time

  2. Is the extent of Arctic ice increasing over the years?

  3. Is the extent of Arctic ice increasing or decreasing in the winter months?

  4. In which months does Arctic ice usually increase, and in which months does it decrease?

Climate Terms of Interest
  1. Arctic Ice
  2. Sea ice
  3. Polar regions

Let us first look at some images of the extent of ice in the Arctic and Antarctic regions.

Figure 9.1: Arctic ice in December 1978 vs December 2022

While it is hard to tell from the pictures, the sea ice extent in December 2022 (11.9 million square km) is 1.8 million square km less than in December 1978 (13.7 million square km).

Figure 9.2: Antarctic ice in December 1978 vs December 2022

Similarly, the Antarctic sea ice extent in December 2022 (8.8 million square km) is 1.6 million square km lower than in December 1978 (10.4 million square km).

We chose December 1978 and December 2022 as they are the earliest and latest data available as of our writing this chapter.

It seems that the total reduction in the extent of sea ice from 1978 to 2022 is 3.4 million square km (1.8 million in the Arctic plus 1.6 million in the Antarctic). Is that small or large?

  • One way to put this in context: the landmass of the United States is 9.15 million square kilometers (source: World Bank). Put another way,
  • the loss of sea ice amounts to more than a third of the U.S. landmass, or three Texases and three Californias put together!
  • The loss is even bigger than India, the country with the seventh-largest land area at 2.97 million square km!

You can find more fun facts on Glaciers.

Let us dig a bit deeper into the actual data for the extent of sea ice in the Arctic and Antarctic regions.

9.0.1 Extent of Sea Ice Data

We can get the data on millions of square kilometers of ice in the Arctic circle for a given month and year.

The north directory contains all of the Arctic data and images, and the south directory contains all of the Antarctic data.

The data is from NOAA.

This is what the directories look like:

Figure 9.3: NOAA directory structure of Arctic and Antarctic monthly files

From looking at the list of files in the directory, we can observe

  1. the data is updated as of December 2023.
  2. there is a file for each month.

Fortunately, the format of all the files is the same. It is always a good idea to open a file in the browser or Excel and check how the data and variables are displayed. It is also a good idea to read the notes about the data, as the notes can contain valuable information about the variables and how they are constructed, whether there are any issues with the variables, the sources of the data, etc.
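For example, here is a minimal sketch (using the NOAA URL pattern from the code below) that reads only the first few rows of one monthly file so we can inspect the column names and layout before downloading all twelve; the ssl workaround mirrors the one we use below:

import pandas as pd
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

# Peek at the January file: nrows=5 fetches just enough rows to inspect the layout
url = "https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/north/monthly/data/N_01_extent_v3.0.csv"
preview = pd.read_csv(url, nrows=5)
print(preview.columns.tolist())
print(preview)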

Let us write the code to get all the datasets programmatically, i.e., instead of downloading the data and saving it on our own computer (where it may not remain the latest available data), we can run the code and get the latest version of the data available as of the date we run the code.

import plotly.express as px 
import pandas as pd
import os
import io
import zipfile

# we need to take care of the unverified HTTPS error: either run Install Certificates on a Mac, or do the following
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

# let us get the data for all the available years
# The ASCII data files tabulate extent and area (in millions of square kilometers) by year for a given month

# create an empty data frame
df_north = pd.DataFrame()

# Loop over months 1 to 12; range(1, 13) stops one short of the end number
for month in range(1,13): 

    # If the month is between 1 and 9, pad it with a leading zero. For example, 1 becomes '01'.
    if month < 10: 
        month = '0' + str(month) 

    url = f"https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/north/monthly/data/N_{month}_extent_v3.0.csv"
    df_north_month = pd.read_csv(url)

    # let us concatenate the monthly files

    # we are adding all the files together because there are separate files for each month.

    df_north = pd.concat([df_north, df_north_month], axis=0)

#rename variables
df_north.columns = df_north.columns.str.replace(' ', '')
df_north.rename({'mo': 'month'}, axis = 1, inplace=True)

If you get an error with the code above, you may be working on Windows or a different OS than the Mac that we were using.

You can try slight modifications to the code as listed below

import plotly.express as px 
import pandas as pd
import os
import io
import zipfile
import urllib.request ## NEW LINE 1 ##

# we need to take care of the unverified HTTPS error: either run Install Certificates on a Mac, or do the following
import ssl
# ssl._create_default_https_context = ssl._create_unverified_context  ## COMMENTED OUT 1 ##
context = ssl._create_unverified_context()  ## NEW LINE 2 ##

# let us get the data for all the available years
# The ASCII data files tabulate extent and area (in millions of square kilometers) by year for a given month

# create an empty data frame
df_north = pd.DataFrame()

# Loop over months 1 to 12; range(1, 13) stops one short of the end number
for month in range(1,13): 

    # If the month is between 1 and 9, pad it with a leading zero. For example, 1 becomes '01'.
    if month < 10: 
        month = '0' + str(month) 

    url = f"https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/north/monthly/data/N_{month}_extent_v3.0.csv"
    #df_north_month = pd.read_csv(url)   ## COMMENTED OUT 2 ##
    temp_data = urllib.request.urlopen(url, context=context) ## NEW LINE 3 ##
    df_north_month = pd.read_table(temp_data, sep=',')  ## NEW LINE 4 ##

    # let us concatenate the monthly files

    # we are adding all the files together because there are separate files for each month.

    df_north = pd.concat([df_north,df_north_month], axis=0)  

#rename variables
df_north.columns = df_north.columns.str.replace(' ', '')
df_north.rename({'mo': 'month'}, axis = 1, inplace=True)

Let us unpack the code step-by-step.

  1. First, in lines 1-5, we imported the various packages that we need, including some that we don't regularly use, such as os, io, and zipfile
  2. We may not need lines 8-9, but sometimes we get an SSL certificate error on a Mac from the website. So, we import ssl and execute the command to address it
  3. On line 15, we created an empty data frame to hold the data
  4. On line 18, we start a for loop over the months of the year, from 1 to 12
  5. By looking at the directory structure, we note that we need to add a '0' to the months which are single digits. So, lines 21 and 22 pad the months from 1-9 with a '0' so that we have two-digit months (a more idiomatic alternative is sketched right after this list)
  6. We define the base URL on line 24. Note that we included the {month} variable so that every time the for loop changes the value of the month, the URL changes, corresponding to the files in the directory
  7. On line 25, we read the CSV file into a dataframe df_north_month to hold the data for that month
  8. Note that we are still in the loop. We need to add this monthly file to the empty dataset that we created. That is, when the value of the month is 1, line 31 adds the data for January to the empty dataset so that the df_north dataset now has January data
  9. When the month takes the value 2, the February data is stacked below the January data (which is now in the df_north dataset), and so forth
  10. On lines 34 and 35, we rename the variables. On line 34, we remove the empty spaces in the column names. On line 35, we rename the variable "mo" to "month" so that it is more descriptive
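As an aside, a more idiomatic way to zero-pad the month is to format it inside the f-string, which leaves the loop variable as an integer. A minimal sketch with the same URL pattern:

# Alternative to the if-block above: {month:02d} pads single digits with a leading zero
for month in range(1, 13):
    url = f"https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/north/monthly/data/N_{month:02d}_extent_v3.0.csv"
    print(url)  # e.g., ...N_01_extent_v3.0.csv for January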

Before we start graphing and answering our questions, let us check that the data is imported and concatenated correctly. It is always good practice to check the dataset before we start analyzing or graphing it.

nrows, ncols = df_north.shape
print(f'the data set has {nrows} observations and {ncols} columns or variables')
the data set has 551 observations and 6 columns or variables

Let us check the variable names and the type of the variables (numbers, strings etc.)

df_north.dtypes
year           int64
month          int64
data-type     object
region        object
extent       float64
area         float64
dtype: object

What are the variables in this dataset? It is always a good idea to check the documentation for any dataset. The documentation for this NOAA dataset is available at Arctic Ice Documentation.

Page 16 of the user documentation describes the variables:

  1. Year: 4-digit year
  2. mo (note: we renamed it to month): 1- or 2-digit month
  3. data_type: the input data set, either Goddard (Sea Ice Concentrations from Nimbus-7 SMMR and DMSP SSM/I-SSMIS Passive Microwave Data) or NRTSI-G (Near-Real-Time DMSP SSMIS Daily Polar Gridded Sea Ice Concentrations)
  4. region: hemisphere that this data covers, either N (Northern) or S (Southern)
  5. extent: sea ice extent in millions of square km
  6. area: sea ice area in millions of square km

As we can see, we have year and month, both integers as they should be. The other relevant variables are region="N", and the extent and area of the ice in millions of square km.
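As a quick sanity check, a one-line sketch to confirm that the region column contains only "N" in this Northern Hemisphere data (if the values carry stray spaces from the CSV, a .str.strip() would tidy them):

# Every row of the north data should carry the region code "N"
print(df_north['region'].unique())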

We can describe the data to get an overview of the variables. Note that this will only give descriptive statistics for the numerical variables in the dataset.

df_north.describe()
year month extent area
count 551.000000 551.000000 551.000000 551.000000
mean 2001.292196 6.493648 -24.996261 -45.239365
std 13.270059 3.455096 602.558501 737.150626
min 1978.000000 1.000000 -9999.000000 -9999.000000
25% 1990.000000 3.500000 8.465000 6.135000
50% 2001.000000 6.000000 12.020000 9.910000
75% 2013.000000 9.000000 14.245000 12.285000
max 2024.000000 12.000000 16.340000 13.900000

The data starts in 1978 and continues into 2024 (as can be seen from the min and max of the variable year).

Notice that the minimum for extent is -9999. We can infer that -9999 marks cases where no observations were made. We need to replace it with a missing value; otherwise, it could mess up our averages and also the graphs.

import numpy as np
df_north = df_north.replace(-9999, np.nan)
df_north.describe() 
year month extent area
count 551.000000 551.000000 549.000000 548.000000
mean 2001.292196 6.493648 11.338907 9.252026
std 13.270059 3.455096 3.279964 3.267431
min 1978.000000 1.000000 3.570000 2.410000
25% 1990.000000 3.500000 8.480000 6.182500
50% 2001.000000 6.000000 12.030000 9.950000
75% 2013.000000 9.000000 14.270000 12.292500
max 2024.000000 12.000000 16.340000 13.900000

Notice that the variable area originally had two extra spaces in its name. If we were to graph the variable area, we would always have to include the two spaces. To make it easier, let us remove the spaces from all column names (we already did this right after the loop, and running it again is harmless).

df_north.columns = df_north.columns.str.replace(' ', '')

To start graphing, let us look at the months that have the most and the least extent of ice over time.

We can't use the variable area, because area is not measured consistently across the record, whereas extent is.

# We put the number 5 because we want to see the top 5 
df_north.nlargest(5, 'extent')
year month data-type region extent area
0 1979 3 Goddard N 16.34 13.21
0 1979 2 Goddard N 16.18 13.18
4 1983 3 Goddard N 16.09 12.93
8 1987 2 Goddard N 16.05 13.02
1 1980 3 Goddard N 16.04 12.99

We can immediately notice that, though the data runs from 1978 to 2024, the maximum extent of ice coverage was in 1979. The latest year that ranks in the top 5 for extent of ice is 1987.

We can also see that the months with the largest extent of ice were 3 (March) and 2 (February). We expect that there are months when ice accumulates and months when it melts within a given year; we are more concerned with how the extent of ice varies across years.

We can graph the extent of the ice for February and March over the years later. Now let’s check for the smallest:

# We put the number 5 because we want to see the smallest 5
df_north.nsmallest(5, 'extent')
year month data-type region extent area
33 2012 9 Goddard N 3.57 2.41
41 2020 9 Goddard N 4.00 2.83
28 2007 9 Goddard N 4.27 2.82
40 2019 9 Goddard N 4.36 3.17
44 2023 9 NRTSI-G N 4.37 2.79

The month with the smallest extent is month 9 (September) in all five of the lowest spots.

df_north.nsmallest(10, 'extent')
year month data-type region extent area
33 2012 9 Goddard N 3.57 2.41
41 2020 9 Goddard N 4.00 2.83
28 2007 9 Goddard N 4.27 2.82
40 2019 9 Goddard N 4.36 3.17
44 2023 9 NRTSI-G N 4.37 2.79
45 2024 9 NRTSI-G N 4.38 2.89
37 2016 9 Goddard N 4.53 2.91
32 2011 9 Goddard N 4.56 3.21
36 2015 9 Goddard N 4.62 3.42
29 2008 9 Goddard N 4.69 3.26

It seems that all ten of the lowest ice extents in the Arctic occurred in September, with 2012 the lowest of all. We can plot this over time as well.

Can we say that the extent of ice in the Arctic is significantly smaller than it was in the 1980s? Maybe, but not from this table alone. We need to compare apples to apples, comparing the same month over time. Alternatively, we can take averages over the years.

Since we now have the information that we need, let’s start graphing.

fig = px.scatter(df_north.loc[df_north.month==3], x='year', y='extent') 
fig.show()
fig.write_image("../../images/ArcticIce/fig1.png")
[Scatter plot: Arctic sea ice extent in March, by year]

We saved the image that we generated to a file using fig.write_image. We can display the saved image now.

[Image: fig1.png, the saved scatter plot]

Why did we use loc? We used loc because we are specifying that we only want observations where the month has a value of 3. We only want to look at the extent from month 3 over the years so that we can compare the extent of the ice better.
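As an aside, the same filter can also be written with the query method, which some readers find easier to scan; a minimal sketch:

# Two equivalent ways of keeping only the March observations
march = df_north.loc[df_north.month == 3]
march_alt = df_north.query('month == 3')
print(march.equals(march_alt))  # True: both keep the same rows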

We can see from the scatter plot that the extent of Arctic ice for the month of March is decreasing over the years. The graph starts in 1979, when the extent of Arctic ice was 16.34 million square km, and ends in 2023, when the extent is 14.44 million square km. Almost 2 million square km of ice extent has been lost over the years.

What if we plot the data for the entire year?

fig = px.line(df_north, x='year', y='extent')
fig.show()
[Line plot: extent vs. year, all months overlaid]

We can't make much sense of this graph: the line sweeps through all twelve months of every year, so we can't see where the extent is decreasing or increasing. Let's specify a month.
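(As an aside, a sketch of another option: map month to color so that each month becomes its own line. We cast month to a string so plotly assigns one discrete color per month.)

# One line per calendar month, so each month's trajectory is visible separately
df_plot = df_north.copy()
df_plot['month'] = df_plot['month'].astype(str)  # discrete colors, one trace per month
fig = px.line(df_plot.sort_values('year'), x='year', y='extent', color='month')
fig.show()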

fig = px.line(df_north.loc[df_north.month==3], x='year', y='extent', title="Extent of Arctic Ice in March")
fig.show()
[Line plot: Extent of Arctic Ice in March]

Now let's plot February.

fig = px.line(df_north.loc[df_north.month==2], x='year', y='extent', title="Extent of Arctic Ice in February")
fig.show()
[Line plot: Extent of Arctic Ice in February]

We can see that the extent of Arctic ice in February shows steeper decreases than in March. Overall, February and March are the months with the greatest extent of Arctic ice, and it has been slowly declining over the years.

Let's graph the month with the smallest extent. We only have to graph September.

fig = px.line(df_north.loc[df_north.month==9], x='year', y='extent', title="Extent of Arctic Ice in September")
fig.show()
[Line plot: Extent of Arctic Ice in September]

September is the month with the smallest extent of Arctic ice. Sadly, it too is decreasing significantly over the years.

Let us check the average for each year and plot it over the years.

df_north.groupby(['year'])['extent'].mean().head(5)
year
1978    12.660000
1979    12.350000
1980    12.348333
1981    12.146667
1982    12.467500
Name: extent, dtype: float64

We can see the averages for each year. We see a general decline, but it appears that the last year has the highest ice extent? Is this correct? What are we missing? Notice that the last year of data is incomplete: the months with the highest ice extent are February and March, while the lowest come in the second half of the year, so a partial year biases the average upward. It is better to keep only years with full data for graphing purposes.
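A quick sketch to confirm which years are incomplete: count the number of monthly observations per year and flag any year with fewer than 12.

# Full years should have 12 monthly observations; 1978 and the latest partial year will not
months_per_year = df_north.groupby('year')['month'].count()
print(months_per_year[months_per_year < 12])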

df_year = df_north.groupby('year').agg({'extent': ['mean']}).reset_index()

# we need to remove the multiindex after the group by.
# see https://stackoverflow.com/questions/44023770/pandas-getting-rid-of-the-multiindex
df_year.columns = df_year.columns.map(''.join)

We can see that the mean values are stored in a column "extentmean" that we can use.
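(As an aside, a sketch: pandas' named aggregation produces a flat column name directly, avoiding the MultiIndex cleanup step.)

# Named aggregation yields the flat column name 'extentmean' in one step
df_year_alt = df_north.groupby('year', as_index=False).agg(extentmean=('extent', 'mean'))
print(df_year_alt.head())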

fig = px.line(df_year, x='year', y='extentmean', title="Annual Mean Extent of Arctic Ice (N)")
fig.show()
[Line plot: Annual Mean Extent of Arctic Ice (N)]

From the graph, the sharp uptick at the end reminds us that we only have partial-year data for 2024. We need to remove it for the annual average graph:

fig = px.line(df_year[df_year['year']!=2024], x='year', y='extentmean', title="Annual Mean Extent of Arctic Ice (N) 1979-2023")
fig.show()
[Line plot: Annual Mean Extent of Arctic Ice (N), 1979-2023]

We can clearly see a decrease in the extent of Arctic ice over the last four decades. The annual mean for 1978 is 12.66 million square km (by the way, we may need to drop 1978 as well; we should check whether we have full data for 1978) and the average for 2022 is 10.69 million square km: a decrease of almost 2 million square km, or (10.69−12.66)/12.66, a 15.56% decrease. While this is a cause for concern and may be due to climate change, we need to be cautious in drawing conclusions as the data covers only a few decades. However, there are measurements over a longer time period, based on Antarctic ice cores, that suggest a cause for alarm.

We can also use similar code to compute the minimum and maximum over the years and plot them together.

df_year = df_north.groupby('year').agg({'extent': ['mean', 'min', 'max']}).reset_index()
df_year.columns = df_year.columns.map(''.join)
fig = px.line(df_year.loc[df_year.year!=2024], x='year', y=['extentmean', 'extentmin', 'extentmax'], title="Annual Mean, Min, and Max Extent of Arctic Ice (N)")
fig.show()
[Line plot: annual mean, min, and max extent of Arctic ice (N)]

Notice that we use .loc to filter the data as before, and that we pass three y variables. More importantly, this graph illustrates a couple of points.

  1. Note that extentmean seems much flatter than in the graph before. However, if you hover over the graph, you can see that the values for each year's average are the same as before, i.e., 12.66 million sq km for 1978 and 10.69 million sq km for 2022. How come the graphs look so different, then? If you look more carefully, you will see that the y-axis starts much lower here, around 4 million sq km as opposed to 10.5 million sq km in the previous graph, because we included the per-year minimum in the same graph. This illustrates that one needs to be careful with how graphs are presented, as one can easily shrink or exaggerate differences visually by changing the scale of the graphs.

  2. On the other hand, we need to be careful about interpreting the minimum, mean, and maximum together, as they may occur at different times of the year.
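If we want to control that visual impression ourselves, a small sketch: pin the y-axis range explicitly instead of letting plotly auto-scale it (the range values here are our own choice, not from the original):

# Replot the annual mean alone with a fixed y-axis range, so the apparent slope
# is a deliberate choice rather than an artifact of auto-scaling
fig = px.line(df_year.loc[df_year.year != 2024], x='year', y='extentmean')
fig.update_yaxes(range=[0, 17])  # e.g., start at zero for a conservative baseline
fig.show()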

9.1 South Data

So far we have analyzed the data for the North pole / Arctic Ice.

Let's analyze the South Pole, or Antarctic, ice data. The code should be very similar, though we have to make a few changes.

import plotly.express as px 
import pandas as pd
import os
import io
import zipfile

# we need to take care of the unverified HTTPS error: either run Install Certificates on a Mac, or do the following
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

# let us get the data for all the available years
# The ASCII data files tabulate extent and area (in millions of square kilometers) by year for a given month

# create an empty data frame
df_south = pd.DataFrame()

# Loop over months 1 to 12; range(1, 13) stops one short of the end number
for month in range(1,13): 

    # If the month is between 1 and 9, pad it with a leading zero. For example, 1 becomes '01'.
    if month < 10: 
        month = '0' + str(month) 

    url = f"https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/south/monthly/data/S_{month}_extent_v3.0.csv"
    df_south_month = pd.read_csv(url)

    # let us concatenate the monthly files

    # we are adding all the files together because there are separate files for each month.

    df_south = pd.concat([df_south, df_south_month], axis=0)

#rename variables
df_south.columns = df_south.columns.str.replace(' ', '')
df_south.rename({'mo': 'month'}, axis = 1, inplace=True)

Let us use Copilot to generate documentation for the code that we wrote above.

  • The provided code snippet is written in Python and is used to fetch, process, and concatenate monthly sea ice extent data for the Southern Hemisphere from the National Snow and Ice Data Center (NSIDC). Let’s break down the code step by step:

    • First, the necessary libraries are imported. plotly.express is used for data visualization, pandas for data manipulation, os and io for handling file operations, and zipfile for working with zip files.
    • Additionally, the ssl library is imported to handle HTTPS context settings, which is necessary to avoid unverified HTTPS errors when fetching data from the web.
    • The line ssl._create_default_https_context = ssl._create_unverified_context ensures that the HTTPS context is set to an unverified mode, allowing the script to bypass SSL certificate verification.
  • An empty DataFrame named df_south is created to store the concatenated data for all months.

  • The script then iterates over a range of months from 1 to 12 using a for loop. For each month, if the month number is less than 10, a leading zero is added to ensure the month is formatted as a two-digit string (e.g., ‘01’ for January).

  • Within the loop, a URL is constructed for each month’s data file using an f-string. The URL points to the CSV file containing the sea ice extent data for the specified month.

  • The pd.read_csv(url) function is used to read the CSV file from the URL into a DataFrame named df_south_month.

  • The script then concatenates the monthly DataFrame df_south_month with the main DataFrame df_south using the pd.concat function.

  • This process is repeated for each month, resulting in a single DataFrame that contains data for all months from January to December.

  • After all the monthly data has been concatenated, the script cleans up the DataFrame by removing any spaces from the column names using df_south.columns.str.replace(' ', ''). Additionally, the column named 'mo' is renamed to 'month' using the df_south.rename function with the axis=1 parameter to specify that the renaming should be applied to the columns.

Overall, this code efficiently fetches and processes monthly sea ice extent data from the NSIDC, concatenates it into a single DataFrame, and cleans up the column names for further analysis or visualization.

Wow! The documentation is correct, and it seems to be better than the documentation that we wrote for the North part of the data. We first wrote the North data analysis a couple of years ago, when we didn't have access to ChatGPT or GitHub Copilot.

In fact, while revising the book, we realized that we could write some of the code in this book more efficiently. But we didn't change it, as that is how we learned, and we wanted to showcase the process of our learning. We did start using Copilot for documentation, though, as it gives more detailed documentation than we normally write (and better language too, though a bit more verbose than ours)!

If the above code doesn’t work for you because you are using a different OS, try the following code.

import plotly.express as px 
import pandas as pd
import os
import io
import zipfile
import urllib.request ## NEW LINE 1 ##

# we need to take care of the unverified HTTPS error: either run Install Certificates on a Mac, or do the following
import ssl
# ssl._create_default_https_context = ssl._create_unverified_context  ## COMMENTED OUT 1 ##
context = ssl._create_unverified_context()  ## NEW LINE 2 ##

# let us get the data for all the available years
# The ASCII data files tabulate extent and area (in millions of square kilometers) by year for a given month

# create an empty data frame
df_south = pd.DataFrame()

# Loop over months 1 to 12; range(1, 13) stops one short of the end number
for month in range(1,13): 

    # If the month is between 1 and 9, pad it with a leading zero. For example, 1 becomes '01'.
    if month < 10: 
        month = '0' + str(month) 

    url = f"https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/south/monthly/data/S_{month}_extent_v3.0.csv"
    #df_south_month = pd.read_csv(url)   ## COMMENTED OUT 2 ##
    temp_data = urllib.request.urlopen(url, context=context) ## NEW LINE 3 ##
    df_south_month = pd.read_table(temp_data, sep=',')  ## NEW LINE 4 ##
    # let us concatenate the monthly files

    # we are adding all the files together because there are separate files for each month.

    df_south = pd.concat([df_south,df_south_month], axis=0)  

#rename variables
df_south.columns = df_south.columns.str.replace(' ', '')
df_south.rename({'mo': 'month'}, axis = 1, inplace=True)

Let us look at the data more closely:

df_south.dtypes

df_south.describe()
year month extent area
count 551.000000 551.000000 551.000000 551.00000
mean 2001.292196 6.493648 -24.825118 -27.59755
std 13.270059 3.455096 602.585746 602.40990
min 1978.000000 1.000000 -9999.000000 -9999.00000
25% 1990.000000 3.500000 5.745000 4.01500
50% 2001.000000 6.000000 11.850000 9.03000
75% 2013.000000 9.000000 16.790000 13.31000
max 2024.000000 12.000000 19.760000 15.75000

Notice that we have the same problem here that we saw in the North dataset: the minimum for extent is -9999. So, once again, let's replace it.

import numpy as np
df_south = df_south.replace(-9999, np.nan)
df_south.describe() 
year month extent area
count 551.000000 551.000000 549.000000 549.000000
mean 2001.292196 6.493648 11.510674 8.728142
std 13.270059 3.455096 5.586368 4.592586
min 1978.000000 1.000000 1.910000 1.150000
25% 1990.000000 3.500000 5.840000 4.070000
50% 2001.000000 6.000000 11.930000 9.100000
75% 2013.000000 9.000000 16.800000 13.310000
max 2024.000000 12.000000 19.760000 15.750000

As before, we replace all occurrences of the value -9999 in the DataFrame df_south with np.nan, which represents a missing value (NaN stands for “Not a Number”).

This is a common data cleaning step, as -9999 is often used as a placeholder for missing or invalid data in datasets.

By converting these placeholders to np.nan, subsequent data analysis and processing can handle these missing values appropriately.
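(As an aside, a sketch: pd.read_csv can mark the placeholder as missing at load time via its na_values parameter, which would let us skip the separate replace step. The exact strings to list depend on how -9999 is formatted in the file.)

# Treat the -9999 placeholder as missing while reading a monthly file
url = "https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/south/monthly/data/S_01_extent_v3.0.csv"
df_month = pd.read_csv(url, na_values=[-9999, '-9999', '-9999.00'])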

Learning from the North data, let us create a new variable that gives us monthly data.

df_south['yearmonth'] = df_south['year']*100 + df_south['month']

The new yearmonth column is generated by combining the values from the year and month columns. Specifically, it multiplies the year value by 100 and then adds the month value. This operation effectively concatenates the year and month into a single integer. For example, if the year is 2023 and the month is 10, the resulting yearmonth value would be 202310.

This approach is useful for creating a unique identifier for each year-month combination, which can be particularly helpful for time series analysis or when you need to group data by specific periods. By converting the year and month into a single integer, it simplifies sorting and filtering operations based on these combined time periods.
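(As an aside, a sketch: building a real datetime column gives plotly a properly spaced time axis, unlike the integer yearmonth used below. We work on a copy so df_south itself is unchanged.)

# Assemble a datetime from the year and month columns (day set to 1)
df_dates = df_south.copy()
df_dates['day'] = 1
df_dates['date'] = pd.to_datetime(df_dates[['year', 'month', 'day']])
# px.line(df_dates.sort_values('date'), x='date', y='extent') would then space points correctly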

fig = px.line(df_south, x="year", y="area")
fig.show()
[Line plot: area vs. year (South)]

This doesn’t look that good!

fig = px.bar(df_south, x="yearmonth", y="area")
fig.show()
[Bar chart: area by yearmonth (South)]

The variable area doesn't have two extra spaces here as it did in the North dataset. This again shows the importance of looking at the data and cleaning the data. In the real world, 80% of the time is spent on cleaning the data before using it :(

Now we basically run the same code as for the North dataset so that we can compare.

df_south.nlargest(5, 'extent')
year month data-type region extent area yearmonth
35 2014 9 Goddard S 19.76 15.75 201409
34 2013 9 Goddard S 19.39 15.28 201309
33 2012 9 Goddard S 19.21 15.41 201209
27 2006 9 Goddard S 19.09 15.18 200609
34 2013 10 Goddard S 19.02 15.13 201310
df_south.nsmallest(5, 'extent')
year month data-type region extent area yearmonth
44 2023 2 NRTSI-G S 1.91 1.15 202302
45 2024 2 NRTSI-G S 2.14 1.51 202402
43 2022 2 Goddard S 2.21 1.38 202202
38 2017 2 Goddard S 2.29 1.64 201702
39 2018 2 Goddard S 2.33 1.63 201802

Oh, this is interesting. The month with the smallest extent in the North dataset is September, while September is the month with the largest extent in the South dataset. Conversely, February and March have the largest extent in the North, and February has the smallest extent in the South. This makes sense: the seasons in the two hemispheres are reversed.

fig = px.line(df_south.loc[df_south.month==9], x='year', y='extent', title="Extent of Antarctic Ice in September (South)")
fig.show()

fig = px.line(df_south.loc[df_south.month==2], x='year', y='extent', title="Extent of Antarctic Ice in February (South)")
fig.show()
[Line plot: Extent of Antarctic Ice in September (South)]
[Line plot: Extent of Antarctic Ice in February (South)]

In the first graph, we can see that the extent of Antarctic ice has recently been decreasing by a significant amount. In the second graph, there are numerous sharp increases and decreases, and as of now it is still decreasing.

9.1.1 Annual Data Analysis

We can use the annual data from NOAA, as the National Snow and Ice Data Center computes it more accurately. Specifically, the documentation mentions that "…the Excel workbook contains four spreadsheets with monthly average sea ice extent and area, plus weighted average monthly extent and area for each year, from 1979 to the most recent month, for both the Northern Hemisphere (NH) and the Southern Hemisphere (SH), in millions of square kilometers…"

For more details see

  • Fetterer, F., K. Knowles, W. N. Meier, M. Savoie, and A. K. Windnagel. (2017). Sea Ice Index, Version 3 [Data Set]. Boulder, Colorado USA. National Snow and Ice Data Center. https://doi.org/10.7265/N5K072F8. Date Accessed 12-25-2023.

  • the Sea Ice Index product landing page (https://nsidc.org/data/g02135)

This will also give us some experience working with reading Excel sheets into a pandas DataFrame.

import pandas as pd

# we need to take care of the unverified HTTPS error: either run Install Certificates on a Mac, or do the following
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

seaice_year = pd.ExcelFile("https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/seaice_analysis/Sea_Ice_Index_Monthly_Data_by_Year_G02135_v3.0.xlsx")

If the above code doesn’t work due to differences in OS, try instead:

import pandas as pd
import urllib.request ## NEW LINE 1 ##

# we need to take care of the unverified HTTPS error: either run Install Certificates on a Mac, or do the following
import ssl
# ssl._create_default_https_context = ssl._create_unverified_context  ## COMMENTED OUT 1 ##
context = ssl._create_unverified_context()  ## NEW LINE 2 ##

# seaice_year = pd.ExcelFile("https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/seaice_analysis/Sea_Ice_Index_Monthly_Data_by_Year_G02135_v3.0.xlsx") ## COMMENTED OUT 2 ##
temp_data = urllib.request.urlopen("https://masie_web.apps.nsidc.org/pub/DATASETS/NOAA/G02135/seaice_analysis/Sea_Ice_Index_Monthly_Data_by_Year_G02135_v3.0.xlsx"
                        , context=context).read() ## NEW LINE 3 ##
seaice_year = pd.ExcelFile(temp_data) ## NEW LINE 4##

It is always a good idea to check the sheets in the Excel file.

print(seaice_year.sheet_names)
['NH-Extent', 'NH-Area', 'SH-Extent', 'SH-Area', 'Documentation']

It looks like there are five sheets within this Excel spreadsheet. From the names, we can tell that they hold northern hemisphere sea ice extent, northern hemisphere sea ice area, southern hemisphere sea ice extent, and southern hemisphere sea ice area, along with a documentation sheet.

If we open the sheets, we can see that the first column is year, then there is a separate column for each month from January to December, and then an annual column.

We can list the columns directly

seaice_nh_df = pd.read_excel(seaice_year, sheet_name='NH-Extent')
print(seaice_nh_df.columns.tolist())
['Unnamed: 0', 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'Unnamed: 13', 'Annual']

As we can see from the spreadsheet, the 'Unnamed: 0' column is the year. Our interest is in the year and Annual columns.

seaice_nh_df = seaice_nh_df.iloc[:,[0, 14]]  # keep the year column (0) and the Annual column (14)
seaice_nh_df.rename(columns={'Unnamed: 0' : 'year', 'Annual': 'Annual Area'}, inplace=True)  # note: the sheet holds extent; we keep the loose label 'Annual Area'

We can do the same for the Southern Hemisphere.

seaice_sh_df = pd.read_excel(seaice_year, sheet_name='SH-Extent')
seaice_sh_df = seaice_sh_df.iloc[:,[0, 14]]
seaice_sh_df.rename(columns={'Unnamed: 0' : 'year', 'Annual': 'Annual Area'}, inplace=True)

We can follow the same sea ice extent analysis as before

import plotly.express as px
fig = px.line(seaice_sh_df, x= 'year', y= 'Annual Area', title="Extent of Antarctic Ice: Annual")
fig.show()
[Line plot: Extent of Antarctic Ice: Annual]
import plotly.express as px
fig = px.line(seaice_nh_df, x= 'year', y= 'Annual Area', title="Extent of Arctic Ice: Annual")
fig.show()
[Line plot: Extent of Arctic Ice: Annual]

By the way, do you see any sudden increases or drops towards the end? We always need to be careful to see if the data is available for the entire year. Otherwise, as some months can have a lower or higher value, the graphs can be misleading!

Let us drop 2024 as we don’t have full data for the entire year.

import plotly.express as px
fig = px.line(seaice_sh_df.loc[seaice_sh_df.year!=2024], x= 'year', y= 'Annual Area', title="Extent of Antarctic Ice: Annual")
fig.show()
[Line plot: Extent of Antarctic Ice: Annual, excluding 2024]

The pattern remains the same as before for the Antarctic Ice.

How about Arctic Ice?

import plotly.express as px
fig = px.line(seaice_nh_df.loc[seaice_nh_df.year!=2024], x= 'year', y= 'Annual Area', title="Extent of Arctic Ice: Annual")
fig.show()
[Line plot: Extent of Arctic Ice: Annual, excluding 2024]

We see that the upward tick at the end is removed once we consider only full years of data (that is, we stop at 2023).

As before, with the refined annual sea ice extent, we still see a significant decrease in the extent of sea ice, both in the southern and the northern hemisphere.

9.2 Ice and Snow Indicators from EPA

Recently, the EPA has started putting together a number of climate indicators, including some on snow and ice.

Let us make use of the data underlying the EPA charts.

9.2.1 Arctic Ice

Arctic Sea Ice Melt Season

We can create the graph ourselves using the data from the EPA website.

```{python}
import pandas as pd

url = "https://www.epa.gov/system/files/other-files/2023-11/arctic-sea-ice_fig-3.csv"

df = pd.read_csv(url)
```

Unfortunately, we get an HTTP Error 403: Forbidden. It seems that the server doesn't want to accept our request.

We have two options: we can download the data to a local CSV file and then read it, or we can find a programmatic solution. Since we would like to update the data whenever it gets updated on EPA, we prefer the programmatic route.

We turn to our best friends Google and Stack Overflow. Luckily, someone has addressed this issue before. See Stack Overflow.

We can adapt their code.

from io import StringIO
import pandas as pd

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0",
}
url = "https://www.epa.gov/system/files/other-files/2023-11/arctic-sea-ice_fig-3.csv"

f = StringIO(requests.get(url, headers=headers).text)

df_arcticice = pd.read_csv(f,  skiprows=6)
df_arcticice.head()
Year Start date Duration End date
0 1979 169.275 88.412 257.687
1 1980 166.438 91.913 258.351
2 1981 164.909 91.975 256.884
3 1982 168.775 87.800 256.575
4 1983 169.446 87.883 257.329

It seems we need to:

  1. use requests (with a browser User-Agent header) to get the CSV file
  2. use StringIO to wrap the response text so that pandas can read it
  3. skip the first 6 rows, as they contain text notes about the columns

fig = px.bar(df_arcticice, x = 'Year', y = 'Duration', color = 'Year')
fig.show()

fig = px.line(df_arcticice, x ='Year', y = 'Duration', 
    title = 'Arctic Sea Ice Duration',
)
fig.update_layout(title={'x':0.75})
fig.show()
[Bar chart: melt season duration by year, colored by year]
[Line plot: Arctic Sea Ice Duration]

By the way, we positioned the title using fig.update_layout(title={'x': 0.75}); a value of 0.5 would center it. Try a few values to see the effect.

Let us look at the largest values of the duration of Arctic ice melting.

df_arcticice.nlargest(5, 'Duration')
Year Start date Duration End date
28 2007 157.275 131.019 288.294
37 2016 159.160 128.902 288.063
33 2012 157.354 128.453 285.807
40 2019 158.310 127.937 286.247
41 2020 162.057 126.871 288.928

Duration measures how long the Arctic ice melt season lasts each year. As we can see from the output above, the longest melt season was in 2007. It has come down somewhat in the years since, but not by a great amount. We can also see this in the graph: through the 1980s and 1990s the duration was fairly stable, oscillating up and down, but around the 2000s it started climbing.

Using this graph and what we know, we can make a connection to the other dataset: as the melt season duration increases over the years, the extent or area of the Arctic ice decreases.

9.2.2 Antarctic Ice

EPA shows a nice figure on Antarctic ice, similar to what we have shown from the NSIDC data before.

EPA Antarctic Ice

Let us chart it ourselves using the CSV data provided by EPA. It is not that difficult, as it is just reading the data and plotting it.

from io import StringIO
import pandas as pd

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0",
}
url = "https://www.epa.gov/system/files/other-files/2022-07/antarctic-sea-ice-fig-1.csv"

f = StringIO(requests.get(url, headers=headers).text)

df_antice = pd.read_csv(f,  skiprows=6)
df_antice.head()
Year February September
0 1979 1.212361 7.027059
1 1980 1.088808 7.266443
2 1981 1.108113 7.181500
3 1982 1.208500 7.084975
4 1983 1.185334 7.177639
fig = px.line(df_antice, x='Year', y=['February', 'September'], title = "February and September Monthly Average Antarctic Sea Ice Extent, 1979–2021").update_layout(yaxis_title="Sea Ice Extent (mn sq miles)", legend={'title_text':''})

fig.show()
[Line plot: February and September Monthly Average Antarctic Sea Ice Extent, 1979–2021]

We don't seem to see much change in the sea ice extent in the peak month (September) or the lowest month (February). One thing to note: in our graphs using the data directly from NSIDC, we plotted the data in millions of square km, while the EPA data are in millions of square miles.
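(A sketch, in case we want the EPA series on the same footing as our NSIDC plots: convert from millions of square miles to millions of square kilometers; one square mile is about 2.59 square kilometers.)

# Convert the EPA columns from millions of sq miles to millions of sq km
SQ_MILES_TO_SQ_KM = 2.59
df_antice_km = df_antice.copy()
df_antice_km[['February', 'September']] = df_antice_km[['February', 'September']] * SQ_MILES_TO_SQ_KM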

The annual averages and the length of the melting season inform us more than the data for any single month.

We can see that in recent years, both the Arctic and Antarctic sea ice extent have been decreasing when measured over the whole year.

9.2.3 Ice Sheets

As noted by NSIDC,

  • Earth has just two ice sheets, one covers most of Greenland, the largest island in the world, and the other spans across the Antarctic continent.

  • Together, the Antarctic and Greenland Ice Sheets contain more than 99 percent of the freshwater ice on Earth. The amount of fresh water stored in both the Antarctic and Greenland Ice Sheets totals over 68 percent of the fresh water on Earth.

  • The Antarctic Ice Sheet extends almost 14 million square kilometers (5.4 million square miles), roughly the area of the contiguous United States and Mexico combined.

  • The Greenland Ice Sheet extends about 1.7 million square kilometers (656,000 square miles), covering 80 percent of the world’s largest island, and is equivalent to about three times the size of Texas.

  • Both ice sheets measure on average about 3.2 kilometers (2 miles) thick.

We next move on to another indicator, Ice Sheets.

Ice Sheets

We are not going to walk through it in detail, as it repeats code we have used before. To reproduce it, we need to:

  1. read the CSV data, making sure that we don't get a 403 error
  2. plot the data and adjust the legends and labels as necessary

from io import StringIO
import pandas as pd

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0",
}
url = "https://www.epa.gov/sites/default/files/2021-04/ice_sheets_fig-1.csv"

f = StringIO(requests.get(url, headers=headers).text)

df_icesheets = pd.read_csv(f, skiprows = 6)

df_icesheets
Year NASA - Antarctica land ice mass NASA - Greenland land ice mass IMBIE - Antarctica cumulative ice mass change IMBIE - Antarctica cumulative ice mass change uncertainty IMBIE - Greenland cumulative ice mass change IMBIE - Greenland cumulative ice mass change uncertainty
0 1992.000000 NaN NaN 418.310350 7.221207 395.365526 16.281278
1 1992.083333 NaN NaN 425.377021 50.029991 401.965526 23.025204
2 1992.166667 NaN NaN 433.552016 53.064819 408.565526 28.200000
3 1992.250000 NaN NaN 424.496837 59.023007 415.165526 32.562555
4 1992.333333 NaN NaN 423.362721 65.012377 421.765526 36.406043
... ... ... ... ... ... ... ...
512 2020.710000 -2437.63 -4996.08 NaN NaN NaN NaN
513 2020.790000 -2537.64 -4928.75 NaN NaN NaN NaN
514 2020.870000 -2587.65 -4922.26 NaN NaN NaN NaN
515 2020.960000 -2746.00 -4899.46 NaN NaN NaN NaN
516 NaN NaN NaN NaN NaN NaN NaN

517 rows × 7 columns

#listcol = df_icesheets.columns.tolist()
#fig = px.line(df_icesheets, x='Year', y=listcol)

#y = ['IMBIE - Antarctica cumulative ice mass change', 'IMBIE - Antarctica cumulative ice mass change uncertainty', 'IMBIE - Greenland cumulative ice mass change', 'IMBIE - Greenland cumulative ice mass change uncertainty']

fig = px.line(df_icesheets, x = 'Year', y = ['IMBIE - Antarctica cumulative ice mass change',  'IMBIE - Greenland cumulative ice mass change'])

fig.update_layout(
    yaxis_title='Cumulative Mass Change (Billions of Tonnes)',
    title='Cumulative Mass Balance of Antarctica and Greenland, 1992-2020 ',
    hovermode="x",
    legend=dict(x=0.1,y=0.2)
)
fig.update_layout(legend={'title_text':''})
fig.show()
[Line plot: Cumulative Mass Balance of Antarctica and Greenland, 1992-2020]

Why doesn't the graph look similar to the graph that we downloaded from EPA directly?

Real-world data is never clean, and we need to understand the changes. By browsing through the file, it appears that around 2002, once the NASA data becomes available, the frequency of the data changes and there are missing values.

nrows, ncols = df_icesheets.shape
print(f'the data set has {nrows} observations and {ncols} columns or variables')
the data set has 517 observations and 7 columns or variables

What did we do? Copilot to the rescue :)

We first extracted the number of rows and columns from the DataFrame df_icesheets. The shape attribute of a DataFrame returns a tuple containing two values: the number of rows and the number of columns. By assigning df_icesheets.shape to nrows, ncols, the code unpacks this tuple into two separate variables: nrows for the number of rows and ncols for the number of columns.

Let us remove the missing values and plot again

ycols = ['IMBIE - Antarctica cumulative ice mass change',  'IMBIE - Greenland cumulative ice mass change']
df = df_icesheets.dropna(subset=ycols)
nrows, ncols = df.shape
print(f'the data set has {nrows} observations and {ncols} columns or variables')
the data set has 306 observations and 7 columns or variables

We can see that around 211 observations were dropped.

fig = px.line(df, x='Year', y=ycols, title='Cumulative Mass Balance of Antarctica and Greenland, 1992-2020')
fig.update_layout(
    yaxis_title='Cumulative Mass Change (Billions of Tonnes)',
    hovermode="x",
    legend=dict(x=0.1, y=0.2, title_text='')
)
fig.show()
[Line plot: Cumulative Mass Balance of Antarctica and Greenland, 1992-2020, missing values dropped]

By the way, we can go back to Berkeley Earth and highlight a figure that they describe as

A representation of the length of time during which sea ice levels in the Arctic Ocean were at 15% or less, allowing for clear passage through the Arctic Ocean.

Arctic Passable. Source: Berkeley Earth https://berkeleyearth.org/dv/periods-when-the-arctic-ocean-has-been-passable/

9.2.4 Glaciers

As noted by NSIDC

A glacier is an accumulation of ice and snow that slowly flows over land. At higher elevations, more snow typically falls than melts, adding to its mass. Eventually, the surplus of built-up ice begins to flow downhill. At lower elevations, there is usually a higher rate of melt or icebergs break off that removes ice mass.

There are two broad categories of glaciers: alpine glaciers and ice sheets.

Alpine glaciers are frozen rivers of ice, slowly flowing under their own weight down mountainsides and into valleys.

More information on Glaciers is available at https://wgms.ch/global-glacier-state/

Glaciologists assess the state of a glacier by measuring its annual mass balance as the combined results of snow accumulation (mass gain) and melt (mass loss) during a given year. The mass balance reflects the atmospheric conditions over a (hydrological) year and, if measured over a long period and displayed in a cumulative way, trends in mass balance are an indicator of climate change. Seasonal melt contributes to runoff and the annual balance (i.e. the net change of glacier mass) contributes to sea level change.

Glacier Mass

The data is available at http://wgms.ch/data/faq/mb_ref.csv

from io import StringIO
import pandas as pd
import requests

def url_data_request(url, skiprows):
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0",
    }
    f = StringIO(requests.get(url, headers=headers).text)
    df_url = pd.read_csv(f, skiprows = skiprows)
    return df_url

url = "http://wgms.ch/data/faq/mb_ref.csv"
df = url_data_request(url,0)
df
Year MB_REF_count REF_regionAVG REF_regionAVG_cum-rel-1970
0 1950 5 -1142 5392
1 1951 4 -345 5047
2 1952 4 -561 4486
3 1953 8 -562 3924
4 1954 7 -420 3504
... ... ... ... ...
69 2019 61 -994 -22144
70 2020 61 -883 -23027
71 2021 61 -677 -23704
72 2022 60 -1090 -24794
73 2023 53 -1229 -26023

74 rows × 4 columns

The dataset has 4 columns. Let us plot the cumulative mass change of glaciers where 1970 is set as zero.

import plotly.express as px
fig = px.line(df, x= 'Year', y = 'REF_regionAVG_cum-rel-1970', 
    labels={"REF_regionAVG_cum-rel-1970": "Cumulative Mass Change in mm w.e."},  # the WGMS series is in mm of water equivalent
    title = 'Global Cumulative Change of Glaciers'
    )
fig.show()
[Line plot: Global Cumulative Change of Glaciers]


Some useful references to get further information are:

  1. https://wgms.ch/global-glacier-state/

  2. WGMS (2021, updated, and earlier reports). Global Glacier Change Bulletin No. 4 (2018-2019). Zemp, M., Nussbaumer, S. U., Gärtner-Roer, I., Bannwart, J., Paul, F., and Hoelzle, M. (eds.), ISC(WDS)/IUGG(IACS)/UNEP/UNESCO/WMO, World Glacier Monitoring Service, Zurich, Switzerland, 278 pp., publication based on database version: doi:10.5904/wgms-fog-2021-05.

They state that

“…In the hydrological years 2020/21 and 2021/22, observed reference glaciers experienced an ice loss of 0.8 m w.e. and of 1.2 m w.e., respectively. With this, eight out of the ten most negative mass balance years were recorded after 2010….”

The observed glaciers were close to steady states during the 1960s followed by increasingly strong ice loss until present.

Glaciers across the world

We can also look at glaciers using the EPA data.

Glaciers

As EPA states, "The line on the upper graph represents the average of all the glaciers that were measured. Negative values indicate a net loss of ice and snow compared with the base year of 1956. For consistency, measurements are in meters of water equivalent, which represent changes in the average thickness of a glacier."

We can also plot the data from EPA ourselves.

from io import StringIO
import pandas as pd

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0",
}

url = "https://www.epa.gov/system/files/other-files/2024-05/glaciers_fig-1.csv"

f = StringIO(requests.get(url, headers=headers).text)

df_glaciers = pd.read_csv(f, skiprows = 6)

df_glaciers = df_glaciers.iloc[1:, :]  # drop the first row of the file (not part of the data we want to plot)

Let’s plot the data:

fig = px.line(df_glaciers, x='Year', y='Mean cumulative mass balance')

fig.update_layout(
    title = "Average Cumulative Mass Balance of 'Reference' Glaciers Worldwide, 1956–2023",
    yaxis_title="Cumulative Mass Balance (meters of water equivalent)",
    title_x = 0.5
    )

fig.show()
[Line plot: Average Cumulative Mass Balance of 'Reference' Glaciers Worldwide, 1956–2023]

Note that we wrapped 'Reference' in single quotes inside the title, since the title string itself is enclosed in double quotes.

It seems that there is a significant loss of mass, measured in water equivalents, since the base year of 1956.

Per the EPA explanation for the figure,

On average, glaciers worldwide have been losing mass since at least the 1970s (see Figure 1), which in turn has contributed to observed changes in sea level (see the Sea Level indicator). A longer measurement record from a smaller number of glaciers suggests that they have been shrinking since the 1950s. The rate at which glaciers are losing mass appears to have accelerated over roughly the last decade.

9.2.5 Why do we care about Glaciers?

OK. Glaciers are losing significant mass, as measured in water equivalents. That's a lot of water! But why do we care? Is it something that we need to worry about? Is there a link from loss of mass in glaciers to sea level rise?

Let us check out the information from https://wgms.ch/sea-level-rise/

Melting ice sheets in Greenland and the Antarctic as well as ice melt from glaciers all over the world are causing sea levels to rise. For the IPCC special report on “Ocean and Cryosphere in a Changing Climate“, and AR6 WG-I, an international research team combined glaciological field observations with geodetic satellite measurements – as available from the WGMS – to reconstruct annual mass changes of more than 19’000 glaciers worldwide. This new assessment shows that glaciers alone lost more than 9,000 billion tons of ice between 1961/62 and 2015/16, raising water levels by 27 millimetres. This global glacier mass loss corresponds to an ice cube with the area of Germany and a thickness of 30 metres. The largest contributors were glaciers in Alaska, followed by the melting ice fields in Patagonia and glaciers in the Arctic regions. Glaciers in the European Alps, the Caucasus mountain range and New Zealand were also subject to significant ice loss; however, due to their relatively small glacierized areas they played only a minor role when it comes to the rising global sea levels.

Glacier-Sea Level Rise

Let us get some more data about glaciers from the EPA.

from io import StringIO
import pandas as pd

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0",
}

url = "https://www.epa.gov/system/files/other-files/2024-05/glaciers_fig-2.csv"

f = StringIO(requests.get(url, headers=headers).text)

df = pd.read_csv(f,  skiprows=6)
df.head()
Year South Cascade Glacier Gulkana Glacier Wolverine Glacier Lemon Creek Glacier
0 1952 NaN NaN NaN 1.29
1 1953 NaN NaN NaN 0.80
2 1954 NaN NaN NaN 0.69
3 1955 NaN NaN NaN 1.88
4 1956 NaN NaN NaN 1.31

Also note that we got a bit lazy here and started using df instead of naming the dataset more appropriately, say df_glaciers_epa. But we are not reusing datasets from before, so it is easier to use the short df and overwrite the data frames. It becomes more of an issue when we need to reuse an old dataframe or merge with another data frame.

We see there are a lot of NaN values, as some of the measurements may have started later than 1952.
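pandas can confirm this quickly; isna().sum() counts the missing values in each column of the df we just loaded:

```{python}
# How many values are missing in each column?
print(df.isna().sum())
```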

fig = px.line(df, x="Year", y=['South Cascade Glacier', 'Gulkana Glacier', 'Wolverine Glacier', 'Lemon Creek Glacier'])

fig.update_layout(
    title = "Cumulative Mass Balance of Four U.S. Glaciers, 1952–2023",
    yaxis_title="Cumulative Mass Balance (meters of water equivalent)",
    title_x = 0.5,
    legend=dict(
        x=0.05,
        y=0.5
    )
)
fig.show()
[Figure: Cumulative Mass Balance of Four U.S. Glaciers, 1952–2023; one line per glacier; x-axis: Year, y-axis: Cumulative Mass Balance (meters of water equivalent).]

We have used similar commands before, but two items are worth noting:

  1. y=['South Cascade Glacier', 'Gulkana Glacier', 'Wolverine Glacier', 'Lemon Creek Glacier']. Here, instead of one column, the y parameter is a list of column names from the DataFrame. These columns contain the data points for each glacier, which are plotted on the y-axis.
  2. We used fig.update_layout to set the main title and the y-axis title. We also centered the title with title_x = 0.5. To keep the chart area larger, we moved the legend inside the chart, choosing x = 0.05 and y = 0.5 based on the empty space in the chart.

OK. Enough with the code. What is the figure telling us?

As per the EPA explanation,

This figure shows the cumulative mass balance of four U.S. reference glaciers since measurements began in the 1950s or 1960s. For each glacier, the mass balance is set at zero for the base year of 1965. Negative values indicate a net loss of ice and snow compared with the base year. For consistency, measurements are in meters of water equivalent, which represent changes in the average thickness of a glacier.

The four U.S. reference glaciers have shown an overall decline in mass balance since the 1950s and 1960s and an accelerated rate of decline in recent years. Year-to-year trends vary, with some glaciers gaining mass in certain years (for example, Wolverine Glacier during the 1980s), but the measurements clearly indicate a loss of glacier mass over time.

Trends for the four U.S. reference glaciers are consistent with the retreat of glaciers observed throughout the western United States, Alaska, and other parts of the world. Observations of glaciers losing mass are also consistent with warming trends in U.S. and global temperatures during this time period (see the U.S. and Global Temperature indicator).

The images from the EPA website also tell a clear story.

McCall Glacier 1958

McCall Glacier 2003

We can see stark differences!

9.3 Arctic Glaciers

Let us move on to Arctic Glaciers from the EPA website

Cumulative Mass Balance of Eight Arctic Glaciers, 1945–2020

As EPA states,

This figure shows the cumulative change in mass balance of a set of eight glaciers located at a latitude of 66°N or higher with data beginning in 1945. Each time series has been adjusted to make 1970 a common zero point because this was the first year for which mass balance data are available for all eight glaciers. Negative values indicate a net loss of ice and snow compared with the base year of 1970. For consistency, measurements are in meters of water equivalent, which reflects the mass loss of the glacier by representing changes in the glacier’s average thickness.

As before, we can download the data from EPA.

url = "https://www.epa.gov/system/files/other-files/2022-07/arctic-glaciers-fig-1_0.csv"

df_arcticg = pd.read_csv(url, skiprows = 6)

df_arcticg
Year Meighen White Austre Midtre Devon Melville Storglaciaeren Engabreen Average
0 1945 NaN NaN NaN NaN NaN NaN 11.14 NaN 9.25
1 1946 NaN NaN NaN NaN NaN NaN 9.26 NaN 7.37
2 1947 NaN NaN NaN NaN NaN NaN 8.97 NaN 7.08
3 1948 NaN NaN NaN NaN NaN NaN 9.56 NaN 7.67
4 1949 NaN NaN NaN NaN NaN NaN 8.81 NaN 6.92
... ... ... ... ... ... ... ... ... ... ...
71 2016 -9.12 -10.35 -23.61 -18.69 -7.33 -16.31 -5.57 2.10 -11.19
72 2017 -8.92 -10.34 -24.49 -19.46 -7.28 -16.09 -7.17 0.47 -11.74
73 2018 -9.75 NaN -25.20 -20.02 -7.80 NaN -7.48 1.26 -12.10
74 2019 -10.55 NaN -26.94 -21.61 -8.30 NaN -7.62 2.43 -12.70
75 2020 NaN NaN NaN NaN NaN NaN NaN NaN NaN

76 rows × 10 columns

We can see that from 1945 to 1957 they didn’t record anything for the first 6 glaciers. Even after 1957 they didn’t record anything for two of the glaciers. So, let’s just graph the average first.

Let us plot the average:

fig = px.line(df_arcticg, x="Year", y="Average")


fig.update_layout(
    title="Cumulative Mass Balance of Eight Arctic Glaciers, 1945–2019",
    title_x = 0.5
)

fig.show()
[Figure: Cumulative Mass Balance of Eight Arctic Glaciers, 1945–2019; x-axis: Year, y-axis: Average cumulative mass balance.]

We can definitely see that the average mass balance is going down over time and is in negative territory. Note that negative values indicate a net loss of ice and snow compared with the base year of 1970.

We only plotted the average. There are many missing values (NaN) for the other glaciers in the initial years. The nice thing about NaN is that Plotly simply skips it and plots the data where it is available.

Let us list the columns in the data frame

df_arcticg.columns
Index(['Year', 'Meighen', 'White ', 'Austre', 'Midtre', 'Devon', 'Melville',
       'Storglaciaeren', 'Engabreen', 'Average'],
      dtype='object')

We can see that there are many glaciers that we can plot.
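Before plotting them all, here is a small sketch to see exactly when each glacier's record begins; first_valid_index() returns the index of the first non-NaN entry (note the trailing space in 'White ', exactly as the column appears in the data):

```{python}
glaciers = ['Meighen', 'White ', 'Austre', 'Midtre', 'Devon', 'Melville',
            'Storglaciaeren', 'Engabreen']

for col in glaciers:
    idx = df_arcticg[col].first_valid_index()   # index label of the first non-NaN entry
    print(f"{col.strip()}: record begins in {df_arcticg.loc[idx, 'Year']}")
```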

fig = px.line(df_arcticg, x="Year", y=['Meighen', 'White ', 'Austre', 'Midtre', 'Devon', 'Melville',
       'Storglaciaeren', 'Engabreen'])


fig.update_layout(
    title=" Cumulative Mass Balance of Eight Arctic Glaciers, 1945–2020",
    title_x = 0.5,
    yaxis_title="Cumulative Mass Balance (meters of water equivalent)",
    legend=dict(
        x=0.05,
        y=0.05
    )
)
fig.show()
[Figure: Cumulative Mass Balance of Eight Arctic Glaciers, 1945–2020; one line per glacier; x-axis: Year, y-axis: Cumulative Mass Balance (meters of water equivalent).]

What do we take away from the figures? EPA’s key insights for the figures mention

Since 1945, the eight Arctic glaciers in Figure 1 have experienced a decline in mass balance overall. One exception to this pattern is Engabreen, near the subpolar zone along the coast of Norway, which gained mass over the period of record. This trend is anomalous and does not reflect the significant global loss of land ice as measured and catalogued from numerous glaciers elsewhere in the Arctic and around the world in the 20th century. Engabreen is more strongly influenced by precipitation than glaciers elsewhere in the Arctic.

The loss of land-based ice in the Arctic has accelerated in recent decades. The Arctic is the largest contributing source of land ice to global sea level rise.

Let's do one more thing, since we have eight glaciers: compute the correlations among them.

subset_columns = ['Meighen', 'White ', 'Austre', 'Midtre', 'Devon', 'Melville','Storglaciaeren', 'Engabreen']

correlation_matrix = df_arcticg[subset_columns].corr()

print(correlation_matrix.round(3))
                Meighen  White   Austre  Midtre  Devon  Melville  \
Meighen           1.000   0.984   0.917   0.917  0.994     0.976   
White             0.984   1.000   0.950   0.949  0.991     0.990   
Austre            0.917   0.950   1.000   1.000  0.938     0.967   
Midtre            0.917   0.949   1.000   1.000  0.936     0.965   
Devon             0.994   0.991   0.938   0.936  1.000     0.985   
Melville          0.976   0.990   0.967   0.965  0.985     1.000   
Storglaciaeren    0.870   0.823   0.790   0.778  0.877     0.817   
Engabreen         0.374   0.248   0.120   0.115  0.365     0.192   

                Storglaciaeren  Engabreen  
Meighen                  0.870      0.374  
White                    0.823      0.248  
Austre                   0.790      0.120  
Midtre                   0.778      0.115  
Devon                    0.877      0.365  
Melville                 0.817      0.192  
Storglaciaeren           1.000      0.668  
Engabreen                0.668      1.000  

What is the correlation matrix?

  • A correlation matrix is a table showing correlation coefficients between variables. Each cell in the table shows the correlation between two variables.
  • The value ranges from -1 to 1, where:
    • 1 indicates a perfect positive correlation.
    • -1 indicates a perfect negative correlation.
    • 0 indicates no correlation.

The correlation_matrix is typically a DataFrame or a 2D array that contains correlation coefficients between different variables.

As we can see, the correlations are all positive, indicating that the glaciers' mass balances move together over time. Except for Engabreen, the correlation among the glaciers is very high, more than 0.8 on average.
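We can check that claim directly. A minimal sketch, using numpy to mask out the diagonal (which is always 1.0) before averaging:

```{python}
import numpy as np

# Correlations among the seven glaciers other than Engabreen
core = correlation_matrix.drop(index='Engabreen', columns='Engabreen')

# Average the off-diagonal entries only
off_diag = core.values[~np.eye(len(core), dtype=bool)]
print(f"Average pairwise correlation: {off_diag.mean():.3f}")
```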

We can also plot the correlation matrix

fig = px.imshow(correlation_matrix)
fig.show()
[Figure: Heatmap of the correlation matrix for the eight Arctic glaciers; color scale runs from roughly 0.2 to 1.]

The imshow function is designed to display image-like data, making it suitable for visualizing matrices. In this context, it will generate a heatmap where the color intensity represents the strength and direction of the correlations.

We can see that most of the plot is yellow, where the correlation is greater than 0.8. Engabreen is the only one with a different color, indicating that its mass loss is not strongly correlated with the other seven glaciers. As the EPA states, Engabreen is more strongly influenced by precipitation than glaciers elsewhere in the Arctic.
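If we want the coefficients printed on the heatmap itself, px.imshow accepts a text_auto argument (in recent Plotly versions) along with an explicit color scale; a quick sketch:

```{python}
fig = px.imshow(correlation_matrix,
                text_auto='.2f',                   # print each coefficient in its cell
                color_continuous_scale='RdBu_r',   # diverging color scale
                zmin=-1, zmax=1)                   # fix the color range to [-1, 1]
fig.show()
```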

9.4 Glaciers in Glacier National Park

There are a few other figures on the EPA website that take a closer look at the glaciers in Glacier National Park.

Total Glacier Surface Area in Glacier National Park, 1966–2015

We can follow the same steps as before:

  1. get the data from the EPA website
  2. make sure that we read the data correctly
  3. plot the figure

Why don’t you try it as an exercise?

Can you modify the following code?

```{python}
url = "https://www.epa.gov/sites/default/files/2021-03/glacier-park_fig-1.csv"

df_totalSA = pd.read_csv(url, skiprows = 6, encoding = "ISO-8859-1")

df_totalSA
```

How can you add titles and legends to the code below? See our code before.

```{python}
fig = px.bar(df_totalSA, x="Year", y='Total glacier area')

fig.show()
```

What does this figure convey?

The total surface area of the 37 named glaciers in Glacier National Park decreased by about 34 percent between 1966 and 2015.

Change in Glacier Surface Area in Glacier National Park, 1966–2015

We don’t have all the data to plot this figure. But the EPA notes that

Every glacier’s surface area was smaller in 2015 than it was in 1966. Three glaciers temporarily gained some area during part of the overall time period.

The overall trend of shrinking surface area of glaciers in Glacier National Park is consistent with the retreat of glaciers observed in the United States and worldwide.

9.5 Great Lakes

From the EPA website,

The Great Lakes—Lake Superior, Lake Michigan, Lake Huron, Lake Erie, and Lake Ontario—form the largest group of freshwater lakes on Earth. These lakes support a variety of ecosystems and play a vital role in the economy of the eight neighboring states and the Canadian province of Ontario, providing drinking water, shipping lanes, fisheries, recreational opportunities, and more.

Great Lakes Ice Cover

These graphs show the percentage of surface area in each of the Great Lakes that froze on the maximum day (the “most frozen” day) of each year since 1973. The graph at the lower right is an area-weighted average of all five lakes.

import pandas as pd
url = "https://www.epa.gov/sites/default/files/2021-04/great-lakes-ice_fig-1.csv"

df_lakes = pd.read_csv(url, skiprows = 6)

df_lakes
Year Erie Huron Michigan Ontario Superior All Lakes
0 1973 95.7 66.70 33.0 62.6 69.8 60.6
1 1974 88.5 61.40 39.4 24.5 73.8 58.7
2 1975 80.1 55.10 28.1 14.9 64.9 50.2
3 1976 95.4 52.50 29.5 15.2 49.9 39.9
4 1977 99.8 95.00 93.1 46.7 96.0 89.9
5 1978 100.0 95.80 66.6 57.7 92.5 83.0
6 1979 100.0 96.40 92.3 86.2 97.1 94.7
7 1980 93.4 70.30 38.6 30.6 77.2 62.8
8 1981 96.0 93.80 53.8 47.6 84.7 71.3
9 1982 99.1 93.30 60.2 50.7 85.3 74.8
10 1983 40.8 31.80 23.6 12.5 20.2 22.1
11 1984 95.7 77.60 43.3 40.7 88.2 65.2
12 1985 96.0 87.40 41.3 38.2 81.9 69.8
13 1986 95.5 73.50 66.8 43.7 90.7 76.4
14 1987 88.0 43.50 20.0 8.4 14.5 27.3
15 1988 91.5 60.10 32.7 21.1 67.3 53.6
16 1989 91.6 76.30 30.9 15.5 82.3 59.2
17 1990 72.8 73.80 32.4 29.5 78.1 59.7
18 1991 35.1 43.80 21.5 11.6 89.6 44.3
19 1992 89.8 69.90 32.8 17.5 73.3 45.5
20 1993 94.3 78.30 32.2 29.0 77.2 64.4
21 1994 96.7 96.90 82.7 55.7 96.1 89.0
22 1995 94.0 41.70 21.6 18.8 28.8 32.0
23 1996 100.0 98.20 75.0 45.1 100.0 81.7
24 1997 99.6 65.30 37.8 25.6 89.0 58.4
25 1998 5.4 28.60 15.1 6.2 11.1 14.0
26 1999 74.8 36.00 23.0 17.9 18.3 26.6
27 2000 90.7 41.90 27.2 22.3 33.5 34.8
28 2001 94.0 45.70 29.5 17.9 46.7 34.4
29 2002 14.4 26.10 12.4 4.0 10.3 11.8
30 2003 95.7 96.20 48.0 49.6 95.5 80.2
31 2004 95.4 64.50 36.4 38.5 52.2 51.9
32 2005 93.0 58.90 29.4 37.8 54.5 44.2
33 2006 21.9 32.17 16.2 14.3 16.7 20.4
34 2007 95.8 71.39 37.2 23.8 51.8 55.2
35 2008 93.4 59.51 33.5 14.0 62.7 53.7
36 2009 95.6 85.11 52.3 25.9 93.6 73.9
37 2010 93.1 36.95 23.5 13.1 27.3 27.5
38 2011 95.9 63.79 29.4 31.1 33.6 40.1
39 2012 13.9 23.06 16.7 1.9 8.2 12.9
40 2013 83.7 47.75 24.4 15.6 38.6 38.4
41 2014 96.1 96.10 93.1 60.8 95.8 92.3
42 2015 98.1 96.40 72.9 82.4 95.7 88.8
43 2016 78.7 47.98 26.8 23.5 22.7 33.8
44 2017 35.5 35.35 18.2 5.8 18.7 19.4
45 2018 95.1 81.39 51.3 24.7 77.2 69.0
46 2019 94.3 95.70 55.8 39.8 94.9 80.9
47 2020 15.9 32.10 17.2 10.7 22.6 19.5

We can see that there are columns for 'Erie', 'Huron', 'Michigan', 'Ontario', 'Superior', and 'All Lakes', along with 'Year'.

We can plot

  • lake by lake
  • all the lakes in one plot (sketched right after this list)
  • lake by lake subplots and facet them into one graph
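As a quick sketch of the second option, passing a list of columns to y overlays all the lakes in one figure:

```{python}
fig = px.line(df_lakes, x='Year',
              y=['Erie', 'Huron', 'Michigan', 'Ontario', 'Superior', 'All Lakes'])

fig.update_layout(title="Maximum Ice-Covered Area for the Great Lakes, 1973–2020",
                  yaxis_title="Percent Ice Cover",
                  title_x=0.5)
fig.show()
```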

Let's do one lake first.

import plotly.express as px
fig = px.line(df_lakes, x='Year', y='Erie')
fig.update_layout(title = "Maximum Ice-Covered Area for Lake Erie, 1973–2020",
    yaxis_title = "Percent Ice Cover",
    title_x = 0.5)

fig.show()
[Figure: Maximum Ice-Covered Area for Lake Erie, 1973–2020; x-axis: Year, y-axis: Percent Ice Cover.]

We haven’t used functions in a while. So, let us get some more practice with functions. We can see that all we need to change is the lake name or the column for the y-axis and the title.

import plotly.express as px

def plot_lake_ice_cover(lake_name):
    fig = px.line(df_lakes, x='Year', y=lake_name)
    fig.update_layout(
        title=f"Maximum Ice-Covered Area for Lake {lake_name}, 1973–2020",
        yaxis_title="Percent Ice Cover",
        title_x=0.5
    )
    fig.show()

What did we do?

  1. The function, plot_lake_ice_cover(), takes a single parameter, lake_name, which represents the name of the lake for which the ice cover data will be visualized.
  2. Within the function, a line plot is created using the Plotly px.line method. This method takes a DataFrame, df_lakes, and plots the data with ‘Year’ on the x-axis and the specified lake_name on the y-axis.
  3. After creating the initial plot, the function customizes the layout of the figure using the update_layout method.
  4. The title of the plot is set to “Maximum Ice-Covered Area for Lake {lake_name}, 1973–2020”, dynamically incorporating the lake_name parameter.
  5. The y-axis is labeled “Percent Ice Cover” to indicate the measurement being plotted.
  6. Additionally, the title is centered horizontally by setting title_x to 0.5.
  7. Finally, the function calls fig.show() to display the plot.

Let us try calling it with a different lake name:

plot_lake_ice_cover('Huron')
[Figure: Maximum Ice-Covered Area for Lake Huron, 1973–2020; x-axis: Year, y-axis: Percent Ice Cover.]

We can see that the relevant lake's data (Huron in this case) is used, along with dynamically including the lake name in the title!

How about Lake Michigan? Again, all we need to do is call the function with a different lake name.

plot_lake_ice_cover('Michigan')
[Figure: Maximum Ice-Covered Area for Lake Michigan, 1973–2020; x-axis: Year, y-axis: Percent Ice Cover.]

It beats writing out repetitive blocks of code like this:

```{python}
fig = px.line(df_lakes, x='Year', y='Huron')

fig.show()

fig = px.line(df_lakes, x='Year', y='Michigan')

fig.show()

fig = px.line(df_lakes, x='Year', y='Ontario')

fig.show()

fig = px.line(df_lakes, x='Year', y='Superior')

fig.show()

fig = px.line(df_lakes, x='Year', y='All Lakes')

fig.show()
```

And that is without even including titles and other formatting!

Let's replicate the EPA figure exactly by faceting all these plots into a single figure.

We already did this for the Seasonal Temperature graphs, and we can adapt the same approach here.

import plotly.graph_objects as go
from plotly.subplots import make_subplots

fig = make_subplots(rows=3, cols=2, subplot_titles=('Erie', 'Huron', 'Michigan', 'Ontario', 'Superior', 'All Lakes'),
        x_title= 'Year',
        y_title = 'Percent Ice Cover')

fig.update_layout(title = "Maximum Ice-Covered Area for Great Lakes, 1973–2020", 
    title_x = 0.5)

fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Erie'], name = 'Erie'), row=1, col=1)
[Figure: Maximum Ice-Covered Area for Great Lakes, 1973–2020; a 3x2 grid of panels (Erie, Huron, Michigan, Ontario, Superior, All Lakes), with only the Erie panel populated so far.]

What do we see? We set up space for six figures (two columns by three rows), and we have only added Lake Erie so far.

Let us add the rest

fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Huron'], name = 'Huron'), row=1, col=2)
fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Michigan'], name = 'Michigan'), row=2, col=1)
fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Ontario'], name = 'Ontario'), row=2, col=2)
fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Superior'], name = 'Superior'), row=3, col=1)
fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['All Lakes'], name = 'All Lakes'), row=3, col=2)
[Figure: Maximum Ice-Covered Area for Great Lakes, 1973–2020; all six panels now populated.]
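Adding six traces by hand is repetitive and easy to get wrong; copying a line and forgetting to change the column name is exactly how bugs creep in. A loop over the lake names is less error-prone. Here is a sketch that builds the same figure:

```{python}
lakes = ['Erie', 'Huron', 'Michigan', 'Ontario', 'Superior', 'All Lakes']

fig = make_subplots(rows=3, cols=2, subplot_titles=lakes,
                    x_title='Year', y_title='Percent Ice Cover')

for i, lake in enumerate(lakes):
    row, col = i // 2 + 1, i % 2 + 1   # fill the 3x2 grid left to right, top to bottom
    fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes[lake], name=lake),
                  row=row, col=col)

fig.update_layout(title="Maximum Ice-Covered Area for Great Lakes, 1973–2020", title_x=0.5)
fig.show()
```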

What are the key takeaways based on the graphs? Per EPA,

Since 1973, the maximum area covered in ice on the Great Lakes has varied considerably from year to year. Adding the five Great Lakes together, maximum frozen area has ranged from less than 20 percent in some years to more than 90 percent in others (see Figure 1). All five lakes have experienced some degree of long-term decrease, but the decrease is only statistically significant in one lake (Superior). Years with much-lower-than-normal ice cover appear to have become more frequent during the past two decades, especially in lakes like Erie and Superior that have a history of freezing almost completely.

9.5.0.1 Duration of Ice Cover

Let us look at the duration of ice cover, using data from https://www.epa.gov/climate-indicators/climate-change-indicators-great-lakes-ice-cover

Duration of Ice Cover

These graphs show the number of frozen days per year in each of the Great Lakes since 1973. A frozen day is defined here as a day when at least 5 percent of the lake’s surface area was covered with ice.

We need to follow the same recipe as before:

  1. download data
  2. make subplots
import pandas as pd
url = "https://www.epa.gov/sites/default/files/2021-04/great-lakes-ice_fig-2.csv"

df_lakes = pd.read_csv(url, skiprows = 6)

df_lakes
Year Erie Huron Michigan Ontario Superior
0 1973 67 108 85 67 97
1 1974 76 120 106 76 119
2 1975 65 105 97 32 95
3 1976 91 119 108 74 108
4 1977 117 136 124 118 133
5 1978 124 130 113 106 123
6 1979 96 120 117 71 137
7 1980 98 106 90 60 91
8 1981 108 113 104 67 110
9 1982 105 129 105 86 121
10 1983 47 85 64 34 86
11 1984 123 128 113 94 128
12 1985 101 116 94 85 107
13 1986 118 125 114 89 115
14 1987 56 82 72 19 48
15 1988 102 105 89 81 93
16 1989 77 124 107 43 120
17 1990 96 126 108 95 117
18 1991 64 102 81 56 93
19 1992 64 121 81 68 127
20 1993 83 121 89 62 99
21 1994 113 123 107 99 125
22 1995 80 93 61 39 79
23 1996 132 146 123 80 134
24 1997 85 128 113 75 122
25 1998 1 86 56 10 34
26 1999 80 103 80 49 54
27 2000 69 88 65 43 50
28 2001 123 130 111 80 105
29 2002 20 102 59 0 33
30 2003 105 119 102 81 103
31 2004 91 106 89 68 86
32 2005 103 123 104 70 86
33 2006 55 120 68 18 31
34 2007 70 88 69 51 67
35 2008 88 122 92 45 69
36 2009 106 132 106 65 109
37 2010 86 86 78 23 45
38 2011 113 124 102 54 84
39 2012 12 75 52 0 13
40 2013 56 105 84 25 75
41 2014 134 153 145 98 153
42 2015 104 126 109 90 126
43 2016 40 72 62 17 43
44 2017 42 101 80 4 51
45 2018 104 143 99 61 124
46 2019 87 120 93 73 97
47 2020 6 88 72 7 54

Note that we reused the name df_lakes for this new data frame, overwriting the previous one. The data is not the same as before: we still see the five lakes, but there is no column for 'All Lakes'.

We can use the same code as before, removing 'All Lakes':

import plotly.graph_objects as go
from plotly.subplots import make_subplots

fig = make_subplots(rows=3, cols=2, subplot_titles=('Erie', 'Huron', 'Michigan', 'Ontario', 'Superior'),
        x_title= 'Year',
        y_title = 'Duration of Ice Cover')

fig.update_layout(
    title="Duration of Ice Cover in the Great Lakes, 1973–2020", 
    title_x=0.5,
    legend=dict(x=0.7, y=-0.1)
)

fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Erie'], name = 'Erie'), row=1, col=1)
fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Huron'], name = 'Huron'), row=1, col=2)
fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Michigan'], name = 'Michigan'), row=2, col=1)
fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Ontario'], name = 'Ontario'), row=2, col=2)
fig.add_trace(go.Scatter(x=df_lakes['Year'], y=df_lakes['Superior'], name = 'Superior'), row=3, col=1)
[Figure: Duration of Ice Cover in the Great Lakes, 1973–2020; five panels (Erie, Huron, Michigan, Ontario, Superior); x-axis: Year, y-axis: Duration of Ice Cover (days).]

The only minor modifications that we made are:

  1. removed 'All Lakes' from the subplot_titles
  2. changed the y_title to 'Duration of Ice Cover'
  3. updated the title in fig.update_layout
  4. moved the legend toward the empty slot in the grid, since we only have five figures

What do we learn from these figures? EPA's key insights mention

Since 1973, the number of frozen days (duration) on all five Great Lakes has declined. Ice duration on these lakes has decreased at rates ranging from approximately one-fifth of a day per year in Lake Huron to almost a full day per year in Lakes Ontario and Superior (see Figure 2). Overall, this means the Great Lakes are frozen for eight to 46 fewer days now than they were in the early 1970s. The decreases in Lakes Ontario and Superior are statistically significant.

There is also a detailed map below (one that we don't yet know how to recreate) showing that many areas of the Great Lakes have experienced significant decreases in ice cover duration, while other parts of the lakes have not changed significantly. Duration has decreased the most in areas near the shore.

Change in Duration of Ice Cover

9.6 Summary

9.6.1 Some Python concepts that we learned in this Chapter

Let us collect the Python programming concepts covered in this chapter:

  1. Concatenating: creating a new string or object that merges two things together. In simple terms, we can create a new string variable by combining two strings, usually with the + symbol. For example, "Climate " + "Change" gives us "Climate Change".
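The exercises below also ask about pd.concat and axis = 0, so here is a tiny illustration of both kinds of concatenation (the DataFrames use made-up numbers):

```{python}
import pandas as pd

# String concatenation with +
phrase = "Climate " + "Change"
print(phrase)                                             # Climate Change

# Row-wise concatenation of DataFrames with pd.concat and axis=0
df_a = pd.DataFrame({'Year': [2019], 'Extent': [4.2]})    # made-up numbers
df_b = pd.DataFrame({'Year': [2020], 'Extent': [3.9]})    # made-up numbers
print(pd.concat([df_a, df_b], axis=0, ignore_index=True))
```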

9.6.2 Some Climate Insights that we learned in this Chapter

  1. Both the extent and thickness of Arctic sea ice have declined rapidly over the last several decades.
  2. The four U.S. reference glaciers have shown an overall decline in mass balance since the 1950s and 1960s, with an accelerated rate of decline in recent years.
  3. Both the duration of ice cover and the number of frozen days on the Great Lakes are declining.

9.7 Exercises / Check your knowledge of Python

In the code that we used in this chapter, there are some learning opportunities to think through. For example,

  1. why is the index at 13?
  2. what does for loop do?
  3. what is the value of {month}
  4. what is concatenate doing?
  5. why do we need axis = 0

9.8 Data Sources

Some of the data sources that we have used are:

  1. https://nsidc.org/data/g02135/versions/3
  2. https://nsidc.org/data/explore-data
  3. https://www.epa.gov/climate-indicators/climate-change-indicators-arctic-sea-ice
  4. https://www.epa.gov/climate-indicators/climate-change-indicators-antarctic-sea-ice
  5. https://climate.rutgers.edu/snowcover/