14  Greenhouse Gas Emissions

In this section, we will analyze Greenhouse Gas (GHG) emissions.

What are Greenhouse Gases and why do we care about them?

As before, we will rely on the EPA, other governmental agencies, multinational agencies, or scientific articles for evidence.

EPA highlights that:

Gases that trap heat in the atmosphere are called greenhouse gases. As greenhouse gas emissions from human activities increase, they build up in the atmosphere and warm the climate, leading to many other changes around the world—in the atmosphere, on land, and in the oceans.

EPA gives detailed information on Greenhouse Gases at https://www.epa.gov/ghgemissions/overview-greenhouse-gases. The main greenhouse gases that we will analyze are:

  1. Carbon dioxide (CO2): Carbon dioxide enters the atmosphere through burning fossil fuels (coal, natural gas, and oil), solid waste, trees and other biological materials, and also as a result of certain chemical reactions (e.g., cement production). Carbon dioxide is removed from the atmosphere (or “sequestered”) when it is absorbed by plants as part of the biological carbon cycle.
  2. Methane (CH4): Methane is emitted during the production and transport of coal, natural gas, and oil. Methane emissions also result from livestock and other agricultural practices, land use, and by the decay of organic waste in municipal solid waste landfills.
  3. Nitrous oxide (N2O): Nitrous oxide is emitted during agricultural, land use, and industrial activities; combustion of fossil fuels and solid waste; as well as during treatment of wastewater.
  4. Fluorinated gases: Hydrofluorocarbons, perfluorocarbons, sulfur hexafluoride, and nitrogen trifluoride are synthetic, powerful greenhouse gases that are emitted from a variety of household, commercial, and industrial applications and processes.

We can see the composition of total GHG emissions as of 2022 for the U.S. from EPA.

Greenhouse Gases. Source: EPA

Total U.S. Emissions in 2022 = 6,343 Million Metric Tons of CO₂ equivalent.
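To make the "CO2 equivalent" unit used throughout this chapter concrete, here is a small illustrative sketch. The global warming potential (GWP) factors below are rough, commonly cited 100-year values, not necessarily the exact factors EPA uses, and the methane quantity is hypothetical.

```{python}
# Illustrative only: convert a gas quantity to CO2 equivalents with a rough
# 100-year global warming potential (GWP). These GWP values are commonly cited
# approximations, not necessarily the exact factors EPA uses.
gwp_100yr = {"CO2": 1, "CH4": 28, "N2O": 265}

def to_co2_equivalent(gas, million_metric_tons):
    """Return emissions in million metric tons of CO2 equivalent."""
    return million_metric_tons * gwp_100yr[gas]

# e.g., a hypothetical 10 million metric tons of methane:
print(to_co2_equivalent("CH4", 10))   # 280 MMT CO2e
```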

14.1 Carbon Dioxide (CO2)

Let us start with CO2. Why do we care about CO2?

First, CO2 accounts for almost 80% of U.S. greenhouse gas emissions.

Second, there has been a steep increase in CO2 when viewed against the past 10,000 years, and the increase over the past 100 years or so is alarming if the trend continues.

10,000 years of CO2

Before diving deeper into the sources and causes of CO2 emissions, let us first reproduce the EPA graphs on CO2 emissions.

CO2. Source: EPA

This figure shows emissions of carbon dioxide, methane, nitrous oxide, and several fluorinated gases in the United States from 1990 to 2022. For consistency, emissions are expressed in million metric tons of carbon dioxide equivalents.

We will repeat the steps that we undertook in Part 1 of the book to reproduce the figure, so we won't give detailed explanations; many of these steps were covered earlier.

The data for the figure is available at https://www.epa.gov/system/files/other-files/2024-06/us-ghg-emissions_fig-1.csv

import pandas as pd

url = "https://www.epa.gov/system/files/other-files/2024-06/us-ghg-emissions_fig-1.csv"

from io import StringIO
import pandas as pd
import requests

def url_data_request(url, skiprows):
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0",
    }
    f = StringIO(requests.get(url, headers=headers).text)
    df_url = pd.read_csv(f, skiprows = skiprows)
    return df_url

df = url_data_request(url, 6)

What does the code do? Now that we have moved into Part 2, and AI agents have become much better, we can ask Copilot to explain the code. We edited the explanation to make it shorter.

  • The provided code snippet is written in Python and is used to fetch and process a CSV file from a given URL.

  • The code begins by importing the necessary libraries. The pandas library is imported twice, which is redundant, but it is a powerful data manipulation library used to handle the CSV data.

    • Our mistake! We missed it, but Copilot doesn’t! There is no harm in importing twice but it is redundant.
  • The StringIO class from the io module is imported to handle string data as file-like objects, and the requests library is imported to make HTTP requests.

  • The URL of the CSV file is stored in the variable url. This URL points to a CSV file hosted on the EPA’s website, which contains data on U.S. greenhouse gas emissions.

  • A function named url_data_request is defined to fetch and process the CSV data from the URL. This function takes two parameters: url and skiprows.

    • The url parameter is the URL of the CSV file, and skiprows specifies the number of rows to skip at the beginning of the file.
    • This is useful for skipping metadata or header information that is not part of the actual data.
  • Inside the function, a dictionary named headers is defined to mimic a web browser’s request. This is done to avoid potential issues with web servers that block requests from non-browser clients.

  • The requests.get function is used to fetch the content of the URL, and the response text is passed to StringIO to create a file-like object. This object is then read into a Pandas DataFrame using the pd.read_csv function, with the skiprows parameter to skip the specified number of rows.

  • Finally, the function returns the DataFrame containing the CSV data.

  • The url_data_request function is called with the URL and skiprows set to 6, and the resulting DataFrame is stored in the variable df.

  • This DataFrame now contains the processed data from the CSV file, ready for further analysis or visualization.

As you can see, the explanation is more detailed and useful than what we would have written. So we decided to use it, with some minor edits.
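One small robustness note before moving on: if the EPA site is slow or returns an error page, the helper above will silently try to parse whatever text came back. A slightly more defensive variant (a sketch; the function name url_data_request_safe is ours) adds a timeout and an explicit HTTP error check:

```{python}
def url_data_request_safe(url, skiprows):
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0",
    }
    # Fail fast on a slow server, and raise a clear error for non-200 responses
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return pd.read_csv(StringIO(response.text), skiprows=skiprows)
```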

As before, the first step with any dataset is always to get some basic info.

nrows, ncol = df.shape

We can see from df.shape that there are 33 rows and 5 variables.
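Besides df.shape, a couple of other quick checks are worth running on any new dataset (a sketch; output omitted here):

```{python}
print(df.columns.tolist())   # column names
df.head()                    # first few rows, to eyeball the values
```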

We can do descriptive stats on the numerical variables.

df.describe()
Year Methane Nitrous oxide HFCs, PFCs, SF6, and NF3
count 33.00000 33.000000 33.000000 33.000000
mean 2006.00000 802.654633 417.133412 163.509376
std 9.66954 47.816996 13.205529 20.770491
min 1990.00000 702.354745 389.707858 118.265427
25% 1998.00000 764.386625 410.943810 155.868137
50% 2006.00000 806.645870 419.175093 169.045426
75% 2014.00000 835.477487 423.413845 176.571344
max 2022.00000 879.645413 439.607792 198.128038

As we can see, the data covers 1990-2022 and includes emissions of methane, nitrous oxide, and several fluorinated gases in the United States (expressed in million metric tons of CO2 equivalents).

That’s only 4 variables or columns, including year. Where did the fifth column go?

We should have first checked the data types in the dataset.

df.dtypes
Year                          int64
Carbon dioxide               object
Methane                     float64
Nitrous oxide               float64
HFCs, PFCs, SF6, and NF3    float64
dtype: object

We can see that Carbon dioxide (CO2) is an object and that’s why the df.describe() didn’t give descriptive statistics on it.

Why is it an object type? We have no idea!

Looking at the data, all the values look numerical.

df
Year Carbon dioxide Methane Nitrous oxide HFCs, PFCs, SF6, and NF3
0 1990 5,131.65 871.658825 408.151400 125.454537
1 1991 5,076.30 879.645413 398.477140 118.265427
2 1992 5,178.73 877.381636 397.294008 122.945240
3 1993 5,279.48 864.852491 418.461318 123.662121
4 1994 5,361.82 873.401867 419.175093 128.238674
5 1995 5,425.80 866.586056 423.495394 146.256594
6 1996 5,601.06 861.737879 435.519013 157.943424
7 1997 5,673.11 847.921716 419.608268 168.341026
8 1998 5,736.33 835.477487 423.373150 183.810398
9 1999 5,800.25 815.812634 418.744132 183.826627
10 2000 6,023.13 815.701845 410.943810 170.107309
11 2001 5,903.53 807.084109 421.725702 154.972910
12 2002 5,947.79 801.091353 422.934556 158.077394
13 2003 6,006.42 801.128955 423.413845 155.928303
14 2004 6,111.52 796.433638 428.505590 155.868137
15 2005 6,126.86 795.429181 419.189952 153.153855
16 2006 6,045.30 806.645870 420.562160 156.667115
17 2007 6,121.52 810.254941 431.642027 166.486877
18 2008 5,919.21 821.582197 416.514826 169.045426
19 2009 5,486.09 806.775967 410.066543 163.987454
20 2010 5,668.74 807.598702 418.263291 171.403236
21 2011 5,540.33 778.288480 419.850882 173.296084
22 2012 5,332.81 764.063618 392.793611 171.237077
23 2013 5,473.66 766.205413 434.022492 171.519475
24 2014 5,529.88 762.391457 439.607792 173.968019
25 2015 5,368.29 764.386625 426.965957 176.571344
26 2016 5,246.40 744.590524 413.408295 175.634227
27 2017 5,196.54 759.480075 417.818944 177.188295
28 2018 5,362.19 771.481067 439.453215 179.624212
29 2019 5,234.49 754.341139 416.377483 184.922055
30 2020 4,688.97 735.348832 391.167731 186.321505
31 2021 5,017.20 720.468152 398.167133 192.956988
32 2022 5053.018689 702.354745 389.707858 198.128038

All the Carbon dioxide values are shown with two decimals, whereas the value for 2022 has six decimals. We suspect this formatting difference may be related to the issue, but we don't know yet.

AI is smart, but the real world is messy! There are always cases where the data is slightly different or there is some nuance that makes AI-generated code fail. So, it is still useful to learn and understand coding!

Let us convert the object-type column to a float-type variable.

```{python}
df['Carbon dioxide'] = df['Carbon dioxide'].astype(float)
```

Unfortunately, it doesn’t work :(

We get an error.

 ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[19], line 1
----> 1 df['Carbon dioxide'] = df['Carbon dioxide'].astype(float)

File /usr/local/lib/python3.11/site-packages/pandas/core/generic.py:6324, in NDFrame.astype(self, dtype, copy, errors)
   6317     results = [
   6318         self.iloc[:, i].astype(dtype, copy=copy)
   6319         for i in range(len(self.columns))
   6320     ]
   6322 else:
   6323     # else, only a single dtype is given
-> 6324     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6325     return self._constructor(new_data).__finalize__(self, method="astype")
   6327 # GH 33113: handle empty frame or series

File /usr/local/lib/python3.11/site-packages/pandas/core/internals/managers.py:451, in BaseBlockManager.astype(self, dtype, copy, errors)
    448 elif using_copy_on_write():
    449     copy = False
--> 451 return self.apply(
    452     "astype",
    453     dtype=dtype,
    454     copy=copy,
    455     errors=errors,
    456     using_cow=using_copy_on_write(),
    457 )

File /usr/local/lib/python3.11/site-packages/pandas/core/internals/managers.py:352, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    350         applied = b.apply(f, **kwargs)
    351     else:
--> 352         applied = getattr(b, f)(**kwargs)
    353     result_blocks = extend_blocks(applied, result_blocks)
    355 out = type(self).from_blocks(result_blocks, self.axes)

File /usr/local/lib/python3.11/site-packages/pandas/core/internals/blocks.py:511, in Block.astype(self, dtype, copy, errors, using_cow)
    491 """
    492 Coerce to the new dtype.
    493 
   (...)
    507 Block
    508 """
    509 values = self.values
--> 511 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    513 new_values = maybe_coerce_values(new_values)
    515 refs = None

File /usr/local/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:242, in astype_array_safe(values, dtype, copy, errors)
    239     dtype = dtype.numpy_dtype
    241 try:
--> 242     new_values = astype_array(values, dtype, copy=copy)
    243 except (ValueError, TypeError):
    244     # e.g. _astype_nansafe can fail on object-dtype of strings
    245     #  trying to convert to float
    246     if errors == "ignore":

File /usr/local/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:187, in astype_array(values, dtype, copy)
    184     values = values.astype(dtype, copy=copy)
    186 else:
--> 187     values = _astype_nansafe(values, dtype, copy=copy)
    189 # in pandas we don't store numpy str dtypes, so convert to object
    190 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):

File /usr/local/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:138, in _astype_nansafe(arr, dtype, copy, skipna)
    134     raise ValueError(msg)
    136 if copy or is_object_dtype(arr.dtype) or is_object_dtype(dtype):
    137     # Explicit copy, or required since NumPy can't view from / to object.
--> 138     return arr.astype(dtype, copy=True)
    140 return arr.astype(dtype, copy=copy)

ValueError: could not convert string to float: '5,131.65'

Didn’t we say real world data is messy!!!

We have to do some snooping, but it appears that 5,131.65 is read as a string, presumably because of the comma used as a thousands separator.
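One way to do that snooping (a minimal sketch using the df we already loaded) is to coerce the column to numeric and list the values that fail to parse:

```{python}
# Anything that cannot be parsed as a number becomes NaN under errors='coerce',
# so the offending strings are easy to list
coerced = pd.to_numeric(df['Carbon dioxide'], errors='coerce')
print(df.loc[coerced.isna(), ['Year', 'Carbon dioxide']])
```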

Now that we know some values of the 'Carbon dioxide' column are stored as strings, we can do some data cleaning.

df['Carbon dioxide'] = df['Carbon dioxide'].str.replace(',', '') 
df['Carbon dioxide'] = df['Carbon dioxide'].str.strip() 

df['Carbon dioxide'] = df['Carbon dioxide'].astype(float)

Basically, we removed commas and leading and trailing spaces and then converted the strings to floats.

Did it work? Let us check!

df.describe()
Year Carbon dioxide Methane Nitrous oxide HFCs, PFCs, SF6, and NF3
count 33.00000 33.000000 33.000000 33.000000 33.000000
mean 2006.00000 5535.406627 802.654633 417.133412 163.509376
std 9.66954 377.903314 47.816996 13.205529 20.770491
min 1990.00000 4688.970000 702.354745 389.707858 118.265427
25% 1998.00000 5246.400000 764.386625 410.943810 155.868137
50% 2006.00000 5486.090000 806.645870 419.175093 169.045426
75% 2014.00000 5903.530000 835.477487 423.413845 176.571344
max 2022.00000 6126.860000 879.645413 439.607792 198.128038

Now, we see descriptive statistics for ‘Carbon dioxide’ also! The code worked!!
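As an aside, pandas can handle the thousands separator at read time. If we had known about the commas up front, a variant of our helper could pass thousands=',' to pd.read_csv (a sketch, assuming commas are the only issue):

```{python}
def url_data_request_thousands(url, skiprows):
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0",
    }
    f = StringIO(requests.get(url, headers=headers).text)
    # thousands=',' makes pandas parse values like "5,131.65" as floats directly
    return pd.read_csv(f, skiprows=skiprows, thousands=',')
```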

It is easy to plot these using Plotly.

Let us focus on ‘Carbon dioxide’ so that we can see if there are significant increases over time.

import plotly.express as px

fig = px.line(df, x='Year', y=['Carbon dioxide'], 
              labels={'value': 'Emissions (million metric tons of CO2 equivalents)', 'variable': 'Gas'},
              title='CO2 Emissions Over Time')

fig.update_layout(
    yaxis_title="Emissions (million metric tons of CO2 equivalents)",
    xaxis_title="Year",
    title_x=0.5,
    hovermode="x",
    legend=dict(title="CO2 Emissions", x=0.2, y=0.3)  
)
fig.show()
Figure: CO2 Emissions Over Time (Plotly line chart; x: Year; y: Emissions in million metric tons of CO2 equivalents).

Now we can see that CO2 emissions steadily increased from 1990 to about 2007 but decreased afterward and are now back to around 1990 levels. Scale matters for graphing!

14.2 Other GHG Emissions

Let us look at the other greenhouse gases, which are on a different scale than CO2 (which accounts for approximately 80% of GHG emissions).

import plotly.express as px

fig = px.line(df, x='Year', y=['Carbon dioxide', 'Methane', 'Nitrous oxide', 'HFCs, PFCs, SF6, and NF3'], 
              labels={'value': 'Emissions (million metric tons of CO2 equivalents)', 'variable': 'Gas'},
              title='Greenhouse Gas Emissions Over Time')

fig.update_layout(
    yaxis_title="Emissions (million metric tons of CO2 equivalents)",
    xaxis_title="Year",
    title_x=0.5,
    hovermode="x",
    legend=dict(title="GhG Emissions", x=0.2, y=0.3)  
)
    
fig.show()
Figure: Greenhouse Gas Emissions Over Time, showing Carbon dioxide, Methane, Nitrous oxide, and HFCs, PFCs, SF6, and NF3 (Plotly line chart; x: Year; y: Emissions in million metric tons of CO2 equivalents).

This graph is not as useful because 'Carbon dioxide' emissions dominate; relative to CO2, every other gas appears small.

We can focus on just the other greenhouse gases to see their trends clearly.

import plotly.express as px

fig = px.line(df, x='Year', y=['Methane', 'Nitrous oxide', 'HFCs, PFCs, SF6, and NF3'], 
              labels={'value': 'Emissions (million metric tons of CO2 equivalents)', 'variable': 'Gas'},
              title='Greenhouse Gas Emissions Over Time')

fig.update_layout(
    yaxis_title="Emissions (million metric tons of CO2 equivalents)",
    xaxis_title="Year",
    title_x=0.5,
    hovermode="x",
    legend=dict(title="GhG Emissions", x=0.1, y=0.8)  
)
fig.show()
Figure: Greenhouse Gas Emissions Over Time, showing Methane, Nitrous oxide, and HFCs, PFCs, SF6, and NF3 (Plotly line chart; x: Year; y: Emissions in million metric tons of CO2 equivalents).

We see that ‘Nitrous Oxide’ is more or less stable over the period 1990-2022.

‘HFCs’ seem to have increased over the 1990-2022 time period.

Now that we have seen the different GHG emissions on their own scales, let us combine them.

import plotly.express as px

fig = px.area(df, x='Year', y=['Carbon dioxide', 'Methane', 'Nitrous oxide', 'HFCs, PFCs, SF6, and NF3'], 
              labels={'value': 'Emissions (million metric tons of CO2 equivalents)', 'variable': 'Gas'},
              title='Greenhouse Gas Emissions Over Time')

fig.update_layout(
    yaxis_title="Emissions (million metric tons of CO2 equivalents)",
    xaxis_title="Year",
    title_x=0.5,
    hovermode="x",
    legend=dict(title="GhG Emissions", x=0.3, y=0.3)  
)
fig.show()
Figure: Greenhouse Gas Emissions Over Time (Plotly stacked area chart of Carbon dioxide, Methane, Nitrous oxide, and HFCs, PFCs, SF6, and NF3; x: Year; y: Emissions in million metric tons of CO2 equivalents).

Now we see a graph that looks more like Figure 1 from the EPA for GHG emissions.

What are the key insights from this figure? EPA states:

In 2022, U.S. greenhouse gas emissions totaled 6,343 million metric tons (14.0 trillion pounds) of carbon dioxide equivalents. This total represents a 3.0 percent decrease since 1990, down from a high of 15.2 percent above 1990 levels in 2007. The sharp decline in emissions from 2019 to 2020 was largely due to the impacts of the coronavirus (COVID-19) pandemic on travel and economic activity. Emissions increased from 2020 to 2022 by 5.7 percent, driven largely by an increase in carbon dioxide emissions from fossil fuel combustion due to economic activity rebounding after the height of the pandemic (Figure 1).

For the United States, during the period from 1990 to 2022 (Figure 1):

  • Emissions of carbon dioxide, the primary greenhouse gas emitted by human activities, decreased by 2 percent.
  • Methane emissions decreased by 19 percent, as reduced emissions from landfills, coal mines, and natural gas systems more than offset increases in emissions from activities such as livestock production.
  • Nitrous oxide emissions, predominantly from agricultural soil management practices such as the use of nitrogen as a fertilizer, decreased by 5 percent.
  • Emissions of fluorinated gases (hydrofluorocarbons, perfluorocarbons, sulfur hexafluoride, and nitrogen trifluoride), released as a result of commercial, industrial, and household uses, increased by 58 percent.

14.3 U.S. Greenhouse Gas Emissions and Sinks by Economic Sector, 1990–2022

EPA also plots the sources of U.S. Greenhouse Gas Emissions and Sinks by Economic Sector, 1990–2022.

GHG emissions by economic sector. Source: EPA

Per the EPA,

This figure shows greenhouse gas emissions and sinks (negative values) by source in the United States from 1990 to 2022. For consistency, emissions are expressed in million metric tons of carbon dioxide equivalents. All electric power emissions are grouped together in the “Electric power” sector, so other sectors such as “Residential” and “Commercial” are only showing non-electric sources, such as burning oil or gas for heating. Totals do not match Figure 1 exactly because the economic sectors shown here do not include emissions from U.S. territories outside the 50 states.

Let us look at the sector-wise contributions. Note that the units are million metric tons of CO2 equivalents.

Before generating the figures, we want to highlight how messy real-world data can be and how it can change from year to year, even if it comes from the exact same source.

While revising this version of the book, we have realized that the data has been updated from the 2022-07 version to 2024-06.

The nice thing with coding is that we just need to change the link. But we will first use the old link to highlight the importance of data cleaning issues.

Note, this is not the latest data but the July 2022 version of the data for Figure 2 in EPA U.S. GHG emissions.

Let us read the data as usual

url = "https://www.epa.gov/system/files/other-files/2022-07/us-ghg-emissions_fig-2.csv"
df = url_data_request(url, 6 )

As before, the first step with any dataset is always to get some basic info.

nrows, ncol = df.shape

We can see from df.shape that there are 31 rows and 8 variables.

df.columns
Index(['Unnamed: 0', 'Transportation', 'Electricity generation', 'Industry',
       'Agriculture', 'Commercial', 'Residential',
       'Land use, land-use change, and forestry (net sink)'],
      dtype='object')

We notice that the first column, Year, is read as 'Unnamed: 0'. Let us see if there is anything else that is not what we expect.

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31 entries, 0 to 30
Data columns (total 8 columns):
 #   Column                                              Non-Null Count  Dtype  
---  ------                                              --------------  -----  
 0   Unnamed: 0                                          31 non-null     int64  
 1   Transportation                                      31 non-null     float64
 2   Electricity generation                              31 non-null     float64
 3   Industry                                            31 non-null     float64
 4   Agriculture                                         31 non-null     float64
 5   Commercial                                          31 non-null     float64
 6   Residential                                         31 non-null     float64
 7   Land use, land-use change, and forestry (net sink)  31 non-null     object 
dtypes: float64(6), int64(1), object(1)
memory usage: 2.1+ KB

'Land use, land-use change, and forestry (net sink)' is read as an object. This suggests that the column contains values in mixed formats. So let us look at the data quickly to see what is happening.

df
Unnamed: 0 Transportation Electricity generation Industry Agriculture Commercial Residential Land use, land-use change, and forestry (net sink)
0 1990 1526.434289 1880.548218 1652.361348 596.778204 427.109795 345.070789 -860.6250574
1 1991 1480.303906 1874.832994 1625.461259 587.493379 434.321072 354.689750 -869.9082889
2 1992 1539.881716 1890.103637 1662.343992 588.301071 429.939070 361.222025 -862.4402032
3 1993 1576.887693 1965.574487 1631.906633 616.367720 423.366758 372.612499 -844.206428
4 1994 1631.842860 1990.300202 1657.477536 603.473044 426.583160 363.260147 -857.5259735
5 1995 1667.019780 2006.877479 1675.133984 615.669467 425.696364 367.523521 -831.7775006
6 1996 1723.249166 2079.434226 1705.045256 622.907953 433.307228 399.454803 -852.1428676
7 1997 1749.849330 2144.903577 1705.172770 611.328521 425.795877 380.828659 -833.4407232
8 1998 1792.295978 2231.446285 1677.755098 618.246994 400.541006 346.873254 -835.2578601
9 1999 1863.292087 2244.837093 1627.052157 609.289744 396.926502 366.684458 -828.0496898
10 2000 1913.550780 2351.125685 1616.991263 594.854300 411.195289 387.992539 -825.2288316
11 2001 1885.434887 2311.387024 1569.909847 615.526557 400.209078 378.076884 -828.8079892
12 2002 1925.980857 2326.759449 1550.961500 618.674248 401.730691 375.286669 -790.5485356
13 2003 1933.326749 2357.952283 1529.061153 618.962290 417.965117 393.728943 -820.7858422
14 2004 1965.852074 2390.768614 1577.981113 631.722629 415.777828 381.852435 -729.4523162
15 2005 1975.495129 2456.737090 1536.176420 626.253214 405.442432 371.015529 -789.7931676
16 2006 1975.865366 2401.231383 1570.640402 625.786501 391.901762 334.309400 -817.1760559
17 2007 1974.426677 2467.226239 1560.386569 642.720748 405.856004 355.254855 -776.3550104
18 2008 1870.906050 2414.114945 1501.553646 631.031054 413.166512 363.906982 -766.4712252
19 2009 1796.295049 2198.273322 1345.926535 632.860470 416.903785 354.545147 -73448.90%
20 2010 1802.276352 2313.956993 1438.778760 641.029259 419.290316 355.539184 -761.0363768
21 2011 1768.595360 2211.767283 1443.335476 622.717221 415.067661 349.093809 -800.7290152
22 2012 1748.912232 2073.785620 1440.040968 606.452578 396.025010 306.943124 -799.9251081
23 2013 1751.515531 2092.343183 1489.536433 645.157276 419.341960 357.731086 -767.4142623
24 2014 1785.407074 2092.532800 1474.546661 654.114340 429.600526 378.276992 -781.3816316
25 2015 1793.399368 1952.725752 1464.017871 656.101038 442.084720 351.469522 -700.0664105
26 2016 1828.045564 1860.497992 1424.355203 643.420271 426.906642 327.812635 -826.6421699
27 2017 1845.162609 1780.552030 1446.687191 644.395728 428.486050 329.880970 -781.2093236
28 2018 1874.726295 1799.826648 1507.647460 657.914982 444.233499 377.368858 -769.266566
29 2019 1874.291137 1650.961498 1521.665941 663.895669 452.139788 384.178158 -730.4876898
30 2020 1627.618555 1482.182865 1426.194536 635.105961 425.312514 361.952257 -758.9433076

We can immediately see that the 2009 value for 'Land use, land-use change, and forestry (net sink)' is -73448.90%. For some reason, it is stored with a '%' sign, and the value needs to be scaled down by a factor of 100.

There are rarely any perfectly clean datasets! We often need to clean the data ourselves. Let us get to it.
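Before cleaning, we can confirm programmatically which rows contain the stray '%' (a sketch; note that the year column is still named 'Unnamed: 0' at this point):

```{python}
col = 'Land use, land-use change, and forestry (net sink)'
mask = df[col].astype(str).str.contains('%')
print(df.loc[mask, ['Unnamed: 0', col]])
```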

df.describe()
Unnamed: 0 Transportation Electricity generation Industry Agriculture Commercial Residential
count 31.000000 31.000000 31.000000 31.000000 31.000000 31.000000 31.000000
mean 2005.000000 1789.294855 2106.308610 1550.196935 625.114595 419.426581 362.401157
std 9.092121 138.775245 252.071204 96.760250 20.358572 14.814310 20.424532
min 1990.000000 1480.303906 1482.182865 1345.926535 587.493379 391.901762 306.943124
25% 1997.500000 1736.080699 1921.414694 1469.282266 613.427539 408.525647 353.007335
50% 2005.000000 1796.295049 2092.532800 1550.961500 622.907953 419.341960 363.260147
75% 2012.500000 1880.080591 2320.358221 1629.479395 641.875004 427.797922 377.722871
max 2020.000000 1975.865366 2467.226239 1705.172770 663.895669 452.139788 399.454803

df.rename(columns={'Unnamed: 0': "Year"}, inplace = True)

#for some reason, 2009 data for "Land use, .." has an extra % that is creating issues;
df['Land use, land-use change, and forestry (net sink)'] = df['Land use, land-use change, and forestry (net sink)'].astype("str")

# need to remove % manually or through regex for 2009
df['Land use, land-use change, and forestry (net sink)'] = df['Land use, land-use change, and forestry (net sink)'].str.replace('%','')

df['Land use, land-use change, and forestry (net sink)'] = df['Land use, land-use change, and forestry (net sink)'].astype("float")

# divide the specific value by 100 as it was in % before
df.loc[(df['Year'] == 2009), 'Land use, land-use change, and forestry (net sink)'] = df.loc[(df['Year'] == 2009), 'Land use, land-use change, and forestry (net sink)']*0.01

Let us unpack the code step-by-step:

  1. We renamed the column 'Unnamed: 0' to 'Year'. The inplace=True parameter ensures that the change is made directly to the DataFrame df without needing to reassign it.
  2. We converted the 'Land use, land-use change, and forestry (net sink)' column to a string type. This is necessary because the next operation involves string manipulation.
  3. We removed any '%' characters from the 'Land use, land-use change, and forestry (net sink)' column. The str.replace('%', '') method replaces '%' with an empty string.
  4. We converted the 'Land use, land-use change, and forestry (net sink)' column back to a float type. This is necessary for numerical operations.
  5. We selected the rows where 'Year' is 2009 (df.loc[df['Year'] == 2009]) and multiplied the 'Land use, land-use change, and forestry (net sink)' values by 0.01 (i.e., divided them by 100), because the original value was stored in percentage form. A reusable version of these steps is sketched below.
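For future revisions of the data, it may be handy to collect these steps into one small helper (a sketch of our own; it assumes the only problems are stray whitespace, commas, and '%' signs):

```{python}
def clean_numeric_column(series):
    """Strip whitespace, commas, and '%' signs, then convert to float.
    Values that carried a '%' sign are divided by 100."""
    s = series.astype(str).str.strip()
    is_percent = s.str.endswith('%')
    s = s.str.replace('%', '', regex=False).str.replace(',', '', regex=False)
    values = s.astype(float)
    values[is_percent] = values[is_percent] / 100
    return values

# Example usage (equivalent to the manual steps above):
# df['Land use, land-use change, and forestry (net sink)'] = clean_numeric_column(
#     df['Land use, land-use change, and forestry (net sink)'])
```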

We can do descriptive stats on the numerical variables.

df.describe()
Year Transportation Electricity generation Industry Agriculture Commercial Residential Land use, land-use change, and forestry (net sink)
count 31.000000 31.000000 31.000000 31.000000 31.000000 31.000000 31.000000 31.000000
mean 2005.000000 1789.294855 2106.308610 1550.196935 625.114595 419.426581 362.401157 -801.018853
std 9.092121 138.775245 252.071204 96.760250 20.358572 14.814310 20.424532 44.477461
min 1990.000000 1480.303906 1482.182865 1345.926535 587.493379 391.901762 306.943124 -869.908289
25% 1997.500000 1736.080699 1921.414694 1469.282266 613.427539 408.525647 353.007335 -832.609112
50% 2005.000000 1796.295049 2092.532800 1550.961500 622.907953 419.341960 363.260147 -800.729015
75% 2012.500000 1880.080591 2320.358221 1629.479395 641.875004 427.797922 377.722871 -768.340414
max 2020.000000 1975.865366 2467.226239 1705.172770 663.895669 452.139788 399.454803 -700.066410

Now that we have all the variables in numerical format, we can use plotly to plot the figures.

However, things are never this simple. We reran the code and updated the dataset in the latest revision (August 2024).

It appeared that all we had to do was change the URL to reflect the updated data.

url = "https://www.epa.gov/system/files/other-files/2024-06/us-ghg-emissions_fig-2.csv"

df = url_data_request(url, 6 )

Let us look at this updated data.

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 8 columns):
 #   Column                                              Non-Null Count  Dtype  
---  ------                                              --------------  -----  
 0   Unnamed: 0                                          33 non-null     int64  
 1   Transportation                                      33 non-null     object 
 2   Electric power industry                             33 non-null     object 
 3   Industry                                            33 non-null     object 
 4   Agriculture                                         33 non-null     float64
 5   Commercial                                          33 non-null     float64
 6   Residential                                         33 non-null     float64
 7   Land use, land-use change, and forestry (net sink)  33 non-null     float64
dtypes: float64(4), int64(1), object(3)
memory usage: 2.2+ KB

Now we notice that 'Land use, land-use change, and forestry (net sink)' is in float64 numerical format, but some other variables, like 'Transportation', 'Electric power industry', and 'Industry', are of object data type. These were all float64 in the 2022 version of the data.

Our code would have given errors because we wouldn’t have been able to plot these variables.

There are always some changes to the data or the data structure, so it is always useful to check the data before running visualization or analysis code.
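One lightweight way to do that check (a sketch, using the column names in this file) is to list the columns we plan to plot that are not yet numeric:

```{python}
expected_numeric = ['Transportation', 'Electric power industry', 'Industry',
                    'Agriculture', 'Commercial', 'Residential',
                    'Land use, land-use change, and forestry (net sink)']
not_numeric = [c for c in expected_numeric
               if not pd.api.types.is_numeric_dtype(df[c])]
print("Columns that still need cleaning:", not_numeric)
```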

What seems to be the issue?

df
Unnamed: 0 Transportation Electric power industry Industry Agriculture Commercial Residential Land use, land-use change, and forestry (net sink)
0 1990 1,521.42 1,880.18 1,723.32 595.946054 447.006292 345.597115 -976.694756
1 1991 1,474.77 1,874.41 1,702.87 587.411997 454.452721 355.264350 -988.981068
2 1992 1,533.76 1,889.64 1,729.28 587.511141 449.970155 361.808133 -1007.385245
3 1993 1,570.22 1,964.99 1,701.00 608.733193 443.082240 373.147538 -991.346525
4 1994 1,624.55 1,989.60 1,719.67 612.158538 446.051460 363.773428 -1005.221528
5 1995 1,659.12 2,006.05 1,741.77 618.123912 444.491719 367.292045 -980.231228
6 1996 1,714.59 2,078.44 1,762.11 625.693883 451.625773 399.137915 -1011.174579
7 1997 1,740.50 2,143.76 1,765.41 610.728830 442.880099 380.425938 -985.194814
8 1998 1,782.67 2,230.09 1,757.76 621.023100 416.338725 346.522958 -996.334680
9 1999 1,853.62 2,243.49 1,703.94 614.986761 411.817874 366.325388 -968.475616
10 2000 1,903.74 2,350.06 1,699.09 607.226572 425.633761 387.642129 -983.687718
11 2001 1,875.44 2,310.78 1,631.35 622.405680 414.052081 377.790426 -979.752372
12 2002 1,916.20 2,326.47 1,613.80 628.402932 415.310841 375.113216 -924.826536
13 2003 1,923.66 2,357.97 1,593.64 627.684215 431.720081 393.678826 -957.908300
14 2004 1,956.16 2,391.13 1,639.83 633.535447 429.132623 381.913394 -860.183752
15 2005 1,965.92 2,457.45 1,587.26 634.303237 418.863933 371.189484 -907.696609
16 2006 1,966.34 2,402.37 1,624.49 638.728891 404.901417 334.363826 -951.935738
17 2007 1,967.19 2,468.32 1,611.63 654.903372 418.425064 355.266743 -900.321359
18 2008 1,863.44 2,415.24 1,571.71 643.038956 425.324512 363.827773 -912.806240
19 2009 1,789.02 2,198.66 1,415.50 639.829854 428.256574 353.866061 -850.877697
20 2010 1,795.15 2,313.10 1,488.57 647.142320 430.587609 355.005227 -886.324754
21 2011 1,762.38 2,210.22 1,488.39 641.226366 425.528968 348.859719 -941.730797
22 2012 1,743.52 2,072.66 1,473.16 624.385259 406.456125 306.480072 -929.233916
23 2013 1,746.77 2,090.99 1,533.83 658.485864 429.200747 357.202066 -886.612352
24 2014 1,780.99 2,091.04 1,526.88 660.868014 439.358052 377.618245 -923.660383
25 2015 1,789.41 1,951.70 1,506.06 657.352663 451.674786 350.689960 -820.192444
26 2016 1,824.49 1,859.33 1,456.23 650.363698 435.630707 327.026529 -916.798491
27 2017 1,841.92 1,779.36 1,478.41 658.503899 437.389191 329.161536 -926.002254
28 2018 1,871.61 1,799.18 1,541.87 683.533939 453.481029 376.819523 -915.499704
29 2019 1,874.55 1,650.75 1,531.80 661.035227 462.630952 384.210676 -863.576006
30 2020 1,625.28 1,482.17 1,435.91 640.048676 436.915126 358.042212 -904.394729
31 2021 1,805.47 1,584.45 1,455.80 645.876462 443.663063 369.609906 -910.553723
32 2022 1801.520909 1577.493448 1452.536166 633.964509 463.661569 391.301657 -854.238577

A quick look at the data shows that these variables have two decimals for every year except 2022, which is shown with full decimal precision. Could this inconsistent formatting be what is causing the variables to be read as objects now?

df.describe()
Unnamed: 0 Agriculture Commercial Residential Land use, land-use change, and forestry (net sink)
count 33.00000 33.000000 33.000000 33.000000 33.000000
mean 2006.00000 632.580711 434.409572 363.211334 -933.934985
std 9.66954 22.453809 15.853136 20.528880 51.739532
min 1990.00000 587.411997 404.901417 306.480072 -1011.174579
25% 1998.00000 618.123912 425.324512 353.866061 -980.231228
50% 2006.00000 633.964509 435.630707 363.827773 -926.002254
75% 2014.00000 647.142320 446.051460 377.618245 -904.394729
max 2022.00000 683.533939 463.661569 399.137915 -820.192444

We can’t use these variables in plotly until we make them numeric variables.

Thankfully, we just need to modify our old code slightly to adjust for these changes in data.

df['Transportation'] = df['Transportation'].astype("float")

It looks like this is not the issue! We get an error!

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File /Users/sc/Library/CloudStorage/GoogleDrive-nextgen360ga@gmail.com/My Drive/NextGen_Web/Climate_P1/Chapters/Part2/GHG.qmd:2
----> 1 df['Transportation'] = df['Transportation'].astype("float")

File /usr/local/lib/python3.11/site-packages/pandas/core/generic.py:6324, in NDFrame.astype(self, dtype, copy, errors)
   6317     results = [
   6318         self.iloc[:, i].astype(dtype, copy=copy)
   6319         for i in range(len(self.columns))
   6320     ]
   6322 else:
   6323     # else, only a single dtype is given
-> 6324     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6325     return self._constructor(new_data).__finalize__(self, method="astype")
   6327 # GH 33113: handle empty frame or series

File /usr/local/lib/python3.11/site-packages/pandas/core/internals/managers.py:451, in BaseBlockManager.astype(self, dtype, copy, errors)
    448 elif using_copy_on_write():
    449     copy = False
--> 451 return self.apply(
    452     "astype",
    453     dtype=dtype,
    454     copy=copy,
    455     errors=errors,
    456     using_cow=using_copy_on_write(),
    457 )

File /usr/local/lib/python3.11/site-packages/pandas/core/internals/managers.py:352, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    350         applied = b.apply(f, **kwargs)
    351     else:
--> 352         applied = getattr(b, f)(**kwargs)
    353     result_blocks = extend_blocks(applied, result_blocks)
    355 out = type(self).from_blocks(result_blocks, self.axes)

File /usr/local/lib/python3.11/site-packages/pandas/core/internals/blocks.py:511, in Block.astype(self, dtype, copy, errors, using_cow)
    491 """
    492 Coerce to the new dtype.
    493 
   (...)
    507 Block
    508 """
    509 values = self.values
--> 511 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    513 new_values = maybe_coerce_values(new_values)
    515 refs = None

File /usr/local/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:242, in astype_array_safe(values, dtype, copy, errors)
    239     dtype = dtype.numpy_dtype
    241 try:
--> 242     new_values = astype_array(values, dtype, copy=copy)
    243 except (ValueError, TypeError):
    244     # e.g. _astype_nansafe can fail on object-dtype of strings
    245     #  trying to convert to float
    246     if errors == "ignore":

File /usr/local/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:187, in astype_array(values, dtype, copy)
    184     values = values.astype(dtype, copy=copy)
    186 else:
--> 187     values = _astype_nansafe(values, dtype, copy=copy)
    189 # in pandas we don't store numpy str dtypes, so convert to object
    190 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):

File /usr/local/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:138, in _astype_nansafe(arr, dtype, copy, skipna)
    134     raise ValueError(msg)
    136 if copy or is_object_dtype(arr.dtype) or is_object_dtype(dtype):
    137     # Explicit copy, or required since NumPy can't view from / to object.
--> 138     return arr.astype(dtype, copy=True)
    140 return arr.astype(dtype, copy=copy)

ValueError: could not convert string to float: '1,521.42'

We can now see the issue. Ignoring everything else, we can see from the error message that:

  1. ----> 1 df['Transportation'] = df['Transportation'].astype("float"). This line of code gave the error.
  2. The last line of the error hints at the problem:
ValueError: could not convert string to float: '1,521.42'

It appears that the values are formatted as numbers with a “,” to signify thousands. This is causing pandas to read the variable as an object data type. If this is the issue, it is easy to fix!

df['Transportation'] = df['Transportation'].str.replace(',', '').astype("float")

What did we do?

  • We used str.replace method to remove commas from the ‘Transportation’ column.
  • We converted the cleaned strings to float using the astype method.
  • We chained the methods so that the code is shorter.

Did it work? Let's do descriptive statistics on the numerical variables.

df.describe()
Unnamed: 0 Transportation Agriculture Commercial Residential Land use, land-use change, and forestry (net sink)
count 33.00000 33.000000 33.000000 33.000000 33.000000 33.000000
mean 2006.00000 1783.799725 632.580711 434.409572 363.211334 -933.934985
std 9.66954 133.640242 22.453809 15.853136 20.528880 51.739532
min 1990.00000 1474.770000 587.411997 404.901417 306.480072 -1011.174579
25% 1998.00000 1740.500000 618.123912 425.324512 353.866061 -980.231228
50% 2006.00000 1795.150000 633.964509 435.630707 363.827773 -926.002254
75% 2014.00000 1874.550000 647.142320 446.051460 377.618245 -904.394729
max 2022.00000 1967.190000 683.533939 463.661569 399.137915 -820.192444

We now see the descriptive statistics for 'Transportation'.

Let's do the same for the other variables that are objects, as it appears that the ',' is causing the issue.

df['Electric power industry'] = df['Electric power industry'].str.replace(',', '').astype("float")
df['Industry'] = df['Industry'].str.replace(',', '').astype("float")

Let us rename the Unnamed column also.

df.rename(columns={'Unnamed: 0': "Year"}, inplace = True) 

The full code for the 2024 version of the data is

df.rename(columns={'Unnamed: 0': "Year"}, inplace = True) 
df['Transportation'] = df['Transportation'].str.replace(',', '').astype("float")
df['Electric power industry'] = df['Electric power industry'].str.replace(',', '').astype("float")
df['Industry'] = df['Industry'].str.replace(',', '').astype("float")
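Equivalently, instead of fixing the columns one by one, we could convert every remaining object-typed column in a single pass (a sketch; it assumes those columns are numeric apart from the comma separators):

```{python}
# Strip commas from every object-typed column and convert it to float
for col in df.select_dtypes(include='object').columns:
    df[col] = df[col].str.replace(',', '', regex=False).astype(float)
```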

We can see that the data is as we want.

df.describe()
Year Transportation Electric power industry Industry Agriculture Commercial Residential Land use, land-use change, and forestry (net sink)
count 33.00000 33.000000 33.000000 33.000000 33.000000 33.000000 33.000000 33.000000
mean 2006.00000 1783.799725 2073.986165 1595.905338 632.580711 434.409572 363.211334 -933.934985
std 9.66954 133.640242 275.443797 111.344067 22.453809 15.853136 20.528880 51.739532
min 1990.00000 1474.770000 1482.170000 1415.500000 587.411997 404.901417 306.480072 -1011.174579
25% 1998.00000 1740.500000 1880.180000 1488.570000 618.123912 425.324512 353.866061 -980.231228
50% 2006.00000 1795.150000 2090.990000 1593.640000 633.964509 435.630707 363.827773 -926.002254
75% 2014.00000 1874.550000 2313.100000 1702.870000 647.142320 446.051460 377.618245 -904.394729
max 2022.00000 1967.190000 2468.320000 1765.410000 683.533939 463.661569 399.137915 -820.192444

There are always some small changes to the code that we need to make, or some debugging to do if the code doesn't work.

Now that we have the data, we can plot it easily.

import plotly.express as px

df.dtypes
listcol = df.columns.tolist()
listcol.remove('Year')

fig = px.area(df, x = 'Year', y = listcol)
fig.show()
Figure: stacked area chart of all sector columns, including the negative land-use net-sink series (Plotly; x: Year; y: value).

Unfortunately, this is not what we wanted. The 'Land use, land-use change, and forestry (net sink)' values are negative and need to show up below zero on the y-axis.

We plotted a similar figure in an earlier temperature chapter, and we can adapt it. One benefit of spending some time to understand the code is that we will probably need it somewhere down the line.

By the way, between the 2022-07 version of the data and the 2024-06 version, the variable name 'Electricity generation' was changed to 'Electric power industry'. Honestly, we didn't notice. But when our old code ran into an error, we had to fix it.

What did we say! Real world data is messy! Some analyst formats the data in a different format or changes the definition of the variables or something else changes!

import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Create the figure (a single plot; the traces will be stacked)
fig = make_subplots(shared_yaxes=True)

# Add traces
fig.add_trace(go.Scatter(x=df["Year"], y= df['Electric power industry'], name="Electricity generation", fill='tonexty',    stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Transportation'], name="Transportation", fill='tonexty',     stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Industry'], name="Industry", fill='tozeroy',     stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Agriculture'], name="Agriculture", fill='tonexty',     stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Commercial'], name="Commercial", fill='tonexty',    stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Residential'], name="Residential", fill='tonexty',     stackgroup='one'))

fig.add_trace(
    go.Scatter(x=df["Year"], y= df["Land use, land-use change, and forestry (net sink)"], name="Land use (net sink)", fill='tozeroy')
)

fig.update_layout(
    yaxis_title = "Emissions (mn metric tons of carbon dioxide equivalents)",
    xaxis_title = "Year",
    title="U.S. Greenhouse Gas Emissions and Sinks by Economic Sector, 1990–2022", title_x=0.5,
    hovermode="x",
    legend=dict(x=0.3,y=0.5),
    barmode='relative'
)

fig.show()
Figure: U.S. Greenhouse Gas Emissions and Sinks by Economic Sector, 1990–2022 (Plotly stacked area chart; x: Year; y: Emissions in million metric tons of CO2 equivalents; the land-use net sink appears below zero).

Let us unpack the code, as we have made some changes from the previous code:

  1. As usual, we imported the necessary libraries: plotly.graph_objects and make_subplots
  2. make_subplots(shared_yaxes=True): Creates a subplot figure where all subplots share the same y-axis.
  3. We needed to use stackgroup so that it is cumulative. We did this by adding traces.
    • go.Scatter: Creates a scatter plot.
    • x=df["Year"]: Sets the x-axis data to the "Year" column of the dataframe df.
    • y=df['Electric power industry']: Sets the y-axis data to the 'Electric power industry' column of the dataframe df.
    • name: Sets the name of the trace, which will appear in the legend.
    • fill: Specifies how the area under the line should be filled.
    • stackgroup: Groups the traces together for stacking.
  4. We do the same for other columns.
  5. We add a separate trace that is similar to the others but represents “Land use, land-use change, and forestry (net sink)”. This is the only variable with negative values.
  6. Then we updated the layout by
    • yaxis_title: Sets the title for the y-axis.
    • xaxis_title: Sets the title for the x-axis.
    • title: Sets the main title of the plot.
    • title_x=0.5: Centers the title.
    • hovermode="x": Displays hover information for all traces at the x-coordinate.
    • legend=dict(x=0.3,y=0.5): Positions the legend.
    • barmode='relative': Sets the bar mode to 'relative', useful for stacked bar charts.

This shows the absolute values of the emissions. Let us also plot the relative values.

import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Create the figure for the relative (percent) version of the chart
fig = make_subplots(shared_yaxes=True)

# Add traces
fig.add_trace(go.Scatter(x=df["Year"], y= df['Electric power industry'], name="Electricity generation %", fill='tonexty',    stackgroup='one', groupnorm='percent'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Transportation'], name="Transportation %", fill='tonexty',     stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Industry'], name="Industry %", fill='tozeroy',     stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Agriculture'], name="Agriculture %", fill='tonexty',     stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Commercial'], name="Commercial %", fill='tonexty',    stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Residential'], name="Residential %", fill='tonexty',     stackgroup='one'))

fig.update_layout(
    yaxis_title = "Share of total emissions (percent)",
    xaxis_title = "Year",
    title= "<b><span> U.S. Greenhouse Gas Emissions and Sinks by Economic Sector, 1990–2022 <br> </span> <span>(Relative Contribution)</span></b>", title_x=0.5,
    hovermode="x",
    legend=dict(x=0.3,y=0.5),
    barmode='relative'
)

fig.show()
Figure: U.S. Greenhouse Gas Emissions and Sinks by Economic Sector, 1990–2022 (Relative Contribution) (Plotly stacked area chart normalized to 100 percent; x: Year).

In this figure, we have normalized all the emissions so that they add up to 100% (we have excluded the negative land-sink values). We can clearly see from the figure that the order of emission contributions is:

  1. Electric power, which used to be the top sector until around 2016 and is now in second place
  2. Transportation, which used to be in second place and has now moved to first place
  3. Industry

These three sectors together account for almost 80% of U.S. greenhouse gas emissions. The rest is made up of:

  1. Agriculture
  2. Commercial
  3. Residential sectors

We will dig deeper into the contributions of electricity and transportation in later chapters. But it is obvious that, in order to achieve a meaningful reduction in GHG emissions, the emissions from these three sectors have to come down significantly.

What are the key points from EPA for this figure?

Among the various sectors of the U.S. economy, transportation accounts for the largest share—28.4 percent—of 2022 emissions. Transportation has been the largest contributing sector since 2017. Electric power (power plants) accounts for the next largest share of 2022 emissions, accounting for approximately 25 percent of emissions. Electric power has historically been the largest sector, accounting for 30 percent of emissions since 1990 (Figure 2).

Removals from sinks, the opposite of emissions from sources, absorb or sequester carbon dioxide from the atmosphere. In 2022, 13 percent of U.S. greenhouse gas emissions were offset by net sinks resulting from land use and forestry practices (Figure 2). One major sink is the net growth of forests, including urban trees, which remove carbon from the atmosphere. Other carbon sinks are associated with how people manage and use the land, including the practice of depositing yard trimmings and food scraps in landfills. While the land use, land-use change, and forestry category represents an overall net sink of carbon dioxide in the United States, this category also includes emission sources resulting from activities such as wildfires, converting land to cropland, and emissions from flooded lands such as reservoirs.
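We can roughly verify that 13 percent figure from our cleaned sector data (a sketch; the sector totals here exclude U.S. territories, so the ratio is only approximate):

```{python}
sector_cols = ['Transportation', 'Electric power industry', 'Industry',
               'Agriculture', 'Commercial', 'Residential']
row_2022 = df.loc[df['Year'] == 2022].iloc[0]

gross_2022 = row_2022[sector_cols].sum()
sink_2022 = -row_2022['Land use, land-use change, and forestry (net sink)']

print(f"Net sinks offset about {100 * sink_2022 / gross_2022:.1f}% of 2022 gross emissions")
```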

14.3.1 Emissions per Capita

EPA also shows the emissions per capita and per dollar of GDP and how they have changed over time.

EPA per capita and per $GDP

EPA’s legend for the figure is

This figure shows trends in greenhouse gas emissions from 1990 to 2022 per capita (heavy orange line), based on the total U.S. population (thin orange line). It also shows trends in emissions per dollar of real GDP (heavy blue line). Real GDP (thin blue line) is the value of all goods and services produced in the country during a given year, adjusted for inflation. All data are indexed to 1990 as the base year, which is assigned a value of 100. For instance, a real GDP value of 217 in the year 2022 would represent a 117 percent increase since 1990.

Let us get to plotting the data.

By the way, we commented out the old URL as new data became available during the revision.

#url = "https://www.epa.gov/system/files/other-files/2022-07/us-ghg-emissions_fig-3.csv"
url = "https://www.epa.gov/system/files/other-files/2024-06/us-ghg-emissions_fig-3.csv"
df_ghg_percapita = url_data_request(url, 6)

df_ghg_percapita.dtypes
df_ghg_percapita

listcol = df_ghg_percapita.columns.tolist()
listcol.remove('Year')

fig = px.line(df_ghg_percapita, x = 'Year', y = listcol)

fig.update_layout(
    yaxis_title="Index Value (1990=100)",
    title='U.S. Greenhouse Gas Emissions per Capita and per Dollar of GDP, 1990–2022', title_x=0.5,
    hovermode="x",
    legend=dict(x=0.2,y=1)
)
fig.update_layout(legend={'title_text':''})
fig.show()
Figure: U.S. Greenhouse Gas Emissions per Capita and per Dollar of GDP, 1990–2022 (Plotly line chart of Real GDP, Population, Emissions per capita, and Emissions per $GDP; x: Year; y: Index Value, 1990=100).

We can plot this data by making minor modifications to the code shown earlier. It should be pretty straightforward by now. We have used a line graph this time (px.line).

From the figure, we can see, as EPA mentions in the key points:

Emissions increased at about the same rate as the population from 1990 to 2007, which caused emissions per capita to remain fairly level (Figure 3). Total emissions and emissions per capita declined from 2007 to 2009, due in part to a drop in U.S. economic production during this time. Emissions decreased again from 2010 to 2012 and continued downward largely due to the growing use of natural gas and renewables to generate electricity in place of more carbon-intensive fuels.

From 1990 to 2022, greenhouse gas emissions per dollar of goods and services produced by the U.S. economy (the gross domestic product or GDP) declined by 55 percent (Figure 3). This change may reflect a combination of increased energy efficiency and structural changes in the economy.
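We can check the 55 percent decline against the downloaded index data (a sketch; the column name 'Emissions per $GDP' is taken from the figure legend and may differ slightly in the raw CSV):

```{python}
col = 'Emissions per $GDP'               # assumed column name from the legend
base = df_ghg_percapita[col].iloc[0]     # 1990 value, indexed to 100
latest = df_ghg_percapita[col].iloc[-1]  # most recent year in the file
print(f"Change in emissions per dollar of GDP since 1990: {100 * (latest - base) / base:.0f}%")
```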
