14 Greenhouse Gas Emissions

In this section, we will analyze Greenhouse Gas (GHG) emissions.

What are Greenhouse Gases and why do we care about them?

As before, we will rely on EPA or other governmental agencies, multinational agencies or some scientific articles for evidence.

EPA highlights that:

Gases that trap heat in the atmosphere are called greenhouse gases. As greenhouse gas emissions from human activities increase, they build up in the atmosphere and warm the climate, leading to many other changes around the world—in the atmosphere, on land, and in the oceans.

EPA gives detailed information on Greenhouse Gases at https://www.epa.gov/ghgemissions/overview-greenhouse-gases. There main Greenhouse Gases that we will analyze are:

Carbon dioxide (CO2): Carbon dioxide enters the atmosphere through burning fossil fuels (coal, natural gas, and oil), solid waste, trees and other biological materials, and also as a result of certain chemical reactions (e.g., cement production). Carbon dioxide is removed from the atmosphere (or “sequestered”) when it is absorbed by plants as part of the biological carbon cycle.

Methane (CH4): Methane is emitted during the production and transport of coal, natural gas, and oil. Methane emissions also result from livestock and other agricultural practices, land use, and by the decay of organic waste in municipal solid waste landfills.

Nitrous oxide (N2O): Nitrous oxide is emitted during agricultural, land use, and industrial activities; combustion of fossil fuels and solid waste; as well as during treatment of wastewater.

Fluorinated gases: Hydrofluorocarbons, perfluorocarbons, sulfur hexafluoride, and nitrogen trifluoride are synthetic, powerful greenhouse gases that are emitted from a variety of household, commercial, and industrial applications and processes.

We can see the composition of total GHG emissions as of 2022 for the U.S. from EPA.

Total U.S. Emissions in 2022 = 6,343 Million Metric Tons of CO₂ equivalent.

14.1 Carbon Dioxide (CO2)

Lets us start with CO2. Why do we care about CO2?

First, CO2 accounts for almost 80% of the Greenhouse Gases.

Second, there is a steep increase in the CO2 over the past 10,000 years. The increase over the past 100 years or so is alarming if the trends continue.

Before diving deeper into the sources and causes of CO2 emissions, let us first reproduce the EPA graphs on CO2 emissions.

This figure shows emissions of carbon dioxide, methane, nitrous oxide, and several fluorinated gases in the United States from 1990 to 2022. For consistency, emissions are expressed in million metric tons of carbon dioxide equivalents.

We will repeat the steps that we have undertaken in part 1 of the book to reproduce the figure. So, we won’t give detailed explanations as many of these are covered before.

The data for the figure is available at https://www.epa.gov/system/files/other-files/2024-06/us-ghg-emissions_fig-1.csv

import pandas as pd

url = "https://www.epa.gov/system/files/other-files/2024-06/us-ghg-emissions_fig-1.csv"

from io import StringIO
import pandas as pd
import requests

def url_data_request(url, skiprows):
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0",
    }
    f = StringIO(requests.get(url, headers=headers).text)
    df_url = pd.read_csv(f, skiprows = skiprows)
    return df_url

df = url_data_request(url, 6)

What does the code do? As we moved into Part 2 and AI agents have become much better, we can ask Copilot to explain the code. We edited the explanation to make it shorter.

The provided code snippet is written in Python and is used to fetch and process a CSV file from a given URL.
The code begins by importing the necessary libraries. The pandas library is imported twice, which is redundant, but it is a powerful data manipulation library used to handle the CSV data.
- Our mistake! We missed it, but Copilot doesn’t! There is no harm in importing twice but it is redundant.
The StringIO class from the io module is imported to handle string data as file-like objects, and the requests library is imported to make HTTP requests.
The URL of the CSV file is stored in the variable url. This URL points to a CSV file hosted on the EPA’s website, which contains data on U.S. greenhouse gas emissions.
A function named url_data_request is defined to fetch and process the CSV data from the URL. This function takes two parameters: url and skiprows.
- The url parameter is the URL of the CSV file, and skiprows specifies the number of rows to skip at the beginning of the file.
- This is useful for skipping metadata or header information that is not part of the actual data.
Inside the function, a dictionary named headers is defined to mimic a web browser’s request. This is done to avoid potential issues with web servers that block requests from non-browser clients.
The requests.get function is used to fetch the content of the URL, and the response text is passed to StringIO to create a file-like object. This object is then read into a Pandas DataFrame using the pd.read_csv function, with the skiprows parameter to skip the specified number of rows.
Finally, the function returns the DataFrame containing the CSV data.
The url_data_request function is called with the URL and skiprows set to 6, and the resulting DataFrame is stored in the variable df.
This DataFrame now contains the processed data from the CSV file, ready for further analysis or visualization.

As you can see, the explanation is more detailed and useful than what we would have written. So we decided to use it, with some minor edits.

As before, the first step with any dataset is always to get some basic info.

nrows, ncol = df.shape

We can see from df.shape that there are 33 rows and 5 variables.

We can do descriptive stats on the numerical variables.

df.describe()

	Year	Methane	Nitrous oxide	HFCs, PFCs, SF6, and NF3
count	33.00000	33.000000	33.000000	33.000000
mean	2006.00000	802.654633	417.133412	163.509376
std	9.66954	47.816996	13.205529	20.770491
min	1990.00000	702.354745	389.707858	118.265427
25%	1998.00000	764.386625	410.943810	155.868137
50%	2006.00000	806.645870	419.175093	169.045426
75%	2014.00000	835.477487	423.413845	176.571344
max	2022.00000	879.645413	439.607792	198.128038

As we can see, the data is from 1990-2022 and includes emissions data on methane, nitrous oxide, and several fluorinated gases in the United States (re expressed in million metric tons of carbon dioxide equivalents).

That’s only 4 variables or columns, including year. Where did the fifth column go?

We should have first checked the data types in the dataset.

df.dtypes

Year                          int64
Carbon dioxide               object
Methane                     float64
Nitrous oxide               float64
HFCs, PFCs, SF6, and NF3    float64
dtype: object

We can see that Carbon dioxide (CO2) is an object and that’s why the df.describe() didn’t give descriptive statistics on it.

Why is it an object type? We have no idea!

Looking at the data, all the values look numerical.

df

	Year	Carbon dioxide	Methane	Nitrous oxide	HFCs, PFCs, SF6, and NF3
0	1990	5,131.65	871.658825	408.151400	125.454537
1	1991	5,076.30	879.645413	398.477140	118.265427
2	1992	5,178.73	877.381636	397.294008	122.945240
3	1993	5,279.48	864.852491	418.461318	123.662121
4	1994	5,361.82	873.401867	419.175093	128.238674
5	1995	5,425.80	866.586056	423.495394	146.256594
6	1996	5,601.06	861.737879	435.519013	157.943424
7	1997	5,673.11	847.921716	419.608268	168.341026
8	1998	5,736.33	835.477487	423.373150	183.810398
9	1999	5,800.25	815.812634	418.744132	183.826627
10	2000	6,023.13	815.701845	410.943810	170.107309
11	2001	5,903.53	807.084109	421.725702	154.972910
12	2002	5,947.79	801.091353	422.934556	158.077394
13	2003	6,006.42	801.128955	423.413845	155.928303
14	2004	6,111.52	796.433638	428.505590	155.868137
15	2005	6,126.86	795.429181	419.189952	153.153855
16	2006	6,045.30	806.645870	420.562160	156.667115
17	2007	6,121.52	810.254941	431.642027	166.486877
18	2008	5,919.21	821.582197	416.514826	169.045426
19	2009	5,486.09	806.775967	410.066543	163.987454
20	2010	5,668.74	807.598702	418.263291	171.403236
21	2011	5,540.33	778.288480	419.850882	173.296084
22	2012	5,332.81	764.063618	392.793611	171.237077
23	2013	5,473.66	766.205413	434.022492	171.519475
24	2014	5,529.88	762.391457	439.607792	173.968019
25	2015	5,368.29	764.386625	426.965957	176.571344
26	2016	5,246.40	744.590524	413.408295	175.634227
27	2017	5,196.54	759.480075	417.818944	177.188295
28	2018	5,362.19	771.481067	439.453215	179.624212
29	2019	5,234.49	754.341139	416.377483	184.922055
30	2020	4,688.97	735.348832	391.167731	186.321505
31	2021	5,017.20	720.468152	398.167133	192.956988
32	2022	5053.018689	702.354745	389.707858	198.128038

All the values are included with two decimals whereas the value for 2022 has 6 decimals. We suspect this may be an issue. But we don’t know.

AI is smart but real world is messy! There are always some cases where the data is slightly different or there is some nuance that makes the AI generated code not work. So, it is still useful to learn and understand coding!

Let us read object type as a float type variable.

```{python}
df['Carbon dioxide'] = df['Carbon dioxide'].astype(float)
```

Unfortunately, it doesn’t work :(

We get an error.

 ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[19], line 1
----> 1 df['Carbon dioxide'] = df['Carbon dioxide'].astype(float)

File /usr/local/lib/python3.11/site-packages/pandas/core/generic.py:6324, in NDFrame.astype(self, dtype, copy, errors)
   6317     results = [
   6318         self.iloc[:, i].astype(dtype, copy=copy)
   6319         for i in range(len(self.columns))
   6320     ]
   6322 else:
   6323     # else, only a single dtype is given
-> 6324     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6325     return self._constructor(new_data).__finalize__(self, method="astype")
   6327 # GH 33113: handle empty frame or series

File /usr/local/lib/python3.11/site-packages/pandas/core/internals/managers.py:451, in BaseBlockManager.astype(self, dtype, copy, errors)
    448 elif using_copy_on_write():
    449     copy = False
--> 451 return self.apply(
    452     "astype",
    453     dtype=dtype,
    454     copy=copy,
    455     errors=errors,
    456     using_cow=using_copy_on_write(),
    457 )

File /usr/local/lib/python3.11/site-packages/pandas/core/internals/managers.py:352, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    350         applied = b.apply(f, **kwargs)
    351     else:
--> 352         applied = getattr(b, f)(**kwargs)
    353     result_blocks = extend_blocks(applied, result_blocks)
    355 out = type(self).from_blocks(result_blocks, self.axes)

File /usr/local/lib/python3.11/site-packages/pandas/core/internals/blocks.py:511, in Block.astype(self, dtype, copy, errors, using_cow)
    491 """
    492 Coerce to the new dtype.
    493 
   (...)
    507 Block
    508 """
    509 values = self.values
--> 511 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    513 new_values = maybe_coerce_values(new_values)
    515 refs = None

File /usr/local/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:242, in astype_array_safe(values, dtype, copy, errors)
    239     dtype = dtype.numpy_dtype
    241 try:
--> 242     new_values = astype_array(values, dtype, copy=copy)
    243 except (ValueError, TypeError):
    244     # e.g. _astype_nansafe can fail on object-dtype of strings
    245     #  trying to convert to float
    246     if errors == "ignore":

File /usr/local/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:187, in astype_array(values, dtype, copy)
    184     values = values.astype(dtype, copy=copy)
    186 else:
--> 187     values = _astype_nansafe(values, dtype, copy=copy)
    189 # in pandas we don't store numpy str dtypes, so convert to object
    190 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):

File /usr/local/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:138, in _astype_nansafe(arr, dtype, copy, skipna)
    134     raise ValueError(msg)
    136 if copy or is_object_dtype(arr.dtype) or is_object_dtype(dtype):
    137     # Explicit copy, or required since NumPy can't view from / to object.
--> 138     return arr.astype(dtype, copy=True)
    140 return arr.astype(dtype, copy=copy)

ValueError: could not convert string to float: '5,131.65'

Didn’t we say real world data is messy!!!

We have to do some snooping but it appears that 5,131.65 is a read as a string, for whatever reason.

Now that we know there are some values of ‘Carbon dioxide’ column / variable stored as ‘string’, we can do some data cleaning.

df['Carbon dioxide'] = df['Carbon dioxide'].str.replace(',', '') 
df['Carbon dioxide'] = df['Carbon dioxide'].str.strip() 

df['Carbon dioxide'] = df['Carbon dioxide'].astype(float)

Basically, we removed commas and leading and trailing empty spaced and then read the string as a float.

Did it work? Let us check!

df.describe()

	Year	Carbon dioxide	Methane	Nitrous oxide	HFCs, PFCs, SF6, and NF3
count	33.00000	33.000000	33.000000	33.000000	33.000000
mean	2006.00000	5535.406627	802.654633	417.133412	163.509376
std	9.66954	377.903314	47.816996	13.205529	20.770491
min	1990.00000	4688.970000	702.354745	389.707858	118.265427
25%	1998.00000	5246.400000	764.386625	410.943810	155.868137
50%	2006.00000	5486.090000	806.645870	419.175093	169.045426
75%	2014.00000	5903.530000	835.477487	423.413845	176.571344
max	2022.00000	6126.860000	879.645413	439.607792	198.128038

Now, we see descriptive statistics for ‘Carbon dioxide’ also! The code worked!!

It is easy to plot these using Plotly.

Let us focus on ‘Carbon dioxide’ so that we can see if there are significant increases over time.

import plotly.express as px

fig = px.line(df, x='Year', y=['Carbon dioxide'], 
              labels={'value': 'Emissions (million metric tons of CO2 equivalents)', 'variable': 'Gas'},
              title='CO2 Emissions Over Time')

fig.update_layout(
    yaxis_title="Emissions (million metric tons of CO2 equivalents)",
    xaxis_title="Year",
    title_x=0.5,
    hovermode="x",
    legend=dict(title="CO2 Emissions", x=0.2, y=0.3)  
)
fig.show()

Now, we can see that the CO2 emissions steadily increased from 1990 to 2007 or so but decreased later and now at around the 1990 levels. Scale matters for graphing!

14.2 Other GHG Emissions

Let us see the other GHG that are at a different scale than CO2 (that accounts for approximately 80% of the GHG emissions)

import plotly.express as px

fig = px.line(df, x='Year', y=['Carbon dioxide', 'Methane', 'Nitrous oxide', 'HFCs, PFCs, SF6, and NF3'], 
              labels={'value': 'Emissions (million metric tons of CO2 equivalents)', 'variable': 'Gas'},
              title='Greenhouse Gas Emissions Over Time')

fig.update_layout(
    yaxis_title="Emissions (million metric tons of CO2 equivalents)",
    xaxis_title="Year",
    title_x=0.5,
    hovermode="x",
    legend=dict(title="GhG Emissions", x=0.2, y=0.3)  
)
    
fig.show()

This graph is not as useful as ‘Carbon dioxide’ emissions dominate and relative to that, every other gas emission seems smaller.

We can just focus on other GHG gases in the figure to see the trends clearly.

import plotly.express as px

fig = px.line(df, x='Year', y=['Methane', 'Nitrous oxide', 'HFCs, PFCs, SF6, and NF3'], 
              labels={'value': 'Emissions (million metric tons of CO2 equivalents)', 'variable': 'Gas'},
              title='Greenhouse Gas Emissions Over Time')

fig.update_layout(
    yaxis_title="Emissions (million metric tons of CO2 equivalents)",
    xaxis_title="Year",
    title_x=0.5,
    hovermode="x",
    legend=dict(title="GhG Emissions", x=0.1, y=0.8)  
)
fig.show()

We see that ‘Nitrous Oxide’ is more or less stable over the period 1990-2022.

‘HFCs’ seem to have increased over the 1990-2022 time period.

Now that we see the different GhG emissions on different scales, let us combine them.

import plotly.express as px

fig = px.area(df, x='Year', y=['Carbon dioxide', 'Methane', 'Nitrous oxide', 'HFCs, PFCs, SF6, and NF3'], 
              labels={'value': 'Emissions (million metric tons of CO2 equivalents)', 'variable': 'Gas'},
              title='Greenhouse Gas Emissions Over Time')

fig.update_layout(
    yaxis_title="Emissions (million metric tons of CO2 equivalents)",
    xaxis_title="Year",
    title_x=0.5,
    hovermode="x",
    legend=dict(title="GhG Emissions", x=0.3, y=0.3)  
)
fig.show()

Now we see a graph that looks more like the Figure 1 from EPA for GHG emissions.

What are the key insights from this figure? EPA states:

In 2022, U.S. greenhouse gas emissions totaled 6,343 million metric tons (14.0 trillion pounds) of carbon dioxide equivalents. This total represents a 3.0 percent decrease since 1990, down from a high of 15.2 percent above 1990 levels in 2007. The sharp decline in emissions from 2019 to 2020 was largely due to the impacts of the coronavirus (COVID-19) pandemic on travel and economic activity. Emissions increased from 2020 to 2022 by 5.7 percent, driven largely by an increase in carbon dioxide emissions from fossil fuel combustion due to economic activity rebounding after the height of the pandemic (Figure 1).

For the United States, during the period from 1990 to 2022 (Figure 1): - Emissions of carbon dioxide, the primary greenhouse gas emitted by human activities, decreased by 2 percent. - Methane emissions decreased by 19 percent, as reduced emissions from landfills, coal mines, and natural gas systems more than offset increases in emissions from activities such as livestock production. - Nitrous oxide emissions, predominantly from agricultural soil management practices such as the use of nitrogen as a fertilizer, decreased by 5 percent. - Emissions of fluorinated gases (hydrofluorocarbons, perfluorocarbons, sulfur hexafluoride, and nitrogen trifluoride), released as a result of commercial, industrial, and household uses, increased by 58 percent.

14.3 U.S. Greenhouse Gas Emissions and Sinks by Economic Sector, 1990–2022

EPA also plots the sourcs of U.S. Greenhouse Gas Emissions and Sinks by Economic Sector, 1990–2022.

Per the EPA,

This figure shows greenhouse gas emissions and sinks (negative values) by source in the United States from 1990 to 2022. For consistency, emissions are expressed in million metric tons of carbon dioxide equivalents. All electric power emissions are grouped together in the “Electric power” sector, so other sectors such as “Residential” and “Commercial” are only showing non-electric sources, such as burning oil or gas for heating. Totals do not match Figure 1 exactly because the economic sectors shown here do not include emissions from U.S. territories outside the 50 states.

Let us look at the sector wise contribution. Note the units are million metric tons of CO2 equivalents.

Before going into generating figures, we want to highlight how messy the real world data can be and how it can change from year to year, even it is from the exact same source.

While revising this version of the book, we have realized that the data has been updated from the 2022-07 version to 2024-06.

The nice thing with coding is that we just need to change the link. But we will first use the old link to highlight the importance of data cleaning issues.

Note, this is not the latest data but the July 2022 version of the data for Figure 2 in EPA U.S. GHG emissions.

Let us read the data as usual

url = "https://www.epa.gov/system/files/other-files/2022-07/us-ghg-emissions_fig-2.csv"
df = url_data_request(url, 6 )

As before, the first step with any dataset is always to get some basic info.

nrows, ncol = df.shape

We can see from df.shape that there are 31 rows and 8 variables.

df.columns

Index(['Unnamed: 0', 'Transportation', 'Electricity generation', 'Industry',
       'Agriculture', 'Commercial', 'Residential',
       'Land use, land-use change, and forestry (net sink)'],
      dtype='object')

We notice that the first column, Year is read as ‘Unnamed: 0’. Let us see if we notice anything else that is not what we expect

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31 entries, 0 to 30
Data columns (total 8 columns):
 #   Column                                              Non-Null Count  Dtype  
---  ------                                              --------------  -----  
 0   Unnamed: 0                                          31 non-null     int64  
 1   Transportation                                      31 non-null     float64
 2   Electricity generation                              31 non-null     float64
 3   Industry                                            31 non-null     float64
 4   Agriculture                                         31 non-null     float64
 5   Commercial                                          31 non-null     float64
 6   Residential                                         31 non-null     float64
 7   Land use, land-use change, and forestry (net sink)  31 non-null     object 
dtypes: float64(6), int64(1), object(1)
memory usage: 2.1+ KB

Land use, land-use change, and forestry (net sink) is read as an object. It suggests that the column may be read with different data types. So let us look at the data quickly to see what is happening.

df

	Unnamed: 0	Transportation	Electricity generation	Industry	Agriculture	Commercial	Residential	Land use, land-use change, and forestry (net sink)
0	1990	1526.434289	1880.548218	1652.361348	596.778204	427.109795	345.070789	-860.6250574
1	1991	1480.303906	1874.832994	1625.461259	587.493379	434.321072	354.689750	-869.9082889
2	1992	1539.881716	1890.103637	1662.343992	588.301071	429.939070	361.222025	-862.4402032
3	1993	1576.887693	1965.574487	1631.906633	616.367720	423.366758	372.612499	-844.206428
4	1994	1631.842860	1990.300202	1657.477536	603.473044	426.583160	363.260147	-857.5259735
5	1995	1667.019780	2006.877479	1675.133984	615.669467	425.696364	367.523521	-831.7775006
6	1996	1723.249166	2079.434226	1705.045256	622.907953	433.307228	399.454803	-852.1428676
7	1997	1749.849330	2144.903577	1705.172770	611.328521	425.795877	380.828659	-833.4407232
8	1998	1792.295978	2231.446285	1677.755098	618.246994	400.541006	346.873254	-835.2578601
9	1999	1863.292087	2244.837093	1627.052157	609.289744	396.926502	366.684458	-828.0496898
10	2000	1913.550780	2351.125685	1616.991263	594.854300	411.195289	387.992539	-825.2288316
11	2001	1885.434887	2311.387024	1569.909847	615.526557	400.209078	378.076884	-828.8079892
12	2002	1925.980857	2326.759449	1550.961500	618.674248	401.730691	375.286669	-790.5485356
13	2003	1933.326749	2357.952283	1529.061153	618.962290	417.965117	393.728943	-820.7858422
14	2004	1965.852074	2390.768614	1577.981113	631.722629	415.777828	381.852435	-729.4523162
15	2005	1975.495129	2456.737090	1536.176420	626.253214	405.442432	371.015529	-789.7931676
16	2006	1975.865366	2401.231383	1570.640402	625.786501	391.901762	334.309400	-817.1760559
17	2007	1974.426677	2467.226239	1560.386569	642.720748	405.856004	355.254855	-776.3550104
18	2008	1870.906050	2414.114945	1501.553646	631.031054	413.166512	363.906982	-766.4712252
19	2009	1796.295049	2198.273322	1345.926535	632.860470	416.903785	354.545147	-73448.90%
20	2010	1802.276352	2313.956993	1438.778760	641.029259	419.290316	355.539184	-761.0363768
21	2011	1768.595360	2211.767283	1443.335476	622.717221	415.067661	349.093809	-800.7290152
22	2012	1748.912232	2073.785620	1440.040968	606.452578	396.025010	306.943124	-799.9251081
23	2013	1751.515531	2092.343183	1489.536433	645.157276	419.341960	357.731086	-767.4142623
24	2014	1785.407074	2092.532800	1474.546661	654.114340	429.600526	378.276992	-781.3816316
25	2015	1793.399368	1952.725752	1464.017871	656.101038	442.084720	351.469522	-700.0664105
26	2016	1828.045564	1860.497992	1424.355203	643.420271	426.906642	327.812635	-826.6421699
27	2017	1845.162609	1780.552030	1446.687191	644.395728	428.486050	329.880970	-781.2093236
28	2018	1874.726295	1799.826648	1507.647460	657.914982	444.233499	377.368858	-769.266566
29	2019	1874.291137	1650.961498	1521.665941	663.895669	452.139788	384.178158	-730.4876898
30	2020	1627.618555	1482.182865	1426.194536	635.105961	425.312514	361.952257	-758.9433076

We can immediately see that 2009 for Land use, land-use change, and forestry (net sink) is -73448.90%. For some reason, it is read as a ‘%’ and the value needs to be adjusted by a factor of 100.

There are never any clean data sets! We need to clean the data ourselves many a time. Let us get to it.

df.describe()

	Unnamed: 0	Transportation	Electricity generation	Industry	Agriculture	Commercial	Residential
count	31.000000	31.000000	31.000000	31.000000	31.000000	31.000000	31.000000
mean	2005.000000	1789.294855	2106.308610	1550.196935	625.114595	419.426581	362.401157
std	9.092121	138.775245	252.071204	96.760250	20.358572	14.814310	20.424532
min	1990.000000	1480.303906	1482.182865	1345.926535	587.493379	391.901762	306.943124
25%	1997.500000	1736.080699	1921.414694	1469.282266	613.427539	408.525647	353.007335
50%	2005.000000	1796.295049	2092.532800	1550.961500	622.907953	419.341960	363.260147
75%	2012.500000	1880.080591	2320.358221	1629.479395	641.875004	427.797922	377.722871
max	2020.000000	1975.865366	2467.226239	1705.172770	663.895669	452.139788	399.454803

df.rename(columns={'Unnamed: 0': "Year"}, inplace = True) 

#for some reason, 2009 data for "Land use, .." has an extra % that is creating issues;
df['Land use, land-use change, and forestry (net sink)'] = df['Land use, land-use change, and forestry (net sink)'].astype("str")

# # need to remove % manually or through regex for 2009
df['Land use, land-use change, and forestry (net sink)'] = df['Land use, land-use change, and forestry (net sink)'].str.replace('%','')

df['Land use, land-use change, and forestry (net sink)'] = df['Land use, land-use change, and forestry (net sink)'].astype("float")

# divide the specific value by 100 as it was in % before
df.loc[(df['Year'] == 2009), 'Land use, land-use change, and forestry (net sink)'] = df.loc[(df['Year'] == 2009), 'Land use, land-use change, and forestry (net sink)']*0.01

Let us unpack the code step-by-step:

We renamed the column ‘Unnamed: 0’ to ‘Year’. The inplace=True parameter ensures that the change is made directly to the DataFrame df without needing to reassign it.
We converted the ‘Land use, land-use change, and forestry (net sink)’ column to a string type. This is necessary because the next operation involves string manipulation.
We removed any ‘%’ characters from the ‘Land use, land-use change, and forestry (net sink)’ column. The str.replace(‘%’, ’‘) method is used to replace’%’ with an empty string.
We converted the ‘Land use, land-use change, and forestry (net sink)’ column back to a float type. This is necessary for numerical operations.
We select the rows where the ‘Year’ is 2009 df.loc[(df[‘Year’] == 2009)]) and adjust the ‘Land use, land-use change, and forestry (net sink)’ values by dividing them by 100. This is because the original value was in percentage form.

We can do descriptive stats on the numerical variables.

df.describe()

	Year	Transportation	Electricity generation	Industry	Agriculture	Commercial	Residential	Land use, land-use change, and forestry (net sink)
count	31.000000	31.000000	31.000000	31.000000	31.000000	31.000000	31.000000	31.000000
mean	2005.000000	1789.294855	2106.308610	1550.196935	625.114595	419.426581	362.401157	-801.018853
std	9.092121	138.775245	252.071204	96.760250	20.358572	14.814310	20.424532	44.477461
min	1990.000000	1480.303906	1482.182865	1345.926535	587.493379	391.901762	306.943124	-869.908289
25%	1997.500000	1736.080699	1921.414694	1469.282266	613.427539	408.525647	353.007335	-832.609112
50%	2005.000000	1796.295049	2092.532800	1550.961500	622.907953	419.341960	363.260147	-800.729015
75%	2012.500000	1880.080591	2320.358221	1629.479395	641.875004	427.797922	377.722871	-768.340414
max	2020.000000	1975.865366	2467.226239	1705.172770	663.895669	452.139788	399.454803	-700.066410

Now, we have all the variables in numerical format, we can use plotly to plot the figures.

However, things are never this simple. We reran the code and updated the dataset in the latest revision (August 2024).

It appeared that all we had to do was change the URL to reflect the updated data.

url = "https://www.epa.gov/system/files/other-files/2024-06/us-ghg-emissions_fig-2.csv"

df = url_data_request(url, 6 )

Let us look at this updated data.

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 8 columns):
 #   Column                                              Non-Null Count  Dtype  
---  ------                                              --------------  -----  
 0   Unnamed: 0                                          33 non-null     int64  
 1   Transportation                                      33 non-null     object 
 2   Electric power industry                             33 non-null     object 
 3   Industry                                            33 non-null     object 
 4   Agriculture                                         33 non-null     float64
 5   Commercial                                          33 non-null     float64
 6   Residential                                         33 non-null     float64
 7   Land use, land-use change, and forestry (net sink)  33 non-null     float64
dtypes: float64(4), int64(1), object(3)
memory usage: 2.2+ KB

Now, we notice that ‘Land use, land-use change, and forestry (net sink)’ are in float64 numerical format but some other variables like ‘Transportation’, ‘Electric power industry’ and ‘Industry’ are in object data type. These were all float64 data type in the 2022 version of the data.

Our code would have given errors because we wouldn’t have been able to plot these variables.

There is always some changes to the data or data structure and it is always useful to check the data before running visualization or analysis code.

What seems to be the issue?

df

	Unnamed: 0	Transportation	Electric power industry	Industry	Agriculture	Commercial	Residential	Land use, land-use change, and forestry (net sink)
0	1990	1,521.42	1,880.18	1,723.32	595.946054	447.006292	345.597115	-976.694756
1	1991	1,474.77	1,874.41	1,702.87	587.411997	454.452721	355.264350	-988.981068
2	1992	1,533.76	1,889.64	1,729.28	587.511141	449.970155	361.808133	-1007.385245
3	1993	1,570.22	1,964.99	1,701.00	608.733193	443.082240	373.147538	-991.346525
4	1994	1,624.55	1,989.60	1,719.67	612.158538	446.051460	363.773428	-1005.221528
5	1995	1,659.12	2,006.05	1,741.77	618.123912	444.491719	367.292045	-980.231228
6	1996	1,714.59	2,078.44	1,762.11	625.693883	451.625773	399.137915	-1011.174579
7	1997	1,740.50	2,143.76	1,765.41	610.728830	442.880099	380.425938	-985.194814
8	1998	1,782.67	2,230.09	1,757.76	621.023100	416.338725	346.522958	-996.334680
9	1999	1,853.62	2,243.49	1,703.94	614.986761	411.817874	366.325388	-968.475616
10	2000	1,903.74	2,350.06	1,699.09	607.226572	425.633761	387.642129	-983.687718
11	2001	1,875.44	2,310.78	1,631.35	622.405680	414.052081	377.790426	-979.752372
12	2002	1,916.20	2,326.47	1,613.80	628.402932	415.310841	375.113216	-924.826536
13	2003	1,923.66	2,357.97	1,593.64	627.684215	431.720081	393.678826	-957.908300
14	2004	1,956.16	2,391.13	1,639.83	633.535447	429.132623	381.913394	-860.183752
15	2005	1,965.92	2,457.45	1,587.26	634.303237	418.863933	371.189484	-907.696609
16	2006	1,966.34	2,402.37	1,624.49	638.728891	404.901417	334.363826	-951.935738
17	2007	1,967.19	2,468.32	1,611.63	654.903372	418.425064	355.266743	-900.321359
18	2008	1,863.44	2,415.24	1,571.71	643.038956	425.324512	363.827773	-912.806240
19	2009	1,789.02	2,198.66	1,415.50	639.829854	428.256574	353.866061	-850.877697
20	2010	1,795.15	2,313.10	1,488.57	647.142320	430.587609	355.005227	-886.324754
21	2011	1,762.38	2,210.22	1,488.39	641.226366	425.528968	348.859719	-941.730797
22	2012	1,743.52	2,072.66	1,473.16	624.385259	406.456125	306.480072	-929.233916
23	2013	1,746.77	2,090.99	1,533.83	658.485864	429.200747	357.202066	-886.612352
24	2014	1,780.99	2,091.04	1,526.88	660.868014	439.358052	377.618245	-923.660383
25	2015	1,789.41	1,951.70	1,506.06	657.352663	451.674786	350.689960	-820.192444
26	2016	1,824.49	1,859.33	1,456.23	650.363698	435.630707	327.026529	-916.798491
27	2017	1,841.92	1,779.36	1,478.41	658.503899	437.389191	329.161536	-926.002254
28	2018	1,871.61	1,799.18	1,541.87	683.533939	453.481029	376.819523	-915.499704
29	2019	1,874.55	1,650.75	1,531.80	661.035227	462.630952	384.210676	-863.576006
30	2020	1,625.28	1,482.17	1,435.91	640.048676	436.915126	358.042212	-904.394729
31	2021	1,805.47	1,584.45	1,455.80	645.876462	443.663063	369.609906	-910.553723
32	2022	1801.520909	1577.493448	1452.536166	633.964509	463.661569	391.301657	-854.238577

A quick look at the data shows that these variables have two decimals for every year except for 2022. Can this probably be the issue? In 2020, we see that these same variables always have a precision of 4 decimals. Can this be causing the variables to be read as an object now?

df.describe()

	Unnamed: 0	Agriculture	Commercial	Residential	Land use, land-use change, and forestry (net sink)
count	33.00000	33.000000	33.000000	33.000000	33.000000
mean	2006.00000	632.580711	434.409572	363.211334	-933.934985
std	9.66954	22.453809	15.853136	20.528880	51.739532
min	1990.00000	587.411997	404.901417	306.480072	-1011.174579
25%	1998.00000	618.123912	425.324512	353.866061	-980.231228
50%	2006.00000	633.964509	435.630707	363.827773	-926.002254
75%	2014.00000	647.142320	446.051460	377.618245	-904.394729
max	2022.00000	683.533939	463.661569	399.137915	-820.192444

We can’t use these variables in plotly until we make them numeric variables.

Thankfully, we just need to modify our old code slightly to adjust for these changes in data.

df['Transportation'] = df['Transportation'].astype("float")

It looks like this is not the issue! We get an error!

    1,584.45    1,455.80    645.876462  443.663063  369.609906  -910.553723
32  2022    1801.520909 1577.493448 1452.536166 633.964509  463.661569  391.301657  -854.238577
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File /Users/sc/Library/CloudStorage/GoogleDrive-nextgen360ga@gmail.com/My Drive/NextGen_Web/Climate_P1/Chapters/Part2/GHG.qmd:2
----> 1 df['Transportation'] = df['Transportation'].astype("float")

File /usr/local/lib/python3.11/site-packages/pandas/core/generic.py:6324, in NDFrame.astype(self, dtype, copy, errors)
   6317     results = [
   6318         self.iloc[:, i].astype(dtype, copy=copy)
   6319         for i in range(len(self.columns))
   6320     ]
   6322 else:
   6323     # else, only a single dtype is given
-> 6324     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6325     return self._constructor(new_data).__finalize__(self, method="astype")
   6327 # GH 33113: handle empty frame or series

File /usr/local/lib/python3.11/site-packages/pandas/core/internals/managers.py:451, in BaseBlockManager.astype(self, dtype, copy, errors)
    448 elif using_copy_on_write():
    449     copy = False
--> 451 return self.apply(
    452     "astype",
    453     dtype=dtype,
    454     copy=copy,
    455     errors=errors,
    456     using_cow=using_copy_on_write(),
    457 )

File /usr/local/lib/python3.11/site-packages/pandas/core/internals/managers.py:352, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    350         applied = b.apply(f, **kwargs)
    351     else:
--> 352         applied = getattr(b, f)(**kwargs)
    353     result_blocks = extend_blocks(applied, result_blocks)
    355 out = type(self).from_blocks(result_blocks, self.axes)

File /usr/local/lib/python3.11/site-packages/pandas/core/internals/blocks.py:511, in Block.astype(self, dtype, copy, errors, using_cow)
    491 """
    492 Coerce to the new dtype.
    493 
   (...)
    507 Block
    508 """
    509 values = self.values
--> 511 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    513 new_values = maybe_coerce_values(new_values)
    515 refs = None

File /usr/local/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:242, in astype_array_safe(values, dtype, copy, errors)
    239     dtype = dtype.numpy_dtype
    241 try:
--> 242     new_values = astype_array(values, dtype, copy=copy)
    243 except (ValueError, TypeError):
    244     # e.g. _astype_nansafe can fail on object-dtype of strings
    245     #  trying to convert to float
    246     if errors == "ignore":

File /usr/local/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:187, in astype_array(values, dtype, copy)
    184     values = values.astype(dtype, copy=copy)
    186 else:
--> 187     values = _astype_nansafe(values, dtype, copy=copy)
    189 # in pandas we don't store numpy str dtypes, so convert to object
    190 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):

File /usr/local/lib/python3.11/site-packages/pandas/core/dtypes/astype.py:138, in _astype_nansafe(arr, dtype, copy, skipna)
    134     raise ValueError(msg)
    136 if copy or is_object_dtype(arr.dtype) or is_object_dtype(dtype):
    137     # Explicit copy, or required since NumPy can't view from / to object.
--> 138     return arr.astype(dtype, copy=True)
    140 return arr.astype(dtype, copy=copy)

ValueError: could not convert string to float: '1,521.42'

We can now see the issue. Ignoring everything, we can see from the error message that

—-> 1 df[‘Transportation’] = df[‘Transportation’].astype(“float”). This line of code gave the error.
The last line of error hints at the problem

ValueError: could not convert string to float: '1,521.42'

It appears that the values are formatted as numbers with a “,” to signify thousands. This is causing pandas to read the variable as an object data type. If this is the issue, it is easy to fix!

df['Transportation'] = df['Transportation'].str.replace(',', '').astype("float")

What did we do?

We used str.replace method to remove commas from the ‘Transportation’ column.
We converted the cleaned strings to float using the astype method.
We chained the methods so that the code is shorter.

Did it work? Lets do descriptive statistics on the numerical variables.

df.describe()

	Unnamed: 0	Transportation	Agriculture	Commercial	Residential	Land use, land-use change, and forestry (net sink)
count	33.00000	33.000000	33.000000	33.000000	33.000000	33.000000
mean	2006.00000	1783.799725	632.580711	434.409572	363.211334	-933.934985
std	9.66954	133.640242	22.453809	15.853136	20.528880	51.739532
min	1990.00000	1474.770000	587.411997	404.901417	306.480072	-1011.174579
25%	1998.00000	1740.500000	618.123912	425.324512	353.866061	-980.231228
50%	2006.00000	1795.150000	633.964509	435.630707	363.827773	-926.002254
75%	2014.00000	1874.550000	647.142320	446.051460	377.618245	-904.394729
max	2022.00000	1967.190000	683.533939	463.661569	399.137915	-820.192444

We see now the descriptive statistics for ‘Transportation’.

Lets do the same for the other variables that are objects as it appears that the ‘,’ is causing the issue.

df['Electric power industry'] = df['Electric power industry'].str.replace(',', '').astype("float")
df['Industry'] = df['Industry'].str.replace(',', '').astype("float")

Let us rename the Unnamed column also.

df.rename(columns={'Unnamed: 0': "Year"}, inplace = True)

The full code for the 2024 version of the data is

df.rename(columns={'Unnamed: 0': "Year"}, inplace = True) 
df['Transportation'] = df['Transportation'].str.replace(',', '').astype("float")
df['Electric power industry'] = df['Electric power industry'].str.replace(',', '').astype("float")
df['Industry'] = df['Industry'].str.replace(',', '').astype("float")

We can see that the data is as we want.

df.describe()

	Year	Transportation	Electric power industry	Industry	Agriculture	Commercial	Residential	Land use, land-use change, and forestry (net sink)
count	33.00000	33.000000	33.000000	33.000000	33.000000	33.000000	33.000000	33.000000
mean	2006.00000	1783.799725	2073.986165	1595.905338	632.580711	434.409572	363.211334	-933.934985
std	9.66954	133.640242	275.443797	111.344067	22.453809	15.853136	20.528880	51.739532
min	1990.00000	1474.770000	1482.170000	1415.500000	587.411997	404.901417	306.480072	-1011.174579
25%	1998.00000	1740.500000	1880.180000	1488.570000	618.123912	425.324512	353.866061	-980.231228
50%	2006.00000	1795.150000	2090.990000	1593.640000	633.964509	435.630707	363.827773	-926.002254
75%	2014.00000	1874.550000	2313.100000	1702.870000	647.142320	446.051460	377.618245	-904.394729
max	2022.00000	1967.190000	2468.320000	1765.410000	683.533939	463.661569	399.137915	-820.192444

There are always some small changes to the code that we need to make or if some debugging if the code doesn’t work.

Now that we have the data, we can plot it easily.

import plotly.express as px

df.dtypes
listcol = df.columns.tolist()
listcol.remove('Year')

fig = px.area(df, x = 'Year', y = listcol)
fig.show()

unfortunately, this is not what we wanted. The ‘Land use, land-use change, and forestry (net sink)’ is negative and needs to show up below with a negative y-axes.

We have plotted a similar figure in the cite temperature chapter and we can adapt it. One thing with spending some time to understand and code is that we will probably need it some where down the line.

By the way, between the 2022-07 version of the data and 2024-06 version of the data, the variable name for ‘Electricity generation’ has been changed to ‘Electric power industry’. Honestly, we didn’t notice. But when our old code ran into an error, we had to fix it.

What did we say! Real world data is messy! Some analyst formats the data in a different format or changes the definition of the variables or something else changes!

import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Create figure with secondary y-axis
fig = make_subplots(shared_yaxes=True)

# Add traces
fig.add_trace(go.Scatter(x=df["Year"], y= df['Electric power industry'], name="Electricity generation", fill='tonexty',    stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Transportation'], name="Transportation", fill='tonexty',     stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Industry'], name="Industry", fill='tozeroy',     stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Agriculture'], name="Agriculture", fill='tonexty',     stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Commercial'], name="Commercial", fill='tonexty',    stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Residential'], name="Residential", fill='tonexty',     stackgroup='one'))

fig.add_trace(
    go.Scatter(x=df["Year"], y= df["Land use, land-use change, and forestry (net sink)"], name="Land Use %", fill='tozeroy')
)

fig.update_layout(
    yaxis_title = "Emissions (mn metric tons of carbon dioxide equivalents)",
    xaxis_title = "Year",
    title="U.S. Greenhouse Gas Emissions and Sinks by Economic Sector, 1990–2022", title_x=0.5,
    hovermode="x",
    legend=dict(x=0.3,y=0.5),
    barmode='relative'
)

fig.show()

Let us unpack the code as we have made some changes from the previous code

As usual, we imported the necessary libraries: plotly.graph_objects and make_subplots
Make_subplots(shared_yaxes=True): Creates a subplot figure where all subplots share the same y-axis.
We needed to use stackgroup so that it is cumulative. We did this by adding traces.
- go.Scatter: Creates a scatter plot.
- x=df[“Year”]: Sets the x-axis data to the “Year” column of the dataframe df.
- y=df[‘Electric power industry’]: Sets the y-axis data to the “Electric power industry” column of the dataframe df.
- name: Sets the name of the trace, which will appear in the legend.
- fill: Specifies how the area under the line should be filled.
- stackgroup: Groups the traces together for stacking.
We do the same for other columns.
We add a separate trace that is similar to the others but represents “Land use, land-use change, and forestry (net sink)”. This is the only variable with negative values.
Then we updated the layout by
- yaxis_title: Sets the title for the y-axis.
- xaxis_title: Sets the title for the x-axis.
- title: Sets the main title of the plot.
- title_x=0.5: Centers the title.
- hovermode=“x”: Displays hover information for all traces at the x-coordinate.
- legend=dict(x=0.3,y=0.5): Positions the legend.
- barmode=‘relative’: Sets the bar mode to ‘relative’, useful for stacked bar charts.

This is in absolute value of the emissions. Let us do relative value also

import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Create figure with secondary y-axis
fig = make_subplots(shared_yaxes=True)

# Add traces
fig.add_trace(go.Scatter(x=df["Year"], y= df['Electric power industry'], name="Electricity generation %", fill='tonexty',    stackgroup='one', groupnorm='percent'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Transportation'], name="Transportation %", fill='tonexty',     stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Industry'], name="Industry %", fill='tozeroy',     stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Agriculture'], name="Agriculture %", fill='tonexty',     stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Commercial'], name="Commercial %", fill='tonexty',    stackgroup='one'))
fig.add_trace(go.Scatter(x=df["Year"], y= df['Residential'], name="Residential %", fill='tonexty',     stackgroup='one'))

fig.update_layout(
    yaxis_title = "Emissions (mn metric tons of carbon dioxide equivalents)",
    xaxis_title = "Year",
    title= "<b><span> U.S. Greenhouse Gas Emissions and Sinks by Economic Sector, 1990–2022 <br> </span> <span>(Relative Contribution)</span></b>", title_x=0.5,
    hovermode="x",
    legend=dict(x=0.3,y=0.5),
    barmode='relative'
)

fig.show()

In this figure, we have normalized all the emissions so that they add up to 100% (we have removed the negative land sink values). We can clearly see from the figure that the order of emissions contributions is

Electricity generation used to be top sector till 2016 and now in second place
Transportation used to be in second place, and now has moved to first place
Industry

These three sectors together account for almost 80% of U.S. Green House Gas emissions. The rest is made up of

Agriculture
Commercial
Residential sectors

We will dig deeper in to the contribution of Electricity and Transportation in later chapters. But it is obvious that in order to reach a meaningul reduction in GHG emissions, the emissions from these three sectors have to come down significantly.

What are the key points from EPA for this figure?

Among the various sectors of the U.S. economy, transportation accounts for the largest share—28.4 percent—of 2022 emissions. Transportation has been the largest contributing sector since 2017. Electric power (power plants) accounts for the next largest share of 2022 emissions, accounting for approximately 25 percent of emissions. Electric power has historically been the largest sector, accounting for 30 percent of emissions since 1990 (Figure 2).

Removals from sinks, the opposite of emissions from sources, absorb or sequester carbon dioxide from the atmosphere. In 2022, 13 percent of U.S. greenhouse gas emissions were offset by net sinks resulting from land use and forestry practices (Figure 2). One major sink is the net growth of forests, including urban trees, which remove carbon from the atmosphere. Other carbon sinks are associated with how people manage and use the land, including the practice of depositing yard trimmings and food scraps in landfills. While the land use, land-use change, and forestry category represents an overall net sink of carbon dioxide in the United States, this category also includes emission sources resulting from activities such as wildfires, converting land to cropland, and emissions from flooded lands such as reservoirs.

14.3.1 Emissions per Capita

EPA also shows the emissions per capita and per dollar of GDP and how they have changed over time.

EPA’s legend for the figure is

This figure shows trends in greenhouse gas emissions from 1990 to 2022 per capita (heavy orange line), based on the total U.S. population (thin orange line). It also shows trends in emissions per dollar of real GDP (heavy blue line). Real GDP (thin blue line) is the value of all goods and services produced in the country during a given year, adjusted for inflation. All data are indexed to 1990 as the base year, which is assigned a value of 100. For instance, a real GDP value of 217 in the year 2022 would represent a 117 percent increase since 1990.

Let us get to plotting the data.

By the way, we commented out the old URL as new data beame available during the revision.

#url = "https://www.epa.gov/system/files/other-files/2022-07/us-ghg-emissions_fig-3.csv"
url = "https://www.epa.gov/system/files/other-files/2024-06/us-ghg-emissions_fig-3.csv"
df_ghg_percapita = url_data_request(url, 6)

df_ghg_percapita.dtypes
df_ghg_percapita

listcol = df_ghg_percapita.columns.tolist()
listcol.remove('Year')

fig = px.line(df_ghg_percapita, x = 'Year', y = listcol)

fig.update_layout(
    yaxis_title="Index Value (1990=100)",
    title='U.S. Greenhouse Gas Emissions per Capita and per Dollar of GDP, 1990–2021 ', title_x=0.5,
    hovermode="x",
    legend=dict(x=0.2,y=1)
)
fig.update_layout(legend={'title_text':''})
fig.show()

We can plot this data also by making minor modifications to the code that showed earlier. It should be pretty straigh forward by now. We have used a line graph this time (px.line).

From the figure, we can see, as EPA mentions in the key points:

Emissions increased at about the same rate as the population from 1990 to 2007, which caused emissions per capita to remain fairly level (Figure 3). Total emissions and emissions per capita declined from 2007 to 2009, due in part to a drop in U.S. economic production during this time. Emissions decreased again from 2010 to 2012 and continued downward largely due to the growing use of natural gas and renewables to generate electricity in place of more carbon-intensive fuels.

From 1990 to 2022, greenhouse gas emissions per dollar of goods and services produced by the U.S. economy (the gross domestic product or GDP) declined by 55 percent (Figure 3). This change may reflect a combination of increased energy efficiency and structural changes in the economy.