17 Disaster Declarations by FEMA

Let us start by understanding extreme weather events and their impact on us.

The Federal Emergency Management Agency (FEMA) is a great place to start. FEMA’s mission is helping people before, during and after disasters. FEMA employs more than 20,000 people nationwide.

17.1 Reading the FEMA Data

A great way to understand our environment is to look at the disasters that have happened there! In this section, we are going to work with the OpenFEMA Disaster Declarations Summaries dataset.

The data set is available at https://www.fema.gov/openfema-data-page/disaster-declarations-summaries-v2.

In this dataset, all federally declared disasters since 1953 (wow!) are described.

Using the information we learned from the Basic Pandas section in the Pandas primer, let’s download and read the file.

Let’s start! First, follow the link, scroll down to full data, and download the CSV file. Then, create a new folder and save the file in it so it is easy to find. This will be useful later on too.

import pandas as pd

In this line, we are importing the package Pandas so that we can see and manipulate the data. We said “as pd” to give it a shorter name or alias (pd) whenever we decide to use it. This will make more sense as we keep coding.

Note: Once we import pandas, we don’t need to do it again in the same session. If your kernel restarts, however, you will need to run this line again for your code to work.

fema_df = pd.read_csv('https://www.fema.gov/api/open/v2/DisasterDeclarationsSummaries.csv')
print(fema_df.head())

  femaDeclarationString  disasterNumber state declarationType  \
0            FM-5530-NV            5530    NV              FM   
1            FM-5529-OR            5529    OR              FM   
2            FM-5528-OR            5528    OR              FM   
3            FM-5527-OR            5527    OR              FM   
4            FM-5526-CO            5526    CO              FM   

            declarationDate  fyDeclared incidentType      declarationTitle  \
0  2024-08-12T00:00:00.000Z        2024         Fire       GOLD RANCH FIRE   
1  2024-08-09T00:00:00.000Z        2024         Fire        LEE FALLS FIRE   
2  2024-08-06T00:00:00.000Z        2024         Fire         ELK LANE FIRE   
3  2024-08-02T00:00:00.000Z        2024         Fire  MILE MARKER 132 FIRE   
4  2024-08-01T00:00:00.000Z        2024         Fire           QUARRY FIRE   

   ihProgramDeclared  iaProgramDeclared  ...  placeCode       designatedArea  \
0                  0                  0  ...      99031      Washoe (County)   
1                  0                  0  ...      99067  Washington (County)   
2                  0                  0  ...      99031   Jefferson (County)   
3                  0                  0  ...      99017   Deschutes (County)   
4                  0                  0  ...      99059   Jefferson (County)   

  declarationRequestNumber lastIAFilingDate  incidentId  region  \
0                    24123              NaN  2024081201       9   
1                    24122              NaN  2024081001      10   
2                    24116              NaN  2024080701      10   
3                    24111              NaN  2024080301      10   
4                    24106              NaN  2024080102       8   

   designatedIncidentTypes               lastRefresh  \
0                        R  2024-08-27T18:22:14.800Z   
1                        R  2024-08-27T18:22:14.800Z   
2                        R  2024-08-27T18:22:14.800Z   
3                        R  2024-08-27T18:22:14.800Z   
4                        R  2024-08-27T18:22:14.800Z   

                                       hash  \
0  5d07e7c51bb300bfbec94a699a1e1ab1d61a97cd   
1  ae87cf3c6ed795015b714af7166c7c295b2b67c7   
2  432cf0995c47e3895cea696ede5621b810460501   
3  2f21d90cb6bc64b0d4121aa3f18d852bbb4b11fa   
4  e753ba692156f389dbe19f7a1c332d04ae145f74   

                                     id  
0  f15a7a79-f1c3-41bb-8a5c-c05fbae34423  
1  09e3f81a-5e16-4b72-b317-1c64e0cfa59c  
2  59983f89-30bf-4888-b21b-62e8d57d9aac  
3  8d13ecf0-bc2f-496b-8c9f-b2e73da832a0  
4  17c24d4a-49a9-4cac-9322-e5427c4cdfeb  

[5 rows x 28 columns]

/var/folders/8t/bwrtv74x3vg8g7hbvcpx86sr0000gn/T/ipykernel_66231/3810111583.py:1: DtypeWarning:

Columns (21) have mixed types. Specify dtype option on import or set low_memory=False.

What are we doing in the code?

In the first line, we are reading the CSV URL using Pandas.
You can get the URL from the website again by just copying the CSV link. Then, we are assigning it to the variable fema_df with df standing for DataFrame.
In the second line, we are printing the first five rows (head) of the data. This is how we normally preview our data.
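The DtypeWarning in the output comes from a column where Pandas had to guess the type. Counting from 0 in the column order, column 21 is most likely lastIAFilingDate, which mixes empty cells with date text. Here is a small sketch, using a tiny made-up CSV instead of the real download, of the dtype option that the warning itself suggests. The sample rows are invented for illustration only.

```python
import io
import pandas as pd

# Tiny stand-in for the real file (invented rows, real column names).
sample_csv = io.StringIO(
    "disasterNumber,state,lastIAFilingDate\n"
    "5530,NV,\n"
    "4000,TX,2011-10-01T00:00:00.000Z\n"
)

# Telling Pandas the type up front means it never has to guess.
sample_df = pd.read_csv(sample_csv, dtype={"lastIAFilingDate": "string"})
print(sample_df.dtypes)
```

The same dtype argument can be added to the read_csv call for the real file. Setting low_memory=False, the other option named in the warning, is an alternative.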

Do you notice that when you ran the code, it took a while to execute and the data displayed seems a bit harder to read?

Let’s try doing the same thing except with our downloaded CSV file and see if there is any difference.

Download the data locally. Make sure that the full path name is given if the file is not in the same directory.

fema_df = pd.read_csv("DisasterDeclarationsSummaries.csv") 
fema_df.head()

Wow! The data was displayed in an instant and it’s much easier to read. It’s neatly sorted into visible rows and columns. This seems much better right? Well…

Using a downloaded CSV file has the downside that it may not always be up to date. When using the URL, every time it runs, it’s pulling the most recent data. To get the most recent data with a CSV file, you have to keep redownloading the file. For now, we are going to use the downloaded CSV file, and later, we can replace it with the URL when we need the most recent data.
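If you want the speed of a local file without downloading by hand, one possible approach is a small helper that reads the local copy when it exists and downloads it only when it is missing. This is just a sketch: load_fema is a made-up name, not part of Pandas or OpenFEMA, and the file name is assumed to match the one used in this chapter.

```python
from pathlib import Path
import pandas as pd

FEMA_URL = "https://www.fema.gov/api/open/v2/DisasterDeclarationsSummaries.csv"

def load_fema(local_path="DisasterDeclarationsSummaries.csv", url=FEMA_URL):
    """Read the local copy if it exists; otherwise download it and save a copy."""
    path = Path(local_path)
    if path.exists():
        # Fast path: use the file already on disk.
        return pd.read_csv(path, low_memory=False)
    # Slow path: pull the latest data, then keep a copy for next time.
    df = pd.read_csv(url, low_memory=False)
    df.to_csv(path, index=False)
    return df
```

To refresh the data, delete the local file and call load_fema() again.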

17.2 Manipulating the Data:

Before moving onto visualizations, it’s important that we manipulate the data and change some elements. By doing this, we are making it easier to visualize the data and learning different ways to manipulate data with Pandas.

But where do we start?

Let’s first look at the types of variables we have in our dataset and see if there is anything off about them.

fema_df.dtypes

femaDeclarationString       object
disasterNumber               int64
state                       object
declarationType             object
declarationDate             object
fyDeclared                   int64
incidentType                object
declarationTitle            object
ihProgramDeclared            int64
iaProgramDeclared            int64
paProgramDeclared            int64
hmProgramDeclared            int64
incidentBeginDate           object
incidentEndDate             object
disasterCloseoutDate        object
tribalRequest                int64
fipsStateCode                int64
fipsCountyCode               int64
placeCode                    int64
designatedArea              object
declarationRequestNumber     int64
lastIAFilingDate            object
incidentId                   int64
region                       int64
designatedIncidentTypes     object
lastRefresh                 object
hash                        object
id                          object
dtype: object

Hmmm… there are so many variables, and all of them are either objects or integers. Do you notice anything off about them? Should some of the columns be a different type of variable?

Aha! The dates (declarationDate, incidentBeginDate, incidentEndDate, disasterCloseoutDate) are listed as objects instead of actual dates. Let’s fix that now because it might present a problem for us later.

import pandas as pd
fema_df = pd.read_csv('https://www.fema.gov/api/open/v2/DisasterDeclarationsSummaries.csv',
parse_dates = ["declarationDate", "incidentBeginDate", "incidentEndDate", "disasterCloseoutDate"])
fema_df.dtypes

/var/folders/8t/bwrtv74x3vg8g7hbvcpx86sr0000gn/T/ipykernel_66231/2798779253.py:2: DtypeWarning:

Columns (21) have mixed types. Specify dtype option on import or set low_memory=False.

femaDeclarationString                    object
disasterNumber                            int64
state                                    object
declarationType                          object
declarationDate             datetime64[ns, UTC]
fyDeclared                                int64
incidentType                             object
declarationTitle                         object
ihProgramDeclared                         int64
iaProgramDeclared                         int64
paProgramDeclared                         int64
hmProgramDeclared                         int64
incidentBeginDate           datetime64[ns, UTC]
incidentEndDate             datetime64[ns, UTC]
disasterCloseoutDate        datetime64[ns, UTC]
tribalRequest                             int64
fipsStateCode                             int64
fipsCountyCode                            int64
placeCode                                 int64
designatedArea                           object
declarationRequestNumber                  int64
lastIAFilingDate                         object
incidentId                                int64
region                                    int64
designatedIncidentTypes                  object
lastRefresh                              object
hash                                     object
id                                       object
dtype: object

What just happened in the code above? Do you see some familiar code from before?

In the first line after the import, we are reading the CSV file again BUT we added something.

The parse_dates argument told Pandas to turn the date columns (declarationDate, incidentBeginDate, incidentEndDate, disasterCloseoutDate) into actual date values that Python can work with, instead of plain objects.

The variable types are now datetime64[ns, UTC].
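If the data is already loaded, the same conversion can be done one column at a time with pd.to_datetime. The sketch below uses two made-up date strings in the same text format as the FEMA file, so it runs without downloading anything.

```python
import pandas as pd

# Two invented rows in the same text format the FEMA file uses for dates.
dates_df = pd.DataFrame(
    {"incidentBeginDate": ["2024-08-11T00:00:00.000Z", "2024-07-31T00:00:00.000Z"]}
)
print(dates_df.dtypes)  # the column starts out as plain text

# Convert the text to timezone-aware timestamps, similar to what parse_dates does.
dates_df["incidentBeginDate"] = pd.to_datetime(dates_df["incidentBeginDate"], utc=True)
print(dates_df.dtypes)
```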

Since we did all this, wouldn’t it be helpful to add a column in our DataFrame which shows us the incident month number (January = 1, December = 12, etc.) ? We can use this later to see what incidents happen the most in specific months and which months have the most incidents.

fema_df["incidentMonth"] = fema_df["incidentBeginDate"].dt.month
fema_df.head()

	femaDeclarationString	disasterNumber	state	declarationType	declarationDate	fyDeclared	incidentType	declarationTitle	...	designatedArea	declarationRequestNumber	lastIAFilingDate	incidentId	region	designatedIncidentTypes	lastRefresh	hash	id	incidentMonth
0	FM-5530-NV	5530	NV	FM	2024-08-12 00:00:00+00:00	2024	Fire	GOLD RANCH FIRE	...	Washoe (County)	24123	NaN	2024081201	9	R	2024-08-27T18:22:14.800Z	5d07e7c51bb300bfbec94a699a1e1ab1d61a97cd	f15a7a79-f1c3-41bb-8a5c-c05fbae34423	8
1	FM-5529-OR	5529	OR	FM	2024-08-09 00:00:00+00:00	2024	Fire	LEE FALLS FIRE	...	Washington (County)	24122	NaN	2024081001	10	R	2024-08-27T18:22:14.800Z	ae87cf3c6ed795015b714af7166c7c295b2b67c7	09e3f81a-5e16-4b72-b317-1c64e0cfa59c	8
2	FM-5528-OR	5528	OR	FM	2024-08-06 00:00:00+00:00	2024	Fire	ELK LANE FIRE	...	Jefferson (County)	24116	NaN	2024080701	10	R	2024-08-27T18:22:14.800Z	432cf0995c47e3895cea696ede5621b810460501	59983f89-30bf-4888-b21b-62e8d57d9aac	8
3	FM-5527-OR	5527	OR	FM	2024-08-02 00:00:00+00:00	2024	Fire	MILE MARKER 132 FIRE	...	Deschutes (County)	24111	NaN	2024080301	10	R	2024-08-27T18:22:14.800Z	2f21d90cb6bc64b0d4121aa3f18d852bbb4b11fa	8d13ecf0-bc2f-496b-8c9f-b2e73da832a0	8
4	FM-5526-CO	5526	CO	FM	2024-08-01 00:00:00+00:00	2024	Fire	QUARRY FIRE	...	Jefferson (County)	24106	NaN	2024080102	8	R	2024-08-27T18:22:14.800Z	e753ba692156f389dbe19f7a1c332d04ae145f74	17c24d4a-49a9-4cac-9322-e5427c4cdfeb	7

5 rows × 29 columns

What happened here?! We have a new column at the right that shows the month of each incident!

In the code, we are saying that the new column incidentMonth has the value of the month number of when the incident began. We got the month number from the incidentBeginDate column by using dt.month. Now we have an incidentMonth column.

Let’s make an incident year column too!

fema_df["incidentYear"] = fema_df["incidentBeginDate"].dt.year
fema_df.head()

	femaDeclarationString	disasterNumber	state	declarationType	declarationDate	fyDeclared	incidentType	declarationTitle	...	declarationRequestNumber	lastIAFilingDate	incidentId	region	designatedIncidentTypes	lastRefresh	hash	id	incidentMonth	incidentYear
0	FM-5530-NV	5530	NV	FM	2024-08-12 00:00:00+00:00	2024	Fire	GOLD RANCH FIRE	...	24123	NaN	2024081201	9	R	2024-08-27T18:22:14.800Z	5d07e7c51bb300bfbec94a699a1e1ab1d61a97cd	f15a7a79-f1c3-41bb-8a5c-c05fbae34423	8	2024
1	FM-5529-OR	5529	OR	FM	2024-08-09 00:00:00+00:00	2024	Fire	LEE FALLS FIRE	...	24122	NaN	2024081001	10	R	2024-08-27T18:22:14.800Z	ae87cf3c6ed795015b714af7166c7c295b2b67c7	09e3f81a-5e16-4b72-b317-1c64e0cfa59c	8	2024
2	FM-5528-OR	5528	OR	FM	2024-08-06 00:00:00+00:00	2024	Fire	ELK LANE FIRE	...	24116	NaN	2024080701	10	R	2024-08-27T18:22:14.800Z	432cf0995c47e3895cea696ede5621b810460501	59983f89-30bf-4888-b21b-62e8d57d9aac	8	2024
3	FM-5527-OR	5527	OR	FM	2024-08-02 00:00:00+00:00	2024	Fire	MILE MARKER 132 FIRE	...	24111	NaN	2024080301	10	R	2024-08-27T18:22:14.800Z	2f21d90cb6bc64b0d4121aa3f18d852bbb4b11fa	8d13ecf0-bc2f-496b-8c9f-b2e73da832a0	8	2024
4	FM-5526-CO	5526	CO	FM	2024-08-01 00:00:00+00:00	2024	Fire	QUARRY FIRE	...	24106	NaN	2024080102	8	R	2024-08-27T18:22:14.800Z	e753ba692156f389dbe19f7a1c332d04ae145f74	17c24d4a-49a9-4cac-9322-e5427c4cdfeb	7	2024

5 rows × 30 columns

Here, we are doing the same thing as above, except this time, we are taking the year from the incident begin date by using dt.year.

Now we have another new column to the right called incidentYear! This makes it soooo much easier to show what specific months/years had the most incidents and what incidents happened in specific months/years.
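The dt accessor can do more than month and year. As a small sketch on made-up dates (not the real FEMA rows), here is dt.month_name(), along with what happens when a date is missing: the month simply comes back empty (NaN) instead of causing an error.

```python
import pandas as pd

# Invented dates; the third one is deliberately missing.
sample = pd.DataFrame(
    {
        "incidentBeginDate": pd.to_datetime(
            ["2024-08-11T00:00:00Z", "2024-07-31T00:00:00Z", None], utc=True
        )
    }
)

sample["incidentMonth"] = sample["incidentBeginDate"].dt.month
sample["incidentMonthName"] = sample["incidentBeginDate"].dt.month_name()
print(sample)
```

Month names can make chart labels friendlier than the numbers 1 to 12.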

Let’s see how many incidents there are per month overall! We can do this by using the groupby() and count() methods.

fema_df.groupby(["incidentMonth"])["incidentMonth"].count().to_frame()

	incidentMonth
incidentMonth
1	12941
2	4868
3	4843
4	5505
5	5014
6	4548
7	2964
8	8619
9	8354
10	3551
11	1860
12	4162

Here, we grouped all the incident months and counted the number of times each month was in the dataset.

Why? Because any place the incident month is listed, that means that there was an incident. So, now we can see the overall total of incidents for every month. 1 means January, 12 means December, and so on.

Did you notice something confusing about the names? Both of them are incidentMonth. The one on the left, containing the numbers 1-12, is the index and holds the actual months. The one on the right is the column that holds the counts.

Don’t worry, we will fix this soon!

Finally, the to_frame() method turns the result (a Series) into a DataFrame, which displays in a neater way.
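To see what to_frame() changes, here is a minimal sketch on made-up month numbers. It prints the type of the result before and after, and it also shows that to_frame() accepts a name for the count column.

```python
import pandas as pd

months = pd.DataFrame({"incidentMonth": [1, 1, 8, 8, 8, 9]})  # invented data

counts = months.groupby(["incidentMonth"])["incidentMonth"].count()
print(type(counts).__name__)             # Series
print(type(counts.to_frame()).__name__)  # DataFrame

# Passing a name to to_frame() labels the count column right away.
named = counts.to_frame("incidentCount")
print(named)
```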

What if we wanted to see the specific incidents in January? Let’s look at the simple code below to learn how.

incidentTypeMonth = fema_df.groupby(["incidentMonth", "incidentType"])["incidentMonth"].count()
print(incidentTypeMonth[1])

incidentType
Biological           7857
Chemical                9
Coastal Storm          40
Drought               158
Earthquake              4
Fire                   48
Flood                 820
Freezing              112
Hurricane               5
Mud/Landslide           1
Other                   2
Severe Ice Storm      708
Severe Storm         1840
Snowstorm            1214
Tornado                55
Typhoon                14
Volcanic Eruption       1
Winter Storm           53
Name: incidentMonth, dtype: int64

Here, we are grouping the incidentMonth and incidentType columns together and counting how many times each type of incident happens in that specific month.

In the next line, we said incidentTypeMonth[1] so that we can see how many times each type of incident happened in January. If you want to see a table like this for another month, all you have to do is change the number inside the brackets.
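Grouping by two columns gives a result with a two-level index (month first, then incident type). A slightly more explicit way to pick a month is .loc, which always looks things up by label. Here is a sketch on a few made-up rows; the variable names are only for illustration.

```python
import pandas as pd

# Invented rows with the same two columns used in the chapter.
incidents = pd.DataFrame(
    {
        "incidentMonth": [1, 1, 1, 8, 8],
        "incidentType": ["Flood", "Flood", "Snowstorm", "Fire", "Fire"],
    }
)
by_month_type = incidents.groupby(["incidentMonth", "incidentType"])["incidentMonth"].count()

# All incident types for month 1 (January).
january = by_month_type.loc[1]
print(january)

# One exact cell: fires in month 8 (August).
august_fires = by_month_type.loc[(8, "Fire")]
print(august_fires)
```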

Neat, right?

What if we wanted to see a table like this for every month? Now, this does sound like a lot of work, but we just have to make something called a crosstab, and lucky for us, Pandas already has a crosstab function. So, we can make a crosstab with just one line of code!

femacross_df = pd.crosstab(fema_df.incidentMonth, fema_df.incidentType)
femacross_df

incidentType	Biological	Chemical	Coastal Storm	Dam/Levee Break	Drought	Earthquake	Fire	Fishing Losses	Flood	Freezing	...	Snowstorm	Straight-Line Winds	Terrorist	Tornado	Toxic Substances	Tropical Storm	Tsunami	Typhoon	Volcanic Eruption	Winter Storm
incidentMonth
1	7857	9	40	0	158	4	48	0	820	112	...	1214	0	0	55	0	0	0	14	1	53
2	0	0	6	3	0	25	107	0	948	0	...	823	0	0	62	1	0	0	12	0	1
3	0	0	3	0	74	7	344	0	1722	26	...	984	0	0	164	1	0	9	5	0	0
4	0	0	0	0	27	8	445	0	1895	0	...	87	0	4	536	1	0	0	3	1	7
5	0	0	0	8	10	8	263	18	931	0	...	6	0	0	298	1	9	0	4	48	0
6	0	0	137	1	219	2	324	0	1494	8	...	0	0	0	116	0	0	0	4	0	0
7	0	0	41	0	293	3	417	0	618	0	...	0	2	0	157	0	0	0	4	0	0
8	0	0	222	0	319	13	748	0	361	0	...	0	0	0	6	3	430	0	10	0	0
9	0	0	160	0	100	7	399	24	747	0	...	0	0	1	22	0	472	0	7	1	0
10	0	0	16	0	59	23	163	0	496	0	...	77	0	0	2	1	6	0	19	0	0
11	0	0	0	0	1	12	449	0	341	0	...	62	0	0	117	1	54	0	36	0	11
12	0	0	12	1	32	116	32	0	719	155	...	454	0	0	88	0	0	0	12	0	45

12 rows × 26 columns

Wow! Doesn’t it look neat? For every month, you can find all the information about how many of each type of incident occurred across all the years.

We made a new DataFrame called femacross_df because we want to be able to use the other dataframe (fema_df) again later. Then, we applied the crosstab function to the incidentMonth and incidentType from the other dataframe (fema_df).
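If you are curious what crosstab does behind the scenes, it is roughly a groupby count with the incident types spread out into columns. The sketch below checks this on a few made-up rows. It illustrates the idea and is not a description of how Pandas implements crosstab internally.

```python
import pandas as pd

# Invented rows with the same two columns used in the chapter.
incidents = pd.DataFrame(
    {
        "incidentMonth": [1, 1, 1, 8, 8],
        "incidentType": ["Flood", "Flood", "Snowstorm", "Fire", "Fire"],
    }
)

cross = pd.crosstab(incidents.incidentMonth, incidents.incidentType)
print(cross)

# The same table built by hand: count each pair, then spread types into columns.
by_hand = (
    incidents.groupby(["incidentMonth", "incidentType"])
    .size()
    .unstack(fill_value=0)
)
print((cross == by_hand).all().all())
```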

Finally, we just called it so that the code would display the crosstab and voila!

Mostly easy, right? Can you think of what kind of visualizations are possible with a crosstab?

We are sooo close to visualizing our data, but uh oh! Remember our problem from before when we grouped all the incident months and counted the number of times each month was in the dataset?

If you forgot, you can go back and look at it again. You can see that both of the column names were incidentMonth, which made it confusing. Before moving onto the visualizations, we should fix this so it doesn’t present a problem for us later!

So, we know that both of the column names shouldn’t be incidentMonth because one column has the actual incident month number and the other column has the number of times an incident occurred in that month. What should the name of the other column be?

A great name would be incidentCount because it shows the incident count! While we are at it, let’s make another dataframe just for this so it’s easier to visualize and it’s separate from our other dataframes!

Okay, so let’s make this.

incidentMonthCount = fema_df.groupby(["incidentMonth"])["incidentMonth"].count().to_frame()
incidentMonthCount.head()

	incidentMonth
incidentMonth
1	12941
2	4868
3	4843
4	5505
5	5014

Here, we are doing the same thing as above where we are counting the number of times a month appears in the dataset. Remember, every time a month appears in the data, that means that there has been an incident in that month. We are assigning this new dataframe to incidentMonthCount.

incidentMonthCount["Month"] = incidentMonthCount.index
incidentMonthCount.head()

	incidentMonth	Month
incidentMonth
1	12941	1
2	4868	2
3	4843	3
4	5505	4
5	5014	5

Here, we are creating a Month column and filling it with the index of incidentMonthCount. Before, there wasn’t a Month column. There were only two things named incidentMonth: the index, which held the actual month, and the column, which showed how many times the month appeared in the dataset. Isn’t this confusing?

So that it makes more sense, we added a month column to show the months.

incidentMonthCount.rename(columns = {'incidentMonth':'incidentCount'}, inplace = True)
incidentMonthCount.head()

	incidentCount	Month
incidentMonth
1	12941	1
2	4868	2
3	4843	3
4	5505	4
5	5014	5

Now that we added a Month column, we should rename the other incidentMonth column (if you are confused, it’s the one with the bigger numbers) to something that makes more sense too. So, we used the rename method in Pandas to name it incidentCount because it shows the number of incidents. After doing this, the incidentMonth index is extra because we already have a Month column. Let’s get rid of it.

incidentMonthCount.reset_index(drop=True, inplace=True)
incidentMonthCount.head()

	incidentCount	Month
0	12941	1
1	4868	2
2	4843	3
3	5505	4
4	5014	5

Great! With that, the incidentMonth index is gone! Plain row numbers starting from 0 take its place. The data looks very simple now and it’s easy to use. Let’s put all the lines together now.

incidentMonthCount = fema_df.groupby(["incidentMonth"])["incidentMonth"].count().to_frame()
incidentMonthCount["Month"] = incidentMonthCount.index
incidentMonthCount.rename(columns = {'incidentMonth':'incidentCount'}, inplace = True)
incidentMonthCount.reset_index(drop=True, inplace=True)
incidentMonthCount.head()

	incidentCount	Month
0	12941	1
1	4868	2
2	4843	3
3	5505	4
4	5014	5

And with that, it’s fixed and we are done!
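For comparison, here is one possible shorter route to the same kind of table, sketched on made-up month numbers. It uses size() to count rows and reset_index(name=...) to name the count column in one go. Note that the columns come out in a different order (Month first) than in the version above.

```python
import pandas as pd

months = pd.DataFrame({"incidentMonth": [1, 1, 8, 8, 8, 9]})  # invented data

month_counts = (
    months.groupby("incidentMonth")
    .size()                                       # rows per month, as a Series
    .reset_index(name="incidentCount")            # month index becomes a regular column
    .rename(columns={"incidentMonth": "Month"})   # friendlier column name
)
print(month_counts)
```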

Whew, we did a lot of data manipulation there! If it’s confusing, it’s good to go back and play around with it to understand it better.

By manipulating data, you’re making it easier to use for your own unique purposes. Now, let’s use the data and finally visualize it to answer questions about your environment.

17.3 Visualizations:

17.3.1 In which months do the most incidents occur?

Let’s start with a simple bar chart using our new dataframe, incidentMonthCount.

import plotly.express as px

Here, we are importing the Plotly package, and like we did with the Pandas package, we are giving it a shorter name (px).

import plotly.express as px
fig = px.bar(incidentMonthCount, x = incidentMonthCount.Month, y = incidentMonthCount.incidentCount, 
             labels={
                 "Month": "Month",
                 "incidentCount": "Incident Count"
             },
             text_auto = True,
             title="FEMA Disaster Incidents Per Month" ) 
fig.show()

Woah… a lot just happened here. This code looks different from all the code we have been doing before. Well, this is because we are visualizing something.

Let’s break it down, piece by piece.

Notice that in the very first piece we said px.bar, and this is to tell Plotly what type of graph we want. In the future, we can similarly change this easily to make a pie chart or histogram.

In the code above, we are making a bar chart using incidentMonthCount. The x-axis is the month and the y-axis is the incidentCount or the number of times there were incidents in that month. Then, we are just giving the labels by making “Month” the same thing and “incidentCount” into “Incident Count” so that the graph is neater.

In the next piece (text_auto), we are telling Plotly to print each bar’s value on the bar itself. (Hovering over a bar also shows the month and incident count; Plotly does that by default.) Then we give the chart a title. We are assigning all this to the variable fig. Finally, we are displaying the figure. Voila!

However, we have a slight problem with the data. To get the most accurate idea of our environment, we need to consider the following.

In 2020 and 2021, the coronavirus (COVID-19) pandemic may have contributed to many of the FEMA disaster declarations in the United States. Can you guess which type of disaster they were counted as? To see the data without most of the COVID disaster declarations, let’s leave those years out.

So, let’s do the same bar graph, except this time, let’s keep only the incidents that began before 2020. Yes, we need to manipulate the data again, but don’t worry! We can do it directly in our graph’s code.

fig = px.bar(incidentMonthCount[incidentMonthCount.incidentYear < 2020], 
        x = incidentMonthCount.Month, 
        y =  incidentMonthCount.incidentCount, 
             labels={
                 "index": "Month",
                 "incidentCount": "incident Count"
             },
             text_auto = True,
             title="FEMA Disaster Incidents Per Month") 

fig.show()

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[17], line 1
----> 1 fig = px.bar(incidentMonthCount[incidentMonthCount.incidentYear < 2020], 
      2         x = incidentMonthCount.Month, 
      3         y =  incidentMonthCount.incidentCount, 
      4              labels={
      5                  "index": "Month",
      6                  "incidentCount": "incident Count"
      7              },
      8              text_auto = True,
      9              title="FEMA Disaster Incidents Per Month") 
     11 fig.show()

File /usr/local/lib/python3.11/site-packages/pandas/core/generic.py:5989, in NDFrame.__getattr__(self, name)
   5982 if (
   5983     name not in self._internal_names_set
   5984     and name not in self._metadata
   5985     and name not in self._accessors
   5986     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5987 ):
   5988     return self[name]
-> 5989 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'incidentYear'

Uh oh! There’s an error. Why? Before, when we made the dataframe incidentMonthCount, everything seemed to be correct. So what’s wrong?

Oh! Look at the first line, where we said incidentMonthCount.incidentYear. In incidentMonthCount, we only have the month and the number of incidents per month, but not the year. Let’s fix this. Which DataFrame has the year?

Our original dataframe, fema_df, has the year. So, let’s use that this time.
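A quick way to avoid this kind of AttributeError is to check which columns a DataFrame actually has before using one. The sketch below uses two small made-up DataFrames (detail and summary are stand-in names) shaped like fema_df and incidentMonthCount.

```python
import pandas as pd

# Invented stand-ins for the two DataFrames in this chapter.
detail = pd.DataFrame({"incidentMonth": [1, 8], "incidentYear": [2019, 2021]})
summary = pd.DataFrame({"incidentCount": [1, 1], "Month": [1, 8]})

for name, frame in [("detail", detail), ("summary", summary)]:
    # "in frame.columns" is True only if a column with that exact name exists.
    has_year = "incidentYear" in frame.columns
    print(name, list(frame.columns), "has incidentYear:", has_year)
```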

fema_month = fema_df[fema_df.incidentYear < 2020].groupby(["incidentMonth"])["incidentMonth"].count().to_frame().rename({'incidentMonth': 'incidentCount'}, axis =1 )
fema_month.head()

	incidentCount
incidentMonth
1	4822
2	3700
3	4516
4	5015
5	4631

In these lines, just like before, we are counting the number of incidents in each month, renaming the incidentMonth column to incidentCount, and assigning the result to the fema_month variable. We are basically doing everything we did before when we changed the column names, except we are putting everything on one line.

The cool thing about Python is that you can keep putting dots and a method after and continue the code for a while. Basically, we are chaining the methods!

But do you notice something different? We said fema_df.incidentYear < 2020, which means that only incidents that began before 2020 are counted. This leaves out 2020 and every year after it.
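The part inside the square brackets is a list of True/False values, one per row, and only the True rows are kept. Here is a sketch on made-up years that shows this filter, plus an alternative (an assumption about what you might want, not what the chapter does) that removes only 2020 and 2021 and keeps the later years.

```python
import pandas as pd

years = pd.DataFrame({"incidentYear": [2018, 2019, 2020, 2021, 2022]})  # invented data

# One True/False per row: True where the year is before 2020.
mask = years.incidentYear < 2020
print(mask.tolist())
before_2020 = years[mask]
print(before_2020.incidentYear.tolist())

# Alternative: drop only 2020 and 2021, but keep the years after them.
without_covid_years = years[~years.incidentYear.isin([2020, 2021])]
print(without_covid_years.incidentYear.tolist())
```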

Now, let’s move onto the code for the graph.

fig = px.bar(fema_month, x = fema_month.index, y =  fema_month.incidentCount, title = "Incidents Per Month (Without COVID)",
             labels={
                 "incidentMonth": "Month",
                 "incidentCount": "Incident Count"
             },
             text_auto = True ) 
fig.show()

Here, we are doing the same thing as the previous graph, but we are using a different DataFrame (fema_month). In the next piece, we are saying that the x-axis is the index of fema_month, which is incidentMonth because we never turned it into a regular column.

The y-axis is the number of incidents. This time, the title tells us that this is without COVID. Then, like in the first graph, we are changing the labels and printing each bar’s value on the bar with text_auto.

Yay! Now our graph works! What can you get from this? Using the two graphs, you can see in which months the most and least incidents occur. Can you think of why the most/least incidents occur in these months?

Can we visualize the same data (without COVID) in a pie chart so that we can see the percentages instead? We have only done bar graphs so far so let’s try a pie chart!

fig = px.pie(fema_month, values=fema_month.incidentCount, 
             names = fema_month.index,
             title = 'FEMA Incident Count by Month (Without COVID)'
            ) 

fig.show()

Now we have a pie chart of the incident count by month, and this time you can see the percentage for each month. When you hover over a piece of the pie, you can see the month and the number of incidents. On the right, there is a key for the month numbers. Let’s look at the code now.

The first difference is when we say px.pie. We are specifying to Plotly what type of graph we want. Then, like before, we are choosing our dataframe which is fema_month.

The values of the pieces of the pie will be the incidentCount. The names of the pieces of the pie are the index of fema_month. Then, we just made the title and showed the figure.

Did you notice that while this code seems shorter than the code for our bar graphs, it’s actually very similar?

Let’s move onto answering another question about our environment!

17.3.2 What are the top 5 states that face disasters?

So let’s look at the five states with the most incidents. Let’s also do two versions: with COVID and without COVID. Let’s start with COVID.

fema_states_covid = fema_df.groupby(["state"])["state"].count().to_frame().rename({'state': 'incidentCount'}, axis =1).sort_values(by='incidentCount', ascending = False)
fema_states_covid.head(10)

	incidentCount
state
TX	5350
FL	2783
KY	2762
MO	2750
GA	2650
LA	2589
VA	2584
OK	2543
NC	2315
PR	2078

Wow, that first line was really long! Let’s break it down.

In the first piece, we are counting the number of times each state appears in the dataset (fema_df). Why? Like before, with the months, every time a state is listed in the dataset, that means that there has been an incident in that state.

In the next piece, we are renaming state to incidentCount. Again, why? Remember when we were counting incidents per month, and we had two month columns? If we didn’t rename it, then we would have two state columns. In this code, we are basically doing the same thing as before, but we are doing it for the state and adding a few new things.

Then, we are sorting the incidentCount values with ascending set to False so that the values are displayed from greatest to least. Okay, so now we are done with the first line.

In the second line, we just displayed the first ten rows to make sure that it was working, and yay! It’s working!

Now, we can do two things so that it can show the top 5 states. At the end of the first line, we can add .head(5) so the data only consists of the top 5 states, or we can keep fema_states_covid the same and just print the head, which automatically shows the top 5 rows of the data.

Let’s add .head(5) at the end of the first line so that it’s easier for us later because then we wouldn’t have to add any extra code to display only the top 5 rows of the data.
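Pandas also has a shortcut for "count, then sort from greatest to least": value_counts(). Here is a minimal sketch on a handful of made-up state codes, keeping the top 2 instead of the top 5 because the sample is so small.

```python
import pandas as pd

states = pd.DataFrame({"state": ["TX", "TX", "TX", "FL", "FL", "KY"]})  # invented data

# value_counts() counts each state and sorts the counts from greatest to least.
top_states = states["state"].value_counts().head(2)
print(top_states)

# Turn it into a DataFrame with the same column name used in this chapter.
top_df = top_states.to_frame("incidentCount")
print(top_df)
```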

fema_states_covid = fema_df.groupby(["state"])["state"].count().to_frame().rename({'state': 'incidentCount'}, axis =1).sort_values(by='incidentCount', ascending = False).head(5)
fema_states_covid

	incidentCount
state
TX	5350
FL	2783
KY	2762
MO	2750
GA	2650

Now let’s do the same thing, but this time, the data won’t include COVID.

fema_states_nocovid = fema_df[fema_df.incidentYear < 2020].groupby(["state"])["state"].count().to_frame().rename({'state': 'incidentCount'}, axis =1).sort_values(by='incidentCount', ascending = False).head(5)
fema_states_nocovid

	incidentCount
state
TX	4042
MO	2421
VA	2184
KY	2149
OK	2041

Here, we are doing the same thing as before, except we are keeping only the incidents that began before 2020, which we have also done before.

Done! That was easy, right? Let’s now take this data and visualize this.

Let’s use a pie chart, but you can always make a bar graph on your own! This time, instead of making two versions of code for with and without COVID, let’s do it all in one cell and let the user decide which graph they want to see.

covid_status = input("Would you like to see a graph with or without COVID being considered? Enter Y for COVID and N for without COVID.")
if covid_status == "Y":
    fig = px.pie(fema_states_covid, values = fema_states_covid.incidentCount, names = fema_states_covid.index, title = "Top 5 States with COVID")
    fig.show()
else:
    fig = px.pie(fema_states_nocovid, values = fema_states_nocovid.incidentCount, names = fema_states_nocovid.index, title = "Top 5 States Without COVID")
    fig.show()

Wow! Isn’t this cool? It certainly makes things a lot easier for us. All we did was add an input and an if-else statement.

Let’s break down the code!

We are doing the same thing as before with the figures, so if you don’t understand that, then you should go back and look at our previous code.

Our new variable, covid_status, contains the user input. The first line takes the input from the user, and then the second line, which is the if statement, says that if the user input is equal to “Y”, then the graph with COVID should be displayed.

After, the else statement says that if the user inputs something other than “Y”, like “N”, then the graph without COVID should be displayed.

Easy peasy lemon squeezy!
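One thing to watch out for: the comparison covid_status == "Y" is exact, so a lowercase "y" or an answer with a stray space falls through to the else branch and shows the graph without COVID. A minimal sketch of one way to make the check more forgiving is below. The helper name wants_covid is made up for illustration.

```python
def wants_covid(answer):
    """Return True if the user's answer means 'include COVID'.

    wants_covid is a hypothetical helper name used for illustration.
    """
    # strip() removes stray spaces; upper() makes "y" and "Y" equivalent.
    return answer.strip().upper() == "Y"

# Try a few possible answers without needing to type them in.
for answer in ["Y", "y", " y ", "N", "maybe"]:
    print(repr(answer), "->", wants_covid(answer))
```

In the cell above, the if statement could then read if wants_covid(covid_status): and the rest of the code would stay the same.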

17.3.3 More visualizations!

What if we wanted to show all the states on a pie chart and not just the top 5? Let’s do this one without COVID too.

df_nocovid = fema_df[fema_df.incidentYear < 2020].groupby(["state"])["state"].count().to_frame().rename({'state': 'incidentCount'}, axis =1)


df_nocovid["state"] = df_nocovid.index

df_nocovid.head()

	incidentCount	state
state
AK	176	AK
AL	1363	AL
AR	1348	AR
AS	65	AS
AZ	241	AZ

We need to create a new dataframe called df_nocovid because the other dataframe only includes the top 5 states. In this dataframe, we are doing the same thing we did for the fema_states_nocovid variable, except we are taking out the .head(5) so that we can see all the states.

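Then, we made a new column, state, and filled it with the index of df_nocovid. After the groupby, the state codes live in the index rather than in a regular column, and copying them into a column of their own makes them easy to pass to the chart. Remember when we needed to do this before with the month columns?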
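As a side note, pandas can do the counting and the index-to-column step together. The sketch below is an alternative to the code above, not a replacement for it, and it uses a made-up toy_df so that it runs on its own.

```python
import pandas as pd

# Toy stand-in for fema_df with only the column we need (made-up rows).
toy_df = pd.DataFrame({"state": ["AK", "AL", "AL", "AZ", "AL", "AZ"]})

# size() counts the rows in each group. reset_index() then turns the "state"
# index into a regular column and names the counts column in the same step.
counts_df = toy_df.groupby("state").size().reset_index(name="incidentCount")
print(counts_df)
```

The result has two ordinary columns, state and incidentCount, so there is no need to copy the index afterwards.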

Okay, now that our data is ready, let’s move on to graphing it.

fig = px.pie(df_nocovid, values = df_nocovid.incidentCount, names = df_nocovid.state)
fig.show()

AHHHH! What is this? There are too many values and it’s so messy and crowded! Is there a way we can fix this?

Let’s group all the smaller values, let’s say states with an incident count less than 500, into one group called “Other States”.

df_nocovid = fema_df[fema_df.incidentYear < 2020].groupby(["state"])["state"].count().to_frame().rename({'state': 'incidentCount'}, axis =1)

df_nocovid["state"] = df_nocovid.index

df_nocovid.loc[df_nocovid['incidentCount'] < 500, 'state'] = 'Other States'

fig = px.pie(df_nocovid, values = df_nocovid.incidentCount, names = df_nocovid.state)
fig.show()

So, our code is almost exactly the same, but we added one more line. Notice, in the third line, we added df_nocovid.loc.

What does .loc do?

Basically, it lets us access different rows and columns in the DataFrame.

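Here, we are selecting the rows where incidentCount is less than 500 and changing their state value to “Other States”. The counts themselves don’t change, only the label. Plotly groups together rows that share the same name, so when we display the figure, all of those small states are combined into one new piece called Other States and our graph is much better.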
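To see the .loc step in isolation, here is a minimal sketch on a tiny made-up table. The name toy_counts and its handful of rows are for illustration only. The last step adds up the counts by label, which is a way to preview the slice sizes without drawing the chart.

```python
import pandas as pd

# A few made-up rows for illustration; the real df_nocovid has one row per state.
toy_counts = pd.DataFrame(
    {"incidentCount": [4042, 176, 65, 2421, 241]},
    index=["TX", "AK", "AS", "MO", "AZ"],
)
toy_counts["state"] = toy_counts.index

# Boolean mask: True for every row whose count is under the cutoff.
is_small = toy_counts["incidentCount"] < 500

# .loc[rows, column] = value overwrites only the selected cells.
toy_counts.loc[is_small, "state"] = "Other States"

# Summing by the new label previews the size of each pie slice.
slices = toy_counts.groupby("state")["incidentCount"].sum()
print(slices)
```

The three small rows keep their own counts in the table, but they now share one label, so they add up to a single slice.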

Done! Let’s move onto the next question we can answer about our environment!

17.3.4 What are the most common FEMA disasters in your state?

To find the answer to this question, we need to get user input and get a specific state from the data.

Let’s use a Plotly histogram for this. Why aren’t we using a bar chart? Well, in this case a histogram needs less code: when the x-axis is a category like incidentType, px.histogram counts the rows in each category for us, so we don’t have to group and count the data ourselves before plotting. The result looks just like a bar chart of those counts.

Also, we are looking at the FEMA disasters across all time in the data, so we aren’t specifying anything but the state, whereas before, we had to change the month, year, etc.

state = input('Enter the state initials in all caps.')

fig = px.histogram(fema_df[fema_df.state == state], x="incidentType")

fig.show()

First, we are asking for the user to input their state initials in all caps.

Then, we are making a variable called fig and assigning the histogram to it by using px.histogram.

The histogram is getting data from the fema_df dataframe, but it is being filtered to only show data from the state that the user inputted. Then, we are giving the x-axis value for the histogram. Since we want to see the type of FEMA disasters, the x-axis will be incidentType.

Since the figure is a histogram, the y-axis is automatically the count and it shows how many times each type of disaster happened.

Finally, we are displaying the figure.
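If you would like to see the numbers behind the bars, the sketch below does the same filter-and-count with plain pandas. It also cleans up the typed state code, so "ga" works as well as "GA". The helper name incident_counts and the toy_df rows are made up for illustration; with the real data you would pass fema_df.

```python
import pandas as pd

# Toy stand-in for fema_df (made-up rows, only the two columns we need).
toy_df = pd.DataFrame({
    "state": ["GA", "GA", "GA", "GA", "TX", "TX"],
    "incidentType": ["Hurricane", "Tornado", "Hurricane", "Hurricane", "Fire", "Flood"],
})

def incident_counts(df, state_code):
    """Count how often each incident type appears for one state.

    incident_counts is a hypothetical helper, not part of pandas or Plotly.
    """
    # Clean up the typed code so "ga" and " GA " both match "GA".
    state_code = state_code.strip().upper()
    state_rows = df[df["state"] == state_code]
    # value_counts sorts from most to least common, like reading the tallest bar first.
    return state_rows["incidentType"].value_counts()

ga_counts = incident_counts(toy_df, " ga ")
print(ga_counts)
```

If the state code does not appear in the data at all, the result is simply empty, which is a handy hint that the code was mistyped.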

Looking at the graph, what are the most common FEMA disasters in your state? Why do you think that is?

Since we are in Georgia, we will try GA.

fig = px.histogram(fema_df[fema_df.state == 'GA'], x="incidentType")

fig.show()

17.3.5 Are FEMA declared disasters becoming more frequent in your state?

import plotly.express as px

state = input('State:')

fig = px.histogram(fema_df[fema_df.state == state], x="fyDeclared")
fig.show()

In the first two lines of code, we are doing the same thing as before. We are just importing Plotly and getting the state from the user.

Next, we are making a histogram again. The only difference is that this time, we are looking to see how many FEMA declared disasters there are over time. So, the x-axis will be fyDeclared so we know which year the disaster was declared. The y-axis is the count by default, so it shows how many times a disaster was declared for each year.

Then, we are just showing the figure.

Looking at the graph, you can see how many FEMA declared disasters there were in specific years and see if there are more over time. Are they becoming more frequent in your state? Are there any factors that you can think of that are contributing to it?
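Year-by-year bars can jump around a lot, which makes a trend hard to judge by eye. One possible way to smooth things out is to count declarations per decade instead of per year. The sketch below assumes that approach; the helper name declarations_per_decade and the toy_df rows are made up for illustration.

```python
import pandas as pd

# Toy stand-in for fema_df (made-up rows, only the two columns we need).
toy_df = pd.DataFrame({
    "state": ["GA"] * 7 + ["TX"] * 2,
    "fyDeclared": [1994, 1998, 2004, 2009, 2011, 2017, 2018, 2005, 2017],
})

def declarations_per_decade(df, state_code):
    """Count one state's rows per decade, oldest decade first.

    declarations_per_decade is a hypothetical helper used for illustration.
    """
    state_rows = df[df["state"] == state_code]
    # Integer-divide by 10 and multiply back: 1998 -> 1990, 2017 -> 2010.
    decade = (state_rows["fyDeclared"] // 10) * 10
    # sort_index puts the decades in time order instead of by size.
    return decade.value_counts().sort_index()

per_decade = declarations_per_decade(toy_df, "GA")
print(per_decade)
```

Keep in mind that the most recent decade is not finished yet, so its count will look smaller than it will eventually be.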

And with that, we have finished our tour of this dataset! There are still many more questions it can answer, and you should try some on your own based on what you have learned in this chapter.

First, think of something you want to know about your environment. That’s your question. Then think of the data you need, use your knowledge, and consult the Internet when necessary, and code!

You will make mistakes, but it’s extremely satisfying to figure it out and do something on your own. If you can’t figure it out, there’s always Google and the lessons from this chapter!