17 Disaster Declarations by FEMA
Let us start by understanding extreme weather events and their impact on us.
The Federal Emergency Management Agency (FEMA) is a great place to begin. FEMA’s mission is to help people before, during, and after disasters, and it employs more than 20,000 people nationwide.
17.1 Reading the FEMA Data
A great way to understand our environment is to look at the disasters that have happened there! In this section, we are going to work with the OpenFEMA Disaster Declarations Summaries dataset.
The dataset is available at https://www.fema.gov/openfema-data-page/disaster-declarations-summaries-v2.
It describes every federally declared disaster since 1953 (wow!).
Using the information we learned from the Basic Pandas section in the Pandas primer, let’s download and read the file.
Let’s start! First, follow the link, scroll down to the full data, and download the CSV file. Then, create a new folder and save the file in it so it is easy to find. This will be useful later on too.
import pandas as pd
In this line, we are importing the Pandas package so that we can view and manipulate the data. Writing “as pd” gives it a shorter name, or alias (pd), to use whenever we call it. This will make more sense as we keep coding.
Note: Once we import pandas, we don’t need to do it again in the same session. However, if your kernel restarts, you will need to run this line again for your code to work.
fema_df = pd.read_csv('https://www.fema.gov/api/open/v2/DisasterDeclarationsSummaries.csv')
print(fema_df.head())
femaDeclarationString disasterNumber state declarationType \
0 FM-5530-NV 5530 NV FM
1 FM-5529-OR 5529 OR FM
2 FM-5528-OR 5528 OR FM
3 FM-5527-OR 5527 OR FM
4 FM-5526-CO 5526 CO FM
declarationDate fyDeclared incidentType declarationTitle \
0 2024-08-12T00:00:00.000Z 2024 Fire GOLD RANCH FIRE
1 2024-08-09T00:00:00.000Z 2024 Fire LEE FALLS FIRE
2 2024-08-06T00:00:00.000Z 2024 Fire ELK LANE FIRE
3 2024-08-02T00:00:00.000Z 2024 Fire MILE MARKER 132 FIRE
4 2024-08-01T00:00:00.000Z 2024 Fire QUARRY FIRE
ihProgramDeclared iaProgramDeclared ... placeCode designatedArea \
0 0 0 ... 99031 Washoe (County)
1 0 0 ... 99067 Washington (County)
2 0 0 ... 99031 Jefferson (County)
3 0 0 ... 99017 Deschutes (County)
4 0 0 ... 99059 Jefferson (County)
declarationRequestNumber lastIAFilingDate incidentId region \
0 24123 NaN 2024081201 9
1 24122 NaN 2024081001 10
2 24116 NaN 2024080701 10
3 24111 NaN 2024080301 10
4 24106 NaN 2024080102 8
designatedIncidentTypes lastRefresh \
0 R 2024-08-27T18:22:14.800Z
1 R 2024-08-27T18:22:14.800Z
2 R 2024-08-27T18:22:14.800Z
3 R 2024-08-27T18:22:14.800Z
4 R 2024-08-27T18:22:14.800Z
hash \
0 5d07e7c51bb300bfbec94a699a1e1ab1d61a97cd
1 ae87cf3c6ed795015b714af7166c7c295b2b67c7
2 432cf0995c47e3895cea696ede5621b810460501
3 2f21d90cb6bc64b0d4121aa3f18d852bbb4b11fa
4 e753ba692156f389dbe19f7a1c332d04ae145f74
id
0 f15a7a79-f1c3-41bb-8a5c-c05fbae34423
1 09e3f81a-5e16-4b72-b317-1c64e0cfa59c
2 59983f89-30bf-4888-b21b-62e8d57d9aac
3 8d13ecf0-bc2f-496b-8c9f-b2e73da832a0
4 17c24d4a-49a9-4cac-9322-e5427c4cdfeb
[5 rows x 28 columns]
DtypeWarning: Columns (21) have mixed types. Specify dtype option on import or set low_memory=False.
What are we doing in the code?
In the first line, we are reading the CSV file directly from its URL using Pandas.
You can get the URL from the website by copying the CSV link. Then, we assign the result to the variable fema_df, where df stands for DataFrame.
In the second line, we are printing the first five rows (head) of the data. This is how we normally preview our data.
Do you notice that when you ran the code, it took a while to execute and the data displayed seems a bit harder to read?
Let’s try doing the same thing except with our downloaded CSV file and see if there is any difference.
Now read the copy you downloaded instead of the URL. If the file is not in the same folder as your notebook, make sure to give its full path.
Wow! The data was displayed in an instant, and it’s much easier to read. It’s neatly arranged into visible rows and columns. This seems much better, right? Well…
Using a downloaded CSV file has the downside that it may not always be up to date. When using the URL, every run pulls the most recent data. To get the most recent data with a CSV file, you have to keep re-downloading the file. For now, we are going to use the downloaded CSV file, and later, we can replace it with the URL when we need the most recent data.
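One way to get both speed and freshness is to keep a local copy and only download when it is missing or when you ask for a refresh. The sketch below assumes the downloaded file is named DisasterDeclarationsSummaries.csv and sits next to your notebook; the helper name load_fema and its parameters are hypothetical, not part of Pandas or OpenFEMA.

```python
import os
import pandas as pd

# URL used earlier in this chapter.
FEMA_URL = "https://www.fema.gov/api/open/v2/DisasterDeclarationsSummaries.csv"


def load_fema(local_path="DisasterDeclarationsSummaries.csv", url=FEMA_URL, refresh=False):
    """Hypothetical helper: read the saved copy if it exists, otherwise download and save it.

    The default file name is an assumption; change it to match your download.
    """
    if refresh or not os.path.exists(local_path):
        df = pd.read_csv(url, low_memory=False)  # slow: downloads the whole file
        df.to_csv(local_path, index=False)       # save a copy for next time
        return df
    return pd.read_csv(local_path, low_memory=False)  # fast: reads from disk


# Usage: fast after the first run; pass refresh=True when you want the latest data.
# fema_df = load_fema()
# fema_df = load_fema(refresh=True)
```

The design choice here is simply "download once, reuse many times": the slow network read only happens on the first run or when you explicitly ask for fresh data.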
17.2 Manipulating the Data:
Before moving on to visualizations, it’s important that we manipulate the data and change some elements. Doing this makes the data easier to visualize and teaches us different ways to manipulate data with Pandas.
But where do we start?
Let’s first look at the type of each column in our dataset and see if there is anything off about them.
fema_df.dtypes
femaDeclarationString object
disasterNumber int64
state object
declarationType object
declarationDate object
fyDeclared int64
incidentType object
declarationTitle object
ihProgramDeclared int64
iaProgramDeclared int64
paProgramDeclared int64
hmProgramDeclared int64
incidentBeginDate object
incidentEndDate object
disasterCloseoutDate object
tribalRequest int64
fipsStateCode int64
fipsCountyCode int64
placeCode int64
designatedArea object
declarationRequestNumber int64
lastIAFilingDate object
incidentId int64
region int64
designatedIncidentTypes object
lastRefresh object
hash object
id object
dtype: object
Hmmm… there are so many columns, and all of them are either objects or integers, but do you notice anything off about them? Should some of the columns be a different type?
Aha! The dates (declarationDate, incidentBeginDate, incidentEndDate, disasterCloseoutDate) are listed as objects (plain text) instead of actual dates. Let’s fix that now because it might cause problems for us later.
import pandas as pd
fema_df = pd.read_csv('https://www.fema.gov/api/open/v2/DisasterDeclarationsSummaries.csv',
parse_dates = ["declarationDate", "incidentBeginDate", "incidentEndDate", "disasterCloseoutDate"])
fema_df.dtypes
DtypeWarning: Columns (21) have mixed types. Specify dtype option on import or set low_memory=False.
femaDeclarationString object
disasterNumber int64
state object
declarationType object
declarationDate datetime64[ns, UTC]
fyDeclared int64
incidentType object
declarationTitle object
ihProgramDeclared int64
iaProgramDeclared int64
paProgramDeclared int64
hmProgramDeclared int64
incidentBeginDate datetime64[ns, UTC]
incidentEndDate datetime64[ns, UTC]
disasterCloseoutDate datetime64[ns, UTC]
tribalRequest int64
fipsStateCode int64
fipsCountyCode int64
placeCode int64
designatedArea object
declarationRequestNumber int64
lastIAFilingDate object
incidentId int64
region int64
designatedIncidentTypes object
lastRefresh object
hash object
id object
dtype: object
What just happened in the code above? Do you see some familiar code from before?
After importing pandas, we are reading the CSV file again, BUT we added something.
The parse_dates argument of read_csv turned the date columns (declarationDate, incidentBeginDate, incidentEndDate, disasterCloseoutDate) into actual date-time values that Pandas can work with, instead of plain-text objects.
The column types are now datetime64[ns, UTC]: dates and times stored to the nanosecond, in the UTC time zone (the trailing Z in the original text means UTC).
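Here is a small, self-contained way to see parse_dates at work without downloading anything. It is a sketch that reads a made-up, three-column CSV held in memory (io.StringIO stands in for the real file); the dates use the same format as the FEMA file, but the values are illustrative. It also passes low_memory=False, which makes Pandas look at each whole column before choosing its type, one of the two fixes the DtypeWarning suggests.

```python
import io
import pandas as pd

# Made-up sample in the same date format as the FEMA CSV (values are illustrative).
sample_csv = io.StringIO(
    "state,declarationDate,incidentBeginDate\n"
    "NV,2024-08-12T00:00:00.000Z,2024-08-10T00:00:00.000Z\n"
    "CO,2024-08-01T00:00:00.000Z,2024-07-29T00:00:00.000Z\n"
)

sample_df = pd.read_csv(
    sample_csv,
    parse_dates=["declarationDate", "incidentBeginDate"],  # text -> datetime
    low_memory=False,  # infer each column's type from all of its values
)

print(sample_df.dtypes)  # both date columns should now be UTC datetimes
```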
Since we did all this, wouldn’t it be helpful to add a column to our DataFrame that shows the incident month number (January = 1, December = 12, etc.)? We can use it later to see which incidents happen most in specific months and which months have the most incidents.
fema_df["incidentMonth"] = fema_df["incidentBeginDate"].dt.month
fema_df.head()
| | femaDeclarationString | disasterNumber | state | declarationType | declarationDate | fyDeclared | incidentType | declarationTitle | ihProgramDeclared | iaProgramDeclared | ... | designatedArea | declarationRequestNumber | lastIAFilingDate | incidentId | region | designatedIncidentTypes | lastRefresh | hash | id | incidentMonth |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | FM-5530-NV | 5530 | NV | FM | 2024-08-12 00:00:00+00:00 | 2024 | Fire | GOLD RANCH FIRE | 0 | 0 | ... | Washoe (County) | 24123 | NaN | 2024081201 | 9 | R | 2024-08-27T18:22:14.800Z | 5d07e7c51bb300bfbec94a699a1e1ab1d61a97cd | f15a7a79-f1c3-41bb-8a5c-c05fbae34423 | 8 |
1 | FM-5529-OR | 5529 | OR | FM | 2024-08-09 00:00:00+00:00 | 2024 | Fire | LEE FALLS FIRE | 0 | 0 | ... | Washington (County) | 24122 | NaN | 2024081001 | 10 | R | 2024-08-27T18:22:14.800Z | ae87cf3c6ed795015b714af7166c7c295b2b67c7 | 09e3f81a-5e16-4b72-b317-1c64e0cfa59c | 8 |
2 | FM-5528-OR | 5528 | OR | FM | 2024-08-06 00:00:00+00:00 | 2024 | Fire | ELK LANE FIRE | 0 | 0 | ... | Jefferson (County) | 24116 | NaN | 2024080701 | 10 | R | 2024-08-27T18:22:14.800Z | 432cf0995c47e3895cea696ede5621b810460501 | 59983f89-30bf-4888-b21b-62e8d57d9aac | 8 |
3 | FM-5527-OR | 5527 | OR | FM | 2024-08-02 00:00:00+00:00 | 2024 | Fire | MILE MARKER 132 FIRE | 0 | 0 | ... | Deschutes (County) | 24111 | NaN | 2024080301 | 10 | R | 2024-08-27T18:22:14.800Z | 2f21d90cb6bc64b0d4121aa3f18d852bbb4b11fa | 8d13ecf0-bc2f-496b-8c9f-b2e73da832a0 | 8 |
4 | FM-5526-CO | 5526 | CO | FM | 2024-08-01 00:00:00+00:00 | 2024 | Fire | QUARRY FIRE | 0 | 0 | ... | Jefferson (County) | 24106 | NaN | 2024080102 | 8 | R | 2024-08-27T18:22:14.800Z | e753ba692156f389dbe19f7a1c332d04ae145f74 | 17c24d4a-49a9-4cac-9322-e5427c4cdfeb | 7 |
5 rows × 29 columns
What happened here?! We have a new column at the right that shows the month of each incident!
In the code, we are saying that the new column incidentMonth has the value of the month number of when the incident began. We got the month number from the incidentBeginDate column by using dt.month. Now we have an incidentMonth column.
Let’s make an incident year column too!
fema_df["incidentYear"] = fema_df["incidentBeginDate"].dt.year
fema_df.head()
| | femaDeclarationString | disasterNumber | state | declarationType | declarationDate | fyDeclared | incidentType | declarationTitle | ihProgramDeclared | iaProgramDeclared | ... | declarationRequestNumber | lastIAFilingDate | incidentId | region | designatedIncidentTypes | lastRefresh | hash | id | incidentMonth | incidentYear |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | FM-5530-NV | 5530 | NV | FM | 2024-08-12 00:00:00+00:00 | 2024 | Fire | GOLD RANCH FIRE | 0 | 0 | ... | 24123 | NaN | 2024081201 | 9 | R | 2024-08-27T18:22:14.800Z | 5d07e7c51bb300bfbec94a699a1e1ab1d61a97cd | f15a7a79-f1c3-41bb-8a5c-c05fbae34423 | 8 | 2024 |
1 | FM-5529-OR | 5529 | OR | FM | 2024-08-09 00:00:00+00:00 | 2024 | Fire | LEE FALLS FIRE | 0 | 0 | ... | 24122 | NaN | 2024081001 | 10 | R | 2024-08-27T18:22:14.800Z | ae87cf3c6ed795015b714af7166c7c295b2b67c7 | 09e3f81a-5e16-4b72-b317-1c64e0cfa59c | 8 | 2024 |
2 | FM-5528-OR | 5528 | OR | FM | 2024-08-06 00:00:00+00:00 | 2024 | Fire | ELK LANE FIRE | 0 | 0 | ... | 24116 | NaN | 2024080701 | 10 | R | 2024-08-27T18:22:14.800Z | 432cf0995c47e3895cea696ede5621b810460501 | 59983f89-30bf-4888-b21b-62e8d57d9aac | 8 | 2024 |
3 | FM-5527-OR | 5527 | OR | FM | 2024-08-02 00:00:00+00:00 | 2024 | Fire | MILE MARKER 132 FIRE | 0 | 0 | ... | 24111 | NaN | 2024080301 | 10 | R | 2024-08-27T18:22:14.800Z | 2f21d90cb6bc64b0d4121aa3f18d852bbb4b11fa | 8d13ecf0-bc2f-496b-8c9f-b2e73da832a0 | 8 | 2024 |
4 | FM-5526-CO | 5526 | CO | FM | 2024-08-01 00:00:00+00:00 | 2024 | Fire | QUARRY FIRE | 0 | 0 | ... | 24106 | NaN | 2024080102 | 8 | R | 2024-08-27T18:22:14.800Z | e753ba692156f389dbe19f7a1c332d04ae145f74 | 17c24d4a-49a9-4cac-9322-e5427c4cdfeb | 7 | 2024 |
5 rows × 30 columns
Here, we are doing the same thing as above, except this time, we are taking the year from the incident begin date by using dt.year.
Now we have another new column to the right called incidentYear! This makes it soooo much easier to show what specific months/years had the most incidents and what incidents happened in specific months/years.
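To see what .dt.month and .dt.year return without the full dataset, here is a sketch on three made-up begin dates, created with pd.to_datetime(..., utc=True) so they have the same UTC type as the parsed FEMA columns.

```python
import pandas as pd

# Made-up incident begin dates, stored as UTC datetimes like the parsed FEMA columns.
demo = pd.DataFrame({
    "incidentBeginDate": pd.to_datetime(
        ["2024-08-10", "2019-01-20", "2021-03-05"], utc=True
    )
})

# The .dt accessor pulls parts out of every date in the column at once.
demo["incidentMonth"] = demo["incidentBeginDate"].dt.month  # 8, 1, 3
demo["incidentYear"] = demo["incidentBeginDate"].dt.year    # 2024, 2019, 2021
print(demo)
```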
Let’s see how many incidents there are per month overall! We can do this with the groupby() and count() functions.
fema_df.groupby(["incidentMonth"])["incidentMonth"].count().to_frame()
incidentMonth | incidentMonth |
---|---|
1 | 12941 |
2 | 4868 |
3 | 4843 |
4 | 5505 |
5 | 5014 |
6 | 4548 |
7 | 2964 |
8 | 8619 |
9 | 8354 |
10 | 3551 |
11 | 1860 |
12 | 4162 |
Here, we grouped the rows by incident month and counted how many rows fall in each month.
Why? Because every row in the dataset records a declared disaster, so counting the rows for a month gives the overall total for that month. (Strictly speaking, each row is one designated area, usually a county, within a declaration, so a disaster that covers many counties is counted once per county.) 1 means January, 12 means December, and so on.
Did you notice something odd about the column names? Both of them are named incidentMonth, which is confusing. The first one is the index, which holds the actual months (1-12); the second one holds the counts.
Don’t worry, we will fix this soon!
Finally, the to_frame() function turns the result (a Series) into a one-column DataFrame, which the notebook displays as a neater table.
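The naming clash is easier to see on a tiny example. The sketch below uses six made-up months: the first result reproduces the pattern used above (index and column both called incidentMonth), and the second shows one alternative, groupby(...).size() followed by rename(), that gives the counts their own name from the start. The name incidentCount is just the one this chapter settles on.

```python
import pandas as pd

demo = pd.DataFrame({"incidentMonth": [1, 1, 8, 8, 8, 9]})  # made-up months

# Pattern used above: the index AND the counts column are both named "incidentMonth".
counts = demo.groupby(["incidentMonth"])["incidentMonth"].count().to_frame()
print(counts)

# Alternative: size() counts the rows per group; rename() names the result.
counts_named = demo.groupby("incidentMonth").size().rename("incidentCount").to_frame()
print(counts_named)
```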
What if we wanted to see the specific incidents in January? Let’s look at the simple code below to learn how.
incidentTypeMonth = fema_df.groupby(["incidentMonth", "incidentType"])["incidentMonth"].count()
print(incidentTypeMonth[1])
incidentType
Biological 7857
Chemical 9
Coastal Storm 40
Drought 158
Earthquake 4
Fire 48
Flood 820
Freezing 112
Hurricane 5
Mud/Landslide 1
Other 2
Severe Ice Storm 708
Severe Storm 1840
Snowstorm 1214
Tornado 55
Typhoon 14
Volcanic Eruption 1
Winter Storm 53
Name: incidentMonth, dtype: int64
Here, we are grouping the incidentMonth and incidentType columns together and counting how many times each type of incident happens in that specific month.
In the next line, we said incidentTypeMonth[1] so that we can see how many times each type of incident happened in January. If you want to see a table like this for another month, all you have to do is change the number inside the brackets.
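Because we grouped by two columns, incidentTypeMonth has a two-level index (month, then incident type). A small sketch with made-up rows shows how .loc picks out one month, or one month and type pair; .loc[1] is a more explicit way of writing the incidentTypeMonth[1] lookup used above.

```python
import pandas as pd

demo = pd.DataFrame({
    "incidentMonth": [1, 1, 1, 8, 8],
    "incidentType": ["Flood", "Flood", "Snowstorm", "Fire", "Flood"],
})  # made-up rows

counts = demo.groupby(["incidentMonth", "incidentType"])["incidentMonth"].count()

january = counts.loc[1]                 # every incident type counted in January
august_fires = counts.loc[(8, "Fire")]  # one (month, type) pair
print(january)
print(august_fires)
```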
Neat, right?
What if we wanted to see a table like this for every month? This does sound like a lot of work, but we just have to make something called a crosstab, and lucky for us, Pandas already has a crosstab function. So, we can make a crosstab with just one line of code!
femacross_df = pd.crosstab(fema_df.incidentMonth, fema_df.incidentType)
femacross_df
incidentMonth / incidentType | Biological | Chemical | Coastal Storm | Dam/Levee Break | Drought | Earthquake | Fire | Fishing Losses | Flood | Freezing | ... | Snowstorm | Straight-Line Winds | Terrorist | Tornado | Toxic Substances | Tropical Storm | Tsunami | Typhoon | Volcanic Eruption | Winter Storm |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 7857 | 9 | 40 | 0 | 158 | 4 | 48 | 0 | 820 | 112 | ... | 1214 | 0 | 0 | 55 | 0 | 0 | 0 | 14 | 1 | 53 |
2 | 0 | 0 | 6 | 3 | 0 | 25 | 107 | 0 | 948 | 0 | ... | 823 | 0 | 0 | 62 | 1 | 0 | 0 | 12 | 0 | 1 |
3 | 0 | 0 | 3 | 0 | 74 | 7 | 344 | 0 | 1722 | 26 | ... | 984 | 0 | 0 | 164 | 1 | 0 | 9 | 5 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 27 | 8 | 445 | 0 | 1895 | 0 | ... | 87 | 0 | 4 | 536 | 1 | 0 | 0 | 3 | 1 | 7 |
5 | 0 | 0 | 0 | 8 | 10 | 8 | 263 | 18 | 931 | 0 | ... | 6 | 0 | 0 | 298 | 1 | 9 | 0 | 4 | 48 | 0 |
6 | 0 | 0 | 137 | 1 | 219 | 2 | 324 | 0 | 1494 | 8 | ... | 0 | 0 | 0 | 116 | 0 | 0 | 0 | 4 | 0 | 0 |
7 | 0 | 0 | 41 | 0 | 293 | 3 | 417 | 0 | 618 | 0 | ... | 0 | 2 | 0 | 157 | 0 | 0 | 0 | 4 | 0 | 0 |
8 | 0 | 0 | 222 | 0 | 319 | 13 | 748 | 0 | 361 | 0 | ... | 0 | 0 | 0 | 6 | 3 | 430 | 0 | 10 | 0 | 0 |
9 | 0 | 0 | 160 | 0 | 100 | 7 | 399 | 24 | 747 | 0 | ... | 0 | 0 | 1 | 22 | 0 | 472 | 0 | 7 | 1 | 0 |
10 | 0 | 0 | 16 | 0 | 59 | 23 | 163 | 0 | 496 | 0 | ... | 77 | 0 | 0 | 2 | 1 | 6 | 0 | 19 | 0 | 0 |
11 | 0 | 0 | 0 | 0 | 1 | 12 | 449 | 0 | 341 | 0 | ... | 62 | 0 | 0 | 117 | 1 | 54 | 0 | 36 | 0 | 11 |
12 | 0 | 0 | 12 | 1 | 32 | 116 | 32 | 0 | 719 | 155 | ... | 454 | 0 | 0 | 88 | 0 | 0 | 0 | 12 | 0 | 45 |
12 rows × 26 columns
Wow! Doesn’t it look neat? For every month, you can find all the information about how many of each type of incident occurred across all the years.
We made a new DataFrame called femacross_df so that the original DataFrame (fema_df) stays as it is and we can use it again later. Then, we applied the crosstab function to the incidentMonth and incidentType columns of fema_df.
Finally, we typed femacross_df on its own line so that the notebook would display the crosstab, and voila!
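Here is the same idea on a handful of made-up rows, small enough to check by hand. Each row of the crosstab is a month, each column an incident type, and each cell counts the rows with that pair; pairs that never occur are filled with 0. As a design note, the same table can be built with groupby(...).size().unstack(fill_value=0), which the sketch also computes for comparison.

```python
import pandas as pd

demo = pd.DataFrame({
    "incidentMonth": [1, 1, 8, 8, 8],
    "incidentType": ["Flood", "Snowstorm", "Fire", "Fire", "Flood"],
})  # made-up rows

table = pd.crosstab(demo["incidentMonth"], demo["incidentType"])
print(table)

# Same counts via groupby: count each (month, type) pair, then spread types into columns.
same_table = demo.groupby(["incidentMonth", "incidentType"]).size().unstack(fill_value=0)
print(same_table)
```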
Mostly easy, right? Can you think of what kind of visualizations are possible with a crosstab?
We are sooo close to visualizing our data, but uh oh! Remember our problem from before when we grouped all the incident months and counted the number of times each month was in the dataset?
If you forgot, you can go back and look at it again. You can see that both of the column names were incidentMonth, which made it confusing. Before moving onto the visualizations, we should fix this so it doesn’t present a problem for us later!
So, we know that the two shouldn’t both be named incidentMonth, because one holds the actual incident month number and the other holds the number of times an incident occurred in that month. What should the name of the other column be?
A great name would be incidentCount because it shows the incident count! While we are at it, let’s make another dataframe just for this so it’s easier to visualize and it’s separate from our other dataframes!
Okay, so let’s make this.
incidentMonthCount = fema_df.groupby(["incidentMonth"])["incidentMonth"].count().to_frame()
incidentMonthCount.head()
incidentMonth | incidentMonth |
---|---|
1 | 12941 |
2 | 4868 |
3 | 4843 |
4 | 5505 |
5 | 5014 |
Here, we are doing the same thing as above: counting the number of times each month appears in the dataset. Remember, every time a month appears in the data, there has been an incident in that month. We are assigning this new DataFrame to incidentMonthCount.
incidentMonthCount["Month"] = incidentMonthCount.index
incidentMonthCount.head()
incidentMonth | incidentMonth | Month |
---|---|---|
1 | 12941 | 1 |
2 | 4868 | 2 |
3 | 4843 | 3 |
4 | 5505 | 4 |
5 | 5014 | 5 |
Here, we are creating a Month column and filling it with the values of the index of incidentMonthCount. Before, there wasn’t a Month column; there were only two things named incidentMonth: the index, which held the actual month, and the column that showed how many times the month appeared in the dataset. Isn’t that confusing?
To make it clearer, we added a Month column to show the months.
incidentMonthCount.rename(columns = {'incidentMonth':'incidentCount'}, inplace = True)
incidentMonthCount.head()
incidentMonth | incidentCount | Month |
---|---|---|
1 | 12941 | 1 |
2 | 4868 | 2 |
3 | 4843 | 3 |
4 | 5505 | 4 |
5 | 5014 | 5 |
Now that we have a Month column, we should also give the incidentMonth column (if you are confused, it’s the one with the bigger numbers) a name that makes more sense. Using the rename feature in Pandas, we named it incidentCount because it shows the number of incidents. After doing this, the index is still named incidentMonth, and it repeats what the Month column already shows. Let’s get rid of it.
incidentMonthCount.reset_index(drop=True, inplace=True)
incidentMonthCount.head()
| | incidentCount | Month |
---|---|---|
0 | 12941 | 1 |
1 | 4868 | 2 |
2 | 4843 | 3 |
3 | 5505 | 4 |
4 | 5014 | 5 |
Great! reset_index(drop=True) replaced the month index with plain row numbers (0, 1, 2, …), and drop=True told Pandas to throw the old index away instead of turning it back into a column. The data looks very simple now, and it’s easy to use. Let’s put all the lines together now.
incidentMonthCount = fema_df.groupby(["incidentMonth"])["incidentMonth"].count().to_frame()
incidentMonthCount["Month"] = incidentMonthCount.index
incidentMonthCount.rename(columns = {'incidentMonth':'incidentCount'}, inplace = True)
incidentMonthCount.reset_index(drop=True, inplace=True)
incidentMonthCount.head()
| | incidentCount | Month |
---|---|---|
0 | 12941 | 1 |
1 | 4868 | 2 |
2 | 4843 | 3 |
3 | 5505 | 4 |
4 | 5014 | 5 |
And with that, it’s fixed and we are done!
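Once the steps make sense, they can also be chained into a single expression. The sketch below is one alternative, run on made-up months; it relies on Series.reset_index(name=...), which turns the month index into a column and names the counts in one go, and it ends up with the same incidentCount and Month columns and a plain 0, 1, 2 index.

```python
import pandas as pd

demo = pd.DataFrame({"incidentMonth": [1, 1, 8, 8, 8, 9]})  # made-up months

month_counts = (
    demo.groupby("incidentMonth")
    .size()                                      # Series: month -> number of rows
    .reset_index(name="incidentCount")           # month index -> column; counts get a name
    .rename(columns={"incidentMonth": "Month"})  # clearer name for the month column
    .loc[:, ["incidentCount", "Month"]]          # same column order as incidentMonthCount
)
print(month_counts)
```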
Whew, we did a lot of data manipulation there! If it’s confusing, it’s good to go back and play around with it to understand it better.
By manipulating data, you’re making it easier to use for your own unique purposes. Now, let’s use the data and finally visualize it to answer questions about your environment.
17.3 Visualizations:
17.3.1 In which months do the most incidents occur?
Let’s start with a simple bar chart using our new dataframe, incidentMonthCount.
In the cell below, we import Plotly Express (plotly.express), the quick-plotting module of the Plotly package, and like we did with Pandas, we give it a shorter name (px).
import plotly.express as px
fig = px.bar(incidentMonthCount, x = incidentMonthCount.Month, y = incidentMonthCount.incidentCount,
labels={
"Month": "Month",
"incidentCount": "Incident Count"
},
text_auto = True,
title="FEMA Disaster Incidents Per Month" )
fig.show()
Woah… a lot just happened here. This code looks different from all the code we have been doing before. Well, this is because we are visualizing something.
Let’s break it down, piece by piece.
Notice that in the very first piece we said px.bar, and this is to tell Plotly what type of graph we want. In the future, we can similarly change this easily to make a pie chart or histogram.
In the code above, we are making a bar chart using incidentMonthCount. The x-axis is the month, and the y-axis is incidentCount, the number of incidents in that month. Then, we set the axis labels: “Month” stays “Month”, and “incidentCount” becomes “Incident Count” so that the graph is neater.
In the next piece, text_auto=True writes each bar’s value (its incident count) on the bar itself; hovering over a bar also shows the month and the count. We are assigning all this to the variable fig. Finally, we display the figure with fig.show(). Voila!
However, we have a slight problem with the data. To get the most accurate idea of our environment, we need to consider the following.
In 2020 and 2021, the coronavirus (COVID-19) pandemic accounted for many of the FEMA disaster declarations in the United States. Can you guess which type of disaster they were counted as? So, let’s take those years out and see the data without most of the COVID disaster declarations.
Let’s make the same bar graph, except this time, we keep only incidents that began before 2020 (so 2022 and later are left out too). Yes, we need to manipulate the data again, but don’t worry! Let’s try doing it directly in our graph’s code.
fig = px.bar(incidentMonthCount[incidentMonthCount.incidentYear < 2020],
x = incidentMonthCount.Month,
y = incidentMonthCount.incidentCount,
labels={
"index": "Month",
"incidentCount": "incident Count"
},
text_auto = True,
title="FEMA Disaster Incidents Per Month")
fig.show()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[17], line 1
----> 1 fig = px.bar(incidentMonthCount[incidentMonthCount.incidentYear < 2020],
2 x = incidentMonthCount.Month,
3 y = incidentMonthCount.incidentCount,
4 labels={
5 "index": "Month",
6 "incidentCount": "incident Count"
7 },
8 text_auto = True,
9 title="FEMA Disaster Incidents Per Month")
11 fig.show()
File /usr/local/lib/python3.11/site-packages/pandas/core/generic.py:5989, in NDFrame.__getattr__(self, name)
5982 if (
5983 name not in self._internal_names_set
5984 and name not in self._metadata
5985 and name not in self._accessors
5986 and self._info_axis._can_hold_identifiers_and_holds_name(name)
5987 ):
5988 return self[name]
-> 5989 return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'incidentYear'
Uh oh! There’s an error. Why? When we made the DataFrame incidentMonthCount, everything seemed correct, so what’s wrong?
Oh! Look at the first line, where we wrote incidentMonthCount.incidentYear. incidentMonthCount only has the month and the number of incidents per month, not the year. Let’s fix this. Which DataFrame does have the year?
Our original dataframe, fema_df, has the year. So, let’s use that this time.
fema_month = fema_df[fema_df.incidentYear < 2020].groupby(["incidentMonth"])["incidentMonth"].count().to_frame().rename({'incidentMonth': 'incidentCount'}, axis =1 )
fema_month.head()
incidentMonth | incidentCount |
---|---|
1 | 4822 |
2 | 3700 |
3 | 4516 |
4 | 5015 |
5 | 4631 |
In these lines, just like before, we are counting the number of incidents per month, renaming the count column from incidentMonth to incidentCount, and assigning all this to the fema_month variable. We are basically doing everything we did before when we changed the column names, except we are putting it all on one line.
The cool thing about Pandas is that each of these methods returns a new DataFrame or Series, so you can keep adding a dot and another method to continue the code for a while. This is called method chaining!
But do you notice something different? We wrote fema_df[fema_df.incidentYear < 2020], which keeps only the rows for incidents that began before 2020.
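If you would rather remove only 2020 and 2021 and keep the later years, one option is isin() combined with ~ (not). The sketch below compares both filters on made-up years; either filtered frame can go straight into the same groupby chain as before.

```python
import pandas as pd

demo = pd.DataFrame({
    "incidentYear": [2018, 2019, 2020, 2021, 2022, 2023],
    "incidentMonth": [1, 8, 1, 3, 8, 9],
})  # made-up rows

before_2020 = demo[demo["incidentYear"] < 2020]                # keeps 2018, 2019 only
no_2020_2021 = demo[~demo["incidentYear"].isin([2020, 2021])]  # keeps 2018, 2019, 2022, 2023

print(before_2020.groupby("incidentMonth")["incidentMonth"].count())
print(no_2020_2021.groupby("incidentMonth")["incidentMonth"].count())
```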
Now, let’s move onto the code for the graph.
fig = px.bar(fema_month, x = fema_month.index, y = fema_month.incidentCount, title = "Incidents Per Month (Without COVID)",
labels={
"incidentMonth": "Month",
"incidentCount": "Incident Count"
},
text_auto = True )
fig.show()
Here, we are doing the same thing as in the previous graph, but we are using a different DataFrame (fema_month). The x-axis is the index of fema_month, which holds the month numbers because we grouped by incidentMonth.
The y-axis is the number of incidents. This time, we added a title so we can see that this is without COVID. Then, like in the first graph, we are changing the labels and using text_auto=True to write the counts on the bars.
Yay! Now our graph works! What can you get from this? Using the two graphs, you can see which months the most and least incidents occur. Can you think of why the most/least incidents occur in these months?
Can we visualize the same data (without COVID) in a pie chart so that we can see the percentages instead? We have only done bar graphs so far so let’s try a pie chart!
fig = px.pie(fema_month, values=fema_month.incidentCount,
names = fema_month.index,
title = 'FEMA Incident Count by Month (Without COVID)'
)
fig.show()
We can now see a pie chart of the incident count by month, except now you can see the percentage for each month. When you hover over a slice, you can see the month and the number of incidents. On the right, there is a legend for the month numbers. Let’s look at the code now.
The first difference is when we say px.pie. We are specifying to Plotly what type of graph we want. Then, like before, we are choosing our dataframe which is fema_month.
The values of the pieces of the pie will be the incidentCount. The names of the pieces of the pie are the index of fema_month. Then, we just made the title and showed the figure.
Did you notice that while this code seems shorter than the code for our bar graphs, it’s actually very similar?
Let’s move onto answering another question about our environment!
17.3.2 What are the top 5 states that face disasters?
So let’s look at the five states with the most incidents. Let’s also do two versions: with COVID and without COVID. Let’s start with COVID.
fema_states_covid = fema_df.groupby(["state"])["state"].count().to_frame().rename({'state': 'incidentCount'}, axis =1).sort_values(by='incidentCount', ascending = False)
fema_states_covid.head(10)
state | incidentCount |
---|---|
TX | 5350 |
FL | 2783 |
KY | 2762 |
MO | 2750 |
GA | 2650 |
LA | 2589 |
VA | 2584 |
OK | 2543 |
NC | 2315 |
PR | 2078 |
Wow, that first line was really long! Let’s break it down.
In the first piece, we are counting the number of times each state appears in the dataset (fema_df). Why? Like before with the months, every time a state is listed in the dataset, there has been an incident in that state.
In the next piece, we are renaming state to incidentCount. Again, why? Remember when we were counting incidents per month, and we had two month columns? If we didn’t rename it, we would have two things named state. In this code, we are basically doing the same thing as before, but for the state, and adding a few new things.
Then, we are sorting the incidentCount values with ascending set to False so that the values are displayed from greatest to least. Okay, so now we are done with the first line.
In the second line, we just displayed the first ten rows to make sure that it was working, and yay! It’s working!
Now, we can do two things so that it can show the top 5 states. At the end of the first line, we can add .head(5) so the data only consists of the top 5 states, or we can keep fema_states_covid the same and just print the head, which automatically shows the top 5 rows of the data.
Let’s add .head(5) at the end of the first line so that it’s easier for us later because then we wouldn’t have to add any extra code to display only the top 5 rows of the data.
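As a quick check of how sort_values(ascending=False) and head() work together, here is a sketch on made-up state labels. It also shows nlargest(), a Pandas shortcut that sorts from largest to smallest and keeps the first n values in one step.

```python
import pandas as pd

demo = pd.DataFrame({"state": ["TX"] * 4 + ["FL"] * 3 + ["KY"] * 2 + ["MO"]})  # made-up rows

counts = demo.groupby("state")["state"].count()

top3 = counts.sort_values(ascending=False).head(3)  # greatest to least, keep 3
top3_alt = counts.nlargest(3)                       # same result in one call
print(top3)
```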
fema_states_covid = fema_df.groupby(["state"])["state"].count().to_frame().rename({'state': 'incidentCount'}, axis =1).sort_values(by='incidentCount', ascending = False).head(5)
fema_states_covid
state | incidentCount |
---|---|
TX | 5350 |
FL | 2783 |
KY | 2762 |
MO | 2750 |
GA | 2650 |
Now let’s do the same thing, but this time, the data won’t include COVID.
fema_states_nocovid = fema_df[fema_df.incidentYear < 2020].groupby(["state"])["state"].count().to_frame().rename({'state': 'incidentCount'}, axis =1).sort_values(by='incidentCount', ascending = False).head(5)
fema_states_nocovid
state | incidentCount |
---|---|
TX | 4042 |
MO | 2421 |
VA | 2184 |
KY | 2149 |
OK | 2041 |
Here, we are doing the same thing as before, except we keep only incidents that began before 2020, which we have also done before.
Done! That was easy, right? Let’s now take this data and visualize this.
Let’s use a pie chart, but you can always make a bar graph on your own! This time, instead of making two versions of code for with and without COVID, let’s do it all in one cell and let the user decide which graph they want to see.
covid_status = input("Would you like to see a graph with or without COVID being considered? Enter Y for COVID and N for without COVID.")
if covid_status == "Y":
fig = px.pie(fema_states_covid, values = fema_states_covid.incidentCount, names = fema_states_covid.index, title = "Top 5 States with COVID")
fig.show()
else:
fig = px.pie(fema_states_nocovid, values = fema_states_nocovid.incidentCount, names = fema_states_nocovid.index, title = "Top 5 States Without COVID")
fig.show()
Wow! Isn’t this cool? It certainly makes things a lot easier for us. All we did was add an input and an if-else statement.
Let’s break down the code!
We are doing the same thing as before with the figures, so if you don’t understand that, then you should go back and look at our previous code.
Our new variable, covid_status, contains the user input. The first line takes the input from the user, and then the second line, which is the if statement, says that if the user input is equal to “Y”, then the graph with COVID should be displayed.
After, the else statement says that if the user inputs something other than “Y”, like “N”, then the graph without COVID should be displayed.
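One catch: the comparison is case-sensitive, so typing “y” (or “ Y” with a space) shows the graph without COVID. A small, hypothetical helper like wants_covid below tidies the answer with strip() and upper() before comparing; it is plain Python and not part of Plotly.

```python
def wants_covid(answer):
    """Hypothetical helper: True if the answer is "Y", ignoring case and surrounding spaces."""
    return answer.strip().upper() == "Y"


# Usage with the cell above:
# covid_status = input("Enter Y for COVID and N for without COVID.")
# if wants_covid(covid_status):
#     show the graph with COVID
print(wants_covid(" y "))
print(wants_covid("N"))
```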
Easy peasy lemon squeezy!
17.3.3 More visualizations!
What if we wanted to show all the states on a pie chart and not just the top 5? Let’s do this one without COVID too.
df_nocovid = fema_df[fema_df.incidentYear < 2020].groupby(["state"])["state"].count().to_frame().rename({'state': 'incidentCount'}, axis =1)
df_nocovid["state"] = df_nocovid.index
df_nocovid.head()
state | incidentCount | state |
---|---|---|
AK | 176 | AK |
AL | 1363 | AL |
AR | 1348 | AR |
AS | 65 | AS |
AZ | 241 | AZ |
We need to create a new dataframe called df_nocovid because the other dataframe only includes the top 5 states. In this dataframe, we are doing the same thing we did for the fema_states_nocovid variable, except we are taking out the .head(5) so that we can see all the states.
Then, we made a new column, state, and filled it with the values of the index of df_nocovid, so that the state names are also available as a regular column. Remember when we needed to do this before with the Month column?
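Okay, now our data is ready, so let’s move on to graphing it.
fig = px.pie(df_nocovid, values = df_nocovid.incidentCount, names = df_nocovid.state)
fig.show()
AHHHH! What is this? There are too many slices, and it’s so messy and crowded! Is there a way we can fix this?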
Let’s group all the smaller values, say states with an incident count less than 500, into one group called “Other States”.
df_nocovid = fema_df[fema_df.incidentYear < 2020].groupby(["state"])["state"].count().to_frame().rename({'state': 'incidentCount'}, axis =1)
df_nocovid["state"] = df_nocovid.index
df_nocovid.loc[df_nocovid['incidentCount'] < 500, 'state'] = 'Other States'
fig = px.pie(df_nocovid, values = df_nocovid.incidentCount, names = df_nocovid.state)
fig.show()
Our code is almost exactly the same, but we added one more line: the third line, which uses df_nocovid.loc.
What does .loc do?
It selects rows and columns by label or by a True/False condition, and it can also assign new values to the cells it selects.
Here, we select every row whose incidentCount is less than 500 and replace its state value with “Other States”. Plotly combines rows that share the same name into a single slice, so when we display the figure, all of those small states appear as one Other States piece and our graph is much easier to read.
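To check the numbers behind each slice, you can do the relabeling yourself and then group by the new labels and sum the counts. This sketch uses the five rows printed earlier (AK through AZ) as sample data. Among those rows, only AL and AR reach 500.

```python
import pandas as pd

# The five rows printed earlier, used as sample data
df_nocovid = pd.DataFrame({
    "state": ["AK", "AL", "AR", "AS", "AZ"],
    "incidentCount": [176, 1363, 1348, 65, 241],
})

# Relabel every state with fewer than 500 incidents as "Other States"
df_nocovid.loc[df_nocovid["incidentCount"] < 500, "state"] = "Other States"

# Sum the counts per label: this is what each pie slice represents
slices = df_nocovid.groupby("state")["incidentCount"].sum()
print(slices)  # AL 1363, AR 1348, Other States 482 (176 + 65 + 241)
```

If you try this groupby on the chapter’s own df_nocovid, pandas may refuse because state is both the index name and a column name. Calling df_nocovid.reset_index(drop=True) first avoids that clash.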
Done! Let’s move onto the next question we can answer about our environment!
17.3.4 What are the most common FEMA disasters in your state?
To find the answer to this question, we need to get user input and get a specific state from the data.
Let’s use a Plotly histogram for this. Why not a bar chart? Here, a histogram takes less code. px.histogram counts how many rows fall into each category for us, while px.bar usually expects those counts to be calculated first. For this kind of data, the two charts look essentially the same.
Also, we are looking at FEMA disasters across all the years in the data, so the state is the only thing we filter by. Before, we also had to pick the month, year, and so on.
state = input('Enter the state initials in all caps.')
fig = px.histogram(fema_df[fema_df.state == state], x="incidentType")
fig.show()
First, we ask the user to enter their state initials in all caps.
Then, we are making a variable called fig and assigning the histogram to it by using px.histogram.
The histogram gets its data from the fema_df dataframe, filtered to show only rows from the state the user entered. Then, we set the x-axis. Since we want to see the types of FEMA disasters, the x-axis is incidentType.
Since the figure is a histogram, the y-axis is automatically the count and it shows how many times each type of disaster happened.
Finally, we are displaying the figure.
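px.histogram does the counting for us. If you want the numbers behind the bars, for example to print the single most common disaster type, pandas’ value_counts gives the same per-category counts. Below is a sketch with a hypothetical helper name and made-up sample rows. It also tidies the input with .strip().upper(), so “ga” works as well as “GA”.

```python
import pandas as pd

def incident_type_counts(df: pd.DataFrame, state: str) -> pd.Series:
    """Count rows of each incidentType for one state (hypothetical helper).

    The state code is stripped and upper-cased so "ga" matches "GA".
    """
    state = state.strip().upper()
    return df.loc[df["state"] == state, "incidentType"].value_counts()

# Made-up sample rows standing in for fema_df (not real FEMA records)
sample = pd.DataFrame({
    "state": ["GA", "GA", "GA", "FL", "GA"],
    "incidentType": ["Hurricane", "Severe Storm", "Hurricane", "Fire", "Flood"],
})

counts = incident_type_counts(sample, " ga ")
print(counts)
print("Most common:", counts.idxmax())  # the tallest bar in the histogram
```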
Looking at the graph, what are the most common FEMA disasters in your state? Why do you think that is?
Since we are in Georgia, we will try GA.
17.3.5 Are FEMA declared disasters becoming more frequent in your state?
import plotly.express as px
state = input('State:')
fig = px.histogram(fema_df[fema_df.state == state], x="fyDeclared")
fig.show()
The first two lines work the same way as before: we import Plotly (only needed if it isn’t already imported) and get the state from the user.
Next, we make another histogram. The only difference is that this time we want to see how FEMA declared disasters are spread over time, so the x-axis is fyDeclared, the fiscal year in which the disaster was declared. The y-axis is the count by default, so it shows how many declarations were recorded in each year.
Then, we are just showing the figure.
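One caution when reading this histogram. As we understand the Disaster Declarations Summaries data, each row is one designated area (often a county) within a declaration, so a single disaster can appear many times. Counting rows therefore measures declared areas per year, while counting unique disasterNumber values measures distinct disasters. Here is a sketch of both counts on made-up sample rows.

```python
import pandas as pd

# Made-up sample rows: disaster 100 covers three areas, 101 and 102 one each
sample = pd.DataFrame({
    "state": ["GA", "GA", "GA", "GA", "GA"],
    "disasterNumber": [100, 100, 100, 101, 102],
    "fyDeclared": [2017, 2017, 2017, 2017, 2020],
})

ga = sample[sample["state"] == "GA"]

# Rows per fiscal year: what px.histogram(x="fyDeclared") draws
rows_per_year = ga["fyDeclared"].value_counts().sort_index()

# Distinct disasters per fiscal year
disasters_per_year = ga.groupby("fyDeclared")["disasterNumber"].nunique()

print(rows_per_year)       # 2017 -> 4 rows, 2020 -> 1 row
print(disasters_per_year)  # 2017 -> 2 disasters, 2020 -> 1 disaster
```

To chart the distinct-disaster counts, you could pass them to a bar chart, for example px.bar(x=disasters_per_year.index, y=disasters_per_year.values).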
Looking at the graph, you can see how many FEMA declared disasters there were in specific years and see if there are more over time. Are they becoming more frequent in your state? Are there any factors that you can think of that are contributing to it?
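Single years can be misleading, because one unusually busy year stands out. One simple way to judge the trend is to total the yearly counts by decade. This is our own suggestion, not a method from the chapter, and the yearly counts below are made up for illustration.

```python
import pandas as pd

# Made-up declarations per fiscal year for one state
per_year = pd.Series({1998: 3, 2004: 5, 2005: 2, 2011: 4, 2017: 6, 2019: 1})

# Map each year to the first year of its decade, e.g. 2017 -> 2010
decade = (per_year.index // 10) * 10

# Total the yearly counts within each decade
per_decade = per_year.groupby(decade).sum()
print(per_decade)  # 1990: 3, 2000: 7, 2010: 11
```

A decade that is still in progress will look smaller simply because it has fewer years so far, so compare complete decades.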
And with that, we have wrapped up our look at this dataset! It can still answer many more questions, and you should try some on your own based on what you have learned in this chapter.
First, think of something you want to know about your environment. That’s your question. Then think about the data you need, use what you have learned, consult the Internet when necessary, and code!
You will make mistakes, but it’s extremely satisfying to figure things out and do something on your own. If you get stuck, there’s always Google and the lessons from this chapter!