1 Setup and Installation
We used a Mac for writing the book and running the code, so most of our instructions are for a Mac. We have tried to include Windows instructions where possible.
We need to install the following to run the code in the book:
1.1 Python
Python is the primary programming language used in this book. We started learning programming by learning Python and doing simple exercises in Python.
1.2 Why Python?
First, why Python?
Python is one of the most popular programming languages and is used extensively in the industry. It is also relatively easy to learn.
According to the 2023 Stack Overflow Developer Survey, Python has overtaken SQL as the third most commonly used language, and it places first among those who are neither professional developers nor learning to code (Other Coders). Source: https://survey.stackoverflow.co/2023/#technology-most-popular-technologies
The PYPL index at https://pypl.github.io/PYPL.html also tracks the popularity of programming languages.
See https://stackoverflow.blog/2023/01/26/comparing-tag-trends-with-our-most-loved-programming-languages/ for a nice illustration of the number of Stack Overflow question tags.
By the way, Stack Overflow is where we ended up with many of our questions, and we learned a lot from the answers there.
The official website of Python is https://www.python.org/. The latest version for Mac as of July 2024 is 3.12.4. We used Python 3.8 when we first started coding, and most of the code in the book was written using the Python 3.11 release.
One can directly download the latest version of Python from https://www.python.org/downloads/ for the operating system that you are using, Windows or Mac.
We would, however, recommend using the Anaconda installer, which can be downloaded from https://www.anaconda.com/download for Windows, Mac and Linux operating systems. The Python version currently available through Anaconda is 3.12.4.
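Once Python is installed, it helps to confirm which version is actually running. The following is a minimal sketch that uses only the standard library. The lower bound of 3.8 is our assumption for illustration (it is the version we started with), not an official requirement, and the function name is hypothetical.

```python
import platform
import sys

# Assumed lower bound for this sketch; not an official requirement of the book.
MINIMUM = (3, 8)


def python_is_recent_enough(minimum=MINIMUM):
    """Return True if the running interpreter is at least `minimum` (major, minor)."""
    return sys.version_info[:2] >= minimum


print("Python version:", platform.python_version())
print("Recent enough for this sketch:", python_is_recent_enough())
```

If the version printed is not the one you just installed, your editor or terminal is probably pointing at a different Python installation.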
1.3 VS Code
We also recommend Visual Studio Code (VS Code), https://code.visualstudio.com/, a free source code editor from Microsoft. VS Code supports most of the major programming languages. See https://en.wikipedia.org/wiki/Visual_Studio_Code for more information. It is one of the most popular Integrated Development Environments (IDEs), with almost 78% of the developers who are learning to code using it (Source: https://survey.stackoverflow.co/2023/#most-popular-technologies-new-collab-tools).
VS Code gives step-by-step instructions to install VS Code and the Python environment. See https://code.visualstudio.com/docs/python/python-tutorial.
Basically, we need to install:
- Python 3, either directly from https://www.python.org/ or through the Anaconda distribution https://www.anaconda.com/download
- VS Code. It can be downloaded from https://code.visualstudio.com/download
- VS Code Python extension. Once VS Code is installed, click the Extensions icon in the left-hand sidebar, search for Python in Extensions: Marketplace, and choose the Python extension by Microsoft. See https://code.visualstudio.com/docs/editor/extension-marketplace for more information.
1.4 Python Libraries used in the book
In the section on Python primer and packages, we will cover in detail some Python libraries that we will use throughout the book.
Python is a very popular programming language. There are almost 560,000 packages available as of writing the current version of the book.
There are many packages that are written by open source contributors or companies that can really be useful for the analysis in the book. In fact, one of the strengths of Python is its huge ecosystem of packages that are available for almost every conceivable application.
To install packages, we need a package manager. pip https://pypi.org/project/pip/ is the Python package manager that is included as part of the Python installation from https://www.python.org.
More information on how to use pip for packages is available at https://pip.pypa.io/en/stable/getting-started
If you use Anaconda, conda is the package and environment manager bundled with Anaconda; it installs and updates conda packages and their dependencies. See https://conda.io/projects/conda/en/latest/user-guide/install/index.html for the necessary steps to install conda.
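After installing packages with pip or conda, one way to check what is available is to ask Python itself. The sketch below uses the standard library module importlib.metadata (Python 3.8 and later). The helper name installed_version is our own, and the three package names are simply the ones used in this book.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the status of the main packages used in the book.
for name in ["numpy", "pandas", "plotly"]:
    found = installed_version(name)
    print(name, "->", found if found else "not installed")
```

A package reported as "not installed" can then be added with pip or conda, depending on how Python was set up.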
1.4.1 NumPy
NumPy is a fundamental package for scientific computing in Python. It is an open source Python library that provides a multidimensional array object and many routines for fast operations on arrays. NumPy offers comprehensive mathematical functions, random number generators, linear algebra routines and more. It is useful for working with arrays and matrices and for handling large datasets.
There is more detailed information at https://numpy.org/.
The only prerequisite for installing NumPy is Python itself. The latest version of NumPy as of the final edit of the book is NumPy 2.0.0.
One can install NumPy using pip or conda.
Using pip, the command is
    pip install numpy
Using conda, the command is
    conda install numpy
However, we won’t be using it directly in many examples. pandas is built on NumPy, so installing pandas will also install NumPy as a dependency.
We can use NumPy in the program by importing it like any other package
    import numpy as np
Python code in one module gains access to the code in another module by the process of importing it (see https://docs.python.org/3/reference/import.html). The import statement is used to bring in modules or specific functions from modules, so we can use them in our program.
Think of it like getting a toolbox with different tools. These tools are stored in modules, which are collections of functions and variables that someone else has already written, in this case NumPy. When you use import, you are telling Python, “I want to use the tools from this module in my code.”
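To make the toolbox idea concrete, here is a small sketch of importing NumPy and using one of its tools, the array. The temperature values are made up for illustration, and the alias np is a common convention rather than a requirement.

```python
import numpy as np

# Made-up daily temperatures in degrees Celsius (illustrative values only).
temps_c = np.array([21.5, 23.0, 19.5, 25.0])

# Arithmetic applies to every element of the array at once, without a loop.
temps_f = temps_c * 9 / 5 + 32

print(temps_f)
print("Mean (C):", temps_c.mean())
```

Everything after np. comes from the NumPy module; without the import line, Python would not know what np means.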
1.4.2 Pandas
pandas is a fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language. Specifically, it is built on NumPy. pandas makes handling the datasets used in this book very easy. It is the main package that we use throughout the book. See https://pandas.pydata.org/ for more details.
As of this version of the book, pandas 2.2.2 is the latest version.
If we install the Anaconda distribution, pandas is already installed as part of the distribution. Otherwise, you can install it with conda using the command
    conda install pandas
You can install pandas using pip by using the command
    pip install pandas
We can use pandas in the program by importing it like any other package
    import pandas as pd
There is detailed information on working with pandas at https://pandas.pydata.org/docs/getting_started/index.html. We will make extensive use of pandas, and we will give many examples of its usage throughout the book.
pandas has a DataFrame object that is very useful for working with data, especially tabular data (data with rows and columns). Almost every piece of code in the book uses the DataFrame object, and we will discuss it in detail later in the book.
Also check out the excellent book Python for Data Analysis by Wes McKinney, https://wesmckinney.com/book/, which gives more information on using Python for data analysis. The book primarily uses NumPy and pandas. We have learned a lot by reading the book and we highly recommend it. It is available on Wes McKinney’s website for free! What can be better?
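As a first taste of tabular data in pandas, the sketch below builds a tiny table by hand. The column names and values are made up for illustration; real datasets in later chapters are loaded from files instead.

```python
import pandas as pd

# Made-up yearly temperatures (illustrative values only).
df = pd.DataFrame(
    {
        "year": [2020, 2021, 2022],
        "temp_c": [14.9, 14.8, 15.0],
    }
)

# Add a new column computed from an existing one.
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

print(df)
print(df.shape)  # (number of rows, number of columns)
```

Each key in the dictionary becomes a column, and each position in the lists becomes a row.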
1.4.3 Plotly
One of the important packages we will need is a visualization or a graphing library. There are many good choices, but we will use Plotly https://plotly.com/python/ throughout the book.
We will describe Plotly in more detail later, but here we would like to show how Plotly can be installed using pip or conda. Using pip, the command is
    pip install plotly==5.23.0
Note that we are using a specific version of Plotly, 5.23.0, which is current as of writing the book.
If we are using conda, we can use
    conda install -c plotly plotly=5.23.0
You can find more information on installing Plotly at https://plotly.com/python/getting-started/
The Plotly library itself is a graphing library. Dash, a framework from the same firm, builds on Plotly and adds what is needed to create interactive web apps. One can build powerful visualizations and apps using Dash. For more information, see https://dash.plotly.com/
There are many ways to represent the same relationships between two variables, say time and temperature. We used the ones that felt more intuitive to us and that we understood better.
So, we mostly used Plotly express, a subset of the full Plotly library. See https://plotly.com/python/plotly-express/
As the documentation highlights:
“The plotly.express module (usually imported as px) contains functions that can create entire figures at once, and is referred to as Plotly Express or PX. Plotly Express is a built-in part of the Plotly library, and is the recommended starting point for creating most common figures.”
We tried to keep our graphs as simple as possible while illustrating the relationships between various data points, for example, time and temperature.
There may be better ways to represent the relationships. But we relied on the basic graphs that we understood and what we felt are easy to explain to the readers.
For example, most of the graphs in the book are based on the basic Plotly Express functions:
- Basics: scatter, line, area, bar, funnel, timeline
There are many other graphing options that Plotly can generate:
- Part-of-Whole: pie, sunburst, treemap, icicle, funnel_area
- 1D Distributions: histogram, box, violin, strip, ecdf
- 2D Distributions: density_heatmap, density_contour
- Matrix or Image Input: imshow
- 3-Dimensional: scatter_3d, line_3d
- Multidimensional: scatter_matrix, parallel_coordinates, parallel_categories
- Tile Maps: scatter_mapbox, line_mapbox, choropleth_mapbox, density_mapbox
- Outline Maps: scatter_geo, line_geo, choropleth
- Polar Charts: scatter_polar, line_polar, bar_polar
- Ternary Charts: scatter_ternary, line_ternary
Honestly, we didn’t research all these graph types and where they can be used. Our AP Statistics course gave us some understanding of the underlying concepts and where some of these graphs can be used. However, in the interest of time and accessibility, we stuck to the basic graph types and didn’t use the more sophisticated graph types. They may be more suitable in some cases, and you can explore them on your own.
We will explain how to generate graphs using Plotly with climate-related data in the next chapters. For now, we can load Plotly like any other package with the following command:
    import plotly
In this case, by importing Plotly, we are making use of all the graphing and visualization capabilities available in Plotly.
We don’t need the full power of Plotly, so we import only the Plotly Express library:
    import plotly.express
We don’t want to write plotly.express each time in the code. We can give the library a shorter alias, px, so that we can refer to it more easily in our code:
    import plotly.express as px
For example, if we want to create a scatter plot we can just use px.scatter.
Sometimes, we may need more than plotly.express. Then we can import the relevant modules, say graph_objects and subplots, by using:
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots
We will explain these when we use them in later chapters.
1.4.4 Other Visualization Packages
There are many excellent alternatives to Plotly, such as:
- matplotlib: the default plotting library in pandas, but we found its functionality limited relative to Plotly. See https://matplotlib.org/
- altair: an elegant library built on the so-called grammar of graphics. For the purposes of our book, we found Plotly easier to use. See https://altair-viz.github.io/
- seaborn: a Python data visualization library based on matplotlib. You can find more information at https://seaborn.pydata.org/
We like Plotly as it is easy to learn and the commands are intuitive and simple.
1.5 Other Libraries or Packages
Some other packages that may be useful are listed below. We will try to update them as we find more useful packages.
- openpyxl (v. 3.1.2): for reading Excel files.
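openpyxl is rarely called directly in our kind of work; pandas uses it behind the scenes when reading or writing .xlsx files. The sketch below assumes pandas and openpyxl are both installed. It writes a tiny made-up table to a temporary Excel file and reads it back; the file name example.xlsx is hypothetical.

```python
import os
import tempfile

import pandas as pd

# Made-up data (illustrative values only).
df = pd.DataFrame({"year": [2020, 2021], "temp_c": [14.9, 14.8]})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.xlsx")  # hypothetical file name
    # Write the table to Excel, then read it back, both through openpyxl.
    df.to_excel(path, index=False, engine="openpyxl")
    loaded = pd.read_excel(path, engine="openpyxl")

print(loaded)
```

With a real dataset, only the pd.read_excel line is needed, pointed at the path of the downloaded file.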
1.6 JupyterLab and Jupyter Notebooks
Alternatively, instead of VS Code, you can use Jupyter notebooks or JupyterLab. Jupyter notebooks are very popular and extensively used, and the results can be easily shared. See https://jupyter.org/. You can first try Jupyter in the browser. To install JupyterLab locally using pip, the command is
    pip install jupyterlab
or using conda
    conda install -c conda-forge jupyterlab
Once installed, you can launch Jupyter
    jupyter lab
One can also enable Jupyter Notebooks in VS Code.
There is detailed information available at https://code.visualstudio.com/docs/datascience/jupyter-notebooks
To work with Python in Jupyter Notebooks, you must activate an Anaconda environment in VS Code, or another Python environment in which you’ve installed the Jupyter package. To select an environment, use the Python: Select Interpreter command from the Command Palette (⇧⌘P on a Mac, Ctrl+Shift+P on Windows).
1.7 Using Python in the Browser
If you don’t want to go through all the hassle of installing Python and an IDE, you can first try accessing Python through a browser. One example of such a tool is JupyterLite.
Of course, the functionality would be limited. Sometimes, we need more resources to download data or install packages that are not available in this browser-based setup. JupyterLite uses Pyodide as the Python kernel.
JupyterLite is a version of JupyterLab that runs in the browser. As the Jupyter website mentions about JupyterLab:
“JupyterLab is a next-generation web-based user interface for Project Jupyter. It enables you to work with documents and activities such as Jupyter notebooks, text editors, terminals, and custom components in a flexible, integrated, and extensible manner.”
1.8 Using Google Colab Notebooks
Google Colab is the easiest way of using Python and executing some of the code in this book. Colaboratory is built on top of Jupyter Notebook.
Colab, or “Colaboratory”, allows you to write and execute Python in your browser, with
- No configuration required
- Access to computing environment free of charge
- Easy sharing
More information can be found at
- https://colab.research.google.com/
- https://colab.research.google.com/notebooks/basic_features_overview.ipynb
In the future, we will make some of our notebooks available on Google Colab so that you can play around without installing Python and other packages locally. It is free!
1.9 Other Resources
You will inevitably run into issues during installation. We have used Google and Stack Overflow a lot in our coding journey. Recently, we have started using ChatGPT for some help, especially for explaining the code that we wrote. However, as we explained in the introduction, we feel that it is very important to know how to code and not to use ChatGPT or a similar tool blindly.
There are also excellent resources for the beginners at