Pollution data analysis example
This tutorial uses the Delhi Pollution Dataset published by the Department of Computer Science and Engineering, Indian Institute of Technology Delhi. The workflow consists of the following steps:
- Establishing an overview by visualizing raw input data records
- Converting data into trajectories
- Removing problematic trajectories by splitting at observation gaps with ObservationGapSplitter and filtering by speed
- Plotting cleaned trajectories
- Assigning H3 cell IDs to each trajectory point
- Plotting H3 cells as polygons with pollution measurements
Some of the steps working with H3 are based on the following: Medium article.
In [1]:
import numpy as np
import pandas as pd
import geopandas as gpd
import movingpandas as mpd
import shapely as shp
import hvplot.pandas
import matplotlib.pyplot as plt
import h3
import folium
from geopandas import GeoDataFrame, read_file
from shapely.geometry import Point, LineString, Polygon
from datetime import datetime, timedelta
from holoviews import opts, dim
from os.path import exists
from urllib.request import urlretrieve
import warnings
warnings.filterwarnings("ignore")
plot_defaults = {"linewidth": 5, "capstyle": "round", "figsize": (9, 3), "legend": True}
opts.defaults(
opts.Overlay(active_tools=["wheel_zoom"], frame_width=300, frame_height=500)
)
hvplot_defaults = {"tiles": None, "cmap": "Viridis", "colorbar": True}
mpd.show_versions()
MovingPandas 0.20.0

SYSTEM INFO
-----------
python     : 3.10.15 | packaged by conda-forge | (main, Oct 16 2024, 01:15:49) [MSC v.1941 64 bit (AMD64)]
executable : c:\Users\Agarkovam\AppData\Local\miniforge3\envs\mpd-ex\python.exe
machine    : Windows-10-10.0.19045-SP0

GEOS, GDAL, PROJ INFO
---------------------
GEOS       : None
GEOS lib   : None
GDAL       : None
GDAL data dir: None
PROJ       : 9.5.0
PROJ data dir: C:\Users\Agarkovam\AppData\Local\miniforge3\envs\mpd-ex\Library\share\proj

PYTHON DEPENDENCIES
-------------------
geopandas  : 1.0.1
pandas     : 2.2.3
fiona      : None
numpy      : 1.23.1
shapely    : 2.0.6
pyproj     : 3.7.0
matplotlib : 3.9.2
mapclassify: 2.8.1
geopy      : 2.4.1
holoviews  : 1.20.0
hvplot     : 0.11.1
geoviews   : 1.13.0
stonesoup  : 1.4
Loading pollution data
In [2]:
%%time
df = pd.read_csv("../data/2021-01-30_all.zip", index_col=0)
print(f"Finished reading {len(df)}")
Finished reading 180573
CPU times: total: 453 ms
Wall time: 451 ms
Let's see what the data looks like:
In [3]:
df.head()
Out[3]:
 | uid | dateTime | deviceId | lat | long | pm1_0 | pm2_5 | pm10 |
---|---|---|---|---|---|---|---|---|
0 | 0db83849-cd24-477a-a9d8-e48da1c914a3 | 2021-01-30 00:00:01+05:30 | 00000000c37f0aa8 | 28.579370 | 77.228798 | 94.0 | 142.0 | 156.0 |
1 | a77c4ff0-7723-418b-a02e-5653e9fa4530 | 2021-01-30 00:00:01+05:30 | 10000000dc5bb76b | 28.579414 | 77.231705 | 127.0 | 213.0 | 231.0 |
2 | f3109afa-7234-44a1-9911-b01646e33ed8 | 2021-01-30 00:00:04+05:30 | 10000000dc5bb76b | 28.579414 | 77.231705 | 126.0 | 214.0 | 231.0 |
3 | 39b07547-9467-45c6-85e9-3a917ce969f3 | 2021-01-30 00:00:04+05:30 | 00000000c37f0aa8 | 28.579367 | 77.228806 | 94.0 | 145.0 | 157.0 |
4 | 41b70559-af34-486d-975b-9de3bd30c0f0 | 2021-01-30 00:00:06+05:30 | 10000000dc5bb76b | 28.579414 | 77.231705 | 128.0 | 218.0 | 238.0 |
In [4]:
df.plot(c="pm2_5", x="long", y="lat", kind="scatter")
Out[4]:
<Axes: xlabel='long', ylabel='lat'>
Let's create trajectories, one per device ID:
In [5]:
tc = mpd.TrajectoryCollection(df, "deviceId", t="dateTime", x="long", y="lat")
print(tc)
TrajectoryCollection with 11 trajectories
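Conceptually, building the collection means grouping the records by `deviceId` and ordering each group by time. The plain-Python sketch below illustrates that idea with the five records from the `df.head()` output; it is not how MovingPandas implements it, and the `group_by_device` helper is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# The five records from df.head(): (dateTime, deviceId, lat, long)
records = [
    ("2021-01-30 00:00:01+05:30", "00000000c37f0aa8", 28.579370, 77.228798),
    ("2021-01-30 00:00:01+05:30", "10000000dc5bb76b", 28.579414, 77.231705),
    ("2021-01-30 00:00:04+05:30", "10000000dc5bb76b", 28.579414, 77.231705),
    ("2021-01-30 00:00:04+05:30", "00000000c37f0aa8", 28.579367, 77.228806),
    ("2021-01-30 00:00:06+05:30", "10000000dc5bb76b", 28.579414, 77.231705),
]


def group_by_device(rows):
    """Hypothetical helper: one time-ordered point list per device ID."""
    groups = defaultdict(list)
    for timestamp, device_id, lat, lon in rows:
        # Store points as (time, x, y), mirroring t="dateTime", x="long", y="lat"
        groups[device_id].append((datetime.fromisoformat(timestamp), lon, lat))
    # Sort each device's points chronologically
    return {device: sorted(points) for device, points in groups.items()}


trajectories = group_by_device(records)
for device, points in trajectories.items():
    print(device, len(points), "points")
```

On the full dataset, the same grouping yields one trajectory per distinct device ID.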
Removing problematic trajectories
We use the mean PM2.5 (particulate matter) concentration of each trajectory as an indicator of air pollution:
In [6]:
traj_gdf = tc.to_traj_gdf(agg={"pm2_5": "mean"})
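The `agg` argument requests one aggregated value per trajectory, which ends up in the `pm2_5_mean` column. As a rough illustration of that aggregation, and not of the library's internals, the sketch below computes the mean `pm2_5` per device in plain Python from the five rows of the `df.head()` output.

```python
from collections import defaultdict
from statistics import mean

# (deviceId, pm2_5) pairs taken from the df.head() output
readings = [
    ("00000000c37f0aa8", 142.0),
    ("10000000dc5bb76b", 213.0),
    ("10000000dc5bb76b", 214.0),
    ("00000000c37f0aa8", 145.0),
    ("10000000dc5bb76b", 218.0),
]

# Collect all measurements that belong to the same device
values_by_device = defaultdict(list)
for device_id, pm2_5 in readings:
    values_by_device[device_id].append(pm2_5)

# One mean value per device, analogous to one value per trajectory
pm2_5_mean = {device: mean(values) for device, values in values_by_device.items()}
print(pm2_5_mean)
```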
In [7]:
traj_gdf.plot("pm2_5_mean", cmap="YlOrRd", linewidth=0.7, legend=True, aspect=1)
Out[7]:
<Axes: >
Let's remove as many problematic trajectories as we can. First, we split each trajectory wherever consecutive observations are more than 10 minutes apart:
In [8]:
split = mpd.ObservationGapSplitter(tc).split(gap=timedelta(minutes=10))
split
Out[8]:
TrajectoryCollection with 76 trajectories
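The idea behind splitting at observation gaps can be sketched in a few lines of plain Python: walk through the time-ordered observations and start a new segment whenever the time since the previous observation exceeds the gap. This is a simplified illustration with made-up timestamps and a hypothetical `split_at_gaps` helper. How `ObservationGapSplitter` treats a gap of exactly 10 minutes, or very short segments, is not covered here.

```python
from datetime import datetime, timedelta


def split_at_gaps(timestamps, gap):
    """Hypothetical helper: split sorted timestamps where the gap is exceeded."""
    segments = []
    current = []
    for t in timestamps:
        # Close the current segment when the pause since the last fix is too long
        if current and t - current[-1] > gap:
            segments.append(current)
            current = []
        current.append(t)
    if current:
        segments.append(current)
    return segments


# Made-up observation times (not taken from the dataset)
times = [
    datetime(2021, 1, 30, 8, 0),
    datetime(2021, 1, 30, 8, 5),
    datetime(2021, 1, 30, 8, 30),  # 25 minutes after the previous observation
    datetime(2021, 1, 30, 8, 32),
]

segments = split_at_gaps(times, gap=timedelta(minutes=10))
print([len(s) for s in segments])
```

Splitting increases the number of trajectories, which is why the collection grows from 11 to 76 here.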
In [9]:
split = split.add_speed(units=("km", "h"))
In [10]:
traj_gdf = split.to_traj_gdf(agg={"pm2_5": "mean", "speed": "max"})
A speed above 108 km/h (30 m/s) seems unlikely for a bus, so let's keep only the trajectories whose maximum speed stays below this threshold:
In [11]:
traj_gdf = traj_gdf[traj_gdf.speed_max < 108]
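To make the threshold concrete, the sketch below converts 30 m/s to km/h and estimates the speed between two observations of the same device from the `df.head()` output. It uses the haversine formula on a spherical Earth, which is an assumption made for this illustration and not necessarily the distance calculation MovingPandas applies; `haversine_m` and `speed_kmh` are hypothetical helpers.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/long points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def speed_kmh(fix_a, fix_b):
    """Hypothetical helper: average speed in km/h between two (time, lat, long) fixes."""
    (t1, lat1, lon1), (t2, lat2, lon2) = fix_a, fix_b
    seconds = (t2 - t1).total_seconds()
    return haversine_m(lat1, lon1, lat2, lon2) / seconds * 3.6


MAX_SPEED_KMH = 30 * 3.6  # 30 m/s expressed in km/h

# Two observations of device 00000000c37f0aa8 from df.head()
a = (datetime.fromisoformat("2021-01-30 00:00:01+05:30"), 28.579370, 77.228798)
b = (datetime.fromisoformat("2021-01-30 00:00:04+05:30"), 28.579367, 77.228806)

speed = speed_kmh(a, b)
print(f"{speed:.2f} km/h, plausible: {speed < MAX_SPEED_KMH}")
```

Note that the filter in the tutorial works on `speed_max`, so a single implausible speed value removes the whole trajectory it belongs to.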
Plotting trajectories
Let's plot the resulting trajectories:
In [12]:
traj_gdf["start_t"] = traj_gdf["start_t"].astype(str)
traj_gdf["end_t"] = traj_gdf["end_t"].astype(str)
In [13]:
traj_gdf = traj_gdf.round(2)
In [14]:
traj_gdf.explore(
column="pm2_5_mean",
cmap="YlOrRd",
tiles="CartoDB positron",
style_kwds={"weight": 4},
)