Ship data analysis example¶

This tutorial uses AIS data published by the Danish Maritime Authority. The AIS record sample extracted for this tutorial covers vessel traffic on the 5th July 2017 near Gothenburg.
This tutorial covers:
- Trajectory data preprocessing
- Loading movement data from common geospatial file formats
- Exploring spatial & non-spatial data distributions
- Applying filters to extract relevant data
- Converting GeoDataFrames into Trajectories describing continuous tracks of moving objects
- Trajectory analysis
- Visualizing trajectories and their properties
- Filtering trajectories by area of interest
- Splitting continuous tracks into individual trips
- Exploring trip properties including: origins, destinations, and attributes
import numpy as np
import pandas as pd
import geopandas as gpd
import movingpandas as mpd
import shapely as shp
import hvplot.pandas
import matplotlib.pyplot as plt
from geopandas import GeoDataFrame, read_file
from shapely.geometry import Point, LineString, Polygon
from datetime import datetime, timedelta
from holoviews import opts, dim
from os.path import exists
from urllib.request import urlretrieve
import warnings
warnings.filterwarnings('ignore')
plot_defaults = {'linewidth':5, 'capstyle':'round', 'figsize':(9,3), 'legend':True}
opts.defaults(opts.Overlay(active_tools=['wheel_zoom'], frame_width=500, frame_height=400))
hvplot_defaults = {'tiles':None, 'cmap':'Viridis', 'colorbar':True}
mpd.show_versions()
MovingPandas 0.17.0 SYSTEM INFO ----------- python : 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:34:57) [MSC v.1936 64 bit (AMD64)] executable : H:\miniconda3\envs\mpd-ex\python.exe machine : Windows-10-10.0.19045-SP0 GEOS, GDAL, PROJ INFO --------------------- GEOS : None GEOS lib : None GDAL : 3.7.0 GDAL data dir: None PROJ : 9.2.1 PROJ data dir: H:\miniconda3\pkgs\proj-9.0.0-h1cfcee9_1\Library\share\proj PYTHON DEPENDENCIES ------------------- geopandas : 0.13.2 pandas : 2.0.3 fiona : 1.9.4 numpy : 1.24.4 shapely : 2.0.1 rtree : 1.0.1 pyproj : 3.6.0 matplotlib : 3.7.2 mapclassify: 2.5.0 geopy : 2.3.0 holoviews : 1.17.0 hvplot : 0.8.3 geoviews : 1.9.6 stonesoup : 1.0
Loading sample AIS data¶
%%time
df = read_file('../data/ais.gpkg')
print(f"Finished reading {len(df)}")
Finished reading 84702 CPU times: total: 15.4 s Wall time: 15.6 s
Let's see what the data looks like:
df.head()
Timestamp | MMSI | NavStatus | SOG | COG | Name | ShipType | geometry | |
---|---|---|---|---|---|---|---|---|
0 | 05/07/2017 00:00:03 | 219632000 | Under way using engine | 0.0 | 270.4 | NaN | Undefined | POINT (11.85958 57.68817) |
1 | 05/07/2017 00:00:05 | 265650970 | Under way using engine | 0.0 | 0.5 | NaN | Undefined | POINT (11.84175 57.66150) |
2 | 05/07/2017 00:00:06 | 265503900 | Under way using engine | 0.0 | 0.0 | NaN | Undefined | POINT (11.90650 57.69077) |
3 | 05/07/2017 00:00:14 | 219632000 | Under way using engine | 0.0 | 188.4 | NaN | Undefined | POINT (11.85958 57.68817) |
4 | 05/07/2017 00:00:19 | 265519650 | Under way using engine | 0.0 | 357.2 | NaN | Undefined | POINT (11.87192 57.68233) |
df.plot()
<Axes: >
If we look at the data distributions, we can see that there are a lot of records with speed over ground (SOG) values of zero in this dataframe:
df['SOG'].hist(bins=100, figsize=(15,3))
<Axes: >
Let's get rid of these rows with zero SOG:
print(f"Original size: {len(df)} rows")
df = df[df.SOG>0]
print(f"Reduced to {len(df)} rows after removing 0 speed records")
df['SOG'].hist(bins=100, figsize=(15,3))
Original size: 84702 rows Reduced to 33593 rows after removing 0 speed records
<Axes: >
Let's see what kind of ships we have in our dataset:
df['ShipType'].value_counts().plot(kind='bar', figsize=(15,3))
<Axes: xlabel='ShipType'>
Finally, let's create trajectories:
%%time
df['t'] = pd.to_datetime(df['Timestamp'], format='%d/%m/%Y %H:%M:%S')
traj_collection = mpd.TrajectoryCollection(df, 'MMSI', t='t', min_length=100)
print(f"Finished creating {len(traj_collection)} trajectories")
Finished creating 77 trajectories CPU times: total: 5.03 s Wall time: 5.07 s
traj_collection = mpd.MinTimeDeltaGeneralizer(traj_collection).generalize(tolerance=timedelta(minutes=1))
Plotting trajectories¶
Let's give the most common ship types distinct colors. The remaining ones will be just grey:
shiptype_to_color = {'Passenger': 'blue', 'HSC': 'green', 'Tanker': 'red', 'Cargo': 'orange', 'Sailing': 'grey', 'Other': 'grey',
'Tug': 'grey', 'SAR': 'grey', 'Undefined': 'grey', 'Pleasure': 'grey', 'Dredging': 'grey', 'Law enforcement': 'grey',
'Pilot': 'grey', 'Fishing': 'grey', 'Diving':'grey', 'Spare 2': 'grey'}
traj_collection.plot(column='ShipType', column_to_color=shiptype_to_color, linewidth=1, capstyle='round')
<Axes: >
passenger = traj_collection.filter('ShipType', 'Passenger')
passenger.hvplot(title='Passenger ferries', line_width=2, frame_width=700, frame_height=500)
Visualizing trajectory properties¶
We can also plot individual trajectories to better visualize their properties, such as the changes in NavStatus:
my_traj = traj_collection.trajectories[0]
my_traj.df.head()
Timestamp | MMSI | NavStatus | SOG | COG | Name | ShipType | geometry | |
---|---|---|---|---|---|---|---|---|
t | ||||||||
2017-07-05 17:32:18 | 05/07/2017 17:32:18 | 210035000 | Under way using engine | 9.8 | 52.8 | NORDIC HAMBURG | Cargo | POINT (11.80462 57.67612) |
2017-07-05 17:33:18 | 05/07/2017 17:33:18 | 210035000 | Under way using engine | 9.5 | 58.9 | NORDIC HAMBURG | Cargo | POINT (11.80875 57.67773) |
2017-07-05 17:34:18 | 05/07/2017 17:34:18 | 210035000 | Under way using engine | 9.3 | 70.5 | NORDIC HAMBURG | Cargo | POINT (11.81311 57.67879) |
2017-07-05 17:35:28 | 05/07/2017 17:35:28 | 210035000 | Under way using engine | 9.5 | 71.1 | NORDIC HAMBURG | Cargo | POINT (11.81855 57.67968) |
2017-07-05 17:36:28 | 05/07/2017 17:36:28 | 210035000 | Under way using engine | 9.4 | 71.3 | NORDIC HAMBURG | Cargo | POINT (11.82334 57.68044) |
my_traj.df.tail()
Timestamp | MMSI | NavStatus | SOG | COG | Name | ShipType | geometry | |
---|---|---|---|---|---|---|---|---|
t | ||||||||
2017-07-05 22:47:34 | 05/07/2017 22:47:34 | 210035000 | Moored | 0.1 | 276.0 | NORDIC HAMBURG | Cargo | POINT (11.84571 57.68958) |
2017-07-05 23:08:44 | 05/07/2017 23:08:44 | 210035000 | Moored | 0.1 | 96.0 | NORDIC HAMBURG | Cargo | POINT (11.84571 57.68958) |
2017-07-05 23:09:54 | 05/07/2017 23:09:54 | 210035000 | Moored | 0.1 | 96.0 | NORDIC HAMBURG | Cargo | POINT (11.84571 57.68958) |
2017-07-05 23:11:45 | 05/07/2017 23:11:45 | 210035000 | Moored | 0.1 | 96.0 | NORDIC HAMBURG | Cargo | POINT (11.84571 57.68958) |
2017-07-05 23:35:58 | 05/07/2017 23:35:58 | 210035000 | Moored | 0.1 | 276.0 | NORDIC HAMBURG | Cargo | POINT (11.84571 57.68958) |
my_traj.hvplot(title=f'Trajectory {my_traj.id}', frame_width=700, frame_height=500, line_width=5.0, c='NavStatus', cmap='Dark2')
Finding ships passing under Älvsborgsbron bridge¶
We can find ships passing under the bridge based on trajectory intersections with the bridge area.
area_of_interest = Polygon([(11.89935, 57.69270), (11.90161, 57.68902), (11.90334, 57.68967), (11.90104, 57.69354), (11.89935, 57.69270)])
intersecting = traj_collection.get_intersecting(area_of_interest)
print(f"Found {len(intersecting)} intersections")
Found 20 intersections
bridge_traj = intersecting.trajectories[0]
bridge_traj.hvplot(title=f'Trajectory {bridge_traj.id}', frame_width=700, frame_height=500, line_width=5.0, c='NavStatus', cmap='Dark2')
bridge_traj.df.head()
Timestamp | MMSI | NavStatus | SOG | COG | Name | ShipType | geometry | |
---|---|---|---|---|---|---|---|---|
t | ||||||||
2017-07-05 01:21:57 | 05/07/2017 01:21:57 | 219011922 | Under way using engine | 8.8 | 227.7 | SCANDINAVIA | Tanker | POINT (11.91230 57.69633) |
2017-07-05 01:23:06 | 05/07/2017 01:23:06 | 219011922 | Under way using engine | 8.7 | 227.5 | SCANDINAVIA | Tanker | POINT (11.90842 57.69440) |
2017-07-05 01:24:06 | 05/07/2017 01:24:06 | 219011922 | Under way using engine | 8.7 | 227.0 | SCANDINAVIA | Tanker | POINT (11.90515 57.69275) |
2017-07-05 01:25:06 | 05/07/2017 01:25:06 | 219011922 | Under way using engine | 8.6 | 238.8 | SCANDINAVIA | Tanker | POINT (11.90161 57.69129) |
2017-07-05 01:26:08 | 05/07/2017 01:26:08 | 219011922 | Under way using engine | 8.5 | 245.4 | SCANDINAVIA | Tanker | POINT (11.89763 57.69015) |
Identifying trip origins and destinations¶
Since AIS records with a speed over ground (SOG) value of zero have been removed from the dataset, we can use the split_by_observation_gap()
function to split the continuous observations into individual trips:
trips = mpd.ObservationGapSplitter(passenger).split(gap=timedelta(minutes=5))
print(f"Extracted {len(trips)} individual trips from {len(passenger)} continuous vessel tracks")
Extracted 90 individual trips from 14 continuous vessel tracks
Let's plot the resulting trips!
trips.hvplot(title='Passenger ferry trips', line_width=2, frame_width=700, frame_height=500)