Public Transport Trajectory Analysis with MovingPandas
Example: Trips of a Single Bus
This notebook demonstrates trajectory analysis using real-world public transport GPS data from the UK Bus Open Data Service (BODS).
Learning Objectives:
- Create trajectories from GPS data
- Handle temporal gaps (layovers, depot movements)
- Calculate speed profiles
- Visualize operational patterns
Data: Single bus vehicle, Route 14 (Liverpool), evening peak sample.
Limitations: This example uses raw GPS data without map matching. GPS points are connected by straight lines, not road network geometry.
1. Setup
import movingpandas as mpd
import pandas as pd
import matplotlib.pyplot as plt
import hvplot.pandas
print(f"MovingPandas version: {mpd.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"hvplot version: {hvplot.__version__}")
MovingPandas version: 0.22.4
Pandas version: 2.3.3
hvplot version: 0.12.2
2. Load and Filter Data
We'll focus on a single vehicle to demonstrate core concepts clearly.
# Load GPS data
df = pd.read_csv('../data/liverpool_bus/route14_outbound.csv')
df['timestamp'] = pd.to_datetime(df['timestamp'])
print(f"Total records: {len(df)}")
print(f"Unique vehicles: {df['vehicle_id'].nunique()}")
print(f"Time range: {df['timestamp'].min()} to {df['timestamp'].max()}")
print(f"\nVehicles in dataset:")
print(df['vehicle_id'].value_counts())
Total records: 1533
Unique vehicles: 8
Time range: 2026-01-26 15:55:12 to 2026-01-26 18:19:36

Vehicles in dataset:
vehicle_id
4720    278
4733    245
4842    235
4836    187
4803    183
4841    159
4716    136
4722    110
Name: count, dtype: int64
# Select single vehicle with most data points
vehicle_id = df['vehicle_id'].value_counts().index[0]
df_vehicle = df[df['vehicle_id'] == vehicle_id].copy()
print(f"Selected vehicle: {vehicle_id}")
print(f"GPS records: {len(df_vehicle)}")
print(f"Time span: {(df_vehicle['timestamp'].max() - df_vehicle['timestamp'].min()).total_seconds()/60:.1f} minutes")
df_vehicle.head()
Selected vehicle: 4720
GPS records: 278
Time span: 128.6 minutes
| | id | vehicle_id | trip_id | timestamp | latitude | longitude | bearing | origin | destination | route_name | direction | operator |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 136 | 23825912 | 4720 | 1095 | 2026-01-26 16:03:21 | 53.407516 | -2.983266 | 109.0 | Queen_Square_Bus_Station | Petherick_Road | 14 | outbound | AMSY |
| 137 | 23827089 | 4720 | 1095 | 2026-01-26 16:03:54 | 53.407498 | -2.983178 | NaN | Queen_Square_Bus_Station | Petherick_Road | 14 | outbound | AMSY |
| 138 | 23827871 | 4720 | 1095 | 2026-01-26 16:04:27 | 53.407551 | -2.983131 | NaN | Queen_Square_Bus_Station | Petherick_Road | 14 | outbound | AMSY |
| 139 | 23829454 | 4720 | 1095 | 2026-01-26 16:05:00 | 53.407623 | -2.982990 | NaN | Queen_Square_Bus_Station | Petherick_Road | 14 | outbound | AMSY |
| 140 | 23830647 | 4720 | 1095 | 2026-01-26 16:05:31 | 53.407626 | -2.982890 | NaN | Queen_Square_Bus_Station | Petherick_Road | 14 | outbound | AMSY |
3. Create Trajectory
Convert GPS points to a MovingPandas trajectory.
# Create trajectory
traj = mpd.Trajectory(
df_vehicle,
x='longitude', y='latitude', crs='EPSG:4326',
t='timestamp',
traj_id=f'vehicle_{vehicle_id}')
print(f"Trajectory created: {traj.id}")
print(f"Number of points: {len(traj.df)}")
print(f"Start time: {traj.get_start_time()}")
print(f"End time: {traj.get_end_time()}")
print(f"Duration: {traj.get_duration()}")
print(f"Length: {traj.get_length():.0f} meters")
print(f"Bounding box: {traj.get_bbox()}")
Trajectory created: vehicle_4720
Number of points: 278
Start time: 2026-01-26 16:03:21
End time: 2026-01-26 18:11:58
Duration: 2:08:37
Length: 21342 meters
Bounding box: (-2.984026, 53.407466, -2.893983, 53.462305)
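Although the CRS is EPSG:4326 (degrees), the length is reported in meters because MovingPandas measures distances geodesically for geographic coordinates. Since GPS points are joined by straight lines, the length is simply the sum of the distances between consecutive fixes. The sketch below illustrates that sum using the haversine formula on a spherical Earth; this is an approximation we chose for a dependency-free example, and it can differ slightly from the library's ellipsoidal measure.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius: assumption of the spherical model


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points (spherical approximation)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def path_length_m(points):
    """Sum of straight hops between consecutive (lat, lon) fixes."""
    return sum(haversine_m(*p, *q) for p, q in zip(points, points[1:]))


# First five fixes of vehicle 4720 from the table above, as (lat, lon)
fixes = [
    (53.407516, -2.983266),
    (53.407498, -2.983178),
    (53.407551, -2.983131),
    (53.407623, -2.982990),
    (53.407626, -2.982890),
]
print(f"Length of first four hops: {path_length_m(fixes):.1f} m")
```

These five fixes span only a few tens of meters over about two minutes, so the bus is barely moving at the start of the record.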
4. Detect and Handle Gaps
Public transport vehicles have operational gaps:
- Layovers at terminals
- Depot movements
- Service breaks
We'll split the trajectory wherever consecutive GPS observations are more than 5 minutes apart.
# Split trajectory at gaps > 5 minutes
gap_splitter = mpd.ObservationGapSplitter(traj)
segments = gap_splitter.split(gap=pd.Timedelta('5min'))
print(f"Original trajectory split into {len(segments)} segments\n")
for i, segment in enumerate(segments.trajectories):
duration = segment.get_duration().total_seconds() / 60
length = segment.get_length()
print(f"Segment {i+1}:")
print(f" Points: {len(segment.df)}")
print(f" Duration: {duration:.1f} minutes")
print(f" Length: {length:.0f} meters")
print(f" Time: {segment.get_start_time().strftime('%H:%M:%S')} - {segment.get_end_time().strftime('%H:%M:%S')}")
print()
Original trajectory split into 2 segments

Segment 1:
 Points: 131
 Duration: 46.5 minutes
 Length: 6327 meters
 Time: 16:03:21 - 16:49:48

Segment 2:
 Points: 147
 Duration: 51.3 minutes
 Length: 10048 meters
 Time: 17:20:40 - 18:11:58
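The rule ObservationGapSplitter applies is simple: start a new segment whenever the time between two consecutive observations exceeds the threshold. A stdlib-only sketch of that rule follows; it is not the library's code, the timestamps are made up, and we assume a strict "greater than" comparison (the library's exact boundary handling may differ).

```python
from datetime import datetime, timedelta


def split_at_gaps(timestamps, gap=timedelta(minutes=5)):
    """Split sorted timestamps into runs wherever consecutive fixes are more than `gap` apart."""
    if not timestamps:
        return []
    runs = [[timestamps[0]]]
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > gap:
            runs.append([])  # silence longer than the threshold: start a new segment
        runs[-1].append(curr)
    return runs


# Hypothetical fixes: two bursts of 30-second reports separated by ~31 minutes of silence
t0 = datetime(2026, 1, 26, 16, 45)
ts = [t0 + timedelta(seconds=30 * i) for i in range(10)]
ts += [t0 + timedelta(minutes=36, seconds=30 * i) for i in range(5)]

for run in split_at_gaps(ts):
    print(len(run), run[0].time(), "-", run[-1].time())
```

Note that splitting drops no points: in the real output, 131 + 147 = 278, so every fix ends up in a segment and only the link across the gap disappears.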
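5. Calculate Speed
Add speed columns to each trajectory segment: the default speed column is in meters per second (distances on EPSG:4326 are measured in meters), and a second column holds the same values in mph.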
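# Add speed to each segment (default column 'speed'; m/s for a geographic CRS such as EPSG:4326)
segments.add_speed(overwrite=True)
# Add a second column with the same speeds converted to miles per hour
segments.add_speed(name="speed (mph)", units=('mi','h'), overwrite=True)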
print("Speed statistics by segment:\n")
for i, segment in enumerate(segments.trajectories):
speeds_ms = segment.df['speed']
speeds_mph = segment.df['speed (mph)']
print(f"Segment {i+1}:")
print(f" Mean speed: {speeds_mph.mean():.1f} mph ({speeds_ms.mean():.1f} m/s)")
print(f" Max speed: {speeds_mph.max():.1f} mph ({speeds_ms.max():.1f} m/s)")
print(f" Min speed: {speeds_mph.min():.1f} mph ({speeds_ms.min():.1f} m/s)")
print()
Speed statistics by segment:

Segment 1:
 Mean speed: 5.5 mph (2.5 m/s)
 Max speed: 28.5 mph (12.7 m/s)
 Min speed: 0.0 mph (0.0 m/s)

Segment 2:
 Mean speed: 8.3 mph (3.7 m/s)
 Max speed: 25.2 mph (11.2 m/s)
 Min speed: 0.0 mph (0.0 m/s)
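Speed between two fixes is the distance between them divided by the elapsed time, and the mph column is the same value scaled by 3600 / 1609.344 (about 2.237). The printed pairs agree up to rounding, e.g. 12.7 m/s corresponds to roughly 28.4 to 28.5 mph. The sketch below reproduces the per-hop calculation with a haversine distance; that distance function is our approximation, not necessarily the one MovingPandas uses internally.

```python
import math
from datetime import datetime

M_PER_MILE = 1609.344          # international mile
MS_TO_MPH = 3600 / M_PER_MILE  # about 2.237


def haversine_m(lat1, lon1, lat2, lon2, r=6_371_000):
    """Great-circle distance in meters (spherical approximation)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def hop_speeds_ms(fixes):
    """Speed in m/s for each hop between consecutive (timestamp, lat, lon) fixes."""
    speeds = []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(fixes, fixes[1:]):
        dt = (t2 - t1).total_seconds()
        # Guard against duplicate timestamps instead of dividing by zero
        speeds.append(haversine_m(la1, lo1, la2, lo2) / dt if dt > 0 else float("nan"))
    return speeds


# First three fixes of vehicle 4720 from the table above
fixes = [
    (datetime(2026, 1, 26, 16, 3, 21), 53.407516, -2.983266),
    (datetime(2026, 1, 26, 16, 3, 54), 53.407498, -2.983178),
    (datetime(2026, 1, 26, 16, 4, 27), 53.407551, -2.983131),
]
for v in hop_speeds_ms(fixes):
    print(f"{v:.2f} m/s = {v * MS_TO_MPH:.2f} mph")
```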
# Plot all segments
segments.plot(column='trip_id', legend=True).set_title(f'Bus Trajectory: Vehicle {vehicle_id}\nRoute 14 Outbound')
6. Interactive Map with Speed Colors
# Interactive visualization using explore()
segments.explore(
column='speed',
cmap='RdYlGn',
style_kwds={'weight': 5},
tiles='CartoDB positron'
)
7. Speed Profile Analysis
# Plot speed over time for all segments
all_data = pd.concat([
seg.df[['speed (mph)']].assign(segment=f'Segment {i+1}')
for i, seg in enumerate(segments.trajectories)
])
# Interactive time series plot
all_data.hvplot.line(
y='speed (mph)',
by='segment',
title=f'Speed Profile: Vehicle {vehicle_id}',
xlabel='Time',
ylabel='Speed (mph)',
width=900,
height=400,
legend='top_left'
)
# Speed distribution - interactive
all_data.hvplot.hist(
y='speed (mph)',
bins=30,
title='Speed Distribution',
xlabel='Speed (mph)',
ylabel='Frequency',
width=450,
height=400
) + all_data.hvplot.box(
y='speed (mph)',
by='segment',
title='Speed by Segment',
ylabel='Speed (mph)',
width=450,
height=400
)
8. Summary Statistics
print("=" * 70)
print(f"TRAJECTORY ANALYSIS SUMMARY: Vehicle {vehicle_id}")
print("=" * 70)
print(f"\n📊 OVERVIEW:")
print(f" • Total GPS points: {len(traj.df)}")
print(f" • Time span: {(traj.get_end_time() - traj.get_start_time()).total_seconds()/60:.0f} minutes")
print(f" • Total distance: {traj.get_length():.0f} meters ({traj.get_length()/1609.344:.1f} miles)")  # 1 mile = 1609.344 m
print(f"\n🔄 SEGMENTATION:")
print(f" • Number of segments: {len(segments)}")
print(f" • Gap threshold: 5 minutes")
operational_time = sum(seg.get_duration().total_seconds() for seg in segments.trajectories) / 60
total_time = (traj.get_end_time() - traj.get_start_time()).total_seconds() / 60
print(f" • Operational time: {operational_time:.0f} minutes")
print(f" • Layover/break time: {total_time - operational_time:.0f} minutes")
print(f"\n⚡ SPEED ANALYSIS:")
all_speeds_mph = pd.concat([seg.df['speed (mph)'] for seg in segments.trajectories])
print(f" • Mean speed: {all_speeds_mph.mean():.1f} mph")
print(f" • Max speed: {all_speeds_mph.max():.1f} mph")
print(f" • Median speed: {all_speeds_mph.median():.1f} mph")
print(f" • Std deviation: {all_speeds_mph.std():.1f} mph")
# Speed categories
stopped = len(all_speeds_mph[all_speeds_mph < 3])
slow = len(all_speeds_mph[(all_speeds_mph >= 3) & (all_speeds_mph < 10)])
medium = len(all_speeds_mph[(all_speeds_mph >= 10) & (all_speeds_mph < 20)])
fast = len(all_speeds_mph[all_speeds_mph >= 20])
print(f"\n🚦 SPEED DISTRIBUTION:")
print(f" • Stopped (<3 mph): {stopped} points ({stopped/len(all_speeds_mph)*100:.1f}%)")
print(f" • Slow (3-10 mph): {slow} points ({slow/len(all_speeds_mph)*100:.1f}%)")
print(f" • Medium (10-20 mph): {medium} points ({medium/len(all_speeds_mph)*100:.1f}%)")
print(f" • Fast (≥20 mph): {fast} points ({fast/len(all_speeds_mph)*100:.1f}%)")
print("\n" + "=" * 70)
======================================================================
TRAJECTORY ANALYSIS SUMMARY: Vehicle 4720
======================================================================

📊 OVERVIEW:
 • Total GPS points: 278
 • Time span: 129 minutes
 • Total distance: 21342 meters (13.3 miles)

🔄 SEGMENTATION:
 • Number of segments: 2
 • Gap threshold: 5 minutes
 • Operational time: 98 minutes
 • Layover/break time: 31 minutes

⚡ SPEED ANALYSIS:
 • Mean speed: 7.0 mph
 • Max speed: 28.5 mph
 • Median speed: 4.4 mph
 • Std deviation: 7.6 mph

🚦 SPEED DISTRIBUTION:
 • Stopped (<3 mph): 122 points (43.9%)
 • Slow (3-10 mph): 82 points (29.5%)
 • Medium (10-20 mph): 45 points (16.2%)
 • Fast (≥20 mph): 29 points (10.4%)

======================================================================
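The "Total distance" above is measured on the unsplit trajectory, so it still includes the single straight-line link between the last fix before the gap and the first fix after it. Splitting keeps every fix but drops that link, so the difference between the whole-trajectory length and the sum of the segment lengths is the length of the unobserved link. A quick check using the rounded figures printed in Sections 3 and 4:

```python
# Figures copied from the printed outputs (rounded to whole meters / 0.1 minute)
total_length_m = 21342             # traj.get_length(), whole trajectory
segment_lengths_m = [6327, 10048]  # per-segment lengths after gap splitting
total_minutes = 128.6              # time span of the whole trajectory
segment_minutes = [46.5, 51.3]     # per-segment durations

# Distance carried by the one straight link bridging the gap
gap_hop_m = total_length_m - sum(segment_lengths_m)
# Time covered by no segment, i.e. the layover/break
layover_min = total_minutes - sum(segment_minutes)

print(f"Straight-line link across the gap: {gap_hop_m} m")
print(f"Time with no segment: {layover_min:.1f} min")
```

About 4.97 km, roughly 23% of the reported 21342 m, therefore lies on an unobserved straight line across the 31-minute break. The vehicle's position changed substantially during that break, so it may include repositioning rather than only standing time at a terminal; the GPS data alone cannot tell which. For in-service distance, sum the segment lengths instead of using the whole-trajectory length.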
9. Extension: Multiple Vehicles
This example focused on a single vehicle for clarity. To analyze multiple vehicles:
tc = mpd.TrajectoryCollection(
df, x='longitude', y='latitude', crs='EPSG:4326',
t='timestamp', traj_id_col='vehicle_id')
tc
TrajectoryCollection with 8 trajectories
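Conceptually, traj_id_col partitions the rows by vehicle and orders each group by time, producing one trajectory per vehicle (8 here, matching the 8 unique vehicles). The stdlib sketch below shows only that partitioning step on hypothetical records; it is not MovingPandas' implementation. Each group could then be split at gaps as in Section 4; in MovingPandas itself, ObservationGapSplitter also accepts a TrajectoryCollection.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records: (vehicle_id, seconds_since_start, lat, lon), deliberately unsorted
records = [
    (4733, 0, 53.40, -2.98),
    (4720, 30, 53.41, -2.98),
    (4720, 0, 53.41, -2.98),
    (4733, 30, 53.40, -2.97),
    (4722, 30, 53.42, -2.95),
    (4722, 0, 53.42, -2.95),
]


def group_by_vehicle(rows):
    """Partition rows by vehicle id, each group ordered by time."""
    rows = sorted(rows, key=itemgetter(0, 1))  # groupby only merges contiguous keys
    return {vid: list(grp) for vid, grp in groupby(rows, key=itemgetter(0))}


tracks = group_by_vehicle(records)
for vid, rows in tracks.items():
    print(vid, [r[1] for r in rows])
```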
Limitations and Considerations
Data Characteristics
- Raw GPS data without map matching
- Sample includes partial trips cut off by the data collection window
- Demonstrates handling of incomplete trajectories
- GPS points connected by straight lines (not road geometry)
For Production Analysis
Consider these enhancements:
- Map matching: Align trajectories to the road network (e.g., with Valhalla's map-matching service; OSMnx can supply OpenStreetMap road networks)
- GTFS integration: Use scheduled route shapes for reference
- Stop detection: Match GPS points to known bus stop locations
- Quality filtering: Remove anomalous GPS points
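Before matching GPS points to known stop locations, a common first step is finding dwell periods in the GPS itself. MovingPandas provides TrajectoryStopDetector for this, defining a stop by a maximum diameter and a minimum duration. The sketch below is a simplified stand-in, not the library's algorithm: it assumes a radius around the first fix of each run instead of a diameter criterion, and the thresholds (50 m, 60 s) are hypothetical.

```python
import math
from datetime import datetime, timedelta


def dist_m(p, q, r=6_371_000):
    """Haversine distance in meters between two (lat, lon) pairs."""
    (la1, lo1), (la2, lo2) = p, q
    phi1, phi2 = math.radians(la1), math.radians(la2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lo2 - lo1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def detect_stops(fixes, max_radius_m=50, min_duration=timedelta(seconds=60)):
    """Return (start, end) times of runs whose fixes stay within max_radius_m of the
    run's first fix for at least min_duration. fixes: time-sorted [(t, (lat, lon))]."""
    stops, i = [], 0
    while i < len(fixes):
        j = i
        # Extend the run while the next fix stays close to the run's first fix
        while j + 1 < len(fixes) and dist_m(fixes[i][1], fixes[j + 1][1]) <= max_radius_m:
            j += 1
        if fixes[j][0] - fixes[i][0] >= min_duration:
            stops.append((fixes[i][0], fixes[j][0]))
            i = j + 1  # resume after the detected stop
        else:
            i += 1
    return stops


fixes = [
    # First five fixes of vehicle 4720 from the data table
    (datetime(2026, 1, 26, 16, 3, 21), (53.407516, -2.983266)),
    (datetime(2026, 1, 26, 16, 3, 54), (53.407498, -2.983178)),
    (datetime(2026, 1, 26, 16, 4, 27), (53.407551, -2.983131)),
    (datetime(2026, 1, 26, 16, 5, 0), (53.407623, -2.982990)),
    (datetime(2026, 1, 26, 16, 5, 31), (53.407626, -2.982890)),
    # Hypothetical later fixes once the bus is moving
    (datetime(2026, 1, 26, 16, 6, 0), (53.409000, -2.980000)),
    (datetime(2026, 1, 26, 16, 6, 30), (53.411000, -2.977000)),
]
for start, end in detect_stops(fixes):
    print(f"Stop from {start.time()} to {end.time()}")
```

Under these thresholds, the five real fixes form one 130-second stop, while the two hypothetical moving fixes do not. Detected stop locations could then be compared with known bus stop positions.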
MovingPandas Capabilities Demonstrated
✓ Trajectory creation from timestamped point data
✓ Temporal gap detection and splitting
✓ Speed calculation from GPS observations
✓ Trajectory metrics (length, duration, bounds)
✓ Visualization with geographic context