Public Transport Trajectory Analysis with MovingPandas

Example: Trips of a Single Bus

This notebook demonstrates trajectory analysis using real-world public transport GPS data from the UK Bus Open Data Service (BODS).

Learning Objectives:

  • Create trajectories from GPS data
  • Handle temporal gaps (layovers, depot movements)
  • Calculate speed profiles
  • Visualize operational patterns

Data: Route 14 (Liverpool), outbound, evening peak sample covering eight vehicles; the analysis focuses on a single vehicle.

Limitations: This example uses raw GPS data without map matching. GPS points are connected by straight lines, not road network geometry.

1. Setup

In [1]:
import movingpandas as mpd
import pandas as pd
import matplotlib.pyplot as plt
import hvplot.pandas

print(f"MovingPandas version: {mpd.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"hvplot version: {hvplot.__version__}")
MovingPandas version: 0.22.4
Pandas version: 2.3.3
hvplot version: 0.12.2

2. Load and Filter Data

We'll focus on a single vehicle to demonstrate core concepts clearly.

In [2]:
# Load GPS data
df = pd.read_csv('../data/liverpool_bus/route14_outbound.csv')
df['timestamp'] = pd.to_datetime(df['timestamp'])

print(f"Total records: {len(df)}")
print(f"Unique vehicles: {df['vehicle_id'].nunique()}")
print(f"Time range: {df['timestamp'].min()} to {df['timestamp'].max()}")
print(f"\nVehicles in dataset:")
print(df['vehicle_id'].value_counts())
Total records: 1533
Unique vehicles: 8
Time range: 2026-01-26 15:55:12 to 2026-01-26 18:19:36

Vehicles in dataset:
vehicle_id
4720    278
4733    245
4842    235
4836    187
4803    183
4841    159
4716    136
4722    110
Name: count, dtype: int64
In [3]:
# Select single vehicle with most data points
vehicle_id = df['vehicle_id'].value_counts().index[0]
df_vehicle = df[df['vehicle_id'] == vehicle_id].copy()

print(f"Selected vehicle: {vehicle_id}")
print(f"GPS records: {len(df_vehicle)}")
print(f"Time span: {(df_vehicle['timestamp'].max() - df_vehicle['timestamp'].min()).total_seconds()/60:.1f} minutes")

df_vehicle.head()
Selected vehicle: 4720
GPS records: 278
Time span: 128.6 minutes
Out[3]:
    id       vehicle_id trip_id timestamp           latitude  longitude bearing origin                   destination    route_name direction operator
136 23825912 4720       1095    2026-01-26 16:03:21 53.407516 -2.983266 109.0   Queen_Square_Bus_Station Petherick_Road 14         outbound  AMSY
137 23827089 4720       1095    2026-01-26 16:03:54 53.407498 -2.983178 NaN     Queen_Square_Bus_Station Petherick_Road 14         outbound  AMSY
138 23827871 4720       1095    2026-01-26 16:04:27 53.407551 -2.983131 NaN     Queen_Square_Bus_Station Petherick_Road 14         outbound  AMSY
139 23829454 4720       1095    2026-01-26 16:05:00 53.407623 -2.982990 NaN     Queen_Square_Bus_Station Petherick_Road 14         outbound  AMSY
140 23830647 4720       1095    2026-01-26 16:05:31 53.407626 -2.982890 NaN     Queen_Square_Bus_Station Petherick_Road 14         outbound  AMSY

3. Create Trajectory

Convert GPS points to a MovingPandas trajectory.

In [4]:
# Create trajectory
traj = mpd.Trajectory(
    df_vehicle, 
    x='longitude', y='latitude', crs='EPSG:4326',
    t='timestamp',
    traj_id=f'vehicle_{vehicle_id}')

print(f"Trajectory created: {traj.id}")
print(f"Number of points: {len(traj.df)}")
print(f"Start time: {traj.get_start_time()}")
print(f"End time: {traj.get_end_time()}")
print(f"Duration: {traj.get_duration()}")
print(f"Length: {traj.get_length():.0f} meters")
print(f"Bounding box: {traj.get_bbox()}")
Trajectory created: vehicle_4720
Number of points: 278
Start time: 2026-01-26 16:03:21
End time: 2026-01-26 18:11:58
Duration: 2:08:37
Length: 21342 meters
Bounding box: (-2.984026, 53.407466, -2.893983, 53.462305)
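The CRS is geographic (EPSG:4326), so the reported length is in meters. MovingPandas measures the distances between consecutive points geodesically. The sketch below approximates this with the haversine formula on a sphere, assuming a mean Earth radius of 6,371 km. That approximation usually agrees with an ellipsoidal geodesic to within about 0.5%. The helper names are illustrative and are not part of MovingPandas.

Applied to the first two fixes in the table above, the sketch shows the parked bus "moving" about 6 m in 33 s. That movement is GPS jitter, not real motion.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def path_length_m(points):
    """Sum of great-circle hops between consecutive (lat, lon) points."""
    return sum(haversine_m(*p, *q) for p, q in zip(points, points[1:]))

# First two GPS fixes of vehicle 4720 (rows 136 and 137 above), 33 s apart
d = haversine_m(53.407516, -2.983266, 53.407498, -2.983178)
print(f"{d:.1f} m in 33 s -> {d / 33:.2f} m/s")
```

Summing hops this way is also why a stationary vehicle can accumulate length: every jittery fix adds a few meters.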

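4. Detect and Handle Gaps

GPS records from public transport vehicles contain temporal gaps, for example from:

  • Layovers at terminals
  • Depot movements
  • Service breaks
  • Travel in the opposite direction, which may be missing here because this file appears to contain outbound records only

We'll split the trajectory wherever consecutive GPS points are more than 5 minutes apart.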
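Conceptually, gap splitting starts a new segment whenever the time since the previous observation exceeds the threshold. The pandas sketch below illustrates this idea in simplified form and is not MovingPandas' actual implementation. It labels segments with a cumulative sum over a boolean "starts a new segment" flag. The timestamps are hypothetical, loosely modeled on the roughly 31-minute break in vehicle 4720's data.

```python
import pandas as pd

def label_segments(timestamps: pd.Series, gap: pd.Timedelta) -> pd.Series:
    """Return an integer segment id per row; a new segment starts after any gap > `gap`.

    Assumes `timestamps` is sorted in ascending order.
    """
    time_since_prev = timestamps.diff()   # NaT for the first row
    starts_new = time_since_prev > gap    # NaT compares as False
    return starts_new.cumsum()            # 0, 0, ..., 1, 1, ...

# Hypothetical toy timestamps: ~30 s sampling with one long break
ts = pd.Series(pd.to_datetime([
    "2026-01-26 16:48:50", "2026-01-26 16:49:18", "2026-01-26 16:49:48",
    "2026-01-26 17:20:40", "2026-01-26 17:21:10",
]))
seg_ids = label_segments(ts, pd.Timedelta("5min"))
print(seg_ids.tolist())
```

With fixes arriving roughly every 30 s, a 5-minute threshold is well above normal reporting jitter. Only genuine breaks in service or data should trigger a split.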

In [5]:
# Split trajectory at gaps > 5 minutes
gap_splitter = mpd.ObservationGapSplitter(traj)
segments = gap_splitter.split(gap=pd.Timedelta('5min'))

print(f"Original trajectory split into {len(segments)} segments\n")

for i, segment in enumerate(segments.trajectories):
    duration = segment.get_duration().total_seconds() / 60
    length = segment.get_length()
    print(f"Segment {i+1}:")
    print(f"  Points: {len(segment.df)}")
    print(f"  Duration: {duration:.1f} minutes")
    print(f"  Length: {length:.0f} meters")
    print(f"  Time: {segment.get_start_time().strftime('%H:%M:%S')} - {segment.get_end_time().strftime('%H:%M:%S')}")
    print()
Original trajectory split into 2 segments

Segment 1:
  Points: 131
  Duration: 46.5 minutes
  Length: 6327 meters
  Time: 16:03:21 - 16:49:48

Segment 2:
  Points: 147
  Duration: 51.3 minutes
  Length: 10048 meters
  Time: 17:20:40 - 18:11:58
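The two segments add up to 6,327 + 10,048 = 16,375 m, whereas the unsplit trajectory measured 21,342 m. The difference is the straight line connecting the last fix of segment 1 to the first fix of segment 2 across the gap. A quick check using the printed values, which are rounded to the meter:

```python
from datetime import datetime

# Values copied from the outputs above
full_length_m = 21342
segment_lengths_m = [6327, 10048]
seg1_end = datetime(2026, 1, 26, 16, 49, 48)
seg2_start = datetime(2026, 1, 26, 17, 20, 40)

gap_jump_m = full_length_m - sum(segment_lengths_m)   # straight-line connector across the gap
gap_s = (seg2_start - seg1_end).total_seconds()

print(f"Distance attributed to the gap: {gap_jump_m} m")
print(f"Gap duration: {gap_s / 60:.1f} min")
```

Roughly 5 km of the unsplit trajectory length is therefore an interpolated connector rather than observed driving.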

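5. Calculate Speed

Add speed columns to the trajectory segments. Because the CRS is geographic (EPSG:4326), MovingPandas measures distances in meters, so the default "speed" column is in m/s. The second call adds a "speed (mph)" column.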
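Each point's speed is derived from the previous observation: distance from the previous fix divided by the time elapsed. A minimal pandas sketch of that idea follows. It rests on three simplifying assumptions:

  • Coordinates are already projected, in hypothetical columns x_m and y_m in meters, so plain Euclidean distance is used.
  • 1 mile is 1,609.344 m.
  • The first row, which has no predecessor, is left as NaN.

MovingPandas handles geographic coordinates and the first row in its own way.

```python
import numpy as np
import pandas as pd

METERS_PER_MILE = 1609.344
SECONDS_PER_HOUR = 3600

def point_speeds(df: pd.DataFrame) -> pd.DataFrame:
    """Add speed_ms and speed_mph from consecutive (x_m, y_m, timestamp) rows.

    Assumes rows are sorted by time and x_m/y_m are projected coordinates in meters.
    """
    out = df.copy()
    step_m = np.hypot(out["x_m"].diff(), out["y_m"].diff())  # distance from previous fix
    step_s = out["timestamp"].diff().dt.total_seconds()      # seconds since previous fix
    out["speed_ms"] = step_m / step_s                        # first row is NaN
    out["speed_mph"] = out["speed_ms"] * SECONDS_PER_HOUR / METERS_PER_MILE
    return out

toy = pd.DataFrame({
    "timestamp": pd.to_datetime(["2026-01-26 17:00:00", "2026-01-26 17:00:30",
                                 "2026-01-26 17:01:00"]),
    "x_m": [0.0, 300.0, 300.0],   # hypothetical projected coordinates
    "y_m": [0.0, 0.0, 0.0],
})
print(point_speeds(toy)[["speed_ms", "speed_mph"]])
```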

In [6]:
# Add speed to each segment
segments.add_speed(overwrite=True)
segments.add_speed(name="speed (mph)", units=('mi','h'), overwrite=True)

print("Speed statistics by segment:\n")

for i, segment in enumerate(segments.trajectories):
    speeds_ms = segment.df['speed']
    speeds_mph = segment.df['speed (mph)']
    
    print(f"Segment {i+1}:")
    print(f"  Mean speed: {speeds_mph.mean():.1f} mph ({speeds_ms.mean():.1f} m/s)")
    print(f"  Max speed: {speeds_mph.max():.1f} mph ({speeds_ms.max():.1f} m/s)")
    print(f"  Min speed: {speeds_mph.min():.1f} mph ({speeds_ms.min():.1f} m/s)")
    print()
Speed statistics by segment:

Segment 1:
  Mean speed: 5.5 mph (2.5 m/s)
  Max speed: 28.5 mph (12.7 m/s)
  Min speed: 0.0 mph (0.0 m/s)

Segment 2:
  Mean speed: 8.3 mph (3.7 m/s)
  Max speed: 25.2 mph (11.2 m/s)
  Min speed: 0.0 mph (0.0 m/s)
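The mean of point speeds is not the same as a segment's average speed (total distance divided by elapsed time). Every fix counts equally in the point mean, regardless of the length of the interval behind it.

The sketch below computes the distance-over-time average from the segment outputs printed earlier:

  • Segment 1: 6,327 m, from 16:03:21 to 16:49:48
  • Segment 2: 10,048 m, from 17:20:40 to 18:11:58

The helper name is illustrative.

```python
from datetime import datetime

METERS_PER_MILE = 1609.344

def average_speed_mph(length_m: float, start: datetime, end: datetime) -> float:
    """Overall speed: total distance / total elapsed time, converted to mph."""
    elapsed_h = (end - start).total_seconds() / 3600
    return (length_m / METERS_PER_MILE) / elapsed_h

seg1 = average_speed_mph(6327, datetime(2026, 1, 26, 16, 3, 21), datetime(2026, 1, 26, 16, 49, 48))
seg2 = average_speed_mph(10048, datetime(2026, 1, 26, 17, 20, 40), datetime(2026, 1, 26, 18, 11, 58))
print(f"Segment 1: {seg1:.1f} mph (mean of point speeds above: 5.5 mph)")
print(f"Segment 2: {seg2:.1f} mph (mean of point speeds above: 8.3 mph)")
```

Both distance-over-time averages come out lower than the corresponding point means. Which figure to report depends on the question:

  • The distance-over-time average describes the trip as a whole.
  • The distribution of point speeds describes conditions at individual fixes.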

6. Visualization

Static Plot

In [7]:
# Plot all segments
segments.plot(column='trip_id', legend=True).set_title(f'Bus Trajectory: Vehicle {vehicle_id}\nRoute 14 Outbound')
Out[7]:
Text(0.5, 1.0, 'Bus Trajectory: Vehicle 4720\nRoute 14 Outbound')
[Figure: Bus Trajectory: Vehicle 4720, Route 14 Outbound (segments colored by trip_id)]

Interactive Map with Speed Colors

In [8]:
# Interactive visualization using explore()
segments.explore(
    column='speed',
    cmap='RdYlGn',
    style_kwds={'weight': 5},
    tiles='CartoDB positron'
)
Out[8]:
[Interactive map: trajectory segments colored by speed (m/s)]

7. Speed Profile Analysis

In [9]:
# Plot speed over time for all segments
all_data = pd.concat([
    seg.df[['speed (mph)']].assign(segment=f'Segment {i+1}') 
    for i, seg in enumerate(segments.trajectories)
])

# Interactive time series plot
all_data.hvplot.line(
    y='speed (mph)',
    by='segment',
    title=f'Speed Profile: Vehicle {vehicle_id}',
    xlabel='Time',
    ylabel='Speed (mph)',
    width=900,
    height=400,
    legend='top_left'
)
Out[9]:
In [10]:
# Speed distribution - interactive
all_data.hvplot.hist(
    y='speed (mph)',
    bins=30,
    title='Speed Distribution',
    xlabel='Speed (mph)',
    ylabel='Frequency',
    width=450,
    height=400
) + all_data.hvplot.box(
    y='speed (mph)',
    by='segment',
    title='Speed by Segment',
    ylabel='Speed (mph)',
    width=450,
    height=400
)
Out[10]:

8. Summary Statistics

In [11]:
print("=" * 70)
print(f"TRAJECTORY ANALYSIS SUMMARY: Vehicle {vehicle_id}")
print("=" * 70)

print(f"\n📊 OVERVIEW:")
print(f"   • Total GPS points: {len(traj.df)}")
print(f"   • Time span: {(traj.get_end_time() - traj.get_start_time()).total_seconds()/60:.0f} minutes")
# traj.get_length() includes the straight-line connection across the gap between segments
print(f"   • Total distance: {traj.get_length():.0f} meters ({traj.get_length()/1609.344:.1f} miles)")

print(f"\n🔄 SEGMENTATION:")
print(f"   • Number of segments: {len(segments)}")
print(f"   • Gap threshold: 5 minutes")

operational_time = sum(seg.get_duration().total_seconds() for seg in segments.trajectories) / 60
total_time = (traj.get_end_time() - traj.get_start_time()).total_seconds() / 60
print(f"   • Operational time: {operational_time:.0f} minutes")
print(f"   • Layover/break time: {total_time - operational_time:.0f} minutes")

print(f"\n⚡ SPEED ANALYSIS:")
all_speeds_mph = pd.concat([seg.df['speed (mph)'] for seg in segments.trajectories])
print(f"   • Mean speed: {all_speeds_mph.mean():.1f} mph")
print(f"   • Max speed: {all_speeds_mph.max():.1f} mph")
print(f"   • Median speed: {all_speeds_mph.median():.1f} mph")
print(f"   • Std deviation: {all_speeds_mph.std():.1f} mph")

# Speed categories
stopped = len(all_speeds_mph[all_speeds_mph < 3])
slow = len(all_speeds_mph[(all_speeds_mph >= 3) & (all_speeds_mph < 10)])
medium = len(all_speeds_mph[(all_speeds_mph >= 10) & (all_speeds_mph < 20)])
fast = len(all_speeds_mph[all_speeds_mph >= 20])

print(f"\n🚦 SPEED DISTRIBUTION:")
print(f"   • Stopped (<3 mph): {stopped} points ({stopped/len(all_speeds_mph)*100:.1f}%)")
print(f"   • Slow (3-10 mph): {slow} points ({slow/len(all_speeds_mph)*100:.1f}%)")
print(f"   • Medium (10-20 mph): {medium} points ({medium/len(all_speeds_mph)*100:.1f}%)")
print(f"   • Fast (≥20 mph): {fast} points ({fast/len(all_speeds_mph)*100:.1f}%)")

print("\n" + "=" * 70)
======================================================================
TRAJECTORY ANALYSIS SUMMARY: Vehicle 4720
======================================================================

📊 OVERVIEW:
   • Total GPS points: 278
   • Time span: 129 minutes
   • Total distance: 21342 meters (13.3 miles)

🔄 SEGMENTATION:
   • Number of segments: 2
   • Gap threshold: 5 minutes
   • Operational time: 98 minutes
   • Layover/break time: 31 minutes

⚡ SPEED ANALYSIS:
   • Mean speed: 7.0 mph
   • Max speed: 28.5 mph
   • Median speed: 4.4 mph
   • Std deviation: 7.6 mph

🚦 SPEED DISTRIBUTION:
   • Stopped (<3 mph): 122 points (43.9%)
   • Slow (3-10 mph): 82 points (29.5%)
   • Medium (10-20 mph): 45 points (16.2%)
   • Fast (≥20 mph): 29 points (10.4%)

======================================================================
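The four speed bands above can also be computed in one step with pd.cut. Left-closed bins (right=False) make the boundaries match the "<3", "3-10", "10-20" and "≥20" comparisons in the cell. A sketch on hypothetical toy speeds:

```python
import numpy as np
import pandas as pd

BINS = [0, 3, 10, 20, np.inf]                 # mph thresholds used in the summary
LABELS = ["Stopped", "Slow", "Medium", "Fast"]

def speed_bands(speeds_mph: pd.Series) -> pd.DataFrame:
    """Count and percentage of points per speed band (intervals are [low, high))."""
    bands = pd.cut(speeds_mph, bins=BINS, labels=LABELS, right=False)
    counts = bands.value_counts().reindex(LABELS, fill_value=0)
    percent = (counts / len(speeds_mph) * 100).round(1)
    return pd.DataFrame({"points": counts, "percent": percent})

toy_speeds = pd.Series([0.0, 1.2, 2.99, 3.0, 9.5, 10.0, 19.9, 20.0, 28.5])  # hypothetical
print(speed_bands(toy_speeds))
```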

9. Extension: Multiple Vehicles

This example focused on a single vehicle for clarity. To analyze all eight vehicles at once, build a TrajectoryCollection with one trajectory per vehicle_id:

In [12]:
tc = mpd.TrajectoryCollection(
    df, x='longitude', y='latitude', crs='EPSG:4326', 
    t='timestamp', traj_id_col='vehicle_id')

tc
Out[12]:
TrajectoryCollection with 8 trajectories
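MovingPandas' splitters also accept a TrajectoryCollection, so the same gap handling can run on every vehicle, e.g. mpd.ObservationGapSplitter(tc).split(gap=pd.Timedelta('5min')).

For a quick, library-free overview of how many gaps each vehicle has, a pandas groupby sketch might look like the following. It assumes the same vehicle_id and timestamp columns as the CSV; the helper name is hypothetical.

```python
import pandas as pd

def gaps_per_vehicle(df: pd.DataFrame, gap: pd.Timedelta) -> pd.Series:
    """Number of observation gaps longer than `gap` for each vehicle_id."""
    ordered = df.sort_values(["vehicle_id", "timestamp"])
    # diff() within each vehicle, so a "gap" never spans two different vehicles
    time_since_prev = ordered.groupby("vehicle_id")["timestamp"].diff()
    return (time_since_prev > gap).groupby(ordered["vehicle_id"]).sum()

toy = pd.DataFrame({   # hypothetical records for two vehicles
    "vehicle_id": [1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime(["2026-01-26 16:00", "2026-01-26 16:01", "2026-01-26 16:30",
                                 "2026-01-26 16:00", "2026-01-26 16:02"]),
})
print(gaps_per_vehicle(toy, pd.Timedelta("5min")))
```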

Limitations and Considerations

Data Characteristics

  • Raw GPS data without map matching
  • Sample includes partial trips (truncated by the data collection window)
  • Demonstrates handling of incomplete trajectories
  • GPS points connected by straight lines (not road geometry)

For Production Analysis

Consider these enhancements:

  • Map matching: Align trajectories to the road network (e.g., against a road network downloaded with OSMnx, or with Valhalla's map matching)
  • GTFS integration: Use scheduled route shapes (GTFS shapes.txt) as reference geometry
  • Stop detection: Match GPS points to known bus stop locations
  • Quality filtering: Remove anomalous GPS points
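As a starting point for quality filtering, one common heuristic is to drop fixes that imply a physically implausible speed from the last accepted fix. The sketch below is a hypothetical helper, not a MovingPandas API, and rests on three assumptions:

  • Coordinates are projected x/y in meters.
  • The first fix is valid.
  • A 50 mph ceiling is appropriate for an urban bus route; this value is illustrative only.

```python
import math
from datetime import datetime

def drop_speed_outliers(points, max_speed_ms):
    """Keep fixes whose implied speed from the last *kept* fix is <= max_speed_ms.

    `points` is a time-ordered list of (timestamp, x_m, y_m) in projected meters.
    """
    if not points:
        return []
    kept = [points[0]]                      # assumption: the first fix is valid
    for t, x, y in points[1:]:
        t0, x0, y0 = kept[-1]
        dt = (t - t0).total_seconds()
        if dt <= 0:
            continue                        # skip duplicate or out-of-order timestamps
        if math.hypot(x - x0, y - y0) / dt <= max_speed_ms:
            kept.append((t, x, y))
    return kept

MAX_SPEED_MS = 50 * 1609.344 / 3600         # assumed ceiling: 50 mph, about 22.4 m/s

fixes = [  # hypothetical: the third fix jumps 2 km in 30 s (~67 m/s)
    (datetime(2026, 1, 26, 17, 0, 0), 0.0, 0.0),
    (datetime(2026, 1, 26, 17, 0, 30), 200.0, 0.0),
    (datetime(2026, 1, 26, 17, 1, 0), 2200.0, 0.0),
    (datetime(2026, 1, 26, 17, 1, 30), 500.0, 0.0),
]
clean = drop_speed_outliers(fixes, MAX_SPEED_MS)
print(len(fixes), "->", len(clean))
```

Comparing against the last kept fix, rather than the raw previous row, means a single spike does not also cause the following good fix to be rejected. The trade-off is that an outlier in the very first fix would corrupt everything after it.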

MovingPandas Capabilities Demonstrated

✓ Trajectory creation from timestamped point data
✓ Temporal gap detection and splitting
✓ Speed calculation from GPS observations
✓ Trajectory metrics (length, duration, bounds)
✓ Visualization with geographic context