Revolutionizing Travel-Time Estimation: Lightweight Random Forest Model

Global: New Study Proposes Lightweight Random Forest Model for Accurate Low-Congestion Travel-Time Estimation

A study posted to arXiv on January 12, 2026 introduces a lightweight estimator designed to predict car travel times under minimally congested conditions. The research was conducted by a team of transportation engineers who aimed to fill the gap between data‑intensive congestion models, which are difficult to scale, and simplistic heuristics. By leveraging open road‑network data and sparse operational features, the authors sought to provide a practical alternative for engineering workflows that lack comprehensive congestion feeds.

Methodology Overview

The proposed pipeline follows four sequential steps: constructing drivable networks from volunteered geographic data, solving shortest‑time routes using Dijkstra’s algorithm, extracting a set of sparse control and turn features, and training a random‑forest regression ensemble to correct bias inherent in traversal‑time baselines.

Network Construction

Researchers transformed publicly contributed geographic information into a directed graph representing drivable road segments. Each edge was annotated with speed constraints derived from posted limits, enabling realistic traversal‑time calculations without requiring proprietary traffic datasets.
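As a rough illustration of this step, the sketch below turns a list of road segments into a directed graph whose edge weights are traversal times in seconds (length divided by posted speed). The segment fields (u, v, length_m, maxspeed_kph, oneway), the function name build_graph, and the 50 km/h fallback are assumptions made for this example; the abstract does not describe the authors' actual data schema or how missing limits are handled.

```python
from collections import defaultdict

KPH_TO_MPS = 1000.0 / 3600.0  # km/h -> m/s


def build_graph(segments, default_kph=50.0):
    """Return an adjacency dict: node -> list of (neighbor, seconds).

    `default_kph` is a hypothetical fallback for segments with no posted limit.
    """
    graph = defaultdict(list)
    for seg in segments:
        kph = seg.get("maxspeed_kph") or default_kph
        seconds = seg["length_m"] / (kph * KPH_TO_MPS)
        graph[seg["u"]].append((seg["v"], seconds))
        # Two-way streets get a reverse edge with the same traversal time.
        if not seg.get("oneway", False):
            graph[seg["v"]].append((seg["u"], seconds))
    return dict(graph)


# Toy segments, invented for illustration only.
segments = [
    {"u": "A", "v": "B", "length_m": 500.0, "maxspeed_kph": 50.0, "oneway": True},
    {"u": "B", "v": "C", "length_m": 1000.0, "maxspeed_kph": None},
]
graph = build_graph(segments)
```

Here the one-way segment A to B takes about 36 seconds at 50 km/h, and the untagged two-way segment between B and C falls back to the assumed default speed.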

Route Optimization

For any origin‑destination pair, the system computes the least‑time path by applying Dijkstra’s algorithm to the weighted graph, where each edge weight is the estimated traversal time at the posted speed limit rather than the segment’s length.
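A minimal sketch of least-time routing with Dijkstra's algorithm follows, using a binary heap from the Python standard library. The graph layout (node mapped to a list of neighbor and seconds pairs), the function name shortest_time_path, and the toy weights are assumptions for illustration, not the study's implementation.

```python
import heapq


def shortest_time_path(graph, source, target):
    """Dijkstra on node -> [(neighbor, seconds)]; returns (seconds, path)."""
    best = {source: 0.0}   # best known travel time to each node
    prev = {}              # predecessor on the best known path
    heap = [(0.0, source)]
    done = set()
    while heap:
        t, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if node == target:
            break
        for nbr, w in graph.get(node, []):
            cand = t + w
            if cand < best.get(nbr, float("inf")):
                best[nbr] = cand
                prev[nbr] = node
                heapq.heappush(heap, (cand, nbr))
    if target not in best:
        return float("inf"), []  # target unreachable
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[target], path[::-1]


# Toy graph with edge weights in seconds, invented for illustration.
toy_graph = {
    "A": [("B", 30.0), ("C", 20.0)],
    "B": [("D", 30.0)],
    "C": [("D", 50.0)],
}
seconds, path = shortest_time_path(toy_graph, "A", "D")
```

In this toy graph the route through B totals 60 seconds and beats the route through C at 70 seconds, even though the first hop to C is faster.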

Feature Engineering

Along each computed route, the model aggregates sparse operational attributes, including counts of traffic signals, stop signs, pedestrian crossings, yield signs, roundabouts, and various turn maneuvers (left, right, slight, U‑turn). These features capture localized control effects that influence travel time beyond pure speed limits.
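The aggregation can be pictured as simple counting along the route. The sketch below assumes that control devices are already attached to route nodes and that turn maneuvers have already been classified; the label strings, the function name route_features, and the input layout are hypothetical, since the abstract does not say how the authors encode these attributes.

```python
from collections import Counter

# Hypothetical labels for the control and turn categories named in the text.
CONTROL_TYPES = (
    "traffic_signal", "stop_sign", "pedestrian_crossing", "yield_sign", "roundabout",
)
TURN_TYPES = ("left", "right", "slight", "u_turn")


def route_features(path, node_controls, turns):
    """Count control devices and turn maneuvers along one route.

    path          -- ordered list of node ids on the route
    node_controls -- dict mapping node id -> control label (sparse)
    turns         -- list of turn labels, one per maneuver on the route
    """
    controls = Counter(node_controls[n] for n in path if n in node_controls)
    turn_counts = Counter(turns)
    features = {f"n_{c}": controls.get(c, 0) for c in CONTROL_TYPES}
    features.update({f"n_turn_{t}": turn_counts.get(t, 0) for t in TURN_TYPES})
    return features


# Toy inputs, invented for illustration; node "Z" is not on the route.
feats = route_features(
    path=["A", "B", "C", "D"],
    node_controls={"B": "traffic_signal", "C": "stop_sign", "Z": "yield_sign"},
    turns=["left", "slight", "left"],
)
```

Every route yields the same fixed set of keys, with zeros for absent categories, which keeps the feature vector a constant width for a downstream regressor.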

Model Training

A random‑forest regression ensemble is trained on a limited set of high‑quality reference travel times collected under low‑traffic conditions. The ensemble learns to adjust the baseline traversal‑time estimates by accounting for the extracted control features, thereby reducing systematic bias.
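One plausible way to set up such a correction is to train the forest on the residual (observed time minus baseline time) and add its prediction back to the baseline. The sketch below does this with scikit-learn's RandomForestRegressor on synthetic data. The residual formulation, the three features, the delay coefficients, and the hyperparameters are all assumptions for illustration; none of the numbers come from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, generated only to make the example runnable.
rng = np.random.default_rng(0)
n = 400
baseline = rng.uniform(120.0, 1200.0, size=n)        # speed-limit traversal time (s)
X = rng.integers(0, 10, size=(n, 3)).astype(float)   # signals, stop signs, left turns
delay = 12.0 * X[:, 0] + 5.0 * X[:, 1] + 4.0 * X[:, 2]  # assumed control delays (s)
observed = baseline + delay + rng.normal(0.0, 5.0, size=n)

train, test = slice(0, 300), slice(300, n)

# Fit the forest to the baseline's residual rather than to the raw travel time.
model = RandomForestRegressor(n_estimators=100, min_samples_leaf=3, random_state=0)
model.fit(X[train], observed[train] - baseline[train])

# Corrected estimate = baseline + predicted residual.
corrected = baseline[test] + model.predict(X[test])
mae_baseline = np.mean(np.abs(observed[test] - baseline[test]))
mae_corrected = np.mean(np.abs(observed[test] - corrected))
```

On this synthetic setup the baseline underestimates every trip because it ignores control delays, so the corrected estimate has a lower held-out mean absolute error. Predicting the travel time directly, with the baseline as an extra feature, would be an equally reasonable reading of the description.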

Evaluation Results

Out‑of‑sample testing on an urban testbed demonstrates marked improvements over the baseline across multiple metrics: mean absolute error, mean absolute percentage error, mean squared error, relative bias, and explained variance. The study reports no statistically significant mean bias under minimally congested scenarios and observes consistent k‑fold cross‑validation stability, indicating negligible overfitting.
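For reference, the five metrics can be computed as in the sketch below, using only the standard library. The abstract does not give the exact formulas, so the definitions here are common conventions and should be read as assumptions: relative bias is taken as summed error over summed observed time, and explained variance as one minus the ratio of error variance to observed variance. The sample values are invented.

```python
from statistics import fmean, pvariance


def evaluate(observed, predicted):
    """Return error metrics for paired observed/predicted travel times."""
    errors = [p - o for o, p in zip(observed, predicted)]
    return {
        "mae": fmean(abs(e) for e in errors),
        "mape": 100.0 * fmean(abs(e) / o for e, o in zip(errors, observed)),
        "mse": fmean(e * e for e in errors),
        # Positive means overestimation on aggregate (assumed definition).
        "relative_bias": sum(errors) / sum(observed),
        "explained_variance": 1.0 - pvariance(errors) / pvariance(observed),
    }


# Toy travel times in seconds, invented for illustration.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = evaluate(observed, predicted)
```

In this toy case the errors alternate in sign, so the relative bias is zero while the mean absolute error is 10 seconds, which shows why bias and absolute error are reported separately.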

Practical Implications

The approach offers a middle ground for transportation planning by preserving point‑to‑point fidelity at metropolitan scale while requiring far fewer data resources than full‑scale congestion models. Consequently, it can support accessibility analyses, network performance assessments, and planning efforts in regions where detailed traffic feeds are unavailable or cost‑prohibitive.

This report is based on the abstract of a research paper posted to arXiv, licensed under Academic Preprint / Open Access. The full text is available via arXiv.
