January 2026

NaviLoc: Trajectory-Level Visual Localization for GNSS-Denied UAV Navigation

Matching drone cameras to satellite maps — across the entire flight path, not frame by frame

Pavel Shpagin, Taras Panchenko

Drones (MDPI), Vol. 10, No. 2

Full paper on MDPI →

The problem

Drones rely on GPS to know where they are. When GPS signals get jammed or spoofed — which is now routine in conflict zones — drones lose their position fix. They can’t navigate, can’t return home, can’t complete their mission.

One alternative: use the drone’s camera to match what it sees on the ground against satellite imagery and figure out its location visually. The problem is that at low altitudes (50–150 meters), the drone’s view looks very different from satellite photos. Fields, roads, and buildings repeat across the landscape. A single camera frame can easily match the wrong satellite tile — the system thinks the drone is in one place when it’s actually somewhere else.

Existing methods try to match each frame independently. They fail because of this perceptual aliasing — too many locations look alike from above.

What NaviLoc does differently

NaviLoc doesn’t trust individual frame matches. Instead, it uses the drone’s entire flight trajectory to determine position.

The system combines two sources of information. First: visual place recognition (VPR) — matching camera frames to geo-referenced satellite tiles using deep learning descriptors. These matches are treated as noisy measurements, not ground truth. Second: visual-inertial odometry (VIO) — tracking relative motion between frames using the camera and an IMU (accelerometer + gyroscope). VIO is accurate for short distances but drifts over time.

Neither source is reliable alone. VPR gives approximate global position but frequently matches wrong tiles. VIO gives precise relative motion but accumulates drift. NaviLoc fuses them across the full trajectory so their errors cancel out.

Three stages

Stage 1 — Global Align. The VIO trajectory exists in a local coordinate frame — the drone knows how it moved, but not where in the world it started or which direction it was facing. Stage 1 finds the rotation, translation, and scale that best align this local trajectory to the satellite map. It does this by scanning candidate rotations, computing a robust median translation for each, and picking the alignment where the most camera frames match nearby satellite tiles. The median makes this robust: even if half the frame matches are wrong, the correct ones still dominate.
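The rotation scan with a robust median translation can be sketched as follows. This is a minimal 2-D illustration: the paper's version also estimates scale, and all names, grid resolutions, and thresholds here are illustrative, not the published implementation.

```python
import numpy as np

def global_align(vio_xy, match_xy, n_rot=72, inlier_radius=40.0):
    """Coarsely align a local VIO trajectory to the satellite map.

    vio_xy:   (N, 2) trajectory points in the local VIO frame
    match_xy: (N, 2) map coordinates of each frame's best VPR tile match
              (treated as noisy; a large fraction may be wrong)
    Returns the (R, t) that maximizes the number of frames landing
    near their matched tiles.
    """
    best = (-1, None, None)
    for theta in np.linspace(0.0, 2 * np.pi, n_rot, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        rotated = vio_xy @ R.T
        # Robust translation: the per-axis median offset is dominated
        # by correct matches even when many matches are outliers.
        t = np.median(match_xy - rotated, axis=0)
        aligned = rotated + t
        inliers = np.sum(np.linalg.norm(aligned - match_xy, axis=1) < inlier_radius)
        if inliers > best[0]:
            best = (inliers, R, t)
    return best[1], best[2]
```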

Stage 2 — Refinement. The global alignment from Stage 1 is coarse. Stage 2 slides a window along the trajectory and applies local geometric corrections using weighted Procrustes analysis. Each window recomputes which satellite tiles best match the camera frames at the current estimated positions, then adjusts the local trajectory segment. A rotation bound prevents overcorrection from noisy matches. Multiple passes converge in 2–3 iterations.
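The per-window correction is, at its core, a weighted rigid fit (Kabsch/Procrustes) with a cap on the rotation angle. A minimal sketch of that building block, with illustrative parameter names and a simplified bound (the sliding-window bookkeeping around it is omitted):

```python
import numpy as np

def weighted_procrustes(src, dst, w, max_rot_deg=10.0):
    """Weighted rigid alignment of src onto dst: dst ~ R @ src + t.

    src, dst: (N, 2) corresponding points (trajectory segment vs. tiles)
    w:        (N,) non-negative match weights
    The recovered rotation angle is clamped to +/- max_rot_deg to
    prevent overcorrection from noisy matches.
    """
    w = w / w.sum()
    mu_s = (w[:, None] * src).sum(0)
    mu_d = (w[:, None] * dst).sum(0)
    # Weighted cross-covariance of the centered point sets.
    H = (src - mu_s).T @ (w[:, None] * (dst - mu_d))
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    # Clamp the rotation angle (the bound mentioned in the text).
    ang = np.arctan2(R[1, 0], R[0, 0])
    ang = np.clip(ang, -np.radians(max_rot_deg), np.radians(max_rot_deg))
    c, s = np.cos(ang), np.sin(ang)
    R = np.array([[c, -s], [s, c]])
    t = mu_d - R @ mu_s
    return R, t
```

In the refinement loop, each window's segment would be transformed by its (R, t) and the tile matches recomputed before the next pass.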

Stage 3 — Smoothing. The final stage formulates a convex optimization problem that fuses VIO motion constraints with the refined position anchors from Stage 2. It detects outlier anchors using z-scores on their visual similarity — if a match looks statistically worse than the others, it gets clamped to the VIO prior instead. The result is a closed-form linear solve that produces the final trajectory estimate.
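The smoothing objective can be written as a quadratic in the positions, whose normal equations form a tridiagonal linear system solved in closed form. The sketch below illustrates that structure under simplifying assumptions; the z-score gating here simply zeroes an outlier anchor's weight (so the VIO terms carry it), and all parameter names are illustrative rather than the paper's exact formulation:

```python
import numpy as np

def smooth_trajectory(anchors, vio_deltas, sims, lam=1.0, z_thresh=2.0):
    """Fuse VIO motion constraints with Stage-2 position anchors.

    anchors:    (N, 2) refined anchor positions
    vio_deltas: (N-1, 2) VIO relative motion between consecutive frames
    sims:       (N,) visual similarity score of each anchor's tile match
    Minimizes sum_i ||x_{i+1}-x_i-d_i||^2 + lam * sum_i w_i ||x_i-a_i||^2.
    """
    N = len(anchors)
    z = (sims - sims.mean()) / (sims.std() + 1e-9)
    w = np.where(z < -z_thresh, 0.0, 1.0)  # gate statistically poor matches

    # Normal equations: A x = b, one tridiagonal system shared by both axes.
    A = np.zeros((N, N))
    b = np.zeros((N, 2))
    for i in range(N - 1):
        # Motion term couples x_i and x_{i+1}.
        A[i, i] += 1.0
        A[i + 1, i + 1] += 1.0
        A[i, i + 1] -= 1.0
        A[i + 1, i] -= 1.0
        b[i] -= vio_deltas[i]
        b[i + 1] += vio_deltas[i]
    for i in range(N):
        # Anchor term pulls x_i toward a_i, unless gated out.
        A[i, i] += lam * w[i]
        b[i] += lam * w[i] * anchors[i]
    return np.linalg.solve(A, b)
```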

Satellite map preparation

Before flight, satellite imagery of the operating area is downloaded and divided into a grid of tiles (40m spacing in our benchmark). Each tile is processed through a Vision Transformer (DeiT-Tiny-Distilled) to extract a 192-dimensional descriptor — a compact numerical fingerprint of its visual content. These descriptors are stored in a reference database. In our benchmark, 462 tiles cover 1.6 km² at 0.3 m/pixel resolution.

During flight, the same model processes each camera frame to produce its descriptor. Matching is then a nearest-neighbor search in descriptor space — fast enough to run in real time on embedded hardware.
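With descriptors L2-normalized, matching reduces to a cosine-similarity search, and at 462 tiles a brute-force matrix–vector product is already fast enough. A minimal sketch of the lookup (descriptor extraction with the ViT itself is not shown, and function names are illustrative):

```python
import numpy as np

def build_db(tile_descs):
    """L2-normalize tile descriptors so a dot product equals cosine similarity.

    tile_descs: (T, D) array, e.g. T=462 tiles with D=192 dims each.
    """
    return tile_descs / np.linalg.norm(tile_descs, axis=1, keepdims=True)

def match_frame(frame_desc, db, top_k=5):
    """Return indices and cosine similarities of the top-k matching tiles."""
    q = frame_desc / np.linalg.norm(frame_desc)
    sims = db @ q                      # one matrix-vector product
    idx = np.argsort(-sims)[:top_k]    # best matches first
    return idx, sims[idx]
```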

Results

We evaluated NaviLoc on a real-world benchmark: 58 frames from a 2.3 km flight over rural terrain in Ukraine at 50–150m altitude. The terrain is challenging — repetitive agricultural fields and village patterns with few distinctive landmarks.

  • 19.5 m mean localization error: over the 2.3 km trajectory, 16x more accurate than the previous best method (AnyLoc-VLAD, 312 m).
  • 32x better than raw VIO drift: VIO alone drifted to 627 m of error; NaviLoc corrects this to under 20 m.
  • 9 FPS on a Raspberry Pi 5: end-to-end inference on an $80 single-board computer. No GPU, no cloud, no internet required.
  • Training-free: uses off-the-shelf pretrained descriptors, with no domain-specific fine-tuning needed for new areas.

To put this in context: 19.5 meters of accuracy over a multi-kilometer flight, at low altitude, over visually repetitive rural terrain, with no GPS, running on a credit-card-sized computer. The previous state-of-the-art scored 312 meters on the same benchmark.

Why it matters

The core contribution is the trajectory-level approach. Existing methods match frames independently and fail when individual matches are ambiguous. NaviLoc shows that by treating visual matches as noisy measurements and fusing them with motion data across the full flight path, you can get reliable positioning even when most individual matches would be wrong.

The algorithm is lightweight and training-free — it works with any pretrained image descriptor, on any terrain, without collecting domain-specific training data. This makes it practical to deploy in new areas without preparation beyond downloading satellite imagery.

NaviLoc is the research foundation behind NaviX, our GPS-free navigation product currently in closed testing. NaviX takes these algorithms further — optimizing for production hardware, expanding the environmental envelope, and integrating with real autopilot systems.

Learn about NaviX →