DetReIDX: Long Range Identification and recognition.
📰 News & Updates
    • NEW Version 1.1 released.
    • NEW Published preprint on arXiv.

Introduction

Hi there,

We are proud to introduce DetReIDX, a new benchmark dataset built for real-world, long-range human recognition. It supports key computer vision tasks such as person detection, re-identification (ReID), multi-view tracking, and action recognition, all captured in complex outdoor scenes using drones and ground cameras.

The dataset starts with an indoor session where each person is photographed from three angles — left, front, and right — and recorded while walking, enabling motion-based recognition like gait analysis.

In the outdoor sessions, drones capture videos from 18 viewpoints per subject, across different heights (up to 120m), distances, and camera angles (30°, 60°, 90°). Each person wears different outfits across sessions to simulate real-world variation.

Every frame is labeled with bounding boxes and 16 soft biometric attributes (like age, gender, clothing, action), offering fine-grained details for deep analysis.

With more than 13 million annotations and rich visual diversity, DetReIDX sets a new standard for evaluating human-centric AI in aerial and surveillance scenarios.

DetReIDX Overview

Figure 1: Comparison between publicly available datasets (ground-ground, aerial-aerial, and aerial-ground) and the DetReIDX dataset introduced in this paper. Unlike its counterparts, DetReIDX includes clothing variation, detection and tracking annotations, action labels, and wide aerial altitude coverage (5.8m–120m), making it well suited for long-range surveillance tasks.


Dataset Statistics

DetReIDX is one of the most comprehensive UAV-based person identification datasets, featuring multi-altitude, multi-distance, and multi-session recordings.

  • 509 Identities
  • 13M+ Annotations
  • 18 UAV Viewpoints
  • 120m Max Altitude
  • 7 Collection Sites

Key Dataset Metrics

  • Altitude Range: 5.8m to 120m (provides unique perspective variation)
  • Distance Range: 10m to 120m (tests capability at extreme distances)
  • Cross-Sessions: 2 sessions (different clothing & conditions)

Research Challenges

DetReIDX exposes critical challenges in person recognition that are overlooked in traditional datasets but common in real-world UAV surveillance:

Extreme Scale Variation

Person ROIs range from full-HD indoor captures to sub-10px silhouettes at 120m altitude, testing resolution robustness.

Clothing Variation

Subjects wear different outfits across sessions, requiring models to learn identity beyond superficial appearance cues.

Viewpoint Diversity

18 unique UAV perspectives across three pitch angles (30°, 60°, 90°) challenge current view-specific approaches.

Cross-Domain Transfer

Aerial-to-ground matching requires bridging vastly different capture modalities and perspectives.

Occlusion & Blur

Real-world interference from motion blur, atmospheric conditions, and partial visibility.

Temporal Drift

Multi-day sessions with environmental changes test long-term recognition capabilities.

Comparison with Existing Datasets

DetReIDX significantly exceeds prior datasets in altitude span, viewpoint coverage, identity diversity, and annotation richness. The table below compares key features across benchmark datasets for person detection, ReID, tracking, and action recognition.

No. Dataset Camera View Format Detection Tracking ReID Search Action PIDs BBox Height (m) Distance (m)
1 PRID-2011 [1] UAV Aerial Still 1581 40K 20~60 -
2 CUHK03 [2] CCTV Ground Still 1467 13K - -
3 iLIDS-VID [3] CCTV Ground Video 300 42K - -
4 MRP [4] UAV Aerial Video 28 4K <10 -
5 PRAI-1581 [5] UAV Aerial Still 1581 39K 20~60 -
6 CSM [6] Various Aerial Video 1218 11M - -
7 Market1501 [7] CCTV Ground Still 1501 32.6K <10 -
8 Mini-drone [8] UAV Aerial Video - >27K <10 -
9 Mars [9] CCTV Ground Video 1261 20K - -
10 AVI [10] UAV Aerial Still 5124 10K 2~8 -
11 DUKEMTMC [11] CCTV Ground Video 1812 815K - -
12 iQIYI-VID [12] Various Aerial Video 5000 600K - -
13 DRone HIT [13] UAV Aerial Still 101 40K - -
14 LTCC [14] CCTV Ground Still 152 17K - -
15 P-DESTRE [15] UAV Aerial Video 269 >14.8M 5.8~6.7 -
16 UAVHuman [16] UAV Aerial Still 1144 41K 2~8 -
17 AG-ReID.v2 [17] UAV + CCTV Ground + Aerial Still 1615 100.6K 15~45 -
18 G2APS-ReID [18] UAV + CCTV Ground + Aerial Still 2788 200.8K 20~60 -
19 DetReIDX (Ours) DSLR + UAV Ground + Aerial Video + Still 334 13M 5~120 10~120

Data Collection Protocol

DetReIDX is built through a multi-institutional collaboration across Portugal, Angola, Turkey, and India. It captures both indoor and outdoor scenarios to support robust evaluation of person re-identification, tracking, action recognition, and gait analysis.

Data was collected using high-resolution drones (e.g., DJI Phantom 4) and DSLR cameras under diverse altitudes (5–120m), distances (10–120m), and pitch angles (30°, 60°, 90°), across controlled labs and open campuses.

Setting Environment Altitude Distance Data Type
Indoor Lab Ground Close Profile Images, Gait Videos
Outdoor Campus 5–120m 10–120m Multi-view Videos, Action Clips

Our data collection process consists of two major phases:

  • 🔴 Indoor Data Collection:
    • Each subject signs a consent form before participation.
    • Profile pictures are taken from three angles: Right (R), Front (F), and Left (L).
    • Full-body visibility is ensured.
    • Subjects walk naturally while gait videos are recorded.
  • 🔵 Outdoor Data Collection:
    • Data is collected across **multiple locations** in **different countries**.
    • Participants are required to wear **different outfits** in each session.
    • Outdoor collection is **split into two sessions**, each containing **18 data points**.
    • Each session involves **various distances and heights**, ensuring diversity in data.

All outdoor data is collected using **drones** for precise and consistent data gathering.


🔴 1. Indoor Data Collection

The indoor data collection consists of capturing three profile images of each subject: Right (R), Front (F), and Left (L), ensuring full-body visibility.

Additionally, gait videos are captured to analyze the subject's movement patterns.



🔵 2. Outdoor Data Collection

The outdoor collection is conducted in two sessions, where participants wear different outfits in each session. Both sessions have 18 collection points, covering various distances and heights.

Session 1 Session 2
Point Distance (m) Height (m) Point Distance (m) Height (m)
1 10 5.8 1 10 5.8
2 20 11.5 2 20 11.5
3 30 17.3 3 30 17.3
4 40 23.1 4 40 23.1
5 80 40 5 80 40
6 120 60 6 120 60
7 10 15 7 10 15
8 20 30 8 20 30
9 30 45 9 30 45
10 40 60 10 40 60
11 80 75 11 80 75
12 120 90 12 120 90
13 0 10 13 0 10
14 0 20 14 0 20
15 0 30 15 0 30
16 0 40 16 0 40
17 0 80 17 0 80
18 0 120 18 0 120






📌 Annotation Protocol

The DetReIDX dataset includes over 13 million manually annotated bounding boxes for 509 unique identities, collected across 36 UAV viewpoints per subject and indoor references. All annotations were performed using CVAT and verified by multiple annotators to ensure frame-level accuracy and identity consistency.

  • Annotation Tool: CVAT
  • Total Bounding Boxes: 13M+
  • Identity Count: 509 subjects
  • Viewpoints: 18 per session × 2 sessions = 36 drone captures per subject
  • Label Types: Bounding boxes, Tracking IDs, Action Labels

🎯 Soft Biometric Attributes

Each subject is annotated with 16 soft biometric features, including demographic (e.g., age, gender), appearance (e.g., hair, clothing, accessories), and physical traits (e.g., height, body type). These enable fine-grained analysis beyond just visual appearance.

  • 👤 Demographics: Age, Gender, Ethnicity
  • 🧥 Appearance: Upper/Lower clothing, Glasses, Hair Style
  • 🧬 Physical Traits: Height, Body Volume, Facial Hair
  • 🎬 Activity: Action class labels (walking, standing, etc.)

# Attribute Values / Description
1 Gender 0: Male, 1: Female, 2: Unknown
2 Age 0-11, 12-17, 18-24, 25-34, 35-44, 45-54, 55-64, >65, Unknown
3 Height 0: Child, 1: Short, 2: Medium, 3: Tall, 4: Unknown
4 Body Volume (Weight) 0: Thin, 1: Medium, 2: Fat, 3: Unknown
5 Ethnicity 0: White, 1: Black, 2: Asian, 3: Indian, 4: Unknown
6 Hair Color 0: Black, 1: Brown, 2: White, 3: Red, 4: Gray, 5: Occluded, 6: Unknown
7 Hairstyle 0: Bald, 1: Short, 2: Medium, 3: Long, 4: Horse Tail, 5: Unknown
8 Beard 0: Yes, 1: No, 2: Unknown
9 Moustache 0: Yes, 1: No, 2: Unknown
10 Glasses 0: Normal Glass, 1: Sun Glass, 2: No, 3: Unknown
11 Head Accessories 0: Hat, 1: Scarf, 2: Necklace, 3: Cannot see, 4: Unknown
12 Upper Body Clothing 0: T-Shirt, 1: Blouse, 2: Sweater, 3: Coat, 4: Bikini, 5: Naked, 6: Dress, 7: Uniform, 8: Shirt, 9: Suit, 10: Hoodie, 11: Cardigan, 12: Unknown
13 Lower Body Clothing 0: Jeans, 1: Leggings, 2: Pants, 3: Shorts, 4: Skirt, 5: Bikini, 6: Dress, 7: Uniform, 8: Suit, 9: Unknown
14 Feet 0: Sport Shoe, 1: Classic Shoe, 2: High Heels, 3: Boots, 4: Sandal, 5: Nothing, 6: Unknown
15 Accessories 0: Bag, 1: Backpack Bag, 2: Rolling Bag, 3: Umbrella, 4: Sport Bag, 5: Market Bag, 6: Nothing, 7: Unknown
16 Action 0: Walking, 1: Running, 2: Standing, 3: Sitting, 4: Cycling, 5: Exercising, 6: Petting, 7: Talking on Phone, 8: Leaving Bag, 9: Fall, 10: Fighting, 11: Dating, 12: Offending, 13: Trading
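
Because the attributes are stored as integer codes, a small lookup table helps when reading annotations. The sketch below is illustrative only: it copies three of the code lists from the attribute table, and the record layout (a dictionary keyed by attribute name) and the function name `decode_attributes` are assumptions, since the on-disk annotation format is not described here.

```python
# Code-to-label lists copied from the attribute table (three attributes shown).
ATTRIBUTE_VALUES = {
    "Gender": ["Male", "Female", "Unknown"],
    "Beard": ["Yes", "No", "Unknown"],  # note: code 0 means "Yes"
    "Glasses": ["Normal Glass", "Sun Glass", "No", "Unknown"],
}


def decode_attributes(codes: dict[str, int]) -> dict[str, str]:
    """Translate integer attribute codes into their text labels."""
    decoded = {}
    for name, code in codes.items():
        labels = ATTRIBUTE_VALUES[name]
        if not 0 <= code < len(labels):
            raise ValueError(f"{name}: code {code} is outside 0..{len(labels) - 1}")
        decoded[name] = labels[code]
    return decoded


# Hypothetical record; the real annotation layout may differ.
sample = {"Gender": 1, "Beard": 1, "Glasses": 2}
print(decode_attributes(sample))
```

Validating the code range up front is a deliberate choice: for Beard and Moustache the value 0 means "Yes", so a silent off-by-one would flip the label.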

Dataset Statistics
Detection Dataset Statistics
Split #Videos #Images #Annotations Formats
Train 120 131,580 5,095,539 YOLO, COCO
Validation 56 63,591 2,483,836 YOLO, COCO
Test 109 108,252 4,217,824 YOLO, COCO
Total 285 303,423 11,797,199
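
The detection annotations are offered in YOLO and COCO formats, which encode boxes differently: YOLO labels store a normalised centre point plus width and height, while COCO stores the top-left corner plus width and height in pixels. The following is a minimal sketch of the conversion; the frame size and label values are made up for illustration, and the function name `yolo_to_coco` is not part of any DetReIDX tooling.

```python
def yolo_to_coco(cx: float, cy: float, w: float, h: float,
                 img_w: int, img_h: int) -> list[float]:
    """Convert a normalised YOLO box (centre x, centre y, width, height)
    into a COCO box [x_min, y_min, width, height] in pixels."""
    box_w = w * img_w
    box_h = h * img_h
    x_min = cx * img_w - box_w / 2
    y_min = cy * img_h - box_h / 2
    return [x_min, y_min, box_w, box_h]


# Hypothetical label on a 1920x1080 frame.
print(yolo_to_coco(0.5, 0.5, 0.25, 0.5, 1920, 1080))
# -> [720.0, 270.0, 480.0, 540.0]
```
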
🧠 ReID Dataset Statistics
Scenario #Query #Gallery Total Images
Train (Indoor + UAV) -- -- 289,392
A2A (UAV → UAV) 52,926 52,552 105,478
A2G (UAV → Indoor) 106,927 7,959 114,886
G2A (Indoor → UAV) 7,959 106,927 114,886

📊 Experimental Setup

Explanation of Person Detection Experimental Setup

We conducted several experiments to evaluate the robustness and generalization of person detection models under varying conditions.

  • Baseline: The model was trained on the entire dataset and tested on the complete test set. This serves as a performance reference point.
  • Interpolation: Training was performed on data with 30° and 90° pitch angles. The model was then tested on the unseen 60° angle to check its ability to interpolate between known views.
  • Extrapolation: The model was trained on 30° and 60° angles and tested on 90° — evaluating its ability to generalize to extreme angles it has not seen during training.
  • Distance Splits (D1, D2, D3): We computed aerial distance using the Pythagorean theorem from altitude and ground distance.
    Based on the resulting aerial distance:
    • D1: < 20 meters
    • D2: 20–50 meters
    • D3: > 50 meters
    Models were trained separately on each distance bin and tested across D1, D2, and D3 to assess performance under increasing long-range conditions.
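
The distance binning above can be sketched as follows. This is an illustration rather than the authors' code: the function names are hypothetical, and placing the boundary values 20 m and 50 m in D2 is an assumption based on the ranges listed above.

```python
import math


def aerial_distance(ground_distance_m: float, altitude_m: float) -> float:
    """Straight-line drone-to-subject distance (Pythagorean theorem)."""
    return math.hypot(ground_distance_m, altitude_m)


def distance_bin(aerial_m: float) -> str:
    """Map an aerial distance in metres to D1, D2 or D3.

    Assumption: exactly 20 m and exactly 50 m both fall in D2.
    """
    if aerial_m < 20:
        return "D1"
    if aerial_m <= 50:
        return "D2"
    return "D3"


# (ground distance, height) pairs taken from the outdoor collection table.
points = [(10, 5.8), (40, 23.1), (120, 60), (0, 120)]
for d, h in points:
    a = aerial_distance(d, h)
    print(f"distance={d} m, height={h} m -> aerial={a:.1f} m, bin={distance_bin(a)}")
```

For example, the point at 40 m ground distance and 23.1 m height lies about 46 m from the subject and therefore lands in D2, even though neither input alone exceeds 40 m.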

The three tables below report AP50 for each experiment. The percentage in parentheses is the change relative to the baseline row of the same table.

Experiment Train Test AP50
Baseline ALL ALL 0.734
Interpolation 30° & 90° 60° 0.669 ↓ (-8.86%)
Extrapolation 30° & 60° 90° 0.503 ↓ (-31.5%)
D1 D1 D1 0.914 ↑ (+24.5%)
D1 D1 D2 0.793 ↑ (+8.04%)
D1 D1 D3 0.137 ↓ (-81.3%)
D2 D2 D1 0.694 ↓ (-5.45%)
D2 D2 D2 0.890 ↑ (+21.2%)
D2 D2 D3 0.315 ↓ (-57.1%)
D3 D3 D1 0.015 ↓ (-97.9%)
D3 D3 D2 0.411 ↓ (-44.0%)
D3 D3 D3 0.581 ↓ (-20.8%)

Experiment Train Test AP50
Baseline ALL ALL 0.608
Interpolation 30° & 90° 60° 0.564 ↓ (-7.24%)
Extrapolation 30° & 60° 90° 0.474 ↓ (-22.04%)
D1 D1 D1 0.857 ↑ (+40.9%)
D1 D1 D2 0.380 ↓ (-37.5%)
D1 D1 D3 0.008 ↓ (-98.7%)
D2 D2 D1 0.582 ↓ (-4.28%)
D2 D2 D2 0.776 ↑ (+27.6%)
D2 D2 D3 0.111 ↓ (-81.75%)
D3 D3 D1 0.004 ↓ (-99.3%)
D3 D3 D2 0.274 ↓ (-54.9%)
D3 D3 D3 0.408 ↓ (-32.9%)

Experiment Train Test AP50
Baseline ALL ALL 0.620
Interpolation 30° & 90° 60° 0.514 ↓ (-17.1%)
Extrapolation 30° & 60° 90° 0.403 ↓ (-35.0%)
D1 D1 D1 0.839 ↑ (+35.3%)
D1 D1 D2 0.428 ↓ (-30.9%)
D1 D1 D3 0.009 ↓ (-98.5%)
D2 D2 D1 0.668 ↑ (+7.74%)
D2 D2 D2 0.770 ↑ (+24.2%)
D2 D2 D3 0.150 ↓ (-75.8%)
D3 D3 D1 0.002 ↓ (-99.7%)
D3 D3 D2 0.261 ↓ (-57.9%)
D3 D3 D3 0.280 ↓ (-54.8%)
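
The percentages in parentheses are consistent with a relative change against the Baseline AP50 of the same table. This reading is inferred from the numbers rather than stated explicitly, and the helper name below is hypothetical. A minimal sketch using values from the first table:

```python
def relative_change(ap50: float, baseline: float) -> float:
    """Percentage change of ap50 with respect to the baseline AP50."""
    return (ap50 - baseline) / baseline * 100.0


baseline = 0.734  # Baseline row of the first table
rows = [("Interpolation", 0.669), ("Extrapolation", 0.503), ("D1 -> D1", 0.914)]
for name, ap50 in rows:
    print(f"{name}: {relative_change(ap50, baseline):+.2f}%")
```

Reading the figures this way makes the cross-distance results easier to compare: a model trained on D1 and tested on D3 loses most of its baseline AP50, while the same model tested on D1 exceeds the baseline.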

Person Re-Identification Results

The tables below compare the performance of PersonViT, SeCap, and CLIP-ReID on the DetReIDX-ReID dataset across aerial and ground viewpoints. Metrics are mAP and CMC accuracy at Rank-1, Rank-5, and Rank-10.

Experiment mAP Rank-1 Rank-5 Rank-10
Aerial → Aerial (LongTerm) 9.9% 8.8% 14.4% 17.6%
Aerial → Ground 22.3% 19.6% 24.8% 27.6%
Ground → Aerial 23.3% 51.9% 59.4% 63.0%

Experiment mAP Rank-1 Rank-5 Rank-10
Aerial → Aerial (LongTerm) 11.16% 8.20% 13.03% 16.16%
Aerial → Ground 20.49% 18.08% 21.50% 23.43%
Ground → Aerial 21.23% 50.89% 57.68% 60.72%

Experiment mAP Rank-1 Rank-5 Rank-10
Aerial → Aerial (LongTerm) 9.5% 8.9% 12.8% 15.3%
Aerial → Ground 22.0% 19.7% 24.0% 26.2%
Ground → Aerial 20.8% 58.1% 63.1% 65.2%
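
For readers unfamiliar with the metrics, the sketch below shows how Rank-k accuracy and mAP are commonly computed from ranked gallery lists. It is a simplified illustration with toy data, not the evaluation code used for these tables; in particular it omits the same-camera filtering that many ReID protocols apply, and all function names are hypothetical.

```python
def rank_k_and_ap(ranked_gallery_ids, query_id, ks=(1, 5, 10)):
    """Return ({k: hit}, average precision) for one query.

    ranked_gallery_ids: gallery identity labels sorted by increasing
    distance to the query (best match first).
    """
    matches = [gid == query_id for gid in ranked_gallery_ids]
    hits = {k: any(matches[:k]) for k in ks}
    num_correct, precision_sum = 0, 0.0
    for position, is_match in enumerate(matches, start=1):
        if is_match:
            num_correct += 1
            precision_sum += num_correct / position
    ap = precision_sum / num_correct if num_correct else 0.0
    return hits, ap


def evaluate(queries, ks=(1, 5, 10)):
    """Average Rank-k accuracy and mAP over (query_id, ranking) pairs."""
    totals = {k: 0 for k in ks}
    ap_sum = 0.0
    for query_id, ranking in queries:
        hits, ap = rank_k_and_ap(ranking, query_id, ks)
        for k in ks:
            totals[k] += hits[k]
        ap_sum += ap
    n = len(queries)
    return {f"Rank-{k}": totals[k] / n for k in ks}, ap_sum / n


toy = [
    (7, [7, 3, 7, 9]),  # correct matches at positions 1 and 3
    (3, [9, 7, 3, 3]),  # correct matches at positions 3 and 4
]
cmc, mean_ap = evaluate(toy, ks=(1, 3))
print(cmc, mean_ap)
```

Rank-k only needs one correct match near the top of the list, whereas mAP rewards ranking every correct gallery image highly. That difference is one plausible reason why Ground → Aerial shows a high Rank-1 alongside a much lower mAP.
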
Aerial Distance Distribution

Figure: Distribution of image resolutions (width × height) captured at different aerial distances. Each dot represents one image. Colors indicate image size relative to the mean area: Small, Medium, and Large. Min and Max images are marked with blue "X" and red "P" respectively.


📥 Dataset Access & Downloads

The DetReIDX dataset is organized task-wise, including separate partitions for detection, tracking, ReID, and action recognition. Access is strictly limited to researchers from academic/government institutions. Only the requested module will be shared. To access the full dataset, you must submit separate access requests for each module.

Request Dataset Access
Download License + Request Form

Acknowledgements

We acknowledge and give credit to the following universities for their contributions:
Istanbul Medipol University, J.N.N College of Engineering, SRM Institute of Science and Technology, Swami Ramanand Teerth Marathwada University, Nanded, Universidade Beira Interior, Universidade de Luanda.
(Sorted in A-Z order)

