DetReIDX: A Stress-Test Dataset for Real-World UAV-Based Person Recognition
A COLLABORATION BETWEEN
Istanbul Medipol Üniversitesi, SRMIST
We are proud to introduce DetReIDX, a new benchmark dataset built for real-world, long-range human recognition. It supports key computer vision tasks such as person detection, re-identification (ReID), multi-view tracking, and action recognition, all captured in complex outdoor scenes using drones and ground cameras.
The dataset starts with an indoor session where each person is photographed from three angles — left, front, and right — and recorded while walking, enabling motion-based recognition like gait analysis.
In the outdoor sessions, drones capture videos from 18 viewpoints per subject, across different heights (up to 120m), distances, and camera angles (30°, 60°, 90°). Each person wears different outfits across sessions to simulate real-world variation.
Every frame is labeled with bounding boxes and 16 soft biometric attributes (like age, gender, clothing, action), offering fine-grained details for deep analysis.
With more than 13 million annotations and rich visual diversity, DetReIDX sets a new standard for evaluating human-centric AI in aerial and surveillance scenarios.
Figure 1: Comparison between publicly available datasets (ground-ground, aerial-aerial, and aerial-ground) and the DetReIDX dataset introduced in this paper. Unlike its counterparts, DetReIDX includes clothing variation, detection and tracking annotations, action labels, and wide aerial altitude coverage (5.8m–120m), making it well suited for long-range surveillance tasks.
DetReIDX is one of the most comprehensive UAV-based person identification datasets, featuring multi-altitude, multi-distance, and multi-session recordings.
- Altitude: 5.8m to 120m, providing unique perspective variation
- Distance: 10m to 120m, testing capability at extreme distances
- 2 sessions, with different clothing and conditions

DetReIDX exposes critical challenges in person recognition that are overlooked in traditional datasets but common in real-world UAV surveillance:
- Person ROIs range from full-HD indoor captures to sub-10px silhouettes at 120m altitude, testing resolution robustness.
- Subjects wear different outfits across sessions, requiring models to learn identity beyond superficial appearance cues.
- 18 unique UAV perspectives across three pitch angles (30°, 60°, 90°) challenge current view-specific approaches.
- Aerial-to-ground matching requires bridging vastly different capture modalities and perspectives.
- Real-world interference arises from motion blur, atmospheric conditions, and partial visibility.
- Multi-day sessions with environmental changes test long-term recognition capabilities.
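The resolution challenge above can be made concrete with a back-of-the-envelope pinhole-camera calculation. The field of view and frame width below are illustrative assumptions (roughly a wide-angle UAV camera recording at 1080p), not DetReIDX calibration values:

```python
import math

def person_height_px(person_h_m, distance_m, fov_deg=94.0, frame_w_px=1920):
    """Approximate on-screen height of a person via the pinhole model.

    fov_deg and frame_w_px are assumed camera parameters, not dataset values.
    """
    # focal length in pixels, derived from the horizontal field of view
    f_px = (frame_w_px / 2) / math.tan(math.radians(fov_deg / 2))
    return f_px * person_h_m / distance_m

# a 1.7 m person seen from ~120 m shrinks to on the order of ten pixels
px_far = person_height_px(1.7, 120)
px_near = person_height_px(1.7, 10)
```

Under these assumed parameters the far capture is roughly an order of magnitude smaller than the near one, which is why detectors trained only on close-range data degrade so sharply at the longest collection points.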
DetReIDX significantly exceeds prior datasets in altitude span, viewpoint coverage, identity diversity, and annotation richness. The table below compares key features across benchmark datasets for person detection, ReID, tracking, and action recognition.
No. | Dataset | Camera | View | Format | Detection | Tracking | ReID | Search | Action | PIDs | BBox | Height (m) | Distance (m) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | PRID-2011 [1] | UAV | Aerial | Still | ✗ | ✗ | ✓ | ✗ | ✗ | 1581 | 40K | 20~60 | - |
2 | CUHK03 [2] | CCTV | Ground | Still | ✗ | ✗ | ✓ | ✗ | ✗ | 1467 | 13K | - | - |
3 | iLIDS-VID [3] | CCTV | Ground | Video | ✗ | ✗ | ✓ | ✗ | ✗ | 300 | 42K | - | - |
4 | MRP [4] | UAV | Aerial | Video | ✓ | ✓ | ✓ | ✗ | ✗ | 28 | 4K | <10 | - |
5 | PRAI-1581 [5] | UAV | Aerial | Still | ✗ | ✗ | ✓ | ✗ | ✗ | 1581 | 39K | 20~60 | - |
6 | CSM [6] | Various | Aerial | Video | ✗ | ✗ | ✓ | ✗ | ✗ | 1218 | 11M | - | - |
7 | Market1501 [7] | CCTV | Ground | Still | ✓ | ✓ | ✓ | ✗ | ✗ | 1501 | 32.6K | <10 | - |
8 | Mini-drone [8] | UAV | Aerial | Video | ✓ | ✓ | ✗ | ✗ | ✓ | - | >27K | <10 | - |
9 | Mars [9] | CCTV | Ground | Video | ✓ | ✓ | ✓ | ✗ | ✗ | 1261 | 20K | - | - |
10 | AVI [10] | UAV | Aerial | Still | ✓ | ✓ | ✓ | ✓ | ✓ | 5124 | 10K | 2~8 | - |
11 | DUKEMTMC [11] | CCTV | Ground | Video | ✓ | ✓ | ✓ | ✗ | ✗ | 1812 | 815K | - | - |
12 | iQIYI-VID [12] | Various | Aerial | Video | ✓ | ✓ | ✓ | ✓ | ✗ | 5000 | 600K | - | - |
13 | DRone HIT [13] | UAV | Aerial | Still | ✓ | ✗ | ✓ | ✓ | ✗ | 101 | 40K | - | - |
14 | LTCC [14] | CCTV | Ground | Still | ✓ | ✗ | ✓ | ✓ | ✗ | 152 | 17K | - | - |
15 | P-DESTRE [15] | UAV | Aerial | Video | ✓ | ✓ | ✓ | ✓ | ✓ | 269 | >14.8M | 5.8~6.7 | - |
16 | UAVHuman [16] | UAV | Aerial | Still | ✗ | ✗ | ✓ | ✗ | ✗ | 1144 | 41K | 2~8 | - |
17 | AG-ReID.v2 [17] | UAV + CCTV | Ground + Aerial | Still | ✓ | ✓ | ✓ | ✓ | ✗ | 1615 | 100.6K | 15~45 | - |
18 | G2APS-ReID [18] | UAV + CCTV | Ground + Aerial | Still | ✓ | ✓ | ✓ | ✓ | ✗ | 2788 | 200.8K | 20~60 | - |
19 | DetReIDX (Ours) | DSLR + UAV | Ground + Aerial | Video + Still | ✓ | ✓ | ✓ | ✓ | ✓ | 334 | 13M | 5~120 | 10~120 |
DetReIDX is built through a multi-institutional collaboration across Portugal, Angola, Turkey, and India. It captures both indoor and outdoor scenarios to support robust evaluation of person re-identification, tracking, action recognition, and gait analysis.
Data was collected using high-resolution drones (e.g., DJI Phantom 4) and DSLR cameras under diverse altitudes (5–120m), distances (10–120m), and pitch angles (30°, 60°, 90°), across controlled labs and open campuses.
Setting | Environment | Altitude | Distance | Data Type |
---|---|---|---|---|
Indoor | Lab | Ground | Close | Profile Images, Gait Videos |
Outdoor | Campus | 5–120m | 10–120m | Multi-view Videos, Action Clips |
Our data collection process consists of two major phases:
Indoor data is captured with DSLR cameras, while all outdoor data is collected using **drones** for precise and consistent data gathering.
The indoor data collection consists of capturing three profile images of each subject: Right (R), Front (F), and Left (L), ensuring full-body visibility.
Additionally, gait videos are captured to analyze the subject's movement patterns.
The outdoor collection is conducted in two sessions, where participants wear different outfits in each session. Both sessions have 18 collection points, covering various distances and heights.
Both sessions use the same 18 collection points:

Point | Distance (m) | Height (m) |
---|---|---|
1 | 10 | 5.8 |
2 | 20 | 11.5 |
3 | 30 | 17.3 |
4 | 40 | 23.1 |
5 | 80 | 40 |
6 | 120 | 60 |
7 | 10 | 15 |
8 | 20 | 30 |
9 | 30 | 45 |
10 | 40 | 60 |
11 | 80 | 75 |
12 | 120 | 90 |
13 | 0 | 10 |
14 | 0 | 20 |
15 | 0 | 30 |
16 | 0 | 40 |
17 | 0 | 80 |
18 | 0 | 120 |
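Each collection point pairs a ground distance with an altitude, which together determine the camera's slant range and elevation angle. A small helper (illustrative, not part of any official dataset tooling) makes this geometry explicit; note that point 1 (10 m, 5.8 m) works out to roughly the nominal 30° pitch angle, and the zero-distance points are nadir (90°) views:

```python
import math

def view_geometry(distance_m, height_m):
    """Slant range and elevation angle for a UAV collection point.

    distance_m: horizontal ground distance to the subject
    height_m:   UAV altitude above the subject
    """
    slant = math.hypot(distance_m, height_m)
    elevation = math.degrees(math.atan2(height_m, distance_m))
    return slant, elevation

# Point 1 (10 m, 5.8 m): elevation ≈ 30°, close to the nominal pitch angle
# Point 13 (0 m, 10 m):  elevation = 90°, camera directly overhead
for d, h in [(10, 5.8), (10, 15), (0, 10)]:
    s, e = view_geometry(d, h)
    print(f"distance={d:>3} m, height={h:>4} m -> slant={s:.1f} m, elevation={e:.1f} deg")
```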
The DetReIDX dataset includes over 13 million manually annotated bounding boxes for 509 unique identities, collected across 36 UAV collection points per subject (18 in each of the two sessions) plus indoor references. All annotations were performed in CVAT and verified by multiple annotators to ensure frame-level accuracy and identity consistency.
Each subject is annotated with 16 soft biometric features, including demographic (e.g., age, gender), appearance (e.g., hair, clothing, accessories), and physical traits (e.g., height, body type). These enable fine-grained analysis beyond just visual appearance.
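Since the detection annotations are distributed in standard YOLO and COCO formats, they can be consumed with ordinary tooling. The sketch below parses one YOLO-format label line; the example values and the 1920×1080 frame size are illustrative, not taken from the dataset:

```python
from dataclasses import dataclass

@dataclass
class YoloBox:
    class_id: int
    cx: float  # box centre x, normalised to [0, 1]
    cy: float  # box centre y, normalised to [0, 1]
    w: float   # box width, normalised
    h: float   # box height, normalised

def parse_yolo_line(line: str) -> YoloBox:
    """Parse one line of a YOLO-format label file: 'class cx cy w h'."""
    cls, cx, cy, w, h = line.split()
    return YoloBox(int(cls), float(cx), float(cy), float(w), float(h))

# hypothetical label line; convert to pixels for a 1920x1080 frame
box = parse_yolo_line("0 0.512 0.430 0.031 0.092")
w_px, h_px = box.w * 1920, box.h * 1080
```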
Split | #Videos | #Images | #Annotations | Formats |
---|---|---|---|---|
Train | 120 | 131,580 | 5,095,539 | YOLO, COCO |
Validation | 56 | 63,591 | 2,483,836 | YOLO, COCO |
Test | 109 | 108,252 | 4,217,824 | YOLO, COCO |
Total | 285 | 303,423 | 11,797,199 | — |
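The split counts above are internally consistent, which is easy to check programmatically; the numbers below are copied directly from the table:

```python
# Per-split counts as published in the DetReIDX split table
splits = {
    "train":      {"videos": 120, "images": 131_580, "annotations": 5_095_539},
    "validation": {"videos": 56,  "images": 63_591,  "annotations": 2_483_836},
    "test":       {"videos": 109, "images": 108_252, "annotations": 4_217_824},
}

# Sum each column across splits and compare with the published totals
totals = {k: sum(s[k] for s in splits.values())
          for k in ("videos", "images", "annotations")}
# totals == {'videos': 285, 'images': 303423, 'annotations': 11797199}
```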
Scenario | #Query | #Gallery | Total Images |
---|---|---|---|
Train (Indoor + UAV) | -- | -- | 289,392 |
A2A (UAV → UAV) | 52,926 | 52,552 | 105,478 |
A2G (UAV → Indoor) | 106,927 | 7,959 | 114,886 |
G2A (Indoor → UAV) | 7,959 | 106,927 | 114,886 |
We conducted several experiments to evaluate the robustness and generalization of person detection models under varying conditions.
The tables below report person detection performance (AP50) under three settings: a baseline trained and tested on all conditions, interpolation and extrapolation across pitch angles, and cross-distance train/test combinations over distance groups D1–D3. Percentages give the change relative to each model's ALL/ALL baseline.
Experiment | Train | Test | AP50 |
---|---|---|---|
Baseline | ALL | ALL | 0.734 |
Interpolation | 30° & 90° | 60° | 0.669 ↓ (-8.86%) |
Extrapolation | 30° & 60° | 90° | 0.503 ↓ (-31.5%) |
D1 | D1 | D1 | 0.914 ↑ (+24.5%) |
D1 | D1 | D2 | 0.793 ↑ (+8.04%) |
D1 | D1 | D3 | 0.137 ↓ (-81.3%) |
D2 | D2 | D1 | 0.694 ↓ (-5.45%) |
D2 | D2 | D2 | 0.890 ↑ (+21.2%) |
D2 | D2 | D3 | 0.315 ↓ (-57.1%) |
D3 | D3 | D1 | 0.015 ↓ (-97.9%) |
D3 | D3 | D2 | 0.411 ↓ (-44.0%) |
D3 | D3 | D3 | 0.581 ↓ (-20.8%) |
Experiment | Train | Test | AP50 |
---|---|---|---|
Baseline | ALL | ALL | 0.608 |
Interpolation | 30° & 90° | 60° | 0.564 ↓ (-7.24%) |
Extrapolation | 30° & 60° | 90° | 0.474 ↓ (-22.04%) |
D1 | D1 | D1 | 0.857 ↑ (+40.9%) |
D1 | D1 | D2 | 0.380 ↓ (-37.5%) |
D1 | D1 | D3 | 0.008 ↓ (-98.7%) |
D2 | D2 | D1 | 0.582 ↓ (-4.28%) |
D2 | D2 | D2 | 0.776 ↑ (+27.6%) |
D2 | D2 | D3 | 0.111 ↓ (-81.75%) |
D3 | D3 | D1 | 0.004 ↓ (-99.3%) |
D3 | D3 | D2 | 0.274 ↓ (-54.9%) |
D3 | D3 | D3 | 0.408 ↓ (-32.9%) |
Experiment | Train | Test | AP50 |
---|---|---|---|
Baseline | ALL | ALL | 0.620 |
Interpolation | 30° & 90° | 60° | 0.514 ↓ (-17.1%) |
Extrapolation | 30° & 60° | 90° | 0.403 ↓ (-35.0%) |
D1 | D1 | D1 | 0.839 ↑ (+35.3%) |
D1 | D1 | D2 | 0.428 ↓ (-30.9%) |
D1 | D1 | D3 | 0.009 ↓ (-98.5%) |
D2 | D2 | D1 | 0.668 ↑ (+7.74%) |
D2 | D2 | D2 | 0.770 ↑ (+24.2%) |
D2 | D2 | D3 | 0.150 ↓ (-75.8%) |
D3 | D3 | D1 | 0.002 ↓ (-99.7%) |
D3 | D3 | D2 | 0.261 ↓ (-57.9%) |
D3 | D3 | D3 | 0.280 ↓ (-54.8%) |
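The arrows in the tables above report change relative to each model's ALL/ALL baseline. A small helper (illustrative, printing two decimal places where the tables sometimes round to one) reproduces that calculation for the first table's baseline of 0.734:

```python
def rel_change(ap50: float, baseline: float) -> str:
    """Format an AP50 score with its percent change versus a baseline."""
    pct = (ap50 - baseline) / baseline * 100
    arrow = "↑" if pct >= 0 else "↓"
    return f"{ap50:.3f} {arrow} ({pct:+.2f}%)"

baseline = 0.734  # the first detector's ALL/ALL AP50
print(rel_change(0.669, baseline))  # 0.669 ↓ (-8.86%)
print(rel_change(0.914, baseline))  # 0.914 ↑ (+24.52%)
```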
The tables below compare the performance of PersonViT, SeCap, and CLIP-ReID on the DetReIDX-ReID dataset across aerial and ground viewpoints. Metrics include mAP and CMC scores at Rank-1, Rank-5, and Rank-10.
Experiment | mAP | Rank-1 | Rank-5 | Rank-10 |
---|---|---|---|---|
Aerial → Aerial (LongTerm) | 9.9% | 8.8% | 14.4% | 17.6% |
Aerial → Ground | 22.3% | 19.6% | 24.8% | 27.6% |
Ground → Aerial | 23.3% | 51.9% | 59.4% | 63.0% |
Experiment | mAP | Rank-1 | Rank-5 | Rank-10 |
---|---|---|---|---|
Aerial → Aerial (LongTerm) | 11.16% | 8.20% | 13.03% | 16.16% |
Aerial → Ground | 20.49% | 18.08% | 21.50% | 23.43% |
Ground → Aerial | 21.23% | 50.89% | 57.68% | 60.72% |
Experiment | mAP | Rank-1 | Rank-5 | Rank-10 |
---|---|---|---|---|
Aerial → Aerial (LongTerm) | 9.5% | 8.9% | 12.8% | 15.3% |
Aerial → Ground | 22.0% | 19.7% | 24.0% | 26.2% |
Ground → Aerial | 20.8% | 58.1% | 63.1% | 65.2% |
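The mAP and Rank-k CMC numbers above are computed from a query-to-gallery distance matrix over learned embeddings. Below is a minimal sketch of that computation (a simplified protocol without the camera-ID filtering that full ReID evaluation applies); the toy distances and identities are made up for illustration:

```python
import numpy as np

def cmc_map(dist, q_ids, g_ids, topk=(1, 5, 10)):
    """CMC rank accuracies and mAP from a (num_query, num_gallery) distance matrix."""
    order = np.argsort(dist, axis=1)           # gallery indices sorted per query
    matches = g_ids[order] == q_ids[:, None]   # True where the correct identity appears
    cmc = {k: float(np.mean(matches[:, :k].any(axis=1))) for k in topk}
    aps = []                                   # average precision per query
    for row in matches:
        hits = np.flatnonzero(row)             # ranks (0-based) of correct matches
        if hits.size == 0:
            continue
        precision = np.arange(1, hits.size + 1) / (hits + 1)
        aps.append(precision.mean())
    return cmc, float(np.mean(aps))

# toy example: 2 queries, 4 gallery images
dist = np.array([[0.5, 0.1, 0.4, 0.8],
                 [0.7, 0.2, 0.6, 0.3]])
q_ids = np.array([0, 1])
g_ids = np.array([0, 1, 0, 1])
cmc, mean_ap = cmc_map(dist, q_ids, g_ids)
# Rank-1 is 0.5 here: the first query's nearest neighbour has the wrong identity
```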
Figure: Distribution of image resolutions (width × height) captured at different aerial distances. Each dot represents one image. Colors indicate image size relative to the mean area: Small, Medium, and Large. Min and Max images are marked with blue "X" and red "P" respectively.
The DetReIDX dataset is organized by task, with separate partitions for detection, tracking, ReID, and action recognition. Access is strictly limited to researchers from academic or government institutions, and only the requested module will be shared; to access the full dataset, submit a separate access request for each module.
We acknowledge and give credit to the following universities for their contributions (sorted in A–Z order): Istanbul Medipol University, J.N.N College of Engineering, SRM Institute of Science and Technology, Swami Ramanand Teerth Marathwada University, Nanded, Universidade Beira Interior, Universidade de Luanda.