Efficient Multimodal Feature Refinement via Adaptive RGB-IR Interaction for Robust Drone Detection and Classification
Abstract
The rapid proliferation of unmanned aerial vehicles (UAVs) has intensified the need for robust surveillance systems capable of distinguishing drones from biological entities such as birds in unpredictable environments. While multispectral vision provides a resilient alternative to unimodal sensors under adverse weather and lighting, existing architectures often struggle with cross-modal feature alignment and noise-induced spatial distortions. This paper proposes the Multispectral Attention Context and Receptive-field Network (MACR-Net), an ultra-lightweight multimodal framework designed for high-precision drone detection. MACR-Net introduces a Global-Local Cross-Scale Interaction (GLCI) module to capture multi-scale semantic context and a Multimodal Spatial Cross-Perception (MSCP) mechanism to adaptively fuse RGB-IR streams while preserving target-specific thermal and structural signatures. Furthermore, we design an improved hybrid neck integrating Coordinate-Aware Attention (CAA) and Receptive Field Deformable (RFD) modules to anchor precise spatial coordinates and mitigate geometric distortions. Experimental results on the benchmark Multimodal Drone Detection Dataset demonstrate that MACR-Net outperforms state-of-the-art models, achieving 91.13% mAP_50 and 65.77% mAP_50-95. Remarkably, the architecture maintains a compact footprint of only 2.77M parameters and 0.77 GFLOPs, striking an optimal balance between detection robustness and real-time feasibility for resource-constrained edge deployment.
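To illustrate the kind of adaptive RGB-IR fusion the abstract describes, the following is a minimal sketch, not the authors' MSCP implementation: a per-pixel gate derived from both streams forms a convex combination of the two feature maps, so whichever modality responds more strongly at a location contributes more to the fused feature. The function name and the channel-mean gating rule are illustrative assumptions.

```python
import numpy as np

def adaptive_rgb_ir_fusion(f_rgb, f_ir):
    """Toy pixel-wise adaptive fusion of RGB and IR feature maps.

    Both inputs have shape (C, H, W). A sigmoid gate computed from
    the two streams decides each modality's per-pixel contribution.
    (Illustrative stand-in for learned cross-perception weights.)
    """
    # Summarize each modality per spatial location via channel mean.
    s_rgb = f_rgb.mean(axis=0)  # (H, W)
    s_ir = f_ir.mean(axis=0)    # (H, W)
    # Gate > 0.5 where the RGB response dominates the IR response.
    gate = 1.0 / (1.0 + np.exp(-(s_rgb - s_ir)))  # (H, W)
    # Convex combination; gate broadcasts over the channel axis.
    return gate * f_rgb + (1.0 - gate) * f_ir

rng = np.random.default_rng(0)
f_rgb = rng.standard_normal((8, 4, 4))
f_ir = rng.standard_normal((8, 4, 4))
fused = adaptive_rgb_ir_fusion(f_rgb, f_ir)
```

Because the gate lies in (0, 1), every fused value stays between the corresponding RGB and IR values, which is one way to preserve target-specific thermal and structural signatures rather than averaging them away.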
DOI: http://dx.doi.org/10.21553/rev-jec.451
Copyright (c) 2026 REV Journal on Electronics and Communications
ISSN: 1859-378X