DRL-SecRoute: A Synergetic Deep Reinforcement Learning Paradigm for Mitigating Byzantine Faults and SSDF Attacks through Heuristic Spectrum Cognizance in Next-Generation Cognitive Radio Networks

Cognitive radio wireless sensor networks (CR-WSNs) are particularly susceptible to 
routing vulnerabilities arising from dynamic spectrum availability and sophisticated
adversarial attacks, emphasizing the need for secure and efficient routing mechanisms.
Current solutions address routing optimization, spectrum management, and
security as independent tasks, leading to suboptimal performance and susceptibility
to Byzantine jamming, spectrum sensing data falsification (SSDF), and primary
user emulation attacks (PUEA). This article introduces DRL-SecRoute (deep
reinforcement learning-based secure routing), a unified secure routing framework
that synergistically integrates deep reinforcement learning with adaptive spectrum 
sensing to address the multi-dimensional optimization problem of secure routing 
in dynamic CR-WSN environments. The key contributions are fourfold: (1) a
twin-delayed deep deterministic policy gradient (TD3) algorithm enhanced with
prioritized experience replay (PER), specifically designed for continuous state-action
spaces in CR-WSNs, achieving 40% faster convergence than existing discrete-action 
methods; (2) a hybrid adaptive spectrum sensing mechanism combining Bayesian 
inference with Kalman filtering that reduces sensing overhead by 37.4% while
maintaining high prediction accuracy; (3) a lightweight multi-layered anomaly detection
system integrating statistical divergence analysis with unsupervised learning
(Isolation Forest) to detect diverse attacks with 91.7% accuracy and a false positive
rate below 6%; and (4) a multi-objective optimization framework that jointly
optimizes routing latency, energy consumption, spectrum efficiency, and security risk
through the unified DRL approach. Extensive simulation experiments across varying
network densities (50–200 nodes), traffic loads, spectrum activity levels, and
six attack scenarios (including an adaptive reinforcement learning-based adversary)
demonstrate that DRL-SecRoute achieves a packet delivery ratio (PDR) of up to
97.8% under collaborative attack conditions and sustains 84.2% PDR even against
the adaptive adversary, improves energy efficiency
by 32.7% over existing protocols, maintains a spectrum access collision rate below
3.5% across all primary user activity levels, and extends network lifetime by 41.3%.
These results confirm that DRL-SecRoute offers a reliable and scalable foundation 
for next-generation CR-WSN deployments.
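
To make the fourth contribution concrete, the following is a minimal sketch of how a
single scalar reward could combine the four objectives named above through a weighted
sum. The weighted-sum form, the equal default weights, the [0, 1] normalization, and
the names RewardWeights and routing_reward are illustrative assumptions; the abstract
does not specify the actual reward used by DRL-SecRoute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical trade-off weights; equal values are an assumption."""
    latency: float = 0.25
    energy: float = 0.25
    spectrum: float = 0.25
    security: float = 0.25


def clip01(x: float) -> float:
    """Clamp a metric to [0, 1] (assumes metrics are pre-normalized)."""
    return max(0.0, min(1.0, x))


def routing_reward(latency: float, energy: float, spectrum_efficiency: float,
                   security_risk: float,
                   w: RewardWeights = RewardWeights()) -> float:
    """Weighted-sum scalarization of the four routing objectives.

    Latency, energy, and security risk are costs (lower is better);
    spectrum efficiency is a gain (higher is better).
    """
    cost = (w.latency * clip01(latency)
            + w.energy * clip01(energy)
            + w.security * clip01(security_risk))
    gain = w.spectrum * clip01(spectrum_efficiency)
    return gain - cost


# A fast, low-energy, low-risk route should score above a slow, risky one.
good_route = routing_reward(0.1, 0.2, 0.9, 0.05)
bad_route = routing_reward(0.8, 0.7, 0.3, 0.6)
```

Under this assumed form, a continuous-action agent such as TD3 would receive one such
value per routing decision, and shifting the weights changes how latency, energy,
spectrum efficiency, and security risk are traded against one another.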