Publication: Robustifying Neural Adaptive Bitrate Algorithms Against Noisy and Adversarial Network Conditions
Abstract
Adaptive bitrate (ABR) algorithms for video streaming aim to maximize user Quality of Experience (QoE) by adjusting video quality to network conditions. Pensieve is a pioneering ABR algorithm that uses deep reinforcement learning (RL) to outperform conventional approaches [1]. However, like other deep RL policies, Pensieve may be vulnerable to adversarial perturbations in its state observations. In this paper, we investigate the robustness of Pensieve under adversarial state perturbations and propose adversarial training to harden it. We consider an adversary that injects small bounded errors into Pensieve's input state (e.g., throughput history and buffer level), with the goal of inducing rebuffering events (playback stalls) that severely degrade QoE. We develop two attack methods: one based on Bayesian Optimization (BO) to find worst-case perturbations in a black-box manner, and another based on Projected Gradient Descent (PGD) as a white-box attack using Pensieve's policy network gradients. We then adversarially train Pensieve against these attacks to produce robust models.
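For intuition, the white-box PGD attack on the policy's input state can be sketched as below. This is a minimal illustration under stated assumptions, not Pensieve's actual implementation: `policy_grad` (the gradient of some adversarial loss with respect to the state), the step size `alpha`, and the per-feature relative bound `eps` are all hypothetical names and choices.

```python
import numpy as np

def pgd_attack(policy_grad, state, eps=0.05, alpha=0.01, steps=10):
    """Sketch of an L-infinity PGD attack on an ABR policy's input state.

    policy_grad(s): assumed to return the gradient of an adversarial loss
        (e.g., the probability of a QoE-harming action) w.r.t. state s.
    eps: maximum relative perturbation per feature (e.g., 5% of its value).
    """
    s0 = np.asarray(state, dtype=float)
    s = s0.copy()
    bound = eps * np.abs(s0)  # per-feature bound: eps fraction of the value
    for _ in range(steps):
        g = policy_grad(s)
        # ascend the adversarial loss with a relative step per feature
        s = s + alpha * np.abs(s0) * np.sign(g)
        # project back into the eps-ball around the clean state
        s = np.clip(s, s0 - bound, s0 + bound)
    return s
```

With a positive gradient everywhere, each feature is pushed to the upper edge of its bound, illustrating how the projection keeps the perturbation within the stated percentage of each feature's value.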
We present a comprehensive evaluation using the standard Pensieve simulation environment (with the Mahimahi network trace emulator [2]) to compare the BO and PGD adversaries and the resulting robust policies. Our results show that even small input perturbations (within a maximum norm of 5–10% of feature values) can greatly increase rebuffering time for the original Pensieve (by 5–10×). The BO-based adversary is highly effective, finding perturbations that increase rebuffering by up to 20% more than PGD-based attacks, albeit with more attack queries. Adversarial training with either attack significantly improves Pensieve's robustness: after training, rebuffering induced by attacks drops by 60–70%. The BO-adversarially-trained model is the most robust, with only minor QoE degradation in benign (no-attack) scenarios. We discuss the efficiency trade-offs between BO and PGD (BO requires fewer iterations but more simulation time per attack, while PGD is faster per attack but somewhat less optimal), and show that adversarially trained Pensieve generalizes well to unforeseen perturbations. This work demonstrates that adversarial training can substantially bolster the reliability of RL-based streaming algorithms against both malicious attacks and noise-like disturbances, paving the way for safer deployment in real-world networks.
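The black-box BO adversary can be approximated by a minimal Gaussian-process loop over a one-dimensional attack parameter. Everything here is an illustrative assumption rather than the paper's implementation: the RBF kernel, the UCB acquisition rule, and the objective `f` (standing in for simulated rebuffering time as a function of a perturbation parameter) are hypothetical stand-ins.

```python
import numpy as np

def bayes_opt(f, lo, hi, n_init=3, n_iter=10, beta=2.0, seed=0):
    """Minimal 1-D Bayesian Optimization (GP posterior + UCB acquisition)
    for black-box maximization of an expensive objective f on [lo, hi],
    e.g., the perturbation magnitude that maximizes simulated rebuffering."""
    rng = np.random.default_rng(seed)
    X = list(rng.uniform(lo, hi, n_init))   # random initial queries
    Y = [f(x) for x in X]
    grid = np.linspace(lo, hi, 200)         # candidate points for acquisition

    def kern(a, b, ls=0.2 * (hi - lo)):
        # squared-exponential (RBF) kernel with a fixed lengthscale
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

    for _ in range(n_iter):
        Xa, Ya = np.array(X), np.array(Y)
        K = kern(Xa, Xa) + 1e-6 * np.eye(len(Xa))   # jitter for stability
        Ks = kern(grid, Xa)
        Kinv = np.linalg.inv(K)
        mu = Ks @ Kinv @ Ya                          # posterior mean on grid
        var = 1.0 - np.sum((Ks @ Kinv) * Ks, axis=1)  # posterior variance
        ucb = mu + beta * np.sqrt(np.maximum(var, 0.0))
        x_next = grid[np.argmax(ucb)]                # query the most promising point
        X.append(x_next)
        Y.append(f(x_next))

    best = int(np.argmax(Y))
    return X[best], Y[best]
```

Each iteration spends one expensive black-box evaluation (here, one simulated streaming session), which mirrors the trade-off noted above: BO needs few queries but each query costs a full simulation, whereas PGD takes many cheap gradient steps.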