Journal of Transportation Systems Engineering and Information Technology ›› 2024, Vol. 24 ›› Issue (2): 96-104. DOI: 10.16097/j.cnki.1009-6744.2024.02.010

• Intelligent Transportation Systems and Information Technology •

Traffic Signal Control with Deep Reinforcement Learning and Self-attention Mechanism

ZHANG Xijun*, NIE Shengyuan, LI Zhe, ZHANG Hong

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
  • Received: 2023-05-22 Revised: 2023-12-08 Accepted: 2024-01-29 Online: 2024-04-25 Published: 2024-04-25
  • About the author: ZHANG Xijun (1980- ), male, from Lintao, Gansu Province, China; Associate Professor.
  • Supported by:
    National Natural Science Foundation of China (62162040); Key Program of the Natural Science Foundation of Gansu Province, China (22JR5RA226); Gansu Province Higher Education Innovation Fund Project (2021A-028).

Abstract: Traffic signal control (TSC) remains one of the most important research topics in the transportation field. In existing TSC methods based on deep reinforcement learning (DRL), the traffic state must be designed manually, which makes it difficult to extract traffic state information and prevents the state from being represented comprehensively. To mine latent traffic state information from limited features and thereby reduce the difficulty of state design, this paper proposes a DRL algorithm that incorporates a self-attention network. First, only the positions of vehicles on each entry lane of the intersection are collected, and a vehicle position distribution matrix is built through non-uniform quantization and one-hot encoding. The self-attention network is then used to mine the spatial correlation and latent information of the vehicle position distribution matrix, which serves as the input to the DRL algorithm. Finally, an adaptive traffic signal control strategy is learned at a single intersection, and the adaptability and robustness of the proposed algorithm are verified in a multi-intersection road network. Simulation results show that, in a single-intersection environment, the proposed algorithm outperforms three benchmark algorithms on the average vehicle waiting time and other indicators, and it retains good adaptability in the multi-intersection road network.
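As a concrete illustration of the state representation described in the abstract, the sketch below shows one way per-lane vehicle positions could be turned into a position distribution matrix via non-uniform quantization and one-hot style occupancy encoding. The cell boundaries, number of entry lanes, and function names are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

# Illustrative assumption: distances (m) from the stop line are quantized into
# non-uniform cells -- fine resolution near the intersection, coarse farther away.
CELL_EDGES = np.array([0, 7, 14, 21, 28, 40, 60, 90, 130, 180, 240, 300])
N_CELLS = len(CELL_EDGES) - 1


def encode_lane(vehicle_distances, n_cells=N_CELLS, edges=CELL_EDGES):
    """One-hot style occupancy vector for a single entry lane.

    vehicle_distances: 1-D array of vehicle distances (m) from the stop line.
    Returns a binary vector of length n_cells: 1 if the cell holds at least one vehicle.
    """
    occupancy = np.zeros(n_cells, dtype=np.float32)
    if len(vehicle_distances) == 0:
        return occupancy
    # np.digitize maps each distance to its (non-uniform) cell index.
    cells = np.digitize(vehicle_distances, edges) - 1
    cells = cells[(cells >= 0) & (cells < n_cells)]  # drop vehicles beyond the last edge
    occupancy[cells] = 1.0
    return occupancy


def build_state_matrix(lanes):
    """Stack per-lane occupancy vectors into the vehicle position distribution matrix.

    lanes: list of arrays, one per entry lane of the intersection.
    Returns an (n_lanes, N_CELLS) matrix used as the DRL state.
    """
    return np.stack([encode_lane(d) for d in lanes])


# Example: a hypothetical intersection with 8 entry lanes and random traffic.
rng = np.random.default_rng(0)
lanes = [rng.uniform(0, 300, size=rng.integers(0, 10)) for _ in range(8)]
state = build_state_matrix(lanes)
print(state.shape)  # (8, 11)
```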

Key words: intelligent transportation, adaptive control, deep reinforcement learning, self-attention network, proximal policy optimization
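Likewise, the following is a minimal sketch of how a self-attention network over the lane occupancy matrix might feed an actor-critic head trained with proximal policy optimization (PPO). The layer sizes, number of signal phases, and module names are assumptions for illustration and do not reproduce the paper's architecture.

```python
import torch
import torch.nn as nn


class SelfAttentionPolicy(nn.Module):
    """Self-attention over entry-lane occupancy vectors, followed by PPO actor-critic heads.

    Assumed shapes: the state is (batch, n_lanes, n_cells); the action is one of
    n_phases signal phases. All sizes are illustrative.
    """

    def __init__(self, n_cells=11, n_lanes=8, n_phases=4, d_model=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(n_cells, d_model)            # per-lane embedding
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.actor = nn.Linear(d_model, n_phases)           # phase logits
        self.critic = nn.Linear(d_model, 1)                 # state value for PPO

    def forward(self, state):
        x = self.embed(state)                               # (B, n_lanes, d_model)
        attn_out, _ = self.attn(x, x, x)                    # lanes attend to each other
        x = self.norm(x + attn_out)                         # residual + layer norm
        pooled = x.mean(dim=1)                              # aggregate over lanes
        return self.actor(pooled), self.critic(pooled)


# Usage: sample a phase from the policy; the log-probability and value would feed
# the clipped PPO surrogate objective during training.
policy = SelfAttentionPolicy()
state = torch.rand(1, 8, 11)                                # batch of one state matrix
logits, value = policy(state)
dist = torch.distributions.Categorical(logits=logits)
phase = dist.sample()
print(phase.item(), dist.log_prob(phase).item(), value.item())
```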
