Signal Phase and Timing Optimization Method for Intersection Based on Hybrid Proximal Policy Optimization

doi:10.16097/j.cnki.1009-6744.2023.01.012

Journal of Transportation Systems Engineering and Information Technology, 2023, Vol. 23, Issue (1): 106-113. DOI: 10.16097/j.cnki.1009-6744.2023.01.012

• Intelligent Transportation Systems and Information Technology •

CHEN Xi-qun (陈喜群)^*a, ZHU Yi-zhang (朱奕璋)^b, LV Chao-feng (吕朝锋)^c

a. Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture; b. Polytechnic Institute & Institute of Intelligent Transportation Systems; c. College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China

Received: 2022-08-10 Revised: 2022-12-08 Accepted: 2022-12-21 Online: 2023-02-25 Published: 2023-02-16
About the author: CHEN Xi-qun (1986- ), male, from Heilongjiang, professor, Ph.D.
Supported by:
National Natural Science Foundation of China (72171210); Key Program of the Natural Science Foundation of Zhejiang Province, China (LZ23E080002)

Abstract

Traffic signal control optimization is an important supply-side measure for alleviating urban traffic congestion. With the development of traffic big data technology, signal control based on deep reinforcement learning has become a key research direction. Most existing control frameworks perform discrete phase selection, in which the duration of a phase is obtained by accumulating decision intervals; this may conflict with the agent's exploration for better actions. This paper therefore proposes a signal phase and timing optimization method for intersections based on Hybrid Proximal Policy Optimization (HPPO). First, the signal control action is defined as a parameterized action, subject to the boundary constraints that phase durations must satisfy in practical applications. Traffic flow state information is then extracted and fed into a bi-policy network, which adaptively generates the next phase and its duration. The reward is evaluated from the change in traffic state after the action is executed, so that the agent learns the intrinsic connection between a phase and its duration. A simulation platform is built, and the proposed method is tested and compared with other algorithms using real traffic flow data as input. The results show that, compared with discrete control, the proposed method achieves a lower decision frequency and better control performance: the average vehicle travel time and the average lane queue length are reduced by 27.65% and 23.65%, respectively.

Key words: intelligent transportation, hybrid action space, deep reinforcement learning, hybrid proximal policy optimization, agent design
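
The parameterized action described in the abstract (a discrete next phase paired with a bounded continuous duration) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The number of phases, the duration bounds, the tanh squashing, and every function name below are assumptions chosen for illustration, and the hard-coded lists stand in for the outputs of the two policy heads.

```python
import math
import random

# Hypothetical settings, not taken from the paper.
NUM_PHASES = 4        # assumed number of signal phases
MIN_GREEN = 10.0      # assumed lower bound on phase duration (s)
MAX_GREEN = 60.0      # assumed upper bound on phase duration (s)


def softmax(logits):
    """Convert raw phase scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bounded_duration(raw):
    """Map an unbounded value into [MIN_GREEN, MAX_GREEN] (assumed squashing)."""
    unit = 0.5 * (math.tanh(raw) + 1.0)  # squash to [0, 1]
    return MIN_GREEN + unit * (MAX_GREEN - MIN_GREEN)


def sample_action(phase_logits, duration_means, duration_std, rng):
    """Sample a parameterized action: (phase index, phase duration in s)."""
    probs = softmax(phase_logits)
    # Discrete part: choose the next phase from a categorical distribution.
    phase = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Continuous part: draw the duration parameter tied to the chosen phase.
    raw = rng.gauss(duration_means[phase], duration_std)
    return phase, bounded_duration(raw)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins for the outputs of the discrete and continuous policy heads.
    phase_logits = [0.2, 1.5, -0.3, 0.1]
    duration_means = [0.0, 0.8, -0.5, 0.3]
    phase, duration = sample_action(phase_logits, duration_means, 0.2, rng)
    print(f"next phase: {phase}, duration: {duration:.1f} s")
```

Because each decision fixes a whole phase duration, an agent of this kind acts once per phase instead of once per fixed decision interval, which is consistent with the lower decision frequency reported in the abstract.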

CLC number: U491

CHEN Xi-qun, ZHU Yi-zhang, LV Chao-feng. Signal Phase and Timing Optimization Method for Intersection Based on Hybrid Proximal Policy Optimization[J]. Journal of Transportation Systems Engineering and Information Technology, 2023, 23(1): 106-113.


Link to this article: http://www.tseit.org.cn/CN/10.16097/j.cnki.1009-6744.2023.01.012

http://www.tseit.org.cn/CN/Y2023/V23/I1/106

