Similar literature
 Found 20 similar articles; search took 46 ms
1.
This paper develops a cooperative federated reinforcement learning (RL) strategy that enables two unmanned aerial vehicles (UAVs) to cooperate in learning and predicting the movements of an intelligent deceptive target in a given search area. The proposed strategy allows the UAVs to cooperate autonomously, by exchanging their gained experience, to maximize the target detection performance and accelerate the learning speed while maintaining privacy. Specifically, we consider a monitoring model that includes a search area, a charging station, two cooperative UAVs, an intelligent deceptive uncertain moving target, and a fake (false) target. Each UAV is equipped with a limited-capacity rechargeable battery and a communication unit for exchanging the gained experience. The problem of maximizing the detection probability of the uncertain deceptive target using cooperative UAVs is mathematically modeled as a search-benefit maximization problem, which is then reformulated as a Markov decision process (MDP) because of the uncertain nature of the problem. Because there is no prior information on the targets’ movement, cooperative RL is used to tackle the problem. The proposed cooperative RL-based algorithm is a distributed collaborative mechanism that enables the two UAVs, i.e., the agents, to interact individually with the operating environment and maximize their cumulative rewards by converging to a shared policy while preserving privacy. Simulation results indicate that a cooperative RL-based dual UAV system can noticeably improve the target detection probability, reduce the detection performance, and accelerate the learning speed.

2.
This study addresses the real-time tracking and precise landing of an unmanned aerial vehicle (UAV) on an unmanned surface vehicle (USV) while the USV is under way. A UAV-USV cooperative tracking and landing control strategy based on nonlinear model predictive control (NMPC) is proposed. First, a heterogeneous UAV-USV multi-agent cooperative system is constructed from the mathematical models of the UAV and the USV. Second, a tracking controller is designed using the NMPC algorithm to ensure that the UAV can track the USV in real time. Finally, a UAV-USV cooperative landing control strategy is proposed that times the landing to the peak of the USV's heave motion, so that the UAV completes a precise landing with minimum impact. Simulation results show that the proposed cooperative tracking and landing control scheme is an effective solution for real-time tracking and accurate landing of a UAV during USV navigation.

3.
In this paper, six-rotor UAVs are applied to distribution, delivering materials to demand points. Because a single six-rotor UAV has a small load capacity, a group of UAVs is used to complete the delivery task in response to actual demand. To stay close to real requirements, trajectory constraints and dynamic obstacles are incorporated into trajectory planning based on the group perception range. To better handle dynamic obstacles and the related constraints, this paper designs a distributed adaptive algorithm based on individual decision-making and group decision-making. Individual decision-making is embodied in the intelligence and adjustment of the UAVs, drawing on actor-critic methods, ideas from the artificial potential field method, and probabilistic finite state machines; group decision-making is embodied in the leadership mechanism and joint decision-making; self-adaptation is embodied in the adaptive adjustment of each UAV's level within the group. To avoid collisions between UAV groups, a conflict resolution algorithm is designed. Simulation analysis shows that the proposed distributed adaptive algorithm not only satisfies all constraints, avoids dynamic obstacles stably, and completes tasks with small fluctuations, but also identifies the UAV with the most successful decision-making in the group. The paper further analyzes and discusses the relevant parameters of the algorithm and obtains the optimal parameter ranges for individual and group decision-making.

4.
This paper solves a data-driven control problem for a flow-based distribution network with two objectives: resource allocation and a fair distribution of costs. These objectives represent both cooperative and competitive directions. A solution is proposed that uses either a centralized or a distributed cooperative game approach based on the Shapley value to determine a proper partitioning of the system and a fair distribution of communication costs. In addition, a decentralized non-cooperative game approach that computes the Nash equilibrium is used to achieve the resource allocation control objective under an incomplete information topology. Furthermore, an invariant-set property is presented and the closed-loop stability of the system is analyzed for the non-cooperative game approach. Another contribution regarding the cooperative game approach is an alternative way to compute the Shapley value for the proposed specific characteristic function. Unlike the classical cooperative-game approach, whose application is limited by combinatorial explosion, the alternative method computes the Shapley value in polynomial time and hence can be applied to large-scale problems.
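
To make the combinatorial cost of the classical Shapley computation concrete, the following is a minimal sketch of the textbook definition, which averages each player's marginal contribution over all n! orderings. It is not the polynomial-time method described in the abstract; the three-player characteristic function `toy_value` and the player names are hypothetical and chosen only for illustration.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    `value` maps a frozenset of players to the worth of that coalition.
    The loop visits n! orderings, which is why this only scales to small n.
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


def toy_value(coalition):
    # Hypothetical game: a coalition is worth 1 only if it contains
    # player "a" and at least one other player.
    return 1.0 if "a" in coalition and len(coalition) >= 2 else 0.0


phi = shapley_values(["a", "b", "c"], toy_value)
print(phi)
```

In this toy game the indispensable player "a" receives 2/3 and the other two players receive 1/6 each, and the shares add up to the worth of the grand coalition.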

5.
In this paper, we investigate optimal denial-of-service attack scheduling in a multi-sensor setting over interference channels. Multiple attackers aim to degrade the performance of remote state estimation under energy constraints. The attack decision of one attacker may be affected by the others, while every attacker seeks its own optimal strategy for degrading estimation performance. Consequently, a Markov decision process and a Markov cooperative game are formulated for two different information scenarios to study the optimal attack strategies of multiple attackers. Because of the computational complexity of the high-dimensional Markov decision process (Markov cooperative game) and the limited information available to attackers, we propose a value iteration adaptive dynamic programming method to approximate the optimal solution. Moreover, the structural properties of the optimal solution are analyzed. In the Markov cooperative game, the optimal joint attack strategy, which admits a Nash equilibrium, is studied. Several numerical simulations are provided to illustrate the feasibility and effectiveness of the main results.
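
As a point of reference for the value iteration idea mentioned above, here is a minimal tabular sketch on a finite MDP. It is plain value iteration, not the adaptive dynamic programming approximation the abstract proposes, and the two-state, two-action model (`P`, `R`, the discount factor `gamma`) is a hypothetical toy rather than the attack-scheduling model.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Tabular value iteration.

    P[a][s][t] is the probability of moving from state s to t under action a.
    R[a][s] is the expected immediate reward for taking action a in state s.
    Returns the value function and a greedy policy (one action index per state).
    """
    n_actions, n_states = len(R), len(R[0])
    V = [0.0] * n_states
    while True:
        # Bellman backup: Q[s][a] = R + gamma * expected next-state value
        Q = [[R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        V_new = [max(q) for q in Q]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new, [q.index(max(q)) for q in Q]
        V = V_new


# Hypothetical toy model: action 0 stays put, action 1 switches state.
# Staying in state 1 pays reward 1; everything else pays 0.
P = [[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay
     [[0.0, 1.0], [1.0, 0.0]]]   # action 1: switch
R = [[0.0, 1.0],                 # action 0 rewards per state
     [0.0, 0.0]]                 # action 1 rewards per state

V, policy = value_iteration(P, R)
print(V, policy)
```

With a discount factor of 0.9 the values converge to roughly 9 and 10, and the greedy policy switches out of state 0 and stays in state 1.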

6.
In this paper, optimized interaction control is investigated for human-multi-robot collaboration problems, which cannot be described by the traditional impedance controller. To achieve globally optimized interaction performance, multi-player non-zero-sum game theory is employed to obtain the optimized interaction control of each robot agent, with the Nash equilibrium adopted as the game strategy. In human-multi-robot collaboration problems, the dynamics parameters of the human arm and the manipulated object are usually unknown. To remove the dependence on these parameters, a multi-player Q-learning method is employed. Moreover, the optimized solution is difficult to obtain because of the desired reference position. A multi-player Nash Q-learning algorithm that accounts for the desired reference position is proposed to deal with this problem. The validity of the proposed method is verified through simulation studies.

7.
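Taking random disturbances into account, this paper uses the HJB equation and dynamic programming to solve for the knowledge-sharing strategies of universities and enterprises under a Nash non-cooperative game and under a collaborative innovation game. The results show that: (1) in both games, the higher the cost of knowledge sharing, the smaller the amount of knowledge shared and the higher the marginal benefit of knowledge sharing; (2) the amount of knowledge shared and the total system payoff are both higher under the collaborative innovation game than under the Nash non-cooperative game, and Pareto optimality is more easily reached, i.e., promoting industry-university-research collaborative innovation helps raise the total system payoff; (3) under cooperation, universities and enterprises set their decision objective as maximizing the overall payoff, so that both the knowledge-sharing effort and the overall payoff exceed those under the Nash non-cooperative game; given effective coordination of knowledge-sharing behavior, the cooperative strategy is the best choice for universities and enterprises to build a collaborative innovation system and promote knowledge sharing within it.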

8.
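Applying the Nash equilibrium of game theory and its cooperative improvement, this paper studies the revenue distribution mechanism for two enterprises of equal standing that cooperate in knowledge innovation, and performs a sensitivity analysis on the innovation coefficient and the innovation cost. The results show that adopting a cooperative strategy not only achieves a Pareto improvement and deters moral hazard, but also consolidates the cooperative relationship among member enterprises more effectively, steering the alliance in a healthy direction.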

9.
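Taking random disturbances into account, this paper builds a stochastic differential game model to study knowledge sharing among the participants in industry-university-research collaborative innovation. Dynamic programming is used to derive the equilibrium knowledge-sharing strategies and innovation subsidy ratios under a Stackelberg leader-follower game and under a collaborative cooperative game, and the equilibrium results of the two games are compared. The comparison shows that: (1) under both games, the cost of knowledge sharing, innovation capability, the marginal benefit of knowledge sharing, and the depreciation rate strongly affect the amount of knowledge the participants share: the amount shared falls as the cost of knowledge sharing and the depreciation rate rise, and increases as knowledge innovation capability and the marginal benefit of sharing rise; (2) under the collaborative cooperative game, the amount of knowledge shared, the total payoff of the knowledge innovation system, and the expectation and variance of the amount of knowledge innovation are all higher than under the Stackelberg leader-follower game.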

10.
盛永祥  孙庆华  吴洁 《软科学》2012,26(8):23-26
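Using a game-theoretic approach, this paper constructs a payoff table for the industry-university-research cooperation game, lets different cooperation organizations choose a mode of communication as their strategy, and, taking the payoff function as the utility function, carries out an equilibrium analysis of coordination across different interactive cooperation networks and across personnel who adopt different modes of communication. The findings are as follows. For any industry-university-research interactive network, strategy profiles in which all participants adopt the same mode of communication are Nash equilibria; if the network is complete, these may be the only Nash equilibria, whereas if the network is incomplete, equilibria with diverse modes of communication may exist. When strategy profiles in which participants inside each network adopt different modes of communication are Nash equilibria, equilibria with diverse modes of communication may exist whether or not the network is complete. The paper then shows that, in the initial stage of cooperation, the prisoner's-dilemma equilibrium in personnel's modes of communication is path-dependent and is hard to escape through internal coordination of communication modes, but a Pareto equilibrium can be reached by coordinating the modes of communication of different personnel within the cooperation organization.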

11.
沙淑欣  于淑俐 《现代情报》2017,37(9):132-137
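From a game-theoretic perspective, this paper treats the library as the central actor in cooperative data services, analyzes its relationships with the various stakeholders, and seeks a Nash equilibrium; it proposes a cooperative game strategy that benefits all parties and builds a safeguard mechanism for delivering data services smoothly and efficiently.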

12.
This article is concerned with the infinite-horizon stochastic cooperative linear-quadratic (LQ) dynamic difference game in both the regular and the indefinite cases. First, owing to the constraints imposed on the weighting matrices and the linearity of the dynamic system, the costs are shown to be automatically convex for the regular stochastic cooperative LQ difference game, which yields the equivalence between minimizing the weighted sum of costs and Pareto optimal control. Second, the Pareto optimal control for the regular game is derived from the solution to the weighted algebraic Riccati equation (WARE) under exact observability, and Pareto solutions are then identified via the optimal feedback gain matrices and the solution to the weighted algebraic Lyapunov equation (WALE). Moreover, a new necessary and sufficient criterion is developed to guarantee that the costs are convex in the indefinite case, and Pareto optimality is investigated based on the solutions to the weighted generalized algebraic Riccati equation (WGARE) and the weighted generalized algebraic Lyapunov equation (WGALE), combined with semidefinite programming (SDP). Finally, a fishery management game from economics is presented to illustrate the results.

13.
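Based on an analysis of the characteristics of AMC non-performing asset auctions, this paper applies game-theoretic reasoning to build a decision game model for the auction-based disposal of AMC non-performing assets. It examines the Nash equilibrium solution of the auction pricing model, discusses how factors such as the actual value of the non-performing assets and the auction disposal costs affect the auction price, and explains the decision mechanism of auction pricing from a game perspective, providing theoretical guidance for AMCs disposing of non-performing assets by auction in practice.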

14.
冯庆华  陈菊红  刘通 《软科学》2012,26(12):112-116
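By building a game model, this paper solves for the cooperation payoff and the leadership payoff of the core enterprise in a supply chain, and on that basis proposes eight cooperation and leadership strategies for the core enterprise when it establishes cooperative and leadership relationships with upstream and downstream enterprises. By comparing payoffs, it gives the core enterprise's optimal choice among the eight strategies under different cooperation costs and management costs. Finally, a numerical analysis verifies the validity of the results.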

15.
We consider remote state estimation under an active eavesdropper in a cyber-physical system. A smart sensor transmits its local state estimates to a remote estimator over an unreliable network, which is eavesdropped by an adversary. The intelligent adversary can work in a passive eavesdropping mode or an active jamming mode. The active jamming mode enables the adversary to interfere with the data transmission from the sensor to the estimator while improving its own data reception. To protect the transmitted data from being wiretapped, the sensor, which has two antennas, injects noise into the eavesdropping link at different power levels. Each player aims to minimize its own estimation error covariance and power cost while maximizing the estimation error covariance of its opponent, so a two-player nonzero-sum game is constructed for the sensor and the active eavesdropper. For the open-loop case, the mixed Nash equilibrium is obtained by solving a one-stage nonzero-sum game. For the long-term case, a Markov stochastic game is introduced and a Nash Q-learning method is given to find the Nash equilibrium strategies of the two players. Numerical results are provided to show the effectiveness of the theoretical conclusions.
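
For readers unfamiliar with mixed Nash equilibria in one-stage nonzero-sum games, the sketch below solves the simplest case, a 2x2 bimatrix game, through the standard indifference conditions: each player mixes so that the opponent is indifferent between its two pure strategies. The payoff matrices `A` and `B` are hypothetical and are not taken from the sensor-eavesdropper model of the abstract.

```python
def mixed_nash_2x2(A, B):
    """Fully mixed Nash equilibrium of a 2x2 bimatrix game, if one exists.

    A[i][j] is the row player's payoff and B[i][j] the column player's payoff
    when row i and column j are played. Returns (p, q), where p is the
    probability that the row player picks row 0 and q the probability that
    the column player picks column 0, or None if no interior solution exists.
    """
    # q must make the row player indifferent between row 0 and row 1.
    denom_q = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    # p must make the column player indifferent between column 0 and column 1.
    denom_p = B[0][0] - B[0][1] - B[1][0] + B[1][1]
    if denom_q == 0 or denom_p == 0:
        return None
    q = (A[1][1] - A[0][1]) / denom_q
    p = (B[1][1] - B[1][0]) / denom_p
    if not (0 < p < 1 and 0 < q < 1):
        return None
    return p, q


# Hypothetical payoffs with no pure-strategy equilibrium.
A = [[2, 0],
     [0, 1]]
B = [[0, 1],
     [3, 0]]

p, q = mixed_nash_2x2(A, B)
print(p, q)
```

For these payoffs the row player picks row 0 with probability 3/4 and the column player picks column 0 with probability 1/3. Games with more than two actions per player generally need a dedicated solver rather than this closed form.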

16.
17.
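Based on a supply chain manufacturer-retailer game model with all-unit quantity discounts and marketing expenditure, this paper analyzes the pricing, demand, output, and payoffs of the manufacturer and the retailer under cooperative and non-cooperative games. The non-cooperative game is based on the Stackelberg model and covers the manufacturer-led and retailer-led cases, while the cooperative game is based on Pareto improvement. The results show that under the cooperative game both the selling price and the marketing expenditure are lower than under the non-cooperative game, and demand is higher than under the non-cooperative game.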

18.
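Enterprise-supplier project relationship management based on a dynamic game model   Total citations: 2, self-citations: 0, citations by others: 2
By defining the boundary conditions of the game, this paper constructs a three-stage Stackelberg dynamic game model of project relationship management. Analysis of the game's subgame-perfect Nash equilibrium indicates that, in the non-cooperative game between an enterprise and its project supplier, whether the supplier performs the contract and how fully it is carried out depend on how the enterprise guides the long-term cooperative relationship between the two parties. If the enterprise strengthens relationship management, the supplier is motivated to perform according to the contract schedule and quantities the enterprise requires, so that both parties reach their optimal strategy choice, cooperative competition, and the enterprise ultimately achieves its project objectives and even its strategic objectives.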

19.
This paper tackles a two-player differential game affected by matched uncertainties in which only the output measurement is available to each player. We propose a state estimate for each player based on the so-called algebraic hierarchical observer and design the Nash equilibrium strategies from that estimate. In addition, an output integral sliding mode term (also based on the estimation process) is used to robustify the Nash strategies of both players, which ensures compensation of the matched uncertainties. A simulation example on a magnetic levitator shows the feasibility of this approach.

20.
王克强  刘红梅  黄智俊 《软科学》2006,20(5):106-108,112
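Using game theory, this paper examines from the perspective of a duopoly market the cooperative and non-cooperative behavior of water supply enterprises in technological innovation for water-saving irrigation facilities, and analyzes the micro-level game mechanism of such innovation under this market structure. By comparing the expected profits of water supply enterprises from this innovation under different assumptions, it derives the Nash equilibria for the different cases.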
