Similar literature (20 results)
1.
This paper investigates a Q-learning scheme for the optimal consensus control of discrete-time multiagent systems. The Q-learning algorithm is a reinforcement learning (RL) method that uses system data instead of knowledge of the system dynamics. In the multiagent system, the agents interact with each other and at least one agent can communicate directly with the leader; this communication topology is described by an algebraic graph. The objective is to make all agents synchronize with the leader and to make the performance indices reach a Nash equilibrium. On the one hand, the optimal consensus control for multiagent systems is obtained by solving the coupled Hamilton–Jacobi–Bellman (HJB) equations, but analytical solutions of the discrete-time HJB equations are difficult to obtain directly. On the other hand, accurate mathematical models of most real-world systems are hard to obtain. To overcome these difficulties, a Q-learning algorithm is developed that uses system data rather than an accurate system model. We formulate the performance index and the corresponding Bellman equation of each agent i, and then derive the Bellman equation for the Q-function. Policy iteration is adopted to compute the optimal control iteratively, and the least-squares (LS) method is employed to implement the iteration. A stability analysis of the proposed policy-iteration Q-learning algorithm for multiagent systems is given. Two simulation examples verify the effectiveness of the proposed scheme.
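For orientation, the Q-function Bellman equation and the two policy-iteration steps named in the abstract commonly take the following form in the single-agent case. This is the standard textbook form and not necessarily the paper's exact multiagent formulation; the symbols $x_k$ (state), $u_k$ (control), $r$ (stage cost) and $h_j$ (policy at iteration $j$) are assumptions introduced here.

```latex
% Policy evaluation: Q-function Bellman equation for a fixed policy h_j
Q^{h_j}(x_k, u_k) = r(x_k, u_k) + Q^{h_j}\bigl(x_{k+1}, h_j(x_{k+1})\bigr)

% Policy improvement: greedy update with respect to the evaluated Q-function
h_{j+1}(x_k) = \arg\min_{u} \; Q^{h_j}(x_k, u)
```

Because the evaluation step involves only measured tuples $(x_k, u_k, x_{k+1})$, it can be fitted from data, for example by least squares over a parameterized $Q$, without a model of the dynamics.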

2.
This paper discusses linear-quadratic, infinite-time, nonzero-sum closed-loop Nash games for systems with fast and slow modes. An example shows that the usual order-reduction process based on singular perturbation analysis leads to an ill-posed reduced-order problem. A modification of the performance indices is presented that yields a well-posed problem when the usual order-reduction method is used. Finally, a hierarchical reduction procedure is proposed that leads to well-posed fast and slow game problems even when the performance indices are not modified.

3.
We consider remote state estimation in a cyber-physical system in the presence of an active eavesdropper. A smart sensor transmits its local state estimates to a remote estimator over an unreliable network that is eavesdropped on by an adversary. The intelligent adversary can operate in a passive eavesdropping mode or an active jamming mode. The active jamming mode lets the adversary interfere with the data transmission from sensor to estimator while improving its own data reception. To protect the transmitted data from being wiretapped, the sensor, which has two antennas, injects noise into the eavesdropping link at different power levels. Each player aims to minimize its own estimation error covariance and power cost while maximizing the estimation error covariance of its opponent, so a two-player nonzero-sum game is constructed between the sensor and the active eavesdropper. In the open-loop case, the mixed Nash equilibrium is obtained by solving a one-stage nonzero-sum game. For the long-term case, a Markov stochastic game is introduced and a Nash Q-learning method is given to find the Nash equilibrium strategies of the two players. Numerical results are provided to support the theoretical conclusions.
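To make the idea of a mixed Nash equilibrium in a one-stage nonzero-sum game concrete, here is a minimal sketch for a generic 2x2 bimatrix game, solved through the indifference conditions. The function name `mixed_nash_2x2` and the payoff numbers are hypothetical illustrations, not the sensor/eavesdropper payoffs from the paper, and the sketch assumes integer payoffs and a fully mixed equilibrium.

```python
from fractions import Fraction


def mixed_nash_2x2(A, B):
    """Fully mixed Nash equilibrium of a 2x2 bimatrix game (illustrative helper).

    A[i][j] is the row player's payoff and B[i][j] the column player's payoff
    when row i and column j are played. Payoffs are assumed to be integers.
    Returns (p, q): p = P(row player picks row 0), q = P(column player picks column 0).
    """
    # q makes the row player indifferent between its two rows.
    denom_q = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    # p makes the column player indifferent between its two columns.
    denom_p = B[0][0] - B[1][0] - B[0][1] + B[1][1]
    if denom_q == 0 or denom_p == 0:
        raise ValueError("indifference conditions are degenerate")
    q = Fraction(A[1][1] - A[0][1], denom_q)
    p = Fraction(B[1][1] - B[1][0], denom_p)
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("no fully mixed equilibrium for these payoffs")
    return p, q


# Hypothetical payoffs chosen only for illustration.
A = [[3, 0], [1, 2]]  # row player
B = [[1, 2], [3, 0]]  # column player
p, q = mixed_nash_2x2(A, B)
print(p, q)  # 3/4 1/2
```

At the returned probabilities each player's two pure strategies give the same expected payoff, so neither can gain by deviating unilaterally.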

4.
This paper solves a data-driven control problem for a flow-based distribution network with two objectives: resource allocation and a fair distribution of costs. These objectives involve both cooperation and competition. The proposed solution uses a cooperative game approach, either centralized or distributed, based on the Shapley value to determine a proper partitioning of the system and a fair distribution of communication costs. In addition, a decentralized non-cooperative game approach that computes the Nash equilibrium is used to achieve the resource-allocation control objective under an incomplete-information topology. Furthermore, an invariant-set property is presented and closed-loop stability is analyzed for the non-cooperative game approach. Another contribution regarding the cooperative game approach is an alternative way to compute the Shapley value for the proposed characteristic function. Unlike the classical cooperative-game approach, whose applicability is limited by combinatorial explosion, the alternative method computes the Shapley value in polynomial time and can therefore be applied to large-scale problems.
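As a point of reference for the Shapley value mentioned in the abstract, the sketch below computes it by the classical definition, averaging marginal contributions over all player orderings. This is the brute-force approach whose cost grows factorially with the number of players, not the paper's polynomial-time method; the function name `shapley_values` and the three-player characteristic function are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, v):
    """Shapley value by direct enumeration of all orderings (factorial cost).

    players: iterable of hashable player labels.
    v: characteristic function mapping a frozenset of players to a number.
    """
    players = list(players)
    totals = {p: Fraction(0) for p in players}
    count = 0
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            grown = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += v(grown) - v(coalition)
            coalition = grown
        count += 1
    return {p: totals[p] / count for p in players}


# Hypothetical characteristic function for three players.
worth = {
    frozenset(): 0,
    frozenset("A"): 0, frozenset("B"): 0, frozenset("C"): 0,
    frozenset("AB"): 90, frozenset("AC"): 80, frozenset("BC"): 70,
    frozenset("ABC"): 120,
}
result = shapley_values("ABC", worth.__getitem__)
print({p: int(x) for p, x in result.items()})  # {'A': 45, 'B': 40, 'C': 35}
```

The three values sum to the worth of the grand coalition (120), which is the efficiency property of the Shapley value.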

5.
Substructure transportation is an important phase of on-orbit assembly. This paper investigates the design of a control approach for multiple transporting agents attached to one substructure, so that the attitude of the substructure is tracked and stabilized during transportation. A finite-time fuzzy game control method is developed to solve this problem. Within the framework of differential games, a finite-time nonlinear game is formulated from the individual performance index functions of the agents and the attitude dynamics of the combination formed by the transporting agents and the substructure, which reflects the cooperation and coordination between agents. To achieve finite-time convergence, which better suits engineering requirements, a speed function is introduced to transform the finite-time game into an infinite-time game. Considering the limited computational ability of the agents, a Takagi-Sugeno (T-S) fuzzy model is incorporated to express the nonlinear game as a weighted average of multiple linear games, whose Nash equilibria are easy to obtain. Numerical simulations validate the efficiency of the proposed method for attitude control and show that it requires less computation and achieves better dynamic and steady-state performance than existing methods.

6.
This paper develops a new dual ML-ADHDP method to solve the optimal consensus problem (OCP) for a class of heterogeneous discrete-time nonlinear multi-agent systems (MASs) with unknown dynamics and time delay. A hierarchical and distributed control strategy transforms the original problem into nonlinear model reference adaptive control (MRAC) problems and an OCP for virtual linear MASs. For the nonlinear MRAC problems, a new multi-layer action-dependent heuristic dynamic programming (ML-ADHDP) method is developed to overcome the unknown dynamics and neural network estimation errors, which yields higher control accuracy. To solve the OCP for the virtual linear MASs and improve the convergence speed, a new multi-layer performance index is proposed. The ML-ADHDP method is then used to solve the coupled Hamilton–Jacobi–Bellman equation and obtain the optimal virtual control. Theoretical analysis proves that the original MASs can achieve Nash equilibrium, and simulation results show that the developed dual ML-ADHDP method improves the convergence speed and control accuracy of the original MASs.

7.
This paper addresses a finite-time rendezvous problem for a group of unmanned aerial vehicles (UAVs) in the absence of a leader or a reference trajectory. When the UAVs do not cooperate, they are assumed to use Nash equilibrium strategies (NES). When the UAVs can communicate among themselves, however, they can implement cooperative game-theoretic strategies for mutual benefit. In a convex linear quadratic differential game (LQDG), a Pareto-optimal solution (POS) is obtained when the UAVs jointly minimize a team cost functional constructed as a convex combination of the individual cost functionals. This paper proposes an algorithm to determine the weights of the convex combination corresponding to the Pareto-optimal Nash Bargaining Solution (NBS), which offers each UAV a lower cost than the NES. Conditions on the cost functions under which the proposed algorithm converges to the NBS are presented. A UAV programmed to choose its strategy at a given time based on cost-to-go estimates for the rest of the game may switch to the NES if it finds this more beneficial than continuing with the cooperative strategy it previously agreed upon with the other UAVs. For such scenarios, a renegotiation method is proposed that uses the proposed algorithm to obtain the NBS corresponding to the state of the game at an intermediate time. This renegotiation method helps establish cooperation between UAVs and prevents non-cooperative behaviour. In this context, conditions for the time consistency of a cooperative solution are derived for the LQDG. The efficacy of the guidance law derived from the proposed algorithm is illustrated through simulations.

8.
This paper deals with the robust position control problem for a three-degree-of-freedom (3DOF) laboratory helicopter. The 3DOF helicopter is a nonlinear, uncertain multiple-input multiple-output (MIMO) system whose three degrees of freedom are the elevation, pitch, and travel angles. The proposed robust controller is hierarchical, consisting of an attitude controller and a position controller. The position controller generates the desired pitch-angle reference from the tracking error of the travel angle, while the attitude controller tracks the references for the pitch and elevation angles. It is proven that the tracking errors of the three angles ultimately converge into given neighborhoods. Experimental results on the laboratory helicopter demonstrate the effectiveness of the proposed hierarchical control strategy.

9.
In this paper, a novel tracking control scheme for continuous-time nonlinear affine systems with actuator faults is proposed, based on a policy iteration (PI) adaptive control algorithm. From the controlled system and the desired reference trajectory, a novel augmented tracking system is constructed, and the tracking control problem is converted into the stabilization of the corresponding error dynamics. The PI algorithm, widely used in optimal control and intelligent techniques, is an important reinforcement learning method; here it solves for the performance function, which satisfies the Lyapunov equation, through critic neural network (NN) approximation. For the augmented tracking error system with actuator faults, an online PI-based fault-tolerant control law is proposed, in which a new tuning law for the adaptive parameter is designed to tolerate four common kinds of actuator faults. The stability of the tracking error dynamics with actuator faults is guaranteed by Lyapunov theory, and the tracking errors are uniformly bounded as the adaptive parameters converge. Finally, the designed fault-tolerant feedback control algorithm is applied in two cases to track the desired reference trajectory, and the simulation results demonstrate the effectiveness and applicability of the proposed method.

10.
A large class of visual servo controllers relies on a reference image obtained a priori, captured at the desired position and orientation (i.e., pose) of a camera, to yield control signals that regulate the camera from its current pose to the desired pose. In many applications, accessibility and the economics of the operation may prohibit acquisition of such a reference image. This paper introduces a new visual servo control paradigm that enables control of the camera in the absence of a reference image by using a set of terminal constraints. Specifically, the desired pose is encoded using the angle of obliquity of the optical axis with respect to the object plane and its direction of arrival at the plane. A constrained convex optimization problem is formulated over a conic section defined by the terminal constraints to yield an error system for the control problem. Subsequently, this work introduces continuous terminal sliding mode visual servo controllers to regulate the camera to the desired pose. Lyapunov-based stability analysis guarantees that the origin is a finite-time-stable equilibrium of the system. Numerical simulation results are provided to verify the performance of the proposed visual servo controller.

11.
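Game-theoretic analysis of the externalization of employees' tacit knowledge in enterprises   (Total citations: 1; self-citations: 0; citations by others: 1)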
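A basic prerequisite for a modern enterprise to build an internal knowledge base is the externalization of its employees' tacit knowledge, a process that involves the prisoner's dilemma of a public good being provided privately. This paper builds a static game model of complete information for this problem and discusses four Nash equilibrium solutions for risk-averse decision makers. It further examines the supply of tacit knowledge under Nash equilibrium from the individual employee's perspective and the Pareto-optimal supply of tacit knowledge from the perspective of the enterprise's overall interest, and compares individual contributions in the two states. Finally, a set of corresponding managerial implications is drawn from the analysis.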

12.
In this paper, a subspace predictive control (SPC) method with a novel data-driven event-triggered law is proposed for linear time-invariant systems with unknown model parameters. Building on the conventional SPC method, the event-triggered law replaces the typical receding-horizon optimization, which reduces the computational load of the traditional SPC method. The key parameters of the event-triggered law are derived by Q-learning from system data, and the designed event-triggered law ensures input-to-state stability of the system. Comparative simulation results illustrate the effectiveness and merits of the proposed method.

13.
In this paper, a numerical method is presented for solving nonlinear optimal control problems with terminal state constraints, control inequality constraints, and simple bounds on the state variables. The method converts the optimal control problem into a sequence of quadratic programming problems. To this end, the quasilinearization method is used to replace the nonlinear optimal control problem with a sequence of constrained linear-quadratic optimal control problems, and each state variable is then approximated by a finite-length Chebyshev series with unknown parameters. The method gives the data of the quadratic programming problem explicitly (the Hessian, the gradient of the cost function, and the Jacobian of the constraints). To show the effectiveness of the proposed method, simulation results for two constrained nonlinear optimal control problems are presented.
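To illustrate what a finite-length Chebyshev series parameterization of a state variable looks like, the sketch below evaluates such a series with the Clenshaw recurrence. It only shows how a coefficient vector defines a trajectory on [-1, 1]; it does not reproduce the paper's quasilinearization or quadratic programming steps, and the function name `chebyshev_series` and the coefficients are hypothetical.

```python
def chebyshev_series(coeffs, x):
    """Evaluate sum_k coeffs[k] * T_k(x) using the Clenshaw recurrence.

    coeffs: sequence of series coefficients (the "unknown parameters" that an
            optimizer would choose); x: evaluation point in [-1, 1].
    """
    if not coeffs:
        return 0.0
    b1 = b2 = 0.0
    # Run the recurrence b_k = a_k + 2*x*b_{k+1} - b_{k+2} from the highest order down to k = 1.
    for a in reversed(coeffs[1:]):
        b1, b2 = a + 2.0 * x * b1 - b2, b1
    return coeffs[0] + x * b1 - b2


# Hypothetical coefficients: x(t) = 1*T0(t) + 2*T1(t) + 3*T2(t).
print(chebyshev_series([1, 2, 3], 0.5))  # 0.5
```

Because the series is linear in its coefficients, linear-quadratic subproblems stay quadratic when the state is substituted by such an expansion, which is consistent with the abstract's reduction to quadratic programs.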

14.
In practice, many controlled plants are MIMO non-affine nonlinear systems. Existing methods for tracking control of time-varying nonlinear systems mostly target systems with special structures, or rely on neural networks, which are unsuitable for real-time control because of their computational complexity. A new approach to real-time tracking control of time-varying nonlinear systems is therefore needed. In this paper, a control scheme based on the multi-dimensional Taylor network (MTN) is proposed to achieve real-time output-feedback tracking control, with online training, of multi-input multi-output (MIMO) non-affine nonlinear time-varying discrete systems with respect to given reference signals. A set of ideal output signals is selected according to the given reference signals, the optimal control laws of the system with respect to these ideal output signals are obtained by the minimum principle, and the corresponding optimal outputs are taken as the desired output signals. Then, the MTN controller (MTNC) is generated automatically to fit the optimal control laws, and the conjugate gradient (CG) method is employed to train the network parameters offline to obtain the initial parameters of the MTNC for online learning. To address the time-varying characteristics of the system, the back-propagation (BP) algorithm is used to adjust the weight parameters of the MTNC for real-time output tracking of the given reference signals, and a sufficient condition for the stability of the system is identified. Simulation results show that the proposed control scheme is effective and that the actual output of the system tracks the given reference signals satisfactorily.

15.
In this paper, the appointed-time prescribed performance and finite-time tracking control problem is investigated for a quadrotor unmanned aerial vehicle (QUAV) in the presence of a time-varying load, unknown external disturbances, and unknown system parameters. For the position loop, a novel appointed-time prescribed performance control (ATPPC) strategy is proposed, based on the adaptive dynamic surface control (DSC) framework and a new prescribed performance function, to achieve appointed-time convergence and prescribed transient and steady-state performance. For the attitude loop, a new finite-time control strategy is proposed, based on a newly designed sliding mode control technique, to track the desired attitude in finite time. Assumptions that system parameters are known are removed. Finally, the stability of the closed-loop system is proved via Lyapunov theory. Simulations show the effectiveness and superiority of the proposed control scheme.

16.
This paper presents an integrated and practical control strategy for the leader–follower quadcopter formation flight control problem. Specifically, the control strategy is designed for the follower quadcopter to keep the specified formation shape and avoid obstacles during flight. The proposed control scheme uses a hierarchical approach consisting of a model predictive controller (MPC) in the upper layer and a robust feedback linearization controller in the bottom layer. The MPC generates an optimized collision-free state reference trajectory that satisfies all relevant constraints and is robust to input disturbances, while the robust feedback linearization controller tracks the optimal state reference and suppresses tracking errors during the MPC update interval. In the top-layer MPC, two modifications, a control input hold and a variable prediction horizon, are combined to allow practical online implementation of formation flight. Furthermore, the existing MPC obstacle avoidance scheme is extended to account for small obstacles that are not known a priori. The whole system is proved to be stable, computationally feasible, and able to reach the desired formation configuration in finite time. Formation flight experiments are set up in a Vicon motion-capture environment, and the flight results demonstrate the effectiveness of the proposed formation flight architecture.

17.
This paper addresses the challenging problem of decentralized adaptive control for a class of coupled hidden leader-follower multi-agent systems, in which each agent is described by a nonlinearly parameterized uncertain model in discrete time and interacts with its neighbors through their history information. One of the agents is a leader, which knows the desired reference trajectory, while the other agents cannot receive the desired reference signal or are unaware of the existence of the leader. To handle unknown internal parameters and unknown high-frequency gains, a projection-type parameter estimation algorithm is proposed. Based on the certainty equivalence principle and neighborhood history information, a decentralized adaptive controller is designed under which the boundedness of the identification error is guaranteed with the help of Lyapunov theory. Under some conditions, it is shown that the multi-agent system eventually achieves synchronization in the presence of strong couplings. Finally, a simulation example is given to support the results of the proposed scheme.

18.
朱国军  许长新 《科研管理》2012,33(12):117-125
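Patent-pledge financing for small and medium-sized enterprises (SMEs) can effectively mobilize intangible assets and ease SMEs' funding pressure. This paper studies the decision on the pledge rate, the core risk-control indicator in banks' patent-pledge financing. Based on game theory, it argues that under a fully market-based mode the pledge rate is essentially the outcome of a game between the bank and the enterprise, and that this outcome conforms to the Stackelberg leadership model. Along two dimensions, the assessment of the enterprise's default risk and the degree of risk borne by the bank, solving the game model is transformed into solving for a Nash equilibrium; methods such as value-at-risk (VaR) measurement are used to determine the bank's optimal pledge rate and to explore the conditions under which the Nash equilibrium is reached. The study provides an important theoretical reference for banks' pledge-rate decisions in patent-pledge financing.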

19.
姚树俊  陈菊红 《软科学》2013,27(2):55-61
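To address price competition for product services in servitized supply chains, an initial model of a side-payment contract is proposed. Building on the Bertrand game, the side-payment contract is used to construct a price coordination mechanism for product services between two service integrators in a duopoly market. A coordination path for product-service prices is proposed, and the application of the side-payment contract mechanism to price coordination is examined in three different scenarios. The results show that, compared with the non-cooperative strategy, the side-payment contract increases the profit of each of the two service integrators; the profit allocation mechanism built on the Nash arbitration scheme and the Shapley value distributes the increase in total profit fairly and reasonably, ensuring stable cooperation between the two parties. Finally, a numerical example from an enterprise verifies the effectiveness of the side-payment contract for coordinating product-service prices.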

20.
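Taking a manufacturer and a supplier as the objects of study, this paper uses game theory to compare the profits from cooperative innovation when the two parties are at each of three game equilibria (Nash equilibrium, Stackelberg equilibrium, and cooperative equilibrium). It proves that the cooperative game is Pareto-superior and uses the Shapley value method and the Nash bargaining method to solve the problem of allocating profit after cooperation. The results show that the motivation for the manufacturer and the supplier to innovate cooperatively stems from the marginal revenue of their own products, and that cooperation between them can achieve a Pareto improvement, which provides an important theoretical reference for supply-chain firms seeking to increase their own profits and those of the whole supply chain.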
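As a small illustration of the Nash bargaining method for dividing profit after cooperation, the sketch below handles the simplest case: two parties with transferable profit, where maximizing the product of gains over the disagreement point gives each party its fallback profit plus half of the surplus. The function name `nash_bargaining_split` and the profit figures are hypothetical and are not taken from the paper.

```python
def nash_bargaining_split(total, d1, d2):
    """Two-party Nash bargaining split of a transferable total profit.

    total: joint profit under cooperation; d1, d2: each party's profit if
    negotiation fails (the disagreement point). Maximizing (x1 - d1) * (x2 - d2)
    subject to x1 + x2 = total gives each party half of the surplus.
    """
    surplus = total - d1 - d2
    if surplus < 0:
        raise ValueError("cooperation yields less than the disagreement profits")
    return d1 + surplus / 2, d2 + surplus / 2


# Hypothetical numbers: manufacturer and supplier earn 30 and 50 without
# cooperation and 100 jointly with cooperative innovation.
manufacturer, supplier = nash_bargaining_split(100, 30, 50)
print(manufacturer, supplier)  # 40.0 60.0
```

In this two-party transferable-profit setting the Shapley value gives the same split, since each party's average marginal contribution is also its fallback profit plus half of the surplus.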
