
Bilingual Recommendations: Iterative Learning

Motivated by the exposure characteristics of scanning lithography systems (wafer scanners), a segmented iterative learning control (ILC) method is proposed. The method inherits the thorough error learning of non-causal iterative learning laws. To improve dynamic tracking performance, non-causal learning of the previous iteration's error is applied during the acceleration phase, which guarantees fast convergence along the iteration axis. To overcome the blind-learning weakness of non-causal learning laws, no non-causal learning is performed during the constant-velocity exposure phase, so that the exposure performance does not deteriorate and the transient performance along the time axis improves. The convergence of the method is analyzed and proved, and its effectiveness is verified with a numerical example.
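
The segmented update law can be sketched in a few lines. The following is a minimal illustration, not the paper's lithography model: it assumes a toy one-sample-delay discrete plant, a triangular velocity profile, and made-up gains; `lead` realizes the non-causal (phase-lead) learning, and `accel_mask` confines it to the acceleration segment.

```python
import numpy as np

def segmented_ilc_update(u, e, accel_mask, gamma=0.5, lead=1):
    """One ILC trial update: non-causal (phase-lead) learning on the
    acceleration segment only; no update on the constant-velocity segment."""
    e_lead = np.roll(e, -lead)           # e_k(t + lead): non-causal shift
    e_lead[-lead:] = 0.0                 # no information beyond the horizon
    return u + gamma * accel_mask * e_lead

# Toy scan: velocity ramps up for 40 samples, then holds (exposure segment).
N = 100
t = np.arange(N)
r = np.minimum(t, 40) / 40.0             # reference stage velocity profile
accel_mask = (t < 40).astype(float)      # learn only while accelerating
u = np.zeros(N)
for k in range(30):
    y = 0.8 * np.concatenate(([0.0], u[:-1]))   # toy plant: one-sample delay
    e = r - y
    u = segmented_ilc_update(u, e, accel_mask)  # lead matches the delay
print("RMS error on the acceleration segment:",
      np.sqrt(np.mean((e * accel_mask) ** 2)))
```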


Reinforcement learning (RL) learns a mapping from environment states to actions that maximizes a reward signal. Three families of RL methods maximize the cumulative reward: value iteration, policy iteration, and policy search. This paper reviews the principles and algorithms of reinforcement learning, studies discrete-space value iteration both with and without an environment model, and applies the algorithm to gridworld problems with fixed and random starting points. Experimental results show that, compared with policy iteration, the algorithm converges faster and achieves better accuracy.
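
As a concrete instance of the discrete-space value iteration the abstract refers to, here is a minimal model-based sweep on a small gridworld; the 4x4 grid, -1 step cost, and discount factor are illustrative choices, not the paper's exact setup.

```python
import numpy as np

N, gamma = 4, 0.9                        # grid side and discount factor
GOAL = (N - 1, N - 1)
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def step(s, a):
    """Deterministic move with walls; -1 per step, 0 on reaching the goal."""
    nr = min(max(s[0] + a[0], 0), N - 1)
    nc = min(max(s[1] + a[1], 0), N - 1)
    return (nr, nc), (0.0 if (nr, nc) == GOAL else -1.0)

V = np.zeros((N, N))
for sweep in range(100):                 # value-iteration sweeps
    for r in range(N):
        for c in range(N):
            if (r, c) == GOAL:
                continue                 # terminal state keeps V = 0
            V[r, c] = max(rew + gamma * V[ns]
                          for ns, rew in (step((r, c), a) for a in actions))
print(np.round(V, 2))
```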


This paper addresses iterative learning control for systems with a fixed shift between the iteration initial state and the desired initial state. A feedback-aided PD-type learning algorithm is proposed, and convergence analysis shows that it ensures asymptotic tracking of the desired trajectory with respect to time as the iteration number tends to infinity. Furthermore, to achieve complete tracking of the desired trajectory over a pre-specified finite interval, learning algorithms with initial rectifying action and with terminal attraction are proposed, which eliminate the effect of the fixed initial shift regardless of its value. The limit trajectories of the proposed algorithms are given, and the convergence analysis yields sufficient conditions that can be used to determine the learning gains. Numerical results demonstrate the effectiveness of the proposed algorithms.
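
A minimal sketch of the feedback-aided PD-type idea, on an assumed scalar plant x' = -x + u with made-up gains: the feedback term acts on the current trial's error online, while the PD learning update is applied between trials. With a fixed initial shift, the learned limit satisfies kp*e + kd*e' = 0, so the error decays exponentially along the time axis, which is the asymptotic-tracking behavior described above.

```python
import numpy as np

dt, N, trials = 0.01, 200, 40
t = dt * np.arange(N)
yd = np.sin(2 * np.pi * t)              # desired trajectory, yd(0) = 0
x0_shift = 0.3                          # fixed initial shift x(0) - yd(0)
kp, kd, kf = 0.8, 0.5, 2.0              # PD learning gains, feedback gain

u_ff = np.zeros(N)                      # feedforward input, updated per trial
for k in range(trials):
    x, e = x0_shift, np.zeros(N)
    for i in range(N):
        e[i] = yd[i] - x
        u = u_ff[i] + kf * e[i]         # feedback-aided term acts online
        x = x + dt * (-x + u)           # Euler step of x' = -x + u
    de = np.gradient(e, dt)             # error derivative for the D term
    u_ff = u_ff + kp * e + kd * de      # PD-type learning between trials
# the limit error decays like e(0) * exp(-(kp/kd) * t)
print("max |e| on the second half of the interval:", np.abs(e[N // 2:]).max())
```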


To deal with the initial-state problem in iterative learning control (ILC), a method that arranges a transition process using a finite-time tracking differentiator is proposed. Exploiting the fact that the desired trajectory in ILC is known in advance, a finite-time tracking differentiator is designed whose parameters have clear physical meaning and are easy to tune. On this basis, for the ILC problem of a class of uncertain nonlinear time-varying systems, an ILC algorithm with an estimator for the uncertain terms is presented, and the corresponding theorems are proved with a Lyapunov-like method. Simulation results show that the proposed method is effective.
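
One common concrete choice for such a differentiator is Han's discrete finite-time tracking differentiator built on the time-optimal synthesis function fhan, shown below arranging a smooth transition from the initial state toward a step target. This is an illustrative construction with made-up parameters, not necessarily the paper's exact differentiator.

```python
import numpy as np

def fhan(x1, x2, r, h):
    """Time-optimal synthesis function of the discrete tracking differentiator."""
    d, d0 = r * h, r * h * h
    y = x1 + h * x2
    a0 = np.sqrt(d * d + 8 * r * abs(y))
    a = x2 + np.sign(y) * (a0 - d) / 2 if abs(y) > d0 else x2 + y / h
    return -r * np.sign(a) if abs(a) > d else -r * a / d

h, r = 0.01, 10.0                        # step size; r sets the transition speed
x1, x2 = 0.0, 0.0                        # x1 tracks the target, x2 its derivative
v = 1.0                                  # step target to be smoothed
for _ in range(300):
    x1, x2 = x1 + h * x2, x2 + h * fhan(x1 - v, x2, r, h)
print("arranged state and derivative:", x1, x2)
```
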
Batch processes play an important role in manufacturing, and many control methods have been proposed for them, iterative learning control (ILC) being one. ILC requires a reasonable model, and data-driven modeling methods are currently receiving much attention. Because batch processes are usually complex nonlinear processes, the process data exhibit nonlinear correlation. To eliminate this nonlinear correlation, this paper models the batch process with kernel principal component regression (KPCR), building a model between the control variables and the end-point quality. Based on this model, an ILC algorithm that computes the control policy is derived by linearizing the KPCR model around the nominal trajectories and minimizing a quadratic objective function on the end-point quality. To overcome the detrimental effects of process variations and disturbances, the KPCR model is updated between batches by removing the oldest data from the training set and adding the newest. Because the gain matrix in the iterative learning law reflects the gradient information of the process, it can cause ILC to converge prematurely or drift away from the actual operating conditions; to obtain better convergence, the learning gain matrix can be weighted. Applied to a simulated batch polymerization process, the weighted ILC shows good control performance and adapts to process variations and disturbances, and it outperforms ILC based on a linear principal component regression model. Weighted ILC based on the KPCR model is therefore an effective control method for batch processes.
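
The modeling step (not the full ILC loop) can be sketched as kernel PCA with an RBF kernel followed by least squares from the leading kernel components to the end-point quality. The data, kernel width, and number of retained components below are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (50, 3))          # batch control profiles (features)
y = np.sin(X[:, 0]) + X[:, 1] ** 2       # end-point quality (synthetic)

def rbf(A, B, s=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s * s))

K = rbf(X, X)
n = len(X)
J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
Kc = J @ K @ J                           # centered kernel matrix
w, V = np.linalg.eigh(Kc)                # eigen-decomposition (ascending)
idx = np.argsort(w)[::-1][:5]            # keep the 5 leading components
T = Kc @ V[:, idx] / np.sqrt(w[idx])     # scores of the training batches
beta, *_ = np.linalg.lstsq(T, y - y.mean(), rcond=None)

def predict(Xnew):
    k = rbf(Xnew, X)
    kc = (k - np.ones((len(Xnew), n)) @ K / n) @ J   # center the test kernel
    return (kc @ V[:, idx] / np.sqrt(w[idx])) @ beta + y.mean()

print("fit residual:", np.abs(predict(X) - y).max())
```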


For a class of nonlinear systems, an open-closed-loop PD-type iterative learning algorithm with initial state learning is proposed, together with a sufficient condition for convergence. Based on this condition, the learning gains of the initial-state learning law and the input learning law can be determined without knowledge of the system structure or parameters, which relaxes the requirement on initial positioning. Initial state learning allows a positioning error between the actual and desired initial states at the start of each iteration, and allows the initial state to be set arbitrarily within the range permitted by the convergence condition. Using a contraction mapping argument, it is proved that, starting from an arbitrary initial state, the actual output fully tracks the desired trajectory after several iterations. Finally, a simulation example verifies the effectiveness and feasibility of the proposed algorithm.
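
A minimal sketch of the open-closed-loop PD-type law with initial state learning, on an assumed scalar plant x' = -x + u with illustrative gains: the memorized input is corrected with a PD term on the previous trial's error (open loop) and a PD term on the current trial's error (closed loop), while the initial state is itself updated from the previous trial's initial error.

```python
import numpy as np

dt, N, trials = 0.01, 200, 30
t = dt * np.arange(N)
yd = np.sin(2 * np.pi * t) + 1.0         # desired trajectory, yd(0) = 1
kp_o, kd_o = 0.6, 0.3                    # open-loop gains (previous trial)
kp_c, kd_c = 0.6, 0.3                    # closed-loop gains (current trial)
L0 = 0.8                                 # initial-state learning gain

u = np.zeros(N)                          # memorized (applied) input
e_prev, de_prev = np.zeros(N), np.zeros(N)
x0 = 0.0                                 # arbitrary initial state, not yd(0)
for k in range(trials):
    ff = u + kp_o * e_prev + kd_o * de_prev       # open-loop PD part
    x, e, applied = x0, np.zeros(N), np.zeros(N)
    for i in range(N):
        e[i] = yd[i] - x
        de_i = (e[i] - e[i - 1]) / dt if i > 0 else 0.0
        applied[i] = ff[i] + kp_c * e[i] + kd_c * de_i  # closed-loop PD part
        x = x + dt * (-x + applied[i])   # toy plant x' = -x + u
    u, e_prev, de_prev = applied, e, np.gradient(e, dt)
    x0 = x0 + L0 * e[0]                  # initial-state learning
print("initial-state error:", abs(yd[0] - x0), "max |e|:", np.abs(e).max())
```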


For a class of linear discrete-time systems, the effect of data dropout on iterative learning control (ILC) is analyzed. Using the lifting technique, conditions for both asymptotic and monotonic convergence of the tracking error are given, and the relationship between convergence speed and data dropout rate is analyzed; the results show that convergence slows as the dropout rate increases. To attenuate iteration-varying disturbances in ILC systems with data dropout, a robust iterative learning controller design is proposed, in which the design problem is cast as the feasibility of a set of linear matrix inequalities (LMIs) solvable by existing numerical techniques. Simulation examples validate the theoretical results and the effectiveness of the robust ILC scheme.
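
In the lifted description, one trial is the matrix-vector product y = P u with P lower-triangular Toeplitz, and data dropout can be modeled as a Bernoulli mask on the learned error samples. The toy impulse response, learning gain, and dropout rates below are illustrative; the run shows convergence slowing as the dropout rate grows, as the analysis predicts.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
h = 0.3 ** np.arange(N)                  # impulse response of a toy plant
P = np.array([[h[i - j] if i >= j else 0.0 for j in range(N)]
              for i in range(N)])        # lifted (lower-triangular) plant
r = np.sin(np.linspace(0, 2 * np.pi, N))
gamma = 0.5

for rate in (0.0, 0.3, 0.6):             # data dropout rates
    u = np.zeros(N)
    for k in range(30):
        e = r - P @ u                     # one lifted trial: y = P u
        mask = rng.random(N) >= rate      # True = sample received
        u = u + gamma * mask * e          # learn only on received samples
    print(f"dropout {rate:.0%}: ||e|| after 30 trials = {np.linalg.norm(e):.2e}")
```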


For the trajectory-tracking control problem of mobile robots, this paper proposes a fuzzy open-closed-loop PID-P type nonlinear discrete iterative learning control method based on the robot's kinematic model. A convergence condition for the PID-P type iterative learning scheme is given and proved, and fuzzy control principles are used to tune the parameters of the three PID learning gain matrices. The method improves the robot's ability to repeatedly track a specified trajectory and is simple to implement. Simulation results show that the fuzzy open-closed-loop PID-P type iterative learning algorithm is feasible and effective for trajectory tracking.
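
To keep the sketch short, the following replaces the paper's fuzzy PID-P law with a plain D-type (Arimoto-style) learning update on unicycle kinematics: feedforward corrections to the nominal (v, w) commands are learned across trials from body-frame errors, and a repeating actuator bias plays the role of the trial-invariant disturbance that the learning removes. Gains, bias values, and the circular reference are assumptions; the fuzzy tuning of the gain matrices is omitted.

```python
import numpy as np

dt, N, trials = 0.02, 314, 30             # one lap of the unit circle per trial
tt = dt * np.arange(N)
xd, yd, thd = np.sin(tt), 1.0 - np.cos(tt), tt   # reference: v_d = w_d = 1
g1, g2 = 0.6, 0.6                          # learning gains (illustrative)
bias_v, bias_w = -0.2, 0.15                # unknown repeating actuator bias

dv, dw = np.zeros(N), np.zeros(N)          # learned feedforward corrections
for k in range(trials):
    x = y = th = 0.0
    ex, eth = np.zeros(N), np.zeros(N)
    for i in range(N):
        # body-frame longitudinal error and heading error
        ex[i] = np.cos(th) * (xd[i] - x) + np.sin(th) * (yd[i] - y)
        eth[i] = thd[i] - th
        v = 1.0 + dv[i] + bias_v           # nominal v_d plus learned term
        w = 1.0 + dw[i] + bias_w           # nominal w_d plus learned term
        x += dt * v * np.cos(th)           # unicycle kinematics, Euler step
        y += dt * v * np.sin(th)
        th += dt * w
    dv += g1 * np.gradient(ex, dt)         # D-type learning updates
    dw += g2 * np.gradient(eth, dt)
print("last-trial max errors:", np.abs(ex).max(), np.abs(eth).max())
```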


To reproduce the gust characteristics required for wind-environment simulation in an environmental simulation laboratory, an iterative learning self-tuning PID controller is designed and developed. The controller optimizes the setpoint sequence by iterative learning, while the inner loop uses a self-tuning PID control strategy. Simulation and experimental results show that the combination of the two control methods improves the dynamic response to setpoint changes and achieves effective control of the desired wind parameters. Under iterative learning control, and especially for the simulation of sinusoidal wind, the actual value tracks the setpoint over time, which guarantees the control accuracy of the environment simulation.
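
The two-layer structure can be sketched as an inner PID loop tracking a setpoint sequence and an outer iterative-learning update that reshapes that sequence toward the desired sinusoidal wind profile between runs. The first-order "wind tunnel" model, PID gains, learning gain, and the assumption that every run starts from the same steady operating point are illustrative stand-ins.

```python
import numpy as np

dt, N, runs = 0.05, 400, 50
t = dt * np.arange(N)
wind_d = 5.0 + 2.0 * np.sin(0.5 * np.pi * t)   # desired sinusoidal wind speed
Kp, Ki, Kd = 2.0, 1.0, 0.1                     # inner-loop PID gains
L = 1.0                                        # learning gain on the setpoints

sp = wind_d.copy()                             # initial setpoints = reference
for run in range(runs):
    x, integ, e_prev = wind_d[0], 0.0, 0.0     # each run starts at steady wind
    y = np.zeros(N)
    for i in range(N):
        y[i] = x
        e = sp[i] - x                          # inner PID tracks the setpoint
        integ += e * dt
        u = Kp * e + Ki * integ + Kd * (e - e_prev) / dt
        e_prev = e
        x += dt * (-0.5 * x + 0.5 * u)         # toy first-order wind model
    sp[:-1] += L * (wind_d - y)[1:]            # one-step-ahead learning update
print("max |wind_d - y| after", runs, "runs:", np.abs(wind_d - y).max())
```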


An optimal control method based on Euler reinforcement learning (ERL) is proposed for a class of uncertain nonlinear dynamic systems. The method uses the reinforcement learning algorithm to estimate the unknown nonlinear functions of the plant, and gives online learning rules for the reward function and the policy-iteration function. By discretizing the temporal-difference (TD) error with the forward Euler difference formula, the value function is estimated and the control policy is improved. Based on the value-function gradient and the TD-error performance index, the steps of the algorithm and an error-estimation theorem are given. Simulation results on the mountain-car problem demonstrate the effectiveness of the proposed method.
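
As a simple stand-in for the paper's Euler-RL scheme, the sketch below runs tabular Q-learning on the classic mountain-car task, where the dynamics are stepped with forward Euler and the one-step TD error is the discrete-time counterpart of a continuous-time TD error. Grid resolution and hyperparameters are arbitrary choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)
nx = nv = 40                              # grid over position and velocity
Q = np.zeros((nx, nv, 3))                 # actions: push left, coast, right
alpha, gamma, eps = 0.1, 0.99, 0.1

def disc(x, v):
    """Map the continuous state onto the tabular grid."""
    i = int((x + 1.2) / 1.8 * (nx - 1))
    j = int((v + 0.07) / 0.14 * (nv - 1))
    return min(max(i, 0), nx - 1), min(max(j, 0), nv - 1)

for ep in range(2000):
    x, v = -0.5, 0.0
    for step in range(2000):
        s = disc(x, v)
        a = rng.integers(3) if rng.random() < eps else int(np.argmax(Q[s]))
        # forward-Euler step of the mountain-car dynamics
        v = np.clip(v + 0.001 * (a - 1) - 0.0025 * np.cos(3 * x), -0.07, 0.07)
        x = np.clip(x + v, -1.2, 0.6)
        if x <= -1.2:
            v = 0.0                       # inelastic left wall
        done = x >= 0.5
        target = -1.0 + (0.0 if done else gamma * Q[disc(x, v)].max())
        Q[s][a] += alpha * (target - Q[s][a])   # one-step TD-error update
        if done:
            break
print("steps to goal in the last episode:", step + 1)
```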
