Optimal and suboptimal algorithms in set membership identification

Mathematical and Computer Modelling of Dynamical Systems, Vol. 11, No. 2, June 2005, 159–169

Optimal and Suboptimal Algorithms in Set Membership Identification

BOLESŁAW KACEWICZ*
Department of Applied Mathematics, AGH University of Science and Technology, Al. Mickiewicza 30, A3/A4, 301, 30-059 Cracow, Poland.
*Email: kacewicz@uci.agh.edu.pl

We discuss in this paper optimality properties of identification algorithms in a set membership framework. We deal with restricted-complexity (conditional) identification, where approximations (models) to a possibly complex system are selected from a low-dimensional space. We discuss the worst- and average-case settings. In the worst-case setting, we present results on optimality, or suboptimality, of algorithms based on computing the unconditional or conditional Chebyshev centres of an uncertainty set. In the average-case setting, we show that the optimal algorithm is given by the projection of the unconditional Chebyshev centre. We give explicit formulas for its average errors, which allow us to see the contribution of all problem parameters to the minimal error. We discuss the case of weighted average errors corresponding to non-uniform distributions over uncertainty sets, and show how the weights influence the minimal identification error.

Keywords: Identification, worst-case setting, average-case setting, weights, optimal algorithm, suboptimal algorithm.

1. Introduction

The idea behind the set membership approach is to use a priori knowledge about the problem being solved, together with the measured (or computed, or observed) information about unknown quantities, to define a feasible set of elements consistent with the available information. A number of results is devoted to the characterization of feasible sets in system identification; see [1–13] and the references therein. In many cases, such as in restricted-complexity estimation, the desired estimate is required to be a member of a set of given simple structure. In this case estimation is referred to as conditional [6].

In this paper, we present results obtained within a set membership framework in two settings: the worst-case and the average-case setting. In the worst-case setting, the error of an algorithm is measured by the worst performance over problems consistent with given information, see e.g. [1–6], [9–11]. A conservative point of view is thus chosen in the worst-case analysis: exceptionally difficult elements, even if they appear rarely, are taken into account. The optimal algorithm in this setting is defined by the conditional Chebyshev centre of the feasible set. The conditional Chebyshev centre is in general difficult to compute, even for sets of a simple structure [3]. For this reason, we turn our attention to suboptimal, but easy-to-compute, estimators. To this end, we discuss, based on [2], the reliability level of the projection algorithm.

Compared to the worst-case approach, the average-case setting is less conservative. The error of an algorithm is now defined on the average, which excludes the influence of outliers [7]. The optimal algorithm turns out to be the projection algorithm, whose suboptimality has been established in the worst-case setting.
We provide formulas for its two local average errors, which explicitly exhibit the contribution of factors such as the information, the measurement errors, and the restrictions imposed on the models. In addition to the uniform distribution over uncertainty sets considered in [7], we discuss, based on [8], weighted average errors for a class of weight functions. Results are given that explain the contribution of the weights to the minimal errors: the weights enter through certain integral parameters that can be computed, or estimated, for any given selection of the weights. An example is given illustrating what the estimators and the errors look like for a particular identification problem.

2. Problem formulation

We wish to identify a system $x$ belonging to the system space $X = \mathbb{R}^n$. We look for a model (an approximation) to $x$, having at our disposal information about $x$ given by a vector of measurements $y$ which belongs to the information space $Y = \mathbb{R}^m$, with $m \ge n$. The spaces $X$ and $Y$ are equipped with the norms $\|x\|_X = \|x\|$ and $\|y\|_Y = \|Ay\|$, where $A$ is a non-singular $m \times m$ matrix and $\|\cdot\|$ denotes the Euclidean norm in $X$ or $Y$, $\|c\| = \sqrt{\sum_i c_i^2}$ for any vector $c$ with components $c_i$. Given a full rank $m \times n$ matrix $U$ (called the information matrix), the vector $y$ is a perturbed value of $Ux$,

$$\|y - Ux\|_Y \le \varepsilon, \qquad (1)$$

where $\varepsilon \ge 0$ is the accuracy of computing (or measuring) information.

For example, the information vector may be obtained as follows. For an input sequence $u = [u_0, u_1, \ldots, u_{m-1}]$ with $u_0 \ne 0$, the components of $y$ can be given as output measurements related to $u$ by

$$y_l = \sum_{k=1}^{l} x_k u_{l-k} + e_l, \quad l = 1, 2, \ldots, m, \qquad (2)$$

where $e_l$ is an error corrupting the output (we set $x_k = 0$ for $k > n$). Some normalization condition is imposed on the sequence $u$, saying, for instance, that a norm of $u$ does not exceed 1. Hence,

$$y = Ux + e, \qquad (3)$$

where $e = [e_1, e_2, \ldots, e_m]^T$ is assumed to satisfy $\|e\|_Y \le \varepsilon$. The matrix $U$ in this case is the $m \times n$ Toeplitz-type matrix

$$U = \begin{bmatrix} u_0 & 0 & 0 & \cdots & 0 \\ u_1 & u_0 & 0 & \cdots & 0 \\ u_2 & u_1 & u_0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ u_{n-1} & u_{n-2} & \cdots & \cdots & u_0 \\ u_n & u_{n-1} & \cdots & \cdots & u_1 \\ \vdots & \vdots & \vdots & & \vdots \\ u_{m-1} & u_{m-2} & \cdots & \cdots & u_{m-n} \end{bmatrix}. \qquad (4)$$

Based on $y$, we compute an approximation to $x$ using an algorithm, by which we mean any mapping

$$f : Y \to \mathcal{M},$$

where $\mathcal{M}$ is a given subset of $X$. An approximation to $x$ is provided by the vector $f(y) \in \mathcal{M}$. If $\mathcal{M}$ is a proper subset of the system space $X$, then identification is called conditional; see, e.g., [2], [6]. The choice of $\mathcal{M}$ is usually aimed at restricting the complexity of the approximating models. Identification is unconditional if $\mathcal{M} = X$.
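To make the measurement model (2)–(4) concrete, here is a minimal numerical sketch (not from the paper; the data and variable names are our own) that builds the Toeplitz-type information matrix $U$ from an input sequence $u$ and generates a perturbed measurement $y = Ux + e$ with $\|e\| \le \varepsilon$, assuming $A = I$:

```python
import numpy as np

def information_matrix(u, n):
    """Toeplitz-type m x n matrix U of (4): entry (l, k) equals u[l - k], zero above the band."""
    m = len(u)
    U = np.zeros((m, n))
    for l in range(m):          # row index (0-based)
        for k in range(n):      # column index (0-based)
            if l - k >= 0:
                U[l, k] = u[l - k]
    return U

rng = np.random.default_rng(0)
m, n, eps = 8, 3, 0.1
u = rng.standard_normal(m)
u /= np.linalg.norm(u)          # normalization: the norm of u does not exceed 1
x_true = np.array([1.0, -0.5, 0.25])

U = information_matrix(u, n)
e = rng.standard_normal(m)
e *= eps / np.linalg.norm(e)    # output error with ||e|| <= eps (A = I assumed)
y = U @ x_true + e              # perturbed information, cf. (3)
print(U.shape, np.linalg.norm(y - U @ x_true) <= eps + 1e-12)
```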
3. Worst-case setting

In this setting, motivated by the aim of constructing robust algorithms, the behaviour of an approximant is studied for the worst system compatible with the computed information. For a given measurement $y$, the error of an algorithm $f$, the y-local worst-case error, is defined by

$$e_y^{\mathrm{worst}}(f) = \sup_{x \in E_y} \|x - f(y)\|_X, \qquad (5)$$

where the set $E_y$ (assumed non-empty) contains all systems consistent with the information $y$,

$$E_y = \{x \in X : \|Ux - y\|_Y \le \varepsilon\}. \qquad (6)$$

The error (5) measures the worst uncertainty that an algorithm may produce for a given measurement $y$.

If the estimate selection is unconstrained, the best choice of $z = f(y)$ is given by the Chebyshev centre of the set $E_y$. (This remains true for an arbitrary set $E_y$, not necessarily given by (6), and any norm in $X$.) Indeed, the Chebyshev centre of $E_y$ is defined as an element $c(E_y)$ that minimizes the maximal distance from the elements of $E_y$ among all $z$ in $X$,

$$c(E_y) = \arg\min_{z \in X} \max_{x \in E_y} \|z - x\|. \qquad (7)$$

The algorithm defined by $f(y) = c(E_y)$ is optimal, and its worst-case error is equal to the geometric radius of $E_y$,

$$e_y^{\mathrm{worst}}(f) = \sup_{x \in E_y} \|x - c(E_y)\|_X \; \big(= \mathrm{rad}(E_y)\big). \qquad (8)$$

In many cases of interest, the Chebyshev centre is readily available. For instance, the uncertainty set $E_y$ in (6) is an ellipsoid in $\mathbb{R}^n$ with the centre (Chebyshev centre) given by $c(E_y) = (\bar U^T \bar U)^{-1} \bar U^T \bar y$, where $\bar U = AU$ and $\bar y = Ay$.

Computational aspects of the optimality problem are more complicated for conditional identification, even for simple uncertainty sets. The optimal algorithm, denoted by $f^{\mathrm{opt}}$, is now defined by the conditional Chebyshev centre, a minimizer selected among the admissible models in the set $\mathcal{M}$:

$$c^{\mathrm{cond}}(E_y) = \arg\min_{z \in \mathcal{M}} \max_{x \in E_y} \|z - x\|. \qquad (9)$$

If a linear parametrization of the selected estimate is desired, then the set $\mathcal{M}$ is defined as a linear p-dimensional manifold in $\mathbb{R}^n$ ($p \le n$). The computation of the conditional Chebyshev centre in this case, even for such a simple set as an ellipsoid, is not a straightforward task; see [3] for an iterative method for finding $c^{\mathrm{cond}}(E_y)$.

A simple idea to overcome the difficulties with computing the conditional Chebyshev centre is to take the optimal unconditional estimate given by the unconditional Chebyshev centre $c(E_y)$ (which is easy to compute for many sets $E_y$), and to find its projection onto the manifold $\mathcal{M}$. The algorithm obtained in this way is denoted by $f^{\mathrm{proj}}$. It is quite obvious that this algorithm is in general not optimal, and the question is: how much do we lose with respect to the optimal one? Is the projection algorithm suboptimal within a reasonable margin?

The answer to these questions is given in the following result [2], which holds for much more general uncertainty sets than ellipsoids. We only assume that

$$\text{the set of feasible elements is a compact subset of } \mathbb{R}^n, \qquad (10)$$

and that $\mathcal{M}$ is a p-dimensional linear manifold,

$$\mathcal{M} = \{x \in \mathbb{R}^n : x = z^0 + a_1 z^1 + \cdots + a_p z^p\}, \qquad (11)$$

where $a_i \in \mathbb{R}$, $z^i \in \mathbb{R}^n$, $(z^i)^T z^j = 0$ for $i, j = 0, 1, \ldots, p$, $i \ne j$, and $\|z^i\| = 1$ for $i = 1, 2, \ldots, p$. In [2], the following theorem has been shown.

Theorem 1. If (10) and (11) hold, then

$$e_y^{\mathrm{worst}}(f^{\mathrm{proj}}) \le \sqrt{4/3}\; e_y^{\mathrm{worst}}(f^{\mathrm{opt}}). \qquad (12)$$

Furthermore, the constant $\sqrt{4/3}$ cannot be improved.

The theorem ensures that the projection algorithm, although not optimal, is suboptimal within a reasonable constant of $\sqrt{4/3}$, i.e., less than 16% of accuracy is lost with respect to the accuracy of the optimal algorithm. This holds independently of the dimensions $n$ and $p$. The second statement of the theorem follows from the following example [2]. Let $n = 2$, $E_y = \{(x_1, x_2) \in \mathbb{R}^2 : x_2 = x_1/\sqrt{2} + 1, \; |x_1| \le \sqrt{2}\}$, and $\mathcal{M} = \{(x_1, x_2) \in \mathbb{R}^2 : x_2 = 0\}$. Then $c(E_y) = (0, 1)$, its projection onto $\mathcal{M}$ is $(0, 0)$, and $c^{\mathrm{cond}}(E_y) = (1/\sqrt{2}, 0)$. The error of the projection algorithm is equal to $\sqrt{6}$, while the optimal error is $3/\sqrt{2}$. The ratio is exactly $\sqrt{4/3}$.
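The sharpness example above is easy to check numerically. The following sketch (our own discretization, not part of the paper) evaluates the worst-case error (5) of the projection estimate and of the conditional Chebyshev centre over a fine sampling of the segment $E_y$, and recovers the ratio $\sqrt{4/3}$:

```python
import numpy as np

# Numerical check of the sharpness example after Theorem 1 (our own discretization).
# E_y is the segment x2 = x1/sqrt(2) + 1 with |x1| <= sqrt(2); M is the x1-axis.
x1 = np.linspace(-np.sqrt(2), np.sqrt(2), 200001)
segment = np.stack([x1, x1 / np.sqrt(2) + 1.0], axis=1)

def worst_error(z):
    """Worst-case error (5) of the constant estimate z over the discretized set E_y."""
    return np.max(np.linalg.norm(segment - z, axis=1))

proj_estimate = np.array([0.0, 0.0])            # projection of c(E_y) = (0, 1) onto M
cond_centre  = np.array([1 / np.sqrt(2), 0.0])  # conditional Chebyshev centre in M

e_proj = worst_error(proj_estimate)             # -> sqrt(6)   ~ 2.4495
e_opt  = worst_error(cond_centre)               # -> 3/sqrt(2) ~ 2.1213
print(e_proj, e_opt, e_proj / e_opt, np.sqrt(4 / 3))   # ratio ~ 1.1547
```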
4. Average-case setting

The worst-case analysis is necessary when robust approximations are of interest. If, however, robustness is not our main concern, the worst-case error criterion may be viewed as conservative. In [7] a less conservative error measure has been analysed, which reflects the average performance of identification algorithms.

We remain in the set membership framework and assume, similarly to the worst-case setting, that systems consistent with a measurement $y$ belong to the set $E_y$ given in (6). The basic quantity is now the y-local average error of an algorithm, defined by

$$e_y^{\mathrm{aver}}(f) = \sqrt{\frac{1}{|E_y|} \int_{E_y} \|x - f(y)\|_X^2 \, dx}, \qquad (13)$$

where $|A|$ denotes the Lebesgue measure of a set $A$ and the integral is the Lebesgue integral in $\mathbb{R}^n$. In contrast to (5), the error of an algorithm is now measured by its mean square (not worst-case) behaviour in $E_y$.

In addition to (13), another error measure has also been considered in [7]. It is sometimes desirable to assess the behaviour of an algorithm for a fixed system $x$ (not for a fixed measurement $y$). Based on the errors $\{e_y^{\mathrm{aver}}(f)\}$ for measurements consistent with $x$, we define the x-local average error, which describes the behaviour of an algorithm in the set of all measurements consistent with a fixed system $x$,

$$E^x = \{y \in Y : \|y - Ux\|_Y \le \varepsilon\}.$$

The x-local average error is given by

$$e_x^{x,\mathrm{aver}}(f) = \sqrt{\frac{1}{|E^x|} \int_{E^x} e_y^{\mathrm{aver}}(f)^2 \, dy}, \qquad (14)$$

for a measurable function $f$.

The following result provides the formula for the y-local error of an arbitrary algorithm $f$, see [7]. Let $K_n = \{t \in \mathbb{R}^n : \|t\| \le 1\}$ be the unit ball in the standard Euclidean norm, let

$$\varepsilon_y = \sqrt{\varepsilon^2 - \bar y^T \big(I - \bar U (\bar U^T \bar U)^{-1} \bar U^T\big) \bar y}$$

(for $E_y \ne \emptyset$, the real non-negative number $\varepsilon_y$ is well defined and $\varepsilon_y \le \varepsilon$), and let $B$ be any $n \times n$ matrix such that $B^T B = \bar U^T \bar U$.

Theorem 2. For any $\mathcal{M} \subseteq \mathbb{R}^n$ and any $f$,

$$e_y^{\mathrm{aver}}(f)^2 = \|f(y) - (\bar U^T \bar U)^{-1} \bar U^T \bar y\|^2 + \frac{\varepsilon_y^2}{|K_n|} \int_{K_n} \|B^{-1} t\|^2 \, dt. \qquad (15)$$

Hence, the optimal algorithm $f^{\mathrm{opt}}$ is defined by

$$f^{\mathrm{opt}}(y) = \text{the projection of } (\bar U^T \bar U)^{-1} \bar U^T \bar y \text{ onto } \mathcal{M} \qquad (16)$$

(if the projection is well defined). This is the projection of the Chebyshev centre of $E_y$ onto $\mathcal{M}$. Unlike in the worst-case setting, the optimal algorithm in the average-case setting coincides with the projection algorithm. Its error is given in the following result [7].

Theorem 3.

$$e_y^{\mathrm{aver}}(f^{\mathrm{opt}})^2 = \mathrm{dist}\big((\bar U^T \bar U)^{-1} \bar U^T \bar y, \mathcal{M}\big)^2 + \frac{\varepsilon_y^2}{n+2}\, \mathrm{tr}\big((\bar U^T \bar U)^{-1}\big). \qquad (17)$$

For a linear manifold $\mathcal{M}$ in (11), the x-local error of the optimal algorithm is given in the next result, see [7]. Let $M = [z^1, \ldots, z^p]$ be the matrix whose orthonormal column vectors span the space $\mathcal{M}$.

Theorem 4. Let $\mathcal{M}$ be given by (11). Then

$$e_x^{x,\mathrm{aver}}(f^{\mathrm{opt}})^2 = \mathrm{dist}(x, \mathcal{M})^2 + \frac{\varepsilon^2}{m+2} \Big( \mathrm{tr}\big((\bar U^T \bar U)^{-1}\big) + \mathrm{tr}\big((I - MM^T)(\bar U^T \bar U)^{-1}(I - MM^T)\big) \Big). \qquad (18)$$

This formula explicitly shows the influence on the minimal error of factors such as the number of measurements $m$, the information matrix $U$, the accuracy of measuring information $\varepsilon$, the matrix $M$ representing the subspace $\mathcal{M}$, and the unmodelled dynamics term $\mathrm{dist}(x, \mathcal{M})$. For instance, we can now formulate, in algebraic language, criteria for the optimal selection of the information matrix or the subspace $\mathcal{M}$ to minimize the error of the optimal algorithm, see [7].
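As an illustration of (16) and (17), the sketch below (our own code; the toy data are hypothetical) computes the average-case optimal estimate and its squared y-local error for a subspace spanned by orthonormal columns, assuming $A = I$ so that $\bar U = U$ and $\bar y = y$:

```python
import numpy as np

def f_opt(U, y, M):
    """Average-case optimal estimate (16): project the least-squares solution onto span(M)."""
    c = np.linalg.lstsq(U, y, rcond=None)[0]     # (U^T U)^{-1} U^T y, centre of E_y
    return M @ (M.T @ c)                         # orthogonal projection (M has orthonormal columns)

def y_local_error_sq(U, y, M, eps):
    """Squared y-local average error of f_opt, following formula (17) as given above."""
    G_inv = np.linalg.inv(U.T @ U)
    c = G_inv @ U.T @ y
    eps_y_sq = eps**2 - y @ (y - U @ c)          # eps_y^2 = eps^2 - y^T (I - U (U^T U)^{-1} U^T) y
    n = U.shape[1]
    dist_sq = np.sum((c - M @ (M.T @ c))**2)     # dist(c, M)^2, the unmodelled part
    return dist_sq + eps_y_sq / (n + 2) * np.trace(G_inv)

# hypothetical usage: n = 3, M = span{e1, e2}, two repeated identity measurement blocks
U = np.vstack([np.eye(3), np.eye(3)])
M = np.eye(3)[:, :2]
y = np.array([1.0, 2.0, 0.1, 1.1, 1.9, -0.1])
print(f_opt(U, y, M), y_local_error_sq(U, y, M, eps=0.3))
```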
5. Average-case setting with weights

The average errors considered in the previous section are defined under the assumption that the distribution of elements in the feasible sets $E_y$ and $E^x$ is uniform. This is not a bad assumption when no additional information concerning the distribution is available. In many cases, however, such knowledge is available, and other distributions of measurements and problem elements, such as, for instance, the Gaussian distribution, may better describe the problem. The question arises: how far do the results on optimal estimation given in the previous section rely on the assumption of a uniform distribution? The results given in this section, based on [8], provide the answer for a wide class of distributions.

Let us generalize the notion of the average error and define the weighted average y-local error. Assume for a fixed $y$ that a non-negative Lebesgue integrable function $p_y : E_y \to \mathbb{R}$ is given such that $\mu_y(E_y) = \int_{E_y} p_y(x)\,dx > 0$. The weighted y-local average error of an algorithm $f$ is defined by

$$e_y^{\mathrm{aver}}(f) = \sqrt{\frac{1}{\mu_y(E_y)} \int_{E_y} \|x - f(y)\|_X^2 \, p_y(x)\, dx}. \qquad (19)$$

This error criterion reflects the average behaviour of an algorithm with respect to a weighted Lebesgue measure concentrated on $E_y$.

To define the weighted x-local error, we assume that a non-negative Lebesgue integrable function $q^x : E^x \to \mathbb{R}$ is given such that $\mu^x(E^x) = \int_{E^x} q^x(y)\,dy > 0$. For a measurable function $f$, we define the weighted x-local average error of $f$ by

$$e_x^{x,\mathrm{aver}}(f) = \sqrt{\frac{1}{\mu^x(E^x)} \int_{E^x} e_y^{\mathrm{aver}}(f)^2 \, q^x(y)\, dy}. \qquad (20)$$

The weight functions $p_y$ and $q^x$ under consideration will be defined by transferring some basic functions $\hat p$ and $\hat q$ to the proper sets, as follows. Let $\hat p : K_n \to \mathbb{R}$ be a non-negative Lebesgue integrable function with $k_n(\hat p) = \int_{K_n} \hat p(t)\, dt > 0$. To define $p_y$, we express the set $E_y$ in an equivalent form. Recall that an $n \times n$ non-singular matrix $B$ is such that $B^T B = \bar U^T \bar U$. Defining $b_y = (B^T)^{-1} \bar U^T \bar y$, we have that $\|\bar U x - \bar y\| \le \varepsilon$ iff $\|Bx - b_y\| \le \varepsilon_y$, where $\varepsilon_y$ is defined in the previous section. Since we wish to study uncertain information, for which the unique recovery of $x$ from the knowledge of $y$ is not possible, we assume that $E_y$ is not a singleton, i.e., $\varepsilon_y > 0$. We define the weight function $p_y$ by

$$p_y(x) = \hat p\Big(\frac{Bx - b_y}{\varepsilon_y}\Big) \quad \text{for } x \in E_y. \qquad (21)$$

The function $p_y(x)$ depends on the matrix $B$; a possible choice is $B = (\bar U^T \bar U)^{1/2}$. We shall see that the optimal algorithm is independent of the selection of the matrix $B$. For a wide class of weight functions the same holds true for the optimal error.

Similarly to $p_y$, we define the function $q^x$ by

$$q^x(y) = \hat q\Big(\frac{A(y - Ux)}{\varepsilon}\Big), \quad y \in E^x, \qquad (22)$$

where $\hat q : K_m \to \mathbb{R}$ is a non-negative integrable function with $k_m(\hat q) = \int_{K_m} \hat q(t)\, dt > 0$.

The choice of weight functions is not an easy task. Selecting the weights in fact provides additional a priori information about the problem, and it should reflect our knowledge about it. A proper weight selection allows us to include relevant a priori information in the problem formulation. For a discussion of the problem of measure selection in a general average-case setting, the reader is referred to [12], p. 68. We shall see that in many interesting cases the optimal algorithm is independent of the weights. The error of the optimal algorithm is weight-dependent, and the dependence will be shown explicitly in terms of certain parameters that are easy to compute for any selection of the weights.
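A small sketch of the weight transfer (21) may help. It builds $p_y$ from a basic weight $\hat p$ on the unit ball, taking the particular choice $B = (U^T U)^{1/2}$ and assuming $A = I$; the function and variable names, and the data, are ours:

```python
import numpy as np
from scipy.linalg import sqrtm

def make_p_y(U, y, eps, p_hat):
    """Transfer a basic weight p_hat on the unit ball K_n to E_y via (21), with B = (U^T U)^{1/2}."""
    G = U.T @ U
    B = np.real(sqrtm(G))                        # one admissible choice of B with B^T B = U^T U
    c = np.linalg.solve(G, U.T @ y)              # least-squares centre of E_y
    eps_y = np.sqrt(eps**2 - y @ (y - U @ c))    # effective radius, assumed > 0 (E_y not a singleton)
    b_y = B @ c                                  # equals (B^T)^{-1} U^T y since B is symmetric here
    return lambda x: p_hat((B @ x - b_y) / eps_y)

# hypothetical usage with an isotropic basic weight p_hat(t) = 1 - ||t||^2
p_hat = lambda t: max(0.0, 1.0 - float(np.dot(t, t)))
U = np.vstack([np.eye(2), np.eye(2)])
y = np.array([1.0, 0.5, 1.2, 0.4])
p_y = make_p_y(U, y, eps=0.5, p_hat=p_hat)
print(p_y(np.array([1.1, 0.45])))                # weight of a candidate system inside E_y
```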
We now restrict ourselves to isotropic weight functions; for a discussion of the more general case the reader is referred to [8]. We assume that the basic weight functions $\hat p$ and $\hat q$ take constant values on spheres in the respective spaces. Letting $w$ and $\tilde w$ be real non-negative continuous functions on $[0, 1]$, we define

$$\hat p(t) = w(\|t\|^2) \text{ for } t \in K_n \quad \text{and} \quad \hat q(t) = \tilde w(\|t\|^2) \text{ for } t \in K_m. \qquad (23)$$

First of all, it is possible to show [8] that, similarly to the case of uniform distributions, the optimal algorithm $f^{\mathrm{opt}}$ is given by the projection of the least squares approximation onto $\mathcal{M}$,

$$f^{\mathrm{opt}}(y) = \text{the projection of } (\bar U^T \bar U)^{-1} \bar U^T \bar y \text{ onto } \mathcal{M}, \qquad (24)$$

provided that the projection is well defined. (This holds for any symmetric weight function $\hat p$.) Hence, $f^{\mathrm{opt}}$ does not depend on the weights. The following result, shown in [8], gives explicit formulas for the errors of the optimal algorithm and explains the influence of the weights. Let $a_n$ and $\tilde a_m$ be parameters depending on the weights, given by

$$a_n = \frac{(n+2)\, I_{n+1}}{n\, I_{n-1}} \quad (n \ge 1), \qquad \text{where } I_n = I_n(w) = \int_0^1 w(r^2)\, r^n\, dr.$$

A similar formula describes the numbers $\tilde a_m$, with $w$ replaced by $\tilde w$. The following theorem holds.

Theorem 5. For weight functions given by (23), we have

$$e_y^{\mathrm{aver}}(f^{\mathrm{opt}})^2 = \mathrm{dist}\big((\bar U^T \bar U)^{-1} \bar U^T \bar y, \mathcal{M}\big)^2 + a_n \frac{\varepsilon_y^2}{n+2}\, \mathrm{tr}\big((\bar U^T \bar U)^{-1}\big), \qquad (25)$$

and

$$e_x^{x,\mathrm{aver}}(f^{\mathrm{opt}})^2 = \mathrm{dist}(x, \mathcal{M})^2 + \varepsilon^2 a_n \Big( \frac{1 - \tilde a_m}{n+2} + \frac{\tilde a_m}{m+2} \Big) \mathrm{tr}\big((\bar U^T \bar U)^{-1}\big) + \varepsilon^2 \frac{\tilde a_m}{m+2}\, \mathrm{tr}\big((I - MM^T)(\bar U^T \bar U)^{-1}(I - MM^T)\big). \qquad (26)$$

The errors of the optimal algorithm depend on the weights through the parameters $a_n$ and $\tilde a_m$, which can be computed (or estimated) for each particular choice of the weights. Bounds on these parameters for three classes of weight functions – continuous, differentiable and truncated Gaussian functions – can be found in [8]. The case of uniform distributions over $E_y$ and $E^x$ corresponds to $\hat p(t) \equiv 1$ and $\hat q(t) \equiv 1$, that is, to $w(t) \equiv 1$ and $\tilde w(t) \equiv 1$. Hence, in this case $a_n = 1$ and $\tilde a_m = 1$, and (25) and (26) agree with (17) and (18). Note that for isotropic weight functions the error depends on the matrix $\bar U$, but is independent of the choice of the matrix $B$ in (21).
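The parameters $a_n$ are one-dimensional integrals and can be evaluated numerically. The sketch below (our own code, based on the formula for $a_n$ as reconstructed above) checks the two cases used in the text: the uniform weight $w \equiv 1$ gives $a_n = 1$, and the weight $w(r) = 1 - r^2$ of the example below gives $a_n = (n+4)/(n+6)$:

```python
import numpy as np
from scipy.integrate import quad

def a_param(w, n):
    """Weight parameter a_n = (n+2) I_{n+1}(w) / (n I_{n-1}(w)),
    with I_n(w) the integral over [0, 1] of w(r^2) r^n dr (formula as reconstructed above)."""
    I = lambda j: quad(lambda r: w(r**2) * r**j, 0.0, 1.0)[0]
    return (n + 2) * I(n + 1) / (n * I(n - 1))

for n in (2, 3, 5, 9):
    print(n,
          a_param(lambda s: 1.0, n),          # uniform weight      -> a_n = 1
          a_param(lambda s: 1.0 - s**2, n),   # example weight      -> a_n = (n+4)/(n+6)
          (n + 4) / (n + 6))
```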
Following [8], we illustrate the results given above for an exemplary problem.

Example. Let $A = I_m$, where $I_m$ is the $m \times m$ identity matrix, i.e., we consider the standard Euclidean norm in the space $Y$. Let the information be given by $k$ repeated measurements of all components of the system $x$. That is, the information matrix is $U = (I_n, \ldots, I_n)^T$, with $I_n$ appearing $k$ times, and the exact information is the $m \times 1$ vector $Ux = [x^T, \ldots, x^T]^T$, with $x$ appearing $k$ times. The total number of measurements equals $m = k\,n$.

Assume that we are able to measure the components of $x$ in such a way that the result obtained, an $m \times 1$ vector $y = [(y^1)^T, \ldots, (y^k)^T]^T$ with $k$ vector components $y^i$, has mean squared error at most $\varepsilon$,

$$\sqrt{\frac{1}{k} \sum_{i=1}^{k} \|y^i - x\|^2} \le \varepsilon. \qquad (27)$$

The uncertainty set $E_y$ thus has the form

$$E_y = \Big\{x : \sum_{i=1}^{k} \|y^i - x\|^2 \le k\,\varepsilon^2\Big\} \qquad (28)$$

(so that the accuracy bound in (1) becomes $\bar\varepsilon = \sqrt{k}\,\varepsilon$). One can easily verify that $U^T U = k I_n$, $B = (U^T U)^{1/2} = \sqrt{k}\, I_n$, and $b_y = \frac{1}{\sqrt{k}}\sum_{i=1}^{k} y^i$. Hence, $E_y$ can be written as

$$E_y = \Big\{x : \Big\|x - \frac{1}{k}\sum_{i=1}^{k} y^i\Big\| \le \frac{1}{\sqrt{k}}\,\varepsilon_y\Big\}, \qquad (29)$$

where

$$k\,\varepsilon_y^2 = k^2 \varepsilon^2 - (k-1)\sum_{i=1}^{k} \|y^i\|^2 + \sum_{i,j=1,\; i \ne j}^{k} (y^i)^T y^j.$$

The optimal algorithm is given by the projection onto $\mathcal{M}$ of the arithmetic mean of the $y^i$,

$$f^{\mathrm{opt}}(y) = \text{the projection onto } \mathcal{M} \text{ of the vector } (U^T U)^{-1} U^T y = \frac{1}{k}\sum_{i=1}^{k} y^i. \qquad (30)$$

We show what the formula (26) looks like for a particular choice of weight functions and a subspace $\mathcal{M}$. Let $n \ge 2$, let the weights be defined by the functions $w(r) = \tilde w(r) = 1 - r^2$, and let $\mathcal{M} = \{x \in \mathbb{R}^n : x_3 = \ldots = x_n = 0\}$. Then $I_n(w) = I_n(\tilde w) = \frac{4}{(n+1)(n+5)}$ and $a_n = \tilde a_n = \frac{n+4}{n+6}$. We have that

$$e_x^{x,\mathrm{aver}}(f^{\mathrm{opt}})^2 = \mathrm{dist}(x, \mathcal{M})^2 + \frac{n(n+4)}{n+6}\,\varepsilon^2 \Big( \frac{2}{(n+2)(m+6)} + \frac{m+4}{(m+2)(m+6)} \Big) + \frac{m+4}{(m+2)(m+6)}\, k\,\varepsilon^2\, \mathrm{tr}\big((I_n - MM^T)(U^T U)^{-1}(I_n - MM^T)\big). \qquad (31)$$

Since $\mathrm{dist}(x, \mathcal{M}) = \sqrt{\sum_{i=3}^{n} x_i^2}$ (it is equal to zero if $n = 2$), and

$$\mathrm{tr}\big((I_n - MM^T)(U^T U)^{-1}(I_n - MM^T)\big) = \frac{n-2}{k},$$

the error of $f^{\mathrm{opt}}$ can be expressed as

$$e_x^{x,\mathrm{aver}}(f^{\mathrm{opt}})^2 = \sum_{i=3}^{n} x_i^2 + \frac{\varepsilon^2}{k}\, h(n, k), \qquad (32)$$

where

$$h(n, k) = \frac{2n(n+4)}{(n+2)(n+6)(n+6/k)} + \frac{n(n+4)(n+4/k)}{(n+6)(n+2/k)(n+6/k)} + \frac{(n-2)(n+4/k)}{(n+2/k)(n+6/k)}. \qquad (33)$$

The formula (32) shows the dependence of the average error on $\varepsilon$ and $k$. The dependence of $h(n, k)$ on $k$ is not crucial. For a fixed $n$, when $\varepsilon$ goes to zero or $k$ goes to infinity, the error tends to the unmodelled dynamics error $\sum_{i=3}^{n} x_i^2$ (whose appearance cannot be avoided since the identification is conditional), essentially as $\varepsilon^2/k$.

For further illustration, we specify the example above for a simple problem with exemplary data. We wish to identify a system $x$ depending on three parameters, $x = [x_1, x_2, x_3]^T$ ($n = 3$). We measure each component of $x$ three times ($k = 3$) in such a way that (27) holds, the precision of the measurement device being $\varepsilon^2 = 8 \cdot 10^{-3}$. Let the measured vectors be

$$y^1 = [5.035, -2.955, 0.003]^T, \quad y^2 = [5.003, -3.02, 0.01]^T, \quad y^3 = [4.993, -2.867, -0.005]^T.$$

The least squares approximation is then given by

$$\frac{1}{3}\sum_{i=1}^{3} y^i = [5.01033, -2.94733, 0.00267]^T.$$

For the subspace $\mathcal{M}$ defined as above, the optimal (on the average) approximation is

$$f^{\mathrm{opt}}(y) = [5.01033, -2.94733, 0]^T,$$

and it has the y-local error

$$e_y^{\mathrm{aver}}(f^{\mathrm{opt}})^2 = 1.739 \cdot 10^{-3}.$$

The x-local error for an exemplary $x$ consistent with the measurements $y^i$, given by $x = [5, -3, 0]^T$, is equal to

$$e_x^{x,\mathrm{aver}}(f^{\mathrm{opt}})^2 = 2.599 \cdot 10^{-3}$$

(since $x$ is in $\mathcal{M}$, $\mathrm{dist}(x, \mathcal{M}) = 0$), while the distance $\|x - f^{\mathrm{opt}}(y)\|^2$ is $2.881 \cdot 10^{-3}$ (as we see, the actual distance is here greater than the average distance).

Let us finally note that the formula (32) suggests an optimization problem concerning the experiment design. For a given positive $d$, we wish to reduce the second term in (32) to be of order $d^2$ (neglecting the influence of $h(n, k)$). In order to achieve this, we can perform, for any $k$, $k$ (vector) measurements with accuracy $\varepsilon = \sqrt{k}\, d$. In reasonable models of experimental cost, the cost should increase with an increasing number of experiments $k$, and it should decrease if we relax the accuracy, i.e., if we increase $\varepsilon$ (which is the case when $k$ grows). The question is to find an optimal $k$ that minimizes the cost.
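As a sanity check, the following sketch (our own code) reproduces the numbers of the worked example from the error formulas (25) and (31) as reconstructed above; small discrepancies would only reflect rounding in the printed data:

```python
import numpy as np

# Reproduce the worked example (n = 3, k = 3) with the error formulas as reconstructed above.
eps2 = 8e-3
y = np.array([[5.035, -2.955,  0.003],
              [5.003, -3.02,   0.01],
              [4.993, -2.867, -0.005]])
k, n = y.shape
m = k * n
x = np.array([5.0, -3.0, 0.0])

c = y.mean(axis=0)                      # least-squares estimate (U^T U)^{-1} U^T y
f_opt = np.array([c[0], c[1], 0.0])     # projection onto M = {x : x_3 = 0}

a_n = (n + 4) / (n + 6)                 # weight parameter for w(r) = 1 - r^2
eps_y2 = k * eps2 - np.sum((y - c)**2)  # effective squared radius of E_y
e_y2 = c[2]**2 + a_n * eps_y2 / (n + 2) * (n / k)           # (25), with tr((U^T U)^{-1}) = n/k
e_x2 = (n * (n + 4) / (n + 6)) * eps2 * (2 / ((n + 2) * (m + 6)) + (m + 4) / ((m + 2) * (m + 6))) \
       + (m + 4) / ((m + 2) * (m + 6)) * k * eps2 * (n - 2) / k   # (31), with dist(x, M) = 0

print(f_opt)                   # ~ [5.01033, -2.94733, 0]
print(e_y2)                    # ~ 1.739e-3
print(e_x2)                    # ~ 2.599e-3
print(np.sum((x - f_opt)**2))  # ~ 2.881e-3
```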
6. Conclusions

We have considered the performance of conditional estimators in set membership identification in the worst-case and average-case settings. Optimal estimators in the worst-case setting are in general difficult to compute, and suboptimal estimators such as the projection estimator are often considered. We have shown that the error of the projection algorithm never exceeds $\sqrt{4/3}$ times the error of the optimal algorithm. In the average-case setting, the projection algorithm turns out to be optimal. Explicit formulas for its local errors are given, showing the influence of the problem parameters on the minimal identification error. The case of weighted average errors is discussed, and the influence of the weights is explicitly shown.

Acknowledgment

This research was partly supported by local AGH grant No. 10.420.03.

References

[1] Chernousko, F.L., 1994, State Estimation for Dynamic Systems (Boca Raton: CRC Press).
[2] Garulli, A., Kacewicz, B., Vicino, A. and Zappa, G., 1999, Reliability of Projection Algorithms in Conditional Estimation. Journal of Optimization Theory and Applications, 101, 1–14.
[3] Garulli, A., Vicino, A. and Zappa, G., 1997, Conditional Central Algorithms for Worst-Case Estimation and Filtering. In: Proceedings of the 36th IEEE Conference on Decision and Control, San Diego.
[4] Helmicki, A.J., Jacobson, C.A. and Nett, C.N., 1991, Control Oriented System Identification: A Worst-Case/Deterministic Approach in H∞. IEEE Transactions on Automatic Control, 36, 1163–1176.
[5] Jacobson, C.A., Nett, C.N. and Partington, J.R., 1992, Worst Case System Identification in l1: Optimal Algorithms and Error Bounds. Systems and Control Letters, 19, 419–424.
[6] Kacewicz, B., Milanese, M. and Vicino, A., 1988, Conditionally Optimal Algorithms and Estimation of Reduced Order Models. Journal of Complexity, 4, 73–85.
[7] Kacewicz, B., 2000, Optimal Average Case Estimation in Hilbert Norms. Mathematics of Control, Signals and Systems, 13, 347–359.
[8] Kacewicz, B., 2003, Weighted Average Errors in Set-Membership Estimation. Mathematics of Control, Signals and Systems, 16, 238–253.
[9] Mäkilä, P. and Partington, J., 1999, On Robustness in System Identification. Automatica, 35, 907–916.
[10] Milanese, M. and Vicino, A., 1993, Information Based Complexity and Nonparametric Worst-Case System Identification. Journal of Complexity, 9, 427–445.
[11] Milanese, M., Norton, J., Piet-Lahanier, H. and Walter, E. (Eds), 1996, Bounding Approaches to System Identification (New York: Plenum Press).
[12] Novak, E., 1988, Deterministic and Stochastic Error Bounds in Numerical Analysis, Lecture Notes in Mathematics 1349 (Berlin: Springer-Verlag).
[13] Walter, E. (Ed.), 1990, Parameter Identification with Error Bound. Special issue of Mathematics and Computers in Simulation, 32 (5–6).



Publisher
Taylor & Francis
Copyright
Copyright Taylor & Francis Group, LLC
ISSN
1387-3954
eISSN
1744-5051
DOI
10.1080/13873950500068575


Journal

Mathematical and Computer Modelling of Dynamical Systems, Taylor & Francis

Published: Jun 1, 2005

Keywords: Identification; worst-case setting; average-case setting; weights; optimal algorithm; suboptimal algorithm
