AN IMPROVED ALGORITHM FOR THE DETERMINATION OF THE SYSTEM PARAMETERS OF A VISUAL BINARY BY LEAST SQUARES

By

YULIN XU

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1988

To my mother and late father

ACKNOWLEDGEMENTS

The author expresses his heartfelt gratitude to Dr. Heinrich K. Eichhorn, his research advisor, for proposing this subject, for his considerate guidance and encouragement throughout this research, and for his patience in reading and correcting the manuscript. The author has benefited in many ways as Dr. Eichhorn's student. The author is also grateful to Drs. Kwan-Yu Chen, Haywood C. Smith, Frank Bradshaw Wood and Philip Bacon for having served as members of his supervisory committee and for helpful discussions, timely suggestions and the careful review of this dissertation. Likewise, his deep appreciation goes to Drs. W. D. Heintz and H. A. McAlister for having provided data used in this dissertation. The author is especially grateful to Drs. Jerry L. Weinberg and Ru-Tsan Wang of the Space Astronomy Laboratory for their considerate encouragement and support; without their support, the completion of this research would not have been possible. It is a great pleasure to acknowledge that all the calculations were performed on the VAX in the Space Astronomy Laboratory.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTERS

I  INTRODUCTION

II  REVIEW AND REMARKS
    Review of the Methods for Orbit Computation
    Definitions of the Orbital Parameters
    The Method of M. Kowalsky and S. Glasenapp
    The Method of T. N. Thiele, R. T. A. Innes and W. H. van den Bos
    The Method of Least Squares
    Remarks

III  GENERAL CONSIDERATIONS
    The Condition Equations
    The General Statement of the Least Squares Orbit Problem
    About the Initial Approximate Solution

IV  THE SOLUTION BY NEWTON'S METHOD
    The Solution by Newton's Method
    Weighting of Observations
    The Orthogonal Set of Adjustment Parameters and the Efficiency of a Set of Orbital Parameters
    A Practical Example
    Remarks

V  THE MODIFIED NEWTON SCHEME
    The Method of Steepest Descent
    The Combination of Newton's Method with the Method of Steepest Descent: The Modified Newton Scheme
    The Application of Marquardt's Algorithm
    Two Practical Examples

VI  DISCUSSION

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

4-1  Expressions for All the Partial Derivatives in f
4-2  Expressions for All the Partial Derivatives in f
4-3  The Observation Data for 51 Tau
4-4  The Reduced Initial Data for 51 Tau
4-5  The Initial Approximate Solution A₀ for 51 Tau
4-6  The Final Solution for 51 Tau
4-7  The Residuals of the Observations for 51 Tau in (ρ,θ) and (x,y)
5-1  The Observation Data for β738
5-2  The Reduced Initial Data for β738
5-3  The Initial Approximate Solution A₀ for β738
5-4  The Solution #1 for β738
5-5  The Residuals of the Observations for β738 in (ρ,θ) and (x,y) in Solution #1
5-6  Heintz' Result for β738
5-7  The Solution #2 for β738
5-8  The Residuals of the Observations for β738 in (ρ,θ) and (x,y) in Solution #2
5-9  The Observation Data for BD+19°5116
5-10  The Reduced Initial Data for BD+19°5116
5-11  The Initial Approximate Solution A₀ for BD+19°5116
5-12  The Final Solution for BD+19°5116 by the MQ Method
5-13  The Residuals of the Observations for BD+19°5116 in θ, ρ, x and y

LIST OF FIGURES

4-1  Plot of the observation data for 51 Tau in the x₀y₀ plane
4-2  Plot of a) the x₀- and b) the y₀-coordinates against the observing epochs of the observation data for 51 Tau
4-3  The residuals of the observations for 51 Tau in (ρ,θ)
4-4  The residuals of the observations for 51 Tau in (x,y)
4-5  The original observations for 51 Tau compared with the observations after correction
5-1  Plot of the observation data for β738 in the x₀y₀ plane
5-2  Plot of a) the x₀- and b) the y₀-coordinates against the observing epochs of the observation data for β738
5-3  The residuals of the observations for β738 in (ρ,θ) according to the solution #1
5-4  The residuals of the observations for β738 in (x,y) according to the solution #1
5-5  The original observations of β738 compared with the observations after correction according to the solution #1
5-6  The residuals of the observations for β738 in (ρ,θ) according to the solution #2
5-7  The residuals of the observations for β738 in (x,y) according to the solution #2
5-8  The original observations for β738 compared with the observations after correction according to the solution #2
5-9  Plot of the observation data for BD+19°5116 in the x₀y₀ plane
5-10  Plot of a) the x₀- and b) the y₀-coordinates against the observing epochs of the observations for BD+19°5116
5-11  The residuals of the observations for BD+19°5116 in (ρ,θ)
5-12  The residuals of the observations for BD+19°5116 in (x,y)
5-13  The original observations for BD+19°5116 compared with the observations after correction

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

AN IMPROVED ALGORITHM FOR THE DETERMINATION OF THE SYSTEM PARAMETERS OF A VISUAL BINARY BY LEAST SQUARES

By

YULIN XU

April 1988

Chairman: Dr. Heinrich K. Eichhorn
Cochairman: Dr. Kwan-Yu Chen
Major Department: Astronomy

The problem of computing the orbit of a visual binary from a set of observed positions is reconsidered. It is a least squares adjustment problem if the observational errors follow a bias-free multivariate Gaussian distribution and the covariance matrix of the observations is assumed to be known. The condition equations are constructed to satisfy both the conic section equation and the area theorem; they are nonlinear in both the observations and the adjustment parameters.
The traditional least squares algorithm, which employs condition equations that are solved with respect to the uncorrelated observations and are either linear in the adjustment parameters or linearized by developing them in Taylor series to first-order approximation, is inadequate for our orbit problem. D. C. Brown proposed an algorithm solving a more general least squares adjustment problem, in which, however, the scalar residual function is still constructed by first-order approximation. More recently, a completely general solution was published by W. H. Jefferys, who proposed a rigorous adjustment algorithm for models in which the observations appear nonlinearly in the condition equations and may be correlated, and in which the construction of the normal equations and the residual function involves no approximation. This method was successfully applied to our problem.

The normal equations were first solved by Newton's scheme. Practical examples show that this converges fast if the observational errors are sufficiently small and the initial approximate solution is sufficiently accurate, and that it fails otherwise. Newton's method was therefore modified, by combining it with the method of steepest descent and other sophisticated algorithms, to yield a definitive solution in cases where the normal approach fails. Practical examples show that the modified Newton scheme always leads to a final solution.

The weighting of observations, the orthogonal parameters and the "efficiency" of a set of adjustment parameters are also considered. The definition of "efficiency" is revised.

CHAPTER I
INTRODUCTION

The problem of computing the orbit of a visual binary from a set of observed positions is by no means new. A great variety of methods has been proposed. As is well known, only a few of these suffice to cover the practical contingencies, and the majority fails to handle the input data efficiently and properly.
For a visual binary, the determination of an orbit normally requires a large number of observations. All measures of position angles and separations are, as are all observations, affected by observational errors. For the purpose of our work, these errors are assumed to follow a bias-free multivariate Gaussian distribution. Under this assumption, orbit computation is a least squares adjustment problem in which the condition equations are nonlinear in both the observations and the adjustment parameters. The condition equations must incorporate all relationships that exist between the observations and the orbital parameters*, that is, they must state both the area theorem, which follows from Kepler's equation, and the condition that the projected orbit is a conic section. H. Eichhorn (1985) has suggested a new form for the construction of a complete set of condition equations for this very problem.

*Usually, the term "orbital elements" is used. We prefer "orbital parameters" instead. Strictly speaking, the orbital elements are the constants of integration in the two-body problem and, therefore, do not include the masses of the components.

The traditional least squares algorithm, which is based on condition equations linearized with respect to the observational errors, will not lead to those orbital parameters which minimize the sum of the squares of the observational errors, because linearization is, in this case, too crude an approximation. In some earlier papers, H. Eichhorn and W. G. Clary (1974) proposed a least squares solution algorithm which takes into account the second (in addition to the first) order derivatives in the adjustment residuals (observational errors) and in the corrections to the initially available approximation to the adjustment parameters. A completely general solution was published by W. H. Jefferys (1980, 1981), who proposed a rigorous adjustment algorithm for models in which the observations appear nonlinearly in the condition equations.
In addition, there may be nonlinear constraints** among the model parameters, and the observations may be correlated. In practice, the method is nearly as simple to apply as the classical method of least squares, for it does not require the calculation of any derivatives of higher than the first order.

**We use the term "constraints" for condition equations which do not contain any observations explicitly.

In this paper, we present an approach which solves the orbit problem by Jefferys' method, in which both the area theorem and the conic section equation assume the function of the condition equations.

CHAPTER II
REVIEW AND REMARKS

Review of the Methods for Orbit Computation

Every complete observation of a double star supplies us with three data: the time of observation, the position angle of the secondary with respect to the primary, and the angular distance (separation) between the two stars. The problem of computing the so-called orbital elements (in this paper called orbital parameters) of a visual binary from a set of observations superficially appears analogous to the case of orbits in the planetary system, yet in practice there is little resemblance between the problems. The determination of the orbit of a body in the solar system is complicated by the motion of the observer, who shares the motion of the Earth, so that, unlike in the case of a binary, the apparent path is not merely a projection of the orbit in space onto the celestial sphere. In the case of a binary, the path observed is the projection of the motion of the secondary round the primary onto a plane perpendicular to the line of sight. The apparent orbit (i.e., the observed path of the secondary about the primary) is therefore not a mere scale drawing of the true orbit in space. The primary may be situated at any point within the ellipse described by the secondary and, of course, does not necessarily occupy either a focus or the center.
The problem of deriving "an orbit" (meaning a set of estimates of the orbital parameters) from the observations was first solved by F. Savary in 1827. In 1829, J. F. Encke quickly followed with a different solution method which was somewhat better adapted to what were then the needs of the practical astronomer. Theoretically, the methods of Savary and Encke are excellent, but they utilize only four complete pairs of measures (angle and distance) instead of all the available data, and they closely emulate the treatment of planetary orbits. They are therefore inadequate in the case of binary stars (W. D. Heintz, 1971; R. G. Aitken, 1935). Later, Sir John Herschel (1832) communicated a basically geometric method to the Royal Astronomical Society. Herschel's method was designed to utilize all the available data, so far as he considered them reliable. Since then, the contributions to the subject have been many. Some consist of entirely new methods of attack, others of modifications of those already proposed. Among the more notable investigators are Yvon Villarceau, J. H. Mädler, E. F. W. Klinkerfues, T. N. Thiele, M. Kowalsky, S. Glasenapp, H. J. Zwiers, K. Schwarzschild, T. J. J. See, H. N. Russell, R. T. A. Innes, W. H. van den Bos and others. One may classify the various methods published so far into "geometric" methods, which are those that enforce only the constraint that the orbit is elliptical, and "dynamical" ones, which enforce, in addition, the area theorem. The geometric treatment initiated by J. Herschel peaked in Zwiers' method (1896) and its modifications, e.g. those of H. N. Russell (1898) and of G. C. Comstock (1918). Every geometric method has the shortcoming that it must assume the location of the center of the ellipse to be known, while it ignores the area theorem and thus fails to enforce one of the constraints to which the observations are subject.
The growing quantity and quality of observations called for suitable computing precepts, and the successful return to dynamical methods began with van den Bos (1926). Of the many methods for orbit computation formulated, some are very useful and applicable to a wide range of problems, e.g. those by Zwiers and Russell and those by Innes and van den Bos. Zwiers' method (1896) is essentially graphical and assumes that the apparent orbit has been drawn. This method is therefore useless unless the apparent ellipse gives a good geometrical representation of the observations and satisfies the law of areas; it will not be further described here, since we are primarily concerned with the analytical methods. In the following, we will briefly review Kowalsky's method and that by Thiele and Innes.

Definitions of the Orbital Parameters

Seven parameters define the orbit and the orientation of its plane in space. The first three of these (P, T, e) are dynamical and define the motion in the orbit; the last four (a, i, Ω, ω) are geometrical and give the size and orientation of the orbit. The parameters are defined somewhat differently from those for the orbits of planets and comets. The first dynamical parameter, P, is the period of revolution, usually in units of mean sidereal years; n is the mean (usually annual) angular motion; since n = 2π/P, P and n are equivalent. The second, T, is the epoch of periastron passage (usually expressed in terms of years and fractions thereof). The third, e, is the eccentricity of the orbital ellipse. The geometrical parameter a is the angle subtended by the semimajor axis of the orbital ellipse (usually expressed in units of arcseconds). The angle i is the inclination of the orbital plane to the plane normal to the line of sight, that is, the angle between the plane of projection and that of the orbit in space. It ranges from 0° to 180°.
When the position angle increases with time, that is, for direct motion, i is between 0° and 90°; for retrograde motion, i is counted between 90° and 180°; and i is 90° when the orbit appears projected entirely onto the line of nodes. The node, Ω, is the position angle of the line of intersection between the tangential plane of projection and the orbital plane. There are two nodes, whose corresponding values of Ω differ by 180°. That node at which the orbital motion is directed away from the Sun is called the ascending node. We understand Ω, which ranges from 0° to 360°, to refer to the ascending node. However, because it is one of the peculiarities of the orbit determination of a visual binary that it is in principle impossible, from positional data alone, to decide whether a node is the ascending or the descending one, Ω may be restricted to the range 0° ≤ Ω ≤ 180°. The last parameter, ω, is the longitude of the periastron in the plane of the orbit, counted positive in the direction of the orbital motion and starting at the ascending node. It ranges from 0° to 360°. These definitions are adhered to throughout our work. Some of them may differ a little from those given by previous authors, but the way in which one defines them does not affect the principles of the method we describe.

The Method of M. Kowalsky and S. Glasenapp

This old method was first introduced by J. Herschel in a rather cumbersome form and is better known in its more direct formulation by M. Kowalsky in 1873 and by S. Glasenapp in 1889. R. G. Aitken (1935) gives the detailed derivation of the formulae in his textbook The Binary Stars. Kowalsky's method is essentially analytical. It derives the orbital parameters from the coefficients of the general equation of the ellipse which is the orthogonal projection of the orbit in space, the origin of coordinates being taken at the primary.
The projected orbit can be expressed by a quadratic in x and y, whose five coefficients are related to those five orbital parameters which do not involve time. In rectangular coordinates, the equation of an ellipse, and thus of a conic section, takes the form

c₁x² + 2c₂xy + c₃y² + 2c₄x + 2c₅y + 1 = 0     (2-1)

where the rectangular coordinates (x,y) are related to the more commonly directly observed polar coordinates (ρ,θ) by the equations

x = ρ cosθ     (2-2a)
y = ρ sinθ     (2-2b)

where ρ is the measured angular distance and θ the position angle. The five coefficients of equation (2-1) can be determined by selecting five points on the ellipse or from all the available observations in the sense of least squares. There is an unambiguous relationship between the five coefficients (c₁,c₂,c₃,c₄,c₅) and the five orbital parameters (a,e,i,ω,Ω). The detailed derivation of the formulae can be found in Aitken's The Binary Stars. Here, we will therefore state only the final formulae without derivation. The five orbital parameters (a,e,i,ω,Ω) can be calculated from the known coefficients (c₁,c₂,c₃,c₄,c₅) by the following procedure.

1) The parameter Ω can be found from the equation

tan2Ω = 2(c₂ − c₄c₅)/(c₅² − c₄² + c₁ − c₃)     (2-3)

To determine in which quadrant 2Ω is located, we can use two other equations:

c′ sin2Ω = 2(c₂ − c₄c₅)     (2-3'a)
c′ cos2Ω = c₅² − c₄² + c₁ − c₃     (2-3'b)

where

c′ = tan²i/p²     (2-3'c)

(p being the semi-latus rectum of the orbit), which is always positive. More elegantly, we write in Eichhorn's notation

2Ω = plg[2(c₂ − c₄c₅), c₅² − c₄² + c₁ − c₃]     (2-3'd)

H. Eichhorn (1985), in his "Kinematic Astronomy" (unpublished lecture notes), defines the plg(x,y) function as follows: plg(x,y) = arctan(x/y) + 90°[2 − sgn x(1 + sgn y)], where arctan is the principal value of the arctangent.

2) The inclination i is found from

tan²i + 2 = 2cos2Ω(c₅² + c₄² − c₁ − c₃)/[cos2Ω(c₅² + c₄² − c₁ − c₃) − (c₅² − c₄² + c₁ − c₃)]     (2-4)

Whether i is in the first or second quadrant is determined by whether the motion is direct or retrograde.
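Determining the five coefficients from the observations, as described above, is an ordinary linear least squares problem, since Eq. (2-1) is linear in (c₁,...,c₅). The following Python sketch (our own illustration, not part of the dissertation; the sample measurements are invented) fits the coefficients and implements the plg function:

```python
import numpy as np

def plg(x, y):
    # Eichhorn's quadrant-resolving arctangent: arctan(x/y) placed in the
    # quadrant given by the signs of x and y; result in degrees on [0, 360).
    return np.degrees(np.arctan2(x, y)) % 360.0

def fit_conic(x, y):
    # Linear least squares fit of Eq. (2-1):
    # c1 x^2 + 2 c2 xy + c3 y^2 + 2 c4 x + 2 c5 y + 1 = 0
    M = np.column_stack([x*x, 2.0*x*y, y*y, 2.0*x, 2.0*y])
    c, *_ = np.linalg.lstsq(M, -np.ones_like(x), rcond=None)
    return c  # c1, c2, c3, c4, c5

# Invented sample measurements (rho in arcsec, theta in degrees):
theta = np.radians([20.0, 60.0, 95.0, 140.0, 185.0, 230.0, 275.0])
rho = np.array([1.00, 0.86, 0.68, 0.57, 0.62, 0.79, 0.95])
x, y = rho*np.cos(theta), rho*np.sin(theta)          # Eqs. (2-2a,b)
c1, c2, c3, c4, c5 = fit_conic(x, y)
two_Omega = plg(2.0*(c2 - c4*c5), c5**2 - c4**2 + c1 - c3)  # Eq. (2-3'd)
```

Note that this treats the equation residuals, not the measured (x,y), as the quantities to be minimized; as discussed in the Remarks below, that is acceptable only for obtaining a first approximation.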
If the motion is direct, the position angle increases with time and i < 90°; otherwise, i > 90°.

3) The equation

ω = plg[−(c₄sinΩ − c₅cosΩ)cos i, c₄cosΩ + c₅sinΩ]     (2-5)

gives the value of ω.

4) With i, ω, Ω known, two more parameters (a and e) can be calculated from

e² = 2cos2Ω(c₄sinΩ − c₅cosΩ)²cos²i / {[cos2Ω(c₅² + c₄² − c₁ − c₃) − (c₅² − c₄² + c₁ − c₃)]sin²ω}     (2-6)

a² = 2cos2Ω / {[cos2Ω(c₅² + c₄² − c₁ − c₃) − (c₅² − c₄² + c₁ − c₃)](1 − e²)²}     (2-7)

5) To complete the solution analytically, the mean motion n (or the period P) and the time of periastron passage T must be found from the mean anomalies M, computed from the observations by Kepler's equation:

M = n(t − T) = E − e sinE     (2-8)

where E is the eccentric anomaly. Every M will give an equation of the form

M = nt + τ     (2-9)

where τ = −nT. From these equations the values of n and T are computed by the method of least squares.

This is essentially the so-called Kowalsky method. It looks mathematically elegant, but because it is not only severely affected by the uncertainties of the observations but also ignores the area theorem, it has a very poor reputation among seasoned practitioners. In our work, however, we use it for getting the initial approximation to the solution. It serves this purpose very well.

The Method of T. N. Thiele, R. T. A. Innes and W. H. van den Bos

T. N. Thiele (1883) published a method of orbit computation depending upon three observed positions and the constant of areal velocity. The radii vectors to two positions in an orbit subtend an elliptical sector and a triangle, the sector being related to the time interval through the law of areas. Gauss introduced the use of the ratio "sector to triangle" between different positions into the orbit computation of planets, and Thiele applied the idea to binary stars. Although the method could have been applied in a wide range of circumstances, it became widely used only after Innes and van den Bos revived it. In 1926, R. T. A.
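Step 5 above is itself a small linear least squares problem: each observation yields a mean anomaly M, and Eq. (2-9) is linear in the two unknowns n and τ. A minimal sketch (ours, with invented data; the mean anomalies must be unwrapped onto one continuous branch before fitting):

```python
import numpy as np

def mean_anomaly(E, e):
    # Kepler's equation (2-8): M = E - e sin E
    return E - e*np.sin(E)

def fit_n_and_T(t, E, e):
    # Fit M_i = n t_i + tau (Eq. 2-9) by linear least squares,
    # then recover the periastron epoch from tau = -n T.
    M = np.unwrap(mean_anomaly(E, e))      # remove 2*pi jumps
    A = np.column_stack([t, np.ones_like(t)])
    (n, tau), *_ = np.linalg.lstsq(A, M, rcond=None)
    return n, -tau/n
```

Used with eccentric anomalies derived from the geometric parameters of steps 1-4, this completes the seven-parameter preliminary orbit.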
Innes (Aitken, 1935), seeking a method simpler than those in common use for correcting the preliminary parameters of an orbit differentially, independently developed a method of orbit computation which differs from Thiele's in that he used rectangular instead of polar coordinates. W. H. van den Bos's (1926, 1932) merit lies not merely in a modification of the method (transcribing it for use with the Innes constants) but chiefly in its pioneering, successful application. The device became most widely applied.

Briefly, the method of computation is as follows. The computation utilizes three positions (ρᵢ,θᵢ), or the corresponding rectangular coordinates (xᵢ,yᵢ), at the times tᵢ (i = 1,2,3). The area constant C is the seventh quantity required. Thiele employs numerical integration to find the value of C. First, from the observed data, find the quantities L by

L₁₂ = t₂ − t₁ − D₁₂/C     (2-10a)
L₂₃ = t₃ − t₂ − D₂₃/C     (2-10b)
L₁₃ = t₃ − t₁ − D₁₃/C     (2-10c)

where Dᵢⱼ = ρᵢρⱼ sin(θⱼ − θᵢ) are the areas of the corresponding triangles. Then, from the equations

nL₁₂ = p − sin p     (2-11a)
nL₂₃ = q − sin q     (2-11b)
nL₁₃ = (p + q) − sin(p + q)     (2-11c)

the quantities n, p and q can be found by trials, for in the three equations above there are only three unknowns, i.e., n, p and q. The eccentric anomaly E₂ and the eccentricity e can thus be computed from

e sinE₂ = (D₂₃ sin p − D₁₂ sin q)/(D₂₃ + D₁₂ − D₁₃)     (2-12a)
e cosE₂ = (D₂₃ cos p + D₁₂ cos q − D₁₃)/(D₂₃ + D₁₂ − D₁₃)     (2-12b)

After E₂ and e are obtained, the two other eccentric anomalies E₁ and E₃ can be found from

E₁ = E₂ − p     (2-13a)
E₃ = E₂ + q     (2-13b)

The Eᵢ are used first to compute the mean anomalies Mᵢ from equation (2-8), which leads to three identical results for T as a check, and second to compute the coordinates Xᵢ and Yᵢ from

Xᵢ = cosEᵢ − e     (2-14a)
Yᵢ = sinEᵢ √(1 − e²)     (2-14b)

By writing

xᵢ = AXᵢ + FYᵢ     (2-15a)
yᵢ = BXᵢ + GYᵢ     (2-15b)

the constants A, B, F, G are obtained from two positions, the third again serving as a check. These four coefficients, A, B, F, G, are the now so-called Thiele-Innes constants.
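The solution of Eqs. (2-11) "by trials" can be organized as two nested one-dimensional root searches: for a trial n, invert (2-11a) and (2-11b) for p and q (the function u − sin u is monotone on (0, 2π)), and adjust n until (2-11c) is satisfied. A possible realization in Python (ours, not the dissertation's procedure; a bracketing interval for n is assumed to be known):

```python
import numpy as np

def solve_u_minus_sinu(m, tol=1e-13):
    # Invert u - sin(u) = m for u in (0, 2*pi) by bisection;
    # u - sin u increases monotonically there.
    lo, hi = 0.0, 2.0*np.pi
    while hi - lo > tol:
        mid = 0.5*(lo + hi)
        if mid - np.sin(mid) < m:
            lo = mid
        else:
            hi = mid
    return 0.5*(lo + hi)

def solve_npq(L12, L23, L13, n_lo, n_hi, tol=1e-12):
    # Solve Eqs. (2-11a-c) for n, p, q; [n_lo, n_hi] must bracket n.
    def resid(n):
        p = solve_u_minus_sinu(n*L12)
        q = solve_u_minus_sinu(n*L23)
        return (p + q) - np.sin(p + q) - n*L13   # Eq. (2-11c) residual
    f_lo = resid(n_lo)
    while n_hi - n_lo > tol:
        mid = 0.5*(n_lo + n_hi)
        if f_lo*resid(mid) <= 0.0:
            n_hi = mid
        else:
            n_lo, f_lo = mid, resid(mid)
    n = 0.5*(n_lo + n_hi)
    return n, solve_u_minus_sinu(n*L12), solve_u_minus_sinu(n*L23)
```

With n, p, q in hand, Eqs. (2-12) through (2-15) are then evaluated directly.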
In addition to n, T, e, which have already been determined, the other four orbital parameters a, i, Ω, ω can be calculated from A, B, F, G through

ω + Ω = plg(B − F, A + G)     (2-16a)
ω − Ω = plg(B + F, A − G)     (2-16b)
tan²(i/2) = [(A − G)cos(ω + Ω) − (B + F)sin(ω + Ω)] / [(A + G)cos(ω − Ω) − (B − F)sin(ω − Ω)]     (2-16c)
A + G = 2a cos(ω + Ω)cos²(i/2)     (2-16d)

In addition to the brief introduction to this method given above, a detailed description of it can be found in many books, e.g., Aitken's book The Binary Stars (1935) and W. D. Heintz's book Double Stars (1971). A more accurate solution is obtained by correcting differentially a preliminary orbit which was obtained by whatever method. This correction can be achieved by a least squares solution.

The Method of Least Squares

The method of least squares was invented by Gauss and first used by him to calculate orbits of solar system bodies from overdetermined systems of equations. It is the most important tool for the reduction and adjustment of observations in all fields, not only in astronomy. However, the traditional standard algorithm, which employs condition equations that are solved with respect to the (uncorrelated) observations and are either linear in the adjustment parameters or linearized by developing them in Taylor series which are broken off after the first-order terms, is inadequate for treating the problem at hand. An algorithm for finding the solution of a more general least squares adjustment problem was given by D. C. Brown (1955). This situation may briefly be described as follows.

Let {x} be a set of quantities for which an approximation set {x₀} was obtained by direct observation. By ordering the elements of the sets {x} and {x₀} and regarding them as vectors, x and x₀, respectively, the vector v = x − x₀ is the vector of the observational errors, which are initially unknown. Assume that they follow a multivariate normal distribution and that their covariance matrix σ is regarded as known. Further assume that a set of parameters {a} is ordered to form the vector a. The solution of the least squares problem (or the adjustment) consists in finding those values of the elements of {x} and {a} which minimize the quadratic form vᵀσ⁻¹v while at the same time rigorously satisfying the condition equations

fᵢ({x}ᵢ,{a}ᵢ) = 0     (2-17)
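The passage from the Thiele-Innes constants back to the geometric parameters, Eqs. (2-16), can be sketched compactly. The sign conventions for F and G vary between texts; the sketch below (ours, for illustration) assumes the common convention A = a(cosω cosΩ − sinω sinΩ cos i), B = a(cosω sinΩ + sinω cosΩ cos i), F = a(−sinω cosΩ − cosω sinΩ cos i), G = a(−sinω sinΩ + cosω cosΩ cos i), under which plg(B + F, A − G) returns Ω − ω:

```python
import numpy as np

def plg(x, y):
    # arctan(x/y) resolved into the correct quadrant, degrees in [0, 360)
    return np.degrees(np.arctan2(x, y)) % 360.0

def geometric_parameters(A, B, F, G):
    # Eqs. (2-16): recover omega, Omega, i, a from the Thiele-Innes constants.
    s = plg(B - F, A + G)              # omega + Omega       (2-16a)
    d = plg(B + F, A - G)              # Omega - omega under this convention
    cplus = np.hypot(A + G, B - F)     # = 2 a cos^2(i/2)
    cminus = np.hypot(A - G, B + F)    # = 2 a sin^2(i/2)
    i = 2.0*np.degrees(np.arctan(np.sqrt(cminus/cplus)))
    a = 0.5*(cplus + cminus)
    Omega = (0.5*(s + d)) % 180.0      # restrict the node to [0, 180)
    omega = (s - Omega) % 360.0
    return omega, Omega, i, a
```

The half-sum of the two plg angles fixes Ω only modulo 180°, which is exactly the nodal ambiguity discussed in the definitions above; the code resolves it by restricting Ω to [0°, 180°).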
Further assume that a set of parameters {a} is ordered to form the vector a. The solution of the least squares problem (or the adjustment) consists in finding those values of the elements of {x} and {a} which minimize the quadratic form vTalv while at the same time rigorously satisfying the condition equations fi({x}i,{a}i)=0 (217) This is a problem of finding a minimum of the function vTalv subject to condition equations. A general rigorous and noniterative algorithm for the solution exists only for the case that the elements of {x} and {a} occur linearly in the functions fi. When fi are nonlinear in the elements of either {x} or {a), or both, equations which are practically equivalent to the fi and which are linear in the pertinent variables can be derived in the following way. Define a vector 6 by a=a0+6. The condition equations can then be written f(x0+v, a0+6) = 0 (218) where the vector of functions f=(fl,f2 ..., fm)T, m being the number of equations for which observations are available. Now assume that all elements of {v) and {86 are sufficiently small so that the condition equations can be developed as a Taylor series and written f0 + fxv + fa6 + 0(2) ... = 0 (218'a) where f0 = f(x0.a0) fx j and fa =  0 0xxo0,a 0 @aj x0,a0 If the small quantities of order higher than 1 can be neglected, we can write the linearized condition equations as f0 + fxV + fa6 = 0 . (218'b) These are linear in the relevant variables, which are the components of v and of 6. In order to satisfy the conditions (218'b), we define a vector 2p of Lagrangian multipliers and minimize the scalar function S(v,6) = vTolv 21(f0 + fxV + fa6) (219) in which the components of v and 6 are the variables. Setting the derivatives (aS/av) and (3S/l6) equal to zero and considering equations (218'b), we obtain 6 = [faT(fxfxT)fal]fa T(fxUfT)lf (220a) = (fxafxT)l(fa+fO) (220b) v = afXT (220c) where we have assumed that fxofxT is nonsingular. 
This is the case only if all equations fi=0 contain at least one component of x; i.e., if there are no pure constraints of the form gi(a)=0. This case (which we shall not encounter in our investigations) is discussed below in the description of Jefferys' method. Note that in constructing the scalar function S in expression (219), first order approximations have been used. In some cases, the linearized representation of Eq. (218) by Eq. (218'b) is not accurate enough. In some of these cases, the convergence toward the definitive solution may be accelerated and sometimes be brought about by a method suggested by Eichhorn and Clary (1974) when a strictly linear approach would be divergent. Their solution algorithm takes into account the second (as well as the first) order derivatives in the adjustment residuals (observational errors) and the corrections to the initially available approximations to the adjustment parameters. They pointed out that the inclusion of second order terms in the adjustment residuals is necessary whenever the adjustment residuals themselves cannot be regarded as negligible as compared to the adjustment parameters, in which cases the conventional solution techniques would not lead to the "best" approximations for the adjustment parameters in the sense of least squares. The authors modified the condition equations as M6 f0 + fxv + fa6 + Vv + D6 + or = 0 (218") Nv Correspondingly, the scalar function to be minimized becomes M6 S"(v,6) = vTo1v 21L(fo+fxV+fa6+Vv+D6+or) .(219") Nv 1 T Here, the ith line of the matrix D is o E., and 2 E. = ( 221) KajBakj x0,a0 the matrix of the Hessean determinant of fi with respect to the adjustment parameters. Similarly, the ith line of 1 vector V is vTWi, and 2 W. =ax ax (222) S axj xk x0,a0 ; the ith line of M is vTHi, and H. axa) = (223) and that of N is THTi, so evidently NM=Nv. Minimizing S", also 6, v can be calculated. For this algorithm in detail, one can refer to the original papers of Eichhorn and Clary. 
The notation used here is slightly different from the original one used by the authors. Jefferys (1980, 1981) proposed an even more accurate algorithm which also improves the convergence of the conventional least squares method. Furthermore, his method is nearly as simple to apply in practice as the classical method of least squares, because it does not require any second order derivatives. Jefferys defines the scalar function to be minimized as 1 1 s = vT1v + fT(,) + gT(a) (224) 2 where i=x0+v; x,a are the current "best" approximations to x and a; and g is another vector function consisting of the constraints on the parameters; I., y are vectors of Langrangian multipliers and evaluated at (i, a). Minimizing S with respect to i and a, he arrives at the normal equa tions o1i + fT(ii) = 0 (225a) fx( ) + gaT(a) = 0 (225b) f(:,i) = 0 (225c) g(a) = 0 (225d) where fR = fx fi = fa Ix,a, x,a. These equations are exact and involve no approximations. This is the significant difference between Jefferys' method and those of Brown and of Eichhorn and Clary. Remarks As mentioned before, Kowalsky's method will most likely not produce the best obtainable results, because the relative observed coordinates (x, y, t) are subjected only to the condition (21), which involves only five of the seven necessary orbital parameters as adjustment parameters and does not enforce the area theorem. It can therefore never be used for a definitive orbit determination since it completely ignores the observation epochs. Yet, Eq. (21) has the advantage that it appears to be simple, in particular it is linear in the adjustment parameters albeit not in the observations. When it is used for the determination of orbits, the righthand sides of the equations which result from inserting a pair (x, y) of observed rectangular coordinates into Eq. (21) are regarded as errors with a univariate Gaussian distribution (i.e., as normally distributed errors). 
One may then perform a least squares adjustment which is linear in the adjustment parameters. As Eichhorn (1985) pointed out, this approach, while it has the advantage that approximation values for the adjustment parameters need not be available at the outset, fails to take into account two facts. 1) It is not the right-hand sides of the condition equations which are to be considered as normally distributed errors, but rather the observations (x,y) or (ρ,θ). The condition equations (2-1) thus contain more than one observation each. Since the observations occur in the condition equations nonlinearly, the matrix

    f_x = ∂f(x,a)/∂x |_{x=x0, a=a0}

must be found. This requires knowledge of approximate values a0 for a. Approximate values x0 for x are available: they are the observations themselves. Approximate values a0 for a may sometimes indeed be obtained in the classical way, by regarding the right-hand sides of the condition equations as normally distributed errors. In addition, it should also be taken into account that the covariance matrix σ of the observations is not necessarily diagonal. 2) In some cases, especially when the binary under study is very narrow, the errors of the observations are not negligibly small compared with the adjustment parameters. This requires either that second-order terms in the observational errors v be carried in the equations or, as Jefferys has pointed out, that iterations be performed using in the evaluation of the matrices f_x and f_a not only improved approximations for a but also improved values for the observed quantities as they become available. If Kowalsky's method were so modified, the algorithm would yield better values for the adjustment parameters a than the traditional approach. Either way, one can usually find an initial approximation by Kowalsky's method. With respect to both theoretical clarity and practical applicability, the Thiele-Innes-van den Bos method leaves nothing to be desired.
However, the three places selected, even when smoothed graphically or by some computation, may not suffice to describe the motion with sufficient accuracy, so that large and systematic residuals may remain, particularly near periastron. The method is seriously inadequate if one of the ratios of sector to triangle is very close to 1 and thus strongly affected by the measurement errors, or if the area constant C is not initially known to the required accuracy. The computation may then produce an erroneous orbit with spuriously high eccentricity, perhaps a hyperbolic one, or no solution at all. And obviously, different combinations of the three positions selected from a set of observations will not likely give the same result. This method therefore fails to use the information contained in the observations in the best possible way. In our work we try to present a fairly general least squares algorithm to solve the orbit problem. We shall adopt Jefferys' least squares method as our basic approach.

CHAPTER III
GENERAL CONSIDERATIONS

This chapter contains a general discussion of the least squares orbit problem. We shall set up condition equations which simultaneously satisfy the ellipse equation and the area theorem. We have seen that it is not sufficient to use Eq. (2-1) as the only type of condition equation, because this would ignore the observing epochs (cf. the last chapter). Completely appropriate condition equations must explicitly contain the complete set of the seven independent orbital parameters as the adjustment parameters. Also, to be useful in practice, they must impose both the geometric and the dynamical conditions, and must lead to a convergent sequence of iterations. After the condition equations are established, we present the general outline of the algorithm which solves the orbit problem by Jefferys' method of least squares. We also discuss some further suggestions for obtaining the initial approximate solution required for the least squares algorithm.
The Condition Equations

Assume that a set of observations {x0} was obtained, consisting of complete data triples (t, ρ, θ), which measure the positions of the fainter component (secondary) with respect to the brighter one (primary): the position angle θ is counted counterclockwise from North and ranges from 0° to 360°; the angular separation ρ (also called distance) is usually given in seconds of arc; and t is the observing epoch. The conversion of (ρ, θ) to rectangular coordinates (x, y) in seconds of arc is as follows:

    Declination difference:       δ_c − δ_p = x = ρ cosθ ,    (3-1a)
    Right ascension difference:  (α_c − α_p) cosδ = y = ρ sinθ ,    (3-1b)

where δ_c, α_c are the declination and right ascension, respectively, of the secondary, and δ_p, α_p those of the primary. Equivalently, the observations can also be regarded as relative coordinates (t, x, y) of the secondary with respect to the primary. It might be worthwhile to point out that 1) even though the formulae (3-1) are approximations valid only for small values of ρ, they may be regarded as practically rigorous for binaries; 2) we are following the custom in double star astronomy by having the x-axis along the colure* and the y-axis tangential to the parallel of declination. All observations in {x0} are affected by observational errors. Let {x} be the set of the true values of the observations, that is, those values the observations would have had if there were no observational errors. By ordering the elements of the sets {x0} and {x} and regarding them as vectors, x0 and x respectively, we have seen that we may write the vector of observational errors as v = x − x0. These errors are of course unknown, but as mentioned already in the last chapter, we assume that they follow a multivariate normal distribution with known covariance matrix. For visual binaries, the relative orbit must be an ellipse (strictly speaking, a conic section) in space as well as in projection.
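A minimal sketch of the conversion (3-1), with θ supplied in degrees (the function and variable names are ours, not the dissertation's):

```python
import math

def polar_to_rect(rho, theta_deg):
    """Eq. (3-1): x = rho*cos(theta) is the declination difference,
    y = rho*sin(theta) the right-ascension difference times cos(delta);
    theta is the position angle, counted from North."""
    theta = math.radians(theta_deg)
    return rho * math.cos(theta), rho * math.sin(theta)
```

For example, a separation of 1″ at position angle 90° lies entirely in the y (right ascension) direction.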
All pairs (x, y) must therefore satisfy the condition equation (2-1):

    C1x² + 2C2xy + C3y² + 2C4x + 2C5y + 1 = 0 ,

which implicitly involves five of the seven orbital parameters but does not enforce the area theorem. The well-known relationships between the five coefficients (C1, C2, C3, C4, C5) in Eq. (2-1) and the five orbital parameters (e, a, i, ω, Ω), by way of the Thiele-Innes constants, have been discussed in the last chapter.

*Following Eichhorn's terminology, who uses the term "colure" generally for any locus of constant right ascension.

Consider a right-handed astrocentric coordinate system K whose XY plane is the true orbital plane, such that the positive X-axis points toward the periastron (of the secondary with respect to the primary). The positive Y-axis is obtained by rotating the X-axis by 90° about the Z-axis in the direction of the orbital motion. The axes of a second astrocentric, right-handed coordinate system k are parallel to those of the equator system Q. The two systems K and k are related by the transformation

    x_K = R3(ω) R1(i) R3(Ω) x_k .    (3-2)

From the theory of the two-body problem we know that, in the system K, the coordinates of the secondary with respect to the primary are given by

    X = a (cosE − e) ,
    Y = a sinE √(1−e²) ,    (3-3)
    Z = 0 ,

where E is the eccentric anomaly, which is the solution of Kepler's equation

    n(t−T) = E − e sinE .    (3-4)

Here, n and T, the mean motion and the periastron epoch, are the two orbital parameters not involved in Eq. (2-1). From Eq. (3-2) we obtain

    [X]   [A B C] [x]
    [Y] = [F G H] [y] ,    (3-5)
    [Z]   [K L M] [z]

where

    z = −(Kx + Ly)/M    (3-6)

and

    [A B C]
    [F G H] = R3(ω) R1(i) R3(Ω) ,    (3-7)
    [K L M]

or, in detail,

    A = cosΩ cosω − sinΩ sinω cosi ,    (3-8a)
    B = sinΩ cosω + cosΩ sinω cosi ,    (3-8b)
    C = sinω sini ,    (3-8c)
    F = −cosΩ sinω − sinΩ cosω cosi ,    (3-8d)
    G = −sinΩ sinω + cosΩ cosω cosi ,    (3-8e)
    H = cosω sini ,    (3-8f)
    K = sinΩ sini ,    (3-8g)
    L = −cosΩ sini ,    (3-8h)
    M = cosi ,    (3-8i)

where, in this notation, the traditional Thiele-Innes constants would be aA, aB, aF and aG. From Eqs. (3-7) and (3-5) we can get

    X = Ax + By + Cz = (Gx − Fy)/M ,    (3-9a)
    Y = Fx + Gy + Hz = (−Bx + Ay)/M .    (3-9b)

Thus, we see that X and Y can be expressed in terms of i, ω, Ω and the observations (x, y). From Eq. (3-3) we obtain

    cosE = X/a + e ,    (3-10a)
    sinE = Y / (a√(1−e²)) .    (3-10b)

Combining (3-10) with Kepler's equation (3-4), we get

    X/a + e = cos[ n(t−T) + eY/(a√(1−e²)) ] ,    (3-11a)
    Y/(a√(1−e²)) = sin[ n(t−T) + eY/(a√(1−e²)) ] .    (3-11b)

More succinctly, we have

    U = cos[ n(t−T) + eV ] ,    (3-12a)
    V = sin[ n(t−T) + eV ] ,    (3-12b)

or

    U = cos[ n(t−T) + e√(1−U²) ] ,    (3-12'a)
    V = sin[ n(t−T) + eV ] ,    (3-12'b)

with

    U = X/a + e ,  V = Y/(a√(1−e²)) .    (3-12c)

After X and Y are expressed in terms of i, ω, Ω and (x, y) as in Eqs. (3-9), Eqs. (3-11) or (3-12) involve exactly the seven orbital parameters (n, T, a, e, i, ω, Ω) and the observations (t, x, y). The observing epoch t now appears explicitly, as it must if the area theorem is to be enforced. Now we see that if both equations (3-11) are satisfied, Kepler's equation, which enforces the area theorem, is satisfied, and furthermore the ellipse equation is automatically satisfied as well, as can be seen if t is eliminated from Eqs. (3-11), so that these equations are reduced to one equation in X and Y. If we select the two equations (3-11) as the condition equations, we need no longer carry Eq. (2-1) separately. Of course, we could instead use one of Eqs. (3-11) together with Eq. (2-1) as the condition equations. However, using Eq. (2-1) is not convenient, for it contains the five coefficients directly, but not the five orbital parameters themselves, even though there are unique relationships between them. Ideally, the condition equations should have the adjustment parameters explicitly as variables. In our work, we use Eqs. (3-11) as the complete set of condition equations.
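The geometry above can be checked numerically. The sketch below evaluates the direction cosines (3-8), the projected coordinates (3-9), and the two condition functions of Eqs. (3-11); for a noise-free synthetic observation generated from the same elements, both functions should vanish. Function and variable names are ours, and this is an illustrative sketch, not the dissertation's program:

```python
import math

def condition_eqs(t, x, y, n, T, a, e, i, w, om):
    """Evaluate the two condition functions of Eqs. (3-11) for one
    observation (t, x, y) and elements (n, T, a, e, i, omega, Omega);
    all angles in radians."""
    ci = math.cos(i)
    cw, sw = math.cos(w), math.sin(w)
    co, so = math.cos(om), math.sin(om)
    # direction cosines, Eqs. (3-8)
    A = cw*co - sw*so*ci
    B = cw*so + sw*co*ci
    F = -sw*co - cw*so*ci
    G = -sw*so + cw*co*ci
    M = ci
    # orbital-plane coordinates, Eqs. (3-9)
    X = (G*x - F*y) / M
    Y = (-B*x + A*y) / M
    b = a * math.sqrt(1.0 - e*e)
    E = n*(t - T) + e*Y/b            # argument of Eqs. (3-11): E = n(t-T) + e sinE
    f1 = X/a + e - math.cos(E)       # Eq. (3-11a), brought to the form f1 = 0
    f2 = Y/b - math.sin(E)           # Eq. (3-11b), brought to the form f2 = 0
    return f1, f2
```

A quick consistency check: choose elements and an eccentric anomaly E0, form X = a(cosE0 − e), Y = b sinE0, t = T + (E0 − e sinE0)/n, and project back to the apparent orbit with the inverse relations x = AX + FY, y = BX + GY (which follow from Eqs. (3-9), since AG − BF = M); then f1 and f2 vanish.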
The General Statement of the Least Squares Orbit Problem

As seen in the last section, the vector of "true observations" x (presumably having been adjusted from the observations x0 by the residuals v) and the vector of "true orbital parameters" a, a = (n, T, a, e, i, ω, Ω)ᵀ, must satisfy the condition equations

    f1(x0+v, a) = X/a + e − cos[ n(t−T) + eY/(a√(1−e²)) ] = 0 ,    (3-13a)
    f2(x0+v, a) = Y/(a√(1−e²)) − sin[ n(t−T) + eY/(a√(1−e²)) ] = 0 ,    (3-13b)

where

    [X]   [A B C] [x]
    [Y] = [F G H] [y] ,
    [Z]   [K L M] [z]

with

    z = −(Kx + Ly)/M

and

    [A B C]
    [F G H] = R3(ω) R1(i) R3(Ω) .
    [K L M]

In our problem, there are no constraints between the parameters which involve no observations, so that Jefferys' g function does not occur. The problem can therefore be stated as follows. Assume that the residuals {v} (regarded as vector v) of a set of observations {x0} (regarded as vector x0) follow a multivariate normal distribution, whose covariance matrix σ is regarded as known; we are to find the best approximations v̂ (for v, the residuals) and â (for a, the parameters) such that f1(x0+v, a) = 0 and f2(x0+v, a) = 0 are both satisfied while at the same time the quadratic form

    S0 = (1/2) vᵀσ⁻¹v    (3-14)

is minimized. Following the well-known procedure introduced by Lagrange, the solution is obtained by minimizing the scalar function

    S = (1/2) vᵀσ⁻¹v + λᵀf(x̂,â) ,    (3-15)

where x̂ = x0 + v and λ is the vector of Lagrangian multipliers, together with satisfying the equations f1 = 0 = f2. Denoting the matrix of partial derivatives with respect to a variable by a subscript, this is equivalent to solving the following normal equations:

    σ⁻¹v + f̂_xᵀλ = 0 ,    (3-16a)
    f̂_aᵀλ = 0 ,    (3-16b)
    f(x̂,â) = 0 .    (3-16c)

We have stated before that these equations are exact and therefore involve no approximations. Before Jefferys, all authors used first order or second order approximations in forming S in equation (3-15). This is the significant difference that distinguishes Jefferys' method from those employed by previous authors.
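Because the normal equations (3-16) are exact, their structure is easy to verify on a toy model in which the condition function happens to be linear, f(x, a) = Cx + Da − h; the equations then form one linear system in (v, a, λ) that can be solved in a single step. The matrices C, D, h below are hypothetical stand-ins for illustration only, not the orbit problem's condition equations:

```python
import numpy as np

def exact_normal_equations_linear(C, D, h, x0, sigma):
    """Solve Eqs. (3-16a-c) for the linear toy model f(x, a) = C x + D a - h.
    Unknowns: residuals v (n), parameters a (p), multipliers lam (m).
    Block system:
        sigma^-1 v + C^T lam = 0          (3-16a)
                     D^T lam = 0          (3-16b)
        C v + D a = h - C x0              (3-16c), with x = x0 + v
    """
    m, n = C.shape
    p = D.shape[1]
    sig_inv = np.linalg.inv(sigma)
    K = np.zeros((n + p + m, n + p + m))
    K[:n, :n] = sig_inv
    K[:n, n+p:] = C.T
    K[n:n+p, n+p:] = D.T
    K[n+p:, :n] = C
    K[n+p:, n:n+p] = D
    rhs = np.zeros(n + p + m)
    rhs[n+p:] = h - C @ x0
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:n+p], sol[n+p:]   # v, a, lam
```

Substituting the returned (v, a, λ) back into (3-16a-c) shows that all three sets of equations are satisfied simultaneously, with no iteration needed in this linear special case.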
It is evident that the solution of the equations (3-16) would solve the posed problem.

About the Initial Solution

The least squares algorithm requires an initial solution as a starting point. Any approach which leads to approximate values of the orbital parameters serves this purpose, because our algorithm does not require a very accurate initial approximation. As long as the initial approximation is not too different from the final result, convergence can always be achieved. The following procedures may lead to an initial solution. 1) As mentioned in Chapter II, Kowalsky's method can produce a preliminary solution. Inserting the pairs (x, y) of observed rectangular coordinates into Eq. (2-1), we have a set of linear equations in which the five coefficients (C1, C2, C3, C4, C5) are the unknowns. By making a classical least squares solution based on these linear equations, the five coefficients can be computed. An estimate of the five parameters (a, e, i, ω, Ω) can be obtained in turn from the unique relationships between them and the five coefficients. The remaining two parameters (n, T) can then be calculated from the known quantities, again simply by classical least squares, as described in Chapter II. As Eichhorn (1985) has pointed out, it is of course better to use the modified Kowalsky method. Using (2-1) as condition equations and the five coefficients as the adjustment parameters, one may iterate by Jefferys' algorithm toward the best fitting adjustment parameters and the best corrections to the observations, v; that is, arrive at the values of â and v̂ which minimize the scalar function S0 (see equation 3-14) while simultaneously satisfying the condition equations. This method is simple to apply in practice.
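A sketch of step 1), the classical (unmodified) Kowalsky start: inserting each observed pair into Eq. (2-1) and regarding the right-hand sides as errors gives an ordinary linear least-squares problem for (C1, ..., C5). This assumes numpy; names are ours:

```python
import numpy as np

def kowalsky_coefficients(x, y):
    """Classical linear least squares for C1..C5 in Eq. (2-1):
    C1 x^2 + 2 C2 xy + C3 y^2 + 2 C4 x + 2 C5 y + 1 = 0."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    design = np.column_stack([x*x, 2*x*y, y*y, 2*x, 2*y])
    rhs = -np.ones(x.size)
    c, *_ = np.linalg.lstsq(design, rhs, rcond=None)
    return c   # (C1, C2, C3, C4, C5)
```

Before converting the fitted coefficients to orbital elements, one would check the ellipse requirements quoted below (Chapter IV start): C1 > 0, C3 > 0, C1C3 − C2² > 0.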
But unfortunately, especially when the observations are not very precise, the five coefficients (C1, C2, C3, C4, C5) in some cases do not satisfy the conditions for an ellipse; i.e., they do not meet the requirements C1 > 0, C3 > 0 and C1C3 − C2² > 0. In these cases this does not mean that there is no elliptic solution at all, and other approaches can be tried. When this happens, one may for instance take the approach outlined in Chapter 23 of Lawson and Hanson (1974). 2) By using some selected points among the observation data instead of all the data points, a solution can sometimes be found by Kowalsky's method. Such a solution is usually also good enough to serve as a starting approximation. 3) Or, carefully selecting three points among the observation data, one may use the Thiele-Innes-van den Bos method to calculate an initial approximation. The method has been described in Chapter II.

CHAPTER IV
THE SOLUTION BY NEWTON'S METHOD

The Solution by Newton's Method

In our problem, the normal equations (3-16) are nonlinear. They must be solved by linearization and successive approximations. Assume that approximate initial estimates of the unknowns in the normal equations have somehow been obtained (using whatever methods). This initial approximation may be improved by Newton's method, which consists of linearizing the normal equations about the available solution by a first order development and obtaining an improved solution by solving the linearized equations. This process is iterated in the expectation that convergence to a definite solution will eventually be obtained. This expectation is reasonable if the initial approximation is sufficiently close to the final solution and if the observational errors are not too large.
Following Jefferys' notation (which is also the notation we have used in Chapter III), let the initial approximate solution (and also the current approximate solution during the iteration) be given by (x̂, â), where x̂ = x0 + v̂, x0 being the vector of observations and v̂ the vector of observational errors (for which we adopt the null vector as initial approximation); â is the initial approximation of the vector of the seven orbital parameters (adjustment parameters). Also, let the corrections to x̂ and â be denoted by ε and δ, respectively. The normal equations in our problem now become

    σ⁻¹(v̂ + ε) + f̂_xᵀλ = 0 ,    (4-1a)
    f̂_aᵀλ = 0 ,    (4-1b)
    f̂ + f̂_x ε + f̂_a δ = 0 .    (4-1c)

Here we have ignored (as Jefferys did) products of ε and δ with the Lagrangian multipliers. This does not affect the final result, as Jefferys also pointed out. A caret in equations (4-1) above means evaluation at the current values of x̂ and â. Similarly to Jefferys' procedure, we solve the equations (4-1) as follows. Solving Eq. (4-1a) for ε we have

    ε = −v̂ − σf̂_xᵀλ .    (4-2)

Substituting (4-2) into (4-1c) for ε, Eq. (4-1c) becomes

    f̂ − f̂_x v̂ − f̂_x σ f̂_xᵀ λ + f̂_a δ = 0 .    (4-3)

Solving Eq. (4-3) for λ, we obtain

    λ = w( f̂ − f̂_x v̂ + f̂_a δ ) ,    (4-4)

where the "weight matrix" w is given by

    w = ( f̂_x σ f̂_xᵀ )⁻¹ .    (4-5)

Inserting this solution for λ into Eq. (4-1b), we have

    f̂_aᵀ w ( f̂ − f̂_x v̂ + f̂_a δ ) = 0 .    (4-6)

If we now define

    φ = f̂ − f̂_x v̂    (4-7)

and rearrange Eq. (4-6), the equation for δ takes the form

    ( f̂_aᵀ w f̂_a ) δ = −f̂_aᵀ w φ .    (4-8)

This set of linear equations is easy to solve for the corrections δ by general methods. With this solution for δ, the improved residuals v_n are obtained from the equation

    v_n = −σ f̂_xᵀ w ( φ + f̂_a δ ) ,    (4-9)

which follows from Eqs. (4-1a), (4-4) and (4-7). We then get the new vectors â_n and x̂_n as

    â_n = â + δ ,    (4-10a)
    x̂_n = x0 + v_n ,    (4-10b)

which constitute the improved solution. After each iteration, we check the relative magnitude of each component of v_n and δ against the corresponding component of x̂ and â, take the maximum over all of these, and test whether this value is smaller than some specified number, say 10⁻⁸.
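The algebra of Eqs. (4-5) and (4-7)-(4-9) can be sketched as a single dense-matrix routine (numpy assumed; in the actual computation the block structure discussed later in this chapter would be exploited instead). By construction, the corrections returned by one step satisfy the linearized conditions (4-1c) exactly:

```python
import numpy as np

def newton_step(f, fx, fa, sigma, v):
    """One iteration of Eqs. (4-5), (4-7)-(4-9):
       w     = (fx sigma fx^T)^-1                 (4-5)
       phi   = f - fx v                           (4-7)
       delta : (fa^T w fa) delta = -fa^T w phi    (4-8)
       v_n   = -sigma fx^T w (phi + fa delta)     (4-9)"""
    w = np.linalg.inv(fx @ sigma @ fx.T)
    phi = f - fx @ v
    delta = np.linalg.solve(fa.T @ w @ fa, -fa.T @ w @ phi)
    v_n = -sigma @ fx.T @ w @ (phi + fa @ delta)
    return delta, v_n
```

With ε = v_n − v, substituting back gives f + f_x ε + f_a δ = 0 identically, which is a convenient self-check of an implementation.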
If the improved solution is still not sufficiently accurate (i.e., if the maximum value found above is still not smaller than the specified value), the process of iterations has to be continued until convergence has been attained, that is, until subsequent iterations no longer give significant corrections. At the outset, the obvious starting point for this scheme is to adopt x̂ = x0 as the initial approximation for the "true observation" vector x (in this case, v̂ = 0), and to use as first approximation for a a vector â0, an initial solution for the seven orbital parameters, which has been obtained somehow. It is important for convergence that the initial solution (x̂, â) is not too different from the final solution which is obtained by the process given by Eqs. (4-1) through (4-10). In Chapter III, we discussed how to find a good approximation as an initial solution. According to the scheme outlined above, the application of Newton's method in our problem consists of the following steps.

Step 1. Calculate f̂, f̂_x and f̂_a from the current values of x̂ and â.
Step 2. Calculate φ from φ = f̂ − f̂_x v̂.
Step 3. Calculate the "weight matrix" w from w = (f̂_x σ f̂_xᵀ)⁻¹.
Step 4. Solve for the corrections to the parameters, δ, from the linear equations (f̂_aᵀ w f̂_a)δ = −f̂_aᵀ w φ.
Step 5. Calculate the improved residuals v_n from v_n = −σ f̂_xᵀ w (φ + f̂_a δ).
Step 6. Find the new approximate solution from â_n = â + δ, x̂_n = x0 + v_n.
Step 7. Test the relative magnitude of each component of δ and v_n against â and x̂, and decide whether a further iteration is needed, in which case all the steps above must be repeated.

In detail, the steps are as follows.

Step 1. Calculate the vector f of the functions in the condition equations, and the matrices of partial derivatives f̂_x and f̂_a, from the current values of x̂ and â, where x̂ = x0 + v̂. The dimension of the vectors x0, v̂, x̂ is 2m, m being the number of observed positions.
The observation vector is

    x0 = (x10, y10, ..., xi0, yi0, ..., xm0, ym0)ᵀ .    (4-11)

The current vector of corrections to the observations is

    v = (vx1, vy1, ..., vxi, vyi, ..., vxm, vym)ᵀ .    (4-12)

From x0 and v, x̂ is easily found from x̂ = x0 + v:

    x̂ = (x10+vx1, y10+vy1, ..., xi0+vxi, yi0+vyi, ..., xm0+vxm, ym0+vym)ᵀ .    (4-13)

The current approximation for the seven parameters, â, is

    â = (n, T, a, e, i, ω, Ω)ᵀ ,    (4-14)

at current values. Insert the known (x̂, â) into the condition equations to get the 2m-vector of functions f. It has the form

    f = (f11, f21, ..., f1i, f2i, ..., f1m, f2m)ᵀ ,    (4-15)

where

    f1i = Xi/a + e − cosEi ,    (4-16a)
    f2i = Yi/b − sinEi ,    (4-16b)

with

    b = a√(1−e²)    (4-16c)

and

    Ei = n(ti−T) + eYi/b .    (4-16d)

The coordinates Xi, Yi are calculated from Eqs. (3-9), and A, B, F, G, M from Eqs. (3-8). The first derivatives f̂_x, f̂_a are calculated at the current values of x̂ and â. The matrix of the partial derivatives of f with respect to the observations x̂, f̂_x, is a block-diagonal 2m×2m square matrix whose ith diagonal block contains the derivatives of (f1i, f2i) with respect to (xi, yi) (4-17); i.e.,

    f̂_x = diag(g1, g2, ..., gi, ..., gm)    (4-18)

with

    gi = [ ∂f1i/∂xi  ∂f1i/∂yi ]
         [ ∂f2i/∂xi  ∂f2i/∂yi ] ,  i = 1, m .    (4-19)

From Eqs. (4-16), (3-8) and (3-9), we obtain

    ∂Xi/∂xi = G/M ,  ∂Xi/∂yi = −F/M ,  ∂Yi/∂xi = −B/M ,  ∂Yi/∂yi = A/M ,    (4-20)

and therefore

    gi = [ 1/a   (e/b) sinEi     ] [  G/M  −F/M ]
         [  0   (1 − e cosEi)/b ] [ −B/M   A/M ] .    (4-21), (4-22)

The expressions for all the partial derivatives in f̂_x are listed in Table 4-1. The matrix f̂_a of the partial derivatives of f with respect to the seven parameters has the dimension 2m×7. It has the form

    f̂_a = (G1ᵀ, G2ᵀ, ..., Giᵀ, ..., Gmᵀ)ᵀ ,    (4-23)

where Gi is the 2×7 matrix of the partial derivatives of f1i and f2i with respect to (n, T, a, e, i, ω, Ω):

    Gi = [ ∂f1i/∂n  ∂f1i/∂T  ∂f1i/∂a  ∂f1i/∂e  ∂f1i/∂i  ∂f1i/∂ω  ∂f1i/∂Ω ]
         [ ∂f2i/∂n  ∂f2i/∂T  ∂f2i/∂a  ∂f2i/∂e  ∂f2i/∂i  ∂f2i/∂ω  ∂f2i/∂Ω ] .    (4-24)

Table 4-1. Expressions for All the Partial Derivatives in f̂_x

    ∂f1i/∂xi = ( G/a − (e/b) B sinEi ) / M
    ∂f1i/∂yi = ( −F/a + (e/b) A sinEi ) / M
    ∂f2i/∂xi = −B (1 − e cosEi) / (Mb)
    ∂f2i/∂yi =  A (1 − e cosEi) / (Mb)

From Eqs. (3-8) and (3-9) we can find the expressions for the following six partial derivatives:

    ∂Xi/∂i = zi sinω / M ,   ∂Xi/∂ω = (Ayi − Bxi)/M ,   ∂Xi/∂Ω = (Gyi + Fxi)/M ,
    ∂Yi/∂i = zi cosω / M ,   ∂Yi/∂ω = (Fyi − Gxi)/M ,   ∂Yi/∂Ω = −(Byi + Axi)/M .

In terms of these derivatives, all expressions in f̂_a are listed in Table 4-2.

Table 4-2. Expressions for All the Partial Derivatives in f̂_a

    ∂f1i/∂n = (ti−T) sinEi
    ∂f1i/∂T = −n sinEi
    ∂f1i/∂a = −Xi/a² − e Yi sinEi / (ab)
    ∂f1i/∂e = 1 + Yi sinEi / (b(1−e²))
    ∂f1i/∂i = (zi/M) [ sinω/a + (e/b) sinEi cosω ]
    ∂f1i/∂ω = (1/M) [ (Ayi − Bxi)/a + (e/b) sinEi (Fyi − Gxi) ]
    ∂f1i/∂Ω = (1/M) [ (Gyi + Fxi)/a − (e/b) sinEi (Byi + Axi) ]

    ∂f2i/∂n = −(ti−T) cosEi
    ∂f2i/∂T = n cosEi
    ∂f2i/∂a = −(Yi/(ab)) (1 − e cosEi)
    ∂f2i/∂e = (Yi/(b(1−e²))) (e − cosEi)
    ∂f2i/∂i = (1 − e cosEi) zi cosω / (Mb)
    ∂f2i/∂ω = (1 − e cosEi)(Fyi − Gxi) / (Mb)
    ∂f2i/∂Ω = (e cosEi − 1)(Byi + Axi) / (Mb)

Step 2. Calculate φ from f̂, v̂ and f̂_x by φ = f̂ − f̂_x v̂. The dimension of the vector φ is also 2m. It has the form

    φ = (φ11, φ21, ..., φ1i, φ2i, ..., φ1m, φ2m)ᵀ ,    (4-25)

where

    φ1i = f1i − (∂f1i/∂xi) vxi − (∂f1i/∂yi) vyi ,
    φ2i = f2i − (∂f2i/∂xi) vxi − (∂f2i/∂yi) vyi .    (4-25')

Step 3. Calculate the weight matrix w from f̂_x and σ. The matrix f̂_x has been calculated in Step 1. The covariance matrix σ is assumed to be known. The dimension of σ is 2m×2m. An example of computing σ is shown below. The relationship between the covariance matrix of the rectangular coordinates (x, y) and that of the polar coordinates (ρ, θ) is

    σ(x,y) = [∂(x,y)/∂(ρ,θ)] σ(ρ,θ) [∂(x,y)/∂(ρ,θ)]ᵀ ,    (4-26)

where x = ρ cosθ, y = ρ sinθ and

    ∂(x,y)/∂(ρ,θ) = [ cosθ  −ρ sinθ ]
                     [ sinθ   ρ cosθ ] .    (4-26')
Thus, we have

    σ(x,y) = [ cosθ  −ρ sinθ ] [ σρ²  0  ] [  cosθ    sinθ  ]
             [ sinθ   ρ cosθ ] [  0  σθ² ] [ −ρ sinθ  ρ cosθ ]

and therefore

    σx²  = σρ² cos²θ + σθ² ρ² sin²θ ,
    σxy = (σρ² − σθ² ρ²) cosθ sinθ ,    (4-26")
    σy²  = σρ² sin²θ + σθ² ρ² cos²θ .

In these expressions, σρ² = (Δρ)² and σθ² = (Δθ)², where Δρ and Δθ are the observational errors in ρ and θ, respectively. For the observations, the random errors are of a similar order of magnitude as the systematic ones, larger in separation than in position angle. The average errors ρΔθ and Δρ vary somewhat with the separation ρ and can be assumed, for many series of observations, to follow the form Cρ^(1/3), where C varies with different observers. For a single good observation C will not exceed 0″.03 in position angle (ρΔθ) and 0″.08 in separation (Δρ) (Heintz, 1971). Errors will be somewhat larger and difficult to measure if one or both components are faint. If the errors are expressed in the dimensionless (relative) forms Δθ and Δρ/ρ, it is seen that they increase as pairs become closer. Based on the considerations above, we can, for example, put Δρ = 0″.08 ρ^(1/3) and ρΔθ = 0″.03 ρ^(1/3), i.e. Δθ = 0.03 ρ^(−2/3). If we allow each pair of numbers (xi, yi) in an observation to be correlated, but admit no correlations between different observations, σ will be block-diagonal. In this case, we have

    σ = diag(σ1, σ2, ..., σi, ..., σm) ,

where

    σi = [ σxixi  σxiyi ] = [ σi1  σi3 ]
         [ σyixi  σyiyi ]   [ σi4  σi2 ] ,

with σxiyi = σyixi, i.e., σi3 = σi4. The form of the weight matrix is simpler in this case; it is also block-diagonal. According to Eq. (4-5), w = (f̂_x σ f̂_xᵀ)⁻¹. The matrices f̂_x, σ, f̂_xᵀ are now all block-diagonal. Therefore

    w = diag(w1, w2, ..., wi, ..., wm) ,    (4-29)

with

    wi = [ wi1  wi3 ]
         [ wi4  wi2 ] .    (4-30)

The computation of w is straightforward. We first find w⁻¹. If we denote u = w⁻¹ = f̂_x σ f̂_xᵀ, it is obvious that u has the same form,

    u = diag(u1, u2, ..., ui, ..., um) ,    (4-31)

where

    ui = [ ui1  ui3 ] = gi σi giᵀ ,    (4-32)
         [ ui4  ui2 ]

with the components written out below.
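The covariance propagation of Eqs. (4-26) and (4-26") above can be sketched as follows (a minimal illustration; the error model Δρ = 0″.08 ρ^(1/3), ρΔθ = 0″.03 ρ^(1/3) quoted above is one possible choice for σρ, σθ):

```python
import math

def rect_covariance(rho, theta, sig_rho, sig_theta):
    """sigma(x,y) = J sigma(rho,theta) J^T with the Jacobian
    J = [[cos t, -rho sin t], [sin t, rho cos t]], Eq. (4-26").
    Returns (var_x, cov_xy, var_y); theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    var_x = sig_rho**2 * c*c + sig_theta**2 * rho*rho * s*s
    var_y = sig_rho**2 * s*s + sig_theta**2 * rho*rho * c*c
    cov_xy = (sig_rho**2 - rho*rho * sig_theta**2) * c * s
    return var_x, cov_xy, var_y
```

At θ = 0 the variances reduce to σρ² in x and ρ²σθ² in y with zero covariance, and the trace σx² + σy² = σρ² + ρ²σθ² is independent of θ, both of which are quick checks on the formulae.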
In components,

    ui1 = (∂f1i/∂xi)² σi1 + 2 (∂f1i/∂xi)(∂f1i/∂yi) σi3 + (∂f1i/∂yi)² σi2 ,    (4-33a)
    ui2 = (∂f2i/∂xi)² σi1 + 2 (∂f2i/∂xi)(∂f2i/∂yi) σi3 + (∂f2i/∂yi)² σi2 ,    (4-33b)
    ui3 = (∂f1i/∂xi)(∂f2i/∂xi) σi1 + [ (∂f1i/∂xi)(∂f2i/∂yi) + (∂f1i/∂yi)(∂f2i/∂xi) ] σi3
          + (∂f1i/∂yi)(∂f2i/∂yi) σi2 ,    (4-33c)
    ui4 = ui3 .    (4-33d)

After computing u, we can find its inverse w very easily. If we denote Δi = ui1 ui2 − ui3 ui4 = ui1 ui2 − ui3², we have

    wi = [ wi1  wi3 ] = (1/Δi) [  ui2  −ui4 ]
         [ wi4  wi2 ]          [ −ui3   ui1 ] .    (4-34)

As seen above, we can calculate the components of w directly from the components of u. If the quantities which constitute different observations were also correlated, we would have to form the product of the matrices f̂_x, σ, f̂_xᵀ and calculate its inverse to get w. This would be more complicated by far.

Step 4. Find f̂_aᵀ w f̂_a and f̂_aᵀ w φ from f̂_a, φ and w, then solve the linear equations (f̂_aᵀ w f̂_a)δ = −f̂_aᵀ w φ to get the correction vector δ. The dimension of the matrix f̂_aᵀ w f̂_a is 7×7, and f̂_aᵀ w φ and δ are 7-vectors. Denoting A = f̂_aᵀ w f̂_a and B = −f̂_aᵀ w φ, we have

    A(j,k) = Σ_{i=1}^{m} [ wi1 (∂f1i/∂aj)(∂f1i/∂ak)
                           + wi3 ( (∂f1i/∂aj)(∂f2i/∂ak) + (∂f2i/∂aj)(∂f1i/∂ak) )
                           + wi2 (∂f2i/∂aj)(∂f2i/∂ak) ] ,   j = 1,7; k = 1,7;    (4-35a)

and

    B(j) = −Σ_{i=1}^{m} [ wi1 φ1i (∂f1i/∂aj)
                          + wi3 ( φ2i (∂f1i/∂aj) + φ1i (∂f2i/∂aj) )
                          + wi2 φ2i (∂f2i/∂aj) ] ,   j = 1,7;    (4-35b)

where ∂f1i/∂a1 = ∂f1i/∂n, ∂f1i/∂a2 = ∂f1i/∂T, ∂f1i/∂a3 = ∂f1i/∂a, ∂f1i/∂a4 = ∂f1i/∂e, and so forth. To solve the linear equations Aδ = B for δ, we can apply any linear equation solution method, e.g. Gaussian elimination, Gauss-Jordan elimination, the Jacobi iterative method, etc. In a private communication, Eichhorn (1985) proposed a special solution method for this particular problem, because the covariance matrix, i.e., the inverse of the coefficient matrix of δ, is fortunately a positive definite matrix. The method will be described in another section of this chapter. In our work, we use this special method for solving the above linear equations for δ.

Step 5. Calculate the new vector v_n, which now is the improved correction vector to the observations, from v_n = −σ f̂_xᵀ w (φ + f̂_a δ). The dimension of the vector v_n is 2m.
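Since u is block-diagonal, the inversion in Eq. (4-34) above reduces to independent symmetric 2×2 inverses; a sketch:

```python
def invert_sym2(u1, u2, u3):
    """Invert the symmetric 2x2 block [[u1, u3], [u3, u2]], Eq. (4-34).
    Returns (w1, w2, w3) such that the inverse is [[w1, w3], [w3, w2]]."""
    det = u1 * u2 - u3 * u3
    return u2 / det, u1 / det, -u3 / det
```

Multiplying the returned block against the original must reproduce the identity, which is a cheap per-block sanity check during the iteration.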
Denote Q = φ + f̂_a δ and R = wQ; Q and R are both 2m-vectors. We can see that

    Q = (Q1, Q2, ..., Qi, ..., Qm)ᵀ ,    (4-36)

where Qi = (Qi1, Qi2)ᵀ with

    Qi1 = f1i − (∂f1i/∂xi) vxi − (∂f1i/∂yi) vyi + Σ_{j=1}^{7} (∂f1i/∂aj) δj ,
    Qi2 = f2i − (∂f2i/∂xi) vxi − (∂f2i/∂yi) vyi + Σ_{j=1}^{7} (∂f2i/∂aj) δj .    (4-36')

In the case of w being block-diagonal,

    R = (R1, R2, ..., Ri, ..., Rm)ᵀ ,    (4-37)

where Ri = wi Qi, i.e.,

    Ri1 = Qi1 wi1 + Qi2 wi3 ,
    Ri2 = Qi1 wi4 + Qi2 wi2 .    (4-37')

Therefore, in this case, with

    R̃i1 = (∂f1i/∂xi) Ri1 + (∂f2i/∂xi) Ri2 ,
    R̃i2 = (∂f1i/∂yi) Ri1 + (∂f2i/∂yi) Ri2 ,

we have

    v_n = (V1, V2, ..., Vi, ..., Vm)ᵀ ,    (4-38)

where

    Vi = −( σi1 R̃i1 + σi3 R̃i2 ,  σi4 R̃i1 + σi2 R̃i2 )ᵀ .    (4-38')

Step 6. Find the new approximate solutions â_n and x̂_n from â_n = â + δ, x̂_n = x0 + v_n.

Step 7. Determine whether another iteration is needed. Calculate all the quantities

    | ( xi(new) − xi(old) ) / xi(old) | ,  | ( yi(new) − yi(old) ) / yi(old) | ,  | δj / aj |

(altogether 2m+7 quantities) and pick the maximum value among them. If this maximum exceeds a preset specified value, say 10⁻⁸, return to the very beginning and take all of the steps once again. Only when the maximum is less than the specified value do we assume that good convergence has been achieved.

The above is the scheme of Newton's method in our orbit problem. In the actual programming, some additional considerations have been taken into account. For example, in the actual program, all variables are dimensionless. For the rectangular coordinate pairs (x, y), two new quantities are defined,

    x' = x/x0 ,  y' = y/y0 ,    (4-39)

where (x0, y0) are the "observations" in the initial data, so that x = x0 x', y = y0 y'. For the seven parameters, seven new variables are defined as well:

    a1' = n/n0 ,  a2' = T/T0 ,  a3' = a/a0 ,  a4' = e/e0 ,  a5' = i/i0 ,  a6' = ω/ω0 ,  a7' = Ω/Ω0 ,    (4-39')

where (n0, T0, a0, e0, i0, ω0, Ω0) is the initial approximate solution, and e is parametrized as e = sinφ, with e0 = sinφ0. In terms of these new variables, all formulae for computing the partial derivatives need only slight modifications, and the whole process remains in principle unchanged.
In addition, we have seen that f̂_x, σ, f̂_xᵀ are square 2m×2m matrices. If the number of observations is, e.g., m = 100, the covariance matrix σ has 40,000 components, and these three matrices alone would occupy 120,000 storage locations in the computer. In practical programming, we should keep the required computer storage to a minimum. A block-diagonal covariance matrix σ can be stored in a 2m×2 array, and the same applies to all other related matrices. This greatly reduces the need for storage.

Weighting of Observations

As we know, the measures of any binary star will be affected by accidental and systematic observational errors and, occasionally, by blunders, i.e., actual mistakes. The points, when plotted, will not lie exactly on an ellipse but will occupy a region which is clustered around an ellipse only in a general way. Observations are frequently combined into normal places. Among the observed quantities, the time is the one observed most precisely. To investigate the measurements for discordance before using them to calculate an orbit, the simplest method is to plot the distances ρ in seconds of arc and the position angles θ, separately, as ordinates against the times of observation as abscissae. Smooth curves are drawn to represent the general run of the measures, and in drawing these curves, more consideration will naturally be given to those observation points which are relatively more precise (for example, a point based upon several well agreeing measures by a skilled observer and supported by the preceding and following observations) than to the others. The deviation of the observation points is in the ordinate direction only and gives a good idea of the accuracy of both observed quantities. The curves will show whether or not the measures as a whole are sufficiently good to warrant the determination of a reasonably reliable orbit. Observations which are seriously in error will be clearly revealed and should be rejected or given very low weights.
The curves will also give a general idea of how the observations are to be weighted. The points which show larger deviations should be given lower weights, and the well-determined points should be given higher weights. It is hard to recommend a general rule for the weighting of measurements and normal positions. One precept could be considered (W. D. Heintz, 1971): compute a weight p1 according to the number of observations, and p2 according to the "weight" assigned to the observers; the normal place then receives the weight p = p1p2 (if computations are made in the quantities dθ and dρ/ρ). This precept avoids unduly high weights for single measurements as well as for very many observations by few observers, and reduces the influence of residual systematic errors. Its implicit assumption is a proportionality of the errors ρdθ and dρ with √ρ. This holds better for the multi-observer average, as the share contributed by less accurate data usually increases at larger separations. In our work, we give the points on or nearly on the smooth curves equal unit weights, and assign lower weights to those farther from the curves. When we take the weights of the observations into account, a slight and very simple modification is needed in the solution process described above. Let G be the matrix of weights for the observational errors v. The dimension of G is 2m×2m, and G is diagonal. Taking G into account, the residual function S0 is modified to

    S0 = (1/2) vᵀ G σ⁻¹ G v ,    (4-40)

because Gᵀ = G. If we define Σ⁻¹ = G σ⁻¹ G, then

    S0 = (1/2) vᵀ Σ⁻¹ v ,    (4-40')

which has exactly the same form as before, except that σ is replaced by Σ. Therefore we need only replace σ by Σ, and σ⁻¹ by Σ⁻¹, wherever σ or σ⁻¹ appears. Because G is diagonal, the calculations of Σ and Σ⁻¹ from G and σ are also very simple.
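The substitution Σ⁻¹ = G σ⁻¹ G of Eq. (4-40) in code form (numpy assumed; a trivial sketch, since G is diagonal):

```python
import numpy as np

def weighted_inverse_cov(sigma_inv, weights):
    """Sigma^-1 = G sigma^-1 G for a diagonal weight matrix
    G = diag(weights), cf. Eq. (4-40)."""
    G = np.diag(np.asarray(weights, float))
    return G @ sigma_inv @ G
```

For unit covariance and weights (2, 3), the effective inverse covariance is diag(4, 9): each observation's contribution to S0 is scaled by the square of its weight.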
The Orthogonal Set of Adjustment Parameters and the Efficiency of a Set of Orbital Parameters

As mentioned in the first section of this chapter, Eichhorn (1985) has proposed, in a private communication, a special method for solving the linear normal equations Aδ = B of our problem. This method is as follows. According to its definition, A is obviously a symmetric positive definite matrix. Given a symmetric positive definite matrix Q, we look for its eigenvectors X which have the property that

    QX = λX    (4-41)

Any value λ which satisfies Eq. (4-41) is an eigenvalue of Q, and since Q is positive definite we know that all λ > 0. Obviously λ must satisfy the equation |Q − λI| = 0, the "characteristic equation" of Q. This is a polynomial in λ of the same order n as that of Q. The solutions are λ1, λ2, ..., λn. The solution Xk of the homogeneous system (4-41) with λ = λk is the kth eigenvector. Since this is determined only up to an arbitrary scale factor, we may always achieve |Xk| = 1 for all k. Let P = (X1, X2, ..., Xn) be the matrix of the normalized eigenvectors of Q. It can be shown that P is orthogonal, so that Pᵀ = P⁻¹. Equation (4-41) can be written as

    QP = P diag(λ1, ..., λn)    (4-42)

Writing the diagonal matrix diag(λ1, λ2, ..., λn) = D, we rewrite Eq. (4-42) as QP = PD, whence

    PᵀQP = D ,    PᵀQ⁻¹P = D⁻¹    (4-43)

and

    Q = PDPᵀ ,    Q⁻¹ = PD⁻¹Pᵀ    (4-44)

Equation (4-44) shows incidentally that the eigenvectors of a matrix are identical to those of its inverse, but that the eigenvalues of a matrix are the inverses of the eigenvalues of its inverse. Let Q be the covariance matrix of a set of statistical variates X. We look for another set Y, as a function of X, such that the covariance matrix of Y is diagonal. Putting dX = a, the quadratic form which is minimized is aᵀQ⁻¹a, where Q is the covariance matrix of X. If we define

    Y = PᵀX    (4-45)

then we have X = PY and dX = PdY, or a = Pβ, aᵀ = βᵀPᵀ,
and in terms of β = dY, the quadratic form minimized which led to the values X becomes βᵀPᵀQ⁻¹Pβ, whence the covariance matrix of Y is seen to be (PᵀQ⁻¹P)⁻¹ = D by Eq. (4-43). It is thus shown that Y = PᵀX is the vector of linear transforms of X whose components are uncorrelated. A vector of such components is called a vector of statistical variates with efficiency 1 (Eichhorn and Cole, 1985). If any of its components are changed (e.g., in the process of finding their values in the course of an iteration), none of the other components of Y would thereby be changed in a trade-off between changes of correlated unknowns.

Now, back to the normal equations Aδ = B. The problem of solving these normal equations can therefore be attacked as follows.

1) Find the eigenvalues and eigenvectors of A, i.e., D⁻¹ and P. (A = Q⁻¹, according to the notation above.)

2) The covariance matrix Q of δ can then be calculated from Eq. (4-44): Q = A⁻¹ = PDPᵀ.

3) Let δ' = Pᵀδ. This gives APδ' = B, whence δ' = PᵀQB, or, from Eq. (4-44),

    δ' = DPᵀB    (4-46)

Finding D from D⁻¹ is very easy because D⁻¹ is diagonal. The vector δ is then easily calculated directly from δ = Pδ'. One of the advantages of this method is that we can calculate both δ and δ', the correlated and the uncorrelated corrections to the adjustment parameters, at the same time. The vector δ' is the set of corrections to the orthogonal adjustment parameters. Furthermore, using this method, a measure of the "efficiency" of a set of adjustment parameters can easily be calculated, cf. Eichhorn and Cole (1985). They point out that the information carried by the vector δ of the correlated estimates (whose covariance matrix is Q) is partly redundant because of the nonvanishing correlations between the estimates. What is the information content carried by these estimates? We see that the matrix PᵀQP is diagonal if P is the (orthogonal) matrix of the normalized eigenvectors of Q, and that it is also the covariance matrix of the (therefore uncorrelated) linear transforms δ' = Pᵀδ of δ.
We might regard the number

    ε = |Q| / (q11 ⋯ qnn)    (4-47)

that is, the product of the variances of the components of the vector δ' (the product of the variances of the uncorrelated parameters) divided by the product of the variances of the components of the vector δ (the product of the variances of the correlated parameters), as a measure of the "efficiency" of the information carried by the estimates δ. But we note that, according to the definition (4-47), the value of ε is severely affected by the number n of variables. In order to eliminate the effect of n, we redefine ε as

    ε = [ |Q| / (q11 ⋯ qnn) ]^(1/2n)    (4-47')

For any set of uncorrelated estimates we would have ε = 1, which is evidently the largest value this number may assume. In our work, for every model calculated, the efficiency of the set of adjustment parameters and their covariance matrix are calculated.

A Practical Example

Table 4-3 lists the observational data for 51 Tau. These data were provided by H. A. Macalister. The author would like to thank Macalister and Heintz for the data used in this dissertation. Theoretically, the first step in our computation should be the reduction of the measured coordinates to a common epoch by applying to the position angles corrections for precession and for the proper motion of the system. The distance measures need no corrections. Practically, both corrections are negligibly small unless the star is near the Pole, its proper motion unusually large, and the time covered by the observations long. The precession correction, when required, can be found with sufficient accuracy from the approximate formula

    Δθ = θ − θ0 = +0°.00557 sin α sec δ (t − t0)    (4-48)

which is derived by differentiating Eq. (3-1) with respect to δ and introducing the precessional change n cos α for dδ. The position angles are thus reduced to a common equinox t0 (for which 2000.0 is currently used), and the resulting node Ω0 also refers to t0, because Δθ = θ − θ0 = Ω − Ω0 = ΔΩ.
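The reduction of Eq. (4-48) and the subsequent conversion to rectangular coordinates can be sketched as follows. This is an illustrative Python fragment, not the dissertation's code; the function names are hypothetical, and the coordinates of 51 Tau are read off its designation 04185+2135 (α ≈ 4h18.5m, δ ≈ +21°35'), which is only approximate.

```python
import math

def precession_dtheta(alpha_deg, delta_deg, t, t0):
    """Eq. (4-48): dtheta = theta - theta0 = +0.00557 sin(alpha) sec(delta) (t - t0),
    all angles in degrees, epochs in years."""
    return (0.00557 * math.sin(math.radians(alpha_deg))
            / math.cos(math.radians(delta_deg)) * (t - t0))

def polar_to_rect(rho, theta_deg):
    """Convert separation rho and position angle theta to x = rho cos(theta),
    y = rho sin(theta)."""
    t = math.radians(theta_deg)
    return rho * math.cos(t), rho * math.sin(t)

# 51 Tau, approximate position from the designation 04185+2135
alpha = (4 + 18.5 / 60) * 15          # hours of RA to degrees
delta = 21 + 35 / 60

# first observation of Table 4-3: t = 1975.716, theta = 106.0, rho = 0.080
dt = precession_dtheta(alpha, delta, 1975.716, 2000.0)   # reduce to equinox 2000.0
theta0 = 106.0 - dt                                      # theta0 = theta - dtheta
x0, y0 = polar_to_rect(0.080, theta0)
print(round(dt, 4), round(x0, 4), round(y0, 4))
```

For this first observation the correction is about -0°.13, and the resulting (x0, y0) agrees with the tabulated (-0.0222, +0.0769) of Table 4-4 to the tabulated precision.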
In computing ephemerides, the equinox is reduced back from t0 to t. The change of position angle by proper motion,

    θ − θ0 = μα sin δ (t − t0)    (4-49)

where the proper-motion component in right ascension, μα, is in degrees, can be neglected in most cases. The two formulae above can be found either in Heintz' book "Double Stars" or in Aitken's book "The Binary Stars". In Table 4-4, all position angles have been reduced to the common equinox 2000.0, and the converted rectangular coordinates (x0, y0) are also listed. In Figure 4-1, all pairs of rectangular coordinates (x0, y0) are plotted in the x0-y0 plane. We can see at a glance that all observation points lie close to an ellipse. Furthermore, in Figures 4-2a and 4-2b we plot the distances ρ in seconds of arc and the position angles θ against the observing epoch, respectively. From Figure 4-2 we see that the observations fall upon nearly sine-like curves. Thus we get the impression that these data are rather precise, and we therefore give all observations equal weights. For these data, an initial approximate solution is easily obtained by Kowalsky's method. The initial approximation a0 is listed in Table 4-5. Starting from this, the final solution for a is obtained after only three iterations and is shown in Table 4-6. The calculation required only 47962 CPU time on a VAX. The residuals (observational errors) in (ρ,θ) and (x,y) are shown in Table 4-7, and are plotted against the observing epoch in Figures 4-3 and 4-4, respectively. They show a random distribution, as expected. Also, Figure 4-5 shows the comparison of the observation points with the corresponding points after correction in the apparent plane. The "efficiency," the covariance matrix, the correlation matrix, and the transformation matrix (which transforms the correlated parameters to the uncorrelated ones) of the adjusted parameters in the final solution are calculated and listed in Table 4-6.
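The quantities just listed can all be obtained from the eigen-decomposition scheme of Eqs. (4-41) to (4-47'). The following is a minimal numerical sketch assuming NumPy; a random design matrix stands in for the actual normal equations of the orbit problem, so only the structure of the computation, not the numbers, corresponds to the dissertation.

```python
import numpy as np

rng = np.random.default_rng(2)
phi = rng.normal(size=(20, 7))
a = phi.T @ phi                      # stand-in normal-equation matrix A (7 parameters)
b = rng.normal(size=7)

# 1) eigenvalues and eigenvectors of A give D^{-1} and P  (A = Q^{-1})
d_inv, p = np.linalg.eigh(a)
d = 1.0 / d_inv
# 2) covariance matrix of delta: Q = A^{-1} = P D P^T
q = p @ np.diag(d) @ p.T
# 3) uncorrelated corrections delta' = D P^T B, then delta = P delta'
delta_prime = d * (p.T @ b)
delta = p @ delta_prime

assert np.allclose(a @ delta, b)             # delta solves A delta = B
assert np.allclose(p.T @ q @ p, np.diag(d))  # covariance of delta' is diagonal

# correlation matrix and the efficiency of the correlated set
corr = q / np.sqrt(np.outer(np.diag(q), np.diag(q)))
eff = (np.linalg.det(q) / np.prod(np.diag(q))) ** (1 / (2 * 7))
assert 0 < eff <= 1.0 + 1e-12        # equals 1 only for uncorrelated estimates
```

Both the correlated corrections δ and the orthogonal corrections δ' come out of the same decomposition, which is the advantage noted above; P itself is the transformation matrix between the two sets.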
In addition, Table 4-6 also lists the standard deviations of a) the original and b) the uncorrelated parameters in the final solution.

Table 4-3. The Observation Data for 51 Tau.
HR 1331   51 Tau   HD 27176   SAO 76541   04185+2135

    t         θ(°)    σθ(°)   ρ(″)    σρ(″)   Ref
 1975.7160   106.00    2.0    0.080   0.003    A1
 1975.9591    91.9     1.7    0.074   0.003    A1
 1976.8574    34.9     1.5    0.069   0.005    A2
 1976.8602    33.5     1.5    0.073   0.009    A2
 1976.9229    32.9     1.0    0.072   0.008    A3
 1977.0868    26.7     1.0    0.083   0.008    A3
 1977.6398     8.8      -     0.101     -      A6
 1977.7420     3.1     0.8    0.110   0.008    A5
 1978.1490   352.2     0.5    0.113   0.010    A5
 1978.6183   340.7     0.8    0.108   0.008    A5
 1978.8756   333.3     2.0    0.086   0.013    B4
 1979.7735   304.3      -     0.090     -      A7
 1980.1532   285.9      -     0.075     -      A8
 1980.7182   259.0      -     0.079     -      A8
 1980.7263   255.8      -     0.085     -      A8
 1980.7291   259.1      -     0.087     -      A8
 1982.7550   191.80     -     0.1343    -      C2
 1982.7579   192.65     -     0.1362    -      C2
 1982.7605   190.36     -     0.1315    -      C2
 1982.7633   192.90     -     0.1381    -      C2
 1982.7661   193.39     -     0.1308    -      C2
 1983.0472   186.18     -     0.1333    -      C2
 1983.0637   187.21     -     0.1499    -      C2
 1983.7108   182.05     -     0.1456    -      C2
 1983.7135   179.56     -     0.1480    -      C2
 1983.9337   181.0     1.9    0.149   0.010    FA
 1983.9579   176.7      -     0.157     -      RB
 1984.0522   175.01     -     0.1446    -      C2
 1984.0576   174.79     -     0.1445    -      C2
 1984.0603   172.73     -     0.1355    -      C2
 1984.7790   157.3      -     0.146     -      RC
 1984.9308   164.5     2.6    0.141   0.013    FB
 1985.1063   161.0     2.7    0.137   0.013    FB
 1985.2048   158.0     3.0    0.125   0.013    FB
 1985.8378   145.69     -     0.1141    -      ##
 1985.8406   144.53     -     0.1202    -      ##
 1985.8541   145.71     -     0.1200    -      ##

Table 4-4.
The Reduced Initial Data for 51 Tau

    t         θ0(°)   ρ0(″)    x0(″)     y0(″)
 1975.7160   106.00   0.0800   -0.0222   +0.0769
 1975.9591    91.90   0.0740   -0.0026   +0.0740
 1976.8574    34.90   0.0690   +0.0565   +0.0396
 1976.8602    33.50   0.0730   +0.0608   +0.0404
 1976.9229    32.90   0.0720   +0.0604   +0.0392
 1977.0868    26.70   0.0830   +0.0741   +0.0374
 1977.6398     8.80   0.1010   +0.0998   +0.0156
 1977.7420     3.10   0.1100   +0.1098   +0.0061
 1978.1490   352.20   0.1130   +0.1120   -0.0151
 1978.6183   340.70   0.1080   +0.1020   -0.0355
 1978.8756   333.30   0.0860   +0.0769   -0.0385
 1979.7735   304.30   0.0900   +0.0508   -0.0743
 1980.1532   285.90   0.0750   +0.0207   -0.0721
 1980.7182   259.00   0.0790   -0.0150   -0.0776
 1980.7263   255.80   0.0850   -0.0207   -0.0824
 1980.7291   259.10   0.0870   -0.0163   -0.0855
 1982.7550   191.80   0.1343   -0.1314   -0.0176
 1982.7579   192.65   0.1362   -0.1329   -0.0300
 1982.7605   190.36   0.1315   -0.1293   -0.0238
 1982.7633   192.90   0.1381   -0.1346   -0.0310
 1982.7661   193.39   0.1308   -0.1272   -0.0305
 1983.0472   186.18   0.1333   -0.1325   -0.0145
 1983.0637   187.21   0.1499   -0.1487   -0.0190
 1983.7108   182.05   0.1456   -0.1455   -0.0054
 1983.7135   179.56   0.1480   -0.1480   +0.0010
 1983.9337   181.00   0.1490   -0.1490   -0.0028
 1983.9579   176.70   0.1570   -0.1568   +0.0088
 1984.0522   175.01   0.1446   -0.1441   +0.0124
 1984.0576   174.79   0.1445   -0.1439   +0.0129
 1984.0603   172.73   0.1355   -0.1344   +0.0170
 1984.7790   157.30   0.1460   -0.1348   +0.0562
 1984.9308   164.50   0.1410   -0.1359   +0.0375
 1985.1063   161.00   0.1370   -0.1296   +0.0445
 1985.2048   158.00   0.1250   -0.1160   +0.0467
 1985.8378   145.69   0.1141   -0.0943   +0.0642
 1985.8406   144.53   0.1202   -0.0980   +0.0696
 1985.8541   145.71   0.1200   -0.0992   +0.0675

[Figure 4-1. The observation points (x0, y0) of 51 Tau in the x0-y0 plane; axes in seconds of arc. Figure 4-2. The distances ρ and position angles θ of 51 Tau plotted against the observing epoch.]

Table 4-5. Initial Approximate Solution a0 for 51 Tau.
 P0(yr)    T0       a0(″)    e0      i0(°)    ω0(°)    Ω0(°)
 11.18     1966.4   0.128    0.181   127.3    152.9    170.2

[Table 4-6. Final Solution for 51 Tau, with standard deviations, covariance, correlation, and transformation matrices; the numerical entries are illegible in the scanned source.]
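From the initial elements of Table 4-5, an approximate apparent position can already be predicted via the standard Thiele-Innes constants and Kepler's equation. The sketch below is a textbook-style illustration, not the dissertation's ephemeris code; the sign conventions chosen for the constants are one common choice, and the function names are hypothetical.

```python
import math

def kepler(mean_anom, e, tol=1e-12):
    """Solve Kepler's equation E - e sin E = M by Newton iteration."""
    ecc = mean_anom
    for _ in range(50):
        step = (ecc - e * math.sin(ecc) - mean_anom) / (1 - e * math.cos(ecc))
        ecc -= step
        if abs(step) < tol:
            break
    return ecc

def apparent_position(t, p_yr, t_peri, a, e, i_deg, omega_deg, node_deg):
    """Apparent rectangular coordinates from the seven orbital elements
    via the Thiele-Innes constants A, B, F, G."""
    i, w, n = (math.radians(v) for v in (i_deg, omega_deg, node_deg))
    ci = math.cos(i)
    aa = a * ( math.cos(w) * math.cos(n) - math.sin(w) * math.sin(n) * ci)
    bb = a * ( math.cos(w) * math.sin(n) + math.sin(w) * math.cos(n) * ci)
    ff = a * (-math.sin(w) * math.cos(n) - math.cos(w) * math.sin(n) * ci)
    gg = a * (-math.sin(w) * math.sin(n) + math.cos(w) * math.cos(n) * ci)
    m = math.fmod(2 * math.pi * (t - t_peri) / p_yr, 2 * math.pi)
    ea = kepler(m, e)
    x_el = math.cos(ea) - e                    # elliptical rectangular coordinates
    y_el = math.sqrt(1 - e * e) * math.sin(ea)
    return aa * x_el + ff * y_el, bb * x_el + gg * y_el

# initial elements of 51 Tau from Table 4-5, evaluated at the epoch 1982.755
x, y = apparent_position(1982.755, 11.18, 1966.4, 0.128, 0.181, 127.3, 152.9, 170.2)
rho = math.hypot(x, y)
print(round(rho, 3))   # about 0.133; the measured separation at this epoch was 0.1343
```

That even the Kowalsky initial approximation reproduces the observed separations to a few thousandths of an arcsecond is consistent with the rapid, three-iteration convergence of the final solution reported above.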