An improved algorithm for the determination of the system parameters of a visual binary by least squares


Material Information

Title:
An improved algorithm for the determination of the system parameters of a visual binary by least squares
Physical Description:
xi, 138 leaves : ill. ; 28 cm.
Language:
English
Creator:
Xu, Yu-Lin, 1945-
Publication Date:

Subjects

Subjects / Keywords:
Double stars -- Orbits   ( lcsh )
Least squares   ( lcsh )
Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1988.
Bibliography:
Includes bibliographical references.
Statement of Responsibility:
by Yu-Lin Xu.
General Note:
Typescript.
General Note:
Vita.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001098622
notis - AFJ4467
oclc - 19508136
sobekcm - AA00004831_00001
System ID:
AA00004831:00001

Full Text

AN IMPROVED ALGORITHM FOR
THE DETERMINATION OF THE SYSTEM PARAMETERS
OF A VISUAL BINARY BY LEAST SQUARES







By

YU-LIN XU


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN
PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY


UNIVERSITY OF FLORIDA

1988


To my mother and late father

ACKNOWLEDGEMENTS


The author acknowledges his heartfelt gratitude to

Dr. Heinrich K. Eichhorn, his research advisor, for

proposing this subject, the considerate guidance and

encouragement throughout his research, and for the patience

in reading and correcting the manuscript. The author has

benefited in many ways as Dr. Eichhorn's student.

The author is also grateful to Drs. Kwan-Yu Chen,

Haywood C. Smith, Frank Bradshaw Wood and Philip Bacon for

having served as members of his supervisory committee and

for helpful discussions, timely suggestions and the careful

review of this dissertation. Likewise, his deep appreciation

goes to Drs. W. D. Heintz and H. A. McAlister for

having provided data used in his dissertation.

The author is especially grateful to Drs. Jerry L.

Weinberg and Ru-Tsan Wang in the Space Astronomy Laboratory

for their considerate encouragement and support. Without

their support, the fulfillment of this research would not

have been possible.

It is a great pleasure to acknowledge that all the

calculations were performed on the VAX in the Space

Astronomy Laboratory.


iii

TABLE OF CONTENTS


page

ACKNOWLEDGEMENTS....................................... iii

LIST OF TABLES.......................................... vi

LIST OF FIGURES..........................................viii

ABSTRACT................................................. x

CHAPTERS


I    INTRODUCTION...................................   1

II   REVIEW AND REMARKS.............................   4

     Review of the Methods for Orbit-Computation
     Definitions of the Orbital Parameters
     The Method of M. Kowalsky and S. Glasenapp
     The Method of T. N. Thiele, R. T. A. Innes
       and W. H. van den Bos
     The Method of Least Squares
     Remarks

III  GENERAL CONSIDERATIONS

     The Condition Equations
     The General Statement of the Least Squares
       Orbit Problem
     About the Initial Approximate Solution

IV   THE SOLUTION BY NEWTON'S METHOD

     The Solution by Newton's Method
     Weighting of Observations
     The Orthogonal Set of Adjustment Parameters
       and the Efficiency of a Set of Orbital
       Parameters
     A Practical Example
     Remarks

page

V THE MODIFIED NEWTON SCHEME.................... 85

The Method of Steepest Descent................ 86
The Combination of Newton's Method with the
Method of Steepest Descent--The Modified
Newton Scheme............................... 92
The Application of Marquardt's Algorithm....... 98
Two Practical Examples ........................ 101

VI DISCUSSION................................... 134

REFERENCES............................................. 136

BIOGRAPHICAL SKETCH.................................... 138


LIST OF TABLES


Table Page

4-1 Expressions for All the Partial Derivatives
    in f............................................. 48

4-2 Expressions for All the Partial Derivatives
    in f............................................. 53

4-3 The Observation Data for 51 Tau.................. 73

4-4 The Reduced Initial Data for 51 Tau.............. 74

4-5 The Initial Approximate Solution A0 for 51 Tau... 77

4-6 The Final Solution for 51 Tau.................... 78

4-7 The Residuals of the Observations for 51 Tau
    in (ρ,θ) and (x,y)............................... 80

5-1 The Observation Data for β 738................... 105

5-2 The Reduced Initial Data for β 738............... 106

5-3 The Initial Approximate Solution A0 for β 738.... 109

5-4 The Solution #1 for β 738........................ 110

5-5 The Residuals of the Observations for β 738
    in (ρ,θ) and (x,y) in Solution #1................ 112

5-6 Heintz' Result for β 738......................... 116

5-7 The Solution #2 for β 738........................ 117

5-8 The Residuals of the Observations for β 738
    in (ρ,θ) and (x,y) in Solution #2................ 119

5-9 The Observation Data for BD+19°5116.............. 123

5-10 The Reduced Initial Data for BD+19°5116......... 124

5-11 The Initial Approximate Solution A0 for
     BD+19°5116...................................... 127


5-12 The Final Solution for BD+19°5116 by the
     MQ Method....................................... 128

5-13 The Residuals of the Observations for BD+19°5116
     in θ, ρ, x and y................................ 130


vii

LIST OF FIGURES


Figure Page

4-1 Plot of the observation data for 51 Tau in
    the x0-y0 plane.................................. 75

4-2 Plot of a) the x0- and b) the y0-coordinates
    against the observing epochs of the observation
    data for 51 Tau.................................. 76

4-3 The residuals of the observations for 51 Tau
    in (ρ,θ)......................................... 81

4-4 The residuals of the observations for 51 Tau
    in (x,y)......................................... 82

4-5 The original observations for 51 Tau compared
    with the observations after correction........... 83

5-1 Plot of the observation data for β 738 in the
    x0-y0 plane...................................... 107

5-2 Plot of a) the x0- and b) the y0-coordinates
    against the observing epochs of the observation
    data for β 738................................... 108

5-3 The residuals of the observations for β 738
    in (ρ,θ) according to the solution #1............ 113

5-4 The residuals of the observations for β 738
    in (x,y) according to the solution #1............ 114

5-5 The original observations of β 738 compared
    with the observations after correction
    according to the solution #1..................... 115

5-6 The residuals of the observations for β 738
    in (ρ,θ) according to the solution #2............ 120

5-7 The residuals of the observations for β 738
    in (x,y) according to the solution #2............ 121


viii


5-8 The original observations for β 738 compared
    with the observations after correction
    according to the solution #2..................... 122

5-9 Plot of the observation data for BD+19°5116
    in the x0-y0 plane............................... 125

5-10 Plot of a) the x0- and b) the y0-coordinates
     against the observing epochs of the observations
     for BD+19°5116.................................. 126

5-11 The residuals of the observations for
     BD+19°5116 in (ρ,θ)............................. 131

5-12 The residuals of the observations for
     BD+19°5116 in (x,y)............................. 132

5-13 The original observations for BD+19°5116
     compared with the observations after
     correction...................................... 133

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

AN IMPROVED ALGORITHM FOR
THE DETERMINATION OF THE SYSTEM PARAMETERS
OF A VISUAL BINARY BY LEAST SQUARES

By

YU-LIN XU

April 1988

Chairman: Dr. Heinrich K. Eichhorn
Co-Chairman: Dr. Kwan-Yu Chen
Major Department: Astronomy


The problem of computing the orbit of a visual binary

from a set of observed positions is reconsidered. It is a

least squares adjustment problem, if the observational

errors follow a bias-free multivariate Gaussian distribution

and the covariance matrix of the observations is assumed to

be known.

The condition equations are constructed to satisfy both

the conic section equation and the area theorem, which are

nonlinear in both the observations and the adjustment

parameters. The traditional least squares algorithm, which

employs condition equations that are solved with respect to

the uncorrelated observations and either linear in the

adjustment parameters or linearized by developing them in

Taylor series by first-order approximation, is inadequate in

our orbit problem. D. C. Brown proposed an algorithm

solving a more general least squares adjustment problem in

which the scalar residual function, however, is still

constructed by first-order approximation. Not long ago, a

completely general solution was published by W. H. Jefferys,

who proposed a rigorous adjustment algorithm for models in

which the observations appear nonlinearly in the condition

equations and may be correlated, and in which construction

of the normal equations and the residual function involves

no approximation. This method was successfully applied in

our problem.

The normal equations were first solved by Newton's

scheme. Practical examples show that this converges fast if

the observational errors are sufficiently small and the

initial approximate solution is sufficiently accurate, and

that it fails otherwise. Newton's method was modified to

yield a definitive solution in cases where the normal approach

fails, by combination with the method of steepest descent

and other sophisticated algorithms. Practical examples show

that the modified Newton scheme can always lead to a final

solution.

The weighting of observations, the orthogonal para-

meters and the "efficiency" of a set of adjustment

parameters are also considered. The definition of

"efficiency" is revised.

CHAPTER I
INTRODUCTION


The problem of computing the orbit of a visual binary

from a set of observed positions is by no means new.

A great variety of methods has been proposed. As is well

known, only a few of these suffice to cover the practical

contingencies, and the majority fails to handle the input

data efficiently and properly.

For the visual binary case, the determination of an

orbit normally requires a large number of observations. All

measures of position angles and separations are, as are all

observations, affected by observational errors. For the

purpose of our work, these errors are assumed to follow a

bias-free multivariate Gaussian distribution. Under this

assumption, orbit-computing is a least squares adjustment

problem, in which the condition equations are nonlinear in

both the observations and the adjustment parameters. The

condition equations must incorporate all relationships that

exist between observations and the orbital parameters,*

that is, must state both the area theorem, which follows

from Kepler's equation, and the condition that the projected

orbit is a conic section. H. Eichhorn (1985) has suggested

a new form for the construction of a complete set of

condition equations for this very problem.


*Usually, the term "orbital elements" is used. We prefer
"orbital parameters" instead. Strictly speaking, the
orbital elements are the constants of integration in the
two-body problem and, therefore, do not include the masses
of the components.

The traditional least squares algorithm, which is based

on condition equations linearized with respect to the

observational errors, will not lead to those orbital

parameters which minimize the sum of the squares of observa-

tional errors, because linearization, in this case, is too

crude an approximation. In some earlier papers, H. Eichhorn

and W. G. Clary (1974) proposed a least squares solution

algorithm, which takes into account the second (in addition

to the first) order derivatives in the adjustment residuals

(observational errors) and the corrections to the initially

available approximation to the adjustment parameters. A

completely general solution was published by W. H. Jefferys

(1980, 1981), who proposed a rigorous adjustment algorithm

for models in which the observations appear nonlinearly in

the condition equations. In addition, there may be nonlinear

constraints** among model parameters, and the

observations may be correlated. In practice, the method is

nearly as simple to apply as the classical method of least

squares, for it does not require calculation of any

derivatives of higher than the first order.

In this paper, we present an approach to solve the

orbit problem by Jefferys' method, in which both the area

theorem and the conic section equation serve as the

condition equations.


**We use the term "constraints" for condition equations
which do not contain any observations explicitly.

CHAPTER II
REVIEW AND REMARKS


Review of the Methods for Orbit-Computation

Every complete observation of a double star supplies us

with three data: the time of observation, the position

angle of the secondary with respect to the primary, and the

angular distance (separation) between the two stars. The

problem of computing the so-called orbital elements (in this

paper, called orbital parameters) of a visual binary from a

set of observations superficially appears analogous to the

case of orbits in the planetary system, yet in practice

there is little resemblance between the problems. The

problem of determining the orbit of a body in the solar

system is complicated by the motion of the observer who

shares the motion of the Earth, so that, unlike in the case

of a binary, the apparent path is not merely a projection of

the orbit in space onto the celestial sphere.

In the case of a binary, the path observed is the

projection of the motion of the secondary round the primary

onto a plane perpendicular to the line of sight. The

apparent orbit (i.e., the observed path of the secondary

about the primary) is therefore not a mere scale drawing of

the true orbit in space. The primary may be situated at any

point within the ellipse described by the secondary and, of

course, does not necessarily occupy either a focus or the

center.

The problem of deriving "an orbit" (meaning a set of

estimates of the orbital parameters) from the observations

was first solved by F. Savary in 1827. In 1829, J. F. Encke

quickly followed with a different solution method which was

somewhat better adapted to what were then the needs of the

practical astronomer. Theoretically, the methods of Savary

and Encke are excellent. But their methods utilize only

four complete pairs of measures (angle and distance) instead

of all the available data and closely emulate the treatment

of planetary orbits. They are therefore inadequate in the

case of binary stars (W. D. Heintz, 1971; R. G. Aitken,

1935).

Later, Sir John Herschel (1832) communicated a

basically geometric method to the Royal Astronomical

Society. Herschel's method was designed to utilize all the

available data, so far as he considered them reliable.

Since then, the contributions to the subject have been many.

Some consist of entirely new methods of attack, others of

modifications of those already proposed. Among the more

notable investigators are Yvon Villarceau, J. H. Mädler, E.

F. W. Klinkerfues, T. N. Thiele, M. Kowalsky, S. Glasenapp,

H. J. Zwiers, K. Schwarzschild, T. J. J. See, H. N. Russell,

R. T. A. Innes, W. H. van den Bos and others.

One may classify the various methods published so far

into "geometric" methods, which are those that enforce only

the constraint that the orbit is elliptical, and the

"dynamical" ones, which enforce, in addition, the area

theorem.

The geometric treatment initiated by J. Herschel peaked

in Zwiers' method (1896) and its modifications, e.g. those

of H. N. Russell (1898) and of G. C. Comstock (1918). Every

geometric method has the shortcoming that it must assume the

location of the center of the ellipse to be known while it

ignores the area theorem and thus fails to enforce one of

the constraints to which the observations are subject. The

growing quantity and quality of observations called for

suitable computing precepts, and the successful return to

dynamical methods began with van den Bos (1926).

Of the many methods for orbit-computation formulated,

some are very useful and applicable to a wide range of

problems, e.g. those by Zwiers, Russell and those by Innes

and van den Bos.

Zwiers' method (1896) is essentially graphical and

assumes that the apparent orbit has been drawn. This method

is therefore useless unless the apparent ellipse gives a

good geometrical representation of the observations and

satisfies the law of areas, and thus will not be further

described here since we are primarily concerned with the

analytical methods.

In the following, we will briefly review Kowalsky's

method and that by Thiele and Innes.

Definition of the Orbital Parameters

Seven parameters define the orbit and the space

orientation of its plane. The first three of these (P,T,e)

are dynamical and define the motion in the orbit; the last

four (a,i,Ω,ω) are geometrical and give the size and

orientation of the orbit. The parameters are defined

somewhat differently from those for the orbits of planets

and comets.

The first dynamical parameter P is the period of

revolution, usually in units of mean sidereal years; n is

the mean (usually annual) angular motion; since n = 2π/P, P

and n are equivalent. The second, T, is the epoch of

periastron passage (usually expressed in terms of years and

fractions thereof). The third, e, is the eccentricity of

the orbital ellipse.

The geometrical parameter a is the angle subtended by

the semi-major axis of the orbital ellipse (usually

expressed in units of arcseconds). The angle i is the

inclination of the orbital plane to the plane normal to the

line of sight, that is, the angle between the plane of

projection and that of the orbit in space. It ranges from 0°

to 180°. When the position angle increases with time, that

is, for direct motion, i is between 0° and 90°; for

retrograde motion, i is counted between 90° and 180°;

and i is 90° when the orbit appears projected entirely onto

the line of nodes. The "node", Ω, is the position angle of

the line of intersection between the tangential plane of

projection and the orbital plane. There are two nodes whose

corresponding values of Ω differ by 180°. That node in

which the orbital motion is directed away from the sun is

called the ascending node. We understand Ω, which ranges

from 0° to 360°, to refer to the ascending node. Because it

is, however, one of the peculiarities of the

orbit-determination of a visual binary that it is in principle

impossible--from positional data alone--to decide whether

the node is the ascending or descending one, Ω may be

restricted to 0° ≤ Ω ≤ 180°. The last, ω, is the longitude of

the periastron in the plane of the orbit, counted positive

in the direction of the orbital motion and starting at the

ascending node. It ranges from 0° to 360°.

These definitions are adhered to throughout our work.

Some of them may differ a little from those given by

previous authors. But any way in which one defines them

does not affect the principles of the method we describe.

The Method of M. Kowalsky and S. Glasenapp

This old method was first introduced by J. Herschel in

a rather cumbersome form and is better known in its more

direct formulation by M. Kowalsky in 1873 and by

S. Glasenapp in 1889. R. G. Aitken (1935) gives the
detailed derivation of the formulae in his textbook The

Binary Stars.

Kowalsky's method is essentially analytical. It

derives the orbit parameters from the coefficients of the

general equation of the ellipse which is the orthogonal

projection of the orbit in space, the origin of coordinates

being taken at the primary. The projected orbit can be

expressed by a quadratic in x and y, whose five coefficients

are related to those five orbital parameters which do not

involve time.

In rectangular coordinates, the equation of an ellipse,

and thus of a conic section, takes the form



c1x² + 2c2xy + c3y² + 2c4x + 2c5y + 1 = 0 (2-1)



where the rectangular coordinates (x,y) are related to the

more commonly directly observed polar coordinates (p,8) by

the equations


x = ρcosθ (2-2a)

y = ρsinθ (2-2b)



where ρ is the measured angular distance and θ the position

angle.

The five coefficients of equation (2-1) can be deter-

mined by selecting five points on the ellipse or by all the

available observations in the sense of least squares.
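In practice, this least squares determination of the five coefficients is a linear problem; a minimal sketch (the function name is illustrative, not taken from the dissertation):

```python
import numpy as np

def conic_coefficients(x, y):
    """Fit c1*x^2 + 2*c2*x*y + c3*y^2 + 2*c4*x + 2*c5*y + 1 = 0,
    i.e. equation (2-1), to the observed points in the least
    squares sense."""
    A = np.column_stack([x**2, 2*x*y, y**2, 2*x, 2*y])
    rhs = -np.ones(len(x))          # move the constant term to the right side
    c, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return c                        # (c1, c2, c3, c4, c5)
```

Five well-distributed points determine the conic exactly; with more points, the residuals of (2-1) are minimized.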

There is an unambiguous relationship between the five

coefficients (c1,c2,c3,c4,c5) and the five orbital para-

meters (a,e,i,ω,Ω). We can find the detailed derivation of

the formulae in Aitken's The Binary Stars. Here, we will

therefore state only the final formulae without derivation.

The five orbital parameters (a,e,i,ω,Ω) can be calcu-

lated from the known coefficients (c1,c2,c3,c4,c5) by the

following procedure.

1) The parameter Ω can be found from the equation


tan2Ω = 2(c2 - c4c5)/(c5² - c4² + c1 - c3) (2-3)


To determine in which quadrant Ω is located, we can use

two other equations:



c'sin2Ω = 2(c2 - c4c5) (2-3'a)

c'cos2Ω = c5² - c4² + c1 - c3 (2-3'b)



where


c' = tan²i/p² , (2-3'c)


with p the semi-latus rectum of the true orbit; c' is

always positive.

More elegantly, we write in Eichhorn's notation,



2Ω = plg[2(c2 - c4c5), c5² - c4² + c1 - c3] (2-3'd)



H. Eichhorn (1985), in his "Kinematic Astronomy"

(unpublished lecture notes), defines the plg(x,y) function

as follows:

plg(x,y) = arctan(x/y) + 90°[2 - sgn x (1 + sgn y)] ,


where arctan is the principal value of the arctangent.
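In code, plg is just the two-argument arctangent reduced to [0°, 360°); a minimal sketch:

```python
import math

def plg(x, y):
    """Eichhorn's plg(x, y): the angle whose tangent is x/y, placed
    in the quadrant fixed by the signs of x and y, in degrees in
    [0, 360).  For nonzero x and y this equals atan2(x, y) mod 360."""
    return math.degrees(math.atan2(x, y)) % 360.0
```

For example, plg(-1, 1) = 315°, in agreement with the defining formula above.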

2) The inclination i is found from

tan²i = -2 + 2cos2Ω(c5² + c4² - c1 - c3)/[cos2Ω(c5² + c4² - c1 - c3) - (c5² - c4² + c1 - c3)] (2-4)


Whether the i is in the first or second quadrant is deter-

mined by whether the motion is direct or retrograde. If the

motion is direct, the position angle increases with time,

i < 90°; otherwise, i > 90°.

3) The equation



ω = plg[-(c4sinΩ - c5cosΩ)cos i, c4cosΩ + c5sinΩ] (2-5)


gives the value of ω.

4) With i, ω, Ω known, two more parameters (a and e) can be

calculated from


e = 2cos2Ω(c4sinΩ - c5cosΩ)cos i /
    {[cos2Ω(c5² + c4² - c1 - c3) - (c5² - c4² + c1 - c3)]sinω} (2-6)


a = 2cos2Ω /
    {[cos2Ω(c5² + c4² - c1 - c3) - (c5² - c4² + c1 - c3)](1 - e²)} (2-7)


5) To complete the solution analytically, the mean motion n

(or the period P) and the time of periastron passage T must

be found from the mean anomaly M, computed from the observa-

tions by Kepler's equation:



M = n(t-T) = E - e sinE (2-8)



where E is the eccentric anomaly. Every M will give an

equation of the form



M = nt + τ (2-9)



where τ = -nT.

From these equations the values of n and T are computed

by the method of least squares.
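This last step is an ordinary linear fit of equation (2-9); a sketch under the assumption that the mean anomalies have already been unwrapped by the appropriate multiples of 2π (names are illustrative):

```python
import numpy as np

def mean_motion_and_periastron(t, E, e):
    """From epochs t, eccentric anomalies E and eccentricity e,
    compute M = E - e*sin(E) (Kepler's equation, 2-8) and fit
    M = n*t + tau (2-9) by least squares; then T = -tau/n."""
    M = E - e * np.sin(E)
    A = np.column_stack([t, np.ones_like(t)])
    (n, tau), *_ = np.linalg.lstsq(A, M, rcond=None)
    return n, -tau / n
```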

This is essentially the so-called Kowalsky method. It

looks mathematically elegant. But because it is not only

severely affected by uncertainties of observations but also

ignores the area theorem, it has a very poor reputation

among seasoned practitioners. However, in our work, we use

it for getting the initial approximation to the solution.

It serves this purpose very well.

The Method of T. N. Thiele, R. T. A. Innes and
W. H. van den Bos

T. N. Thiele (1883) published a method of orbit

computation depending upon three observed positions and the

constant of areal velocity.

The radii vectors to two positions in an orbit subtend

an elliptical sector and a triangle, the sector being

related to the time interval through the law of areas.

Gauss introduced the use of the ratio: "sector to triangle"

between different positions into the orbit computation of

planets, and Thiele applied the idea to binary stars.

Although the method could have been applied in a wide range

of circumstances, it became widely used only after Innes and

van den Bos revived it.

In 1926, R. T. A. Innes (Aitken, 1935), seeking a

method simpler than those in common use for correcting the

preliminary parameters of an orbit differentially, indepen-

dently developed a method of orbit computation which differs

from Thiele's in that he used rectangular instead of polar

coordinates. W. H. van den Bos's (1926, 1932) merit is not

merely a modification of the method (transcribing it for the

use with Innes constants) but chiefly in its pioneeringly

successful application. The device became most widely

applied. Briefly, the method of computation is as follows.
The computation utilizes three positions (ρi,θi) or the

corresponding rectangular coordinates (xi,yi) at the times

ti (i=1,2,3). The area constant C is the seventh quantity

required. Thiele employs numerical integration to find the

value of C.

First, from the observed data, find the quantities L by



L12 = t2 - t1 - D12/C (2-10a)

L23 = t3 - t2 - D23/C (2-10b)

L13 = t3 - t1 - D13/C (2-10c)


where Dij = ρiρjsin(θj - θi) are the areas of the corre-

sponding triangles.

Then, from the equations



nL12 = p-sinp (2-11a)

nL23 = q-sinq (2-11b)

nL13 = (p+q) - sin(p+q) (2-11c)



the quantities n, p and q can be found by trials, for in the

three equations above there are only three unknowns, i.e.,

n, p and q. The eccentric anomaly E2 and the eccentricity e

can thus be computed from


e sinE2 = (D23sin p - D12sin q)/(D23 + D12 - D13) (2-12a)

e cosE2 = (D23cos p + D12cos q - D13)/(D23 + D12 - D13) (2-12b)
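The trial solution of equations (2-11) for n, p and q can be sketched as a Newton iteration on the three unknowns (an illustrative implementation; it assumes a starting guess reasonably close to the solution):

```python
import numpy as np

def solve_npq(L12, L23, L13, n0, p0, q0, iters=30):
    """Solve  n*L12 = p - sin p,  n*L23 = q - sin q,
    n*L13 = (p+q) - sin(p+q)  for (n, p, q) by Newton's method."""
    u = np.array([n0, p0, q0], dtype=float)
    for _ in range(iters):
        n, p, q = u
        F = np.array([p - np.sin(p) - n*L12,
                      q - np.sin(q) - n*L23,
                      (p + q) - np.sin(p + q) - n*L13])
        J = np.array([[-L12, 1 - np.cos(p), 0.0],
                      [-L23, 0.0, 1 - np.cos(q)],
                      [-L13, 1 - np.cos(p + q), 1 - np.cos(p + q)]])
        u = u - np.linalg.solve(J, F)
    return u
```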


After E2 and e are obtained, the two other eccentric

anomalies E1 and E3 can be found from



E1 = E2 - p (2-13a)

E3 = E2 + q (2-13b)



The Ei are used first to compute the mean anomalies Mi from

equations (2-8), which lead to three identical results for

T as a check, and second to compute the coordinates Xi and

Yi from



Xi = cosEi - e (2-14a)

Yi = sinEi·√(1 - e²) (2-14b)



By writing



xi = AXi + FYi (2-15a)

yi = BXi + GYi (2-15b)


the constants A, B, F, G are obtained from two positions,

the third again serving as a check. These four coeffi-

cients, A, B, F, G, are the now so-called Thiele-Innes

constants. In addition to n, T, e, which have already been

determined, the other four orbital parameters a, i, Ω, ω can

be calculated from A, B, F, G through



Ω + ω = plg(B - F, A + G) (2-16a)

Ω - ω = plg(B + F, A - G) (2-16b)

tan²(i/2) = (A - G)cos(ω + Ω)/[(A + G)cos(ω - Ω)]
          = (B + F)sin(ω + Ω)/[(B - F)sin(ω - Ω)] (2-16c)

A + G = 2a cos(ω + Ω)cos²(i/2) (2-16d)


In addition to the brief introduction to this method given

above, a detailed description of it can be found in many

books, e.g., Aitken's book The Binary Stars (1935) and W. D.

Heintz's book Double Stars (1971).
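Numerically, equations (2-16) invert as follows. The sketch below also uses the companion relation A - G = 2a cos(ω-Ω)sin²(i/2), which is not written out above but follows from the same Thiele-Innes relations; angles are in radians, and the function name is illustrative:

```python
import math

def elements_from_thiele_innes(A, B, F, G):
    """Recover a, i, omega, Omega from the Thiele-Innes constants
    via eqs. (2-16a,b,d); assumes cos(omega+Omega) and
    cos(omega-Omega) are not close to zero."""
    s = math.atan2(B - F, A + G)        # omega + Omega, eq. (2-16a)
    d = math.atan2(B + F, A - G)        # Omega - omega, eq. (2-16b)
    q1 = (A + G) / math.cos(s)          # 2a cos^2(i/2), eq. (2-16d)
    q2 = (A - G) / math.cos(d)          # 2a sin^2(i/2)
    a = 0.5 * (q1 + q2)
    i = 2.0 * math.atan(math.sqrt(q2 / q1))
    omega = (0.5 * (s - d)) % (2*math.pi)
    Omega = (0.5 * (s + d)) % (2*math.pi)
    return a, i, omega, Omega
```

As usual for visual binaries, the recovered (ω, Ω) pair is determined only up to the simultaneous change ω → ω + 180°, Ω → Ω + 180°.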

A more accurate solution is obtained by correcting

differentially the preliminary orbit, by whatever method it

was obtained. This correction can be

achieved by a least squares solution.


The Method of Least Squares

The method of least squares was invented by Gauss and

first used by him to calculate orbits of solar system bodies

from overdetermined systems of equations. It is the most

important tool for the reduction and adjustment of observa-

tions in all fields, not only in astronomy. However, the

traditional standard algorithm, which employs condition

equations that are solved with respect to the (uncorrelated)


observations and either linear in the adjustment parameters

or linearized by developing them in Taylor series which are

broken off after the first order terms, is inadequate for

treating the problem at hand. An algorithm for finding the

solution of a more general least squares adjustment problem

was given by D. C. Brown (1955). This situation may briefly

be described as follows.

Let {x} be a set of quantities for which an approxima-

tion set {x0} was obtained by direct observation. By

ordering the elements of the sets {x} and {x0} and regarding

them as vectors, x and x0, respectively, the vector v=x-x0

is the vector of the observational errors which are

initially unknown. Assume that they follow a multivariate

normal distribution and that their covariance matrix σ is

regarded as known. Further assume that a set of parameters

{a} is ordered to form the vector a. The solution of the

least squares problem (or the adjustment) consists in

finding those values of the elements of {x} and {a} which

minimize the quadratic form vᵀσ⁻¹v while at the same time

rigorously satisfying the condition equations



fi({x}i, {a}i) = 0 (2-17)


This is a problem of finding a minimum of the function

vᵀσ⁻¹v subject to condition equations. A general rigorous

and noniterative algorithm for the solution exists only for

the case that the elements of {x} and {a} occur linearly in

the functions fi. When fi are nonlinear in the elements of

either {x} or {a}, or both, equations which are practically

equivalent to the fi and which are linear in the pertinent

variables can be derived in the following way.

Define a vector δ by a = a0 + δ. The condition equations

can then be written


f(x0 + v, a0 + δ) = 0 (2-18)


where the vector of functions f = (f1, f2, ..., fm)ᵀ, m being

the number of equations for which observations are

available. Now assume that all elements of {v} and {δ} are

sufficiently small so that the condition equations can be

developed as a Taylor series and written



f0 + fxv + faδ + O(2) = 0 (2-18'a)


where f0 = f(x0,a0), fx = (∂f/∂x)|x0,a0 and fa = (∂f/∂a)|x0,a0 .


If the small quantities of order higher than 1 can be

neglected, we can write the linearized condition equations

as


f0 + fxv + faδ = 0 . (2-18'b)
These are linear in the relevant variables, which are the

components of v and of δ.

In order to satisfy the conditions (2-18'b), we define

a vector -2μ of Lagrangian multipliers and minimize the

scalar function


S(v,δ) = vᵀσ⁻¹v - 2μᵀ(f0 + fxv + faδ) (2-19)


in which the components of v and δ are the variables.

Setting the derivatives (∂S/∂v) and (∂S/∂δ) equal to zero

and considering equations (2-18'b), we obtain


δ = -[faᵀ(fxσfxᵀ)⁻¹fa]⁻¹faᵀ(fxσfxᵀ)⁻¹f0 (2-20a)

μ = -(fxσfxᵀ)⁻¹(faδ + f0) (2-20b)

v = σfxᵀμ (2-20c)


where we have assumed that fxσfxᵀ is nonsingular. This is

the case only if all equations fi=0 contain at least one

component of x; i.e., if there are no pure constraints of

the form gi(a)=0. This case (which we shall not encounter

in our investigations) is discussed below in the description

of Jefferys' method.
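In matrix form, equations (2-20a)-(2-20c) translate directly into code; a minimal sketch (illustrative names; sigma is the covariance matrix of the observations):

```python
import numpy as np

def linearized_adjustment(f0, fx, fa, sigma):
    """One pass of the linearized adjustment: delta from (2-20a),
    mu from (2-20b) and v from (2-20c)."""
    W = np.linalg.inv(fx @ sigma @ fx.T)        # (fx sigma fx^T)^-1
    delta = -np.linalg.solve(fa.T @ W @ fa, fa.T @ W @ f0)
    mu = -W @ (fa @ delta + f0)
    v = sigma @ fx.T @ mu
    return delta, v
```

For condition equations that are exactly linear in the unknowns, one pass already gives the least squares solution; otherwise it must be iterated.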

Note that in constructing the scalar function S in

expression (2-19), first order approximations have

been used. In some cases, the linearized representation of

Eq. (2-18) by Eq. (2-18'b) is not accurate enough. In some

of these cases, the convergence toward the definitive

solution may be accelerated and sometimes be brought about

by a method suggested by Eichhorn and Clary (1974) when a

strictly linear approach would be divergent. Their solution

algorithm takes into account the second (as well as the

first) order derivatives in the adjustment residuals

(observational errors) and the corrections to the initially

available approximations to the adjustment parameters. They

pointed out that the inclusion of second order terms in the

adjustment residuals is necessary whenever the adjustment

residuals themselves cannot be regarded as negligible as

compared to the adjustment parameters, in which cases the

conventional solution techniques would not lead to the

"best" approximations for the adjustment parameters in the

sense of least squares. The authors modified the condition

equations as


$$f_0 + f_x v + f_a \delta + Vv + D\delta + M\delta = 0$$  (2-18'')

(the last term $M\delta$ may equally be written $Nv$; see below). Correspondingly, the scalar function to be minimized becomes

$$S''(v,\delta) = v^T \sigma^{-1} v - 2\mu^T (f_0 + f_x v + f_a \delta + Vv + D\delta + M\delta) .$$  (2-19'')








Here, the i-th row of the matrix D is $\frac{1}{2}\delta^T E_i$, where

$$E_i = \left.\left(\frac{\partial^2 f_i}{\partial a_j \partial a_k}\right)\right|_{x_0,a_0}$$  (2-21)

is the Hessian matrix of $f_i$ with respect to the adjustment parameters. Similarly, the i-th row of the matrix V is $\frac{1}{2}v^T W_i$, where

$$W_i = \left.\left(\frac{\partial^2 f_i}{\partial x_j \partial x_k}\right)\right|_{x_0,a_0} ;$$  (2-22)

the i-th row of M is $v^T H_i$, where

$$H_i = \left.\left(\frac{\partial^2 f_i}{\partial x_j \partial a_k}\right)\right|_{x_0,a_0} ,$$  (2-23)

and that of N is $\delta^T H_i^T$, so that evidently $M\delta = Nv$.

By minimizing S'', both δ and v can again be calculated.

For the details of this algorithm, one may refer to the

original papers of Eichhorn and Clary. The notation used

here differs slightly from the original one used by those

authors.

Jefferys (1980, 1981) proposed an even more accurate

algorithm which also improves the convergence of the









conventional least squares method. Furthermore, his method

is nearly as simple to apply in practice as the classical

method of least squares, because it does not require any

second order derivatives. Jefferys defines the scalar

function to be minimized as

$$S = \frac{1}{2} v^T \sigma^{-1} v + \lambda^T f(\hat{x},\hat{a}) + \mu^T g(\hat{a})$$  (2-24)

where x̂ = x₀ + v; x̂, â are the current "best" approximations to x

and a; and g is another vector function consisting of the

constraints on the parameters; λ, μ are vectors of

Lagrangian multipliers, and all functions are evaluated at (x̂, â). Minimizing

S with respect to x̂ and â, he arrives at the normal equa-

tions


$$\sigma^{-1} v + f_{\hat{x}}^T \lambda = 0$$  (2-25a)

$$f_{\hat{a}}^T \lambda + g_{\hat{a}}^T \mu = 0$$  (2-25b)

$$f(\hat{x},\hat{a}) = 0$$  (2-25c)

$$g(\hat{a}) = 0$$  (2-25d)

where

$$f_{\hat{x}} = \left. f_x \right|_{\hat{x},\hat{a}} , \qquad f_{\hat{a}} = \left. f_a \right|_{\hat{x},\hat{a}} .$$


These equations are exact and involve no approximations.

This is the significant difference between Jefferys' method

and those of Brown and of Eichhorn and Clary.









Remarks

As mentioned before, Kowalsky's method will most likely

not produce the best obtainable results, because the

relative observed coordinates (x, y, t) are subjected only

to the condition (2-1), which involves only five of the

seven necessary orbital parameters as adjustment parameters

and does not enforce the area theorem. It can therefore

never be used for a definitive orbit determination since it

completely ignores the observation epochs.

Yet, Eq. (2-1) has the advantage that it appears to be

simple, in particular it is linear in the adjustment

parameters albeit not in the observations. When it is used

for the determination of orbits, the right-hand sides of the

equations which result from inserting a pair (x, y) of

observed rectangular coordinates into Eq. (2-1) are regarded

as errors with a univariate Gaussian distribution (i.e., as

normally distributed errors). One may then perform a least

squares adjustment which is linear in the adjustment

parameters. As Eichhorn (1985) pointed out, this approach,

while it has the advantage that approximation values for the

adjustment parameters need not be available at the outset,

fails to take into account two facts.

1) It is not the right-hand sides of the condition equations

which are to be considered as normally distributed errors,

but rather the observations (x, y) or (ρ, θ). The condition

equations (2-1) thus contain more than one observation each.









Since the observations occur in the condition equations

nonlinearly, the matrix


$$f_x = \left.\frac{\partial f(x,a)}{\partial x}\right|_{x=x_0,\, a=a_0}$$


must be found. This requires knowledge of approximate

values ao for a. Approximate values x0 for x are

available--they are the observations themselves.

Approximate values a0 for a may sometimes indeed be obtained

in the classical way by regarding the right-hand sides of

the condition equations as normally distributed errors. In

addition, it should also be taken into account that the

covariance matrix a of the observations is not necessarily

diagonal.

2) In some cases, especially when the binary under study is

very narrow, the errors of the observations are not

negligibly small compared with the adjustment parameters.

This requires either that second-order terms in the observa-

tional errors v be carried in the equations or, as Jefferys

has pointed out, that iterations be performed using in the

evaluation of the matrices f_x and f_a not only improved

approximations for a but also improved values for the

observed quantities as they become available.

If Kowalsky's methods were so modified, the algorithm

would yield better values for the adjustment parameters a









than the traditional approach. Either way, one can usually

find an initial approximation by Kowalsky's method.

With respect to both theoretical clarity and practical

applicability, the Thiele-Innes-van den Bos method leaves

nothing to be

desired. However, the three places selected, even when

smoothed graphically or by some computation, may not suffice

to describe the motion with sufficient accuracy, so that

large and systematic residuals may remain, particularly near

periastron. The method is seriously inadequate if even one

of the ratios of sector to triangle is very close to 1, and

thus strongly affected by the measurement errors, or if the

area constant C is not initially known to the required accuracy.

The computation may then produce an erroneous orbit with

spuriously high eccentricity, perhaps a hyperbolic one, or

no solution at all. And obviously, different combinations

of the three positions selected from a set of observations

will not likely give the same result. This method therefore

fails to use the information contained in the observations

in the best possible way.

In our work we try to present a fairly general least

squares algorithm to solve the orbit problem. We shall

adopt Jefferys' least squares method as our basic approach.















CHAPTER III
GENERAL CONSIDERATIONS


This chapter contains a general discussion of the least

squares orbit problem. We shall set up condition equations

which simultaneously satisfy the ellipse equation and the

area theorem.

We have seen that it is not sufficient to use Eq. (2-1)

as the only type of condition equation because this would

ignore the observing epochs, cf. last chapter. Completely

appropriate condition equations must explicitly contain the

complete set of the seven independent orbital parameters as

the adjustment parameters. Also, to be useful in practice,

they must impose both the geometric and dynamical condi-

tions, and must lead to a convergent sequence of iterations.

After the condition equations are established, we

present the general outline of the algorithm which solves

the orbit problem by Jefferys' method of least squares.

We also discuss some further suggestions for obtaining

the initial approximate solution required for the least

squares algorithm.









The Condition Equations

Assume that a set of observations {x0} was obtained

consisting of complete data triples (t, ρ, θ), which

measure the positions of the fainter component (secondary)

with respect to the brighter one (primary): the position

angle θ is counted counterclockwise from North and ranges

from 0° to 360°; the angular separation ρ (also called

distance) is usually given in seconds of arc, and t is

the observing epoch. The conversion of (ρ, θ) to

rectangular coordinates (x, y) in seconds of arc is as

follows:



Declination difference: $\delta_c - \delta_p = x = \rho\cos\theta$,  (3-1a)

Right ascension difference: $(\alpha_c - \alpha_p)\cos\delta = y = \rho\sin\theta$,  (3-1b)

where δ_c, α_c are the declination and right ascension,

respectively, of the secondary; δ_p, α_p those of the

primary.

Equivalently, the observations can also be regarded

as relative coordinates (t, x, y) of the secondary with

respect to the primary.

It might be worthwhile to point out that 1) even though

the formulae (3-1) are approximations valid only for small

values of p, they may be regarded as practically rigorous

for binaries; 2) we are following the custom in double star








astronomy by having the x-axis along the colure* and the y-

axis tangential to the parallel of declination.

All observations in {x0} are affected by observational

errors. Let {x} be the set of the true values of the

observations, that is, those values the observations would

have had if there were no observational errors. By ordering

the elements of the sets {x0} and {x} and regarding them as

vectors, x0 and x respectively, we have seen that we may

write the vector of observational errors as v = x − x₀.

These errors are of course unknown, but as mentioned already

in the last chapter, we assume that they follow a multi-

variate normal distribution with known covariance matrix.

For visual binaries, the relative orbit must be an ellipse

(strictly speaking, a conic section) in space as well as in

projection. All pairs (x, y) must therefore satisfy the

condition equations (2-1):



$$C_1 x^2 + 2C_2 xy + C_3 y^2 + 2C_4 x + 2C_5 y + 1 = 0 ,$$


which implicitly involve five of the seven orbital para-

meters but do not enforce the area theorem.

The well-known relationships between the five coeffi-

cients (C1, C2, C3, C4, C5) in Eq. (2-1) and the five


*Following Eichhorn's terminology who uses the term "colure"
generally for any locus of constant right ascension.








orbital parameters (e, a, i, ω, Ω) by way of the Thiele-

Innes constants, have been discussed in the last chapter.

Consider a right-handed astrocentric coordinate system

K whose X-Y plane is the true orbital plane such that the

positive X-axis points toward the periastron (of the

secondary with respect to the primary). The positive Y-axis

is obtained by rotating the X-axis by 90° about the Z-axis in

the direction of the orbital motion. The axes of a second

astrocentric, right-handed coordinate system k are parallel

to those of the equator system Q. The two systems K and k

are related by the transformation



$$x_K = R_3(\omega) R_1(i) R_3(\Omega)\, x_k .$$  (3-2)



From the theory of the two-body problem we know that,

in the system K, the coordinates of the secondary with

respect to the primary are given by


$$\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \begin{pmatrix} a(\cos E - e) \\ a \sin E \sqrt{1-e^2} \\ 0 \end{pmatrix}$$  (3-3)

where E is the eccentric anomaly, which is the solution of

Kepler's equation

$$n(t-T) = E - e \sin E .$$  (3-4)









Here, n and T, the mean motion and the periastron epoch, are

the two orbital parameters not involved in Eq. (2-1).
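Kepler's equation (3-4) is transcendental in E and must itself be solved iteratively; the standard Newton iteration (a sketch of ours, not part of the adjustment algorithm of this work) converges to machine precision in a few steps:

```python
import math

def eccentric_anomaly(n, T, t, e, tol=1e-12):
    """Solve Kepler's equation n(t-T) = E - e*sin(E) for E by Newton iteration."""
    M = n * (t - T)                    # mean anomaly
    E = M if e < 0.8 else math.pi      # conventional starting guess
    for _ in range(50):
        dE = (E - e * math.sin(E) - M) / (1.0 - e * math.cos(E))
        E -= dE
        if abs(dE) < tol:
            break
    return E
```

The returned E satisfies E − e sin E = n(t − T) to the stated tolerance.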

From Eq. (3-2) we obtain


$$\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \begin{pmatrix} A & B & C \\ F & G & H \\ K & L & M \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$  (3-5)

where

$$z = -\frac{Kx + Ly}{M}$$  (3-6)

and

$$\begin{pmatrix} A & B & C \\ F & G & H \\ K & L & M \end{pmatrix} = R_3(\omega) R_1(i) R_3(\Omega)$$  (3-7)

or, in detail,

$$A = \cos\Omega\cos\omega - \sin\Omega\sin\omega\cos i ;$$  (3-8a)
$$B = \sin\Omega\cos\omega + \cos\Omega\sin\omega\cos i ;$$  (3-8b)
$$C = \sin\omega\sin i ;$$  (3-8c)
$$F = -\cos\Omega\sin\omega - \sin\Omega\cos\omega\cos i ;$$  (3-8d)
$$G = -\sin\Omega\sin\omega + \cos\Omega\cos\omega\cos i ;$$  (3-8e)
$$H = \cos\omega\sin i ;$$  (3-8f)
$$K = \sin\Omega\sin i ;$$  (3-8g)
$$L = -\cos\Omega\sin i ;$$  (3-8h)
$$M = \cos i ;$$  (3-8i)



where in this notation, the traditional Thiele-Innes

constants would be aA, aB, aF and aG.
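The orientation matrix (3-7) is conveniently generated numerically from the elementary rotations rather than from the closed forms (3-8); the sketch below (ours, with the usual right-handed rotation conventions assumed) does exactly that:

```python
import numpy as np

def r3(t):
    """Rotation about the third (z) axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

def r1(t):
    """Rotation about the first (x) axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])

def orientation_matrix(omega, i, Omega):
    """The matrix [[A,B,C],[F,G,H],[K,L,M]] = R3(omega) R1(i) R3(Omega), Eq. (3-7)."""
    return r3(omega) @ r1(i) @ r3(Omega)
```

Multiplying out the product reproduces the closed forms (3-8), e.g. A = cosΩcosω − sinΩsinωcos i, K = sinΩ sin i and M = cos i; the result is orthogonal, as a product of rotations must be.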

From Eq. (3-7) and (3-5) we can get


$$X = Ax + By + Cz = \frac{Gx - Fy}{M}$$  (3-9a)

$$Y = Fx + Gy + Hz = \frac{Ay - Bx}{M}$$  (3-9b)

Thus, we see that X and Y can be expressed in terms of i, ω, Ω

and the observations (x, y).

From Eq. (3-3) we obtain


$$\cos E = \frac{X}{a} + e$$  (3-10a)

$$\sin E = \frac{Y}{a\sqrt{1-e^2}}$$  (3-10b)


Combining (3-10) with Kepler's equation (3-4), we get


$$\frac{X}{a} + e = \cos\left[\, n(t-T) + \frac{eY}{a\sqrt{1-e^2}} \,\right]$$  (3-11a)

$$\frac{Y}{a\sqrt{1-e^2}} = \sin\left[\, n(t-T) + \frac{eY}{a\sqrt{1-e^2}} \,\right]$$  (3-11b)

More succinctly, we have

$$U = \cos[\, n(t-T) + eV \,]$$  (3-12a)

$$V = \sin[\, n(t-T) + eV \,]$$  (3-12b)

or

$$U = \cos[\, n(t-T) + e\sqrt{1-U^2} \,]$$  (3-12'a)

$$V = \sin[\, n(t-T) + eV \,]$$  (3-12'b)

with

$$U = \frac{X}{a} + e , \qquad V = \frac{Y}{a\sqrt{1-e^2}} .$$  (3-12c)


After X and Y are expressed in terms of i, w, 9 and (x, y)

as in Eqs. (3-9), Eqs. (3-11) or (3-12) involve exactly the

seven orbital parameters (n, T, a, e, i, ω, Ω) and the

observations (t, x, y). The observing epoch t now appears

explicitly, as it must if the area theorem is to be

enforced.

Now, we see that if both equations (3-11) are satis-

fied, Kepler's equation, which enforces the area theorem, is

satisfied, and furthermore the ellipse equation is also

automatically satisfied, as can be seen if t is eliminated

from Eqs. (3-11) so that these equations are reduced to one

equation in X and Y. If we select the two equations (3-11)









as the condition equations, we need no longer carry

Eq. (2-1) separately. Of course, we can use any one of

Eqs. (3-11) as well as Eq. (2-1) as the condition equations.

However, using Eq. (2-1) is not convenient, for it contains

the five coefficients directly, but not the five orbital

parameters themselves, even though there are unique rela-

tionships between them. Ideally, the condition equations

should have the adjustment parameters explicitly as

variables.

In our work, we use the Eqs. (3-11) as the complete set

of condition equations.
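The two condition functions can be evaluated directly from one observation and the current parameter values; the following sketch (ours; argument names are illustrative, and the orientation matrix R of Eq. (3-7) is assumed precomputed) does so:

```python
import math

def condition_functions(x, y, t, n, T, a, e, R):
    """Evaluate the condition functions f1, f2 of Eqs. (3-13) for one
    observation (t, x, y). R has rows (A,B,C), (F,G,H), (K,L,M)."""
    (A, B, C), (F, G, H), (K, L, M) = R
    X = (G * x - F * y) / M                # Eq. (3-9a)
    Y = (A * y - B * x) / M                # Eq. (3-9b)
    b = a * math.sqrt(1.0 - e * e)
    arg = n * (t - T) + e * Y / b          # eccentric anomaly via the area theorem
    f1 = X / a + e - math.cos(arg)         # Eq. (3-13a)
    f2 = Y / b - math.sin(arg)             # Eq. (3-13b)
    return f1, f2
```

For an error-free position generated from known elements, both functions vanish identically, which provides a convenient check of any implementation.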


The General Statement of the Least
Squares Orbit Problem

As seen in the last section, the vector of "true

observations" x (presumably having been adjusted from the

observation x0 by the residuals v) and the vector of "true

orbital parameters" a, a=(n, T, a, e, i, w, g)T, must

satisfy the condition equations


$$f_1(x_0+v,\, a) = \frac{X}{a} + e - \cos\left[\, n(t-T) + \frac{eY}{a\sqrt{1-e^2}} \,\right] = 0$$  (3-13a)

$$f_2(x_0+v,\, a) = \frac{Y}{a\sqrt{1-e^2}} - \sin\left[\, n(t-T) + \frac{eY}{a\sqrt{1-e^2}} \,\right] = 0$$  (3-13b)


where









$$\begin{pmatrix} X \\ Y \\ 0 \end{pmatrix} = \begin{pmatrix} A & B & C \\ F & G & H \\ K & L & M \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} ,$$

with

$$z = -\frac{Kx + Ly}{M}$$

and

$$\begin{pmatrix} A & B & C \\ F & G & H \\ K & L & M \end{pmatrix} = R_3(\omega) R_1(i) R_3(\Omega) .$$



In our problem, there are no constraints between the

parameters which involve no observations so that Jefferys' g

function does not occur. The problem can therefore be

stated as follows.

Assume that the residuals {v} (regarded as vector v) of

a set of observations {x0} (regarded as vector x0) follow a

multivariate normal distribution, whose covariance matrix σ

is regarded as known; we are to find the best approximations

v̂ (for v, the residuals) and â (for a, the parameters)

such that

$$f_1(x_0+v,\, a) = 0$$

and

$$f_2(x_0+v,\, a) = 0$$









are both satisfied while at the same time the quadratic form


$$S_0 = \frac{1}{2} v^T \sigma^{-1} v$$  (3-14)
2


is minimized.

Following the well-known procedure introduced by

Lagrange, the solution is obtained by minimizing the scalar

function


$$S = \frac{1}{2} v^T \sigma^{-1} v + \lambda^T f(\hat{x},\hat{a})$$  (3-15)


where x̂ = x₀ + v, and λ is the vector of Lagrangian multi-

pliers, together with satisfying the equations f₁ = 0 = f₂.

Denoting the matrix of partial derivatives with respect to a

variable by a subscript, this is equivalent to solving the

following normal equations:


$$\sigma^{-1} v + f_{\hat{x}}^T \lambda = 0$$  (3-16a)

$$f_{\hat{a}}^T \lambda = 0$$  (3-16b)

$$f(\hat{x},\hat{a}) = 0$$  (3-16c)


We have stated before that these equations are exact and

therefore involve no approximations. Before Jefferys, all

authors used first order or second order approximations in

forming S in equation (3-15). This is the significant









difference that distinguishes Jefferys' method from those

employed by previous authors.

It is evident that the solution of the equations (3-16)

would solve the posed problem.


About the Initial Solution

The least squares algorithm requires an initial

solution as starting point. Any approach which leads to

approximate values of the orbital parameters serves this

purpose, because our algorithm does not require a very

accurate initial approximation. As long as the initial

approximation is not too different from the final result,

convergence can always be achieved. The following

procedures may lead to an initial solution.

1) As mentioned in Chapter II, Kowalsky's method can produce

a preliminary solution. Inserting the pairs (x, y) of

observed rectangular coordinates into Eq. (2-1), we have a

set of linear equations in which the five coefficients (C₁,

C₂, C₃, C₄, C₅) are the unknowns. By making a classical
least squares solution based on these linear equations, the

five coefficients can be computed. An estimate of the five

parameters (a, e, i, ω, Ω) can be obtained in turn from the

unique relationships between them and the five coefficients.

The remaining two parameters (n, T) also can then be

calculated from the known quantities simply by classical

least squares, as described in Chapter II.
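The linear step of Kowalsky's method is a plain least-squares fit for the five conic coefficients; a minimal sketch (ours; it does not check the ellipse conditions discussed below) is:

```python
import numpy as np

def kowalsky_conic_fit(xs, ys):
    """Least-squares fit of C1 x^2 + 2 C2 xy + C3 y^2 + 2 C4 x + 2 C5 y + 1 = 0
    (Eq. 2-1), which is linear in the coefficients C1..C5."""
    A = np.column_stack([xs**2, 2 * xs * ys, ys**2, 2 * xs, 2 * ys])
    rhs = -np.ones_like(xs)                     # move the constant term to the right
    C, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return C                                    # (C1, C2, C3, C4, C5)
```

For points lying exactly on the circle x² + y² = 4, for instance, the fit recovers C₁ = C₃ = −1/4 and C₂ = C₄ = C₅ = 0 to machine precision.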









As Eichhorn (1985) has pointed out, it is of course

better to use the modified Kowalsky method. Using (2-1) as

condition equations and the five coefficients as the

adjustment parameters, one may iterate by Jefferys'

algorithm toward the best fitting adjustment parameters and

the best corrections to the observations, v, that is, to

arrive at the values of a and v which minimize the scalar

function S₀ (see equation 3-14) while simultaneously

satisfying the condition equations.

This method is simple to apply in practice. But

unfortunately, especially when the observations are not very

precise, the five coefficients (C₁, C₂, C₃, C₄, C₅) in some

cases do not satisfy the conditions for an ellipse;

i.e., they do not meet the requirements

$$C_1 > 0, \quad C_3 > 0 \quad \text{and} \quad C_1 C_3 - C_2^2 > 0 .$$


However, in these cases, it does not mean that there is no

elliptic solution at all and other approaches can be tried.

When this happens, one may for instance take the approach

outlined in Chapter 23 of Lawson and Hanson (1974).

2) By using some selected points among the observation data

instead of using all the data points, sometimes a solution

can be found by Kowalsky's method. Such a solution is

usually also good enough to be a starting approximation.







3) Or, carefully selecting three points among the observa-

tion data, one may use the Thiele-Innes-van den Bos method

to calculate an initial approximation. The method has been

described in Chapter II.














CHAPTER IV
THE SOLUTION BY NEWTON'S METHOD


The Solution by Newton's Method

In our problem, the normal equations (3-16) are

nonlinear. They must be solved by linearization and

successive approximations. Assume that approximate initial

estimates of the unknowns in the normal equations have

somehow been obtained (using whatever methods). This

initial approximation may be improved by Newton's method,

which consists of linearizing the normal equations about the

available solution by a first order development and

obtaining an improved solution by solving the linearized

equations. This process is iterated with the hope that

convergence to a definite solution would eventually be

obtained. This expectation is reasonable if the initial

approximation is sufficiently close to the final solution

and if the observational errors are not too large.

Following Jefferys' notation (which is also the

notation we have used in Chapter III), let the initial

approximate solution (and also the current approximate

solution during iteration) be given by (x,a), where

x̂ = x₀ + v, x₀ being the vector of observations, v the vector

of observational errors (for which we adopt the null vector








as initial approximation); â is the initial approximation of

the vector of the seven orbital parameters (adjustment

parameters); also let the corrections to both x̂ and â be

denoted by ε and δ, respectively.

The normal equations in our problem now become


$$\sigma^{-1}(v + \varepsilon) + \hat{f}_x^T \lambda = 0$$  (4-1a)

$$\hat{f}_a^T \lambda = 0$$  (4-1b)

$$f + \hat{f}_x \varepsilon + \hat{f}_a \delta = 0$$  (4-1c)

Here we have ignored (as Jefferys did) products of ε and δ

with the Lagrangian multipliers. This does not affect the final

result, as Jefferys also pointed out. A caret in equations

(4-1) above means evaluation at the current values of x̂ and â.

Similar to Jefferys' procedure, we solve the equations

(4-1) as follows.

Solving Eq. (4-1a) for ε we have

$$\varepsilon = -v - \sigma \hat{f}_x^T \lambda .$$  (4-2)

Substituting (4-2) into (4-1c) for ε, Eq. (4-1c) becomes

$$f - \hat{f}_x v - \hat{f}_x \sigma \hat{f}_x^T \lambda + \hat{f}_a \delta = 0 .$$  (4-3)

Solving Eq. (4-3) for λ, we obtain








$$\lambda = w(f - \hat{f}_x v + \hat{f}_a \delta)$$  (4-4)

where the "weight matrix" w is given by

$$w = (\hat{f}_x \sigma \hat{f}_x^T)^{-1} .$$  (4-5)

Inserting this solution for λ into Eq. (4-1b), we have

$$\hat{f}_a^T w (f - \hat{f}_x v + \hat{f}_a \delta) = 0 .$$  (4-6)

If we now define

$$\phi = f - \hat{f}_x v$$  (4-7)

and rearrange Eq. (4-6), the equation for δ will have the form

$$(\hat{f}_a^T w \hat{f}_a)\, \delta = -\hat{f}_a^T w \phi .$$  (4-8)

This set of linear equations is easy to solve for the

corrections δ by general methods. With this solution for δ,

the improved residuals v_n are obtained from the equation

$$v_n = -\sigma \hat{f}_x^T w (\phi + \hat{f}_a \delta) ,$$  (4-9)








which follows from Eqs. (4-1a), (4-4) and (4-7). We then

get the new vectors â_n and x̂_n as



$$\hat{a}_n = \hat{a} + \delta$$  (4-10a)

$$\hat{x}_n = x_0 + v_n$$  (4-10b)


which constitute the improved solution.

After each iteration, we check the relative magnitude

of each component in v and δ against the corresponding

component in x̂ and â, take the maximum value among all of

these, and test whether this value is smaller than some specified

number, say 10⁻⁸. If the improved solution is still not

sufficiently accurate (i.e. if the above found maximum value

is still not smaller than the specified value), the process

of iterations has to be continued until convergence has been

attained, that is, until subsequent iterations no longer

give significant corrections.

At the outset, the obvious starting point for this

scheme is to adopt x̂ = x₀ as the initial approximation for the

"true observation" vector x (in this case, v = 0), and to use

as first approximation for a a vector â₀, an initial

solution for the seven orbital parameters, which has been

obtained somehow.

It is important for convergence that the initial

solution (x̂₀, â₀) is not too different from the final

solution which is obtained by the process given by Eqs. (4-1)








through (4-10). In Chapter III, we discussed how to find a

good approximation as an initial solution.

According to the scheme outlined above, the application

of Newton's method in our problem would consist of the

following steps.

Step 1.

Calculate f, f̂_x and f̂_a from the current values of x̂

and â ;

Step 2.

Calculate φ from φ = f − f̂_x v ;

Step 3.

Calculate the "weight matrix" w from w = (f̂_x σ f̂_x^T)⁻¹ ;

Step 4.

Solve for the corrections to the parameters, δ, from the

linear equations (f̂_a^T w f̂_a) δ = −f̂_a^T w φ ;

Step 5.

Calculate the improved residuals v_n from

v_n = −σ f̂_x^T w (φ + f̂_a δ) ;

Step 6.

Find the new approximate solution from

â_n = â + δ ,

x̂_n = x₀ + v_n .
Step 7.

Test the relative magnitude of each component of δ and

v against â and x̂, and decide if a further iteration is

needed, in which case all the steps above must be repeated.
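The core of one iteration (Steps 2 through 6) can be sketched generically as follows (an illustrative fragment of ours; it assumes f, f̂_x, f̂_a and σ are supplied as dense numpy arrays, whereas in practice the block structure described later in this chapter would be exploited):

```python
import numpy as np

def newton_step(f, fx, fa, sigma, v):
    """One iteration: parameter corrections delta and improved residuals v_n,
    per Eqs. (4-5) through (4-9)."""
    phi = f - fx @ v                              # Step 2, Eq. (4-7)
    w = np.linalg.inv(fx @ sigma @ fx.T)          # Step 3, Eq. (4-5)
    N = fa.T @ w @ fa
    delta = -np.linalg.solve(N, fa.T @ w @ phi)   # Step 4, Eq. (4-8)
    v_n = -sigma @ fx.T @ w @ (phi + fa @ delta)  # Step 5, Eq. (4-9)
    return delta, v_n
```

On a linear toy model (each observation estimating one parameter) the step converges exactly in a single iteration, with the improved observations and the improved parameter agreeing.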








In detail, the steps are as follows.

Step 1.

Calculate the vector of functions in the condition equa-

tions, f, and the matrices of partial derivatives f̂_x and f̂_a

from the current values of x̂ and â, where x̂ = x₀ + v.

The dimension of the vectors x₀, v, x̂ is 2m in each case, m

being the number of observed positions.

The observation vector is


$$x_0 = (x_{10}, y_{10}, \ldots, x_{i0}, y_{i0}, \ldots, x_{m0}, y_{m0})^T$$  (4-11)


The current vector of corrections to observations is


$$v = (v_{x1}, v_{y1}, \ldots, v_{xi}, v_{yi}, \ldots, v_{xm}, v_{ym})^T$$  (4-12)


From x₀ and v, x̂ can easily be found from x̂ = x₀ + v:

$$\hat{x} = (x_{10}+v_{x1},\; y_{10}+v_{y1},\; \ldots,\; x_{i0}+v_{xi},\; y_{i0}+v_{yi},\; \ldots,\; x_{m0}+v_{xm},\; y_{m0}+v_{ym})^T .$$  (4-13)

The current approximation for the seven parameters, â, is

$$\hat{a} = (n, T, a, e, i, \omega, \Omega)^T$$  (4-14)

at current values. Insert the known (x̂, â) into the









condition equations to get the function 2m-vector f. It has

the form

$$f = [f_{11}, f_{21}, \ldots, f_{1i}, f_{2i}, \ldots, f_{1m}, f_{2m}]^T$$  (4-15)

where

$$f_{1i} = \frac{X_i}{a} + e - \cos E_i$$  (4-16a)

$$f_{2i} = \frac{Y_i}{b} - \sin E_i$$  (4-16b)

with

$$b = a\sqrt{1-e^2}$$  (4-16c)

and

$$E_i = n(t_i - T) + \frac{eY_i}{b} .$$  (4-16d)

The coordinates X, Y are calculated from Eqs. (3-9),

A, B, F, G, M from Eqs. (3-8).

The first derivatives f̂_x, f̂_a are calculated at the current

values of x̂ and â.

The matrix of partial derivatives of f with respect to the

observations x̂, f̂_x, is a block-diagonal 2m×2m square matrix

of the form







$$\hat{f}_x = \begin{pmatrix}
\frac{\partial f_{11}}{\partial x_1} & \frac{\partial f_{11}}{\partial y_1} & & & \\
\frac{\partial f_{21}}{\partial x_1} & \frac{\partial f_{21}}{\partial y_1} & & & \\
& & \ddots & & \\
& & & \frac{\partial f_{1m}}{\partial x_m} & \frac{\partial f_{1m}}{\partial y_m} \\
& & & \frac{\partial f_{2m}}{\partial x_m} & \frac{\partial f_{2m}}{\partial y_m}
\end{pmatrix}$$  (4-17)

i.e.,

$$\hat{f}_x = \mathrm{diag}(g_1, g_2, \ldots, g_i, \ldots, g_m)$$  (4-18)

with

$$g_i = \begin{pmatrix}
\frac{\partial f_{1i}}{\partial x_i} & \frac{\partial f_{1i}}{\partial y_i} \\
\frac{\partial f_{2i}}{\partial x_i} & \frac{\partial f_{2i}}{\partial y_i}
\end{pmatrix} , \qquad i = 1, \ldots, m .$$  (4-19)

From Eqs. (4-16), (3-8) and (3-9), we obtain








$$g_i = \begin{pmatrix}
\frac{1}{a} & \frac{e}{b}\sin E_i \\
0 & \frac{1}{b}(1 - e\cos E_i)
\end{pmatrix}
\begin{pmatrix}
\frac{\partial X_i}{\partial x_i} & \frac{\partial X_i}{\partial y_i} \\
\frac{\partial Y_i}{\partial x_i} & \frac{\partial Y_i}{\partial y_i}
\end{pmatrix} .$$  (4-20)

In particular, from Eqs. (3-9) we have

$$\frac{\partial X_i}{\partial x_i} = \frac{G}{M} , \quad
\frac{\partial X_i}{\partial y_i} = -\frac{F}{M} , \quad
\frac{\partial Y_i}{\partial x_i} = -\frac{B}{M} , \quad
\frac{\partial Y_i}{\partial y_i} = \frac{A}{M} ,$$  (4-21)

and therefore

$$g_i = \begin{pmatrix}
\frac{1}{a} & \frac{e}{b}\sin E_i \\
0 & \frac{1}{b}(1 - e\cos E_i)
\end{pmatrix}
\frac{1}{M}
\begin{pmatrix}
G & -F \\
-B & A
\end{pmatrix} .$$  (4-22)

The expressions for all the partial derivatives in f̂_x are

listed in Table 4-1.

The dimension of f̂_a, the matrix of the partial deriva-

tives of f with respect to the seven parameters, is 2m×7.

It has the form

$$\hat{f}_a = (G_1, G_2, \ldots, G_i, \ldots, G_m)^T$$  (4-23)









Table 4-1.

Expressions for All the Partial Derivatives in f̂_x

$$\frac{\partial f_{1i}}{\partial x_i} = \frac{1}{M}\left( \frac{G}{a} - \frac{Be}{b}\sin E_i \right) , \qquad
\frac{\partial f_{1i}}{\partial y_i} = \frac{1}{M}\left( -\frac{F}{a} + \frac{Ae}{b}\sin E_i \right) ,$$

$$\frac{\partial f_{2i}}{\partial x_i} = -\frac{B}{Mb}(1 - e\cos E_i) , \qquad
\frac{\partial f_{2i}}{\partial y_i} = \frac{A}{Mb}(1 - e\cos E_i) .$$










$$G_i = \begin{pmatrix}
\frac{\partial f_{1i}}{\partial n} & \frac{\partial f_{1i}}{\partial T} & \frac{\partial f_{1i}}{\partial a} & \frac{\partial f_{1i}}{\partial e} & \frac{\partial f_{1i}}{\partial i} & \frac{\partial f_{1i}}{\partial \omega} & \frac{\partial f_{1i}}{\partial \Omega} \\
\frac{\partial f_{2i}}{\partial n} & \frac{\partial f_{2i}}{\partial T} & \frac{\partial f_{2i}}{\partial a} & \frac{\partial f_{2i}}{\partial e} & \frac{\partial f_{2i}}{\partial i} & \frac{\partial f_{2i}}{\partial \omega} & \frac{\partial f_{2i}}{\partial \Omega}
\end{pmatrix}$$  (4-24a)

The expressions for all the elements in G_i are listed below.

$$\frac{\partial f_{1i}}{\partial n} = (t_i - T)\sin E_i , \qquad
\frac{\partial f_{2i}}{\partial n} = -(t_i - T)\cos E_i ;$$  (4-24b)

$$\frac{\partial f_{1i}}{\partial T} = -n\sin E_i , \qquad
\frac{\partial f_{2i}}{\partial T} = n\cos E_i ;$$  (4-24c)

$$\frac{\partial f_{1i}}{\partial a} = -\frac{1}{a}\left( \frac{X_i}{a} + \frac{eY_i}{b}\sin E_i \right) , \qquad
\frac{\partial f_{2i}}{\partial a} = -\frac{Y_i}{ab}(1 - e\cos E_i) ;$$  (4-24d)

$$\frac{\partial f_{1i}}{\partial e} = 1 + \frac{Y_i \sin E_i}{b(1-e^2)} , \qquad
\frac{\partial f_{2i}}{\partial e} = \frac{Y_i}{b(1-e^2)}(e - \cos E_i) ;$$  (4-24e)









$$\begin{pmatrix}
\frac{\partial f_{1i}}{\partial i} & \frac{\partial f_{1i}}{\partial \omega} & \frac{\partial f_{1i}}{\partial \Omega} \\
\frac{\partial f_{2i}}{\partial i} & \frac{\partial f_{2i}}{\partial \omega} & \frac{\partial f_{2i}}{\partial \Omega}
\end{pmatrix}
= \begin{pmatrix}
\frac{1}{a} & \frac{e}{b}\sin E_i \\
0 & \frac{1}{b}(1 - e\cos E_i)
\end{pmatrix}
\begin{pmatrix}
\frac{\partial X_i}{\partial i} & \frac{\partial X_i}{\partial \omega} & \frac{\partial X_i}{\partial \Omega} \\
\frac{\partial Y_i}{\partial i} & \frac{\partial Y_i}{\partial \omega} & \frac{\partial Y_i}{\partial \Omega}
\end{pmatrix}$$  (4-24f)

From Eqs. (3-8) and (3-9) we can find the expressions for

the following six partial derivatives:

$$\frac{\partial X_i}{\partial i} = \frac{z_i \sin\omega}{M} , \quad
\frac{\partial X_i}{\partial \omega} = \frac{Ay_i - Bx_i}{M} , \quad
\frac{\partial X_i}{\partial \Omega} = \frac{Gy_i + Fx_i}{M} ,$$

$$\frac{\partial Y_i}{\partial i} = \frac{z_i \cos\omega}{M} , \quad
\frac{\partial Y_i}{\partial \omega} = \frac{Fy_i - Gx_i}{M} , \quad
\frac{\partial Y_i}{\partial \Omega} = \frac{-By_i - Ax_i}{M} .$$  (4-24g)

In terms of these derivatives, we have

$$\begin{pmatrix}
\frac{\partial f_{1i}}{\partial i} & \frac{\partial f_{1i}}{\partial \omega} & \frac{\partial f_{1i}}{\partial \Omega} \\
\frac{\partial f_{2i}}{\partial i} & \frac{\partial f_{2i}}{\partial \omega} & \frac{\partial f_{2i}}{\partial \Omega}
\end{pmatrix}
= \begin{pmatrix}
\frac{1}{a} & \frac{e}{b}\sin E_i \\
0 & \frac{1}{b}(1 - e\cos E_i)
\end{pmatrix}
\frac{1}{M}
\begin{pmatrix}
z_i \sin\omega & Ay_i - Bx_i & Gy_i + Fx_i \\
z_i \cos\omega & Fy_i - Gx_i & -By_i - Ax_i
\end{pmatrix} .$$  (4-24h)

All expressions in f̂_a are listed in Table 4-2.










Table 4-2.

Expressions for All the Partial Derivatives in f̂_a

$$\frac{\partial f_{1i}}{\partial n} = (t_i - T)\sin E_i , \qquad
\frac{\partial f_{2i}}{\partial n} = -(t_i - T)\cos E_i ;$$

$$\frac{\partial f_{1i}}{\partial T} = -n\sin E_i , \qquad
\frac{\partial f_{2i}}{\partial T} = n\cos E_i ;$$

$$\frac{\partial f_{1i}}{\partial a} = -\frac{1}{a}\left( \frac{X_i}{a} + \frac{eY_i}{b}\sin E_i \right) , \qquad
\frac{\partial f_{2i}}{\partial a} = -\frac{Y_i}{ab}(1 - e\cos E_i) ;$$

$$\frac{\partial f_{1i}}{\partial e} = 1 + \frac{Y_i \sin E_i}{b(1-e^2)} , \qquad
\frac{\partial f_{2i}}{\partial e} = \frac{Y_i}{b(1-e^2)}(e - \cos E_i) ;$$

$$\frac{\partial f_{1i}}{\partial i} = \frac{z_i}{M}\left( \frac{\sin\omega}{a} + \frac{e}{b}\sin E_i \cos\omega \right) , \qquad
\frac{\partial f_{2i}}{\partial i} = \frac{z_i \cos\omega}{Mb}(1 - e\cos E_i) ;$$

$$\frac{\partial f_{1i}}{\partial \omega} = \frac{1}{M}\left( \frac{Ay_i - Bx_i}{a} + \frac{e}{b}\sin E_i (Fy_i - Gx_i) \right) , \qquad
\frac{\partial f_{2i}}{\partial \omega} = \frac{1}{Mb}(1 - e\cos E_i)(Fy_i - Gx_i) ;$$

$$\frac{\partial f_{1i}}{\partial \Omega} = \frac{1}{M}\left( \frac{Gy_i + Fx_i}{a} - \frac{e}{b}\sin E_i (By_i + Ax_i) \right) , \qquad
\frac{\partial f_{2i}}{\partial \Omega} = \frac{1}{Mb}(e\cos E_i - 1)(By_i + Ax_i) .$$
Mb








Step 2.

Calculate φ from f, v and f̂_x by φ = f − f̂_x v. The dimension

of the vector φ is also 2m. It has the form

$$\phi = (\phi_{11}, \phi_{12}, \ldots, \phi_{i1}, \phi_{i2}, \ldots, \phi_{m1}, \phi_{m2})^T$$  (4-25)

where

$$\phi_{i1} = f_{1i} - \frac{\partial f_{1i}}{\partial x_i} v_{xi} - \frac{\partial f_{1i}}{\partial y_i} v_{yi} , \qquad
\phi_{i2} = f_{2i} - \frac{\partial f_{2i}}{\partial x_i} v_{xi} - \frac{\partial f_{2i}}{\partial y_i} v_{yi} .$$  (4-25')


Step 3.

Calculate the weight matrix w from f̂_x and σ.

The matrix f̂_x has been calculated in Step 1. The

covariance matrix σ is assumed to be known. The dimension

of σ is 2m×2m.

An example for computing σ is shown below.

The relationship between the covariance matrix of

rectangular coordinates (x, y) and that of polar coordinates

(ρ, θ) is

$$\sigma_{xy} = \frac{\partial(x,y)}{\partial(\rho,\theta)}\; \sigma_{\rho\theta}\; \left(\frac{\partial(x,y)}{\partial(\rho,\theta)}\right)^T$$  (4-26)

where x = ρcos θ, y = ρsin θ and

$$\frac{\partial(x,y)}{\partial(\rho,\theta)} = \begin{pmatrix} \cos\theta & -\rho\sin\theta \\ \sin\theta & \rho\cos\theta \end{pmatrix} .$$  (4-26')

Thus, we have

$$\sigma_{xy} = \begin{pmatrix} \cos\theta & -\rho\sin\theta \\ \sin\theta & \rho\cos\theta \end{pmatrix}
\begin{pmatrix} \sigma_\rho & 0 \\ 0 & \sigma_\theta \end{pmatrix}
\begin{pmatrix} \cos\theta & \sin\theta \\ -\rho\sin\theta & \rho\cos\theta \end{pmatrix}$$

and therefore

$$\sigma_{xy} = \begin{pmatrix}
\sigma_\rho \cos^2\theta + \sigma_\theta \rho^2 \sin^2\theta & (\sigma_\rho - \sigma_\theta \rho^2)\cos\theta\sin\theta \\
(\sigma_\rho - \sigma_\theta \rho^2)\cos\theta\sin\theta & \sigma_\rho \sin^2\theta + \sigma_\theta \rho^2 \cos^2\theta
\end{pmatrix} .$$  (4-26'')


In these expressions, σ_ρ = (Δρ)², σ_θ = (Δθ)², and Δρ, Δθ are the

observational errors in ρ and θ, respectively. For the

observations, the random errors are of a similar order of

magnitude as the systematic ones, larger in separation than

in position angle. The average errors ρΔθ and Δρ vary

somewhat with the separation ρ and can be assumed, for many

series of observations, to follow the form Cρ^{1/3}, where C

varies with different observers. For a single good observa-

tion C will not exceed 0″.03 in position angle (ρΔθ) and

0″.08 in separation (Δρ) (Heintz, 1971). Errors will be

somewhat larger and more difficult to measure if one or both

components are faint. If the errors are expressed in the

dimensionless (relative) forms Δθ and Δρ/ρ, it is seen that

they increase as pairs become closer.

Based on the considerations above, we can, for example,

put Δρ = 0″.08 ρ^{1/3} and ρΔθ = 0″.03 ρ^{1/3}, i.e. Δθ = 0.03 ρ^{-2/3}. If we

allow each pair of numbers (x_i, y_i) in an observation to be

correlated, but allow no correlations between different observa-

tions, σ would be block-diagonal. In this case, we have

$$\sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_i, \ldots, \sigma_m) ,$$

where

$$\sigma_i = \begin{pmatrix} \sigma_{x_i x_i} & \sigma_{x_i y_i} \\ \sigma_{y_i x_i} & \sigma_{y_i y_i} \end{pmatrix}
= \begin{pmatrix} \sigma_{i1} & \sigma_{i3} \\ \sigma_{i4} & \sigma_{i2} \end{pmatrix}$$

with σ_{x_i y_i} = σ_{y_i x_i}, i.e., σ_{i3} = σ_{i4}.

The form of the weight matrix is simpler in this case;

it is also block-diagonal.
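The 2×2 covariance block of one rectangular measure, propagated through Eq. (4-26'') with the illustrative error model Δρ = 0″.08 ρ^{1/3}, ρΔθ = 0″.03 ρ^{1/3} quoted above, might be computed as follows (a sketch of ours; ρ and Δρ are in arcseconds, Δθ in radians):

```python
import numpy as np

def observation_covariance(rho, theta):
    """2x2 covariance of (x, y) for one measure, propagated from polar
    errors via Eq. (4-26''), using the text's illustrative error model."""
    s_r = (0.08 * rho ** (1.0 / 3.0)) ** 2     # variance of rho
    s_t = (0.03 * rho ** (-2.0 / 3.0)) ** 2    # variance of theta
    J = np.array([[np.cos(theta), -rho * np.sin(theta)],
                  [np.sin(theta),  rho * np.cos(theta)]])   # Eq. (4-26')
    return J @ np.diag([s_r, s_t]) @ J.T
```

At θ = 0 the result reduces, as Eq. (4-26'') requires, to diag(σ_ρ, ρ²σ_θ).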

According to Eq. (4-5), w = (f̂_x σ f̂_x^T)⁻¹. The matrices f̂_x,

σ, f̂_x^T now are all block-diagonal. Therefore

$$w = \mathrm{diag}(w_1, w_2, \ldots, w_i, \ldots, w_m)$$  (4-29)


with

$$w_i = \begin{pmatrix} w_{i1} & w_{i3} \\ w_{i4} & w_{i2} \end{pmatrix} .$$  (4-30)


The computation of w is straightforward. We first find w⁻¹.

If we denote u = w⁻¹ = f̂_x σ f̂_x^T, it is obvious that u has the same

form,

$$u = \mathrm{diag}(u_1, u_2, \ldots, u_i, \ldots, u_m)$$  (4-31)

where

$$u_i = \begin{pmatrix} u_{i1} & u_{i3} \\ u_{i4} & u_{i2} \end{pmatrix}
= \begin{pmatrix}
\frac{\partial f_{1i}}{\partial x_i} & \frac{\partial f_{1i}}{\partial y_i} \\
\frac{\partial f_{2i}}{\partial x_i} & \frac{\partial f_{2i}}{\partial y_i}
\end{pmatrix}
\begin{pmatrix} \sigma_{i1} & \sigma_{i3} \\ \sigma_{i4} & \sigma_{i2} \end{pmatrix}
\begin{pmatrix}
\frac{\partial f_{1i}}{\partial x_i} & \frac{\partial f_{2i}}{\partial x_i} \\
\frac{\partial f_{1i}}{\partial y_i} & \frac{\partial f_{2i}}{\partial y_i}
\end{pmatrix}$$  (4-32)


i.e.,

$$u_{i1} = \left(\frac{\partial f_{1i}}{\partial x_i}\right)^2 \sigma_{i1}
+ 2\,\frac{\partial f_{1i}}{\partial x_i}\frac{\partial f_{1i}}{\partial y_i}\,\sigma_{i3}
+ \left(\frac{\partial f_{1i}}{\partial y_i}\right)^2 \sigma_{i2} ;$$  (4-33a)

$$u_{i2} = \left(\frac{\partial f_{2i}}{\partial x_i}\right)^2 \sigma_{i1}
+ 2\,\frac{\partial f_{2i}}{\partial x_i}\frac{\partial f_{2i}}{\partial y_i}\,\sigma_{i3}
+ \left(\frac{\partial f_{2i}}{\partial y_i}\right)^2 \sigma_{i2} ;$$  (4-33b)

$$u_{i3} = \frac{\partial f_{1i}}{\partial x_i}\frac{\partial f_{2i}}{\partial x_i}\,\sigma_{i1}
+ \left( \frac{\partial f_{1i}}{\partial x_i}\frac{\partial f_{2i}}{\partial y_i}
+ \frac{\partial f_{1i}}{\partial y_i}\frac{\partial f_{2i}}{\partial x_i} \right) \sigma_{i3}
+ \frac{\partial f_{1i}}{\partial y_i}\frac{\partial f_{2i}}{\partial y_i}\,\sigma_{i2} ;$$  (4-33c)

$$u_{i4} = u_{i3} .$$  (4-33d)

After computing u, we can find its inverse w very easily.

If we denote u = u_{i1}u_{i2} − u_{i3}u_{i4} = u_{i1}u_{i2} − u_{i3}², we have

$$w_i = \begin{pmatrix} w_{i1} & w_{i3} \\ w_{i4} & w_{i2} \end{pmatrix}
= \frac{1}{u}\begin{pmatrix} u_{i2} & -u_{i3} \\ -u_{i4} & u_{i1} \end{pmatrix} .$$  (4-34)

As seen above, we can calculate the components of w directly

from the components of u.

If the quantities which constitute different observa-

tions were also correlated, we would have to form the

product of the matrices f̂_x, σ, f̂_x^T and calculate its inverse

to get w. This would be far more complicated.
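The component-wise inversion (4-34) of one symmetric 2×2 block can be sketched as follows (our illustration):

```python
def invert_2x2_block(u11, u22, u13):
    """Invert one symmetric 2x2 block u = [[u11, u13], [u13, u22]] of the
    block-diagonal matrix u = w^-1, per Eq. (4-34)."""
    det = u11 * u22 - u13 * u13
    # returns (w11, w22, w13); by symmetry w14 = w13
    return u22 / det, u11 / det, -u13 / det
```

Multiplying the returned block against the original one recovers the identity, which is a quick sanity check when coding Step 3.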
Step 4.








Find f̂_a^T w f̂_a and −f̂_a^T w φ from f̂_a, φ and w, then solve the

linear equations

$$(\hat{f}_a^T w \hat{f}_a)\, \delta = -\hat{f}_a^T w \phi$$

to get the correction vector δ.

The dimension of the matrix f̂_a^T w f̂_a is 7×7, and −f̂_a^T w φ, δ

are 7-vectors. Denoting A = f̂_a^T w f̂_a and B = f̂_a^T w φ, we have

$$A(j,k) = \sum_{i=1}^{m} \left[
\frac{\partial f_{1i}}{\partial a_j}\frac{\partial f_{1i}}{\partial a_k}\, w_{i1}
+ \frac{\partial f_{2i}}{\partial a_j}\frac{\partial f_{1i}}{\partial a_k}\, w_{i3}
+ \frac{\partial f_{1i}}{\partial a_j}\frac{\partial f_{2i}}{\partial a_k}\, w_{i3}
+ \frac{\partial f_{2i}}{\partial a_j}\frac{\partial f_{2i}}{\partial a_k}\, w_{i2}
\right] , \quad j, k = 1, \ldots, 7 ;$$  (4-35a)

and

$$B(j) = \sum_{i=1}^{m} \left[
\frac{\partial f_{1i}}{\partial a_j}\, \phi_{i1} w_{i1}
+ \frac{\partial f_{2i}}{\partial a_j}\, \phi_{i1} w_{i3}
+ \frac{\partial f_{1i}}{\partial a_j}\, \phi_{i2} w_{i3}
+ \frac{\partial f_{2i}}{\partial a_j}\, \phi_{i2} w_{i2}
\right] , \quad j = 1, \ldots, 7 ;$$  (4-35b)

where








$$\frac{\partial f_{1i}}{\partial a_1} = \frac{\partial f_{1i}}{\partial n} , \quad
\frac{\partial f_{1i}}{\partial a_2} = \frac{\partial f_{1i}}{\partial T} , \quad
\frac{\partial f_{1i}}{\partial a_3} = \frac{\partial f_{1i}}{\partial a} , \quad
\frac{\partial f_{1i}}{\partial a_4} = \frac{\partial f_{1i}}{\partial e} ,$$

..., and so forth.

To solve the linear equations Aδ = −B for δ, we can apply any

linear equation solution method, e.g. Gaussian Elimination,

Gauss-Jordan Elimination, the Jacobi Iterative method, etc.

In a private communication, Eichhorn (1985) proposed a

special solution method for this particular problem, because

the covariance matrix, i.e., the inverse of the coefficient

matrix of 8, is fortunately a positive definite matrix. The

method will be described in another section of this chapter.

In our work, we use this special method for solving the

linear equations above for 8.

Step 5.

Calculate the new vector v_n, which is the improved

correction vector to the observations, from

$$v_n = -\sigma \hat{f}_x^T w (\phi + \hat{f}_a \delta) .$$

The dimension of the vector v_n is 2m. Denote Q = φ + f̂_a δ and

R = f̂_x^T w Q; Q and R are both also 2m-vectors. We can see that

$$Q = (Q_1, Q_2, \ldots, Q_i, \ldots, Q_m)^T$$  (4-36)

where

$$Q_{i1} = f_{1i} - \frac{\partial f_{1i}}{\partial x_i} v_{xi} - \frac{\partial f_{1i}}{\partial y_i} v_{yi} + \sum_{j=1}^{7} \frac{\partial f_{1i}}{\partial a_j}\,\delta_j , \qquad
Q_{i2} = f_{2i} - \frac{\partial f_{2i}}{\partial x_i} v_{xi} - \frac{\partial f_{2i}}{\partial y_i} v_{yi} + \sum_{j=1}^{7} \frac{\partial f_{2i}}{\partial a_j}\,\delta_j .$$  (4-36')

In the case of w being block-diagonal,

$$R = (R_1, R_2, \ldots, R_i, \ldots, R_m)^T$$  (4-37)

where

$$R_{i1} = \frac{\partial f_{1i}}{\partial x_i}(Q_{i1} w_{i1} + Q_{i2} w_{i3}) + \frac{\partial f_{2i}}{\partial x_i}(Q_{i1} w_{i4} + Q_{i2} w_{i2}) , \qquad
R_{i2} = \frac{\partial f_{1i}}{\partial y_i}(Q_{i1} w_{i1} + Q_{i2} w_{i3}) + \frac{\partial f_{2i}}{\partial y_i}(Q_{i1} w_{i4} + Q_{i2} w_{i2}) .$$  (4-37')

Therefore, in this case,

$$v_n = (V_1, V_2, \ldots, V_i, \ldots, V_m)^T , \qquad
V_i = \begin{pmatrix} v_{xi} \\ v_{yi} \end{pmatrix}_n$$  (4-38)

where

$$V_i = -\begin{pmatrix} \sigma_{i1} R_{i1} + \sigma_{i3} R_{i2} \\ \sigma_{i4} R_{i1} + \sigma_{i2} R_{i2} \end{pmatrix} .$$  (4-38')

Step 6.


Find the new approximate solutions ā_n and x̄_n from


ā_n = ā + δ ,
x̄_n = x_0 + v_n .

Step 7.
Determine if another iteration is needed.


Calculate all the quantities


| (x_i(new) - x_i(old)) / x_i(old) | ,   | (y_i(new) - y_i(old)) / y_i(old) |   and   | δ_j / a_j |


(altogether 2m+7 quantities) and pick out the maximum value among them. If this maximum exceeds a pre-set specified value, say 10^-8, return to the very beginning and take all of the steps once again. Only when the maximum is less than the specified value do we assume that good convergence has been achieved.
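The convergence test of Step 7 amounts to comparing the largest of the 2m+7 relative changes against the pre-set value. A minimal sketch in Python; the variable names are assumptions:

```python
import numpy as np

def converged(x_new, x_old, y_new, y_old, delta, a, tol=1e-8):
    """Step 7: form the 2m+7 relative changes and test the largest one."""
    rel = np.concatenate([
        np.abs((x_new - x_old) / x_old),   # m quantities
        np.abs((y_new - y_old) / y_old),   # m quantities
        np.abs(delta / a),                 # 7 quantities
    ])
    return bool(rel.max() < tol)
```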

Above is the scheme of Newton's method in our orbit problem. In real programming, some additional considerations have been taken into account. For example, in the actual program, all variables are dimensionless. For the rectangular coordinate pairs (x, y), two new quantities are defined,


x' = x / x_0 ,   y' = y / y_0 ,                                   (4-39)


where (x_0, y_0) are the "observations" in the initial data, so that x = x_0 x', y = y_0 y'. For the seven parameters, seven new variables are defined as well. They are


a_1 = n/n_0 ,  a_2 = T/T_0 ,  a_3 = a/a_0 ,  a_4 = e/e_0 ,  a_5 = i/i_0 ,

a_6 = ω/ω_0 ,  a_7 = Ω/Ω_0 ,                                      (4-39')


where (n_0, T_0, a_0, e_0, i_0, ω_0, Ω_0) in ā_0 is the initial approximate solution, and e is defined as e = sin φ and e_0 = sin φ_0.








In terms of these new variables, all formulae for computing the partial derivatives need only slight modifications, and the whole process remains in principle unchanged.

In addition, we have seen that f̂_x, σ, f̂_x^T are square 2m x 2m matrices. If the number of observations is, e.g., m = 100, the covariance matrix σ has 40,000 components, and these three matrices alone would occupy a huge amount of storage, 120,000 locations, in the computer. In practical programming, we should keep the required computer storage to a possible minimum. A block-diagonal covariance matrix σ can be stored in a 2m x 2 matrix, and the same applies to all other related matrices. This greatly reduces the need for storage.
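The saving can be illustrated as follows: instead of a full 2m x 2m array, only the m diagonal 2x2 blocks are kept (4m numbers, i.e., a 2m x 2 arrangement), and products are formed block by block. A minimal sketch in Python; the (m, 2, 2) storage layout is our assumption:

```python
import numpy as np

def blockdiag_matvec(blocks, v):
    """Multiply a block-diagonal 2m x 2m matrix by a 2m-vector,
    storing only the m diagonal 2x2 blocks.

    blocks : (m, 2, 2) diagonal blocks
    v      : (2m,) vector
    """
    m = blocks.shape[0]
    # out[i, j] = sum_k blocks[i, j, k] * v[i, k], then flatten back to 2m.
    return np.einsum('ijk,ik->ij', blocks, v.reshape(m, 2)).reshape(2 * m)
```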


Weighting of Observations

As we know, the measures of any binary star will be affected by accidental and systematic observational errors and, occasionally, by blunders, i.e., actual mistakes. The points, when plotted, will not lie exactly on an ellipse but will occupy a region which is clustered around an ellipse only in a general way.

Observations are frequently combined into normal places. Among the observed quantities, the time is the one observed most precisely. To investigate the measurements for discordance before using them to calculate an orbit, the simplest method is to plot the distances ρ in seconds of arc and the position angles θ, separately, as ordinates against the times of observation as abscissae. Smooth curves are









drawn to represent the general run of the measures, and in drawing these curves, more consideration will naturally be given to those observation points which are relatively more precise (for example, a point based upon several well-agreeing measures by a skilled observer and supported by the preceding and following observations) than to the others. The deviation of the observation points is in the ordinate direction only and gives a good idea of the accuracy of both observed quantities. The curves will show whether or not the measures as a whole are sufficiently good to warrant the determination of a reasonably reliable orbit. Observations which are seriously in error will be clearly revealed and should be rejected or given very low weights. The curves will also give a general idea of how the observations are to be weighted: the points which show larger deviations should be given lower weights, and the well-determined points should be given higher weights.

It is hard to recommend a general rule for the weighting of measurements and normal positions. One precept may be considered (W. D. Heintz, 1971): compute a weight p_1 according to the number of observations, and p_2 according to the "weight" assigned to the observers; the normal place then receives the weight p = p_1 p_2 (if computations are made in the quantities dθ and dρ/ρ). This precept avoids unduly high weights for single measurements as well as for very many observations by few observers, and reduces the influence of residual systematic errors. Its implicit assumption is a proportionality of the errors ρdθ and dρ with √ρ. This holds better for the multi-observer average, as the share contributed by less accurate data usually increases at larger separations.

In our work, we give the points on or nearly on the smooth curves equal unit weights, and assign lower weights to those farther from the curves.

When we take the weights of the observations into account, only a slight and very simple modification of the solution process described above is needed.

Let G be the matrix of weights for the observational errors v. The dimension of G is 2m x 2m, and G is diagonal. Taking G into account, the residual function S_0 is modified to


S_0 = (1/2) v^T G^T σ^{-1} G v = (1/2) v^T G σ^{-1} G v           (4-40)


because G^T = G. If we define Σ^{-1} = G σ^{-1} G, then


S_0 = (1/2) v^T Σ^{-1} v                                          (4-40')


which has exactly the form as before, except that σ is replaced by Σ. Therefore we need only replace σ by Σ and σ^{-1} by Σ^{-1} wherever σ or σ^{-1} appears.








Because G is diagonal, the calculations of Σ and Σ^{-1} from G and σ are also very simple.
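Because G is diagonal, each 2x2 block of Σ^{-1} = G σ^{-1} G is the corresponding block of σ^{-1} scaled by the two matching diagonal entries of G. A minimal sketch in Python; the names and layout are assumptions:

```python
import numpy as np

def weighted_blocks(sigma_inv_blocks, g):
    """Blocks of Sigma^{-1} = G sigma^{-1} G for a diagonal G.

    sigma_inv_blocks : (m, 2, 2) diagonal blocks of sigma^{-1}
    g                : (2m,) diagonal of G
    """
    gi = g.reshape(-1, 2)
    # (G sigma^{-1} G)_i[j, k] = g_i[j] * (sigma^{-1})_i[j, k] * g_i[k]
    return gi[:, :, None] * sigma_inv_blocks * gi[:, None, :]
```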


The Orthogonal Set of Adjustment Parameters
and the Efficiency of a Set of Orbital Parameters

As mentioned in the first section of this chapter, Eichhorn (1985) has proposed, in a private communication, a special method for solving the linear normal equations Aδ = -B of our problem. This method is as follows.

According to its definition, A = f̂_a^T w f̂_a is obviously a symmetric positive definite matrix.

Given a symmetric positive definite matrix Q, we look for its eigenvectors X, which have the property that


QX = λX .                                                         (4-41)


Any value λ which satisfies Eq. (4-41) is an eigenvalue of Q, and since Q is positive definite, we know that all λ > 0. Obviously λ must satisfy the equation |Q - λI| = 0, the "characteristic equation" of Q. This is a polynomial in λ of the same order n as that of Q. The solutions are λ_1, λ_2, ..., λ_n. The solution X_k of the homogeneous system (4-41) with λ = λ_k is the k-th eigenvector. Since this is determined only with the uncertainty of an arbitrary scale factor, we may always achieve |X_k| = 1 for all k. Let the matrix P = (X_1, X_2, ..., X_n) be the matrix of the normalized eigenvectors of Q. It can be shown that P is therefore









orthogonal, so that P^T = P^{-1}. Equation (4-41) can be written as


QP = P diag(λ_1, ..., λ_n) .                                      (4-42)


Writing the diagonal matrix diag(λ_1, λ_2, ..., λ_n) = D, we rewrite Eq. (4-42) as


QP = PD ,


whence


P^T Q P = D ,   Q = P D P^T                                       (4-43)


and


P^T Q^{-1} P = D^{-1} ,   Q^{-1} = P D^{-1} P^T .                 (4-44)


The equation (4-44) shows incidentally that the eigenvectors of a matrix are identical to those of its inverse, but that the eigenvalues of the inverse are the reciprocals of the eigenvalues of the matrix.

Let Q be the covariance matrix of a set of statistical variates X. We look for another set Y as functions of X, such that the covariance matrix of Y is diagonal.








Putting dX = α, the quadratic form which is minimized is α^T Q^{-1} α, where Q is the covariance matrix of X. If we define


Y = P^T X                                                         (4-45)


then we have


X = PY


and


dX = P dY , or α = Pβ ,  α^T = β^T P^T ,


and in terms of β = dY, the quadratic form minimized which led to the values X becomes β^T P^T Q^{-1} P β, whence the covariance matrix of Y is seen to be (P^T Q^{-1} P)^{-1} = P^T Q P = D by Eq. (4-44).

It is thus shown that Y = P^T X is the vector of linear transforms of X whose components are uncorrelated. A vector of such components is called a vector of statistical variates with efficiency 1 (Eichhorn and Cole, 1985). If any of its components is changed (e.g., in the process of finding their values in a course of iteration), none of the other components of Y would thereby be changed in a tradeoff between changes of correlated unknowns.
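The decorrelating transformation Y = P^T X can be verified numerically. The following minimal Python sketch builds P from the normalized eigenvectors of Q, so that P^T Q P is diagonal:

```python
import numpy as np

def decorrelating_matrix(Q):
    """Return P, the matrix whose columns are the normalized eigenvectors
    of Q; the covariance of Y = P^T X is then the diagonal D = P^T Q P."""
    _, P = np.linalg.eigh(Q)   # eigh: symmetric eigensolver, P orthogonal
    return P
```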








Now, back to the normal equations Aδ = -B. The problem of solving the normal equations above can therefore be attacked as follows.

1) Find the eigenvalues and eigenvectors of A, i.e., D^{-1} and P. (A = Q^{-1}, according to the notation above.)

2) The covariance matrix of δ, Q, can then be calculated from Eq. (4-43), Q = A^{-1} = P D P^T.

3) Let δ' = P^T δ. This gives APδ' = -B, whence δ' = -P^T Q B, or, from Eq. (4-43),


δ' = -D P^T B .                                                   (4-46)


Finding D from D^{-1} is very easy because D^{-1} is diagonal. The vector δ is then easily calculated directly from δ = Pδ'.

One of the advantages of this method is that we can calculate both δ and δ', the correlated and the uncorrelated corrections to the adjustment parameters, at the same time. The vector δ' is the set of the corrections to the orthogonal adjustment parameters.
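The three steps can be sketched in Python with NumPy's symmetric eigensolver. This is a minimal illustration, not the dissertation's program; note that `eigh` returns the eigenvalues of A, i.e., the entries of D^{-1}:

```python
import numpy as np

def solve_by_eigendecomposition(A, B):
    """Solve A delta = -B for a symmetric positive definite A.

    Returns delta, the uncorrelated corrections delta' = P^T delta,
    and the covariance matrix Q = A^{-1} = P D P^T."""
    lam, P = np.linalg.eigh(A)       # step 1: A = P diag(lam) P^T, lam > 0
    D = 1.0 / lam                    # diagonal of D (eigenvalues of Q)
    Q = (P * D) @ P.T                # step 2: Q = P D P^T
    delta_prime = -D * (P.T @ B)     # step 3: Eq. (4-46), delta' = -D P^T B
    delta = P @ delta_prime          # delta = P delta'
    return delta, delta_prime, Q
```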

Furthermore, using this method, a measure for the "efficiency" of a set of adjustment parameters can easily be calculated, cf. Eichhorn and Cole (1985). They point out that the information carried by the vector δ of the correlated estimates (whose covariance matrix is Q) is partly redundant because of the nonvanishing correlations between the estimates. What is the information content carried by these estimates? We see that the matrix P^T Q P is diagonal if P is the (orthogonal) matrix of the normalized eigenvectors of Q, and that it is also the covariance matrix of the (therefore uncorrelated) linear transforms δ' = P^T δ of δ. We might regard the number


ε = |Q| / (q_{11} q_{22} ... q_{nn}) ,                            (4-47)


that is, the product of the variances of the components of

vector 8' (the product of the variances of the uncorrelated

parameters) divided by the product of the variances of the

components of vector 8 (the product of the variances of the

correlated parameters) as a measure for the "efficiency" of

the information carried by the estimates 8. But we note

that according to the definition (4-47), the value of 6 is

severely affected by the number of variables. In order to

eliminate the effect of n, we redefine 6 as


1

S= --(4-47')
q 11" 'qnn ) .


For any set of uncorrelated estimates we would have e=l,

which is evidently the largest value this number may assume.

In our work, for every model calculated, the efficiency

of the set of adjustment parameters and their covariance

matrix are calculated.
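The efficiency of Eq. (4-47') can be computed directly from the covariance matrix. A minimal sketch in Python; using the log-determinant for numerical safety is our own choice:

```python
import numpy as np

def efficiency(Q):
    """Eq. (4-47'): eps = (|Q| / (q_11 q_22 ... q_nn))**(1/n),
    for a positive definite covariance matrix Q."""
    n = Q.shape[0]
    _, logdet = np.linalg.slogdet(Q)   # log |Q|; sign is +1 for Q > 0
    return float(np.exp((logdet - np.log(np.diag(Q)).sum()) / n))
```

For a diagonal Q the numerator and denominator coincide and the efficiency is 1, its largest possible value; any correlation lowers it.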









A Practical Example

Table 4-3 lists the observation data for 51 Tau. These data were provided by H. A. McAlister. The author would like to thank McAlister and Heintz for their data, which he has used in this dissertation.

Theoretically, the first step in our computation should be the reduction of the measured coordinates to a common epoch by applying to the position angles corrections for precession and for the proper motion of the system. The distance measures need no corrections. Practically, both corrections are negligibly small unless the star is near the pole, its proper motion unusually large, and the time covered by the observations long. The precession correction, when required, can be found with sufficient accuracy from the approximate formula


Δθ = θ - θ_0 = +0°.00557 sin α sec δ (t - t_0) ,                  (4-48)


which is derived by differentiating Eq. (3-1) with respect to δ, and introducing the precessional change n cos α for dδ. The position angles are thus reduced to a common equinox t_0 (for which 2000.0 is currently used), and the resulting node Ω_0 also refers to t_0, because Δθ = θ - θ_0 = Ω - Ω_0 = ΔΩ. For computing ephemerides, the equinox is reduced back from t_0 to t.

The change of position angle by proper motion,


Δθ = θ - θ_0 = μ_α sin δ (t - t_0) ,                              (4-49)


where the proper motion component in right ascension μ_α is in degrees, can be neglected in most cases.

The two formulae above can be found either in Heintz' book "Double Stars" or in Aitken's book "The Binary Stars."

In Table 4-4, all position angles have been reduced to the common equinox 2000.0, and the converted rectangular coordinates (x_0, y_0) are also listed.
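The conversion behind Table 4-4 is x_0 = ρ_0 cos θ_0, y_0 = ρ_0 sin θ_0, as the tabulated values confirm. A minimal sketch in Python:

```python
import math

def polar_to_rectangular(rho, theta_deg):
    """Convert a separation rho and position angle theta (degrees) to the
    rectangular coordinates of Table 4-4: x = rho cos(theta),
    y = rho sin(theta)."""
    t = math.radians(theta_deg)
    return rho * math.cos(t), rho * math.sin(t)
```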

In Figure 4-1, all pairs of rectangular coordinates (x_0, y_0) are plotted in the x_0-y_0 plane. We can see at a glance that all observation points are distributed close to an ellipse. Furthermore, in Figures 4-2a and 4-2b, we plot the distances ρ in seconds of arc and the position angles against the observing epoch, respectively. From Figure 4-2, we see that the observations fall upon nearly sine-like curves. Thus we get the impression that these data are rather precise, and we therefore give all observations equal weights.

For these data, an initial approximate solution is easily obtained by Kowalsky's method. The initial approximation ā_0 is listed in Table 4-5.

Starting from this, the final solution for ā is obtained after only three iterations and is shown in Table 4-6.

The calculation required only 47962 CPU time on a VAX.








The residuals (observational errors) in (ρ, θ) and (x, y) are shown in Table 4-7. The residuals in (ρ, θ) and (x, y) are plotted against the observing epoch in Figures 4-3 and 4-4, respectively. They show a random distribution, as expected. Also, Figure 4-5 shows the comparison of the observation points with the corresponding points after correction in the apparent plane.

The "efficiency," the covariance matrix, the correlation matrix and the transformation matrix (which transforms the correlated parameters to the uncorrelated ones) of the adjusted parameters in the final solution are calculated and listed in Table 4-6. In addition, Table 4-6 also lists the standard deviations of a) the original and b) the uncorrelated parameters in the final solution.









Table 4-3.

The Observation Data for 51 Tau.


HR1331 51 Tau HD 27176 SAO 76541 04185+2135 n


1975.7160 106.00 2.0 0.080 0.003 Al
1975.9591 91.9 1.7 0.074 0.003 Al
1976.8574 34.9 1.5 0.069 0.005 A2
1976.8602 33.5 1.5 0.073 0.009 A2
1976.9229 22.9 1.0 0.072 0.008 A3
1977.0868 26.7 1.0 0.083 0.008 A3
1977.6398 8.8 0.101 A6
1977.7420 3.1 0.8 0.110 0.008 A5
1978.1490 352.2 0.5 0.113 0.010 A5 n
1978.6183 340.7 0.8 0.108 0.008 A5
1978.8756 333.3 2.0 0.086 0.013 B4
1978.7735 304.3 0.090 A7
1980.1532 285.9 0.075 A8
1980.7182 259.0 0.079 A8
1980.7263 255.8 0.085 A8
1980.7291 259.1 0.087 A8
1982.7550 191.80 0.1343 C2
1982.7579 192.65 0.1362 C2
1982.7605 190.36 0.1315 C2
1982.7633 192.90 0.1381 C2
1982.7661 193.39 0.1308 C2
1983.0472 186.18 0.1333 C2
1983.0637 187.21 0.1499 C2
1983.7108 182.05 0.1456 C2
1983.7135 179.56 0.1480 C2
1983.9337 181.0 1.9 0.149 0.010 FA
1983.9579 176.7 0.157 RB
1984.0522 175.01 0.1446 C2
1984.0576 174.79 0.1445 C2
1984.0603 172.73 0.1355 C2
1984.779 157.3 0.146 RC
1984.9308 164.5 2.6 0.141 0.013 FB
1985.1063 161.0 2.7 0.137 0.013 FB
1985.2048 158.0 3.0 0.125 0.013 FB
1985.8378 145.69 0.1141 ##
1985.8406 144.53 0.1202 ##
1985.8541 145.71 0.1200 ##









Table 4-4.

The Reduced Initial Data for 51 Tau


t          θ_0      ρ_0       x_0       y_0

1975.7160  106.00  0.0800  -0.0222   0.0769
1975.9591   91.90  0.0740  -0.0026   0.0740
1976.8574   34.90  0.0690   0.0565   0.0396
1976.8602   33.50  0.0730   0.0608   0.0404
1976.9229   32.90  0.0720   0.0604   0.0392
1977.0868   26.70  0.0830   0.0741   0.0374
1977.6398    8.80  0.1010   0.0998   0.0156
1977.7420    3.10  0.1100   0.1098   0.0061
1978.1490  352.20  0.1130   0.1120  -0.0151
1978.6183  340.70  0.1080   0.1020  -0.0355
1978.8756  333.30  0.0860   0.0769  -0.0385
1979.7735  304.30  0.0900   0.0508  -0.0743
1980.1532  285.90  0.0750   0.0207  -0.0721
1980.7182  259.00  0.0790  -0.0150  -0.0776
1980.7263  255.80  0.0850  -0.0207  -0.0824
1980.7291  259.10  0.0870  -0.0163  -0.0855
1982.7550  191.80  0.1343  -0.1314  -0.0176
1982.7579  192.65  0.1362  -0.1329  -0.0300
1982.7605  190.36  0.1315  -0.1293  -0.0238
1982.7633  192.90  0.1381  -0.1346  -0.0310
1982.7661  193.39  0.1308  -0.1272  -0.0305
1983.0472  186.18  0.1333  -0.1325  -0.0145
1983.0637  187.21  0.1499  -0.1487  -0.0190
1983.7108  182.05  0.1456  -0.1455  -0.0054
1983.7135  179.56  0.1480  -0.1480   0.0010
1983.9337  181.00  0.1490  -0.1490  -0.0028
1983.9579  176.70  0.1570  -0.1568   0.0088
1984.0522  175.01  0.1446  -0.1441   0.0124
1984.0576  174.79  0.1445  -0.1439   0.0129
1984.0603  172.73  0.1355  -0.1344   0.0170
1984.7790  157.30  0.1460  -0.1348   0.0562
1984.9308  164.50  0.1410  -0.1359   0.0375
1985.1063  161.00  0.1370  -0.1296   0.0445
1985.2048  158.00  0.1250  -0.1160   0.0467
1985.8378  145.69  0.1141  -0.0943   0.0642
1985.8406  144.53  0.1202  -0.0980   0.0696
1985.8541  145.71  0.1200  -0.0992   0.0675
















Figure 4-1. The observation points (x_0, y_0) of 51 Tau in the apparent plane (both axes in seconds of arc).

Figure 4-2. The separations ρ_0 (seconds of arc) (a) and the position angles θ_0 (b) of 51 Tau, plotted against the observing epoch.









Table 4-5.

Initial Approximate Solution ac for 51 Tau.


P_0(yr)   T_0      a_0     e_0     i_0     ω_0     Ω_0

11.18   1966.4   0.128   0.181   127.3   152.9   170.2








Table 4-6.

The Final Solution for 51 Tau: the Adjusted Parameters and Their Standard Deviations (Original and Uncorrelated), the Covariance Matrix, the Correlation Matrix, the Transformation Matrix, and the Efficiency of the Set of Adjustment Parameters.









Table 4-7.

Residuals of the Observations for 51 Tau in (p,e) and (x,y)



t          v_θ       v_ρ       v_x       v_y

1975.7160   1.0090  -0.0042   0.0000  -0.0044
1975.9591   1.1702  -0.0039  -0.0012  -0.0039
1976.8574   2.5044   0.0082   0.0048   0.0073
1976.8602   3.7620   0.0042   0.0007   0.0064
1976.9229   1.2477   0.0070   0.0050   0.0051
1977.0868  -0.0465   0.0009   0.0009   0.0002
1977.6398  -2.0624  -0.0015  -0.0010  -0.0039
1977.7420   0.5795  -0.0083  -0.0083   0.0004
1978.1290   0.2656  -0.0058  -0.0057   0.0011
1978.6183  -0.5497  -0.0015  -0.0018  -0.0006
1978.8756  -0.2196   0.0173   0.0152  -0.0082
1979.7735  -2.1815  -0.0054  -0.0059   0.0027
1980.1632  -1.2302   0.0036  -0.0008  -0.0039
1980.7182  -2.8268  -0.0005  -0.0038   0.0014
1980.7263  -0.0249  -0.0064   0.0014   0.0063
1980.7291  -3.4623  -0.0084  -0.0032   0.0093
1982.7550   1.9542  -0.0022   0.0031  -0.0038
1982.7579   1.0538  -0.0041   0.0045  -0.0013
1982.7605   3.2987   0.0007   0.0009  -0.0074
1982.7633   0.7102  -0.0058   0.0060  -0.0001
1982.7661   0.1717   0.0015  -0.0014  -0.0006
1983.0472   2.7391   0.0052  -0.0043  -0.0069
1983.0637   1.4488  -0.0111   0.0115  -0.0019
1983.7108  -2.8848   0.0019  -0.0020   0.0075
1983.7135  -0.4324  -0.0005   0.0005   0.0013
1983.9337  -4.9126  -0.0004   0.0007   0.0129
1983.9579  -0.9444  -0.0083   0.0085   0.0022
1984.0522  -0.5459   0.0042  -0.0040   0.0020
1984.0576  -0.3999   0.0043  -0.0042   0.0016
1984.0603   1.6232   0.0133  -0.0136  -0.0023
1984.7790   6.9626  -0.0028  -0.0031  -0.0173
1984.9308  -2.5201  -0.0004   0.0022   0.0060
1985.1063  -1.7784   0.0000   0.0015   0.0042
1985.2048  -0.3956   0.0097  -0.0086   0.0046
1985.8378  -0.2949   0.0015  -0.0008   0.0014
1985.8406   0.8016  -0.0047   0.0037  -0.0039
1985.8541  -0.6865  -0.0050   0.0050  -0.0016










Figure 4-3. The residuals v_ρ (seconds of arc) and v_θ (degrees), plotted against the observing epoch.

Figure 4-4. The residuals v_x and v_y (seconds of arc), plotted against the observing epoch.

Figure 4-5. Comparison of the observation points with the corresponding corrected points in the apparent plane.









Remarks

Newton's method converges well if the observational errors are small enough and the initial approximate solution is sufficiently accurate. But when the errors in the observed distances and position angles are not sufficiently small, or the initial approximation is not close enough to the final solution, one of two things will happen:

1) In some iterations, the corrections to the adjustment parameters are so big that some parameters go to unreasonable values; e.g., the semi-major axis a, the mean motion n, or the eccentricity e becomes negative, which is unacceptable, and further calculation becomes pointless.

2) The iterations do not converge; i.e., the residual function S_0 becomes larger in the next iteration, although all parameters remain in the ranges of reasonable values.

In these cases, Newton's method will fail to yield a solution. In the next chapter, two other approaches (the modified Newton methods) are proposed to deal with these cases.















CHAPTER V
THE MODIFIED NEWTON SCHEME


The scheme for solving the nonlinear condition equations by Newton's method has been discussed in the last chapter. Although rapid convergence can be expected if the initial approximate solution is sufficiently accurate and the residuals are small enough, Newton's method often fails to yield a solution, particularly if the initial approximation ā_0 of the vector a is greatly in error or the residuals are very large. As mentioned in the last chapter, two problems arise in these cases:

1) Some of the corrections δ to the adjustment parameters are too big, which is unacceptable. For example, e_0 = 0.15, but δe = -0.16, so that the new value e = -0.01; in this case, the residual function calculated is no longer meaningful, and any further calculation becomes pointless.

2) Divergence occurs directly; i.e., the value of the residual function S_0 is larger after the iteration than it was before.

For both of these cases, the major problem is that the step size could be too large. If we decrease the step size, the situation might be improved to some extent. We still rely on Newton's method to indicate the right direction, but









we no longer apply the full corrections which would follow

from the original formalism. In this chapter, we will

discuss the combination of Newton's method with the method

of steepest descent and other sophisticated methods.


The Method of Steepest Descent

We retain A = f̂_a^T w f̂_a and introduce a numerical factor 1/f into the normal equation (4-8), which thus becomes


A δ = -(1/f) f̂_a^T w Φ .                                          (5-1)


By choosing an appropriate value of f, we can reduce the step size and make the residual function S_0 gradually smaller and smaller. This is analogous to the basic idea of "the method of steepest descent," which is to step to the next in a sequence of better approximations to the solution by moving in the direction of the negative of the gradient of S_0. If the step is not too large, the value of S_0 must necessarily decrease from one iteration to the next. Our goal is to arrive at the stable absolute minimum value of S_0 and at the corresponding set of parameters, which is the best solution.

At this point, two new problems arise:

1) How does one calculate the residual function S_0 and its gradient?

2) How does one choose the best value of f?









Suppose we were to pretend that the current values of ā at any iteration are the true values of a. This assumption is of course not true. But it makes it in general possible to estimate the vector v as a function of ā. This is so because, if the present value of ā were the true value of a, there would be a definite set of residuals v which causes the corrected observations to satisfy the condition equations rigorously. It is not difficult to find those components of v which correspond to a certain value of ā.

Suppose we have an approximation ā of a and v_0 of v; we wish to obtain the "best" approximation of v, assuming ā to be correct. We then need to minimize


S = (1/2) v^T σ^{-1} v + λ^T f


with respect to the remaining variables v and λ, which yields


f(x_0 + v, ā) = 0                                                 (5-2a)

σ^{-1} v + f̂_x^T λ = 0 .                                          (5-2b)


Setting v = v_0 + ε, and expanding (5-2a) into powers of ε, we have


f(x̄_0, ā) + f̂_x0 ε = 0                                            (5-3)








and Eq. (5-2b) becomes


σ^{-1} (v_0 + ε) + f̂_x0^T λ = 0                                   (5-4)


where x̄_0 = x_0 + v_0. Solving Eq. (5-4) for ε yields


ε = -v_0 - σ f̂_x0^T λ .                                           (5-5)


Substituting now (5-5) into Eq. (5-3) for ε, we get


f(x̄_0, ā) - f̂_x0 v_0 - f̂_x0 σ f̂_x0^T λ = 0 .                      (5-6)


From Eq. (5-6) we obtain


λ = w [f(x̄_0, ā) - f̂_x0 v_0] .                                    (5-7)


From Eqs. (5-5) and (5-7) we arrive therefore at the expression


v = -σ f̂_x0^T w [f(x̄_0, ā) - f̂_x0 v_0] ,                          (5-8)


where w = (f̂_x0 σ f̂_x0^T)^{-1} and x̄_0 = x_0 + v_0 still. This equation may be iterated, if needed, to arrive at a definite v; i.e., until the value of v on the left-hand side and the value of v_0 on the right-hand side are the same. There is thus a









corresponding v to each choice of parameters ā, and we can write v = v(ā). Although Eq. (5-8) formally depends on v_0 in the first order, it is in fact rather insensitive to the actual value of v_0 used, since it is really quadratic in v_0. To appreciate this, note that


f(x̄_0, ā) - f̂_x0 (x̄_0 - x_0) = f(x_0, ā) + O(v_0^2) .             (5-9)
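The iteration of Eq. (5-8) can be sketched as follows. Dense matrices are used for clarity, and the callables standing in for the condition equations and their Jacobian are assumptions, not the dissertation's program:

```python
import numpy as np

def residuals_for_fixed_a(f, fx, x0, a, sigma, v0, n_iter=5):
    """Iterate Eq. (5-8): v = -sigma fx^T w [f(xbar0, a) - fx v0],
    with w = (fx sigma fx^T)^{-1} and xbar0 = x0 + v0, until v settles.

    f     : callable f(x, a) -> condition-function values
    fx    : callable fx(x, a) -> dense Jacobian f_x
    sigma : dense covariance matrix of the observations
    """
    v = np.asarray(v0, dtype=float)
    for _ in range(n_iter):
        xb = x0 + v                        # current corrected observations
        J = fx(xb, a)                      # Jacobian f_x at xbar0
        w = np.linalg.inv(J @ sigma @ J.T)
        v = -sigma @ J.T @ w @ (f(xb, a) - J @ v)
    return v
```

For linear condition equations the fixed point is reached after a single pass, and the corrected observations satisfy f = 0 exactly.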


After we get the best value of v which corresponds to a definite value of ā, S is given by


S = (1/2) v^T σ^{-1} v + λ^T f = (1/2) v^T σ^{-1} v ,

i.e.,  S = S_0 ,


because in this case we already have f = 0 for this value of v. Fixing the value of ā, we can therefore get the best corresponding value of v, and then calculate the value of the residual function S_0 which corresponds to this value of ā. In other words, we may also write S_0 = S_0(ā).

Furthermore, when we calculate the best value of v at ā from Eq. (5-8),


v = -σ f̂_x0^T w Φ


where again Φ = f(x̄_0, ā) - f̂_x0 v_0