Orbits as Conic Sections

Source: (502) Alan Cooper’s answer to A body with negligible mass approaching a planet/star can have an elliptical, parabolic or hyperbolic orbit. Is there a purely or largely geometric argument for why all possible orbits are conic sections and vice versa? – Quora

The answer to the question is, I think, Yes.

In fact I believe that most of the proofs in Newton’s Principia were expressed in an essentially geometric form rather than making use of the calculus. (He may well have used calculus in his preliminary thinking and in other less formal applications but apparently preferred to avoid it in his most formal presentations).

A possible geometric approach to the conic sections result might be to look at the changes in position, [math]\vec{r}[/math], and velocity, [math]\vec{v}=\dot{\vec{r}}[/math], of the test particle relative to the planet [math]P[/math], and to just show directly that the effect of an inverse square force law giving a centripetal acceleration of magnitude [math]\frac{\mu}{r^{2}}[/math] (so giving [math]\ddot{\vec{r}}=-\frac{\mu\vec{r}}{r^{3}}[/math]) ensures that they satisfy some geometric property of conic sections (rather than following the usual modern approach of deriving the parametric or polar equations of the full trajectory).

There are various geometric conditions that could be used, but the one that seems most promising (because it applies to all of the conic sections in a similar way) is the extension of the Focus and Directrix definition of a parabola. This uses a point [math]P[/math] (the focus), a line [math]D[/math] (the directrix), and a number [math]e[/math] (the eccentricity), with the requirement that for any point on the curve, its distance from [math]P[/math] is [math]e[/math] times the distance from [math]D[/math]. The case of [math]e=1[/math] is of course the parabola, the cases [math]1>e>0[/math] give ellipses, [math]e>1[/math] gives hyperbolas, and the limiting case of [math]e=0[/math] gives just a point if the line [math]D[/math] is fixed but can be a circle if [math]D[/math] goes off to infinity at the same time as [math]e[/math] goes to zero.

An alternative version of this, which avoids treating the circle as an awkward limiting case, is to specify instead of [math]e[/math] and [math]D[/math] an “eccentricity vector”, [math]\vec{e}[/math], of length [math]e[/math], and to require that the component of [math]\vec{r}[/math] in the [math]\vec{e}[/math] direction plus [math]\frac{1}{e}[/math] times the length, [math]r[/math], of [math]\vec{r}[/math] remains constant. This would ensure that, for [math]e\neq 0[/math], the set of points at distance [math]\frac{r}{e}[/math] from the particle in the direction of [math]\vec{e}[/math] is a line perpendicular to [math]\vec{e}[/math] at a distance [math]\frac{r}{e}[/math] from the particle (and so [math]r_p + r_p/e[/math] from the focus, where [math]r_p [/math] is the minimum “periapsis” distance of the path from the focus). When [math]e\neq 0[/math], this line then serves as the directrix [math]D[/math] in the previous definition. But this version also makes perfectly good sense (without any mention of a directrix) even when [math]e=0[/math] (in which case the direction of [math]\vec{e}[/math] does not need to be defined).

Apparently [math]\vec{e}=(\frac{v^2}{\mu}-\frac{1}{r})\vec{r}-\frac{\vec{r}\cdot\vec{v}}{\mu}\vec{v}[/math] does the job. But I haven’t checked whether this is easy to see on traditional geometric grounds or from the picture, let alone whether there is a nice intuitive way to come up with it. (I may add more to this answer if I can get anywhere with that.)

OK. I’ve made some progress. So here goes!

At each stage I’ll start by using modern notation and tools to motivate a proposition and then maybe come back later and try to give or suggest a more geometric proof.

Let’s start by noting that for any central force, directed always at point [math]P[/math], there is no torque about [math]P[/math] so we expect the angular momentum to be constant. Or in terms of vector geometry, if [math]\ddot{\vec{r}}||\vec{r}[/math] then [math]\frac{d}{dt}(\vec{r}\times\dot{\vec{r}})=\dot{\vec{r}}\times\dot{\vec{r}}+\vec{r}\times\ddot{\vec{r}}=\vec{0}[/math]. This leads immediately to Kepler’s area law, but of greater interest to us right now is that it provides an invariant vector [math]\vec{h}=\vec{r}\times\dot{\vec{r}}[/math] – albeit one that is perpendicular to rather than in the plane of our expected orbit.

Before going on, you might object that, by using vectors and derivatives, this is not a geometric argument in the intended sense of Euclidean geometry. But really the use of vectors is just an alternative way of expressing geometric facts and we can’t expect to completely avoid calculus in a problem whose statement is basically about acceleration. So for now I’ll just continue in the same spirit, but I may come back later to see if we can phrase the arguments in a more classically Euclidean form.

To get a vector in the orbital plane we could now just take the cross product of [math]\vec{h}[/math] with anything in that plane, such as either [math]\vec{r}[/math] or [math]\dot{\vec{r}}[/math] for example. Of course we have no reason to expect either of these to be invariant, but let’s see what they are.

Now the case of [math]\vec{r}\times\vec{h}[/math] gives [math]\frac{d}{dt}(\vec{r}\times\vec{h})=\dot{\vec{r}}\times\vec{h}+\vec{r}\times\dot{\vec{h}}[/math] and for any central force we have [math]\dot{\vec{h}}=\vec{0}[/math] which just brings us back to the other case with no opportunity to make use of the inverse square law.

But for the other case we have [math]\frac{d}{dt}(\dot{\vec{r}}\times\vec{h})=\ddot{\vec{r}}\times\vec{h}+\dot{\vec{r}}\times\dot{\vec{h}}[/math] and using the inverse square law and the fact that [math]\dot{\vec{h}}=\vec{0}[/math] gives [math]\begin{align}\frac{d}{dt}(\dot{\vec{r}}\times\vec{h}) &=-\frac{\mu}{|r|^3}\vec{r}\times\vec{h}+\dot{\vec{r}}\times\vec{0} \\&= -\frac{\mu}{|r|^3}\vec{r}\times(\vec{r}\times\dot{\vec{r}})=-\mu\frac{(\vec{r}\cdot\dot{\vec{r}})\vec{r}-(\vec{r}\cdot\vec{r})\dot{\vec{r}}}{(\vec{r}\cdot\vec{r})^{3/2}} \\&= \mu[-(\vec{r}\cdot\vec{r})^{-3/2}(\vec{r}\cdot\dot{\vec{r}})\vec{r}+(\vec{r}\cdot\vec{r})^{-1/2}\dot{\vec{r}}] \\&= \mu[-\frac{1}{2}(\vec{r}\cdot\vec{r})^{-3/2}(\vec{r}\cdot\dot{\vec{r}}+\dot{\vec{r}}\cdot\vec{r})\vec{r}+(\vec{r}\cdot\vec{r})^{-1/2}\dot{\vec{r}}] \\&= \mu\frac{d}{dt}\frac{\vec{r}}{(\vec{r}\cdot\vec{r})^{1/2}} \end{align}[/math]

So [math]\vec{L}=(\dot{\vec{r}}\times\vec{h})-\mu\frac{\vec{r}}{(\vec{r}\cdot\vec{r})^{1/2}}=(\vec{v}\times\vec{h})-\mu\frac{\vec{r}}{r}[/math] is constant.

Furthermore,

[math]\begin{align}\vec{r}\cdot\vec{L}&=\vec{r}\cdot(\vec{v}\times\vec{h}-\mu\frac{\vec{r}}{r})=\vec{r}\cdot(\vec{v}\times\vec{h})-\mu\frac{\vec{r}\cdot\vec{r}}{r}\\&=(\vec{r}\times\vec{v})\cdot\vec{h}-\mu r=\vec{h}\cdot\vec{h}-\mu r=h^2-\mu r \end{align}[/math]

So [math]\vec{r}\cdot\vec{L}+\mu r=h^2[/math], which is also constant.

Thus [math]\vec{e}=\frac{\vec{L}}{\mu}[/math] is a constant vector for which [math]\vec{r}\cdot\vec{e}+|\vec{r}|[/math] is also constant, and dividing by [math]e=|\vec{e}|[/math], we see that the projection, [math]\vec{r}\cdot\frac{\vec{e}}{e}[/math] of [math]\vec{r}[/math] in the direction of [math]\vec{e}[/math] when added to [math]\frac{1}{e}[/math] times the length of [math]\vec{r}[/math] also gives a constant (which is exactly the condition we used to define the eccentricity vector).

Since [math]\vec{e}=\frac{\vec{L}}{\mu}=\frac{\vec{v}\times(\vec{r}\times\vec{v})}{\mu}-\frac{\vec{r}}{r}=\frac{(\vec{v}\cdot\vec{v})\vec{r}+(\vec{v}\cdot\vec{r})\vec{v}}{\mu}-\frac{\vec{r}}{r}=(\frac{v^2}{\mu}-\frac{1}{r})\vec{r}-\frac{\vec{r}\cdot\vec{v}}{\mu}\vec{v}[/math], this does match the definition proposed above.

Also, [math]\vec{r}\cdot\vec{e}=\frac{h^2}{\mu}-r[/math], and if [math]\phi[/math] is the angle between [math]\vec{e}[/math] and [math]\vec{r}[/math] then [math]\vec{r}\cdot\vec{e}=re\cos{\phi}[/math], so [math]re\cos{\phi}=\frac{h^2}{\mu}-r[/math], and [math]r=\frac{h^2/\mu}{1+e\cos{\phi}}[/math] is the polar equation for the orbit relative to origin [math]P[/math] and axis direction [math]\vec{e}[/math].

The case [math]e=0[/math] gives [math]r=\frac{h^2}{\mu}[/math] which is constant so we have a circle. Either using [math]h=v r[/math], or just equating the centripetal and gravitational accelerations, we then get [math]v=\sqrt{\mu/r}[/math], and so the orbital period is given by [math] T=\frac{2\pi r}{\sqrt{\mu/r}}=\frac{2\pi}{\mu}r^{3/2}[/math].


Extras:

[math]\begin{align}\frac{d}{dt}(\vec{r}\cdot\vec{L}) &=\dot{\vec{r}}\cdot\vec{L}+\vec{r}\cdot\dot{\vec{L}}=\dot{\vec{r}}\cdot[(\dot{\vec{r}}\times\vec{h})-\mu\frac{\vec{r}}{(\vec{r}\cdot\vec{r})^{1/2}}]+\vec{0} \\&=-\mu(\vec{r}\cdot\vec{r})^{-1/2}\vec{r}\cdot\vec{r’}=-\mu\frac{d(\vec{r}\cdot\vec{r})^{1/2}}{dt}=-\mu\frac{d|\vec{r}|}{dt}\end{align}[/math]

So [math]\frac{d}{dt}(\vec{r}\cdot{\vec{L}}+\mu|\vec{r}|)=0[/math].

[math]\vec{e}=\vec{0}[/math] so the constancy of [math]\vec{r}\cdot\vec{e}+|\vec{r}|[/math] just gives [math]|\vec{r}|= constant[/math] which is a circle. And the condition that [math](\dot{\vec{r}}\times\vec{h})-\mu\frac{\vec{r}}{|\vec{r}|}=\mu\vec{e}=\vec{0}[/math] tells us that [math]\dot{\vec{r}}\times(\vec{r}\times\dot{\vec{r}})=\mu\frac{\vec{r}}{|\vec{r}|}[/math] and so (since [math]\dot{\vec{r}}\times(\vec{r}\times\dot{\vec{r}})=(\dot{\vec{r}}\cdot\dot{\vec{r}})\vec{r}-(\dot{\vec{r}}\cdot\vec{r})\dot{\vec{r}}[/math], and [math]|\vec{r}|= constant[/math] gives [math](\dot{\vec{r}}\cdot\vec{r})=0[/math]), we find that [math](\dot{\vec{r}}\cdot\dot{\vec{r}})\vec{r}=\mu\frac{\vec{r}}{|\vec{r}|}[/math]