Least Squares analysis returns the coefficients of the Nth-degree polynomial that minimizes the sum of the squared vertical distances to the data points in a set of data.
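(For concreteness, here is a minimal pure-Python sketch of what least squares does for a straight line; the data points are made up for illustration:)

```python
# Closed-form least-squares fit of a line y = a*x + b (a minimal sketch).
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope that minimizes the sum of squared vertical residuals.
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]          # made-up data, roughly y = 2x + 1
a, b = fit_line(xs, ys)
print(a, b)                              # slope near 2, intercept near 1

# Because the fit includes an intercept, the signed residuals already
# sum to (essentially) zero -- positive and negative errors balance.
residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
print(abs(sum(residuals)) < 1e-9)        # True
```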
Does anybody know of an elegant mathematical way to solve for a linear expression that yields equal areas above and below the line? For example, if you visualize a perfect sine wave, a horizontal line running through the midpoint of the peak and the trough ((Ypeak+Ytrough)/2) of the sine wave would be the line I'd be looking for.
I can use brute force and stupidity, drawing a bunch of lines and choosing the one whose AreaAbove/AreaBelow ratio is closest to 1, but I thought I'd see whether there is a speedier, more elegant approach before going down that road.
Least Squares But For Equal Area Above/Below the Line?
In least squares the objective function is
f(coefficients) = Sum from i=1 to N of (FittedPolynomial(x_i) - MeasuredData(x_i))^2
You could use an augmented objective function which is the sum of the squares of the fitting errors, PLUS some big whopper constant times the difference between the sum of positive errors and the sum of negative errors.
The augmented objective function "penalizes" the solution when the total positive error differs from the total negative error, encouraging the optimizer to find solutions whose positive errors equal their negative errors. In the literature of mathematical optimization, this is a penalty function.
To use it, you will need software that performs nonlinear minimization upon user-supplied objective functions. In my experience the Nelder-Mead algorithm (called "amoeba" and "downhill simplex" in Numerical Recipes), the Hooke-Jeeves algorithm, and the Levenberg-Marquardt algorithm all work quite well.
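The augmented objective can be sketched in a few lines of Python. The minimizer below is a bare-bones compass/pattern search standing in for Nelder-Mead or Hooke-Jeeves (names, constants, and the test data are my own assumptions, not from this thread), and the penalty term is squared rather than an absolute difference to keep the surface smooth:

```python
import math

def augmented_objective(coeffs, xs, ys, big_c=100.0):
    """Sum of squared fitting errors, PLUS a big constant times the
    (squared) difference between total positive and total negative error."""
    a, b = coeffs
    errors = [(a * x + b) - y for x, y in zip(xs, ys)]
    sse = sum(e * e for e in errors)
    pos = sum(e for e in errors if e > 0)      # total positive error
    neg = sum(-e for e in errors if e < 0)     # total negative error (magnitude)
    return sse + big_c * (pos - neg) ** 2

def compass_search(f, start, step=1.0, tol=1e-8):
    """Crude derivative-free minimizer: probe each coordinate +/- step,
    keep improvements, halve the step when nothing helps."""
    point = list(start)
    best = f(point)
    while step > tol:
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[i] = point[i] + delta
                val = f(trial)
                if val < best:
                    point, best, improved = trial, val, True
        if not improved:
            step *= 0.5
    return point

# Fit a line to a sine wave riding on a vertical offset of 3.
xs = [-5.0 + 0.1 * i for i in range(101)]
ys = [3.0 + math.sin(x) for x in xs]
a, b = compass_search(lambda c: augmented_objective(c, xs, ys), [0.0, 0.0])
print(a, b)   # intercept lands on the sine wave's midline (about 3)
```

A real implementation would swap `compass_search` for a library minimizer (e.g. a Nelder-Mead routine), but the shape of the objective function is the point here.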

 Roundtable Knight
 Posts: 128
 Joined: Tue Apr 12, 2011 11:46 am
Might save you some time by noting that the Numerical Recipes Nelder-Mead algorithm tends to get stuck in local maxima/minima.
User satisfaction with the Numerical Recipes implementation: given a sufficiently bumpy surface, every 3rd or 4th solution returned by Nelder-Mead is hopeless junk.
Speed: the Numerical Recipes implementation is faster than Simulated Annealing but slow compared to more modern iterative search heuristics.
Try some of Sluggo's alternate suggestions.