FPL Statistics Group
DISCLAIMER OF WARRANTIES
THIS SOFTWARE IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND.
THE AUTHOR DOES NOT WARRANT, GUARANTEE OR MAKE ANY
REPRESENTATIONS REGARDING THE SOFTWARE OR DOCUMENTATION IN TERMS
OF THEIR CORRECTNESS, RELIABILITY, CURRENTNESS, OR OTHERWISE.
THE ENTIRE RISK AS TO THE RESULTS AND PERFORMANCE OF THE SOFTWARE
IS ASSUMED BY YOU. IN NO CASE WILL ANY PARTY INVOLVED WITH THE
CREATION OR DISTRIBUTION OF THE SOFTWARE BE LIABLE FOR ANY DAMAGE
THAT MAY RESULT FROM THE USE OF THIS SOFTWARE.
Sorry about that.
This sample code is really kind of ugly. It is not production
quality. We post it simply because it might be useful for someone
interested in producing something similar. If you find errors in the
code or if you are interested in a "simple" modification with which we
could help you, please contact Steve Verrill at firstname.lastname@example.org
Many interactive regressions
Researchers sometimes encounter data sets that consist of many curves.
The researchers may need to "reduce" these curves further before
performing any statistical tests. Some examples:
- A researcher is interested in the effect of various preservative
treatments on the modulus of elasticity of several grades of lumber.
The design is a two way analysis of variance (ANOVA). The data points
in each cell are
modulus of elasticity values obtained from stress-strain curves.
- A researcher is interested in the effect of several chemical
inhibitors on the growth of several fungus species. Again the
design is a two way ANOVA. The data points in each cell are 50%
inhibition doses obtained from colony size versus dose curves.
- A researcher is interested in the effects of linerboard source,
corrugating medium, and adhesive type on the creep behavior of corrugated
fiberboard under cyclic humidity conditions. Here the design is a three
way ANOVA. The data points in each cell are steady-state creep
rates obtained from strain versus time curves.
In all of these examples, before the ANOVAs can be performed, the data
curves must be reduced to single numbers. Sometimes this process is
straightforward and it can be automated. In other cases, especially
when the curves contain "glitches," human intervention is required ---
an investigator needs to look at a plot of each data curve, select a
region to fit, look at the results of the fit, decide whether an
improved fit is needed, and so on. When the data set consists of
hundreds of curves, this can be a painful process. We hope that our
programs will ease the pain.
The version here fits the model y = a + b (x - c)^2. It makes use of
Java translations of LINPACK QR decomposition routines and can be
readily generalized to more complex linear models. We have also
developed versions that handle simple linear
regressions and a nonlinear regression.
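Expanding the model gives y = (a + b c^2) - 2 b c x + b x^2, which is an ordinary quadratic in x. This is presumably why linear least squares routines are enough here. The following is a minimal sketch of that idea, not the code distributed with this page: it solves the normal equations by Gaussian elimination instead of calling the LINPACK QR translations, and the class and method names (QuadraticFitSketch, solve3, fit) are hypothetical. It assumes at least three distinct x values and a nonzero coefficient on x^2.

```java
import java.util.Arrays;

// Hypothetical illustration; not the FPL source code.
class QuadraticFitSketch {

    // Solves the 3x3 system m * s = r by Gaussian elimination with partial pivoting.
    static double[] solve3(double[][] m, double[] r) {
        int n = 3;
        for (int col = 0; col < n; col++) {
            int piv = col;
            for (int row = col + 1; row < n; row++) {
                if (Math.abs(m[row][col]) > Math.abs(m[piv][col])) piv = row;
            }
            double[] tmpRow = m[col]; m[col] = m[piv]; m[piv] = tmpRow;
            double tmp = r[col]; r[col] = r[piv]; r[piv] = tmp;
            for (int row = col + 1; row < n; row++) {
                double f = m[row][col] / m[col][col];
                for (int k = col; k < n; k++) m[row][k] -= f * m[col][k];
                r[row] -= f * r[col];
            }
        }
        // Back substitution on the upper triangular system.
        double[] s = new double[n];
        for (int row = n - 1; row >= 0; row--) {
            double sum = r[row];
            for (int k = row + 1; k < n; k++) sum -= m[row][k] * s[k];
            s[row] = sum / m[row][row];
        }
        return s;
    }

    // Returns {a, b, c} for the model y = a + b (x - c)^2.
    static double[] fit(double[] x, double[] y) {
        double[][] m = new double[3][3];
        double[] r = new double[3];
        // Accumulate the normal equations for the basis 1, x, x^2.
        for (int i = 0; i < x.length; i++) {
            double[] basis = {1.0, x[i], x[i] * x[i]};
            for (int j = 0; j < 3; j++) {
                r[j] += basis[j] * y[i];
                for (int k = 0; k < 3; k++) m[j][k] += basis[j] * basis[k];
            }
        }
        double[] p = solve3(m, r);               // y = p[0] + p[1] x + p[2] x^2
        double b = p[2];                         // coefficient of x^2
        double c = -p[1] / (2.0 * p[2]);         // from p[1] = -2 b c
        double a = p[0] - p[1] * p[1] / (4.0 * p[2]); // from p[0] = a + b c^2
        return new double[] {a, b, c};
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 4};
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            y[i] = 1.0 + 2.0 * (x[i] - 3.0) * (x[i] - 3.0); // exact data with a=1, b=2, c=3
        }
        System.out.println(Arrays.toString(fit(x, y)));
    }
}
```

For poorly scaled data, a QR-based solve of the kind mentioned above is numerically safer than the normal equations used in this sketch.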
[Here are Java translations of
LINPACK Cholesky, QR, singular value,
and LU decomposition routines. Here are Java translations of
several high quality public domain optimization packages, including
Minpack nonlinear least squares routines.]
We welcome suggestions for improving this program. Please send
suggestions and bug reports to the e-mail address given below. Here is
the current demonstration applet.
The program is designed to work as follows:
- A user provides it with a data file. The file consists of a
line that contains a curve id, followed by lines containing x,y pairs,
followed by the id of the next curve, followed by its x,y pairs, and
so on. (If you foresee that you might find a use for this program, but
would prefer a different form for the input, please contact me at the
e-mail address given below.) Currently, the demonstration applet
reads a data set
consisting of a collection of quadratic curves.
You could also provide your own data set. Simply
anonymous ftp it to www1.fpl.fs.fed.us and put
it in the pub/data directory. Then, after you start the demonstration
applet, replace "quad7.jdat" in the first applet window with the
name of your data file.
- When the program begins, the user is presented with a window
that contains text fields for the name of the input data file,
the prefix of the output data files, and labels for some of the plots.
After these fields have been filled, the user clicks on the Go
button to proceed.
- After the Go button is clicked, the program loads the first curve
in the data set. A user can advance through the collection of data
curves by pressing the Next button. Alternatively, a user can
specify a particular curve by typing its ID into the text field next
to the Load button, and then pressing the Load button.
- A user excludes rectangular regions of data by pushing a mouse
button down at one corner of the rectangle, holding the button down
while moving the pointer to the opposite corner of the rectangle, and
then releasing the button. While a user is performing this movement,
an animated red rectangle appears that outlines the current exclusion
region. After the button is released the outline of the rectangle
turns black, and it can no longer be moved. However, the data in the
rectangle can be reactivated if the rectangle is "cleared." A user
clears a rectangle by clicking a mouse button in it (all rectangles
that contain this point will be cleared) and then clicking
on the Clear button at the bottom of the page. All rectangles
that are shown on the current page (in the current "zoom state" ---
see below) will be cleared if the user clicks
the Clear All button at the bottom of the page.
- If a user wants to look at only the data that has not been
excluded, the user should press the Zoom button. A sequence of
zooms may be performed.
- Currently, the program does not permit a user to unzoom a single
zoom step. However, a user can reactivate all of the data by pressing the
Reset button at the bottom of the page. Alternatively, a user can see all of
the inactivated rectangles by performing a fit (see below) and then by
clicking the appropriate recall button at the top of the page (see below).
- A user fits the active data by clicking the Fit
button at the bottom of the page. In this demonstration program, a
quadratic model is fit. As noted above, we have also produced sample
programs that fit a simple linear regression and that fit a particular
nonlinear regression.
- After a fit has been performed, an appropriate summarizing value
appears in one of the boxes at the top of the page. For the
demonstration program, the summarizing value is the b value in the
model y = a + b (x - c)^2. (This was important for one of our scientists.)
Up to 5 fits can be
performed on any one data set (of course, additional fits can be
performed by reloading the data set or by rerunning the program). If
i < 5 fits have been performed, and the Fit button is
pushed, the summarizing value will appear in text field i + 1
at the top of the page. At any
time prior to a push of the Load, Next, or Stop
button, this fit can be recalled by pushing the button (one of buttons 1 -- 5)
associated with its text field at the top of the page.
- After a fit has been performed (actually, even before the Fit
button has been pushed), a user can view various associated
residual plots by pushing the Residuals button at the bottom of
the page. This opens a second window that contains a different set of
buttons at its bottom. The Data button displays the currently
active data. The Fit button adds the fitted line. The Res.
vs. x button plots the residuals versus the corresponding x
values. The Res. vs. pred. y button plots the residuals versus
the predicted y values. The Histo button produces a histogram
of the residuals. The NPP button produces a normal probability
plot of the residuals. The Bye button deletes the residuals
window.
- The Stop button ends the execution of the program.
- If a fit has been performed, then when the next
Load, Next, or Stop button is pushed, information
is written to two results files. To a file titled
xxx.ests (where xxx is provided by the user in the startup box), the
ID and a, b, c parameter estimates from the most recently displayed
fit are written. To a file titled xxx.inact (where xxx is provided by
the user in the startup box), the ID, the number
of inactivated rectangles associated with the most recently displayed
fit, and the pairs of x,y values that define them are written.
These files can be retrieved via anonymous ftp from the pub/data directory.
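The input file layout and the rectangle exclusion described in the list above can be sketched as follows. This is an illustration only, not the program's own reader. It makes several assumptions that the text does not state: a line holding a single token is treated as a curve id, a line holding two tokens separated by whitespace or a comma is treated as an x,y pair, and points on the boundary of a rectangle count as excluded. The class and method names (CurveFileSketch, parse, exclude) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the FPL source code.
class CurveFileSketch {

    // Parses the file contents into curves keyed by id, in file order.
    // ASSUMPTION: one token on a line is a curve id, two tokens are an x,y pair.
    static Map<String, List<double[]>> parse(String text) {
        Map<String, List<double[]>> curves = new LinkedHashMap<>();
        List<double[]> current = null;
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            String[] tok = line.split("[\\s,]+");
            if (tok.length == 1) {
                // Start a new curve.
                current = new ArrayList<>();
                curves.put(tok[0], current);
            } else if (tok.length == 2 && current != null) {
                // Add a data point to the curve currently being read.
                current.add(new double[] {
                    Double.parseDouble(tok[0]), Double.parseDouble(tok[1])});
            } else {
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
        }
        return curves;
    }

    // Returns the points that lie outside the rectangle whose opposite
    // corners are (x1, y1) and (x2, y2). The corners may be given in any order,
    // as they would be when a rectangle is dragged out with the mouse.
    static List<double[]> exclude(List<double[]> pts,
                                  double x1, double y1, double x2, double y2) {
        double xlo = Math.min(x1, x2), xhi = Math.max(x1, x2);
        double ylo = Math.min(y1, y2), yhi = Math.max(y1, y2);
        List<double[]> active = new ArrayList<>();
        for (double[] p : pts) {
            boolean inside = p[0] >= xlo && p[0] <= xhi
                          && p[1] >= ylo && p[1] <= yhi;
            if (!inside) active.add(p);
        }
        return active;
    }

    public static void main(String[] args) {
        String text = "curve1\n0 1\n1 2.5\n2 9\ncurve2\n0,4\n1,3\n";
        Map<String, List<double[]>> curves = parse(text);
        // Exclude the "glitch" at (2, 9) from curve1.
        List<double[]> active = exclude(curves.get("curve1"), 1.5, 10, 2.5, 8);
        System.out.println(curves.keySet()
            + " active points in curve1: " + active.size());
    }
}
```

Clearing a rectangle, in this picture, amounts to dropping it from the list of exclusions and recomputing the active points from the full curve.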
Obtaining the source code
The program is available as both a Java applet and a Java
application. Alpha source code is available in compressed tar form,
Windows 95 zip form, and Windows 98 and later zip form.
Installation of the application
Given the way that the code is currently written you will need to
place the following directories and associated files
under a directory in your CLASSPATH. (On a Unix box,
this is done automatically if you uncompress and then untar the
compressed tar file in a directory that lies in your CLASSPATH.)
You will then need to place the other classes
together in a directory and start the application from a command line
while you are in that directory. (Under Windows, you
would have to do this at a DOS command line.)
Of course for this to work,
you will need a Java runtime environment (JRE) on your machine.
If your machine does not currently include a JRE, you can obtain one
for free at
http://java.sun.com/j2se/1.3/jre (Solaris, Linux, or Windows) or
http://devworld.apple.com/java/ (Apple). (There are free sites on the
Web for other machines such as IBM and HP Unix boxes.) You will need
to follow the associated installation instructions to get the JRE set
up on your system.
Differences between the applet and the application
The application permits you to choose input and output files with a
file chooser. To do so click on one of the Browse buttons in
the startup box. Also, if your printer can handle PostScript,
the application permits you to print plots via the
Print button at the bottom of the page.
Format.java is software described in Cornell
and Horstmann's Core Java (SunSoft Press/Prentice-Hall).
(We like this book.)
We use it to write to and read from the text fields.
Here is what Cornell and Horstmann
have to say about the use of code from their book:
NOTE: People have often asked what the licensing requirements for
using the sample code in a commercial situation are. You can freely
use any code from this book for non-commercial use. However, if you
do want to use the code as a basis for a commercial project, we simply
require that every person on the development team for that project own
a copy of Core Java.
The rubberband material is software described in Geary and McClellan's
Graphic Java (SunSoft Press/Prentice-Hall). (We also like this
book.) We use this material to produce the rectangles that exclude
data. Here is what Geary and McClellan
have to say about the use of code from their book:
The CD that accompanies this book includes:
- All the source code for the Graphic Java Toolkit.
- Unit test applets for all GJT components, including HTML files for
unit test applets.
- HTML documentation for all GJT classes.
- Numerous image files in .gif format developed by Pixelsite
Virtually all these programs are discussed throughout the book. Feel
free to borrow, adapt, or extend these for your own purposes.
If you have questions about this software,
or suggestions for improvement,
please contact Steve Verrill at firstname.lastname@example.org
or 608-231-9375. Also see this link to the
FPL Statistics Group.
As of 3/3/01 no bugs have been reported. However, we assume that
they exist. Bugs will be corrected as they are reported and
descriptions of any bugs and bug fixes will be documented here.
Last modified on 3/3/01.