FPL Statistics Group





DISCLAIMER OF WARRANTIES

THIS SOFTWARE IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. THE AUTHOR DOES NOT WARRANT, GUARANTEE OR MAKE ANY REPRESENTATIONS REGARDING THE SOFTWARE OR DOCUMENTATION IN TERMS OF THEIR CORRECTNESS, RELIABILITY, CURRENTNESS, OR OTHERWISE. THE ENTIRE RISK AS TO THE RESULTS AND PERFORMANCE OF THE SOFTWARE IS ASSUMED BY YOU. IN NO CASE WILL ANY PARTY INVOLVED WITH THE CREATION OR DISTRIBUTION OF THE SOFTWARE BE LIABLE FOR ANY DAMAGE THAT MAY RESULT FROM THE USE OF THIS SOFTWARE.

Sorry about that.


Warning

This sample code is really kind of ugly. It is not production quality. We post it simply because it might be useful for someone interested in producing something similar. If you find errors in the code or if you are interested in a "simple" modification with which we could help you, please contact Steve Verrill at sverrill@fs.fed.us or 608-231-9375.


Many interactive regressions

Researchers sometimes encounter data sets that consist of many curves. The researchers may need to "reduce" these curves further before performing any statistical tests. Some examples:

In all of these examples, before the ANOVAs can be performed, the data curves must be reduced to single numbers. Sometimes this process is straightforward and it can be automated. In other cases, especially when the curves contain "glitches," human intervention is required --- an investigator needs to look at a plot of each data curve, select a region to fit, look at the results of the fit, decide whether an improved fit is needed, and so on. When the data set consists of hundreds of curves, this can be a painful process. We hope that our programs will ease the pain.

The version here fits the model y = a + b (x - c)^2. It makes use of Java translations of LINPACK QR decomposition routines and can be readily generalized to more complex linear models. We have also developed versions that handle simple linear regressions and a nonlinear regression.
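Because the model expands to y = (a + b c^2) - 2 b c x + b x^2, it is linear in the coefficients of 1, x, and x^2, which is why linear least squares machinery applies. The following is a minimal sketch of that idea, not the program's own code: the class and method names are hypothetical, and for brevity it solves the 3 x 3 normal equations by Gaussian elimination rather than using the LINPACK QR translations mentioned above (QR is generally the more numerically stable choice).

```java
// Hypothetical sketch: fits y = a + b (x - c)^2 by ordinary least squares.
// Assumption: normal equations are used here for brevity; the program
// described in the text uses LINPACK QR routines instead.
final class QuadFitSketch {

    /** Returns {a, b, c}. The result is undefined if the fitted b is zero. */
    static double[] fit(double[] x, double[] y) {
        // Build the augmented normal equations for y = p0 + p1 x + p2 x^2.
        double[][] m = new double[3][4];
        for (int i = 0; i < x.length; i++) {
            double[] basis = {1.0, x[i], x[i] * x[i]};
            for (int r = 0; r < 3; r++) {
                for (int col = 0; col < 3; col++) {
                    m[r][col] += basis[r] * basis[col];
                }
                m[r][3] += basis[r] * y[i];
            }
        }
        double[] p = solve3(m);
        // Map the polynomial coefficients back to the a, b, c parameterization.
        double b = p[2];
        double c = -p[1] / (2.0 * b);
        double a = p[0] - b * c * c;
        return new double[] {a, b, c};
    }

    /** Gaussian elimination with partial pivoting on a 3 x 4 augmented matrix. */
    private static double[] solve3(double[][] m) {
        int n = 3;
        for (int k = 0; k < n; k++) {
            int pivot = k;
            for (int r = k + 1; r < n; r++) {
                if (Math.abs(m[r][k]) > Math.abs(m[pivot][k])) {
                    pivot = r;
                }
            }
            double[] tmp = m[k];
            m[k] = m[pivot];
            m[pivot] = tmp;
            if (m[k][k] == 0.0) {
                throw new IllegalArgumentException("singular system: need at least 3 distinct x values");
            }
            for (int r = k + 1; r < n; r++) {
                double f = m[r][k] / m[k][k];
                for (int col = k; col <= n; col++) {
                    m[r][col] -= f * m[k][col];
                }
            }
        }
        // Back substitution.
        double[] p = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = m[r][n];
            for (int col = r + 1; col < n; col++) {
                s -= m[r][col] * p[col];
            }
            p[r] = s / m[r][r];
        }
        return p;
    }
}
```

Adding further columns to the basis array (and enlarging the system accordingly) is one way such a sketch could be generalized to more complex linear models.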

Here is a demonstration applet.

[Here are Java translations of LINPACK Cholesky, QR, singular value, and LU decomposition routines. Here are Java translations of several high quality public domain optimization packages, including Minpack nonlinear least squares routines.]

The program is designed to work as follows:

  1. A user provides it with a data file. The file consists of a line that contains a curve id, followed by lines containing x,y pairs, followed by the id of the next curve, followed by its x,y pairs, and so on. (If you foresee that you might find a use for this program, but would prefer a different form for the input, please contact me at the e-mail address given below.) Currently, the demonstration applet reads a data set consisting of a collection of quadratic curves. You could also provide your own data set. Simply transfer it by anonymous ftp to www1.fpl.fs.fed.us and put it in the pub/data directory. Then, after you start the demonstration applet, replace "quad7.jdat" in the first applet window with the name of your data file.
  2. When the program begins, the user is presented with a window that contains text fields for the name of the input data file, the prefix of the output data files, and labels for some of the plots. After these fields have been filled, the user clicks on the Go button to proceed.
  3. After the Go button is clicked, the program loads the first curve in the data set. A user can advance through the collection of data curves by pressing the Next button. Alternatively, a user can specify a particular curve by typing its ID into the text field next to the Load button, and then pressing the Load button.
  4. A user excludes rectangular regions of data by pushing a mouse button down at one corner of the rectangle, holding the button down while moving the pointer to the opposite corner of the rectangle, and then releasing the button. While a user is performing this movement, an animated red rectangle appears that outlines the current exclusion region. After the button is released, the outline of the rectangle turns black, and it can no longer be moved. However, the data in the rectangle can be reactivated if the rectangle is "cleared." A user clears a rectangle by clicking a mouse button in it (all rectangles that contain this point will be cleared) and then clicking on the Clear button at the bottom of the page. All rectangles that are shown on the current page (in the current "zoom state" --- see below) will be cleared if the user clicks the Clear All button at the bottom of the page.
  5. If a user wants to look at only the data that has not been excluded, the user should press the Zoom button. A sequence of zooms may be performed.
  6. Currently, the program does not permit a user to undo a single zoom. However, a user can reactivate all of the data by pressing the Reset button at the bottom of the page. Alternatively, a user can see all of the inactivated rectangles by performing a fit (see below) and then clicking the appropriate recall button at the top of the page (see below).
  7. A user fits the active data by clicking the Fit button at the bottom of the page. In this demonstration program, a quadratic model is fit. As noted above, we have also produced sample programs that fit a simple linear regression and that fit a particular nonlinear model.
  8. After a fit has been performed, an appropriate summarizing value appears in one of the boxes at the top of the page. For the demonstration program, the summarizing value is the b value in the model y = a + b (x - c)^2. (This was important for one of our scientists.) Up to 5 fits can be performed on any one data set (of course, additional fits can be performed by reloading the data set or by rerunning the program). If i < 5 fits have been performed, and the Fit button is pushed, the summarizing value will appear in text field i + 1 at the top of the page. At any time prior to a push of the Load, Next, or Stop buttons, this fit can be recalled by pushing the button (buttons 1 -- 5) associated with the text field at the top of the page.
  9. After a fit has been performed (actually, even before the Fit button has been pushed), a user can view various associated residual plots by pushing the Residuals button at the bottom of the page. This opens a second window that contains a different set of buttons at its bottom. The Data button displays the currently active data. The Fit button adds the fitted line. The Res. vs. x button plots the residuals versus the corresponding x values. The Res. vs. pred. y button plots the residuals versus the predicted y values. The Histo button produces a histogram of the residuals. The NPP button produces a normal probability plot of the residuals. The Bye button deletes the residuals window.
  10. The Stop button ends the execution of the program.
  11. If a fit has been performed, then when the next Load, Next, or Stop button is pushed, information is written to two results files. To a file titled xxx.ests (where xxx is provided by the user in the startup box), the ID and a, b, c parameter estimates from the most recently displayed fit are written. To a file titled xxx.inact (where xxx is provided by the user in the startup box), the ID, the number of inactivated rectangles associated with the most recently displayed fit, and the pairs of x,y values that define them are written. These files can be retrieved via anonymous ftp from the pub/data directory.
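
The input format described in step 1 can be read with a few lines of code. The sketch below is illustrative only and is not taken from the program: the class name is hypothetical, and it assumes that an x,y line holds exactly two numbers separated by a comma and/or whitespace, and that any other non-blank line is a curve id. The actual file conventions of the program may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a reader for the "id line, then x,y lines" layout.
final class CurveFileSketch {

    /** Maps each curve id to its list of {x, y} pairs, in file order. */
    static Map<String, List<double[]>> parse(List<String> lines) {
        Map<String, List<double[]>> curves = new LinkedHashMap<>();
        List<double[]> current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // Assumption: blank lines carry no meaning.
            }
            String[] tokens = line.split("[,\\s]+");
            boolean isPair = current != null
                    && tokens.length == 2
                    && isNumber(tokens[0])
                    && isNumber(tokens[1]);
            if (isPair) {
                current.add(new double[] {
                    Double.parseDouble(tokens[0]),
                    Double.parseDouble(tokens[1])
                });
            } else {
                // Anything that is not an x,y pair starts a new curve.
                current = new ArrayList<>();
                curves.put(line, current);
            }
        }
        return curves;
    }

    private static boolean isNumber(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

Under these assumptions a purely numeric id such as "17" is still recognized as an id, because it has one token rather than two.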
We welcome suggestions for improving this program. Please send suggestions and bug reports to the e-mail address given below. Here is the current demonstration applet.
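
The exclusion and clearing behavior described in step 4 amounts to filtering points against a list of rectangles. Here is a rough sketch of that bookkeeping. It is not the program's implementation (which builds on the rubberband classes described below); the names are hypothetical, rectangles are assumed to be stored in data coordinates, and points on a rectangle's boundary are assumed to count as excluded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of exclusion-rectangle bookkeeping.
final class ExclusionSketch {

    /** Axis-aligned rectangle built from any two opposite corners. */
    static final class Rect {
        final double xMin, xMax, yMin, yMax;

        Rect(double x1, double y1, double x2, double y2) {
            // The drag may start at any corner, so normalize the bounds.
            xMin = Math.min(x1, x2);
            xMax = Math.max(x1, x2);
            yMin = Math.min(y1, y2);
            yMax = Math.max(y1, y2);
        }

        boolean contains(double x, double y) {
            return x >= xMin && x <= xMax && y >= yMin && y <= yMax;
        }
    }

    /** Returns the {x, y} points that lie outside every exclusion rectangle. */
    static List<double[]> activePoints(List<double[]> points, List<Rect> excluded) {
        List<double[]> active = new ArrayList<>();
        for (double[] p : points) {
            boolean inside = false;
            for (Rect r : excluded) {
                if (r.contains(p[0], p[1])) {
                    inside = true;
                    break;
                }
            }
            if (!inside) {
                active.add(p);
            }
        }
        return active;
    }

    /** Like the Clear button: removes every rectangle containing the clicked point. */
    static void clearAt(List<Rect> excluded, double x, double y) {
        excluded.removeIf(r -> r.contains(x, y));
    }
}
```

A fit would then be run on the list returned by activePoints, and emptying the rectangle list corresponds to Clear All.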

Obtaining the source code

The program is available as both a Java applet and a Java application. Alpha source code is available in compressed tar form, Windows 95 zip form, and Windows 98 and later zip form.

Installation of the application

Given the way that the code is currently written, you will need to place the following directories and associated files under a directory in your CLASSPATH. (On a Unix box, this is done automatically if you uncompress and then untar the compressed tar file in a directory that lies in your CLASSPATH.)

You will then need to place the other classes (Pdown.class, Pdown_application.class, Pup.class, Pup_application.class, Quad.class, Quad16.class, Quad161.class, Quad_application.class, Quad_application1.class, Quad_application_canres.class, Quad_application_canvas.class, ResPlot.class, ResPlot_application.class, SPVCanRes.class, SPVCanvas.class) together in a directory and type

java Quad_application

while you are in that directory to run the application. (You would have to type this at a DOS command line under Windows.)

Of course, for this to work, you will need a Java runtime environment (JRE) on your machine. If your machine does not currently include a JRE, you can obtain one for free at http://java.sun.com/j2se/1.3/jre (Solaris, Linux, or Windows) or http://devworld.apple.com/java/ (Apple). (There are free sites on the Web for other machines such as IBM and HP Unix boxes.) You will need to follow the associated installation instructions to get the JRE set up on your system.

Differences between the applet and the application

The application permits you to choose input and output files with a file chooser. To do so, click on one of the Browse buttons in the startup box. Also, if your printer can handle PostScript, the application permits you to print plots via the Print button at the bottom of the page.

Format.java

Format.java is software described in Cornell and Horstmann's Core Java (SunSoft Press/Prentice-Hall). (We like this book.) We use it to write to and read from the text fields.

Here is what Cornell and Horstmann have to say about the use of code from their book:

NOTE: People have often asked what the licensing requirements for using the sample code in a commercial situation are. You can freely use any code from this book for non-commercial use. However, if you do want to use the code as a basis for a commercial project, we simply require that every person on the development team for that project own a copy of Core Java.

Rubberband.java

The rubberband material is software described in Geary and McClellan's Graphic Java (SunSoft Press/Prentice-Hall). (We also like this book.) We use this material to produce the rectangles that exclude data points.

Here is what Geary and McClellan have to say about the use of code from their book:

The CD that accompanies this book includes: Virtually all these programs are discussed throughout the book. Feel free to borrow, adapt, or extend these for your own purposes.

Support

If you have questions about this software, or suggestions for improvement, please contact Steve Verrill at sverrill@fs.fed.us or 608-231-9375. Also see this link to the FPL Statistics Group.

Bugs

As of 3/3/01 no bugs have been reported. However, we assume that they exist. Bugs will be corrected as they are reported and descriptions of any bugs and bug fixes will be documented here.




For further information, please contact Steve Verrill at sverrill@fs.fed.us or 608-231-9375.


Last modified on 3/3/01.



