This page last changed Wednesday, 01-Oct-2008 10:48:50 EDT

PCA/Linear Mixture Modeling, and Image Presentation IDL Source Code

History

This page is here to document and distribute the source code I have been using to to my multi-spectral image analysis. The code began its existance as a FORTRAN program written by Paul Johnson which he eventually converted to IDLv2.0. That code was then heavily modified by then UW graduate students Tim Titus and David Waddill, with the first modular, menu driven version having been assembled by David Waddill. This code was then worked over by Paul Johnson to streamline it, snd then to convert it to IDL4.0. This was the state of the code when I received it. I made several of my own modifications, mainly going over a few bugs, adding a few features to the menus, adding newer outputs, and putting in the nifty display program. But enough history.

Programs

Here are the files you will need to run the package:

Note: All of the above are text files (IDL source code) and so just clicking on them will result in your browser displaying the file rather than downloading it to your computer. Right clicking on them will allow you the option to saving them to your disk. If you do use these programs, especially if you change or modify them, please send me a copy of the new source code. I'll take care of getting copies to all the other co-authors who will want the new and improved versions. Just send them to me at klassen@rowan.edu.

Documentation

The analysis package is menu driven program designed to perform all the PCA and Mixture Modeling, and even create some nifty looking output images. It also allows one to graph the various eigenvectors (principal components), create eigen-images (principal components as a function of position), plot mixture endmember spectra, and create mixture endmember images, among other things. The output graphs are all drawn to screen and optionally written to an ASCII file for graphing up in your favorite presentation graphing program. All the output images are written in both FITS and GIF format for future analysis or display respectively. The GIF images are also written out in both greyscale form and a color form.

The program requires a bit of setup before you can actually run it. You must have on your disk all the images in your set in FITS format. Second you must have a data file that contains many of the paramters needed. This file is a simple text file with the following format:

line 01: vector
This word indicates to the program that the image names will be given to it via string array variable.
line 02: Dim1 Dim2
These numbers are the x and y dimensions of your images
line 03: WB lgrid
These two numbers are the number of wavelength bands in the images set and the wavelength grid spaceing. If you want to use every image, lgrid=1, otherwise only every lgird images will be used in the analysis
line 04: l r t b xygrid
These five numbers control how much of the spatial information is used in the analysis. The first four, x,y,b,t are trim values and tell the program how man columns to trim from the left and right and how many rows to trim from the bottom and top, respectively. The last number is an x-y spacing grid; set it to 1 to use every pixel, 2 to use every other pixel (in both x and y), 3 for every third, etc. This would have a similar effect as using REBIN to make your image smaller.
line 05: badmin badmax normalize
These numbers control clipping in the image intensity direction. The first two give values of the images below which and above which (respectively) the program should ignore and the third number will cause all analysis to be scaled relative to the first band image. Set normalize=0 to ignore that feature
line 06: swapbyte fltdata
These two numbers are here for compatibility with some of the quirks in using IDL across platforms (namely PC to UNIX). Set swapbyte=1 if your FITS images were created under UNIX and you are processing them under Win32 or vice versa. Otherwise, set swapbyte=0. If your data are floating point images, set fltdata=1, otherwise set it to 0.
line 07: WaveT
This is a string that will be used as the label for the x-axis in all your plots. Normally this should be something like Wavelength(nm), or whatever other units you wish to use. The text can have spaces in it, but, in the data file, do not put quotes of any kind around it.
line 08-* Wavel
The last lines actually contain the (integer) values of your wavelength dimension, one value per line.

OK, now that you have a data file created, and all your images-to-be-processed in FITS format, you may start up IDL and begin. The first thing you need to do, for some strange IDL memory allocation quirk, is to compile the program - twice. That is, type .run pcatool at the IDL prompt, then do it again. If you don't, it will crash on the data read subroutine thinking it is some kind of undeclared variable instead of a program. Don't ask me...

Next you need to construct a string array vector containing the names of all the images. This can be most easily done if all the FITS images in your directory are only the ones you wish to process, then just type imgvect = findfile('*.fit') or some such line like that. Then all you need do is type pcatool,imgvect.

Since the program is menu driven the first thing you will see is the main menu. To begin, press the Initialize button. You will first be asked the name of your data file. Then the files will be read into menu and you will be asked a couple questions about whether or not to use the full image and if you want to zap bad pixels. Normally, your processing phase should have taken care of most bad pixels, but this could be useful for, say, cutting out the nucleus of a galaxy image. The trim can also be done within the input data file. If you want to skip any or all of these questions you can set various switches on the command line. If you know you want to use the full image, add /full, if you know you want to use all the image pixels, add /all, and if you want to include the name of the data file on the command line the just add data='filename'. So, a full command line could look like:

pcatool, imgvect, /full, /all, data='scan.dat'

After the images have been read in, the PCA will be performed and three graphs will be plotted on the screen (and written to PostScript images). These plots are the data cubes in the new PCA space of the 1st eignevector vs. the 0th, the 2nd vs. the 0th and the 1st vs. the 2nd. Usually, higher eigenvectors will not be very significant, but if you wish to plot them then press the NEW PCAPLOT button and you will be asked which plot window you want to change and which eigenvectors you want as the axes.

The program also puts an image of the first band on the display for your reference when chosing mixture endmembers. If the first band is not the most informative, you can change which image is displayed by clicking the DISPLAY New Frame button and typing in the number of the frame you wish to use as your reference. You can even change the range of the image stretch using the CHANGE Image Display Cutoffs button.

Other usefull features on this menu are the DISPLAY PCA Pixels on Image which lets you draw a bounding box on any one of the PCA plots and then have the corresponding pixels on the image (and the other 2 PCA plots) become highlited in color. Just a note, the color palette used at this point runs through a 7 color spectrum then starts a greyscale from black to white. These colors are stepped through one at a time for this feature so after you see white, you should stop or all the subsequent colors will be very dark. For more information see the routine lct_tnt.pro.

Similarly you can DISPLAY IMAGE Pixels on PCA Plot which lets you draw a bounding box on the image and see where those pixels map in PCA space on the three plots. If at anytime you do not like the current display color palette, you can Change COLOR PALETTE and enter in a number from 0-41 where 0-40 are the standard IDL color palettes and 41 is the palette created by lct_tnt.pro. Also, when your image or plots are getting cluttered you can REFRESH Image and PCA plots and they will be redrawn.

The heart of the Linear Mixture Modeling (LMM) comes about when you push the ANALYSIS Tools Menu button. There you can begin the process of the LMM, view PCA results and view LMM results. The SELECT Endmembers button allows you to use the PCA plots or the image, or even the keyboard (if you happen to know the image coordinates or your endmember pixels) to choose the LMM endmember spectra. You choose pixels by clicking on the appropriate plot/image. A short menu comes up that lets you disregard that last click and rechoose your endmember, change the size of the box which the program will average over to decide on the exact pixel to use, or to accept your pixel choice. If you enter the coordinates via keyboard, you do not get these choices! The program will give you the image coordinates of your endmember and also display on the other plots/image where the pixel lies. Then you are asked if you wish to enter a name to associate with your endmember choice (or just hit enter to accept the default).

The next step is to hit the MEASURE Fractional Abundances button and the program will begin all kinds of nasty, ugly multi-dimensional calculations and then ask you what sigma levels you wish to declare as modeled. You will be asked for three numbers, the first is the number of sigmas away from the model a pixel can be and still be considered modeled. The other two are secodary sigma levels used for display purposes only. Any pixel outside of the first sigma level entered (usually 2) will be set aside as not-fully-modeled and thus ready for use in the modeling-of-residuals phase. After you enter these three numbers the program continues its calculations and presents you with the fractional abundance maps and also writes them to disk in FITS, greyscale GIF and color GIF formats.

The rest of the buttons allow you to look at the results in a number of ways. First, you can Plot EIGENVECTOR Spectra, to screen as well as output the information to an ASCII file, Display Eigen-image to the screen and, optinally, write it out to FITS/GIF format, Plot ENDMEMBER Spectra, and, again, output those numbers to an ASCII file, Display RESIDUAL Spectral Image which allows you to see the residual at any wavelength band or if you enter -1 as the band, you see an RMS residual image, or even Display ALL RESIDUAL Spectral Images. Finally you can move on to either Analyze Pixel SUBSET Chosen From Fracmaps which lets you look perform PCA on those pixels that have, say a high value of a certain endmember, or to Analyze RESIDUAL Data. If you feel you are finished, then press the RETURN to Main Menu button and from there press the EXIT PCA Tool button.

This should give you a pretty good handle on all the main uses and fetures of this analysis tool package. The two other programs above that were not integrated into the pca library files my be useful on their own. The first, lct_tnt.pro is just another color palette that allows you to pretty much see things in greyscale, yet make annotations in some very basic colors. The second, gifbar.pro will take an image that is already loaded into the IDL environment and draw it in a window, color-coding the image values and drawing a reference color bar to the right of it. The image will be scaled to a size as close to 256×256 as possible before display to create an image that is easy to view.

The minimum command line argument to gifbar.pro is just the image variable name. Optionally one may add an image minimum and maximum value on the command line and these will be the extreme values set to the extreme colors. If both numbers are not set, then the program will calculate its own minimum and maximum values. One can also change the color table used in the display by assigning an integer value to the command line variable ct. Since the program uses the loadct IDL routine, the number for the ct variable must be an integer from 0 to 39 and will correspond to one of the 40 canned IDL color environments. If you use a different color table you may run into the problem that the numeric text turns out to be the same color as the 0 level image background color and thus becomes unreadable. To change the text color just assign an integer from 0 to 255 to the tc command line variable. This number is the color number within the color table so you can use the plotted color bar as a guideline for which number to assign to tc. Additionaly, some color tables do not have good 0 color for the overall backgroud; you can change that by assigning an interger to wc to wash the background with that color before drawng the image and text. Finally, when you get it looking just right you can set the fname command line variable to save the image to both EPS and PNG formats. An example command line using all the options would look like:

gifbar, img, 0.00, 0.05, ct=0, tc=255, wc=0, fname='img'

This line would display the image stored in the variable img using the greyscale color table with 0 being assigned to black and 0.05 being assigned to white, with white text, a black wash, and then that image being written out to the file img_00.png in Portable Network Graphics format and img_00.eps in Encapsulated PostScript format. Note, the _00 appended to the base name tells you the color table used; if you set a different color table, say 27, then _27 would be appended. This means you can create several versions of your image without overwriting them or having to change their names before creating them. Also note, the PNG image is created using the tvrd() function on the image as drawn to the x-buffer device with a size equal to the window used on the video display, so it is at your screen resolution. The EPS image is written directly to the file, using postscript fonts, 8-bits-per-pixel color, and a size equal to the window frame from the original display. It generally looks much nicer than the PNG version.

That about covers everything about these programs. Again, contact me if there are any comments, questions, or bugs.