PAN - spectral line fitting

PAN (Peak ANalysis) is an IDL-based general-purpose curve-fitting utility with a graphical user interface. It uses the Levenberg-Marquardt least-squares minimisation technique to fit a model to the data. PAN can be used to fit spectra with any one of a library of in-built functions (straight line, Gaussian, Lorentzian, etc.) or you can create custom functions to fit your data. There is no limit to the number of functions to which you can fit simultaneously but more functions result in more computation time.

The primary features of this program that set it apart from most other curve fitting routines and make it useful for IFS data are:

• The initial parameter guesses (amplitude, position and width) can be specified visually with the mouse. PAN was written such that the amount of typing done by the user is minimized.
• It has the ability to read in multiple spectra at once in an array format, and fit one initial guess to these spectra automatically. This is ideal for IFS data with potentially hundreds of spectra.
• The spectrum+fit, the residuals and the numerical results are displayed on-screen for each spectrum, meaning you can review the quality of the fit for each spectrum in a very visual manner.

PAN was originally written by Rob Dimeo (NIST Centre for Neutron Research, USA) in 2002 for the analysis of neutron scattering experiments. The original program and documentation can be found here. It as subsequently been adapted and modified by Mark Westmoquette for astronomical requirements. It can read in FITS data, write out the results in a tabular format, and has custom functions for a number of common optical line doublets.

Error arrays

For the least-squares minimisation routine to work in PAN, each data point must have an associated uncertainty. Since almost no IFS data reduction pipelines produce associated error arrays, the generation of an estimated error array is handled automatically by the FITS read subroutine within PAN when you read your FITS data in. At the moment, it applies a fixed error of +/- 1/500th of the peak data value of the array to every value (after rejecting all data points above 1000x the median to account for remaining CRs, etc.). Be aware: this calculation may require some alteration/modification to work for your data. Also be aware: that for some data this error-calculation method results in pathetic fits to your spectral lines, and the only way around it is to create a more meaningful error array. This can be done fairly simply by creating a Poissonian error image from your raw image (sqrt(counts)) and then more-or-less processing this error image through the same data reduction steps as your actual data (i.e. add on errors for the bias subtraction, flat-fielding, then extract the error spectra, apply the same calibrations, and add on calibration errors, etc.). You can then add this on to the fits image as the second extension in IRAF with:
imcopy file1 file2[0/1,append]
(where 0 is the science image and 1 the error image).

The errors on the fit results calculated by PAN are highly dependent on accurate input uncertainties on your data points. Therefore if you've read in your data using the simplistic approach, the errors on the fit results will be fairly meaningless (particularly on the fluxes).

How many components should I fit?

One of the challenges of fitting multiple components to spectral lines is deciding how many are required. The fit χ2 value will always go down with the addition of more and more components (since you end up fitting every noise feature), so how do you know when to stop?

One way is to calculate the reduced χ2 statistic, that is the χ2 divided by the degrees of freedom (where degrees of freedom is (N-P), N being number of data points and P being number of parameters in the fitting function that is being used).

Alternatively, you can use the statistical F-test. The F-distribution is a ratio of two χ2 (chi-squared) distributions, denoted by the degrees of freedom for the numerator χ2 and the denominator χ2. For statistically comparing the quality of two ﬁts, this function allows one to calculate the significance of a variance (χ2) increase that is associated with a given conﬁdence level, for a given number of degrees of freedom. The test will output the minimal increase of the χ2 ratio that would be required at the given conﬁdence limit for deciding that the two ﬁts are different. If the χ2 ratio is higher than this critical value, the ﬁts are considered statistically distinguishable. Note that unfortunately the F-test is extremely non-robust to non-normality, meaning that even if the data displays only modest departures from the normal distribution, the test is unreliable. A table of F-test critical values can be found here and is usually included in most statistical texts.

Practically speaking, this means that for the F-test to be useful, your data have to have accurate associated uncertainties. However, if you have fitted your data with estimated uncertainties (e.g. with PAN and the error estimation method described above), then the method of using a threshold χ2 ratio for determining the significance of a χ2 increase can still be useful, if not so statistically robust. You can determine a suitable threshold by comparing the χ2 ratios to the fits results shown by e.g. PAN.

An approach to fitting emission lines could be:

• Using PAN, fit the line in each spectrum with a single-, double- and triple-Gaussian initial guess (or more if necessary), and record the results.
• plot the ratio of χ2 values for each spectrum (single / double, double / triple), and apply a threshold (e.g. from the F-test) that separates doubles from singles and triples from doubles. Check the suitability of this threshold by comparing to the fit results shown by PAN and using your experience.
• Use a script to extract out the right results (single-, double- or triple-Gaussian fit) for each spectrum from the 3 input files and output to a master results file.

You might also want to think about applying a number of physically-motivated tests or filters to further improve the accuracy of the results. For example:

• to accept a fit, the measured FWHM must be greater than the associated error on the FWHM result (a common symptom of a bad ﬁt)
• the FWHM must be smaller than 15${\rm \AA}$ (to guard against ﬁtting continuum or spurious features).
• additional ﬁltering based on flux, FWHM or wavelength in order to ensure consistency of component assignment, i.e. keeping the ﬁrst Gaussian component in the results file corresponding to your deﬁnition of the ﬁrst component. This should only apply when the line is sufficiently well resolved and well-fit.