Further Reduction Tasks

General outline


This section contains a description of a number of further data reduction steps that can be performed on data-cubes, regardless of the technique used to obtain the data.

— The data you get out of a pipeline will be in a cube format (3D data-set) or RSS (2D data-set). The cube is a slightly better format for investigate spectral images and for combining data, the RSS however is better for spectral fitting.
— Mosaicking is something you will do if you wish to combine exposures of neighbouring FoVs, or combine exposures of the same FoV but taken with different gratings; in this case it is possible (maybe even likely) that there will also be slight position shifts between the images. Before mosacking you need to consider whether your overlapping fields are shifted by an integer number of spaxels or have a fractional offset. For an integer shift you can add the data-cubes much as you would an image, with the same considerations as to what to do with the overlapping regions and what to do if the atmospheric conditions varied much between your individual images. For a fractional shift you will need to resample your spatial plane, and if you are doing that anyway you may want to consider slightly improving the spatial sampling.
— In our Resampling section we discuss the this spatial resampling.
— Correcting for the DAR, the wavelength-dependent shifts that the atmosphere adds to spectra, will be necessary if your wavelength range is not very short and if no ADC was used while observing.
— Finally we include some thoughts on the effect that a PSF varying across the FoV (on the detector plane) can have on your data; but as this is something we have not found discussed in the literature, nor have we done ourselves, we can offer no practical solutions.

RSS and cube formats (and converting between the two)

Usually after having reduced your data you will be left with a frame in RSS format: the spectra, each 1 pixel wide, are stacked together in the same order that they had originally on the detector. In order for you to then create an image from these spectra (e.g. to see what it is you were observing — make a map of your FoV), you need a "position table". This is a table of the absolute or relative spatial/sky positions of each fibre listed in the same order as they are stacked in your RSS data file. This position table should be provided to you by the observatory; it may even be in the fits headers of your data files (e.g. for VLT data). Note: be aware that any apertures(/spaxels) that were missing in your observation (e.g if they just missed the edge of the detector because of instrument flexure, or were broken) should probably be present as a blank spectrum in your RSS image; if not, be sure that the position table likewise skips over the missing apertures.

To create spectral images of your FoV you simply need to sum the flux within your desired spectral range (e.g. over an interesting emission line) for each aperture and map these flux values vs. their positions (read from the position table). There are GUI-based visualisation tools which allow interaction with the data, which we cover in the data analysis page, or more simple, quick-look scripts:

  • IRAF sbands task
  • Write your own script. We will be providing some scripts here.

An alternative is to turn your RSS file into a data cube: a 3-dimensional frame with axes of X and Y (RA, Dec) and wavelength. Again you will need the position table. The advantage of holding the data in this format is that scrolling through the image along the wavelength direction (e.g. in IDL or any fits viewer such as DS9 or Gaia) is easy. However, if you need to carry out detailed analysis on the spectra, such as line fitting, it is probably necessary to have the data in RSS format.

Some scripts that turn RSS into a cube:

  • Gemini pipeline includes a task to do this (gfcube) (for GMOS data)
  • For the VIMOS IFU see our Vimos Extras page. This can be adapted for other IFUs, where it will be necessary to change the dimensions and the name of the position table file
  • IDL routine to convert a cube to RSS format: cube2rss.pro

One thing to consider: if you plan on turning your IFS data into cube format and the spaxels are not squares (e.g. Gemini GMOS-IFU hexagonal spaxels) then you will need to perform a spatial resampling (interpolation) before in the conversion process.

Mosaicking/combining multiple exposures

You will often need to combine individual exposures: (i) because they have been dithered by small, usually sub-spaxel offsets to improve the spatial sampling of the instrumental PSF; (ii) because they have been dithered by whole spaxel-sizes, to mitigate the affect of dead fibres, dead rows on the CCD, or gaps in the CCD coverage; (iii) because they have been taken in a mosaic pattern to increase the FoV covered. (See also this reference here.) Or it may be that you simply wish to combine multiple exposures of the same FoV but where the exposures have accidental shifts between. To do this it may also be necessary to perform a spatial resampling of your data, either before you combine or in the process of combining. To what extent you will resample will depend on your science goals and on the amount of shift/offset between the individual images. Remember that it is best to change the data as little as possible before you make your science measurements!

Combining exposures with integer shifts

If your data have the same FoV or have already been shifted to the same/overlapping spatial grid, then combining and mosaicking is simply a matter of adding together the frames (e.g. using IRAF's {{imcombine}). Consideration of changes to the atmospheric conditions between your individual images needs to be paid, as this will affect your PSF (i.e. the seeing), the noise levels in your spectra, and the "colour" that is added by the atmosphere (especially if no flux calibration was carried out). There may not be much you can to to "correct" for this, but at least make sure you understand the effect it will have on your final result. If the overlapping regions are a considerable fraction of the total area, bear in mind that the combined regions will have an improved SNR and this may have consequences for any later analyses you do (e.g. if you want to use Voronoi binning: see below).

It is simple to combine cubes with integer-spaxel shifts between them; this can be included as an option in routines like imcombine:
Here we show an example IRAF code to combine 2 data cubes that have been spatially dithered in order to remove missing/dead fibres/spaxels. First we make two bad pixel masks that have the value 1 everywhere except for where the dead spaxels are (value 0) by dividing the cube by itself, and assign these to the cubes' headers. We then combine the two cubes using imcombine, specifying an offset file containing the relevant offsets. (Note that imcombine only accepts integer pixel offsets.) You can include a cosmic-ray removal at this stage of the combining process if you wish.

imarith cube1.fits / cube1.fits bpm1.pl #in IRAF 0/0=0 (luckily)
imarith cube2.fits / cube2.fits bpm2.pl
hedit images="cube1.fits" fields="BPM" value="bpm1.pl" add+ addonly+ verify- show+ update+
hedit images="cube2.fits" fields="BPM" value="bpm2.pl" add+ addonly+ verify- show+ update+
imcombine ( "cube1.fits,cube2.fits", "combined_cube.fits", combine="average", offsets="shift.dat", / 
  masktype="badvalue", maskvalue=0 )

Combining with fractional spaxel offsets

Simple approach
A simple way to do this is to first resample your individual cubes to a smaller spaxel resolution (i.e. reduce the size of your spaxels), a resolution small enough that the shifts between cubes become (practically) integer (i.e if you shift is 1.5xspaxel_size and you make your new spaxels 0.5xspaxel_size then the shift is 3xnew_spaxels). You can then combine the cubes as explained in the section above. This is a somewhat crude approach, although if you already have a resampling script (see below), this approach has the advantage of being fast and allowing you to see immediately how well you can resample and combine images — it may be therefore worth doing to see whether you really do need a more sophisticated resampling/mosaicking procedure.

More complex approach
Another, more sophisticated approach is to create a new spatial grid that covers the entire FoV that all your cubes have. This spatial grid can have a spatial resolution <, =, or > that of your individual images. The conceptual approach here would be to overlay your individual cubes on to this new spatial grid, work out what fraction of the new spaxels are covered by the old spaxels, and to transfer that fraction of old flux to the new. This would have to be done for each RA-Dec slice of the image at each wavelength point your spectra have.

A still more sophisticated approach would be to calculate the amount of flux to transfer to the new spaxel by 2D fitting the fluxes of the old spaxels (again, dealing with each wavelength slice of your cube separately). The concept here is to take each spaxel, let us say spaxel in position (5,5), and then fit with polynomials the fluxes of the spaxels in the surrounding box, e.g. spaxels (3:7, 3:7). You could do this just along the X and Y axis or do a full 2D fitting. This will then tell you what the flux should be at any point along the fit, i.e. you can then work out what the flux should be in any new spaxel, of any size and any position. Bear in mind that you do need to conserve flux, i.e. the total flux of the whole image, at each and every wavelength-slice, must be the same as the flux was in the original cube unless that original cube has non-contiguous spaxels.

Note that what is described above is not a million miles away from resampling your cubes, so we recommend you read that section also.

Already prepared scripts
PYFU from James Turner (Gemini), which is currently unofficial, is a script that runs under PYRAF (IRAF running via python) and combines frames according to the WCS in the header. The WCS (offsets) can be worked out with another script that does peak fitting on the image plane to all input cubes. (NOTE: when we can add a link here, we will.)

A note on calculating offsets
What to do if you don't have any point sources in your image, and/or the WCS is not precise enough to account for the FoV shifts of the cubes you wish to combine? One trick we used when working with data of the ISM is to match the shape of the ISM in two lines of HI ($H\alpha$ and $H\beta$), considering that the spatial appearance of the gas cloud in these two lines should be the same. A more sophisticated approach would be to do a 2D cross-correlation between your cubes (again, wavelength-slice-by-slice). Automated approaches to this may only work if you have sufficient spatial features in your images to cross-correlated on, or features sufficiently bright that they outweigh the noise.

Always, but always, check your spectra after you have resampled and combined cubes, especially if you have spectral lines you want to work with. Spectral lines can be sensitive creatures, prone to being messed up if treated badly. Make sure your spectra are not substantially changed by the resampling you do on them. Improved, yes, but changed, no.

Wavelength-based data-cube combining

If you are mosaicking cubes wavelength-slice-by-slice, then of course the wavelength grid for all input cubes will have to be the same. But what if you want to combine images that are of the same, or almost the same, FoV but have a different wavelength coverage? Well, either you standardise the wavelengths so that where they overlap they are the same, or you can perform a resampling of the wavelength axis as you combine the cubes. (Overlay the spectra to see what sort of overlap you have and thus what the new dispersion you need to set, i.e. how small should your new Angstroms-per-pixel be? The principle for resampling+combining then is the same as for spatial resampling — see below.) If you want to combine different wavelength ranges and different FoVs, well, you have a job on your hands! Note here that if you have to resample the spatial plane to change the spaxel size (or shape), not just their positioning, we recommended you do the wavelength resampling first — the spatial resampling is usually best done wavelength-slice-by-slice, and so the wavelength slices should be the same for all input cubes.

You have exposures of the same target with the same set-up, taken apart in time

In this case, the geocentric velocities of your target and the airmasses will be different. You will therefore need to correct the wavelengths to the heliocentric frame and DAR correct each cube before combining. If you have a spatial dither (as I did), then that adds further complication. My solution using IRAF was the following:

  1. convert your cube to row-stacked spectrum (RSS) format
  2. correct wavelengths to heliocentric reference frame using DOPCOR (be careful about the dispaxis setting), i.e. dopcor exp1.fits exp1_dopcor.fits 12.5 isvel+
  3. convert to cube format
  4. DAR correct using the diffatmref code, making use of the Xoffset and Yoffset options to account for any dither offsets
  5. convert back to RSS format
  6. co-add the exposures using SCOMB with the group="apertures" and first="yes" options. Masking out blank/bad fibres can also be done with a bit of thought at this stage.

Note: The longslit task LSCOMB could also be used.

Note: James Turner's pyfmosaic also handles non-integer spaxel shifts and wavelength offsets. Contact him for more information. Also see this IRAF.net forum post.

Resampling, drizzling, interpolating

If you wish to combine images where sub-spaxel shifts are present, or perhaps the spaxel sizes and shapes are different as the data come from different instruments, it will be necessary to resample the spatial plane — either before you combine or in the process of combining (although to be honest, these two methods are practically the same).

Spatial resampling is also done on single images. Reasons include: (i) making your spectral images look nicer, maybe for a paper or in order to better see subtle variations of spatial structure across your target, (ii) better pin-pointing shifts in pointing, for example when working on the DAR correction, (iii) your spaxels are not square (Integral and GMOS spring to mind) and you wish to make them square. Resampling always involves changing the data, so do it as little as possible. Also this is something generally done for each wavelength slice of your data, which necessarily have to be held in a cube format.

Possible scenarios for resampling and/or combining frames are:

  • Keep your new spatial sampling the same as the old (i.e the new spaxels have the same size as the old) but the grid larger enough to include the FoV of all images. Make a new grid that covers the total region your input data will cover. Add in each image to this new grid by simply working out the percentage of overlap of each old spaxel with each new spaxel.
  • Increase the spatial sampling (i.e. the new spaxels are smaller than the old). Again you need to work out the percentage of flux in each new spaxel from each old. However, here you could consider here not just transferring e.g. 1/4 of the flux of the old spaxels to your 1/4-sized new spaxels, but rather carrying out 2D (polynomial) fits to calculate the amount of flux that should be transferred to each new spaxel….and doing this for each wavelength slice. (see here for a slightly more lengthly explanation.)
  • You are changing spaxel shapes; the idea is the same (i.e. you are doing a spatial resampling) — chose your new spaxel size (e.g. make the new and old cover the same area) and shape and interpolate the flux for each wavelength slice between the two.

These points are much as was described in the section just above ("Combining with fractional spaxel offsets").

Some resampling software

  • Voronoi binning: An adaptive spatial binning method to maintain a constant signal-to-noise ratio per bin, using the method developed by Cappellari & Copin (2003, MNRAS, 342, 345). You can also check out http://www.astro.umontreal.ca/fantomm/reduction/. Voronoi binning is a way to improve the sampling in a way that makes scientific sense, rather than to be used as an aid to combing offset cubes.
  • Drizzling, e.g. drizzle and the STSDAS dither package in IRAF (see http://www.stsci.edu/~fruchter/dither/dither.html and links therein). Drizzle is a technique much used on HST images (to correct for the spatial distortions). You can also cut up your 3D cube into wavelength slices, treat each as an image and use drizzle on them. Drizzle will not change spaxels from non-square to square.
  • The task gemcube in the Gemini IRAF package (written by Frank Valdes from NOAO) implements a multi-dimensional version that can map data directly from a 2D image to a 3D cube. This is used by nifcube for NIFS reduction.
  • The Euro3D visualisation tool here. The resampling it does actually works very well, can change spaxel shapes from non-square to square, and is easy to use, but the tool itself is very difficult to install and very difficult to get working.

Whatever method you choose, you must be careful that the flux is conserved and the spectral features are not altered, so compare the before- and after-spectra. If you plan on writing your own code, then good luck and please consider leaving a link here :).

A few notes about 2D spatial fitting

A few pieces of advice we can offer.

Some why and how of resampling (offered without any guarantee)
Consider this. A telescope image (i.e. what the telescope actually sees) is continuous. An IFU cube, fibres, are discrete points convolved with the telescope PSF. If the telescope image consisted of a set of points in the shape of the IFU grid (rather than being continuous) and you then convolved that grid with the PSF, the value you would end up with at each point would be a weighted sum of the original points with coefficients determined by the PSF. Recontructing the image would be a matter of reverse engineering the IFU image using the PSF. But since the image that the telescope, that the IFU, is looking at is in reality continuous, this reverse engineering is not quite so simple. When working on resampling a data-cube, you are first trying to reconstruct the continuous telescope image that the IFU was originally looking at, so that you can then resample it as if the IFU were a perfect square grid in the first place. That can done by some approximation to a sinc function. You could use a Gaussian (e.g. a PSF-like shape) but it would filter the spatial frequencies ($\approx$the spatial grid) less optimally.

Sinc fitting
Fitting functions to your 2D spatial image at each wavelength, as a step towards improving/changing the spatial sampling, does have some drawbacks. A good function to use in your spatial fitting is a since function that is finite, such as a spline. However, if your data are undersampled (e.g. the PSF size equals 1 spaxel size) using a sinc function can result in a "ringing" effect around point sources. A sinc function really needs to be infinite to work properly.) Using a purely-positive interpolant, such as a Gaussian would this avoid ringing, but at the expense of greater smoothing. Actually, it is not possible to resample the image without losing some information, no matter what type of function you use in your fitting: you cannot create information that was not there in the first place!

Once you start looking at the ~1% level it may be that images with >2 pixels/FWHM are still technically undersampled for a seeing-limited PSF, producing some low-level artifacts when using splines for your fitting. However, what you need to consider are what you actually want and need to do with the data, and to what level of accuracy you have to go. If you need to resample, well, then you need to resample…just make sure you check your before and after spectra, to know what the effect has been on your data, and use methods that have been tested and reported on by others. In practice, it seems that most people we have talked to about this haven't really noticed problems using splines or piecewise polynomials when applied to fairly well-sampled data such as GMOS; for less well-sampled data you just need to experiment with other functions if you don't like the look of what you get with spline fits. It may be more of an issue for high contrast data such as QSOs, because 1% of a large number (a large flux) is significant.

Resampling the results of fits rather than the data-cube

One alternative to consider is to first make your science measurements (e.g. fit all emission lines) and only then combine (add or mosaic) the results from the individual frames. You will still have to do a spatial resampling — unavoidable if your FoV are not exactly the same or you wish to mosaic — but at least you are then resampling not on complex spectra but only on already-measured values. But, of course, it will not always be that your science programme suits this early analysis.

DAR correction

The refraction of light by the Earth's atmosphere is wavelength-dependent. This effect is termed differential atmospheric refraction (DAR), and results in the image of an object to appear at different positions in the telescope focal plane depending on the wavelength of observation. If you are imaging at a specific wavelength (even using broad-band filters) this is a non-existent or insignificant problem, but if your wavelength range is large or the telescope guiding not spot-on, it probably is a problem. A few telescopes now have an atmospheric dispersion corrector (ADC) installed to automatically correct for DAR — you are lucky if this is the case!

The amplitude of the shift due to DAR is a function of zenith distance ($\sec z$). The result is that the red and blue ends of an object spectrum will appear at different spatial positions in the reconstructed cube. You will have to identify the amount of this shift, at each wavelength point (or grouping of wavelength points) in your data. If you suspect the shifts are fractional spaxels, you may need to improve the spatial sampling of your data to first determine the shift due to DAR.

Furthermore, if the telescope was being guided during the observing run at a different wavelength to your observations, the observed position of your target in the field-of-view will change with elevation. This means that if you have a sequence of spectra taken over a wide range in elevation, then as the target moved across the sky, the spaxels on which it was centred will have changed with time.

Fortunately the effects of DAR are well documented:

Unfortunately, the theoretical description for the direction and amount of the refraction is only just that: theoretically derived for the case of a plane parallel atmosphere. The real atmosphere about your telescope probably behaves a bit differently, and may be time dependent. The ADC of your instrument will invariably only do a correction using the theoretical approach. With IFU observations it is possible to check the accuracy of this.

You have some choices to correct for the DAR in the data cubes:

1) Use the theoretical prediction of Filippenko (1982) to calculate the shift for each single monochromatic slice in your data cube. To calculate this you need to know the temperature and pressure of the atmosphere at the telescope at the time of the observation.

2) If you have an object with continuum emission in your data cube you can model the PSF as a function of wavelength, or for extended objects cross correlate each slice with respect to a reference slice in the data cube.

3) Once you know the shift of the position in X,Y as a function of the wavelength in the data cube, you can shift each slice back to its correct reference position.

Having done this, each spaxel will sample the spectrum of that point correctly.

A major problem with this approach is that the resampling of the data cube distributes the noise from bad pixels in the data cube over more than one pixel.

  • IRAF/FORTRAN DAR correction routine from Jeremy Walsh (currently only works for ESO FLAMES and VIMOS and Gemini GMOS data)
  • DAR correction using IRAF drizzle tasks from Lise Christensen: still to come
  • For GMOS data, the latest version of gfcube in the Gemini IRAF package (v1.10) includes an option to compensate for atmospheric dispersion using the refraction model from SLALIB.

Some thoughts on position- or wavelength-dependent point spread function

If the PSF varies across the FoV/CCD or with wavelength, then the profile of any emission or absorption lines in your spectra will also, the spatial and spectral PSFs being related. Variations to the PSF will occur (i) if the focus is not flat over the detector or (ii) if you have a wide wavelength range in your data.
The first point will usually see the line profiles being slightly skewed at the four edges of the detector. The second point is only important if your wavelength range is wide enough for the PSF to change significantly from one end to the other (and due, of course, to the $\lambda/d$ nature of telescope resolution and changes in the seeing with wavelength). In this case it is not an instrumental problem, rather a feature of the data.
Changes to the PSF with position on the detector is generally only going to be a problem for science programmes that need to measure emission or absorption lines.

The first point above is particularly an issue for Integral on the WHT, but all instruments will have some level of PSF variation across the detector; the question is, is this small enough to be unimportant? For most instruments and most programmes it probably is unimportant, however it is easy enough to check. If your arc frames have plentiful lines spread out over all parts of the detector, you can map the emission line profiles (for unblended arcs lines, naturally) with spatial position and so map the PSF "field". Another, although less fun way to measure the PSF field would be to measure the spatial profile of the apertures, on the raw detector images, as changes to these will be accompanied by changes in the profiles of your target's emission lines.
A clue to the presence of PSF variations can be if you find, after your sky spectrum subtraction, funny "P Cyg"-type residuals in the subtracted sky emission lines. If your instrument has PSF changes across the detector and your sky lines were all on one part of the detector, and thus one part of the PSF field, then these types of residuals will result naturally.

If you do find an unsympathetic level of skew and/or change in skew of your PSF you may want to try to compensate. This is something that we have found little about in the literature. Dealing with this when it comes to the sky subtraction is one issue. It may be best to measure the averaged sky spectrum and subtract the lines as values from your later spectral fits, rather than do an actual sky spectrum subtraction. Alternatively you could identify where the sky lines are and fit these as features on your astronomical spectra, so removing their effect from your spectral analysis. Another idea, particularly if you are interested in removing the sky continuum, would be to remove the sky lines from the sky spectrum and then subtract the sky continuum, dealing then separately with the sky emission lines.

In the IR there are absorption features from the atmosphere and this complicates matters. Particularly for the highest resolution spectroscopic observations at the diffraction limit, as expected on future large ground-based telescopes such as the E-ELT, performance of the instrument may well be limited by the ability to correct for the telluric absorption features (where a detailed knowledge of the instrumental line profile across the field can be key), rather than pure instrumental sensitivity.

How to deal with variations of the PSF field if you wish to measure very small variations of line profiles across your source is a separate issue. The best advice we can offer is to measure the PSF field from the arc spectra and use it as input either in a straightforward FWHM correction (i.e. if your target's spectral lines are resolved, you need to subtract in quadrature the instrument FWHM from your measured FHWM to recover the target FWHM), or to created spatially-varying models of the PSF and input these in the line profile models you use to fit the astronomical emission or absorption lines.

For AO-assisted instruments, the instrumental line profile can also vary with the quality of the AO correction, as well as field position and wavelength.

On the second point raised at the top: your PSF may changes a lot if your wavelength range is wide, for example you are combining optical with IR observations of the same field. Here you may wish to keep the intrinsic PSF for each data-set, rather than convolving with a PSF model to widen the better PSF to match the worse. In this case, if you are doing any mosaicking of these data, you will have to consider what your new spatial sampling should be: wavelength-dependent (e.g. designed to sample the PSF well) or the same for the whole range? A PSF varying across your data-cube is something that PACS data from the Herschel observatory will have to consider, as the wavelength range of these sub-mm data is sufficient to have considerable PSF differences from the blue to the red.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License