User Tools

Site Tools


huibintema:spam:pipeline

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

huibintema:spam:pipeline [2020/11/19 15:29] (current)
huibintema created
Line 1: Line 1:
 +===== Running the SPAM pipeline =====
 +
 +This section (created with significant help from Pratik Dabhade) describes how to run the pipeline on continuum observations at 150, 235, 325 or 610 MHz, with 16 or 32 MHz of bandwidth, obtained with the GMRT software correlator backend (GSB). ​
 +
 +-----
 +
 +==== Basic pipeline run ====
 +
 +Let's assume you have requested and downloaded a single night observation from the GMRT archive in LTA format. You will end up with 3 files:
 +   * <project name>​_<​observe date>​.lta - The visibility data in LTA format (e.g., 29_043_24aug2015.lta)
 +   * <project name>​_<​observe date>​.lta##<​observation number>​.obslog - The operator log file in ascii format; contains useful information about how the observations went.
 +   * <project name>​_<​observe date>​.lta##<​project_name>​.FLAGS[.<​index>​] - The online flag information in ascii format; contains status information of individual antennas, used for flagging
 +
 +We will assume that the observation contains at least one of the primary calibrators 3C48, 3C147 or 3C286.
 +
 +First we convert LTA to UVFITS format:
 +<code python>
 +lta_file_name = "​./<​project name>​_<​observe date>​.lta"​
 +convert_lta_to_uvfits( lta_file_name )
 +</​code>​
 +This creates a UVFITS file in the fits subdirectory,​ having the same name as the LTA file name but with the extension "​.UVFITS"​ added to it.
 +
 +Then we derive calibration and flagging information from the primary calibrator(s),​ pick the best one, transfer those informations to all other source visibilities present in the observation,​ and export each of these pre-calibrated visibility data sets to UVFITS:
 +<code python>
 +uvfits_file_name = "​./​fits/<​project name>​_<​observe date>​.lta.UVFITS"​
 +pre_calibrate_targets( uvfits_file_name,​ flags_file_name = lta_file_name + "##​*.FLAGS*"​ )
 +</​code>​
 +
 +The fits subdirectory in our project directory now contains pre-calibrated visibility data sets per source. The file name convention is as follows:
 +<source name>​_<​observation reference date>​_<​polarization(s)>​_<​sideband>​.UVFITS
 +
 +The main pipeline will take the pre-calibrated visibilities of any source you specify, and produce a series of images with increasingly more flagging and calibrations.
 +<code python>
 +target_uvfits_file_name = "​./​fits/<​source name>​_<​observation reference date>​_<​polarization(s)>​_<​sideband>​.UVFITS"​
 +process_target( target_uvfits_file_name )
 +</​code>​
 +
 +This may typically take 6-8 times the duration of the actual observation to complete. The good news is that you can walk away and check the results later. All processing output is captured in a log file:
 +./​datfil/​spam_<​source name>​_<​start date>​_start_time>​.log
 +The final image is:
 +./​fits/<​source name>​.SP2B.PBCOR.FITS
 +
 +The pipeline run can be summarized by typing:
 +<code python>
 +summarize_spam_log( "​./​datfil/​spam_<​source name>​_<​observation reference date>​_<​polarization(s)>​_<​sideband>​*.log"​ )
 +</​code>​
 +
 +
 +There'​s many options available to make this process work for different / more complicated data sets. Some of the most common situations are discussed below. Note that more than one of these situations may reflect your data, and therefore multiple of the options discussed below may need to be combined.
 +
 +-----
 +
 +==== Combining multiple LTA files of one observation ====
 +
 +Sometimes a single observation has been broken up into multiple LTA files because the data recording was interrupted (e.g., because of a power failure or system reset). Note that this is different from having a single project spread out over multiple observations. The multiple LTA files can be individually converted to UVFITS before combining:
 +<code python>
 +lta_file_name_1 = "​./<​project name>​_<​observe date>​.lta"​
 +convert_lta_to_uvfits( lta_file_name_1 )
 +lta_file_name_2 = "​./<​project name>​_<​observe date>​.lta.1"​
 +convert_lta_to_uvfits( lta_file_name_2 )
 +</​code>​
 +Then the resulting UVFITS files can be combined as follows:
 +<code python>
 +uvfits_file_name_1 = "​./​fits/<​project name>​_<​observe date>​.lta.UVFITS"​
 +uvfits_file_name_2 = "​./​fits/<​project name>​_<​observe date>​.lta.1.UVFITS"​
 +uvfits_file_name = "​./​fits/<​project name>​_<​observe date>​.lta.combined.UVFITS"​
 +combine_uv( uvfits_file_name_1,​ uvfits_file_name_2,​ uvfits_file_name )
 +</​code>​
 +The output UVFITS can then be passed to //​pre_calibrate_targets()//​ etc.
 +
 +-----
 +
 +==== Combining observations of the same target ====
 +
 +Combining observations on the same target from multiple observations is only possible when the frequency setup was exactly the same during the multiple observations. This is (almost?) always the case when a target was observed during a single project spread over multiple nights.
 +
 +First, convert and pre-calibrate the observations per night. Then make sure that the multiple pre-calibrated UVFITS files of the target are all located in the same (fits) directory, and have a common naming scheme. E.g., <​target_name>​_GMRT<​frequency>​_<​unique_observe_date>​_<​polarizations>​_<​sideband>​.FITS. Then you can run the main pipeline using a regular Linux wildcard (e.g., '​*'​ or '?'​) in the uvfits_file_name:​
 +<code python>
 +uvfits_file_name = "​./​fits/<​target_name>​_GMRT<​frequency>​_*_<​polarizations>​_<​sideband>​.FITS"​
 +process_target( '​path/​Jabc_GMRT325_*_RRLL_USB.UVFITS'​ )
 +</​code>​
 +This will read in all UVFITS files that match the wildcard query, concatenate them, and process them as one.
 +
 +-----
 +
 +==== Changing image weights ====
 +
 +The user can change some of the imaging parameters (AIPS IMAGR style), including the image weights. By default, the main pipeline uses no UV range cuts, and sets the Briggs robust parameter to -1 to compensate for the broad PSF wings due to the centrally condensed UV coverage. This can be changed as follows:
 +<code python>
 +imagr_params = { '​robust'​ : 0., '​uvrang'​ : [ 0.5, 10. ] } # AIPS IMAGR parameters are passed as a Python dictionary
 +process_target( target_uvfits_file_name,​ imagr_params = imagr_params )
 +</​code>​
 +
 +-----
 +
 +==== Keeping main pipeline intermediate files ====
 +
 +The main pipeline takes the data through various repetitions of calibration,​ flagging, and imaging. If so desired, one can keep all the processed fits and uvfits of the target you are processing. This way, for instance, you can monitor the progress in image quality in steps. This is activated by disabling the minimize_storage option:
 +<code python> ​
 +process_target( target_uvfits_file_name,​ minimize_storage = False )
 +</​code>​
 +
 +-----
 +==== Dual-frequency observations ====
 +
 +In case of dual-frequency observations,​ the 235 MHz visibilities are located in the LL polarization and 610 MHz visibilities in the RR polarization. To process one or both, they can be split in the following way:
 +<code python>
 +convert_lta_to_uvfits( lta_file_name,​ uvfits_file_name_235,​ stokes_list = [ "​LL"​ ]  )
 +convert_lta_to_uvfits( lta_file_name,​ uvfits_file_name_610,​ stokes_list = [ "​RR"​ ]  )
 +</​code>​
 +The two output UVFITS files can then be processed further as usual. Please note that you may have to manually limit the frequency channel range for the 235 MHz observations,​ as in most cases the correlated bandwidth is 32 MHz while the 235 MHz receiver bandwidth is limited to 16 MHz. Please see below.
 +
 +-----
 +
 +==== Limited bandwidth observations ====
 +
 +In some cases, especially for 150 and 235 MHz observations,​ a bandpass filter is activated during observations to suppress RFI near the observing band. This means that parts of the correlated bandwidth (often 16.7 or 33.3 MHz) are not usable. A preferred frequency range can be manually selected by running
 +<code python>
 +pre_calibrate_targets( uvfits_file_name )
 +</​code>​
 +and inspecting the resulting bandpass plots (postscript format) in the ./prtfil subdirectory via the shell command
 +<code bash>
 +gv ./​prtfil/​*_BANDPASS.PS
 +</​code>​
 +Select a channel range over which the bandpass phases are well-behaved (approximately linear) for most antennas, like in the following example plot.
 +{{ :​bandpass_example_1page.png?​500 |}}
 +
 +Note down the lower- and upper-limit of the channel range. Then re-run //​pre_calibrate_targets()//​ as follows:
 +<code python>
 +channel_range = [ 150, 425 ] # example channel range to keep
 +pre_calibrate_targets( uvfits_file_name,​ channel_range = channel_range )
 +</​code>​
 +
 +-----
 +
 +==== Old hardware-correlator observations ====
 +
 +If you downloaded data from cycle 17 or earlier, it is likely correlated using the GMRT Hardware Backend (GHB; a.k.a. the hardware correlator). For frequencies of 325 MHz and higher, the 32 MHz bandwidth is typically split over an upper-side band (USB) and lower-side band (LSB), both captured in separate LTA files (typical extensions are .lta and .ltb). In SPAM, both sideband LTA files need to be pre-processed separately:
 +<code python>
 +convert_lta_to_uvfits( lta_file_name )
 +convert_lta_to_uvfits( ltb_file_name )
 +</​code>​
 +The next step is run using the //​keep_channel_one//​ option to enforce frequency continuity between the sidebands:
 +<code python>
 +pre_calibrate_targets( uvfits_file_name_lta,​ flags_file_name = lta_file_name + "##​*.FLAGS*",​ keep_channel_one = True )
 +pre_calibrate_targets( uvfits_file_name_ltb,​ flags_file_name = ltb_file_name + "##​*.FLAGS*",​ keep_channel_one = True )
 +</​code>​
 +The resulting UVFITS files for USB and LSB per target can be combined:
 +<code python>
 +uvfits_file_name_usb = "​./​fits/<​target>​_GMRT<​frequency>​_<​date>​_<​stokes>​_USB.UVFITS"​
 +uvfits_file_name_lsb = "​./​fits/<​target>​_GMRT<​frequency>​_<​date>​_<​stokes>​_LSB.UVFITS"​
 +uvfits_file_name = "​./​fits/<​target>​_GMRT<​frequency>​_<​date>​_<​stokes>​.UVFITS"​
 +combine_usb_lsb( uvfits_file_name_usb,​ uvfits_file_name_lsb,​ uvfits_file_name )
 +</​code>​
 +The output UVFITS file can be processed further in the main pipeline.
 +
 +Regarding the main pipeline, there are two options that may be relevant to get to better results. The first option related to the situation explained above, where two sidebands (USB and LSB) are joined together to cover 32 MHz of bandwidth. In that case, it may help to turn on an image-based flagging option that treats the joined USB and LSB separately. Reason for this is that the USB and LSB have separate signal chains, and thus there can be system problems that relate only to one of the two sidebands.
 +<code python>
 +process_target( target_uvfits_file_name,​ flag_image_usb_lsb = True )
 +</​code>​
 +
 +The second option is to turn on baseline-based calibration,​ BUT ONLY AFTER VERY CAREFUL CONSIDERATION. In an ideal world, baseline-based calibration would never be necessary. However, I have witnessed situations using hardware-correlator data in which residual gain errors on baselines did not seem to be solely antenna-based. Baseline-based calibration is something to try as a last resort, and can be switched on in the pipeline as shown below. Please note that this is implemented under strict limitations:​ baseline-calibration is applied only in the final stages of the processing, and is determined on a per-observation base (meaning only one correction per baseline per observe session).
 +<code python>
 +process_target( target_uvfits_file_name,​ do_blcal = True )
 +</​code>​
 +
 +-----
 +
 +==== Using different calibration models ====
 +
 +By default, a point source model based on the NVSS, WENSS, VLSSr, SUMSS, and MGPS-2 catalogs is used to bootstrap the phase calibration at the start of the main pipeline (//​process_target()//​),​ and is also used to correct the astrometry during later stages of the pipeline run after self-calibration and during peeling. This may not always give the desired result. It is possible to use other reference models. The simplest is to switch to a point source model based on TGSS. 
 +<code python>
 +process_target( target_uvfits_file_name,​ use_tgss = True )
 +</​code>​
 +
 +In case of observations of the same field in multiple GMRT bands, it is often useful to start with processing the lowest frequency data (which means the widest field-of-view,​ but also lowest resolution),​ and use the resulting (primary beam corrected) image as a calibration model for the next lowest, etc. Easiest is to run PyBDSM/F on the *.SP2B.PBCOR.FITS image and save the extracted gaussian list in ASCII format. ​
 +
 +<code python>
 +catalog_name = "<​project_dir>/​fits/<​field_name>​.SP2B.PBCOR.pybdsm.gaul"​
 +catalog = read_pybdsm_ascii_catalog( catalog_name )
 +source_list = create_source_list_from_catalog( catalog )               
 +resolution = 15. # representative resolution of model image in arcsec
 +process_target( target_uvfits_file_name,​ model_source_list = source_list,​ model_resolution = resolution )
 +</​code>​
 +
 +-----
 +
 +==== EXPERIMENTAL:​ Processing uGMRT wideband data ====
 +
 +SPAM has some options to process uGMRT wideband data. SPAM does not support the processing of large fractional bandwidths (df/​f>​~0.2) in one run, but instead the bandwidth can be split up into smaller chunks (subbands) that can be processed independently. If done carefully, the calibrated output visibilities of SPAM pipeline runs on multiple subbands can be jointly imaged with a wideband imager (WSClean) as a final step. This approach has produced good results when applied on bands 3 (250-500 MHz) and band 4 (550-850 MHz) data. Processing band 2 data (120-250 MHz) has given mixed results. Since there is not yet a good way to apply the wideband primary beam corrections,​ this approach only works for observations where the target'​s angular size is small with respect to the primary beam size (size <~ 0.1 * FWHM).
 +
 +The first step is to convert LTA to UVFITS format:
 +<code python>
 +lta_file_name = "​./<​project name>​_<​observe date>​.lta"​
 +convert_lta_to_uvfits( lta_file_name )
 +</​code>​
 +Next, we split the UVFITS file into smaller frequency chunks (subbands):
 +<code python>
 +uvfits_file_name = "​./​fits/<​project name>​_<​observe date>​.UVFITS"​
 +split_wideband_uvdata( uvfits_file_name )
 +</​code>​
 +The width of the frequency chunks is automatically set to a sensible value. The resulting 4 or 6 UVFITS files are also located in the fits subdirectory and named "​./​fits/<​project name>​_<​observe date>​.BAND<​xx>​.UVFITS",​ where <xx> is a 2-digit simple counter starting at 1 for the lowest frequency chunk. ​
 +
 +From here, each frequency chunk is processed independently in a similar fashion as a narrow-band GMRT observations,​ but using slightly different function calls. The first step is the pre-calibration:​
 +<​code>​
 +uvfits_file_name = "​./​fits/<​project name>​_<​observe date>​.lta.BAND<​xx>​.UVFITS"​
 +reference_frequency = 450.e6
 +pre_calibrate_wideband_targets( uvfits_file_name,​ flags_file_name = lta_file_name + "##​*.FLAGS*",​ reference_frequency = reference_frequency )
 +</​code>​
 +Setting a fixed reference frequency ensures that the frequency averaging of all frequency chunks is the same, which is important when jointly imaging the SPAM output visibilities later. Sensible values seem to be:
 +<​code>​
 +reference_frequency = 450.e6 ​ # for uGMRT band 3 observations
 +reference_frequency = 650.e6 ​ # for uGMRT band 4 observations
 +</​code>​
 +The pre-calibrated visibilities per target are located in the fits subdirectory and are named per subband.
 +
 +Next comes the SPAM main pipeline run. This is best done in separate project directories per subband. If possible, use a good, single reference sky model for all runs. For example, this reference model can be obtained from first running SPAM on the narrow-band GMRT (GSB) data that was recorded alongside the uGMRT wideband data, and extracting a sky model from the final SP2B image using PyBDSF.
 +<​code>​
 +target_uvfits_file_name = "​./​fits/<​source_name>​_UGMRT<​band>​-<​subband>​_<​observation_reference_date>​_<​polarization(s)>​_<​sideband>​.
 +UVFITS"​
 +catalog_name = "<​narrowband_project_dir>/​fits/<​source_name>​.SP2B.PBCOR.pybdsm.gaul"​
 +catalog = read_pybdsm_ascii_catalog( catalog_name )
 +source_list = create_source_list_from_catalog( catalog )               
 +resolution = 10. # representative resolution of model image in arcsec
 +process_wideband_target( target_uvfits_file_name,​ model_source_list = source_list,​ model_resolution = resolution )
 +</​code>​
 +
 +If all went well, each SPAM pipeline run on a subband yielded a final image and a calibrated visibility data set (.SP2B.CAL.RR.UVFITS). For use in WSClean, the calibrated visibilities all need to be collected in one directory and converted into measurement sets using CASA. Then WSClean can be used to do a final wideband imaging run. Here is an example:
 +<​code>​
 +wsclean -weight briggs 0 -pol RR -size 5000 5000 -scale 1.5asec -niter 15000 -auto-threshold 0.5 -auto-mask 3 -gain 0.25 -mgain 0.8 -weighting-rank-filter 3 -join-channels -channels-out 6 -j 8 -mem 80 -name SOURCE_UGMRT3 SOURCE_UGMRT3-01.MS SOURCE_UGMRT3-02.MS SOURCE_UGMRT3-03.MS SOURCE_UGMRT3-04.MS SOURCE_UGMRT3-05.MS SOURCE_UGMRT3-06.MS
 +</​code>​
 +Here, SOURCE_UGMRT3-0x.MS are the input measurement sets as produced by CASA. Make sure that "​-scale"​ is set to the proper pixel scale, and that "​-channels-out"​ matches the number of input measurement sets. WSClean produces a lot of output files, but probably you are interested in the final wideband image "​*-MFS-image.fits"​.
 +
 +
 +
 +-----
 +
 +Feedback: [[intema@strw.leidenuniv.nl|Click here]]
  
huibintema/spam/pipeline.txt ยท Last modified: 2020/11/19 15:29 by huibintema