P10.2: Atemkeng, Marcellin
Marcellin Atemkeng (Department of Physics and Electronics, Rhodes University)
Elhadji Moustapha SECK (African Institute for Mathematical Sciences, Cameroon)
Oleg Smirnov (SKA/Department of Physics and Electronics, Rhodes University)
Sphesihle Makhathini (SKA/Department of Physics and Electronics, Rhodes University)

Theme: Databases and Archives: Challenges and Solutions in the Big Data Era
Title: Baseline-dependent dimensional reduction techniques for radio interferometric big data compression

Modern radio interferometers such as MeerKAT, ASKAP, and LOFAR produce vast amounts of data. The high data rates are driven by the fine time and frequency sampling of these instruments, as well as by their high angular resolution. For MeerKAT, this amounts to anywhere from 64 GB to 0.5 TB of raw visibilities per second, and the SKA will generate orders of magnitude more. Data compression is therefore required to save storage and improve processing time. Offringa (2016) showed that lossy compression can compress LOFAR data by a factor greater than five. A natural way to compress the data is through averaging, either baseline-dependent averaging or windowing, as described in Atemkeng et al. (2018). Kartik et al. (2017) showed that the data can be compressed using a Fourier dimensionality reduction model (FDRM), which operates on the gridded visibilities rather than the continuous visibilities. The gridded visibilities lie on a regular grid onto which the data from all baselines have been interpolated together, making it difficult to gauge an acceptable variance threshold for the non-zero singular values (see Kartik et al. (2017)). Since each baseline sees the sky differently, decorrelation is baseline-dependent and the noise variance differs per visibility; these effects cannot be taken into account in the FDRM. Some applications (e.g. transient searches, or building up a wide-FoV interferometric archive) require storing the raw data from all baselines rather than the gridded data. This work studies the various dimensionality reduction algorithms in the literature for big data and applies them to visibility data, with the reduction made baseline-dependent.
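To illustrate the idea behind baseline-dependent averaging, the Python sketch below averages each baseline's visibilities over a time window that shrinks with baseline length, so that long baselines, which decorrelate fastest, are averaged the least. This is a minimal illustration under assumed parameter choices (window sizes, baseline lengths), not the algorithm of Atemkeng et al. (2018):

    import numpy as np

    def bda_window(baseline_length, longest_baseline, max_window=16):
        # Longer baselines decorrelate faster, so they get shorter
        # averaging windows; the longest baseline is not averaged at all.
        return max(1, min(max_window,
                          int(round(longest_baseline / baseline_length))))

    def average_time(vis, window):
        # Average a (n_time, n_chan) complex visibility block over
        # non-overlapping windows of `window` time samples.
        n = (vis.shape[0] // window) * window
        return vis[:n].reshape(-1, window, vis.shape[1]).mean(axis=1)

    # Example: a 600 m baseline on an array whose longest baseline is 8 km
    vis = np.random.randn(1024, 64) + 1j * np.random.randn(1024, 64)
    w = bda_window(600.0, 8000.0)       # -> window of 13 samples
    compressed = average_time(vis, w)   # shape (78, 64), ~13x fewer rows

The compression factor per baseline is simply the window length, so short baselines (which dominate a typical array) yield most of the overall data reduction.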
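The variance thresholding of singular values mentioned above can be sketched generically with a truncated SVD. This is a minimal illustration of the thresholding idea only, not the FDRM of Kartik et al. (2017) (which works on gridded visibilities through a Fourier embedding); the function and parameter names are assumptions:

    import numpy as np

    def svd_reduce(X, var_fraction=0.99):
        # Keep the smallest rank k whose singular values capture
        # `var_fraction` of the total variance (energy) of X.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        energy = np.cumsum(s**2) / np.sum(s**2)
        k = int(np.searchsorted(energy, var_fraction)) + 1
        return U[:, :k], s[:k], Vt[:k]

    # Example: reduce a low-rank visibility-like matrix
    X = np.random.randn(512, 128) @ np.random.randn(128, 64)
    U, s, Vt = svd_reduce(X, var_fraction=0.99)
    X_hat = (U * s) @ Vt   # low-rank reconstruction from compressed factors
    ratio = X.size / (U.size + s.size + Vt.size)

In a baseline-dependent scheme, a single global threshold like this would not suffice: since the noise variance differs per visibility and decorrelation differs per baseline, the retained rank (or the threshold itself) would need to vary from baseline to baseline.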

Link to PDF (may not be available yet): P10-2.pdf