Experiments attempting to detect dark matter directly have reached a level of sensitivity at which finding a single dark matter particle interacting within a detector demands filtering out millions, or even billions, of events originating from background sources. DarkSide-50 is an international experiment conducted at the Laboratori Nazionali del Gran Sasso in Italy, where low-radioactivity liquid argon is used within a dual-phase time projection chamber to detect weakly interacting massive particles (WIMPs), one of the leading candidates for dark matter. The DarkSide-50 experiment faces two main data-analysis challenges: extreme class imbalance and large datasets. In this paper we show how machine learning techniques can be employed even in the presence of samples exhibiting extreme class imbalance (i.e., an extremely low signal-to-noise ratio). In our study, negative (background) events outnumber positive (signal) events by a factor of 10^7. This poses a serious challenge when the objective is to identify a signal that can easily be misclassified as background. We develop and employ a modification of the Synthetic Minority Over-sampling Technique (SMOTE) to alleviate the class-imbalance problem by artificially generating new signal samples using a k-nearest-neighbor approach; we modify the mechanism behind SMOTE to restrict the pool of k nearest neighbors so as to minimize the overlap between signal and background events. To expedite the analysis of large quantities of data, we undersample background events. The result is a new sample distribution that facilitates building accurate predictive models. Experimental results on real data obtained from the DarkSide-50 experiment show the benefits of our proposed approach.
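To make the resampling idea concrete, the following is a minimal NumPy sketch of SMOTE-style oversampling with a restricted neighbor pool, followed by random undersampling of the background. It is an illustration under stated assumptions, not the implementation used in the paper: the abstract does not specify the restriction rule, so the rule here (keep a signal neighbor only if it is closer than the nearest background event) is an assumption, and the function names `restricted_smote` and `undersample`, the parameter values, and the toy Gaussian data are all hypothetical.

```python
import numpy as np

def restricted_smote(signal, background, n_new, k=5, seed=0):
    """Generate n_new synthetic signal points by SMOTE-style interpolation.

    Assumed restriction (not taken from the paper): a signal neighbor stays
    in the pool only if it is closer than the nearest background event.
    """
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances: signal-signal and signal-background.
    d_ss = np.linalg.norm(signal[:, None, :] - signal[None, :, :], axis=2)
    d_sb = np.linalg.norm(signal[:, None, :] - background[None, :, :], axis=2)
    np.fill_diagonal(d_ss, np.inf)      # a point is not its own neighbor
    nearest_bkg = d_sb.min(axis=1)      # distance to the closest background event

    pools = []
    for i in range(len(signal)):
        knn = np.argsort(d_ss[i])[:k]                      # k nearest signal neighbors
        pools.append(knn[d_ss[i, knn] < nearest_bkg[i]])   # drop neighbors beyond background
    usable = [i for i, pool in enumerate(pools) if len(pool) > 0]
    if not usable:
        raise ValueError("no signal point has a neighbor closer than background")

    synthetic = np.empty((n_new, signal.shape[1]))
    for j in range(n_new):
        i = rng.choice(usable)          # seed point
        nb = rng.choice(pools[i])       # one of its allowed neighbors
        gap = rng.random()              # position along the connecting segment
        synthetic[j] = signal[i] + gap * (signal[nb] - signal[i])
    return synthetic

def undersample(background, n_keep, seed=0):
    """Keep a uniform random subset of the background, without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(background), size=n_keep, replace=False)
    return background[idx]

# Toy data (hypothetical): a small signal cluster and a large background cloud.
rng = np.random.default_rng(1)
signal = rng.normal(0.0, 0.3, size=(20, 2))
background = rng.normal(3.0, 1.0, size=(2000, 2))

new_signal = restricted_smote(signal, background, n_new=180)
bkg_small = undersample(background, n_keep=200)

# Rebalanced training set: 200 signal (real + synthetic) and 200 background.
X = np.vstack([signal, new_signal, bkg_small])
y = np.concatenate([np.ones(200), np.zeros(200)])
```

Because each synthetic point lies on a segment between two real signal events, the restriction only decides which segments are allowed; a stricter or looser rule could be substituted in the pool-filtering line without changing the rest of the sketch.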
Link to PDF (may not be available yet): P1-15.pdf