
Scientific and Technical Journal of Information Technologies, Mechanics and Optics


Probabilistic matrix clustering with feature priors for unbiased control selection

https://doi.org/10.17586/2226-1494-2025-25-5-999-1001

Abstract

We propose a probabilistic matrix-clustering method that combines a prior distribution of features with dimensionality reduction by singular value decomposition (SVD). Within a large control pool, the approach identifies a cluster that is statistically comparable to the test cohort, which reduces systematic bias in downstream comparative analyses. We show that the method correctly selects control groups in scenarios where standard nearest-neighbor matching produces false positives. The method has been used to construct control groups in studies based on the Russian Biobank at the Almazov National Medical Research Centre (Ministry of Health of the Russian Federation).
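The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea (reduce dimensionality with SVD, cluster the control pool, keep the cluster that best matches the test cohort), not the published method. Everything in it is an assumption made for illustration: the function names `svd_project`, `kmeans` and `select_control_cluster`, the use of plain k-means in place of the probabilistic clustering, a Gaussian fitted to the cohort standing in for the feature prior, and the synthetic data.

```python
import numpy as np


def svd_project(pool, cohort, n_components=2):
    """Standardize the stacked data and project it onto the top singular vectors."""
    stacked = np.vstack([pool, cohort]).astype(float)
    std = stacked.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    z = (stacked - stacked.mean(axis=0)) / std
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:n_components].T
    return scores[: len(pool)], scores[len(pool):]


def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-first initialization."""
    first = np.linalg.norm(points - points.mean(axis=0), axis=1).argmax()
    centers = [points[first]]
    while len(centers) < k:
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def select_control_cluster(pool, cohort, k=3, n_components=2):
    """Return indices of the pool cluster closest to the cohort in SVD space."""
    pool_s, cohort_s = svd_project(pool, cohort, n_components)
    labels = kmeans(pool_s, k)
    # Assumed stand-in for the feature prior: a Gaussian fitted to the cohort.
    mu = cohort_s.mean(axis=0)
    cov = np.atleast_2d(np.cov(cohort_s, rowvar=False)) + 1e-6 * np.eye(n_components)
    diff = pool_s - mu
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    # Score each cluster by its mean squared Mahalanobis distance to the cohort.
    cluster_score = [
        maha[labels == j].mean() if np.any(labels == j) else np.inf
        for j in range(k)
    ]
    best = int(np.argmin(cluster_score))
    return np.flatnonzero(labels == best)


# Synthetic example: three sub-populations in the pool, cohort drawn from the first.
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0], [10, 10, 0], [0, 10, 10]], dtype=float)
pool = np.vstack([m + rng.normal(size=(200, 3)) for m in means])
cohort = means[0] + rng.normal(size=(50, 3))
selected = select_control_cluster(pool, cohort)
print(len(selected), "controls selected")
```

Selecting a whole cluster rather than each case's nearest neighbors is the design point being illustrated: a nearest-neighbor matcher always returns some control for every case, even when no comparable sub-population exists in the pool, whereas a cluster-level score makes a poor overall match visible.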

About the Author

D. A. Usoltsev
Institute for Genomic Medicine, Nationwide Children’s Hospital; ITMO University
United States

Dmitrii A. Usoltsev — Senior Researcher; PhD Student

Scopus Author ID: 57279360300

Columbus, 43205

Saint Petersburg, 197101






For citations:


Usoltsev D.A. Probabilistic matrix clustering with feature priors for unbiased control selection. Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2025;25(5):999-1001. (In Russ.) https://doi.org/10.17586/2226-1494-2025-25-5-999-1001



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2226-1494 (Print)
ISSN 2500-0373 (Online)