Probabilistic matrix clustering with feature priors for unbiased control selection
https://doi.org/10.17586/2226-1494-2025-25-5-999-1001
Abstract
We propose a probabilistic matrix-clustering method that combines a prior distribution over features with dimensionality reduction by singular value decomposition (SVD). Within a large control pool, the approach identifies a cluster that is statistically comparable to the test cohort, thereby reducing systematic bias in downstream comparative analyses. We show that the method correctly selects control groups in scenarios where standard nearest-neighbor matching produces false positives. The method has been used to construct control groups in studies based on the Russian Biobank at the Almazov National Medical Research Centre (Ministry of Health of the Russian Federation).
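The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea, not the published method. It assumes the feature prior can be treated as non-negative importance weights, projects the pool and the test cohort onto the leading SVD components, and replaces the probabilistic clustering step with a much simpler stand-in: a single Gaussian fitted to the test cohort, with pool samples kept when their squared Mahalanobis distance is within an empirical quantile of the cohort's own distances. The function name `select_controls` and the parameters `feature_prior`, `n_components`, and `quantile` are hypothetical.

```python
import numpy as np


def select_controls(test, pool, feature_prior, n_components=2, quantile=0.95):
    """Return indices of pool rows that resemble the test cohort.

    Illustrative stand-in, not the published method: all names and
    parameters here are hypothetical.
    """
    test = np.asarray(test, dtype=float)
    pool = np.asarray(pool, dtype=float)
    prior = np.asarray(feature_prior, dtype=float)

    # Assumption: the feature prior acts as non-negative importance weights.
    weights = np.sqrt(prior / prior.sum())

    # Standardize both groups with pool statistics, then apply the weights.
    mean = pool.mean(axis=0)
    scale = pool.std(axis=0) + 1e-12
    pool_z = (pool - mean) / scale * weights
    test_z = (test - mean) / scale * weights

    # SVD of the weighted pool matrix; keep the leading right singular vectors.
    _, _, vt = np.linalg.svd(pool_z, full_matrices=False)
    basis = vt[:n_components].T
    pool_r = pool_z @ basis
    test_r = test_z @ basis

    # Model the test cohort as one Gaussian in the reduced space.
    center = test_r.mean(axis=0)
    cov = np.atleast_2d(np.cov(test_r, rowvar=False))
    cov = cov + 1e-9 * np.eye(cov.shape[0])  # small ridge for invertibility
    precision = np.linalg.inv(cov)

    def sq_mahalanobis(x):
        diff = x - center
        return np.einsum("ij,jk,ik->i", diff, precision, diff)

    # Keep pool samples no farther from the cohort than most of its own members.
    threshold = np.quantile(sq_mahalanobis(test_r), quantile)
    return np.flatnonzero(sq_mahalanobis(pool_r) <= threshold)
```

Unlike nearest-neighbor matching, which always returns some neighbor for every test sample even when the pool contains no comparable subgroup, a distance threshold of this kind can return few or no controls. This is one plausible way to avoid the false positives mentioned above, though the paper's own selection criterion may differ.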
About the Author
D. A. Usoltsev
United States
Dmitrii A. Usoltsev — Senior Researcher; PhD Student
Scopus Author ID: 57279360300
Columbus, 43205
Saint Petersburg, 197101
For citations:
Usoltsev D.A. Probabilistic matrix clustering with feature priors for unbiased control selection. Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2025;25(5):999-1001. (In Russ.) https://doi.org/10.17586/2226-1494-2025-25-5-999-1001































