Reference-based diffusion model for super-resolution
https://doi.org/10.17586/2226-1494-2025-25-2-321-327
Abstract
This article is devoted to digital image processing algorithms, specifically the super-resolution task. Various image restoration methods based on deep learning are actively being developed and applied to problems such as inpainting, denoising, and super-resolution. One important class of super-resolution methods is reference-based super-resolution, which restores missing information in the main image using reference images. Methods of this class are mainly represented by convolutional neural networks, which are widely used in computer vision. Despite the significant achievements of existing methods, they share a notable drawback: image areas not represented in the reference image often have worse quality than the rest of the image, which is clearly visible to the observer. In addition to convolutional neural networks, diffusion models are actively used in image restoration problems. They are capable of generating images with high quality and diverse fine details but suffer from a lack of fidelity between the generated details and the real ones. The aim of this work is to improve the quality of the reference-based image restoration method using a diffusion model. A hybrid architecture for the diffusion model's denoising neural network is proposed, consisting of three main blocks: a base denoising module, a reference-based module, and a fusion module that generates the final result. Three models were trained and evaluated on the Large-Scale Multi-Reference (LMR) dataset: a diffusion model, a reference-based convolutional neural network, and the proposed hybrid model. Based on the test results, a qualitative (visual) and quantitative comparison of the three models was performed.
The hybrid model demonstrated higher image quality, clarity, and consistency than the convolutional neural network using references, and better restoration of real details than the diffusion model. In the quantitative evaluation, the hybrid model also outperformed both pure models. The results of this work can be used to increase the resolution of arbitrary images using reference information.
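The abstract describes a three-block hybrid denoiser: a base denoising module, a reference-based module, and a fusion module producing the final result. The paper's internals are not reproduced here, so the sketch below is purely illustrative: the toy `base_denoise`, `reference_transfer`, and `fuse` functions and the per-pixel `confidence` map are assumptions, not the authors' implementation. It only shows the general fusion idea of weighting the reference-based branch where the reference matches and falling back to the base denoiser elsewhere.

```python
import numpy as np

def base_denoise(noisy: np.ndarray) -> np.ndarray:
    """Toy stand-in for the base denoising module of the diffusion model."""
    return noisy * 0.9  # illustrative only: damp the noisy signal

def reference_transfer(noisy: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Toy stand-in for the reference-based module transferring ref detail."""
    return 0.5 * noisy + 0.5 * ref  # illustrative only: blend with reference

def fuse(base_out: np.ndarray, ref_out: np.ndarray,
         confidence: np.ndarray) -> np.ndarray:
    """Fusion module sketch: per-pixel convex combination of the two branches,
    weighted by a confidence map in [0, 1] (high where the reference is
    relevant, low in areas not represented in the reference)."""
    return confidence * ref_out + (1.0 - confidence) * base_out

rng = np.random.default_rng(0)
noisy = rng.normal(size=(8, 8))          # noisy latent at one diffusion step
ref = rng.normal(size=(8, 8))            # aligned reference features
conf = rng.random(size=(8, 8))           # hypothetical matching confidence

out = fuse(base_denoise(noisy), reference_transfer(noisy, ref), conf)
print(out.shape)  # (8, 8)
```

With `confidence` equal to zero everywhere, the output reduces to the base denoiser's result, which mirrors the motivation in the abstract: regions absent from the reference should still be handled gracefully by the diffusion branch rather than degrading visibly.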
About the Authors
A. K. Denisov
Russian Federation
Aleksei K. Denisov — Assistant.
Saint Petersburg, 197101; Scopus ID: 57210698353
S. V. Bykovskii
Russian Federation
Sergei V. Bykovskii — PhD, Associate Professor.
Saint Petersburg, 197101; Scopus ID: 57216469537
P. V. Kustarev
Russian Federation
Pavel V. Kustarev — PhD, Dean.
Saint Petersburg, 197101; Scopus ID: 35317916600
References
1. Dong C., Loy C.C., He K., Tang X. Image Super-Resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, vol. 38, no. 2, pp. 295–307. https://doi.org/10.1109/TPAMI.2015.2439281
2. Kim J., Lee J.K., Lee K.M. Accurate image Super-Resolution using very deep convolutional networks. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1646– 1654. https://doi.org/10.1109/CVPR.2016.182
3. Lim B., Son S., Kim H., Nah S., Lee K.M. Enhanced deep residual networks for single image Super-Resolution. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017, pp. 1132–1140. https://doi.org/10.1109/CVPRW.2017.151
4. Ledig C., Theis L., Huszár F., Caballero J., Cunningham A., Acosta A., Aitken A., Tejani A., Totz J., Wang Z., Shi W. Photo-realistic single image Super-Resolution using a generative adversarial network. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 105–114. https://doi.org/10.1109/CVPR.2017.19
5. Wang X., Xie L., Dong C., Shan Y. Real-ESRGAN: training real-world blind Super-Resolution with pure synthetic data. Proc. of the IEEE/ CVF International Conference on Computer Vision Workshops (ICCVW), 2021, pp. 1905–1914. https://doi.org/10.1109/ICCVW54120.2021.00217
6. Zhang Z., Wang Z., Lin Z., Qi H. Image Super-Resolution by neural texture transfer. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 7974–7983. https://doi.org/10.1109/CVPR.2019.00817
7. Jiang Y., Chan K.C.K., Wang X., Loy C.C., Liu Z. Robust Reference-based Super-Resolution via C2-Matching. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 2103–2112. https://doi.org/10.1109/CVPR46437.2021.00214
8. Cao J., Liang J., Zhang K., Li Y., Zhang Y., Wang W., Van Gool L. Reference-based image Super-Resolution with deformable attention transformer. Lecture Notes in Computer Science, 2022, vol. 13678, pp. 325–342. https://doi.org/10.1007/978-3-031-19797-0_19
9. Zhang L., Li X., He D., Li F., Ding E., Zhang Z. LMR: a large-scale multi-reference dataset for Reference-based Super-Resolution. Proc. of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 13072–13081. https://doi.org/10.1109/ICCV51070.2023.01206
10. Li G., Xing W., Zhao L., Lan Z., Sun J., Zhang Z., Zhang Q., Lin H., Lin Z. Self-Reference image Super-Resolution via pre-trained diffusion large model and window adjustable transformer. Proc. of the 31st ACM International Conference on Multimedia, 2023, pp. 7981–7992. https://doi.org/10.1145/3581783.3611866
11. Ho J., Jain A., Abbeel P. Denoising diffusion probabilistic models. arXiv, 2020, arXiv:2006.11239. https://doi.org/10.48550/arXiv.2006.11239
12. Song J., Meng C., Ermon S. Denoising diffusion implicit models. arXiv, 2020, arXiv:2010.02502. https://doi.org/10.48550/arXiv.2010.02502
13. Rombach R., Blattmann A., Lorenz D., Esser P., Ommer B. High-Resolution image synthesis with latent diffusion models. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 10674–10685. https://doi.org/10.1109/CVPR52688.2022.01042
14. Li H., Yang Y., Chang M., Chen S., Feng H., Xu Z., Li Q., Chen Y. SRDiff: Single Image Super-Resolution with diffusion probabilistic models. Neurocomputing, 2022, vol. 479, pp. 47–59. https://doi.org/10.1016/j.neucom.2022.01.029
15. Yu F., Gu J., Li Z., Liu J., Kong X., Wang X., He J., Qiao Y., Dong C. Scaling Up to Excellence: practicing model scaling for photorealistic image restoration in the wild. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 25669–25680. https://doi.org/10.1109/CVPR52733.2024.02425
16. Zhang R., Isola P., Efros A.A., Shechtman E., Wang O. The unreasonable effectiveness of deep features as a perceptual metric. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 586–595. https://doi.org/10.1109/CVPR.2018.00068
17. Wang J., Chan K.C.K., Loy C.C. Exploring CLIP for assessing the look and feel of images. Proc. of the 37th AAAI Conference on Artificial Intelligence, 2023, vol. 37, no. 2. pp. 2555–2563. https://doi.org/10.1609/aaai.v37i2.25353
18. Heusel M., Ramsauer H., Unterthiner T., Nessler B., Hochreiter S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Proc. of the 31st International Conference on Neural Information Processing Systems (NIPS '17), 2017, pp. 6629–6640.
For citations:
Denisov A.K., Bykovskii S.V., Kustarev P.V. Reference-based diffusion model for super-resolution. Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2025;25(2):321-327. (In Russ.) https://doi.org/10.17586/2226-1494-2025-25-2-321-327