26 January 2026

Disease diagnosis from unstructured medical texts using machine learning techniques.


Ermak A.D., Makarova E.A., Kaftanov A.N., Gavrilov D.V., Novitskiy R.E., Gusev A.V.

Abstract

Modern machine learning methods open new opportunities for analyzing medical texts. The use of unstructured data enables improved clinical decision support and the development of personalized patient treatment approaches.

The aim of the study: to develop an optimal algorithm for disease prediction using multi-label classification based on medical texts from selected patient treatment cases.

Materials and methods. The study utilized anonymized electronic medical records of 387,590 patients. Textual data were processed using lemmatization and vectorization based on a pretrained FastText model. A multi-label classification model was developed to predict 156 diagnostic categories grouped by major disease classes. Neural network architectures and decision tree ensembles were applied for model building.
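To illustrate what multi-label classification means here, the following is a minimal sketch rather than the authors' implementation. It assumes each diagnostic category gets an independent sigmoid score and a fixed 0.5 threshold; the three label names, the threshold, and the function names are hypothetical and stand in for the 156 categories of the study.

```python
import math

# Hypothetical label set; the study uses 156 diagnostic categories.
LABELS = ["circulatory", "respiratory", "endocrine"]


def sigmoid(x):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def encode_targets(diagnoses, labels=LABELS):
    """Build a multi-hot target vector: 1 for every category the case has."""
    return [1 if label in diagnoses else 0 for label in labels]


def decode_predictions(logits, labels=LABELS, threshold=0.5):
    """Keep every label whose independent probability reaches the threshold.

    Unlike single-label (softmax) classification, any number of labels,
    including zero, can be returned for one treatment case.
    """
    return [label for label, z in zip(labels, logits) if sigmoid(z) >= threshold]


if __name__ == "__main__":
    print(encode_targets({"respiratory", "endocrine"}))
    print(decode_predictions([2.0, -1.0, 0.3]))
```

Because each label is scored independently, one treatment case can be assigned several diagnostic categories at once, which is the property that distinguishes this setup from ordinary multi-class classification.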

Results. The proposed models demonstrated high effectiveness. The use of various text vector aggregation methods improved prediction quality. The model showed stability and clinical interpretability, supporting its applicability in real-world medical practice.
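The abstract does not say which aggregation methods were compared. As an illustration only, the sketch below shows three common ways to collapse per-token vectors into one fixed-length document vector: mean pooling, max pooling, and their concatenation. The tiny `TOY_VECTORS` table is a hypothetical stand-in for a pretrained FastText model, and all function names are assumptions.

```python
# Hypothetical 3-dimensional embeddings standing in for FastText vectors.
TOY_VECTORS = {
    "chest": [0.2, 0.8, -0.1],
    "pain": [0.6, 0.4, 0.3],
    "cough": [-0.4, 0.0, 0.9],
}


def token_vectors(tokens, table=TOY_VECTORS):
    """Look up a vector for each lemmatized token.

    Assumption: unknown tokens are skipped. A real FastText model would
    instead compose a vector for them from character n-grams.
    """
    return [table[t] for t in tokens if t in table]


def mean_pool(vectors):
    """Average each dimension across tokens."""
    if not vectors:
        raise ValueError("cannot aggregate an empty list of vectors")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def max_pool(vectors):
    """Take the largest value in each dimension across tokens."""
    if not vectors:
        raise ValueError("cannot aggregate an empty list of vectors")
    return [max(column) for column in zip(*vectors)]


def mean_max_pool(vectors):
    """Concatenate mean and max pooling into one vector of double length."""
    return mean_pool(vectors) + max_pool(vectors)


if __name__ == "__main__":
    vecs = token_vectors(["chest", "pain", "unknown"])
    print(mean_pool(vecs))
    print(max_pool(vecs))
    print(len(mean_max_pool(vecs)))
```

Mean pooling smooths over all tokens, while max pooling preserves the strongest signal in each dimension; concatenating them lets a downstream classifier use both views at the cost of a larger input.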

Conclusion. The developed approach to analyzing unstructured medical texts using machine learning methods is a promising tool for disease diagnosis support. Further research will focus on improving model interpretability and adapting models to diverse clinical data sources.


Ermak A.D., Makarova E.A., Kaftanov A.N., Gavrilov D.V., Novitskiy R.E., Gusev A.V. Disease diagnosis from unstructured medical texts using machine learning techniques. National Health Care (Russia). 2025; 6 (4): 55–63. https://doi.org/10.47093/2713-069X.2025.6.4.55-63
