Application of machine learning in predicting early diagnosis of rheumatic diseases
Machine learning is much older than the recent, well-publicized successes associated with it. The fundamental principles at the heart of the most powerful artificial neural networks known today were already in use in the late 1960s. The performance of this technology relies heavily on the volume and quality of the data available for training. The volume of data of all types has grown very strongly during the past two decades, including in medicine, and many attempts have been made to use these data to train artificial intelligence (AI) to diagnose pathologies of varying complexity through machine learning strategies. Among these pathologies, rheumatic diseases occupy a special place, because patients with these diseases are monitored regularly and can be described by very different types of data, such as X-ray images, flow cytometry profiles, and activity scores.
Machine learning and disease diagnosis and classification
The most obvious use of AI in a medical setting is certainly diagnostic assistance. In the context of machine learning, this means that a dataset, consisting of a cohort of patients for whom a collection of clinical or biological parameters and a diagnosis are available, is used to train a program. The program learns to recognize, among the parameters that characterize each patient in the dataset, a more or less complex combination of variables on which to rely in order to predict the correct diagnosis. In clinical research, such approaches are also used to identify new biological signatures and thus highlight potential biomarkers of a pathology. This approach amounts to solving a classification problem: the trained programs are called “classifiers”, and they are trained to classify patients according to their diagnosis.
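To make the idea of a classifier concrete, the following is a minimal sketch of one of the simplest possible approaches, a nearest-centroid classifier. It is not a method taken from the studies reviewed here. The cohort, the two numeric features, and the labels "healthy" and "disease" are invented placeholders, and the names `train` and `predict` are illustrative.

```python
from statistics import mean

# Toy, entirely made-up cohort: each patient is (feature vector, diagnosis).
# The two features are hypothetical placeholders, not real clinical values.
cohort = [
    ((1.0, 0.8), "healthy"),
    ((1.2, 1.1), "healthy"),
    ((0.9, 1.0), "healthy"),
    ((3.1, 2.9), "disease"),
    ((2.8, 3.3), "disease"),
    ((3.4, 3.0), "disease"),
]


def train(cohort):
    """Learn one centroid (mean feature vector) per diagnosis."""
    by_label = {}
    for features, label in cohort:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Return the diagnosis whose centroid is closest to the patient."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


centroids = train(cohort)
new_patient = (2.9, 3.1)  # hypothetical unseen patient
prediction = predict(centroids, new_patient)
print(prediction)
```

Real diagnostic classifiers combine many more variables and use more flexible models, but the structure is the same: parameters are learned from labelled patients and then applied to a patient whose diagnosis is unknown.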
Trends in publications highlighting machine learning approaches to diagnosing rheumatic diseases
Using Bibliography BOT (BIBOT), a tool written in Python 2.7 to automatically identify and interpret important words in large numbers of abstracts, we carried out a quick review of the literature and identified articles about the use of machine learning approaches to predict the diagnosis of rheumatic diseases (Figure 1). The number of articles on machine learning for the diagnosis of rheumatic diseases has grown sharply, and since 2011 at least two articles have been published on this topic every year.
Automatic extraction of the keywords present in these abstracts, followed by an analysis of their distribution, reveals that the pathology most studied through machine learning approaches is osteoarthritis, followed by rheumatoid arthritis (Figure 2). It is not surprising that these two pathologies are the most studied among rheumatic diseases: X-ray or ultrasound images are used to diagnose them, and machine learning has been highly successful in image processing and classification for over 10 years.
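The kind of keyword distribution analysis described above can be sketched in a few lines. This is not BIBOT's implementation, which is not described here. It is a simplified illustration that counts in how many abstracts each word appears. The stop-word list and the three abstracts are invented placeholders, and `keyword_document_frequency` is a hypothetical helper name.

```python
import re
from collections import Counter

# Hypothetical stop-word list and made-up placeholder abstracts,
# used for illustration only.
STOP_WORDS = {"the", "of", "a", "in", "and", "to", "for", "with", "from", "is", "on"}

abstracts = [
    "Machine learning for knee osteoarthritis detection from radiographs.",
    "Deep learning grading of synovitis in rheumatoid arthritis ultrasound.",
    "Early osteoarthritis prediction with machine learning on knee images.",
]


def keyword_document_frequency(texts):
    """Count in how many texts each non-stop-word appears."""
    counts = Counter()
    for text in texts:
        # A set ensures each word is counted at most once per abstract.
        words = set(re.findall(r"[a-z]+", text.lower()))
        counts.update(words - STOP_WORDS)
    return counts


frequency = keyword_document_frequency(abstracts)
for word, count in frequency.most_common(5):
    print(word, count)
```

Counting documents rather than raw occurrences prevents a single abstract that repeats a term many times from dominating the distribution.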
The machine learning techniques used to diagnose osteoarthritis are based on the analysis of radiographic images (mainly of the knee) and on learning to differentiate patients with osteoarthritis from healthy individuals. A classifier trained in this way can then be used to predict an early diagnosis of osteoarthritis.
Machine learning has also been used to diagnose rheumatoid arthritis. The approaches developed to diagnose this pathology also rely on image processing, with the detection of synovial regions and their grading. The most common machine learning approach to this kind of image processing is the use of a particular neural network architecture: convolutional neural networks.
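The core operation of a convolutional layer can be illustrated with a small sketch: a filter slides over the image and produces a feature map that responds to a local pattern. The 4x4 "image" and the hand-written edge filter below are invented for illustration. In a real convolutional network, the filter weights are learned from the training images, and many filters and layers are stacked.

```python
def convolve2d(image, kernel):
    """Valid-mode 2D cross-correlation, as computed by a convolutional layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Made-up 4x4 "image": dark columns (0) on the left, bright columns (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written filter that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

The feature map is non-zero only where the filter overlaps the boundary between the dark and bright columns, which is how such filters localize structures within an image.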
Limitations of machine learning in disease diagnosis
The use of machine learning in the biomedical context, however, has limitations. These are as follows.
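- The performance of a machine learning program is very dependent on the dataset on which it was trained. If the dataset contains too few examples in view of the complexity of the phenomenon studied, there is a risk of over-fitting: the program memorizes the training examples instead of learning rules that generalize to new patients.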
- If the dataset contains skewed observations, measurement errors, or too many artifacts, learning will also result in an unsatisfactory solution.
- There is a major problem in the context of biomedical decision-making. The steps of the AI's reasoning must be presented to the clinician clearly and intelligibly, but it is common for these algorithms to include “black boxes”: particularly complex information processing steps that make very little sense to a human. In other words, the algorithm can sometimes learn complex rules that give it great precision but whose meaning is very hard to understand, even for a domain expert.
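The first two limitations can be illustrated with a small synthetic experiment. The sketch below is an assumption-laden toy, not an analysis of real patient data. Each simulated patient has a single feature, 30% of the labels are deliberately flipped to mimic measurement or labelling errors, and the classifier is a 1-nearest-neighbour rule that simply memorizes a training set of only 20 patients. The names `make_cohort`, `predict_1nn`, and `accuracy` are hypothetical.

```python
import random


def make_cohort(rng, size, noise=0.3):
    """Synthetic patients: one feature, label flipped with probability `noise`."""
    cohort = []
    for _ in range(size):
        x = rng.random()
        label = x > 0.5
        if rng.random() < noise:
            label = not label  # simulated measurement or labelling error
        cohort.append((x, label))
    return cohort


def predict_1nn(train, x):
    """Memorizing classifier: copy the label of the closest training patient."""
    return min(train, key=lambda patient: abs(patient[0] - x))[1]


def accuracy(train, cohort):
    """Fraction of patients in `cohort` whose label is predicted correctly."""
    hits = sum(predict_1nn(train, x) == label for x, label in cohort)
    return hits / len(cohort)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
train = make_cohort(rng, 20)   # small training set
test = make_cohort(rng, 200)   # unseen patients

# Each training patient is its own nearest neighbour, so the training
# score is perfect even though many training labels are wrong.
train_accuracy = accuracy(train, train)
test_accuracy = accuracy(train, test)
print(f"train accuracy: {train_accuracy:.2f}, test accuracy: {test_accuracy:.2f}")
```

The gap between the perfect training score and the score on unseen patients is the signature of over-fitting, which is why performance should always be reported on data that was not used for training.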
As we enter the era of “Big Data”, the use of machine learning is becoming more widespread and the performance achieved by this approach is becoming more and more spectacular. Medicine is not immune to this trend, and many applications of machine learning are already in use in this field, including diagnosis prediction. The performance of these techniques depends largely on the volume of data available, and the large amounts of information available in rheumatology explain the interest in these technologies within this field.
Machine learning is currently limited by the “black box” phenomenon and by its high sensitivity to the quality of the training dataset. In the near future, we can hope to see new approaches that reduce the dependence of these programs on the quality of the training data, for example by using knowledge external to the datasets, obtained through natural language processing.