Recently, MIT researchers used machine learning—computer algorithms that automatically improve through experience—to predict that a number of COVID-19 vaccines currently in development may not adequately protect people of Black and Asian descent. In a study published in late November, the researchers say their machine learning models predicted how likely the vaccines are to properly stimulate immune responses in different ethnic groups. The research was based on patient data and models of immune system proteins. People carry different “alleles”, alternative versions of a gene produced by mutation, and these alleles govern which viral peptides a person’s cells can display on their surfaces. Because of this variation, some individuals produce cell-surface proteins that are far more likely than others to bind the viral peptides in a vaccine. The computer model indicated that vaccine trials should include a greater diversity of peptides for vaccines to be effective.
“There are obviously many other factors to consider, but our preliminary results suggest that, on average, people of Black or Asian ancestry could have a slightly increased risk of vaccine ineffectiveness,” MIT professor David Gifford, senior author of the research paper, told MIT CSAIL. “Our work shows that clinical trials need to carefully consider ancestry in their study designs to ensure that efficacy is measured across an appropriate population.”
Machine Learning’s Big Weakness: “Underspecification”
Thanks to machine learning, machines are getting better at a number of tasks, including natural language processing and object recognition. However, machines still often make mistakes. For example, a machine can misidentify an image after a slight alteration that a human would never be fooled by. Google researchers recently spotted a new vulnerability that potentially affects a large number of machine learning applications, which they call “underspecification.”
For example, machine learning models can help predict if and when hospitals will become overwhelmed with COVID-19 patients. Once programmed with the parameters that determine the course of the virus’s spread—including the basic reproduction number (R0) and the duration of infectiousness (D)—the model can use data gleaned during the early days of the pandemic to predict its course. There’s just one problem: during the initial phase of the pandemic, R0 and D are “underspecified”. Many different pairs of R0 and D values fit the same early exponential growth equally well, yet they lead to extremely different long-term predictions, and the machine learning model has no principled way to choose among them.
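To make this concrete, here is a minimal sketch (a toy SIR epidemic simulation, not the actual models discussed in the research) in which two hypothetical (R0, D) pairs are chosen so that the early exponential growth rate, roughly (R0 − 1)/D, is identical. The early case counts are nearly indistinguishable, yet the long-run outcomes diverge sharply:

```python
import numpy as np

def sir(r0, d, n=1_000_000, i0=10, days=365, dt=0.1):
    """Simulate a basic SIR model with Euler steps.
    beta = r0 / d (transmission rate), gamma = 1 / d (recovery rate)."""
    beta, gamma = r0 / d, 1.0 / d
    s, i, r = n - i0, float(i0), 0.0
    traj = []
    for _ in range(int(days / dt)):
        new_inf = beta * s * i / n * dt
        new_rec = gamma * i * dt
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        traj.append(i)
    return np.array(traj), (n - s) / n  # infectious curve, final attack rate

# Two (R0, D) pairs with the same early growth rate (R0 - 1) / D = 0.2/day
curve_a, attack_a = sir(r0=2.0, d=5.0)
curve_b, attack_b = sir(r0=4.0, d=15.0)

day20 = int(20 / 0.1)
print(f"infectious at day 20: {curve_a[day20]:.0f} vs {curve_b[day20]:.0f}")
print(f"final attack rate:    {attack_a:.2f} vs {attack_b:.2f}")
```

Both parameter pairs reproduce the early data almost perfectly, so nothing in that data can tell the model which pair is right, even though they imply very different final epidemic sizes.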
In the case of the pandemic, underspecification can be avoided by collecting additional data, such as how long patients remain infectious (which pins down D) and how many contacts patients have with other people (which pins down R0).
There are many other situations where underspecification can occur in deep learning models that may not be so easy to solve. These include clinical diagnoses that rely on electronic health records, image analysis, and natural language processing.
The Google engineers demonstrated how minute changes, such as altering the random seeds used in the machine learning training process, can lead a model to dramatically different conclusions.
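A minimal numpy sketch of this seed sensitivity (a toy linear model, not the Google team’s experiments): with more parameters than training examples, many different weight vectors fit the training data perfectly, and gradient descent started from different random seeds lands on different ones that disagree on new inputs.

```python
import numpy as np

# Underdetermined fit: 20 features but only 10 training examples, so many
# weight vectors achieve near-zero training error. The random seed changes
# only the initialization, yet the learned models disagree on unseen data.

rng_data = np.random.default_rng(0)
X_train = rng_data.normal(size=(10, 20))
y_train = rng_data.normal(size=10)
x_new = rng_data.normal(size=20)  # one unseen input

train_errors, predictions = [], []
for seed in range(5):
    w = np.random.default_rng(seed).normal(size=20)  # seed-dependent init
    for _ in range(5000):                            # plain gradient descent
        grad = X_train.T @ (X_train @ w - y_train) / len(y_train)
        w -= 0.05 * grad
    train_errors.append(np.abs(X_train @ w - y_train).max())
    predictions.append(float(x_new @ w))
    print(f"seed {seed}: max train error {train_errors[-1]:.1e}, "
          f"prediction on new input {predictions[-1]:+.3f}")

print("spread of predictions across seeds:", max(predictions) - min(predictions))
```

Every seed yields essentially zero training error, so the training data alone cannot distinguish the models; only their behavior on new inputs reveals that they learned different things.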
To prevent underspecification, it must be understood early on what can go wrong. According to the team, one way to identify problems is through a designed “stress test” that probes the model with observational data and surfaces potential failure modes. However, this comes with its own dilemma: for the test to be effective, you need to understand all the ways the model can fail.
“Designing stress tests that are well-matched to applied requirements, and that provide good ‘coverage’ of potential failure modes is a major challenge,” the team told Discover Magazine.
While machine learning has the potential to change the world, this research demonstrates the need to have a more complete understanding of the technology before it can be fully trusted.
Machine learning is a vital aspect of artificial intelligence (AI). Because machine learning allows AI systems to learn from experiences without needing explicit programming, it’s key for the future of AI technology.
Check out these new courses on machine learning, available on the IEEE Learning Network today.
- Machine Learning in the Age of Enterprise Big Data
- Machine Learning in a Data-Driven Business Environment
Ray, Tiernan. (2 December 2020). MIT machine learning models find gaps in coverage by Moderna, Pfizer, other Warp Speed COVID-19 vaccines. ZDNet.
Conner-Simons, Adam. (2 December 2020). MIT study: Covid-19 vaccines may be less effective for Asian Americans. CSAIL.
The Physics arXiv Blog. (30 November 2020). Google Reveals Major Hidden Weakness In Machine Learning. Discover Magazine.