Breast Cancer Diagnosis Triggers Machine Learning Mission

Regina Barzilay gets riled up when data collects dust. It became personal for the MIT professor when she was diagnosed with breast cancer and was struck by how much uncertainty patients faced about prognosis and treatment, while mountains of data from other patients sat idle in images and reports.

Barzilay received a 2017 MacArthur Fellowship, sometimes called the "genius grant," and is the Delta Electronics Professor of Electrical Engineering and Computer Science in MIT's Computer Science and Artificial Intelligence Laboratory. She specializes in natural language processing and machine learning: she knows how to turn algorithms loose on huge troves of data to gain insight.

What she discovered, in her own case and in those of other women, was that treatment decisions rest on data from clinical trials that enroll only 3% of the approximately 1.7 million people diagnosed with cancer in the United States each year (figures from CancerLinQ). The remaining patients' data goes largely unused, which poses a formidable roadblock to using big data to advance diagnosis and treatment protocols.

"I decided I was going to change this," Barzilay said in a Washington Post video broadcast. Working with her students and with doctors from Massachusetts General Hospital, where she was treated, she is overturning the traditional method in which people encode treatment data by hand as the basis for studies. Instead, Barzilay and her team have translated the information into a database, making it possible to do things like automatically identify the women whose cancer returned after five years. The database currently holds 160,000 pathology reports spanning three decades and is being used in studies of cancer progression and of the development of the tumor marker TPS (Tissue Polypeptide-Specific Antigen).

The research is an extension of Barzilay's work in natural language processing, which enables machines to search, summarize, and interpret textual documents. She teaches a popular Introduction to Machine Learning course at MIT, in which more than 700 students enrolled this past spring. She gets her point across: she received the Jamieson Prize for Excellence in Teaching, given annually to two MIT Sloan faculty members, for her contributions to machine learning and natural language processing.

Barzilay has also set her sights on mammograms. She noted that each image is read by a person, who is limited in how much information they can detect, and the reading is then summarized in brief text. "A lot of information is really lost here. Machines are really good at reading millions of these mammograms, and answering questions which humans cannot answer today, like how to do personalized risk assessment of breast cancer, given the specific tissue of the woman. You can train the machine to take this image and say, 'What is the likelihood in five years that this woman may develop cancer?'"

The machine can also assess whether a patient like herself, who is on medication, is responding to the treatment.

"Instead of using deep learning to recognize different types of cats, we are trying to use it to learn about the development of cancer," she said with a wry smile.

John Martin

John Martin writes about technology, business, science, and general-interest topics. A former U.S. correspondent for The Economist (Science & Technology), he writes for the private sector, universities, and media, and can be reached at