Literature Mining with Large Language Models to Assist the Development of Sustainable Building Materials (Papers Track)

Yifei Duan (Massachusetts Institute of Technology); Yixi Tian (Massachusetts Institute of Technology); Soumya Ghosh (IBM Research); Richard Goodwin (IBM T.J. Watson Research Center); Vineeth Venugopal (Massachusetts Institute of Technology); Jeremy Gregory (Massachusetts Institute of Technology); Jie Chen (IBM Research); Elsa Olivetti (Massachusetts Institute of Technology)

Topics: Natural Language Processing; Buildings; Chemistry & Materials


The concrete industry is a significant source of carbon emissions, and its decarbonization urgently requires a shift to alternative materials. However, the absence of a systematic summary of existing knowledge remains a challenge for the further development of sustainable building materials. This work offers a cost-efficient strategy for information extraction in settings with complex terminology, using small (2.8B-parameter) large language models (LLMs) with well-designed instruction-completion schemes and fine-tuning strategies, and introduces a dataset cataloging civil engineering applications of alternative materials. The Multiple Choice instruction scheme significantly improves model accuracy when inferring entities from sources that are not noun phrases, and supervised fine-tuning benefits from straightforward tokenized representations of the choices. We also demonstrate the utility of the dataset by extracting insights into promising applications of alternative materials from its knowledge graph representations.
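The abstract does not give the prompt template, so the following is only a minimal Python sketch of the general idea, under stated assumptions. The template wording, the label set in `CHOICES`, and the helper names `build_mc_prompt`, `parse_mc_completion`, and `build_edge_counts` are all hypothetical and are not the authors' implementation. The sketch turns entity inference into choosing a single-letter option, so the target completion is a short, simple token sequence. This matters when the answer (for example "cement replacement") does not appear as a noun phrase in the source sentence. It then aggregates the extracted (material, application) pairs into weighted edges that could feed a knowledge graph.

```python
from collections import Counter
from string import ascii_uppercase
from typing import Iterable, List, Optional, Tuple

# Hypothetical application categories; the paper's actual label set is not given here.
CHOICES = [
    "concrete admixture",
    "cement replacement",
    "aggregate",
    "road base",
    "none of the above",
]


def build_mc_prompt(sentence: str, material: str, choices: List[str]) -> str:
    """Format an extraction query as a multiple-choice instruction (assumed template).

    Each option gets a single-letter label, so the expected completion is one
    short label rather than a free-form phrase the model must generate.
    """
    if len(choices) > len(ascii_uppercase):
        raise ValueError("too many choices for single-letter labels")
    options = "\n".join(f"{ascii_uppercase[i]}. {c}" for i, c in enumerate(choices))
    return (
        f"Text: {sentence}\n"
        f"Question: In what civil engineering application is {material} used?\n"
        f"Options:\n{options}\n"
        "Answer:"
    )


def parse_mc_completion(completion: str, choices: List[str]) -> Optional[str]:
    """Map a completion such as ' B' or 'b. cement replacement' back to a choice."""
    text = completion.strip()
    if not text:
        return None
    idx = ascii_uppercase.find(text[0].upper())  # -1 if not an uppercase letter
    if 0 <= idx < len(choices):
        return choices[idx]
    return None


def build_edge_counts(
    pairs: Iterable[Tuple[str, Optional[str]]]
) -> Counter:
    """Aggregate (material, application) pairs into weighted graph edges,
    skipping pairs whose application could not be parsed."""
    return Counter((m, a) for m, a in pairs if a is not None)


if __name__ == "__main__":
    prompt = build_mc_prompt(
        "Fly ash was used to partially replace cement in concrete.", "fly ash", CHOICES
    )
    print(prompt)
    print(parse_mc_completion(" B", CHOICES))
```

In this sketch an unparseable completion maps to `None` rather than to a guessed label, so low-quality generations drop out of the edge counts instead of adding spurious links to the graph.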