SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration
Abstract
The article demonstrates that a Large Language Model (LLM) can serve as a foundation model for a Chemical Language Model (CLM) that performs at or above the level of CLMs trained solely on chemical SMILES string data. Using supervised fine-tuning (SFT) and direct preference optimization (DPO) on the open-source Llama LLM, the authors show that an LLM can be trained to generate molecules with properties of interest for drug development. This framework allows an LLM to function as a CLM that can generate molecules with user-specified properties, rather than just a chatbot client for chemistry and materials tasks.
Q&A
[01] Supervised Fine-Tuning
1. What is the procedure used for fine-tuning Llama? The authors constructed a fine-tuning dataset from the SMILES strings of approximately 2 million molecules in the ChEMBL v28 dataset. For each molecule, they randomly selected a subset of pharmaceutically relevant properties to compute with RDKit, such as ranges for the number of hydrogen-bond donors and acceptors, molecular weight, logP, number of rotatable bonds, and fraction of carbons, as well as the presence or absence of macrocycles, covalent-warhead SMARTS patterns, undesirable SMARTS patterns, BRICS single-pass substructures, and the chemical formula. They then constructed a prompt specifying the values of these properties, with the "correct" completion being the SMILES string from which the properties were calculated, thereby training the model to generate molecules that satisfy the specified properties (a minimal sketch of this data construction follows this list).
2. How does the fine-tuned model, SmileyLlama, perform compared to Llama with zero-shot, one-shot, or few-shot prompts, and to other state-of-the-art CLMs? The authors find that SFT significantly improves SmileyLlama's ability to generate drug-like molecules relative to base Llama used without fine-tuning or with few-shot prompts. SmileyLlama performs on par with or better than other state-of-the-art CLMs in terms of the validity, uniqueness, and novelty of generated molecules.
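The sketch below illustrates how one SFT example might be assembled: compute a few of the RDKit properties named above, sample a subset of them into a prompt, and use the source molecule's SMILES string as the completion. The prompt wording and the property-sampling scheme are illustrative assumptions, not the authors' exact templates.

```python
# Minimal sketch of SFT example construction (prompt template and property
# sampling are illustrative assumptions, not the authors' exact scheme).
import random
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, Lipinski

def build_sft_example(smiles: str):
    """Turn one ChEMBL SMILES string into a (prompt, completion) training pair."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # A few of the RDKit-computable properties mentioned above.
    props = {
        "hydrogen-bond donors": Lipinski.NumHDonors(mol),
        "hydrogen-bond acceptors": Lipinski.NumHAcceptors(mol),
        "molecular weight": round(Descriptors.MolWt(mol), 1),
        "logP": round(Crippen.MolLogP(mol), 2),
        "rotatable bonds": Lipinski.NumRotatableBonds(mol),
    }
    # Randomly keep a subset so the model sees varied property specifications.
    chosen = random.sample(list(props.items()), k=random.randint(1, len(props)))
    spec = ", ".join(f"{name}: {value}" for name, value in chosen)
    prompt = f"Generate a SMILES string for a molecule with {spec}."
    # The "correct" completion is the molecule the properties were computed from.
    return {"prompt": prompt, "completion": smiles}

print(build_sft_example("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```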
[02] Direct Preference Optimization
1. How do the authors use DPO to improve SmileyLlama's ability to generate molecules with specified properties? The authors pair generated molecules that correctly followed the prompt with those that did not, then run a single epoch of DPO on these preference pairs to improve the model's results (see the pairing sketch after this list). They also use DPO to add new capabilities to SmileyLlama, such as generating molecules with high QED, GSK3B, JNK3, and DRD2 scores as assessed by machine learning models.
2. How does the DPO-optimized model, SmileyLlama-Opt, perform compared to the SFT-only SmileyLlama model? The authors find that SmileyLlama-Opt significantly improves results on the benchmark tasks compared to the SFT-only SmileyLlama model, albeit with a higher optimal temperature.
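As a hedged illustration of the pairing step, the sketch below checks generated SMILES strings against a hypothetical molecular-weight constraint and zips compliant and non-compliant completions into prompt/chosen/rejected records of the kind consumed by DPO trainers (e.g. TRL's DPOTrainer); the authors' actual pairing criteria are not reproduced here.

```python
# Hedged sketch: pair prompt-following generations ("chosen") with violating
# ones ("rejected"). The molecular-weight check is a hypothetical stand-in for
# whatever property the prompt actually specified.
from rdkit import Chem
from rdkit.Chem import Descriptors

def satisfies(smiles: str, max_mw: float = 500.0) -> bool:
    """True if the SMILES parses and its molecular weight is within the prompted range."""
    mol = Chem.MolFromSmiles(smiles)
    return mol is not None and Descriptors.MolWt(mol) <= max_mw

def make_dpo_pairs(prompt: str, generations: list) -> list:
    good = [s for s in generations if satisfies(s)]
    bad = [s for s in generations if not satisfies(s)]
    # One preference record per (compliant, non-compliant) pair.
    return [{"prompt": prompt, "chosen": g, "rejected": b} for g, b in zip(good, bad)]

pairs = make_dpo_pairs(
    "Generate a SMILES string for a molecule with molecular weight <= 500.",
    [
        "CC(=O)Oc1ccccc1C(=O)O",  # aspirin, ~180 Da: follows the prompt
        "C" * 60,                 # a C60 alkane, ~843 Da: violates it
    ],
)
print(pairs)
```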
[03] Optimizing for Target Affinities and Implicit Multi-Objective Optimization
1. How does the SmileyLlama model perform when optimized for generating molecules with high scores for multiple objectives (QED, GSK3B, JNK3, DRD2)? The authors find that SmileyLlama, after being DPO-optimized for each objective individually, can combine this knowledge to generate molecules that score well on multiple objectives simultaneously, even though it was never explicitly trained on the combined task (a scoring sketch follows this list).
2. What are the implications of the model's ability to perform well on multi-objective tasks without explicit training? The authors suggest that this property could enable the development of a "foundation model" for molecular generation that can be efficiently fine-tuned or optimized for a wide range of specific objectives, rather than requiring separate models for each task.
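A minimal sketch of how generated molecules could be screened against several objectives at once: QED is computed directly with RDKit, while the GSK3B, JNK3, and DRD2 scorers below are hypothetical placeholders standing in for the machine-learning activity models the article refers to.

```python
# Hedged sketch of multi-objective scoring. QED comes from RDKit; the other
# scorers are hypothetical placeholders for the ML activity predictors.
from rdkit import Chem
from rdkit.Chem import QED

def placeholder_activity_score(smiles: str) -> float:
    """Stand-in for an ML activity predictor (GSK3B, JNK3, or DRD2)."""
    return 0.0  # replace with the real model's predicted score

OBJECTIVES = {
    "QED": lambda s: QED.qed(Chem.MolFromSmiles(s)),
    "GSK3B": placeholder_activity_score,
    "JNK3": placeholder_activity_score,
    "DRD2": placeholder_activity_score,
}

def multi_objective_scores(smiles: str) -> dict:
    """Return every objective's score for one generated molecule."""
    if Chem.MolFromSmiles(smiles) is None:
        return {}  # invalid SMILES scores on nothing
    return {name: fn(smiles) for name, fn in OBJECTIVES.items()}

print(multi_objective_scores("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```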