Meta Large Language Model Compiler: Foundation Models of Compiler Optimization
Abstract
The paper introduces LLM Compiler, a family of large language models (LLMs) designed for code and compiler optimization tasks. LLM Compiler builds on the foundational Code Llama model, extending its capabilities to compiler intermediate representations (IRs), assembly language, and optimization techniques. The models are trained on a corpus of 546 billion tokens of LLVM-IR and assembly code and instruction fine-tuned to interpret compiler behavior. LLM Compiler is released under a bespoke commercial license to enable wide reuse by both academic researchers and industry practitioners. The paper also presents fine-tuned (FTD) versions of the model, demonstrating enhanced capabilities in optimizing code size and in disassembling x86_64 and ARM assembly back into LLVM-IR.
Q&A
[01] Large Language Models (LLMs) for Code and Compiler Optimization
1. What are the key challenges in applying LLMs to code and compiler optimization tasks?
- Training LLMs is resource-intensive, requiring substantial GPU hours and extensive data collection, which can be prohibitive.
- Prior approaches to using machine learning for compiler optimization have used incomplete or specialized representations of the input program, losing some information.
- Publicly available LLMs can make minor tweaks to programs but easily become confused and make mistakes when attempting more substantial optimizations.
2. How does LLM Compiler address these challenges?
- LLM Compiler extends the foundational Code Llama model by further training on 546 billion tokens of compiler-centric data, chiefly LLVM-IR and assembly code, including an instruction fine-tuning stage on compiler emulation tasks.
- This gives LLM Compiler a strong understanding of compiler intermediate representations, assembly language, and optimization techniques.
- LLM Compiler is released under a bespoke commercial license to enable wide reuse and collaboration by both academic researchers and industry practitioners.
[02] LLM Compiler Model Architecture and Training
1. What are the key stages in the training pipeline for LLM Compiler?
- The models are first pretrained on 401 billion tokens of LLVM-IR and assembly code.
- They are then fine-tuned on a further 145 billion tokens of compiler emulation data, in which the model learns to predict how a given list of optimization passes transforms code (see the sketch after this answer).
- LLM Compiler FTD models are then trained on an additional 164 billion tokens of data for two downstream tasks: flag tuning and disassembly.
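To make the compiler emulation objective concrete, here is a minimal sketch, not taken from the paper: it builds an emulation-style prompt from an LLVM-IR file and a pass list, and computes the ground-truth answer by actually running LLVM's `opt`. The prompt wording and the helper names (`optimized_ir`, `emulation_prompt`, `example.ll`) are assumptions for illustration only.

```python
import subprocess

def optimized_ir(input_ll: str, passes: str) -> str:
    """Run LLVM's opt on an IR file with the given pass pipeline and
    return the optimized IR as text (the ground truth a compiler-emulation
    style task would ask the model to predict)."""
    result = subprocess.run(
        ["opt", "-S", f"-passes={passes}", input_ll, "-o", "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def emulation_prompt(ir_text: str, passes: str) -> str:
    """Assumed prompt shape (not the paper's exact template): give the model
    unoptimized IR plus a pass list and ask for the resulting IR."""
    return (
        f"[INST] Apply these LLVM optimization passes: {passes}\n"
        f"to the following IR and give the resulting IR:\n{ir_text} [/INST]"
    )

if __name__ == "__main__":
    passes = "instcombine,simplifycfg"          # example pass list
    reference = optimized_ir("example.ll", passes)
    with open("example.ll") as f:
        prompt = emulation_prompt(f.read(), passes)
    # A model's completion for `prompt` would be scored against `reference`.
    print(prompt[:200], "...\n--- reference output ---\n", reference[:200])
```

In this framing, the model's completion is judged against the IR that `opt` itself produces for the same input and pass list.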
2. How do the LLM Compiler models differ in size and capabilities?
- LLM Compiler is available in two model sizes: 7 billion and 13 billion parameters.
- The larger 13B model generally outperforms the 7B model on the downstream tasks, but both deliver significant improvements over baseline LLMs such as Code Llama and GPT-4 Turbo.
[03] Evaluation of LLM Compiler
1. How does LLM Compiler perform on the flag tuning task?
- LLM Compiler FTD models are able to generate optimization pass lists that achieve 77% of the optimizing potential of an autotuning search, without requiring any additional compilations.
- This significantly outperforms Code Llama - Instruct and GPT-4 Turbo on the same task (a usage sketch follows this answer).
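As a rough illustration of how the flag-tuning capability might be used, the sketch below loads an FTD checkpoint through Hugging Face transformers, asks it for a pass list, applies that list with `opt`, and compares a crude code-size proxy against `-Oz`. The model id `facebook/llm-compiler-7b-ftd`, the prompt wording, and the helper names are assumptions, not the paper's exact interface.

```python
import subprocess
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "facebook/llm-compiler-7b-ftd"   # assumed Hugging Face model id

def suggest_pass_list(ir_text: str) -> str:
    """Ask the model for an opt pass list expected to minimize code size.
    The prompt wording is illustrative only."""
    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    prompt = (
        "[INST] Give an opt pass list that minimizes code size for this IR:\n"
        f"{ir_text} [/INST]"
    )
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=128)
    # Decode only the newly generated tokens (the suggested pass list).
    return tok.decode(out[0][inputs["input_ids"].shape[1]:],
                      skip_special_tokens=True).strip()

def code_size(input_ll: str, passes: str) -> int:
    """Apply the pass list with opt and count IR instruction lines as a
    crude code-size proxy for this sketch."""
    ir = subprocess.run(["opt", "-S", f"-passes={passes}", input_ll, "-o", "-"],
                        capture_output=True, text=True, check=True).stdout
    return sum(1 for line in ir.splitlines() if line.startswith("  "))

if __name__ == "__main__":
    with open("example.ll") as f:
        passes = suggest_pass_list(f.read())
    print("suggested passes:", passes)
    print("size with suggestion:", code_size("example.ll", passes))
    print("size with default<Oz>:", code_size("example.ll", "default<Oz>"))
```

The point of the "no additional compilations" claim is that, unlike an autotuning search, a single model query proposes the pass list; the compilation step here is only needed to apply and verify it.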
2. How does LLM Compiler perform on the disassembly task?
- LLM Compiler FTD models can disassemble assembly code back into LLVM-IR that round-trips losslessly (an exact match) 14% of the time.
- This is a substantial improvement over Code Llama - Instruct and GPT-4 Turbo, which struggle to produce syntactically valid LLVM-IR (a round-trip check is sketched below).
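To make the round-trip criterion concrete, here is a hedged sketch of one way to check whether model-predicted LLVM-IR losslessly round-trips: lower the predicted IR back to assembly with `llc` and compare it, after light normalization, with the assembly the model was given. The file names and the normalization rules are assumptions for illustration.

```python
import subprocess

def assemble(ir_path: str) -> str:
    """Lower LLVM-IR to target assembly with llc, returning it as text."""
    result = subprocess.run(["llc", "-o", "-", ir_path],
                            capture_output=True, text=True, check=True)
    return result.stdout

def normalize(asm: str) -> list[str]:
    """Strip comments and blank lines so the comparison ignores
    cosmetic differences in the emitted assembly."""
    lines = []
    for line in asm.splitlines():
        line = line.split("#")[0].strip()   # '#' starts a comment in x86 AT&T syntax
        if line:
            lines.append(line)
    return lines

def round_trips(original_asm: str, predicted_ir_path: str) -> bool:
    """True if the model's predicted IR compiles back to the same
    (normalized) assembly it was disassembled from."""
    return normalize(assemble(predicted_ir_path)) == normalize(original_asm)

if __name__ == "__main__":
    with open("function.s") as f:
        original = f.read()
    # 'predicted.ll' stands in for the IR a disassembly-tuned model emitted.
    print("lossless round trip:", round_trips(original, "predicted.ll"))
```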
3. How do the LLM Compiler models perform on general software engineering tasks compared to prior LLMs?
- The additional compiler-focused training causes a slight regression on general Python programming tasks relative to the Code Llama base model, but all LLM Compiler variants still significantly outperform Llama 2 on these tasks.