Assured LLM-Based Software Engineering

Assured LLMSE uses semantic filters so that Large Language Models can improve code autonomously, with verifiable and measurable guarantees.
Authors

Nadia Alshahwan, Mark Harman, Inna Harper, Alexandru Marginean, Shubho Sengupta, Eddy Wang

Published

February 6, 2024

Summary:

  • The paper discusses how Large Language Models (LLMs) can improve code without human intervention, while ensuring that the improved code does not regress any property of the original and improves it in a verifiable, measurable way.
  • Assured LLM-Based Software Engineering is proposed as a generate-and-test approach, inspired by Genetic Improvement, in which semantic filters discard generated code that fails to meet the guarantees (a sketch follows this list).
  • The paper outlines the content of a keynote by Mark Harman at the International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering.
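
To make the generate-and-test loop concrete, here is a minimal Python sketch. The function names, the filter signature, and the fallback behaviour are assumptions for illustration; the paper does not prescribe an API.

```python
# Minimal sketch of a generate-and-test loop with semantic filters.
# All names here are illustrative assumptions, not the paper's API.
from typing import Callable, Iterable, Optional

# A semantic filter inspects (original, candidate) and decides keep/discard.
Filter = Callable[[str, str], bool]

def assured_improve(
    original: str,
    generate_candidates: Callable[[str], Iterable[str]],  # e.g. wraps an LLM
    semantic_filters: list[Filter],
) -> Optional[str]:
    """Return the first candidate that passes every semantic filter,
    or None so the caller keeps the original code (no regression)."""
    for candidate in generate_candidates(original):
        if all(passes(original, candidate) for passes in semantic_filters):
            return candidate  # verifiable, measurable improvement
    return None
```

Discarding every failing candidate and defaulting back to the original code is what allows the approach to claim "no regression" without a human in the loop.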

Major Findings:

  1. The core proposal is a generate-and-test pipeline: an LLM proposes candidate code changes, and semantic filters discard any candidate that fails to meet the stated guarantees.
  2. The paper distinguishes online from offline LLMSE: online LLMSE must deliver results in real time (e.g., code completion as the developer types), whereas offline LLMSE has the time budget to compute verifiable and measurable assurances.
  3. Assured LLMSE is compared to Genetic Improvement (GI): the LLM acts as the operator that generates candidate solutions, while semantic filters play a role analogous to fitness functions in generate-and-test approaches to GI (see the sketch below).
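
As a deliberately toy illustration of filters acting as fitness functions, the sketch below chains three checks, cheapest first: syntactic validity, behavioural equivalence on sample inputs, and a timing comparison. The concrete checks, the function name `f`, and the squaring example are assumptions for illustration, not from the paper.

```python
import ast
import timeit

def parses(candidate: str) -> bool:
    """Cheapest check first: the candidate must be syntactically valid."""
    try:
        ast.parse(candidate)
        return True
    except SyntaxError:
        return False

def preserves_behaviour(candidate: str) -> bool:
    """No regression: f() must still return the expected outputs."""
    ns: dict = {}
    exec(candidate, ns)  # illustrative only; sandbox untrusted code in practice
    return all(ns["f"](x) == x * x for x in range(10))

def measurably_faster(original: str, candidate: str) -> bool:
    """The improvement must be measurable, here as lower wall-clock time.
    Timing is noisy; a real filter would use a more robust benchmark."""
    def run(src: str) -> float:
        return timeit.timeit("f(10)", setup=src, number=10_000)
    return run(candidate) < run(original)

original = "def f(x): return sum(x for _ in range(x))"  # O(x) squaring
candidate = "def f(x): return x * x"                    # constant time
assert parses(candidate) and preserves_behaviour(candidate)
assert measurably_faster(original, candidate)
```

Ordering the filters from cheapest to most expensive prunes bad candidates early, mirroring how generate-and-test GI avoids paying for costly fitness evaluation on obviously broken variants.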

Analysis and Critique:

  • The paper provides a comprehensive overview of the proposed Assured LLM-Based Software Engineering, but it lacks empirical evidence or case studies to support the effectiveness of the approach.
  • The distinction between online and offline LLMSE is well-explained, but the limitations and challenges of each approach are not thoroughly discussed.
  • The open research problems outlined at the end of the paper point to valuable future directions, but the discussion would benefit from more depth on potential solutions to these problems.

Appendix

Model: gpt-3.5-turbo-1106
Date Generated: 2024-02-26
Abstract: https://arxiv.org/abs/2402.04380v1
HTML: https://browse.arxiv.org/html/2402.04380v1
Truncated: False
Word Count: 5765