Analysis of Possible Applications of AI in Code Refactoring
This report examines how four modern AI-based coding agents perform when modernising existing applications. The aim was to upgrade outdated dependencies, Java 8 and Tomcat to current versions. The test coverage of the application served as the basis for evaluation, so that regressions would be revealed. The agents used were Junie from JetBrains (Claude 3.7 Sonnet) and Sweep AI (automatic model selection) in IntelliJ IDEA, GitHub Copilot (GPT-4.1) in agent mode in Visual Studio Code, and Kiro (Claude Sonnet 4) as a standalone AI-based development environment.
The assistants were tasked with autonomously performing specific modernisation tasks. This revealed significant differences in planning depth, success rate, error rate and reliability. Kiro proved to be a particularly structured assistant, systematically planning tasks and producing extensive documentation of the planned changes, but it left gaps in the implementation. GitHub Copilot demonstrated autonomous capabilities in agent mode, but still showed teething problems. Junie and Sweep were able to perform tasks directly in the IDE, but reached their limits when implementing more complex updates. All agents tested hallucinated, especially with regard to version numbers. In addition, a regression introduced by a dependency update went unrecognised as such, even though a test failed. The report also documents differences in the architecture, LLM usage and IDE integration of the tools. The results show that modern coding agents can be valuable aids in refactoring and modernisation, but require clear human control and test-based validation.