Topic: Optimizing the Performance of Open-Source Large Language Models (LLMs) for Specific Software Engineering Tasks through Fine-Tuning with Proprietary Software Development Data
About Myself

I am Md. Mahade Hasan, currently working as a Doctoral Researcher at the GPT-Lab, Tampere University. In August 2024, I started my doctoral training under the supervision of Professor Pekka Abrahamsson. My research focuses mainly on fine-tuning open-source Large Language Models (LLMs) for specific Software Engineering tasks. Prior to starting my doctoral studies, I earned a master’s degree in Computing Science, specializing in Data Science, from Tampere University. During the final year of my master’s studies, I worked as a Trainee Software Engineer (ML) at NOKIA, where I gained valuable industry experience. In 2017, I started my career as a Lecturer in the Department of CSE at Daffodil International University, Dhaka, and continued in that role until September 2021. There, I was actively involved in both teaching and research, contributing to the academic development of students and advancing knowledge in the field of computer science.
Outside of work and studies, I enjoy reading books, playing sports such as cricket and badminton, and spending time outdoors. I also like to explore new technologies and stay up to date with the latest trends in AI. Additionally, I enjoy traveling and learning about different cultures whenever I get the opportunity.
Research
What am I doing?
My research focuses on Large Language Models (LLMs) and how they can be integrated into different Software Engineering (SE) tasks. However, base-variant LLMs often fall short when dealing with private data: they fail to capture the necessary context and therefore cannot produce good results. I will mainly investigate how the performance of open-source LLMs can be improved for company-specific use cases by fine-tuning them with proprietary data. The main goal is to adapt the model so that it understands the company’s way of working and provides accurate results.
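To make this concrete, the sketch below shows one plausible shape for such fine-tuning: parameter-efficient LoRA training on top of an open-source code model, assuming the Hugging Face transformers, peft, and datasets libraries. The base model name and the company_se_tasks.jsonl file are illustrative placeholders, not artifacts from this research.

```python
# Minimal LoRA fine-tuning sketch (assumes transformers, peft, datasets).
# "company_se_tasks.jsonl" is a hypothetical JSONL file with a "text" field
# holding prompt/completion pairs built from proprietary data.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

base = "codellama/CodeLlama-7b-hf"  # any open-source code LLM would do
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Attach low-rank adapters; only a small fraction of the weights train.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

data = load_dataset("json", data_files="company_se_tasks.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                     max_length=1024),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out",
                           per_device_train_batch_size=2,
                           num_train_epochs=1,
                           learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora-out")  # saves only the adapters, a few MB
```

Because only the adapter weights are updated, the proprietary data never has to leave the company's own hardware, and the resulting adapter can be swapped on top of the unchanged base model.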
This research will survey the currently available open-source LLMs and compare their performance on different SE tasks to identify the trade-offs between model size and performance. It will also examine the effects of fine-tuning on model-generated outputs. Lastly, it will provide task-specific benchmarking techniques to evaluate the models’ performance.
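As an example of what a task-specific benchmark can report, code-generation evaluations commonly use pass@k: generate n candidate solutions per task, count how many of them (c) pass the task's unit tests, and estimate the probability that at least one of k samples would pass. A small sketch of the standard unbiased estimator:

```python
# pass@k estimator used in code-generation benchmarks such as HumanEval.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes,
    given n generated samples of which c passed the unit tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains a passing sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples per task, 37 passed -> estimated pass@10
print(round(pass_at_k(n=200, c=37, k=10), 3))
```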
Why is this important?
In today’s evolving technological environment, SE stands at the center of innovative solutions, but its efficiency is challenged in many different tasks. Current SE practice tackles these issues manually, which can be inefficient and error-prone. In recent years, the emergence of LLMs has shown promising results in automating various SE tasks. Despite these capabilities, software companies often find it difficult to integrate LLMs into their company-specific processes. These difficulties need to be addressed before the full potential of LLMs in SE tasks can be realized.
Integrating proprietary LLMs into company-specific SE tasks raises data privacy and security concerns, because these services send all data to external servers for processing. As a result, research on alternative approaches, such as open-source LLMs, is necessary. LLMs come in different sizes: huge models, such as those with 70B parameters, deliver strong performance but carry heavy computational and environmental costs, which is not sustainable. Investigating the capabilities of smaller models is therefore necessary. Another challenge is that these models undergo generalized training, which introduces performance degradation, hallucination, and bias in company-specific tasks, so further measures such as fine-tuning methodologies need to be explored for performance improvements. Finally, task-specific evaluation procedures are necessary to increase the confidence of end users.
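One common way to cut the memory and energy footprint of these models, which I may draw on when comparing model sizes, is quantization. The snippet below is a minimal sketch, assuming the transformers and bitsandbytes libraries; the model name is just a stand-in for whichever open model is under evaluation.

```python
# Loading an open model in 4-bit precision (assumes transformers,
# bitsandbytes, accelerate, and a CUDA GPU).
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig)

bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)

name = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name,
                                             quantization_config=bnb,
                                             device_map="auto")
# In 4-bit, a 7B model needs roughly 4-5 GB of GPU memory instead of ~14 GB.
```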
How do I plan to do this?
In this research, I’m using a combination of quantitative experiments and qualitative analysis to evaluate how effectively open-source LLMs can perform software engineering tasks. First, I’ll dive into the existing literature on LLMs, fine-tuning, data preprocessing, and benchmarking to set a foundation. Then, I’ll design experiments to test models of different sizes across various tasks, such as code generation and testing, using metrics like accuracy and resource usage. I’ll also work on optimizing data preprocessing and representation techniques to enhance the models’ training. Through fine-tuning and prompt engineering, I’ll tailor these models to specific tasks to see how much their performance can be improved. Lastly, I’ll develop benchmarking methods to measure the reliability and effectiveness of these models, ensuring that users can trust the results and feel confident deploying them in real-world scenarios.
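To ground the benchmarking step, here is a rough sketch of the kind of measurement harness I have in mind: it runs a model over a task set and records accuracy alongside latency and peak GPU memory. The generate_answer and is_correct hooks are hypothetical stand-ins for the real task logic, not functions from an existing library.

```python
# Hedged sketch of an evaluation harness (assumes PyTorch and a CUDA GPU).
import time
import torch

def benchmark(model, tokenizer, tasks, generate_answer, is_correct):
    """Run `model` over `tasks` and report accuracy and resource use.
    `generate_answer` and `is_correct` are hypothetical task hooks."""
    correct, latencies = 0, []
    torch.cuda.reset_peak_memory_stats()
    for task in tasks:
        start = time.perf_counter()
        answer = generate_answer(model, tokenizer, task["prompt"])
        latencies.append(time.perf_counter() - start)
        correct += int(is_correct(answer, task["reference"]))
    return {
        "accuracy": correct / len(tasks),
        "avg_latency_s": sum(latencies) / len(latencies),
        "peak_mem_gb": torch.cuda.max_memory_allocated() / 1e9,
    }
```

The same harness can then be rerun on the base and fine-tuned variants of a model, so that quality gains and resource costs are compared on an equal footing.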
Timeline

Expected Contributions
This research is expected to optimize the performance of open-source LLMs for software engineering tasks through fine-tuning with proprietary data. The expected contributions of this research are as follows:
- A guideline for selecting open-source LLMs for specific Software Engineering (SE) tasks.
- Detailed instructions for effective data preprocessing and representation techniques for fine-tuning.
- Efficient use of prompt engineering and fine-tuning techniques.
- Development of task-specific benchmarking frameworks.
- Guidance for the industrial application of LLMs in Software Engineering.
- Contributions to the academic and AI research communities.