Accelerating Functional Test Automation with GPT-Engineer: A QA Game-Changer?

Łukasz Kosiorowski

Updated Nov 8, 2024 • 10 min read

If you want to do software development right, test automation has to be the default for quality and risk reduction in production.

Traditional automation has served us well, but it comes with challenges: time-consuming setup, a need for deep technical expertise, and ongoing maintenance. GPT-Engineer is a new tool that can change how we create functional automated tests.

This post will show how GPT-Engineer can help QA engineers and developers automate functional tests faster.

We’ll go into its strengths, weaknesses and why you should care about this.

Why GPT-Engineer for automated tests?

GPT-Engineer brings a new approach to AI test automation tools. It generates test scripts from minimal input, provides a ready-to-use codebase for your testing needs, and handles tasks across the entire codebase based on user prompts.

At its core, the tool leverages Generative Pre-trained Transformers (GPT) to create functional test scripts based on user input. This capability is particularly appealing for teams looking to streamline the testing process and reduce the time and effort needed to write comprehensive tests. The tool can generate a complete codebase, handle various testing tasks, and learn from interactions to improve its output over time.

  • Streamlined test case creation: GPT-Engineer can analyze natural language input and generate human-like test cases, including edge cases, based on user stories. This eliminates the need for manual test case creation, saving time and reducing the likelihood of human error.
  • Improved test script writing: GPT-Engineer can assist in writing scripts that cover a wide range of scenarios, ensuring comprehensive test coverage. This is particularly useful when dealing with complex software products with multiple features and functionalities.
  • Enhanced collaboration: Because GPT-Engineer works from natural-language prompts, non-technical stakeholders can participate in the testing process, making it more inclusive and efficient. Testers can also review the generated output to identify areas for improvement and refine the prompts, which makes the results more accurate and relevant over time.
  • Real-time test status updates: GPT-Engineer can provide continuous integration and automated testing, offering real-time updates on test execution status, reducing the need for manual intervention and improving overall testing efficiency.
  • Automation of manual tasks: GPT-Engineer can automate manual test case creation and test results generation, freeing up resources for more strategic testing activities.

But how does GPT-Engineer compare to traditional test automation?

Traditional Test Automation vs. GPT-Engineer

GPT-Engineer has several advantages over traditional automated testing tools, chief among them that it removes most of the manual code writing.

GPT-Engineer uses advanced GPT models to code for you, and acts as an AI assistant for software developers. As part of the setup process, you need to create a file called ‘prompt’ in your project directory so the program runs correctly.
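As a minimal sketch of that setup step, the Python snippet below creates a project directory and writes a `prompt` file into it. The `write_prompt` helper, the directory name, and the instruction text are illustrative assumptions, not part of GPT-Engineer itself.

```python
from pathlib import Path

# Hypothetical instructions; adjust them to your application and test cases.
EXAMPLE_INSTRUCTIONS = """\
Create a Playwright test project in TypeScript.
Use the Page Object pattern.
Cover the login flow: valid credentials, invalid password, empty form.
Add npm scripts to run the tests.
"""


def write_prompt(project_dir: str, instructions: str = EXAMPLE_INSTRUCTIONS) -> Path:
    """Create the project directory if needed and write the `prompt` file."""
    root = Path(project_dir)
    root.mkdir(parents=True, exist_ok=True)
    prompt_path = root / "prompt"  # the file has no extension
    prompt_path.write_text(instructions, encoding="utf-8")
    return prompt_path
```

Calling `write_prompt("projects/login-tests")` would leave a `projects/login-tests/prompt` file behind. According to the project's README at the time of writing, the next step is to run `gpte projects/login-tests` from the command line.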

Setting up an API key is a critical step in configuring GPT-Engineer, because the tool needs it to access the underlying model.

The generated code is not flawless, though. GPT-Engineer may produce logical inconsistencies and doesn't fully understand business logic, especially in deeper test flows, so human intervention is still required for fine-tuning.
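Before starting a run, it can help to confirm that both prerequisites are in place. The sketch below assumes the default OpenAI backend, which reads the key from the `OPENAI_API_KEY` environment variable; the `preflight` helper is hypothetical and only reports problems, it does not call GPT-Engineer.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def preflight(project_dir: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of setup problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    problems: List[str] = []

    # The prompt file must exist and contain instructions.
    prompt_path = Path(project_dir) / "prompt"
    if not prompt_path.is_file():
        problems.append(f"missing prompt file: {prompt_path}")
    elif not prompt_path.read_text(encoding="utf-8").strip():
        problems.append("prompt file is empty")

    # Assumption: the default OpenAI backend is used.
    if not env.get("OPENAI_API_KEY", "").strip():
        problems.append("OPENAI_API_KEY is not set")

    return problems
```

Passing `env` explicitly keeps the check easy to test; in everyday use `preflight("projects/login-tests")` falls back to the real environment.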

Traditional Automation            | GPT-Engineer Automation
Time-consuming and tedious        | Rapid code generation
Prone to human error              | Reduced risk of manual mistakes
Requires deep technical expertise | Lower entry barrier, even for beginners
Continuous updates required       | Automatically adapts to changes, but still needs refinement

Real world examples: Testing web apps with GPT-Engineer

We tested GPT-Engineer on various test cases using two frameworks: Playwright for UI testing and Supertest for API testing.

1. UI Testing: Playwright

We initiated our testing by using Playwright, a popular framework for browser automation, to create automated tests for a web application.

Results:

  • Basic UI flows: GPT-Engineer successfully automated the basic login flow and other fundamental interactions, generating a complete project structure that included Page Objects, helper functions, and test scripts. It even incorporated necessary utility items like a .gitignore file and scripts in package.json.
  • Complex user flows: While GPT-Engineer performed well with straightforward flows, it faced challenges when automating more complex user scenarios, such as verifying dynamic content or handling conditional navigation paths.
  • Refactoring needs: Some of the tests failed on the first run due to logical inconsistencies, requiring manual adjustments to align the test cases with specific business logic.

2. API Testing: Supertest

Next, we evaluated GPT-Engineer’s capabilities in API testing using Supertest, which is well-suited for testing REST APIs.

Results:

  • Basic API tests: GPT-Engineer generated structured API test scripts with clear assertions and hooks, demanding minimal manual intervention for adaptation.
  • Complex API scenarios: It effectively handled more complex API workflows, including token-based authentication and chained API calls, demonstrating robust performance in these areas.
  • Adaptation requirements: Despite its strengths, developers still needed to fine-tune the generated code to match specific project requirements, particularly concerning customized authentication flows.

Overall, these are my thoughts:

1. Basic UI Tests: GPT-Engineer handled basic flows, such as logging in, quite well. It explored the app independently and tried to automate flows beyond the login, generating an acceptable project structure that included page objects, helpers (like data generators), and tests. It also added helpful items like a .gitignore file and scripts in package.json to run the tests.
While GPT-Engineer was able to automate the flow and cover all the logic from the test cases, some refactoring is still needed. The tests didn't pass on the first run, and it struggled with automating more complex user flows and deeper business logic.

2. API Testing: The results were even better. GPT-Engineer generated API test scripts with well-structured assertions and hooks, and only minimal manual intervention was needed to adapt the generated code to the project's specific requirements.

3. Pre-Configured Projects: When asked to build tests in existing projects that were already configured with Playwright, GPT-Engineer struggled to integrate properly with the existing codebase. It failed to utilize the pre-configured Playwright components, leading to the creation of redundant or conflicting code.

Instead of leveraging the existing setup and configuration, it introduced its own structure, which caused inconsistencies and required heavy refactoring. This resulted in additional effort to align the generated tests with the established architecture and tools.
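One way to spot this kind of redundancy early is to scan the project for more than one Playwright configuration after a run. The following Python sketch is an assumption-laden helper, not something GPT-Engineer provides: the `find_duplicate_configs` name, the list of config file names, and the skipped directories are all illustrative, and monorepos may legitimately contain several `package.json` files.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Common file names, grouped by what they configure (illustrative, not exhaustive).
CONFIG_KINDS = {
    "playwright.config.ts": "playwright config",
    "playwright.config.js": "playwright config",
    "package.json": "package.json",
}
SKIP_DIRS = {"node_modules", ".git"}


def find_duplicate_configs(root: str) -> Dict[str, List[str]]:
    """Map each config kind to its file paths when more than one exists."""
    root_path = Path(root)
    found: Dict[str, List[str]] = defaultdict(list)
    for path in root_path.rglob("*"):
        rel = path.relative_to(root_path)
        # Ignore installed dependencies and version-control internals.
        if any(part in SKIP_DIRS for part in rel.parts):
            continue
        if path.is_file() and path.name in CONFIG_KINDS:
            found[CONFIG_KINDS[path.name]].append(rel.as_posix())
    return {kind: sorted(paths) for kind, paths in found.items() if len(paths) > 1}
```

An empty result suggests the generated tests reuse the existing setup; any entry points at files worth merging or deleting by hand.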

Each test showed that while GPT-Engineer is a great tool, we still need to find a balance between human oversight and machine efficiency.

If you want to try this out on your own, go to: https://github.com/gpt-engineer-org/gpt-engineer

Troubleshooting Common Issues

Common issues you may encounter include:

  • Inconsistent File Structures: GPT-Engineer sometimes generates files in a non-standard format, necessitating refactoring for consistency.
  • Outdated Libraries: Be vigilant about updating any old libraries or frameworks that GPT-Engineer may utilize.
  • Logical Errors in Test Flows: Pay attention to oversights in the logic of generated tests, especially in more complex scenarios.

The future of automated testing in software development with GPT-Engineer

For senior engineers and QA professionals, integrating tools like GPT-Engineer is exciting. It can generate tests quickly and reduce the manual mistakes that creep into hand-written scripts.

GPT-Engineer can translate project descriptions into a codebase, no manual coding required, and speed up development.

But GPT-Engineer is not a silver bullet. It still requires manual intervention to make sure tests are comprehensive, accurate, and aligned with business goals, and to implement improvements as needed.

To get the most out of GPT-Engineer in software development:

  • Define a test automation strategy. The more details you provide the better.
  • Be prepared to refactor. Don’t expect perfection from the first run: GPT-Engineer can generate good starting points, but you’ll need to polish the results.
  • Stay up to date. GPT-Engineer may use outdated libraries or frameworks, so you need to verify and update dependencies regularly.
  • Set up a virtual environment. Make sure all dependencies are installed and the environment is activated to avoid conflicts and to work smoothly.
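The virtual environment step can be done from the shell with `python -m venv`, or programmatically with Python's standard `venv` module as sketched below. The `create_environment` helper and the `.venv` directory name are illustrative assumptions.

```python
import venv
from pathlib import Path


def create_environment(env_dir: str = ".venv", with_pip: bool = True) -> Path:
    """Create an isolated Python environment and return its directory."""
    # with_pip=True bootstraps pip so packages can be installed into the environment.
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir)
```

After activating the environment (for example `source .venv/bin/activate` on Linux or macOS), GPT-Engineer can be installed into it with `python -m pip install gpt-engineer`.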

Is GPT-Engineer for you?

GPT-Engineer offers a new way to generate automated testing solutions. Instead of writing code from scratch, GPT-Engineer lets you describe your project needs and it will generate the code for you. It’s a great tool for software developers who are familiar with basic programming and testing concepts but want to speed up their workflow.

Not fully hands off yet, but worth it.

The future is here. AI and humans. Now.

Łukasz Kosiorowski

Senior QA Engineer at Netguru