Having worked on mobile infra for many years for a couple of very large iOS teams, I'm excited to learn more. Kudos for putting yourselves out there.
1. Integration tests are notoriously slow, and the demo seemed to take some time to do basic actions. Is it even possible to run these at scale?
2. > Flaky UI tests suck

Tests can be flaky, but it's often due to bad code and architecture. Do you have any data to back up the claim that your tool makes tests less flaky? I could see a scenario where there are two buttons with the same text, but under the hood we'd use different in-code identifiers to determine which button should be tapped.
Overall I'm a bit skeptical, because most UI tests are pretty easy to write today with DSLs that are close to natural language, but I definitely want to follow along and hear more production use cases.
Great questions.
1. Yes, running tests in parallel helps. We also cache actions so subsequent runs are much faster (this is disabled in the demo).
2. I agree that testing can be much more reliable and pleasant in some codebases than others; I have not been blessed with those codebases in my career. The flakiness I describe comes from personal experience automating UI tests and watching them break when a new nondeterministic popup modal is added or another engineer breaks an identifier/locator strategy.
That being said, if you like writing UI tests and your codebase makes them easy to maintain, there are some really cool DSLs like Maestro!
> We also cache actions so subsequent runs are much faster
Interesting. What do you cache? How do you know whether one change needs to be rerun versus another?
>Flakiness is from personal experience automating UI tests specifically and having them break when a new nondeterministic popup modal is added or another engineer breaks an identifier/locator strategy
A modal popping up isn't a flake, though; flakiness is more often when a button is on screen but the test runner can't find it due to run-loop or emulator/simulator issues. If a modal pops up during a test, how does CamelQA resolve this, and how would it know whether it's an actual regression? A modal popping up at the wrong time _could_ be a real regression, versus a developer forgetting to configure some local state.
1. The AI agent writes an automation script (similar to Appium) that we can replay after the first successful run. If there are issues, the AI agent gets pulled back into the loop.
2. You can define acceptance criteria in natural language with camel.
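To make point 1 concrete, here's a minimal sketch of a cache-and-replay loop with an agent fallback. This is purely illustrative: the names (RecordedStep, replay, the agent callback) are hypothetical and not CamelQA's actual API; the idea is just that cached steps replay cheaply until a locator stops resolving, at which point the agent repairs that step.

```python
# Hypothetical sketch of "cache actions, pull the agent back in on failure".
# RecordedStep, replay, and the agent callback are illustrative names only.
from dataclasses import dataclass

@dataclass
class RecordedStep:
    action: str   # e.g. "tap"
    locator: str  # e.g. an accessibility identifier

def replay(steps, screen, resolve_with_agent):
    """Replay cached steps; if a locator no longer resolves on the
    current screen, ask the agent to re-derive that step and update
    the cached script for future runs."""
    for i, step in enumerate(steps):
        if step.locator in screen:
            continue  # cached locator still valid: cheap, deterministic replay
        # Cache miss: the UI changed, so the agent is pulled back into the loop
        steps[i] = resolve_with_agent(step)
    return steps

# Usage: the login button's identifier changed, so only that step is repaired.
screen = {"username_field", "login_button_v2"}  # stand-in for the live UI
cached = [RecordedStep("tap", "username_field"),
          RecordedStep("tap", "login_button")]

def agent(step):
    # Stand-in for the AI agent re-locating the element on the live screen
    return RecordedStep(step.action, "login_button_v2")

repaired = replay(cached, screen, agent)
print([s.locator for s in repaired])  # ['username_field', 'login_button_v2']
```

The point of the sketch: runs where nothing changed never touch the agent, which is where the "subsequent runs are much faster" claim comes from.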
The issue with this approach is that, for all but the simplest apps, it is not possible to deduce the runtime element information needed to write traditional UI tests from the source code alone. It can only be done reliably at runtime, which is what we do: we run your app and iteratively build UI tests that can be reused later.
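A tiny sketch of what "build tests at runtime" means, under stated assumptions: the hierarchy query, identifiers, and helper names here are all hypothetical, not CamelQA's implementation. The recorded locator comes from the live accessibility tree, not from reading source code.

```python
# Illustrative only: locators are discovered by querying the running app's
# UI hierarchy, then recorded into a step that later runs can reuse.

def ui_hierarchy(screen_name):
    # Stand-in for a runtime query (e.g. an accessibility-tree dump).
    # These ids generally can't be derived statically from source.
    return {
        "login": [{"id": "username_field", "label": "Username"},
                  {"id": "login_button_v2", "label": "Log in"}],
    }[screen_name]

def build_step(screen_name, label, action="tap"):
    """Find the element whose visible label matches and record its real id."""
    for element in ui_hierarchy(screen_name):
        if element["label"] == label:
            return {"action": action, "locator": element["id"]}
    raise LookupError(f"no element labeled {label!r} on {screen_name}")

# Build a step against the running app; reuse the recorded locator later.
step = build_step("login", "Log in")
print(step)  # {'action': 'tap', 'locator': 'login_button_v2'}
```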