How to track automated "performance"-type tests over time?
I'm pretty familiar with automated tests where you compare a received value to an expected value (basically all unit/integration tests): in a CI/CD workflow, you handle test failures by failing the whole pipeline, and the failed run then shows up next to that commit/PR/etc.
However, what if I have some kind of "performance" measure I want to track instead? Something that isn't pass/fail, but rather a set of experimental results over time? (e.g. response latency of an API, win/draw/loss rates for a chess bot, confusion-matrix scores for a classifier, etc.) Is there a tool that can show those kinds of "automated experiment" results ordered by git commit, pull request, etc.?
I thought about sending the data to some kind of data store with a Grafana front-end (something like the sketch below), but I was hoping there might be a less "DIY" method for creating such a display.
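For concreteness, here's a minimal sketch of the DIY version I had in mind, assuming an InfluxDB v1 instance behind Grafana: each CI run pushes one data point via the line-protocol write API, tagged with the git SHA so the dashboard can plot the metric over commits. The URL, database name, and metric name are all hypothetical.

```python
# Hypothetical sketch: push one benchmark result per commit to InfluxDB
# (v1 line-protocol write API), tagged with the git SHA so Grafana can
# chart the metric over time/commits. Endpoint and database are made up.
import os

import requests

INFLUX_URL = "http://influxdb.example.com:8086/write"  # assumed endpoint
DATABASE = "ci_metrics"                                # assumed database


def push_metric(measurement: str, value: float) -> None:
    # GITHUB_SHA is set automatically inside GitHub Actions runs.
    commit = os.environ.get("GITHUB_SHA", "local")[:12]
    # Line protocol: <measurement>,<tag>=<value> <field>=<value>
    line = f"{measurement},commit={commit} value={value}"
    resp = requests.post(INFLUX_URL, params={"db": DATABASE}, data=line, timeout=10)
    resp.raise_for_status()


if __name__ == "__main__":
    push_metric("api_p50_latency_ms", 12.3)  # example measurement
```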
I'm pretty interested in this too. I've thought about it in the past, and I think I get stuck where you're asking (the post-processing and visualizing bit).
I'd thought of using GitHub Actions for the measurement, stashing the results as artifacts, and then having another workflow that processes the results (roughly the sketch below). Obviously pretty DIY, so I'm curious whether others have solutions.
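To illustrate the "processing" half: assuming each measurement run uploads a `results/<sha>.json` artifact with fields like `commit`, `timestamp`, and `p50_ms` (all names hypothetical), a later workflow step could download them (e.g. with `actions/download-artifact`) and merge everything into one timeline file for publishing:

```python
# Hypothetical sketch of the processing step: merge per-commit JSON
# artifacts into a single chronological CSV that a later step could
# publish (e.g. to GitHub Pages). File layout and field names are
# assumptions, not an established convention.
import csv
import json
from pathlib import Path

RESULTS_DIR = Path("results")   # where the artifacts were unpacked
OUT_CSV = Path("timeline.csv")


def main() -> None:
    rows = []
    for path in RESULTS_DIR.glob("*.json"):
        with path.open() as f:
            rows.append(json.load(f))
    # Sort by wall-clock time; note this is chronological, not the
    # topological order of the commits themselves.
    rows.sort(key=lambda r: r["timestamp"])
    with OUT_CSV.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["timestamp", "commit", "p50_ms"])
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    main()
```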