Kicking the Tyres on Harbor for Agent Evals

After cobbling together my own eval for Claude, I was interested to discover harbor. It’s described as “A framework for evaluating and optimizing agents and models in container environments.” Which …
