Wed, Jun 10 05:58 PM

#benchmarks

3 headlines

Technology • Hacker News • 4h ago

Coding Agent Memory Benchmarks

Something I’m finding while testing SWE-context-bench for the agent memory layer I’m building: evaluating memory is harder than checking whether the agent solved the next task with fewer tokens. The setup: An agent solves a coding task. Later, it gets a related task that should benefit from the...

Technology • Hacker News • 5h ago

State of AI coding spend: benchmarks from 800 developers and $2.3M of usage

1 point, 0 comments on Hacker News

Technology • Hacker News • 23h ago

Why AI code optimization needs production-grounded benchmarks

1 point, 0 comments on Hacker News