by Ben AI
Learn how to use Claude's new Skills 2.0 with built-in evals and A/B testing to create more reliable, optimized skills and automations. This tutorial shows you how to test, iterate on, and improve your skills through structured evaluation and comparison.
Recognize that Skills 2.0 includes built-in evals (testing) and A/B testing capabilities for automatic performance analysis
Skills 2.0 adds folders containing eval viewer agents, benchmarking scripts, and report generation that automatically test and score skill performance
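A skill folder with evals enabled might be laid out something like the sketch below. The exact file and folder names here are illustrative assumptions, not an official layout; check what Claude actually generates in your own setup:

```
my-skill/
├── SKILL.md           # skill definition: goal, triggers, process
├── references/        # reference files the skill consults
│   └── style-guide.md
└── evals/
    ├── eval-runner.md # eval viewer agent instructions
    ├── benchmark.py   # benchmarking script
    └── reports/       # generated test reports
```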
Build a basic skill using a structured prompt format with clear goals, triggers, connectors, reference files, and step-by-step process
Include progressive updates so the skill can learn from user feedback and automatically update itself
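A minimal skill definition following that structure might look like this sketch. The frontmatter fields and section names are illustrative, not an official schema, and the example skill ("weekly-report") is hypothetical:

```markdown
---
name: weekly-report
description: Drafts the weekly status report from project notes
---

# Goal
Produce a concise weekly status report in the team's house style.

# Triggers
Run when the user asks for a "weekly report" or "status update".

# Connectors
Google Drive (project notes folder).

# Reference files
- references/style-guide.md

# Process
1. Pull this week's notes from the connected folder.
2. Summarize progress, blockers, and next steps.
3. Format the draft per the style guide.

# Progressive updates
When the user corrects a draft, record the correction here so
future runs apply it automatically.
```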
Prompt Claude to run tests on your skill and specify what criteria to optimize for (speed, style, word count, etc.)
Define 1-2 specific optimization goals and testing criteria rather than running generic tests
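Under the hood, a benchmarking script scores each test output against your chosen criteria. A minimal sketch in Python of what such scoring could look like for two concrete goals, word count and style (the function names, weights, and criteria are hypothetical, not part of Skills 2.0):

```python
# Hypothetical scorer: rates one skill output against two specific
# criteria, mirroring the "pick 1-2 optimization goals" advice.

def score_word_count(text: str, target: int, tolerance: int = 50) -> float:
    """Full marks within tolerance of the target; linear falloff outside it."""
    diff = abs(len(text.split()) - target)
    if diff <= tolerance:
        return 1.0
    return max(0.0, 1.0 - diff / (target + tolerance))

def score_style(text: str, banned_phrases: list[str]) -> float:
    """Deduct a quarter point for each banned phrase found in the output."""
    hits = sum(phrase.lower() in text.lower() for phrase in banned_phrases)
    return max(0.0, 1.0 - 0.25 * hits)

def score_output(text: str, target_words: int, banned: list[str]) -> dict:
    """Combine the per-criterion scores into a simple average."""
    scores = {
        "word_count": score_word_count(text, target_words),
        "style": score_style(text, banned),
    }
    scores["overall"] = sum(scores.values()) / len(scores)
    return scores
```

A structured report could then collect these per-variation scores for review.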
Review the structured report showing test variations, performance scores, and outputs to identify areas for improvement
Use the feedback section to provide specific guidance on what to change based on test results
Tell Claude to update the skill based on test results and your feedback, then rerun tests to verify improvements
Repeat this iteration loop 2-3 times to achieve optimal performance
Run A/B tests to compare different versions of your skill, or to test a skill against a non-skill approach
Once you have a functional skill, use A/B tests to optimize for speed, token usage, or output quality
Use A/B tests to determine which reference files or context improve your skill's output quality
Test variations with and without specific reference files to find the optimal balance
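Conceptually, an A/B comparison boils down to scoring runs of each variant and comparing the averages. A small illustrative sketch in Python (the function name and the sample scores are made up for this example):

```python
# Hypothetical A/B comparison: given scored runs for two skill variants
# (e.g. with vs. without a particular reference file), report the winner.

from statistics import mean

def compare_variants(scores_a: list[float], scores_b: list[float]) -> str:
    """Compare average scores of variant A and variant B."""
    avg_a, avg_b = mean(scores_a), mean(scores_b)
    if avg_a == avg_b:
        return f"Tie: both average {avg_a:.2f}"
    winner = "A" if avg_a > avg_b else "B"
    return f"Variant {winner} wins: A={avg_a:.2f}, B={avg_b:.2f}"

# e.g. runs with the style-guide reference vs. runs without it
print(compare_variants([0.9, 0.85, 0.95], [0.7, 0.75, 0.8]))
```

In practice you would let Claude's eval tooling gather the scores; the point is that each variant needs several runs, not one, before you trust the comparison.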