Catalog

32 tools found

Agent Evaluation

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on re...

claude-code

3.0 0

Testing

Skill

AI Product Development

Every product will be AI-powered. The question is whether you'll build it right or ship a demo that falls apart in production. This skill covers LLM integration patterns, RAG architecture, prompt ...

Elite AI context engineering specialist mastering dynamic context management, vector databases, knowledge graphs, and intelligent memory systems. Orchestrates context across multi-agent workflows, enterprise AI systems, and long-running projects with 2024/2025 best practices. Use PROACTIVELY for complex AI orchestration.

Context Optimization Techniques

Apply compaction, masking, and caching strategies

Langfuse

Expert in Langfuse - the open-source LLM observability platform. Covers tracing, prompt management, evaluation, datasets, and integration with LangChain, LlamaIndex, and OpenAI. Essential for debug...

🤖 LLM Application Patterns

Production-ready patterns for building LLM applications. Covers RAG pipelines, agent architectures, prompt IDEs, and LLMOps monitoring. Use when designing AI applications, implementing RAG, buildin...

ML Pipeline Workflow

Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating mod...

Python Performance Optimization

Profile and optimize Python code using cProfile, memory profilers, and performance best practices. Use when debugging slow Python code, optimizing bottlenecks, or improving application performance.

AI Engineer Pro

Autonomously designs and implements production-ready AI systems including RAG pipelines, agent architectures, and MLOps workflows.

Test Results Analyzer

Autonomously analyzes test execution data, generates comprehensive quality metrics, and provides actionable insights for improving test coverage and reliability.

claude-sonnetclaude