About The Role
Hud is looking for an AI Systems Analyst to improve how our AI agents investigate and reason about production incidents.
This role sits at the intersection of analysis, AI behavior, and production systems. You will work on improving the quality, accuracy, and reliability of Hud’s AI workflows by analyzing agent outputs, building evaluation datasets, improving prompts and skills, and validating AI-generated root cause analyses and remediation suggestions.
This is a hands-on role for someone who enjoys going deep into complex technical problems, learning new domains quickly, and systematically improving AI systems through testing, evaluation, and iteration. You will work with real production incidents, real user feedback, and evolving agentic workflows to help Hud’s AI become faster, sharper, and more reliable.
What You’ll Do
- Analyze how Hud’s AI agents investigate production issues and identify weak, incomplete, or incorrect analyses
- Improve the core prompt and skills behind Hud MCP to increase analysis quality, reduce error rates, improve runtime efficiency, and optimize token usage
- Adapt agent behavior to ongoing ecosystem changes, including how the MCP interacts with sub-agents and evolving AI workflows
- Write and refine advanced skills for complex investigations such as duration spikes, CPU spikes, and other production anomalies
- Test agent behavior across models, prompts, and use cases to improve consistency and quality
- Write and optimize prompts for Hud’s auto-remediation workflows
- Deep dive into real production incidents to validate AI-generated root cause analyses and remediation suggestions
- Build and maintain evaluation datasets based on real incidents for both the MCP and auto-remediation workflows
- Tune benchmark scoring to improve stability and make progress measurable over time
- Analyze product usage and user feedback to identify opportunities to improve agent behavior
- Help design and optimize additional agentic workflows such as blast radius analysis and post-deployment analysis
- Optionally create new use cases from scratch by triggering issues in open-source projects or internal environments
Excellence Mindset
- Highly analytical, detail-oriented, and rigorous about quality
- Curious and persistent when investigating complex technical behavior
- Comfortable working in ambiguity and building structure where none exists yet
- Strong sense of ownership and drive to improve systems end-to-end
- Excited by iteration, experimentation, and continuous improvement
- Able to balance technical depth with practical product impact
- Thrives in fast-paced startup environments with evolving priorities
Bonus Points
- Experience with observability, debugging, incident response, or production investigations
- Background in data analysis, data engineering, SRE, AI QA, or similar technical-analytical roles
- Experience evaluating AI systems in a structured way, including benchmarks, datasets, or scoring methodologies
- Familiarity with modern software environments including microservices, databases, APIs, and networking
- Experience in Node.js environments
- Experience designing or testing agentic workflows
- Background in highly analytical operational or intelligence roles, including candidates early in their career with unusually strong technical and investigative capabilities
Hard Skills / Experience
- 3+ years of experience in a technical or analytical role
- Strong analytical and problem-solving skills
- High proficiency in SQL and comfort working with complex data models
- Intermediate proficiency in scripting with Python or JavaScript
- Experience working with LLMs, AI tools, or AI-powered products
- Strong understanding of production systems and common production issues
- Ability to evaluate complex technical outputs and identify weak reasoning, missing context, or incorrect conclusions
- Ability to learn new technical domains quickly and become effective with minimal ramp-up