TL;DR: Coding agents are incredible – majestic and powerful – but in complex enterprise-scale production they can still wreak havoc. Hud’s Runtime Code Sensor streams live production data into their context, so the code they generate runs safely in the real world. Here’s our story.
Hello world
May and Roee here. We’re excited to share what we’ve been building with our incredible team.
We’re software engineers. We love the craft of building software. And we genuinely believe coding agents are the biggest shift to that craft in decades.
Software is complicated
Coding agents are amazing. They were trained on massive bodies of code and there are things they’re really awesome at, especially when it comes to building new things from scratch.
But when it comes to changing or adding things in existing systems that operate at scale – they struggle. And how could they not? They lack half the picture: how the code behaves in production, where issues are sometimes caused by bad code, sometimes by a failing 3rd party, and most often by some frustratingly intricate combination of the two. There’s a gap between coding agents and reality. Software is complicated.
This is where Hud comes in – we are the bridge between coding agents and reality.
We built a Runtime Code Sensor that installs in minutes and runs alongside your code in production, continuously understanding how each function behaves in real life. Most of the time it stays quiet, sending only a lightweight stream of behavioral data. But when something goes wrong – an error, latency spike, or unexpected behavior – it captures deep forensic context straight from production and gives AI everything it needs to understand and fix the issue.
This is not observability as we know it. It’s a whole new paradigm of ubiquitous visibility into how code actually behaves in reality – a new way to understand and improve code in production. It is production intelligence for AI coding agents.
Why today’s tools fall short
A new software stack is emerging. Yesterday’s development and observability tools weren’t designed for AI; the era of agentic code generation calls for a new way of thinking. Just like AI has the code right there to reason over, it needs production data right there as well.
Because real production environments aren’t neat or predictable – they’re a zoo. Third-party dependencies misbehave, queues back up, databases stall, and tiny timing issues cascade into real problems.
How can agents be expected to perform without production awareness – without visibility into how the entire codebase behaves in production, right now, right here, at their artificial fingertips? How can they be expected to write robust code without this data?
Our origin and realization
We started brainstorming Hud in early 2023, immediately after the LLM revolution began. We looked at how software is built and what its future looks like, and got curious about how, despite amazing observability tools, most engineers operate without knowing how their code behaves in production. We believed that for engineers to write high quality code in scalable systems, they must know how the code behaves in production – and so we asked the simple question: why don’t they?
Our realization was that while in theory you could put logs and traces everywhere automatically, this would lead to three problems:
- It would slow production down;
- It would be very expensive;
- Agents aren’t built to consume such mountains of data as they code.
Simply put, the contemporary observability tech stack is not built for ubiquity. And even when you look at the systems that do exist, each one only sees a narrow slice of reality. You have to either tell it what you want and then wait, or have on-demand access that slows production down.
What we built
Part of the challenge comes from the fact that code in production has multiple layers, among them:
- A business layer: e.g. endpoints, queue consumers
- A code layer: the functions themselves
- External dependencies: e.g., databases, IO, other services
Some systems know about the business layer (e.g., APMs), some know about the code layer (e.g., loggers or error trackers), and some about 3rd parties. Sadly, each part’s data on its own is often not just insufficient but misleading, noisy, or a waste of time. To make matters worse, the code that engineers and coding agents see in their IDEs or on GitHub is not the same code that runs in production, especially in dynamic languages. Between transpilation and V8 optimizations, correlating something in production with something in the IDE at scale is pretty difficult.
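A toy illustration of that IDE-vs-production gap (this is not Hud’s mechanism, just a sketch): the name a JavaScript function carries at runtime depends on how it was defined and bundled, so a production stack frame rarely maps cleanly back to source.

```typescript
// In the IDE, this handler is clearly "fetchUser".
const handlers = {
  fetchUser: async (id: number) => ({ id }),
};

// At runtime the engine infers the name from the property key, so here
// .name still reads "fetchUser" -- but a minifying bundler is free to
// rename it to something like "a", after which only source maps can tie
// a production stack frame back to the code engineers see in the IDE.
console.log(handlers.fetchUser.name); // "fetchUser"
```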
We realized that to create the next generation of tools and systems for the future of high quality code, we needed to think from first principles. No logs or spans or traces, but a Runtime Code Sensor that sees everything, does sophisticated analysis at the edge, and has a negligible footprint.
To achieve that, we assembled a team of low-level cybersecurity researchers and engineers, bringing a wealth of experience in operating system and runtime reverse engineering, to build something completely different:
- An SDK sensor that runs with the code and constantly understands its behavior
- The SDK sends mostly aggregated statistical data – a tiny fraction of what logs would send – improving performance, egress, and cost
- When something goes wrong (e.g., an endpoint error or queue slowdown), the sensor gathers all the forensic context needed to understand why the specific incident happened and propose a fix: input parameters, code flow and exception data, 3rd-party parameters such as DB query information, specific pod metrics and details, and more.
- Because all the metric and forensic data is tied automatically to the function level, it can be naturally used by both engineers and AI coding agents in their work.
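To make the contrast with log shipping concrete, here is a minimal sketch of what an aggregated, function-level record might look like. All names and fields here are hypothetical – this is not Hud’s actual data model or wire format, just an illustration of collapsing many calls into one compact record.

```typescript
// Hypothetical shape of a lightweight, per-function behavioral record.
interface FunctionStats {
  functionId: string;   // stable identifier for the function in the codebase
  invocations: number;
  errors: number;
  p95LatencyMs: number;
}

// Collapse raw per-call latency measurements into a single summary record.
function aggregate(
  functionId: string,
  latenciesMs: number[],
  errors: number
): FunctionStats {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const p95Index = Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95));
  return {
    functionId,
    invocations: latenciesMs.length,
    errors,
    p95LatencyMs: sorted[p95Index],
  };
}

// Five calls become one tiny record; per-call logs would ship five entries.
const stats = aggregate("checkout.createOrder", [12, 15, 14, 200, 13], 1);
console.log(stats); // { functionId: "checkout.createOrder", invocations: 5, errors: 1, p95LatencyMs: 200 }
```

Because the record is keyed to a function rather than to a log line, it can be joined directly against the source tree, which is what makes it consumable by an engineer or a coding agent.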
How we went to market
Hud took a while to build. We had our ups and downs in proving that our approach really works. Then, in late Q3 2025, a couple of things clicked together. First, a few capabilities in our sensor reached a level of maturity that delivered truly amazing results, especially around performance and queue forensics. Second, many engineering orgs had a sobering realization following the mass adoption of Cursor: it’s really difficult to adopt agentic coding in complex systems. How do you govern its output? How do you trust it not to break your production?
And so – they started looking for ways to move faster with AI code generation, while maintaining high quality and stability.
We could productize our technology in multiple ways (which we tested together with a few great design partners). The dilemma was whether we should build a product that focuses on fixing things that go wrong, or a product that focuses on building new things with higher levels of confidence. The underlying technology does both either way, but a product needs focus.
The market decided for us.
We got tons of interest in a basic offering:
Detect errors and performance degradations in production, with the context needed to fix them with AI.
The cool thing about Hud is that Hud is all you need to achieve that.
We started selling this new offering, and it looks amazing so far. We are now excited to open the gates and invite more customers in.
Where this is going
“Prediction is very difficult, especially if it is about the future.” – Niels Bohr
We don’t know what the future holds. But there are a couple of things we feel strongly about. First, that AI code generation will be much, much bigger in the future. Second, that a new stack will emerge to power this revolution – and we are excited to play a key role in it. LFG!