Senior Software Engineer, Runtime Internals

About the Role

We’re building a runtime code sensor that operates where most tools don’t: inside running applications. Our goal is to give engineers and AI agents real-time, high-fidelity visibility into how code actually behaves in production – under real load, real traffic, and real failures.

This role blends deep systems engineering with applied research. You’ll dive into runtime internals, explore undocumented behavior, design low-overhead instrumentation, and turn research insights into production-grade components that safely run in customer environments. One day you might analyze GC or JIT behavior; the next, design tracing mechanisms that survive real-world distributed systems.

If you get excited about runtime internals, enjoy breaking (and fixing) complex systems, think like both a researcher and a production engineer, and care deeply about performance, safety, and correctness – you’ll feel right at home here. This is a hands-on, high-impact role for engineers who want to ship technology that engineers actually trust to run in their most critical services.

Hard Skills / Experience

5+ years of hands-on experience in research or development roles.
Deep expertise in at least one runtime (Node.js, Python, or Java/JVM), including an understanding of its internals (event loop, GC, tracing hooks, bytecode/JIT, etc.).
Hands-on experience building in-process production components (SDKs, agents, profilers, monitoring/security tools) that must be safe, stable, and backward-compatible.
Strong performance engineering skills – profiling CPU/memory, avoiding overhead, understanding how instrumentation affects runtime behavior.
Defensive engineering mindset – experience designing systems that fail open, degrade gracefully, protect the host application, and never introduce instability.
Track record of debugging production issues (latency, memory leaks, regressions, deadlocks) in real-world distributed systems.
Solid understanding of modern backend architectures – experience with microservices, distributed systems, async and event-driven patterns, containers/orchestration (Docker/K8s), cloud runtimes, and the performance or reliability challenges they introduce.
Proven ability to ship stable, resilient, maintainable systems in production.

Engineering Excellence / Mindset

Ability to anticipate technical risks, identify bottlenecks, and drive long-term engineering improvements.
Takes ownership of code quality, documentation, reliability, and observability.
Comfortable working with product teams to balance technical trade-offs with user and business needs.
Autonomous and proactive; capable of mentoring others or leading technical initiatives.

Bonus Points

Background in security agents, observability tools, or other components deployed directly into customer environments.
Experience with APM agents, JVM agents, Python tracing, V8 internals, or other instrumentation/profiling frameworks.
Experience with telemetry systems (metrics, tracing, logging), including batching, rate-limiting, and safe data collection.
Familiarity with sampling techniques, bytecode manipulation, eBPF, or low-overhead tracing.
Exposure to safety-critical or high-throughput environments where reliability and minimal overhead are mandatory.
Contributions to open-source instrumentation, tracing, or internals-related projects.

Requirements

This is a full-time, on-site position in Tel Aviv.
Ability to thrive in a dynamic, fast-paced startup environment is essential.

Apply for this position

jobs@hud.io