How ZoomInfo Identified and Eliminated 4am OOM Crashes with AI

Every night at 4am, a scheduled cron job inside one of ZoomInfo’s services saturated the event loop. Event loop utilization (ELU) hit 100% and memory spiked, which led to occasional pod crashes.

The team naturally wanted higher stability. The difficulty wasn’t the ELU spike itself. It was the lack of visibility into what was causing it. The spike occurred consistently, but existing tools produced sparse, scattered data that senior engineers had to spend significant effort piecing together.

ZoomInfo didn’t need another metric. They needed the root cause. Adding Hud to the service revealed exactly what was going on, using function-level visibility and forensic data straight from that 4am slowdown. The team quickly determined that the issue was a massive number of invocations of a dependency lookup, and fixed it with Claude Code.

The Challenge: A Too-Heavy Cron Job With No Clear Cause

The nightly batch job processed dependencies and associations across repositories. The analysis was massive, but that’s just another day at the office.

The situation was clear: something in the code’s behavior was causing ELU overload that led to OOM pod crashes.
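For readers unfamiliar with ELU, the following is a minimal sketch of how event loop utilization can be sampled in Node.js with the built-in `perf_hooks` module. It assumes the service runs on Node.js, which the source does not state explicitly; `blockFor` and `measureElu` are hypothetical helpers written for illustration and are not part of Hud or ZoomInfo's code.

```typescript
import { performance } from "node:perf_hooks";

// Hypothetical stand-in for CPU-bound work: spins synchronously for `ms` milliseconds.
function blockFor(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Busy-wait: nothing else can run on the event loop in the meantime.
  }
}

// Hypothetical helper: returns the fraction of time (0 to 1) the event loop
// was busy while `task` ran, using the delta between two ELU readings.
function measureElu(task: () => void): number {
  const before = performance.eventLoopUtilization();
  task();
  return performance.eventLoopUtilization(before).utilization;
}

// Sample from inside a callback: Node reports zeros until bootstrapping has finished.
setImmediate(() => {
  const utilization = measureElu(() => blockFor(50));
  console.log(`ELU during task: ${(utilization * 100).toFixed(0)}%`);
});
```

A reading near 100% says the loop is saturated, but not which function is responsible, which is the gap described in this section.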

What wasn’t clear was why the job behaved this way: where exactly in the code it was happening, and under what conditions.

Without function-level runtime visibility, identifying the true hot path meant adding logs, redeploying, and iterating.

In short, they sought visibility that matched the sophistication of the system itself.

“Investigating performance spikes can be really frustrating because it’s time consuming to catch the specific underlying issue. Hud is great in that it gets you the forensic data straight from production so you can quickly understand and solve the problem with AI.”

Guy Levin
VP Productivity

Why Hud

Hud automatically surfaced the ELU spike and exposed the runtime behavior of the cron job without requiring instrumentation or configuration.

Within minutes, the hot path became obvious:

  • A surprising volume of synchronous dependency lookups, in the billions, blocking the event loop
  • An unnecessary N+M pattern multiplying invocation volume
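The source does not show the offending code, so the following is only a hypothetical illustration of how nested iteration can multiply synchronous lookup calls. The `Repo` type, `lookupDependency`, `resolveNaively`, and the sample sizes are all assumptions made for the example.

```typescript
// Hypothetical data shape; not taken from ZoomInfo's code.
type Repo = { name: string; dependencies: string[] };

let lookupCalls = 0;

// Stand-in for a synchronous dependency lookup; counts every invocation.
function lookupDependency(registry: Map<string, string>, dep: string): string | undefined {
  lookupCalls++;
  return registry.get(dep);
}

// Looks up every dependency of every repository, even when the same
// dependency has already been resolved for an earlier repository.
function resolveNaively(repos: Repo[], registry: Map<string, string>): number {
  let resolved = 0;
  for (const repo of repos) {
    for (const dep of repo.dependencies) {
      if (lookupDependency(registry, dep) !== undefined) {
        resolved++;
      }
    }
  }
  return resolved;
}

// Sample data: 1000 repositories that all share the same 50 dependencies.
const registry = new Map<string, string>();
const sharedDeps: string[] = [];
for (let i = 0; i < 50; i++) {
  sharedDeps.push(`dep-${i}`);
  registry.set(`dep-${i}`, `1.0.${i}`);
}
const repos: Repo[] = [];
for (let i = 0; i < 1000; i++) {
  repos.push({ name: `repo-${i}`, dependencies: sharedDeps });
}

const resolved = resolveNaively(repos, registry);
console.log(`${lookupCalls} lookups for ${registry.size} unique dependencies`);
```

Because every call is synchronous, the event loop cannot service anything else until the outer loop finishes, so invocation volume translates directly into blocked time.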

Instead of inferring from a multitude of systems, the team could just see the code hot path directly from production data.

The root cause was no longer a question: the function-level production behavior was right in front of them.

From Root Cause to Resolution

With the bottleneck clearly identified, ZoomInfo optimized the dependency resolution logic driving the spike, using Claude to help implement the fix quickly.
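The source does not describe the actual change, so the sketch below shows only one common way to cut redundant lookups: memoizing results so each distinct key is resolved once. The `memoize` helper, `expensiveLookup`, and the sample sizes are hypothetical and are not ZoomInfo's implementation.

```typescript
// Generic memoizer: wraps a single-argument function and caches results by key.
function memoize<K, V>(fn: (key: K) => V): (key: K) => V {
  const cache = new Map<K, V>();
  return (key: K): V => {
    if (cache.has(key)) {
      return cache.get(key) as V;
    }
    const value = fn(key);
    cache.set(key, value);
    return value;
  };
}

let expensiveCalls = 0;

// Hypothetical stand-in for the costly dependency lookup; counts real invocations.
function expensiveLookup(dep: string): string {
  expensiveCalls++;
  return `resolved:${dep}`;
}

const cachedLookup = memoize(expensiveLookup);

// Same request volume as a nested loop over 1000 repositories x 50 dependencies.
let requests = 0;
for (let repo = 0; repo < 1000; repo++) {
  for (let d = 0; d < 50; d++) {
    cachedLookup(`dep-${d}`);
    requests++;
  }
}

console.log(`${requests} requests served by ${expensiveCalls} underlying lookups`);
```

A cache like this trades memory for fewer calls, so in a memory-constrained job it should be scoped to a single run and discarded afterward rather than left to grow without bound.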

The results were immediate and measurable: Dependency lookup invocations dropped by 98%, peak memory fell by 62%, and OOM-related crashes were eliminated.

The ELU spike did not disappear entirely, which is expected for a CPU-intensive batch job, but it was now understood and controlled.

The Results

Operational Stability

The cron job no longer destabilized pods under load.

Performance Efficiency

Hot-path invocations were reduced by 98%, dramatically lowering unnecessary computation.

Engineering Confidence

Root cause analysis shifted from research and adding logs to direct visibility into runtime behavior.

“Before Hud, we had to continuously add logs and investigate the issue over days. With Hud, the root cause was quickly staring at our faces and the resolution with AI was immediate.”

Guy Levin
VP Productivity

Conclusion

For ZoomInfo, the goal wasn’t to eliminate a metric spike. It was to identify why a CPU-intensive cron job was causing OOMs.

With Hud, the root cause became immediately clear. The hot path was optimized, memory pressure dropped, and OOM-related failures were eliminated, all grounded in real production runtime data.

“We knew the crashes were caused by an ELU spike, but the question was what caused that spike. Using Hud, we found the answer in minutes with an AI-ready fix.”

Guy Levin
VP Productivity