Traditional coding assistants generate code. Kepler goes further: it evaluates every change against task-specific criteria before delivery. Here's how we built Evaluation-First Attention (EFA).

When you're building an autonomous development engine, the hardest problem isn't generating code — it's knowing when the code is good enough. Traditional approaches rely on post-hoc testing, but by then the code already exists. We wanted to do better.

The core insight

Every task has success criteria. "Fix the login bug" means the login works. "Add a README" means the README exists and is accurate. Rather than generating code first and checking later, we decided to flip the problem: define what good looks like first, then generate toward it.

How EFA works

Evaluation-First Attention (EFA) has three stages:

Criteria extraction — Parse the task description into measurable success criteria
Scoring — As code is generated, continuously score it against the criteria
Self-correction — If scores plateau below threshold, backtrack and try a different approach
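To make the first two stages concrete, here is a minimal sketch, not Kepler's actual implementation. It assumes criteria can be modeled as named, weighted checks and that the score is the weighted share of checks that pass. The criterion shape, the weights, and `scoreCandidate` are hypothetical names introduced for illustration.

```javascript
// Hypothetical sketch: criteria as weighted predicate checks.
// None of these names come from Kepler; they are illustrative only.

// Each criterion has a name, a weight, and a check over a candidate change.
const criteria = [
  { name: "README exists", weight: 2, check: (c) => "README.md" in c.files },
  {
    name: "README is non-empty",
    weight: 1,
    check: (c) => (c.files["README.md"] || "").trim().length > 0,
  },
  { name: "no files deleted", weight: 1, check: (c) => c.deleted.length === 0 },
];

// Score = weighted fraction of criteria that pass, in [0, 1].
function scoreCandidate(candidate, criteria) {
  const total = criteria.reduce((sum, c) => sum + c.weight, 0);
  if (total === 0) return 0; // no criteria means nothing to score
  const passed = criteria
    .filter((c) => c.check(candidate))
    .reduce((sum, c) => sum + c.weight, 0);
  return passed / total;
}

// An empty README exists (weight 2) and nothing was deleted (weight 1),
// but the non-empty check (weight 1) fails: 3 of 4 weight units pass.
const candidate = { files: { "README.md": "" }, deleted: [] };
console.log(scoreCandidate(candidate, criteria)); // 0.75
```

Weighted checks are only one option. A real scorer could return partial credit per criterion rather than pass/fail.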

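// Simplified EFA loop (runs inside an async function; planner, executor,
// evaluator, criteria, threshold, and max_attempts are assumed to be in scope)
let currentScore = -Infinity;
let bestCode = null;
let attempts = 0;
while (currentScore < threshold && attempts < max_attempts) {
  attempts += 1; // count every attempt so the loop always terminates
  const plan = await planner.create(criteria);
  const code = await executor.generate(plan);
  const newScore = await evaluator.score(code, criteria);
  // Keep only the best-scoring candidate seen so far
  if (newScore > currentScore) {
    currentScore = newScore;
    bestCode = code;
  }
}
return bestCode;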
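
The loop above keeps the best candidate but does not show how a plateau might be detected. One possible approach, sketched here under assumptions, is to compare the best score in a recent window of attempts with the best score before it. `hasPlateaued`, `windowSize`, and `minGain` are hypothetical names and defaults, not part of EFA as described.

```javascript
// Hypothetical plateau check; names and defaults are assumptions.
// Returns true when the best score in the last `windowSize` attempts
// has not beaten the best earlier score by at least `minGain`.
function hasPlateaued(scores, windowSize = 3, minGain = 0.01) {
  if (scores.length <= windowSize) return false; // not enough history yet
  const earlier = scores.slice(0, scores.length - windowSize);
  const recent = scores.slice(-windowSize);
  const bestEarlier = Math.max(...earlier);
  const bestRecent = Math.max(...recent);
  return bestRecent - bestEarlier < minGain;
}

console.log(hasPlateaued([0.2, 0.5, 0.7])); // false: too little history
console.log(hasPlateaued([0.2, 0.5, 0.7, 0.8, 0.9, 0.95])); // false: still improving
console.log(hasPlateaued([0.2, 0.6, 0.6, 0.59, 0.6])); // true: stuck near 0.6
```

A caller could use a `true` result as the signal to discard the current plan and ask the planner for a different approach.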

What we learned

The hardest part isn't the evaluation itself; it's knowing what to evaluate. A task like "fix the login bug" might have hidden requirements: preserve user data, don't break other flows, handle edge cases. We've found that combining the task's explicit criteria with an implicit "don't break existing behavior" criterion is a good starting point.
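
As an illustration of pairing explicit and implicit criteria, the sketch below appends a regression guard to whatever the task defines. It assumes test results are available as lists of passing test names before and after the change. `noRegressions`, `withImplicitCriteria`, and the result shape are hypothetical.

```javascript
// Hypothetical sketch: combine task-specific criteria with an implicit
// regression guard. All names here are illustrative assumptions.

// Implicit criterion: every test that passed before the change still passes.
const noRegressions = {
  name: "existing behavior preserved",
  implicit: true,
  check: (result) =>
    result.passingBefore.every((t) => result.passingAfter.includes(t)),
};

// Append implicit criteria unless the task already defines one of the same name.
function withImplicitCriteria(explicit, implicit = [noRegressions]) {
  const names = new Set(explicit.map((c) => c.name));
  return [...explicit, ...implicit.filter((c) => !names.has(c.name))];
}

const explicit = [
  { name: "login succeeds", check: (r) => r.passingAfter.includes("login") },
];
const all = withImplicitCriteria(explicit);

// A change that fixes login but breaks signup passes the explicit
// criterion and fails the implicit one.
const result = { passingBefore: ["signup"], passingAfter: ["login"] };
for (const c of all) {
  console.log(c.name, c.check(result));
}
```

Here the explicit criterion alone would accept the change. The implicit guard is what flags the broken signup flow.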

The second insight is that scoring needs to be fast. If evaluation takes as long as generation, you've halved your throughput. We invest heavily in lightweight heuristics that catch 90% of issues in milliseconds.
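
One way to keep scoring fast is to let cheap checks gate the expensive one. The sketch below assumes a two-tier design in which a slow evaluator runs only when every lightweight heuristic passes. The three heuristics and the `evaluate` function are hypothetical examples, not the rules Kepler uses.

```javascript
// Hypothetical two-tier evaluator: cheap checks gate the expensive one.
// The heuristics below are illustrative, not Kepler's actual rules.
const cheapHeuristics = [
  { name: "non-empty", test: (code) => code.trim().length > 0 },
  {
    name: "balanced braces",
    test: (code) => {
      let depth = 0;
      for (const ch of code) {
        if (ch === "{") depth++;
        if (ch === "}") depth--;
        if (depth < 0) return false; // closing brace before its opener
      }
      return depth === 0;
    },
  },
  { name: "no leftover TODO", test: (code) => !/\bTODO\b/.test(code) },
];

// `fullEvaluate` stands in for a slow, thorough evaluator.
// It is skipped entirely when any cheap heuristic fails.
function evaluate(code, fullEvaluate) {
  const failed = cheapHeuristics
    .filter((h) => !h.test(code))
    .map((h) => h.name);
  if (failed.length > 0) return { score: 0, failed, ranFull: false };
  return { score: fullEvaluate(code), failed, ranFull: true };
}

let fullCalls = 0;
const slowEvaluator = (code) => {
  fullCalls++;
  return 0.9; // placeholder score for the demo
};

const bad = evaluate("function f() { return 1;", slowEvaluator);
const good = evaluate("function f() { return 1; }", slowEvaluator);
console.log(bad.failed, bad.ranFull); // [ 'balanced braces' ] false
console.log(good.score, fullCalls); // 0.9 1
```

The brace counter is deliberately naive: braces inside strings or comments would fool it. That is the usual trade-off with millisecond-scale heuristics, and the reason a slower evaluator still runs on whatever passes.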

Results

Since launching EFA, we've seen a 40% reduction in "needs revision" PRs and a 2x improvement in first-attempt success for complex multi-file changes. The key insight: explicitly defining success before starting is far more effective than trying to fix failures after the fact.

When you submit a task to Kepler, it doesn't just start typing. There's a carefully orchestrated sequence of analysis, planning, execution, and quality assurance. We call it Apex.

The 12 phases

1. Understand — Parse task, clarify ambiguities
2. Explore — Clone repo, analyze structure
3. Plan — Create detailed specification
4. Implement — Generate code
5. Review — Self-code-review
6. Test — Run existing tests
7. Fix — Address failures
8. QA — Quality checks
9. Final QA — Last sanity check
10. Commit — Create branch
11. PR — Open pull request
12. Notify — Let you know it's done

Each phase can fail, be retried, or be escalated. The key innovation is phase 8 (QA), which runs the same checks a senior engineer would run (lint, type check, security scan, test coverage) before the code ever reaches you.
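
The fail, retry, and escalate behavior can be sketched as a small runner. This is an assumption-laden illustration rather than how Apex is built: `runPipeline`, `maxRetries`, and the toy phase objects are hypothetical, and only the phase names come from the list above.

```javascript
// Hypothetical phase runner: each phase may succeed, be retried, or escalate.
// Phase names follow the list above; everything else is an assumption.
function runPipeline(phases, maxRetries = 2) {
  const log = [];
  for (const phase of phases) {
    let ok = false;
    // One initial attempt plus up to `maxRetries` retries.
    for (let attempt = 1; attempt <= maxRetries + 1 && !ok; attempt++) {
      ok = phase.run(attempt);
      log.push(`${phase.name}: attempt ${attempt} ${ok ? "ok" : "failed"}`);
    }
    if (!ok) {
      // Retries exhausted: stop and hand the task off.
      log.push(`${phase.name}: escalated`);
      return { status: "escalated", failedPhase: phase.name, log };
    }
  }
  return { status: "done", failedPhase: null, log };
}

// Toy phases: "Test" fails once and then passes; "QA" never passes.
const phases = [
  { name: "Implement", run: () => true },
  { name: "Test", run: (attempt) => attempt >= 2 },
  { name: "QA", run: () => false },
  { name: "Commit", run: () => true },
];

const outcome = runPipeline(phases);
console.log(outcome.status, outcome.failedPhase); // escalated QA
console.log(outcome.log.join("\n"));
```

Because the runner stops at the first phase that exhausts its retries, "Commit" never runs in this example, which mirrors the idea that code failing QA should not reach a pull request.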

Building Evaluation-First Attention: how Kepler scores every change

Introducing Apex: the 12-phase autonomous pipeline
