PRISM — Divergent Paths, Unified Solutions

Overview

TL;DR

We present Prism, a multi-solution reasoning and synthesis framework for repository-level issue resolution. Existing agents face two critical bottlenecks: limited solution diversity, caused by LLM mode collapse and misleading self-reflection, and a lack of synthesis capabilities, which prevents them from leveraging complementary strengths across candidate patches. Prism adopts a coarse-to-fine paradigm across three stages (Global Exploration, Local Exploration, and Solution Synthesis) to systematically generate, refine, and integrate diverse repair solutions. With Claude 4.6 Sonnet, Prism achieves 82.0% Pass@1 on SWE-bench Verified, setting a new state of the art. Further analyses show that the synthesis stage uniquely resolves 15 issues that no individual candidate could address.

Figure 1 · Illustration of existing agent limitations using Django Issue #33413 (ForeignKey db_collation inheritance failure). Live-SWE-Agent correctly fixes related.py; LingXi correctly fixes schema.py; neither alone produces a complete patch — only synthesis resolves the issue.

Motivation

Limitations of Existing Work

Limitation 1 — Lack of Solution Diversity & Misleading Self-Reflection

RLHF-fine-tuned models exhibit mode collapse, producing semantically homogeneous solutions even under high-temperature sampling. Simultaneously, erroneous self-reflection feedback causes agents (e.g., Trae, JoyCode) to abandon correct reasoning paths and converge prematurely on incorrect, repetitive patches.

Limitation 2 — Lack of Solution Synthesis Capabilities

Existing frameworks evaluate candidates in isolation via ranking or voting, discarding complementary partial solutions. Complex issues (e.g., Django #33413) require coordinated multi-file modifications — no single agent covers all critical dimensions, yet no mechanism exists to synthesize their complementary strengths.

Methodology

Main Approach

A coarse-to-fine pipeline: Global Exploration → Local Exploration → Solution Synthesis → Patch Generation.

Figure 2 · The workflow of Prism: Global Exploration generates N_G semantically diverse solutions via contrastive constraints; Local Exploration uses A*-inspired branching to produce N_L sub-solutions per global solution; Solution Synthesis employs Reviewer + Judge agents for cross-review and semantic synthesis; Patch Generation selects the final patch via majority voting.

Global Exploration · Contrastive Constraints

Led by a plan agent, this stage iteratively generates N_G semantically diverse fixing solutions. Each solution is decomposed into a four-tuple G = ⟨R, E, M, C⟩: Root Cause, Expected Behavior, Modification Strategy, and Relevant Context. When generating the k-th solution, all prior tuples are injected as contrastive semantic constraints, explicitly requiring divergence across all four dimensions.

P_cons ← enforce (R_k ≠ R_i) ∧ (E_k ≠ E_i) ∧ (M_k ≠ M_i) ∧ (C_k ≠ C_i) // for all G_i ∈ H (historical solution set)
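
To illustrate how prior tuples might be injected as constraints, here is a minimal Python sketch that assembles a contrastive prompt from the historical set H. The `Solution` dataclass, the `build_contrastive_prompt` helper, and the prompt wording are all hypothetical; the source does not specify Prism's actual prompt format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Solution:
    """One global solution as the four-tuple G = <R, E, M, C>."""
    root_cause: str             # R
    expected_behavior: str      # E
    modification_strategy: str  # M
    relevant_context: str       # C


def build_contrastive_prompt(issue: str, history: list[Solution]) -> str:
    """Build a prompt that lists every prior solution as a constraint.

    Hypothetical wording: this only shows the idea of injecting H.
    """
    parts = [f"Issue:\n{issue}"]
    for i, g in enumerate(history, start=1):
        parts.append(
            f"Prior solution {i} (must NOT be repeated):\n"
            f"- Root cause: {g.root_cause}\n"
            f"- Expected behavior: {g.expected_behavior}\n"
            f"- Modification strategy: {g.modification_strategy}\n"
            f"- Relevant context: {g.relevant_context}"
        )
    if history:
        parts.append(
            "Propose a new solution that differs from every prior solution "
            "in root cause, expected behavior, modification strategy, "
            "and relevant context."
        )
    else:
        parts.append(
            "Propose a solution as a four-tuple: root cause, expected "
            "behavior, modification strategy, relevant context."
        )
    return "\n\n".join(parts)
```

In this sketch the k-th call would receive the k − 1 earlier tuples as `history`, so the constraint block grows with each generated solution.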

Local Exploration · A* Branching

Reusing accumulated context, this stage diversifies the modification strategy dimension (M). The execution trajectory T = {s₀, …, s_L} is analyzed to find the optimal branching point — a critical state with sufficient context but no committed strategy.

f(s_i) = g(s_i) + h(s_i)
g(s_i) = 1 − i/L // Context Deficiency Cost
h(s_i) = Rank(s_i)/L // Solution Convergence Cost (LLM-ranked)
s* = argmin f(s_i) // Optimal branching point
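
The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming the LLM-assigned ranks are already available as a list and that a larger rank means the agent has converged further on a strategy; the function name `select_branching_point` and the example ranks are hypothetical.

```python
def select_branching_point(ranks: list[int]) -> int:
    """Return the index i of the state s_i that minimizes f(s_i) = g(s_i) + h(s_i).

    ranks[i] is Rank(s_i) for the trajectory s_0..s_L, so len(ranks) == L + 1.
    Assumption: a larger rank means stronger commitment to one strategy.
    """
    L = len(ranks) - 1
    if L < 1:
        raise ValueError("need at least two states (L >= 1)")

    def f(i: int) -> float:
        g = 1 - i / L     # context deficiency cost: early states lack context
        h = ranks[i] / L  # solution convergence cost: late states are committed
        return g + h

    # Ties are broken in favor of the earliest state.
    return min(range(L + 1), key=f)


# Made-up ranks for a five-state trajectory (L = 4).
ranks = [2, 3, 1, 4, 5]
print(select_branching_point(ranks))  # 2
```

With these ranks the costs are f = 1.5, 1.5, 0.75, 1.25, 1.25, so the sketch branches at s₂: far enough along to have context, yet ranked as the least committed state.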

Solution Synthesis · Multi-Agent

Reviewer agents perform cross-review along two dimensions: Completeness (missing fix locations?) and Regression Risk (unintended side effects?). The Judge agent then performs (1) Defect Verification, (2) Complementarity Analysis across ⟨R, E, M, C⟩, and (3) Semantic Synthesis — rewriting retained strengths into a unified repair plan.
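
As a rough intuition for the complementarity step, the sketch below reduces each candidate to the set of files it modifies and reports what each candidate alone contributes. This is a deliberate simplification under stated assumptions: Prism's Judge reasons semantically over ⟨R, E, M, C⟩ with an LLM, whereas the `Candidate` class and both helper functions here are hypothetical and use plain set operations.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate patch reduced to the fix locations it touches (hypothetical)."""
    name: str
    fix_locations: set[str]


def unique_contributions(candidates: list[Candidate]) -> dict[str, set[str]]:
    """Map each candidate to the locations that no other candidate covers."""
    result = {}
    for c in candidates:
        others = set().union(
            *(o.fix_locations for o in candidates if o is not c)
        )
        result[c.name] = c.fix_locations - others
    return result


def combined_coverage(candidates: list[Candidate]) -> set[str]:
    """Union of all fix locations: what a synthesized plan could cover."""
    return set().union(*(c.fix_locations for c in candidates))


# Mirrors the Django #33413 illustration: each agent fixes a different file.
a = Candidate("Live-SWE-Agent", {"related.py"})
b = Candidate("LingXi", {"schema.py"})
print(unique_contributions([a, b]))
print(sorted(combined_coverage([a, b])))
```

Ranking or voting would pick either `a` or `b`; only the combined coverage includes both files, which is the gap the synthesis stage is meant to close.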

Patch Generation · Majority Voting

All candidates from the three stages are consolidated into a unified candidate pool. Each undergoes static and dynamic verification. A majority voting mechanism selects the single final patch to be submitted.
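
The selection step could look roughly like the following Python sketch. It assumes that votes are grouped by normalized patch text and that candidates failing verification are discarded before voting; the source does not say how Prism groups equivalent patches, so `normalize` and `majority_vote` are hypothetical helpers.

```python
from collections import Counter
from typing import Optional


def normalize(patch: str) -> str:
    """Drop trailing whitespace so trivial differences do not split votes."""
    return "\n".join(line.rstrip() for line in patch.strip().splitlines())


def majority_vote(patches: list[str], passed: list[bool]) -> Optional[str]:
    """Return the most frequent verified patch, or None if none passed.

    passed[i] says whether patches[i] survived static and dynamic verification.
    """
    votes = Counter(normalize(p) for p, ok in zip(patches, passed) if ok)
    if not votes:
        return None
    # On ties, most_common keeps the patch that was counted first.
    return votes.most_common(1)[0][0]
```

Grouping by text is the simplest possible equivalence check; a real pipeline might instead group patches by behavior, for example by which tests they pass.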

Experiments

Main Results

Evaluated on SWE-bench Verified and SWE-bench Live Lite. Metric: Pass@1 (% Resolved).

SWE-bench Verified

Method	Backbone Model	Pass@1 (%)
SWE-Agent	GPT-4o	38.0
OpenHands	Claude-3.5-Sonnet	53.0
Trae	Claude-3.7-Sonnet	68.0
JoyCode	Claude-3.7-Sonnet	70.5
Mini-SWE-Agent	DeepSeek-V3.2-Reasoner	70.0
Prism	DeepSeek-V3.2-Reasoner	80.0
Prism	Claude 4.6 Sonnet	82.0

SWE-bench Live Lite

Method	Backbone Model	Pass@1 (%)
SWE-Agent (SOTA baseline)	Claude-4.5-Sonnet	36.0
Prism	DeepSeek-V3.2-Reasoner	42.3
Prism	Claude 4.6 Sonnet	45.6

★ Prism outperforms all baselines on SWE-bench Verified. Live Lite results pending.

Ablation & Analysis

Analysis

Stage-wise ablation and component analysis using DeepSeek-V3.2-Reasoner on SWE-bench Verified.

RQ1

Overall Performance: How effective is Prism in repository-level issue resolution compared to SOTA baseline agents?

On the SWE-bench Verified benchmark, Prism establishes new SOTA results, achieving 82.0% Pass@1 when powered by Claude 4.6 Sonnet and 80.0% with the open-source DeepSeek-V3.2-Reasoner. On the dynamically updated SWE-bench Live Lite dataset, which is constructed specifically to reduce data leakage, Prism with DeepSeek-V3.2-Reasoner outperforms the leading closed-source SOTA combination (SWE-Agent + Claude-4.5-Sonnet, 36.0%), attaining xx.x%. These findings suggest that Prism’s improved performance in repository-level issue resolution stems from its agentic architecture rather than from memorization or overfitting to benchmark data.

RQ2

Effectiveness of Exploration Mechanism: How do the Global and Local Exploration mechanisms contribute to Prism’s issue resolution rate?

The progressive exploration mechanism of Prism demonstrates a significant advantage over random sampling. With the same candidate solution pool size (N=12), Prism achieves a Pass@1 of 80.0%, compared to only 71.8% achieved by the high-temperature sampling baseline. Evaluating the resolution rates of the global and local exploration stages individually reveals that Prism effectively navigates the solution space to generate high-quality patches, rather than merely relying on an increased number of candidate solutions.

RQ3

Effectiveness of Synthesis Mechanism: Can the Solution Synthesis mechanism effectively combine the advantages of different candidate solutions to improve Prism’s issue resolution rate?

The solution synthesis stage is critical to Prism’s resolution rate. With only three synthesized candidate solutions, Prism achieves a Pass@1 of 79.2%, surpassing both the global exploration stage (74.8% with 3 solutions) and the local exploration stage (75.0% with 6 solutions). The UpSet plot further supports the unique contribution of the synthesis stage, revealing 15 issues that only the synthesized solutions could resolve. Furthermore, retaining all candidate solutions from all three stages (12 in total) raises the theoretical upper bound to Pass@N = 84.6% and, through majority voting, yields a SOTA Pass@1 of 80.0%.
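
To make the difference between the two metrics concrete, the following Python sketch computes both from a per-issue outcome matrix. It assumes Pass@N counts an issue as resolved if any candidate in the pool resolves it, while Pass@1 counts only the single submitted patch; the function names and the toy data are invented for illustration and are not Prism's results.

```python
def pass_at_n(results: list[list[bool]]) -> float:
    """Percent of issues resolved by at least one candidate (upper bound).

    results[i][j] is True if candidate j resolves issue i.
    """
    return 100.0 * sum(any(row) for row in results) / len(results)


def pass_at_1(results: list[list[bool]], selected: list[int]) -> float:
    """Percent of issues resolved by the one candidate selected per issue."""
    hits = sum(row[j] for row, j in zip(results, selected))
    return 100.0 * hits / len(results)


# Toy data: 4 issues, 2 candidates each (made up).
results = [
    [True, False],
    [False, True],
    [False, False],
    [True, True],
]
selected = [0, 0, 0, 1]  # index of the patch chosen by voting for each issue

print(pass_at_n(results))            # 75.0
print(pass_at_1(results, selected))  # 50.0
```

The gap between the two numbers measures selection loss: issues for which a correct candidate existed in the pool but was not the one submitted.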

Summary

Contributions

Effective Issue Resolution Framework

Prism introduces a multi-path reasoning and synthesis framework with three stages — global exploration, local exploration, and solution synthesis — that systematically expands the solution space to produce high-quality patches.

Thorough Evaluation

Evaluated on SWE-bench Verified and SWE-bench Live Lite with both DeepSeek-V3.2-Reasoner and Claude 4.6 Sonnet, achieving SOTA performance against all existing issue resolution agents.

Data Availability

A full reproduction package — including ground-truth datasets, tools, and raw experimental data — is publicly available to support further research and reproducibility.