Prism

A Multi-Solution Reasoning and Synthesis Framework for Repository-Level Issue Resolution

82.0% on SWE-bench Verified (Claude 4.6 Sonnet)
80.0% on SWE-bench Verified (DeepSeek-V3.2-Reasoner)

Overview

TL;DR

We present Prism, a multi-solution reasoning and synthesis framework for repository-level issue resolution. Existing agents face two critical bottlenecks: limited solution diversity caused by LLM mode collapse and misleading self-reflection, and a lack of synthesis capabilities that prevents leveraging complementary strengths across candidate patches. Prism adopts a coarse-to-fine paradigm across three stages — Global Exploration, Local Exploration, and Solution Synthesis — to systematically generate, refine, and integrate diverse repair solutions. With Claude 4.6 Sonnet, Prism achieves 82.0% Pass@1 on SWE-bench Verified, setting a new state-of-the-art. Further analyses confirm that the synthesis stage uniquely resolves 15 issues that no isolated candidate could address alone.

Figure 1 - Motivation Example
Figure 1 · Illustration of existing agent limitations using Django Issue #33413 (ForeignKey db_collation inheritance failure). Live-SWE-Agent correctly fixes related.py; LingXi correctly fixes schema.py; neither alone produces a complete patch — only synthesis resolves the issue.
Motivation

Limitations of Existing Work

Limitation 1 — Lack of Solution Diversity & Misleading Self-Reflection

RLHF-fine-tuned models exhibit mode collapse, producing semantically homogeneous solutions even under high-temperature sampling. Simultaneously, erroneous self-reflection feedback causes agents (e.g., Trae, JoyCode) to abandon correct reasoning paths and converge prematurely on incorrect, repetitive patches.

Limitation 2 — Lack of Solution Synthesis Capabilities

Existing frameworks evaluate candidates in isolation via ranking or voting, discarding complementary partial solutions. Complex issues (e.g., Django #33413) require coordinated multi-file modifications — no single agent covers all critical dimensions, yet no mechanism exists to synthesize their complementary strengths.

Methodology

Main Approach

A coarse-to-fine pipeline: Global Exploration → Local Exploration → Solution Synthesis → Patch Generation.

Figure 2 - Pipeline Architecture
Figure 2 · The workflow of Prism: Global Exploration generates $N_G$ semantically diverse solutions via contrastive constraints; Local Exploration uses A*-inspired branching to produce $N_L$ sub-solutions per global solution; Solution Synthesis employs Reviewer and Judge agents for cross-review and semantic synthesis; Patch Generation selects the final patch via majority voting.
01

Global Exploration · Contrastive Constraints

Led by a plan agent, this stage iteratively generates $N_G$ semantically diverse fixing solutions. Each solution is decomposed into a four-tuple G = ⟨R, E, M, C⟩: Root Cause, Expected Behavior, Modification Strategy, and Relevant Context. When generating the k-th solution, all previously generated tuples are injected as contrastive semantic constraints that explicitly require the new solution to diverge along all four dimensions.

$$P_{\text{cons}} \leftarrow \text{enforce}\big[(R_k \neq R_i) \land (E_k \neq E_i) \land (M_k \neq M_i) \land (C_k \neq C_i)\big] \quad \forall\, G_i \in H$$
where H is the historical solution set.
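The contrastive constraint above can be illustrated with a minimal Python sketch. The names `GlobalSolution`, `build_contrastive_prompt`, and `satisfies_constraints` are hypothetical, the prompt wording is invented for illustration, and exact string inequality stands in for the semantic divergence judgment an LLM-driven plan agent would actually need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalSolution:
    """Hypothetical container for the four-tuple G = <R, E, M, C>."""
    root_cause: str             # R
    expected_behavior: str      # E
    modification_strategy: str  # M
    relevant_context: str       # C


def build_contrastive_prompt(history: list[GlobalSolution]) -> str:
    """Render every prior tuple in H as a constraint the k-th solution must diverge from."""
    if not history:
        return "No prior solutions; propose any well-grounded fix."
    lines = ["The new solution MUST differ from each prior solution in all four dimensions:"]
    for i, g in enumerate(history, start=1):
        lines.append(
            f"[{i}] R: {g.root_cause} | E: {g.expected_behavior} | "
            f"M: {g.modification_strategy} | C: {g.relevant_context}"
        )
    return "\n".join(lines)


def satisfies_constraints(candidate: GlobalSolution, history: list[GlobalSolution]) -> bool:
    """Literal check of (R_k != R_i) and (E_k != E_i) and (M_k != M_i) and (C_k != C_i) for all G_i in H.

    Exact string inequality is only a stand-in for a semantic comparison.
    """
    return all(
        candidate.root_cause != g.root_cause
        and candidate.expected_behavior != g.expected_behavior
        and candidate.modification_strategy != g.modification_strategy
        and candidate.relevant_context != g.relevant_context
        for g in history
    )
```

In this sketch, a candidate that reuses even one dimension of a prior solution (for example, the same root cause) is rejected, mirroring the conjunction over all four dimensions.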
02

Local Exploration · A* Branching

Reusing the accumulated context, this stage diversifies the Modification Strategy dimension (M). The execution trajectory $T = \{s_0, \dots, s_L\}$ is analyzed to find the optimal branching point: a critical state that has gathered sufficient context but has not yet committed to a strategy.

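$$f(s_i) = g(s_i) + h(s_i)$$
$$g(s_i) = 1 - \frac{i}{L} \qquad \text{(context deficiency cost)}$$
$$h(s_i) = \frac{\mathrm{Rank}(s_i)}{L} \qquad \text{(solution convergence cost, LLM-ranked)}$$
$$s^* = \arg\min_{s_i \in T} f(s_i) \qquad \text{(optimal branching point)}$$
The context term g penalizes early states that have gathered little context, while the convergence term h penalizes states the LLM ranks as already committed to a strategy; s* balances the two.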
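A small sketch of the branching-point selection follows, assuming Rank(s_i) is supplied as a list of integers in which a larger value means the LLM judged that state to be more converged on a strategy (a hypothetical convention; the ranking prompt itself is not shown). The helper names are illustrative only.

```python
def branching_costs(ranks: list[int]) -> list[float]:
    """Compute f(s_i) = g(s_i) + h(s_i) for every state of T = {s_0, ..., s_L}.

    ranks[i] is Rank(s_i) (hypothetical convention: larger = more converged),
    so len(ranks) == L + 1.
    """
    L = len(ranks) - 1
    if L <= 0:
        raise ValueError("trajectory needs at least two states")
    return [
        (1 - i / L)       # g: context deficiency cost, high for early states
        + ranks[i] / L    # h: solution convergence cost, high for committed states
        for i in range(L + 1)
    ]


def select_branching_point(ranks: list[int]) -> int:
    """Return the index of s* = argmin f(s_i); ties resolve to the earliest state."""
    costs = branching_costs(ranks)
    return min(range(len(costs)), key=costs.__getitem__)
```

For a five-state trajectory (L = 4) with ranks [3, 1, 0, 2, 4], the costs are [1.75, 1.0, 0.5, 0.75, 1.0], so `select_branching_point` returns 2: s_2 has already covered half of the trajectory's context yet carries the lowest convergence rank.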
03

Solution Synthesis · Multi-Agent

Reviewer agents perform cross-review along two dimensions: Completeness (missing fix locations?) and Regression Risk (unintended side effects?). The Judge agent then performs (1) Defect Verification, (2) Complementarity Analysis across ⟨R, E, M, C⟩, and (3) Semantic Synthesis — rewriting retained strengths into a unified repair plan.
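The Judge's reasoning is LLM-driven, but its complementarity analysis can be approximated at the granularity of fix locations. The sketch below is a hypothetical toy model, not Prism's implementation: each candidate is reduced to the set of locations it edits plus the locations a Reviewer flagged as defective; `coverage_gaps` plays the role of the Completeness review, and `synthesize_plan` keeps every non-defective location from some candidate.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewedCandidate:
    """Hypothetical summary of one candidate after Reviewer cross-review."""
    name: str
    fix_locations: set[str]                          # files the patch changes
    defects: set[str] = field(default_factory=set)   # locations flagged as wrong or risky


def coverage_gaps(candidates: list[ReviewedCandidate]) -> dict[str, set[str]]:
    """Completeness check: locations other candidates fix correctly but this one misses."""
    retained = {c.name: c.fix_locations - c.defects for c in candidates}
    union = set().union(*retained.values())
    return {name: union - locs for name, locs in retained.items()}


def synthesize_plan(candidates: list[ReviewedCandidate]) -> dict[str, str]:
    """Toy Judge: (1) drop flagged edits, (2) pool complementary ones, (3) merge into one plan.

    Maps each retained location to the first candidate that fixes it without a defect.
    """
    plan: dict[str, str] = {}
    for cand in candidates:
        for loc in sorted(cand.fix_locations - cand.defects):
            plan.setdefault(loc, cand.name)
    return plan
```

Applied to the Django #33413 example from Figure 1, where Live-SWE-Agent correctly fixes related.py and LingXi correctly fixes schema.py, `coverage_gaps` reports that each candidate misses the other's file, and `synthesize_plan` returns {"related.py": "Live-SWE-Agent", "schema.py": "LingXi"}. A real Judge would merge repair plans semantically rather than by file name.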

04

Patch Generation · Majority Voting

All candidates from the three stages are consolidated into a unified candidate pool. Each undergoes static and dynamic verification. A majority voting mechanism selects the single final patch to be submitted.
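The selection step can be sketched as follows. The source does not specify how votes are grouped, so this sketch assumes candidates vote for identical patches after a crude whitespace normalization; `verify` is a hypothetical callback standing in for the static and dynamic checks.

```python
from collections import Counter
from typing import Callable, Optional


def normalize_patch(diff: str) -> str:
    """Drop blank lines and trailing whitespace so cosmetic differences do not split votes."""
    return "\n".join(line.rstrip() for line in diff.splitlines() if line.strip())


def majority_vote(pool: list[str], verify: Callable[[str], bool]) -> Optional[str]:
    """Filter the pool through verification, then return a patch from the largest group."""
    verified = [p for p in pool if verify(p)]
    if not verified:
        return None
    # Remember the first original text seen for each normalized patch.
    first_seen: dict[str, str] = {}
    for p in verified:
        first_seen.setdefault(normalize_patch(p), p)
    counts = Counter(normalize_patch(p) for p in verified)
    winner, _ = counts.most_common(1)[0]
    return first_seen[winner]
```

Ties go to the patch group encountered first, because `Counter.most_common` orders equal counts by first insertion. Grouping candidates by test outcomes instead of patch text would be an equally plausible design.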

Experiments

Main Results

Evaluated on SWE-bench Verified and SWE-bench Live Lite. Metric: Pass@1 (% Resolved).

SWE-bench Verified
Method                    | Backbone Model          | Pass@1 (%)
SWE-Agent                 | GPT-4o                  | 38.0
OpenHands                 | Claude-3.5-Sonnet       | 53.0
Trae                      | Claude-3.7-Sonnet       | 68.0
JoyCode                   | Claude-3.7-Sonnet       | 70.5
Mini-SWE-Agent            | DeepSeek-V3.2-Reasoner  | 70.0
Prism                     | DeepSeek-V3.2-Reasoner  | 80.0
Prism                     | Claude 4.6 Sonnet       | 82.0

SWE-bench Live Lite
Method                    | Backbone Model          | Pass@1 (%)
SWE-Agent (SOTA baseline) | Claude-4.5-Sonnet       | 36.0
Prism                     | DeepSeek-V3.2-Reasoner  | 42.3
Prism                     | Claude 4.6 Sonnet       | 45.6
★ Prism outperforms all baselines on SWE-bench Verified. Live Lite results pending.
Ablation & Analysis

Analysis

Stage-wise ablation and component analysis using DeepSeek-V3.2-Reasoner on SWE-bench Verified.

RQ1

Overall Performance: How effective is Prism at repository-level issue resolution compared to SOTA baseline agents?

On SWE-bench Verified, Prism establishes new SOTA results, achieving 82.0% Pass@1 with Claude 4.6 Sonnet and 80.0% with the open-source DeepSeek-V3.2-Reasoner. Critically, on SWE-bench Live Lite, a dynamically updated dataset constructed specifically to reduce data leakage, Prism with DeepSeek-V3.2-Reasoner attains xx.x%, outperforming the leading closed-source SOTA combination (SWE-Agent + Claude-4.5-Sonnet, 36.0%). These findings indicate that Prism’s gains in repository-level issue resolution stem from its agentic architecture rather than from memorization of, or overfitting to, benchmark data.

RQ2

Effectiveness of Exploration Mechanism: How do the Global and Local Exploration mechanisms contribute to Prism’s issue resolution rate?

Prism’s progressive exploration mechanism shows a clear advantage over random sampling. With the same candidate pool size (N = 12), Prism achieves a Pass@1 of 80.0%, versus 71.8% for the high-temperature sampling baseline, a gap of 8.2 percentage points. Evaluating the resolution rates of the global and local exploration stages individually further shows that Prism navigates the solution space effectively to generate high-quality patches, rather than merely relying on a larger number of candidate solutions.

Figure 2 - Prism Pipeline
RQ3

Effectiveness of Synthesis Mechanism: Can the Solution Synthesis mechanism effectively combine the strengths of different candidate solutions to improve Prism’s issue resolution rate?

The solution synthesis stage is critical to Prism’s resolution rate. With only three synthesized candidate solutions, Prism achieves a Pass@1 of 79.2%, surpassing both the global exploration stage (74.8% with 3 solutions) and the local exploration stage (75.0% with 6 solutions). The UpSet plot further confirms the unique contribution of synthesis: 15 issues are resolved only by the synthesized solutions. Finally, retaining all candidate solutions from the three stages (12 in total) raises Prism’s theoretical upper bound to Pass@N = 84.6%, and majority voting over this pool yields a SOTA Pass@1 of 80.0%.
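The metrics in this analysis can be made concrete with a small sketch, assuming per-issue results are recorded as the set of stages whose candidates resolved each issue. Pass@N counts an issue if any candidate resolves it, Pass@1 counts only the submitted patch, and the UpSet intersection for a single stage is the set of issues no other stage resolved. On the 500-issue SWE-bench Verified split, the reported Pass@N of 84.6% corresponds to 423 issues and the Pass@1 of 80.0% to 400, so voting leaves 23 candidate-solvable issues unresolved. The function names and stage labels are hypothetical.

```python
def pass_at_1(submitted_resolved: dict[str, bool]) -> float:
    """Percentage of issues whose single submitted patch resolves the issue."""
    return 100.0 * sum(submitted_resolved.values()) / len(submitted_resolved)


def pass_at_n(resolved_stages: dict[str, set[str]]) -> float:
    """Percentage of issues resolved by at least one candidate (the oracle upper bound)."""
    return 100.0 * sum(1 for stages in resolved_stages.values() if stages) / len(resolved_stages)


def uniquely_resolved(resolved_stages: dict[str, set[str]], stage: str) -> set[str]:
    """Issues resolved by candidates from `stage` and by no other stage."""
    return {issue for issue, stages in resolved_stages.items() if stages == {stage}}
```

For example, with four toy issues where i1 is resolved by global and synthesis candidates, i2 only by synthesis, i3 by nothing, and i4 only by local exploration, Pass@N is 75.0 and the synthesis-only intersection is {"i2"}.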

Figure 2 - Prism Pipeline
Summary

Contributions

01

Effective Issue Resolution Framework

Prism introduces a multi-solution reasoning and synthesis framework with three stages (global exploration, local exploration, and solution synthesis) that systematically expands the solution space to produce high-quality patches.

02

Thorough Evaluation

Prism is evaluated on SWE-bench Verified and SWE-bench Live Lite with both DeepSeek-V3.2-Reasoner and Claude 4.6 Sonnet, achieving SOTA performance against all compared issue resolution agents.

03

Data Availability

A full reproduction package — including ground-truth datasets, tools, and raw experimental data — is publicly available to support further research and reproducibility.