Sunday, April 5, 2026
CORAL's multi-agent evolution framework surges to 36 upvotes as autonomous AI-for-AI research gains momentum; Steerable Visual Representations hits 40 upvotes; Claude-distilled Qwen and Gemma-4 continue model chart domination
Executive Summary
April 5th's landscape is defined by surging momentum on papers that emerged earlier this week, plus notable new arrivals. CORAL (MIT/NUS, 36 upvotes, up from 14 yesterday) has clearly struck a nerve with its framework for autonomous multi-agent evolution on open-ended problems, replacing fixed heuristics with long-running agents that reflect, collaborate, and maintain shared persistent memory. A thematically related paper, ASI-Evolve (17 upvotes), closes the AI-for-AI research loop with a learn-design-experiment-analyze cycle that outperforms GPT-5 baselines. Together they represent the strongest signal yet that self-improving agent systems are moving from theory to implementation.
Steerable Visual Representations (40 upvotes, the day's most-engaged paper) continues its upward trajectory, a sign that the community regards the ability to redirect frozen ViT features toward arbitrary visual concepts as a significant result. "Therefore I am. I Think" (20 upvotes) remains the most intellectually provocative result on the board, with its evidence that reasoning models encode decisions before generating chain-of-thought tokens. The supply chain disruption forecasting paper introduces foresight learning for calibrated probabilistic forecasts that beat GPT-5, an increasingly rare benchmark claim.
The model landscape shows continued consolidation: Google's Gemma-4 family now has multiple GGUF quantizations circulating (Unsloth's 26B-A4B GGUF at 301k downloads), Jackrong's Claude-4.6-Opus-distilled Qwen3.5 variants have crossed 524k and 241k downloads respectively, and the uncensored Qwen3.5-9B variant leads all models at 715k downloads. New entrants include Hcompany's Holo3-35B-A3B (a multimodal agent MoE model) and Facebook's TRIBEv2. GitHub trends show oh-my-codex and openscreen maintaining explosive growth, while Block's Goose agent (935 stars/day) emerges as a significant new player in the extensible AI agent space.
Researcher Notes
CORAL's surge from 14 to 36 upvotes in 24 hours is the most important signal today. The framework's key innovation, replacing hard-coded exploration rules with agents that autonomously evolve strategies through reflection and shared persistent memory, directly addresses the brittleness of current agent frameworks. The paper's emphasis on "open-ended discovery" rather than benchmark optimization suggests a maturation of the agent research community's ambitions. Taken together with ASI-Evolve's AI-for-AI research loops (now at 17 upvotes), the engagement points to growing community interest in self-improving agent systems as the next frontier.
The "Therefore I am. I Think" paper deserves continued attention for its methodological implications. The finding that linear probes decode tool-calling decisions from pre-generation activations with high confidence, before a single reasoning token is produced, has profound consequences for CoT-based alignment. If the chain-of-thought is post-hoc rationalization, then monitoring CoT for safety may be fundamentally insufficient. The paper's 20 upvotes (steady from yesterday) suggest the community is still processing these implications.
Steerable Visual Representations at 40 upvotes is now the highest-engaged paper of the week. The practical implications are significant: retrieval, classification, and segmentation systems can now be dynamically redirected without retraining. This is especially relevant as multimodal LLMs continue to lose spatial fidelity when processing visual inputs through language. A purely visual steering mechanism that preserves spatial information fills a genuine gap.
The trending models reveal a maturing distillation ecosystem. Jackrong's Claude-4.6-Opus distillations have crossed half a million downloads, while the uncensored variant leads at 715k. This isn't a novelty effect; it's a sustained production-grade adoption pattern. Hcompany's Holo3-35B-A3B is interesting as a multimodal agent-focused MoE model, suggesting that the agent paradigm is starting to influence model architecture design, not just prompting strategies. Netflix's VOID model gaining 310 likes with no recorded downloads demonstrates that applied video AI from major tech companies generates its own gravity.
GitHub trends tell a story of AI tooling maturation. oh-my-codex (15.7k stars, 1,789/day) and Block's Goose (35.7k stars, 935/day) represent two approaches to extensible AI agents: one extends an existing coding agent, the other builds from scratch with multi-LLM support. The continued growth of onyx (24.3k stars, 1,197/day) as an open-source AI chat platform suggests that self-hosted AI infrastructure is becoming a serious category. The emergence of imbue-ai/mngr as a CLI for managing agents is a small but telling signal — agent orchestration is becoming a first-class developer concern.
Themes & Trends
Autonomous Multi-Agent Evolution and Self-Improving AI
rising · CORAL and ASI-Evolve together represent the strongest signal yet that self-improving agent systems are moving from theory to implementation. CORAL's surge from 14 to 36 upvotes confirms this resonates deeply with the research community.
Pre-Decision Encoding and Representation Control
rising · Evidence that LLMs encode decisions before CoT (Therefore I Am) and that frozen ViTs can be steered without retraining (Steerable Representations) both suggest current systems have more controllable, and more opaque, internal structure than assumed.
Open-Weight Distillation at Scale
rising · Claude-distilled Qwen variants crossing 500k downloads, uncensored models at 715k, and multiple GGUF quantizations circulating demonstrate that frontier reasoning distillation is now a production-grade phenomenon, not an experiment.
AI Agent Developer Tooling Ecosystem
rising · oh-my-codex, Block's Goose, imbue-ai/mngr, and Microsoft's agent-framework collectively show that AI agent tooling is stratifying: extensions for existing agents, new standalone agents, management CLIs, and enterprise orchestration frameworks.
Adversarial Robustness in Physical AI
stable · Tex3D's 3D adversarial textures for VLA models represent a genuinely new attack surface for embodied AI. As robotics adoption grows, physical adversarial attacks become increasingly practical concerns.
Trending Papers (13)
Steerable Visual Representations
High Relevance · Jona Ruthardt, Manu Gaur, Deva Ramanan, Makarand Tapaswi, Yuki M. Asano — Fundamental AI Lab at UTN, Carnegie Mellon University
Introduces a mechanism to steer pretrained frozen Vision Transformer features toward specific visual concepts (color, texture, shape) without retraining, addressing the limitation that generic ViT features focus on salient cues with no user control over representation focus.
Key Findings
- Frozen ViT features can be steered toward arbitrary visual concepts without retraining or fine-tuning
- Steered representations outperform both generic ViT and text-prompted multimodal LLM representations on concept-specific tasks
- The approach preserves spatial visual information that language-centric multimodal representations lose
CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery
High Relevance · Ao Qu, Han Zheng, Zijian Zhou, Yihao Yan, Paul Pu Liang et al. — MIT, National University of Singapore, Carnegie Mellon University
First framework for autonomous multi-agent evolution on open-ended problems, replacing rigid control with long-running agents that explore, reflect, and collaborate through shared persistent memory, asynchronous execution, and heartbeat-based interventions.
Key Findings
- Autonomous agents outperform fixed-heuristic baselines on sustained open-ended exploration tasks
- Shared persistent memory and asynchronous execution enable emergent collaboration without central coordination
- Heartbeat-based interventions provide lightweight oversight without constraining agent autonomy
NearID: Identity Representation Learning via Near-identity Distractors
High Relevance · Aleksandar Cvejic, Rameen Abdal, Abdelrahman Eldesokey, Bernard Ghanem, Peter Wonka — KAUST Center of Excellence in Generative AI
Introduces a principled framework for evaluating identity-focused tasks using Near-identity distractors that place semantically similar but distinct instances on identical backgrounds, eliminating contextual shortcuts and isolating identity as the sole discriminative signal.
Key Findings
- Existing vision encoders conflate identity with background context in identity-focused tasks
- Near-identity distractors eliminate contextual shortcuts and isolate genuine identity representation
- The framework enables more reliable evaluation of personalized generation and image editing
Therefore I am. I Think
High Relevance · Esakkivel Esakkiraja, Sai Rajeswar, Denis Akhiyarov, Rajagopal Venkatesaramani — ServiceNow AI
Presents evidence that reasoning models encode tool-calling decisions in pre-generation activations before chain-of-thought begins. Linear probes decode these decisions with high confidence, suggesting CoT may serve as post-hoc rationalization rather than genuine deliberation.
Key Findings
- Linear probes decode tool-calling decisions from pre-generation activations with very high confidence
- In some cases decisions are fully encoded before a single reasoning token is produced
- Chain-of-thought may function as post-hoc rationalization rather than causal reasoning
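To make the probing idea concrete, here is a minimal sketch of a linear probe, assuming synthetic vectors in place of real model activations. Everything in it is hypothetical: the `make_synthetic_activations` helper, the 16-dimensional vectors, and the planted decision direction are illustrative stand-ins, not the paper's models, data, or training setup. It only shows what it means for a binary decision (call a tool or not) to be linearly decodable from a vector.

```python
import math
import random

def make_synthetic_activations(n, dim, seed):
    """Hypothetical stand-in for pre-generation hidden states.

    Label 1 means 'call a tool', 0 means 'do not'. The label shifts each
    vector along a fixed direction, so the decision is linearly encoded.
    """
    rng = random.Random(seed)
    direction = [1.0 if i % 2 == 0 else -1.0 for i in range(dim)]
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        sign = 1.0 if label == 1 else -1.0
        x = [rng.gauss(0.0, 1.0) + sign * 0.8 * d for d in direction]
        data.append((x, label))
    return data

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_linear_probe(data, dim, epochs=50, lr=0.1):
    """Fit logistic regression (a linear probe) with plain SGD."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def probe_accuracy(w, b, data):
    """Fraction of examples where the probe's sign matches the label."""
    correct = 0
    for x, y in data:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        correct += int(pred == y)
    return correct / len(data)

DIM = 16
train = make_synthetic_activations(400, DIM, seed=0)
held_out = make_synthetic_activations(200, DIM, seed=1)
w, b = train_linear_probe(train, DIM)
accuracy = probe_accuracy(w, b, held_out)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

High held-out accuracy here says only that the planted signal is linearly readable. With real activations, a probe result of this kind shows the information is present, not that the model uses it causally.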
ASI-Evolve: AI Accelerates AI
High Relevance · Weixian Xu, Tiantian Mi, Yixiu Liu, Yang Nan, Pengfei Liu et al. — Shanghai Jiao Tong University
An agentic framework for AI-for-AI research that closes the research loop through a learn-design-experiment-analyze cycle, substantially outperforming GPT-5 baselines on accuracy, calibration, and precision for forecasting tasks.
Key Findings
- End-to-end agentic research cycle automates costly, long-horizon AI research loops
- Task-specific adaptation through learn-design-experiment-analyze outperforms general-purpose models
- Framework substantially outperforms GPT-5 on accuracy, calibration, and precision
AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation
Zhaohe Liao, Kaixun Jiang, Zhihang Liu, Yujie Wei, Xihui Liu et al. — University of Hong Kong, Alibaba Group
First systematic benchmark for evaluating whether state-of-the-art image generation models can produce ready-to-use academic illustrations, addressing the gap between general image quality and the visual-logical consistency required for scientific figures.
Key Findings
- Current image generation models struggle with visual-logical consistency required for academic illustrations
- VLM-based evaluation is unreliable for complex academic figures with long text descriptions
- A structured evaluation framework reveals systematic failure modes in scientific figure generation
Video Models Reason Early: Exploiting Plan Commitment for Maze Solving
High Relevance · Kaleb Newman, Tyler Zhu, Olga Russakovsky — Princeton University
Reveals that video diffusion models commit to a high-level motion plan within the first few denoising steps when solving mazes, after which further denoising alters visual details but not the underlying trajectory — a form of early plan commitment.
Key Findings
- Video diffusion models commit to a high-level trajectory plan in the earliest denoising steps
- Later denoising steps refine visual appearance without changing the committed plan
- Early plan commitment can be exploited to improve maze-solving performance
Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models
High Relevance · Jiawei Chen, Simin Huang, Jiawei Du, Shuaihang Chen, Zhaoxia Yin et al. — Anhui University, Chinese Academy of Sciences, National University of Singapore
Demonstrates physically realizable adversarial attacks on vision-language-action models through adversarial 3D textures applied to manipulated objects — a more practical attack surface than prior 2D patch methods for real-world robotic deployments.
Key Findings
- 3D adversarial textures on manipulated objects transfer effectively to physical robotic settings
- VLA models are vulnerable to attacks embedded in the objects they interact with
- The 3D attack surface is more physically realistic than prior 2D patch-based approaches
MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios
Zhang Li, Zhibo Lin, Qiang Liu, Yuliang Liu et al. — Huazhong University of Science and Technology
First benchmark for multilingual digital and photographed document parsing, addressing the gap where performant models focus on clean English documents while real-world scenarios involve diverse scripts, low-resource languages, and photographed documents.
Key Findings
- No systematic benchmark existed for multilingual digital and photographed document parsing
- Models performant on clean English documents degrade significantly on diverse scripts and low-resource languages
- Photographed documents introduce additional challenges beyond digital document parsing
Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial
Zhongwei Yu, Rasul Tutunov, Alexandre Max Maraval, Jun Wang et al. — University College London, Huawei Noah's Ark Lab
Comprehensive tutorial presenting Bayesian Optimization as a principled probability-driven framework that formalizes and automates the scientific hypothesis-experiment-refine cycle, aiming to replace ad-hoc experimental design with efficient, systematic optimization.
Key Findings
- BO formalizes the traditional scientific cycle into a principled probability-driven framework
- The tutorial bridges the gap between BO theory and practical scientific applications
- Demonstrates resource savings through systematic experimental design versus intuition-driven approaches
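The hypothesis-experiment-refine cycle described above can be sketched as a small Bayesian Optimization loop. This is a minimal illustration under stated assumptions, not the tutorial's own code: it uses a Gaussian-process surrogate with an RBF kernel and an upper-confidence-bound acquisition rule (one common BO recipe among several), and the `experiment` function is a hypothetical stand-in for an expensive lab measurement.

```python
import math

def rbf(a, b, length=0.2):
    """RBF kernel: similarity between two 1-D experimental conditions."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Zero-mean GP posterior mean and standard deviation at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)
    v = solve(K, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))

def ucb(xs, ys, x, kappa=2.0):
    """Acquisition: favor points with a high predicted value or high uncertainty."""
    mean, std = gp_posterior(xs, ys, x)
    return mean + kappa * std

def experiment(x):
    """Hypothetical expensive experiment (illustrative objective only)."""
    return -(x - 0.7) ** 2

grid = [i / 50 for i in range(51)]   # candidate conditions in [0, 1]
xs = [0.0, 1.0]                      # two initial experiments
ys = [experiment(x) for x in xs]

for _ in range(8):                   # budget of 8 further experiments
    candidates = [x for x in grid if x not in xs]
    x_next = max(candidates, key=lambda x: ucb(xs, ys, x))  # hypothesis
    xs.append(x_next)
    ys.append(experiment(x_next))                           # experiment, then refine

best_y = max(ys)
best_x = xs[ys.index(best_y)]
print(f"best condition found: x={best_x:.2f}, value={best_y:.4f}")
```

The `kappa` parameter sets the trade-off between exploring uncertain conditions and exploiting promising ones. The value 2.0 is an arbitrary choice for this sketch.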
Forecasting Supply Chain Disruptions with Foresight Learning
Benjamin Turtel, Paul Wilczewski, Kris Skotheim — Resilinc
An end-to-end framework that trains LLMs to produce calibrated probabilistic forecasts of supply chain disruptions using realized disruption outcomes as supervision, substantially outperforming GPT-5 on accuracy, calibration, and precision.
Key Findings
- Task-specific LLM fine-tuning with disruption supervision outperforms general-purpose models including GPT-5
- Calibrated probabilistic forecasts enable actionable supply chain risk management
- Foresight learning addresses the challenge of reasoning about infrequent, high-impact events from noisy inputs
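Since the summary turns on calibration, a short sketch of how probabilistic forecasts are commonly scored may help. It assumes the Brier score and a binned reliability table, which are standard tools but not necessarily the metrics this paper reports. The forecasts and outcomes below are made-up illustrative values.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; 0.0 is a perfect forecast.
    """
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def reliability_table(probs, outcomes, n_bins=5):
    """Group forecasts into probability bins and compare each bin's mean
    forecast with the observed event frequency.

    Returns (mean_forecast, observed_frequency, count) for non-empty bins.
    A well-calibrated forecaster has mean_forecast close to
    observed_frequency in every bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, o))
    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            freq = sum(o for _, o in members) / len(members)
            rows.append((mean_p, freq, len(members)))
    return rows

# Illustrative data: 1 = a disruption occurred, 0 = it did not.
outcomes = [1, 1, 0, 0]
hedged = [0.9, 0.8, 0.2, 0.1]          # confident but not absolute
overconfident = [1.0, 1.0, 0.0, 1.0]   # certain everywhere, wrong once

hedged_score = brier_score(hedged, outcomes)
overconfident_score = brier_score(overconfident, outcomes)
print(f"hedged: {hedged_score:.3f}, overconfident: {overconfident_score:.3f}")

for mean_p, freq, count in reliability_table(hedged, outcomes):
    print(f"forecast ~{mean_p:.2f} -> observed {freq:.2f} (n={count})")
```

The single confident miss dominates the overconfident forecaster's score, which is why calibration matters for infrequent, high-impact events.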
Signals: Trajectory Sampling and Triage for Agentic Interactions
Shuguang Chen, Adil Hafeez, Salman Paracha — Amazon
Proposes a lightweight, signal-based framework for triaging agentic interaction trajectories at scale, addressing the challenge that agent trajectories are voluminous, non-deterministic, and prohibitively expensive to review individually.
Key Findings
- Agent trajectory review at scale requires lightweight signal-based triage rather than exhaustive review
- Signal-based framework enables efficient identification of anomalous or interesting trajectories
- The framework is practical for post-deployment improvement of multi-step agentic systems
Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents
Nicholas Edwards, Sebastian Schuster — Saarland University
Systematically evaluates clarification-seeking abilities of LLM coding agents on underspecified tasks, finding that current agents optimized for autonomous execution rarely ask clarifying questions when human developers naturally would.
Key Findings
- Current coding agents rarely seek clarification even when instructions are critically underspecified
- Agents optimized for autonomous execution miss crucial context that humans would ask about
- Uncertainty-aware clarification improves task completion on underspecified SWE-bench variants
Trending Models (12)
Jackrong (Community) · image-text-to-text · 27B
Community-distilled 27B model transferring Claude 4.6 Opus reasoning capabilities into the Qwen3.5 architecture. At 524k downloads and 2,291 likes, it represents the most successful open-weight reasoning distillation to date.
HauhauCS (Community) · text-generation · 9B
Uncensored 9B Qwen3.5 variant leading all models with 715k downloads and 968 likes. The download count indicates strong demand for unrestricted open-weight models.
Baidu · image-text-to-text · unknown
Baidu's vision-language OCR model based on InternVL architecture for document intelligence. 957 likes and growing downloads indicate strong demand for specialized OCR capabilities.
Google · image-text-to-text · 31B
Google's flagship 31B dense instruction-tuned model from the Gemma-4 family with multimodal image-text-to-text capabilities. Downloads climbing to 287k as the ecosystem matures.
Cohere Labs · automatic-speech-recognition · unknown
Cohere's automatic speech recognition model. 790 likes and 96k downloads signal sustained interest as the audio modality gains attention from major labs.
Mistral AI · text-to-speech · 4B
Mistral's 4B-parameter text-to-speech model. 661 likes on 5k downloads suggests strong community interest outpacing actual deployment, possibly awaiting tooling integration.
Jackrong (Community) · image-text-to-text · 27B (quantized)
GGUF quantization of the Claude-distilled Qwen3.5-27B for llama.cpp deployment. 241k downloads and 502 likes demonstrate strong demand for locally-runnable reasoning models.
Prism ML · text-generation · 8B (1-bit)
1-bit quantized 8B model in GGUF format for extreme edge deployment. 384 likes and 32k downloads reflect growing interest in ultra-efficient inference.
Google · image-text-to-text · 26B (4B active)
Google's 26B MoE model with only 4B active parameters, offering dense-model quality at a fraction of compute cost. Now at 133k downloads, growing steadily.
Netflix · video-inpainting · unknown
Netflix's video inpainting model for physics-aware object removal, the model behind the VOID paper. 310 likes with 0 downloads suggests a gated or upcoming release generating anticipatory engagement.
Hcompany · image-text-to-text · 35B (3B active)
New multimodal agent-focused MoE model with 35B parameters and 3B active. Architecture based on Qwen3.5-MoE suggests agent-specific model design is emerging as a distinct category.
Meta/Facebook · unknown · unknown
Facebook's latest research model release. 293 likes and 39k downloads with limited public documentation — Meta continues to release models with minimal fanfare.
Trending GitHub Repos (11)
Extension framework for OpenAI Codex CLI adding hooks, agent teams, HUDs, and more. Sustained explosive growth at 1,789 stars/day (15.7k total) — now the dominant AI coding agent customization platform.
Free, open-source screen recording studio with no subscriptions or watermarks. Continued explosive growth at 1,591 stars/day, now 20k total. The developer demo creation category is real.
Open-source AI chat platform supporting every LLM with advanced features. 1,197 stars/day (24.3k total) confirms self-hosted AI chat infrastructure is becoming a serious category.
OSINT tool for hunting social media accounts by username. Perennial trending repo at 994 stars/day and 79k total stars.
Open-source extensible AI agent from Block that goes beyond code suggestions with install, execute, edit, and test capabilities across any LLM. 935 stars/day at 35.7k total represents a serious Codex alternative.
MLX-based Vision Language Model inference and fine-tuning for Apple Silicon. 343 stars/day (3.6k total) shows Apple ecosystem AI tooling continues to grow.
Simple and fast retrieval-augmented generation framework (EMNLP2025). 263 stars/day at 32k total, showing sustained production interest in RAG tooling.
Telegram Desktop messaging app. 249 stars/day at 30.8k total — likely trending due to a major release or policy-related attention.
Microsoft's framework for building, orchestrating and deploying AI agents and multi-agent workflows in Python and .NET. 72 stars/day at 8.7k total.
CLI for managing AI agents from Imbue. Small but telling signal that agent orchestration is becoming a first-class developer concern.
Apple's official MLX LLM inference package. 28 stars/day at 4.4k total — steady growth as the Apple Silicon ML ecosystem matures.