Sunday, April 5, 2026

CORAL's multi-agent evolution framework surges to 36 upvotes as autonomous AI-for-AI research gains momentum; Steerable Visual Representations hits 40 upvotes; Claude-distilled Qwen and Gemma-4 continue model chart domination

autonomous-multi-agent-evolution · representation-steering · llm-pre-decision-encoding · open-weight-distillation-scaling · ai-agent-developer-tooling · supply-chain-ai-forecasting

Executive Summary

April 5th's landscape is defined by surging momentum on papers that emerged earlier this week, plus notable new arrivals. CORAL (MIT/NUS, 36 upvotes — up from 14 yesterday) has clearly struck a nerve with its framework for autonomous multi-agent evolution on open-ended problems, replacing fixed heuristics with long-running agents that reflect, collaborate, and maintain shared persistent memory. Its companion paper ASI-Evolve (17 upvotes) closes the AI-for-AI research loop with a learn-design-experiment-analyze cycle that outperforms GPT-5 baselines. Together they represent the strongest signal yet that self-improving agent systems are moving from theory to implementation.

Steerable Visual Representations (40 upvotes, the day's most-engaged paper) continues its upward trajectory, a sign of strong community interest in redirecting frozen ViT features toward arbitrary visual concepts. "Therefore I am. I Think" (20 upvotes) remains the most intellectually provocative result on the board, with its evidence that reasoning models encode decisions before generating chain-of-thought tokens. The supply chain disruption forecasting paper introduces foresight learning for calibrated probabilistic forecasts that beat GPT-5, an increasingly rare benchmark claim.

The model landscape shows continued consolidation: Google's Gemma-4 family now has multiple GGUF quantizations circulating (Unsloth's 26B-A4B GGUF at 301k downloads), Jackrong's Claude-4.6-Opus-distilled Qwen3.5-27B and its v2 GGUF quantization have crossed 524k and 241k downloads respectively, and the uncensored Qwen3.5-9B variant leads all models at 715k downloads. New entrants include Hcompany's Holo3-35B-A3B (a multimodal agent MoE model) and Facebook's TRIBEv2. GitHub trends show oh-my-codex and openscreen maintaining explosive growth, while Block's Goose agent (935 stars/day) emerges as a significant new player in the extensible AI agent space.

Researcher Notes

CORAL's surge from 14 to 36 upvotes in 24 hours is the most important signal today. The framework's key innovation, replacing hard-coded exploration rules with agents that autonomously evolve strategies through reflection and shared persistent memory, directly addresses the brittleness of current agent frameworks. The paper's emphasis on "open-ended discovery" rather than benchmark optimization suggests a maturation of the agent research community's ambitions. Combined with ASI-Evolve's AI-for-AI research loops (now at 17 upvotes), this points to growing community interest in self-improving agent systems as the next frontier.

The "Therefore I am. I Think" paper deserves continued attention for its methodological implications. If linear probes can decode tool-calling decisions from pre-generation activations with high confidence, before a single reasoning token is produced, the consequences for CoT-based alignment are profound. If the chain-of-thought is post-hoc rationalization, then monitoring CoT for safety may be fundamentally insufficient. The paper's 20 upvotes (steady from yesterday) suggest the community is still processing these implications.

Steerable Visual Representations at 40 upvotes is now the highest-engaged paper of the week. The practical implications are significant: retrieval, classification, and segmentation systems can now be dynamically redirected without retraining. This is especially relevant as multimodal LLMs continue to lose spatial fidelity when processing visual inputs through language. A purely visual steering mechanism that preserves spatial information fills a genuine gap.

The trending models reveal a maturing distillation ecosystem. Jackrong's Claude-4.6-Opus distillations have crossed half a million downloads, while the uncensored Qwen3.5-9B variant leads at 715k. Download counts at this scale point to sustained adoption rather than a novelty effect. Hcompany's Holo3-35B-A3B is interesting as a multimodal agent-focused MoE model, suggesting that the agent paradigm is starting to influence model architecture design, not just prompting strategies. Netflix's VOID model gaining 310 likes with 0 recorded downloads demonstrates that applied video AI from major tech companies generates its own gravity.

GitHub trends tell a story of AI tooling maturation. oh-my-codex (15.7k stars, 1,789/day) and Block's Goose (35.7k stars, 935/day) represent two approaches to extensible AI agents: one extends an existing coding agent, the other builds from scratch with multi-LLM support. The continued growth of onyx (24.3k stars, 1,197/day) as an open-source AI chat platform suggests that self-hosted AI infrastructure is becoming a serious category. The emergence of imbue-ai/mngr as a CLI for managing agents is a small but telling signal — agent orchestration is becoming a first-class developer concern.
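
Raw stars/day favors repositories that are already large, so it can help to normalize by the star count before today's gain. The sketch below does that arithmetic for the three repositories named above. The totals are the rounded figures quoted in this digest, so the percentages are approximate, and `relative_growth` is a helper introduced here for illustration.

```python
# Rounded figures quoted in this digest: (total stars, stars gained today).
repos = {
    "oh-my-codex": (15_700, 1_789),
    "goose": (35_700, 935),
    "onyx": (24_300, 1_197),
}

def relative_growth(total, gained_today):
    """Today's gain as a fraction of the star count before today."""
    return gained_today / (total - gained_today)

growth = {name: relative_growth(*stats) for name, stats in repos.items()}
ranking = sorted(growth, key=growth.get, reverse=True)

for name in ranking:
    print(f"{name}: {growth[name]:.1%} daily growth")
```

On these numbers oh-my-codex grows fastest relative to its size, followed by onyx and then Goose, even though Goose has the largest total.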

Themes & Trends

↑

Autonomous Multi-Agent Evolution and Self-Improving AI

rising

CORAL and ASI-Evolve together represent the strongest signal yet that self-improving agent systems are moving from theory to implementation. CORAL's surge from 14 to 36 upvotes confirms this resonates deeply with the research community.

↑

Pre-Decision Encoding and Representation Control

rising

Evidence that LLMs encode decisions before CoT (Therefore I Am) and that frozen ViTs can be steered without retraining (Steerable Representations) both suggest current systems have more controllable — and more opaque — internal structure than assumed.

↑

Open-Weight Distillation at Scale

rising

Claude-distilled Qwen variants crossing 500k+ downloads, uncensored models at 715k, and multiple GGUF quantizations circulating demonstrate that frontier reasoning distillation is now a production-grade phenomenon, not an experiment.

↑

AI Agent Developer Tooling Ecosystem

rising

oh-my-codex, Block's Goose, imbue-ai/mngr, and Microsoft's agent-framework collectively show that AI agent tooling is stratifying: extensions for existing agents, new standalone agents, management CLIs, and enterprise orchestration frameworks.

→

Adversarial Robustness in Physical AI

stable

Tex3D's 3D adversarial textures for VLA models represent a genuinely new attack surface for embodied AI. As robotics adoption grows, physical adversarial attacks become increasingly practical concerns.

Trending Papers (13)

Steerable Visual Representations

High Relevance

Jona Ruthardt, Manu Gaur, Deva Ramanan, Makarand Tapaswi, Yuki M. Asano — Fundamental AI Lab at UTN, Carnegie Mellon University

Introduces a mechanism to steer pretrained frozen Vision Transformer features toward specific visual concepts (color, texture, shape) without retraining, addressing the limitation that generic ViT features focus on salient cues with no user control over representation focus.

Key Findings

• Frozen ViT features can be steered toward arbitrary visual concepts without retraining or fine-tuning
• Steered representations outperform both generic ViT and text-prompted multimodal LLM representations on concept-specific tasks
• The approach preserves spatial visual information that language-centric multimodal representations lose

vision-transformers · representation-learning · steering · DINOv2 · retrieval

40 upvotes

CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery

High Relevance

Ao Qu, Han Zheng, Zijian Zhou, Yihao Yan, Paul Pu Liang et al. — MIT, National University of Singapore, Carnegie Mellon University

First framework for autonomous multi-agent evolution on open-ended problems, replacing rigid control with long-running agents that explore, reflect, and collaborate through shared persistent memory, asynchronous execution, and heartbeat-based interventions.

Key Findings

• Autonomous agents outperform fixed-heuristic baselines on sustained open-ended exploration tasks
• Shared persistent memory and asynchronous execution enable emergent collaboration without central coordination
• Heartbeat-based interventions provide lightweight oversight without constraining agent autonomy

multi-agent · open-ended-learning · autonomous-evolution · persistent-memory · LLM-agents

36 upvotes
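
To make the three ingredients named above concrete (shared memory, asynchronous execution, heartbeat checks), here is a toy sketch using Python's `asyncio`. It is not CORAL's implementation: `SharedMemory`, `agent`, and `heartbeat` are hypothetical names, the memory is an in-process list standing in for a persistent store, and the "reflection" step is reduced to building on the best score seen so far.

```python
import asyncio

class SharedMemory:
    """In-process stand-in for a persistent store that all agents read and write."""
    def __init__(self):
        self.notes = []       # (agent_name, step, score) tuples
        self.last_beat = {}   # agent_name -> last step at which it reported

    def write(self, agent_name, step, score):
        self.notes.append((agent_name, step, score))
        self.last_beat[agent_name] = step

    def best(self):
        return max(self.notes, key=lambda note: note[2], default=None)

async def agent(name, memory, steps, stride):
    """Hypothetical agent: reads the shared best result and builds on it."""
    for step in range(steps):
        best = memory.best()
        base = best[2] if best else 0
        memory.write(name, step, base + stride)
        await asyncio.sleep(0)  # yield control so agents interleave

async def heartbeat(memory, names, rounds):
    """Lightweight overseer: reports agents that have not written anything yet."""
    silent = set(names)
    for _ in range(rounds):
        await asyncio.sleep(0)
        silent = {n for n in names if n not in memory.last_beat}
    return silent

async def run():
    memory = SharedMemory()
    names = ["a", "b", "c"]
    agents = [agent(n, memory, steps=3, stride=i + 1) for i, n in enumerate(names)]
    results = await asyncio.gather(*agents, heartbeat(memory, names, rounds=5))
    return memory, results[-1]

memory, silent_agents = asyncio.run(run())
print(len(memory.notes), "notes; best:", memory.best(), "; silent:", silent_agents)
```

No coordinator assigns work here: agents only meet through the shared store, and the heartbeat observes without blocking them. A real system would presumably replace the silent-agent report with an actual intervention.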

NearID: Identity Representation Learning via Near-identity Distractors

High Relevance

Aleksandar Cvejic, Rameen Abdal, Abdelrahman Eldesokey, Bernard Ghanem, Peter Wonka — KAUST Center of Excellence in Generative AI

Introduces a principled framework for evaluating identity-focused tasks using Near-identity distractors that place semantically similar but distinct instances on identical backgrounds, eliminating contextual shortcuts and isolating identity as the sole discriminative signal.

Key Findings

• Existing vision encoders conflate identity with background context in identity-focused tasks
• Near-identity distractors eliminate contextual shortcuts and isolate genuine identity representation
• The framework enables more reliable evaluation of personalized generation and image editing

identity-representation · personalization · evaluation · vision-encoders

26 upvotes

Therefore I am. I Think

High Relevance

Esakkivel Esakkiraja, Sai Rajeswar, Denis Akhiyarov, Rajagopal Venkatesaramani — ServiceNow AI

Presents evidence that reasoning models encode tool-calling decisions in pre-generation activations before chain-of-thought begins. Linear probes decode these decisions with high confidence, suggesting CoT may serve as post-hoc rationalization rather than genuine deliberation.

Key Findings

• Linear probes decode tool-calling decisions from pre-generation activations with very high confidence
• In some cases decisions are fully encoded before a single reasoning token is produced
• Chain-of-thought may function as post-hoc rationalization rather than causal reasoning

reasoning · chain-of-thought · interpretability · mechanistic-analysis · LLM-internals

20 upvotes
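
For readers unfamiliar with linear probing, the sketch below shows the general technique on synthetic data: a logistic-regression classifier is trained to read a binary decision off a fixed vector. The "activations" are random numbers with a label-dependent shift along one direction, an assumption made purely for illustration. Nothing here uses the paper's models, layers, or data.

```python
import math
import random

random.seed(0)
DIM = 8  # size of the synthetic "activation" vector (illustrative)

def synthetic_activation(label):
    """Gaussian noise plus a label-dependent shift along dimension 0."""
    shift = 1.5 if label == 1 else -1.5
    return [random.gauss(0, 1) + (shift if i == 0 else 0.0) for i in range(DIM)]

def make_dataset(n):
    labels = [random.randint(0, 1) for _ in range(n)]
    return [synthetic_activation(y) for y in labels], labels

def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_probe(xs, ys, epochs=200, lr=0.1):
    """Full-batch gradient descent on the logistic loss; the probe is linear."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * DIM, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(logit(w, b, x)) - y
            for i in range(DIM):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b

def accuracy(w, b, xs, ys):
    preds = [1 if logit(w, b, x) > 0 else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

train_x, train_y = make_dataset(400)
test_x, test_y = make_dataset(200)
w, b = train_probe(train_x, train_y)
probe_accuracy = accuracy(w, b, test_x, test_y)
print(f"held-out probe accuracy: {probe_accuracy:.2f}")
```

High held-out accuracy from so simple a classifier is what licenses the claim that the information is linearly present in the representation. It does not by itself show that the model uses that information causally.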

ASI-Evolve: AI Accelerates AI

High Relevance

Weixian Xu, Tiantian Mi, Yixiu Liu, Yang Nan, Pengfei Liu et al. — Shanghai Jiao Tong University

An agentic framework for AI-for-AI research that closes the research loop through a learn-design-experiment-analyze cycle, substantially outperforming GPT-5 baselines on accuracy, calibration, and precision for forecasting tasks.

Key Findings

• End-to-end agentic research cycle automates costly, long-horizon AI research loops
• Task-specific adaptation through learn-design-experiment-analyze outperforms general-purpose models
• Framework substantially outperforms GPT-5 on accuracy, calibration, and precision

AI-for-AI · agentic-research · self-improving-systems · automation

17 upvotes

AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation

Zhaohe Liao, Kaixun Jiang, Zhihang Liu, Yujie Wei, Xihui Liu et al. — University of Hong Kong, Alibaba Group

First systematic benchmark for evaluating whether state-of-the-art image generation models can produce ready-to-use academic illustrations, addressing the gap between general image quality and the visual-logical consistency required for scientific figures.

Key Findings

• Current image generation models struggle with visual-logical consistency required for academic illustrations
• VLM-based evaluation is unreliable for complex academic figures with long text descriptions
• A structured evaluation framework reveals systematic failure modes in scientific figure generation

benchmark · academic-illustrations · image-generation · evaluation · scientific-figures

9 upvotes

Video Models Reason Early: Exploiting Plan Commitment for Maze Solving

High Relevance

Kaleb Newman, Tyler Zhu, Olga Russakovsky — Princeton University

Reveals that video diffusion models commit to a high-level motion plan within the first few denoising steps when solving mazes, after which further denoising alters visual details but not the underlying trajectory — a form of early plan commitment.

Key Findings

• Video diffusion models commit to a high-level trajectory plan in the earliest denoising steps
• Later denoising steps refine visual appearance without changing the committed plan
• Early plan commitment can be exploited to improve maze-solving performance

video-diffusion · planning · denoising-dynamics · maze-solving · emergent-reasoning

8 upvotes

Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

High Relevance

Jiawei Chen, Simin Huang, Jiawei Du, Shuaihang Chen, Zhaoxia Yin et al. — Anhui University, Chinese Academy of Sciences, National University of Singapore

Demonstrates physically realizable adversarial attacks on vision-language-action models through adversarial 3D textures applied to manipulated objects — a more practical attack surface than prior 2D patch methods for real-world robotic deployments.

Key Findings

• 3D adversarial textures on manipulated objects transfer effectively to physical robotic settings
• VLA models are vulnerable to attacks embedded in the objects they interact with
• The 3D attack surface is more physically realistic than prior 2D patch-based approaches

adversarial-attacks · VLA-models · robotics · 3D-textures · safety

8 upvotes

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

Zhang Li, Zhibo Lin, Qiang Liu, Yuliang Liu et al. — Huazhong University of Science and Technology

First benchmark for multilingual digital and photographed document parsing, addressing the gap where performant models focus on clean English documents while real-world scenarios involve diverse scripts, low-resource languages, and photographed documents.

Key Findings

• No systematic benchmark existed for multilingual digital and photographed document parsing
• Models performant on clean English documents degrade significantly on diverse scripts and low-resource languages
• Photographed documents introduce additional challenges beyond digital document parsing

document-parsing · multilingual · benchmark · OCR · low-resource

7 upvotes

Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial

Zhongwei Yu, Rasul Tutunov, Alexandre Max Maraval, Jun Wang et al. — University College London, Huawei Noah's Ark Lab

Comprehensive tutorial presenting Bayesian Optimization as a principled probability-driven framework that formalizes and automates the scientific hypothesis-experiment-refine cycle, aiming to replace ad-hoc experimental design with efficient, systematic optimization.

Key Findings

• BO formalizes the traditional scientific cycle into a principled probability-driven framework
• The tutorial bridges the gap between BO theory and practical scientific applications
• Demonstrates resource savings through systematic experimental design versus intuition-driven approaches

bayesian-optimization · scientific-discovery · tutorial · experimental-design

6 upvotes
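
As a small illustration of how Bayesian Optimization turns "which experiment next?" into a calculation, the sketch below implements expected improvement, one widely used acquisition function, for a maximization problem. The candidate means and standard deviations are made-up posterior summaries, not values from the tutorial, and whether the tutorial emphasizes this particular acquisition function is not stated in this digest.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best_so_far):
    """Expected improvement over best_so_far for a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(mu - best_so_far, 0.0)  # no uncertainty left at this point
    z = (mu - best_so_far) / sigma
    return (mu - best_so_far) * normal_cdf(z) + sigma * normal_pdf(z)

# Hypothetical (posterior mean, posterior std) for three candidate experiments.
candidates = {"A": (0.80, 0.01), "B": (0.75, 0.20), "C": (0.60, 0.05)}
best_so_far = 0.78

scores = {
    name: expected_improvement(mu, sigma, best_so_far)
    for name, (mu, sigma) in candidates.items()
}
next_experiment = max(scores, key=scores.get)
print(scores, "->", next_experiment)
```

Candidate B is selected even though its mean is below the current best, because its large uncertainty leaves more room for improvement than A's small, nearly certain gain. That trade-off between exploring and exploiting is the part intuition-driven experiment selection tends to get wrong.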

Forecasting Supply Chain Disruptions with Foresight Learning

Benjamin Turtel, Paul Wilczewski, Kris Skotheim — Resilinc

An end-to-end framework that trains LLMs to produce calibrated probabilistic forecasts of supply chain disruptions using realized disruption outcomes as supervision, substantially outperforming GPT-5 on accuracy, calibration, and precision.

Key Findings

• Task-specific LLM fine-tuning with disruption supervision outperforms general-purpose models including GPT-5
• Calibrated probabilistic forecasts enable actionable supply chain risk management
• Foresight learning addresses the challenge of reasoning about infrequent, high-impact events from noisy inputs

supply-chain · forecasting · LLM-finetuning · probabilistic-prediction · enterprise-AI

5 upvotes
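
"Calibrated" has a precise meaning that is easy to check once outcomes are realized. The sketch below computes a Brier score and a simple binned calibration table for a handful of made-up forecasts. These are standard evaluation tools, not necessarily the exact metrics the paper reports, and the numbers are invented for illustration.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def calibration_table(probs, outcomes, n_bins=5):
    """Per non-empty bin: (mean forecast, observed frequency, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, o))
    table = []
    for items in bins:
        if items:
            mean_p = sum(p for p, _ in items) / len(items)
            freq = sum(o for _, o in items) / len(items)
            table.append((round(mean_p, 3), round(freq, 3), len(items)))
    return table

# Made-up forecasts of disruption and realized outcomes (1 = disruption occurred).
forecasts = [0.1, 0.1, 0.2, 0.8, 0.9, 0.3, 0.7, 0.05]
outcomes = [0, 0, 0, 1, 1, 1, 1, 0]

score = brier_score(forecasts, outcomes)
baseline = brier_score([0.5] * len(outcomes), outcomes)  # always-50% forecaster
table = calibration_table(forecasts, outcomes)
print(f"Brier: {score:.4f} vs. baseline {baseline:.4f}")
print(table)
```

A well-calibrated forecaster has mean forecast close to observed frequency in every bin. A lower Brier score than the uninformative baseline shows the forecasts carry real information.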

Signals: Trajectory Sampling and Triage for Agentic Interactions

Shuguang Chen, Adil Hafeez, Salman Paracha — Amazon

Proposes a lightweight, signal-based framework for triaging agentic interaction trajectories at scale, addressing the challenge that agent trajectories are voluminous, non-deterministic, and prohibitively expensive to review individually.

Key Findings

• Agent trajectory review at scale requires lightweight signal-based triage rather than exhaustive review
• Signal-based framework enables efficient identification of anomalous or interesting trajectories
• The framework is practical for post-deployment improvement of multi-step agentic systems

agents · trajectory-analysis · monitoring · triage · observability

2 upvotes
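
The idea of triaging by cheap signals instead of reviewing every trajectory can be sketched as follows. The specific signals (error count, back-to-back repeated tool calls, length) and their weights are assumptions chosen for illustration. The digest does not list the signals the paper actually uses.

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    id: str
    steps: list = field(default_factory=list)  # each step: {"tool": str, "error": bool}

def signals(traj):
    """Cheap, hypothetical signals computed without any model call."""
    tools = [s["tool"] for s in traj.steps]
    repeats = sum(1 for a, b in zip(tools, tools[1:]) if a == b)
    errors = sum(1 for s in traj.steps if s["error"])
    return {"length": len(tools), "repeats": repeats, "errors": errors}

def triage_score(sig):
    # Illustrative weights: errors and loops matter more than raw length.
    return 3 * sig["errors"] + 2 * sig["repeats"] + 0.1 * sig["length"]

def triage(trajectories, top_k):
    """Return the ids of the top_k trajectories most worth a human look."""
    ranked = sorted(trajectories, key=lambda t: triage_score(signals(t)), reverse=True)
    return [t.id for t in ranked[:top_k]]

def step(tool, error=False):
    return {"tool": tool, "error": error}

trajectories = [
    Trajectory("t1", [step("search"), step("read"), step("answer")]),
    Trajectory("t2", [step("search"), step("search"), step("search", True), step("search")]),
    Trajectory("t3", [step("read"), step("answer")]),
]
flagged = triage(trajectories, top_k=1)
print(flagged)
```

Only the flagged subset would then go to expensive review (human or LLM judge), which is what makes the approach workable at volume.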

Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents

Nicholas Edwards, Sebastian Schuster — Saarland University

Systematically evaluates clarification-seeking abilities of LLM coding agents on underspecified tasks, finding that current agents optimized for autonomous execution rarely ask clarifying questions when human developers naturally would.

Key Findings

• Current coding agents rarely seek clarification even when instructions are critically underspecified
• Agents optimized for autonomous execution miss crucial context that humans would ask about
• Uncertainty-aware clarification improves task completion on underspecified SWE-bench variants

coding-agents · clarification · uncertainty · SWE-bench · human-AI-interaction

4 upvotes

Trending Models (12)

Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled

Jackrong (Community) · image-text-to-text · 27B

Community-distilled 27B model transferring Claude 4.6 Opus reasoning capabilities into Qwen3.5 architecture. At 524k downloads (second only to the uncensored Qwen3.5-9B variant) and a list-leading 2,291 likes, it represents the most successful open-weight reasoning distillation to date.

distillation · reasoning · chain-of-thought · multimodal

524.2K downloads · 2.3K likes

Qwen3.5-9B-Uncensored-HauhauCS-Aggressive

HauhauCS (Community) · text-generation · 9B

Uncensored 9B Qwen3.5 variant leading all models with 715k downloads and 968 likes. The aggressive uncensoring approach indicates strong demand for unrestricted open-weight models.

uncensored · multilingual · qwen3.5

715.6K downloads · 968 likes

Qianfan-OCR

Baidu · image-text-to-text · unknown

Baidu's vision-language OCR model based on InternVL architecture for document intelligence. 957 likes and growing downloads indicate strong demand for specialized OCR capabilities.

OCR · vision-language · internvl · document-understanding

36.6K downloads · 957 likes

Gemma-4-31B-it

Google · image-text-to-text · 31B

Google's flagship 31B dense instruction-tuned model from the Gemma-4 family with multimodal image-text-to-text capabilities. Downloads climbing to 287k as the ecosystem matures.

gemma4 · multimodal · instruction-tuned

287.4K downloads · 853 likes

Cohere Transcribe 03-2026

Cohere Labs · automatic-speech-recognition · unknown

Cohere's automatic speech recognition model. 790 likes and 96k downloads signal sustained interest as the audio modality gains attention from major labs.

ASR · speech · audio · transcription

96.6K downloads · 790 likes

Voxtral-4B-TTS-2603

Mistral AI · text-to-speech · 4B

Mistral's 4B-parameter text-to-speech model. 661 likes on 5k downloads suggests strong community interest outpacing actual deployment, possibly awaiting tooling integration.

TTS · speech-synthesis · multilingual

5.1K downloads · 661 likes

Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF

Jackrong (Community) · image-text-to-text · 27B (quantized)

GGUF quantization of the Claude-distilled Qwen3.5-27B for llama.cpp deployment. 241k downloads and 502 likes demonstrate strong demand for locally-runnable reasoning models.

gguf · distillation · reasoning · local-inference

241.1K downloads · 502 likes

Bonsai-8B-gguf

Prism ML · text-generation · 8B (1-bit)

1-bit quantized 8B model in GGUF format for extreme edge deployment. 384 likes and 32k downloads reflect growing interest in ultra-efficient inference.

1-bit · quantization · edge-deployment · gguf

32.9K downloads · 384 likes

Gemma-4-26B-A4B-it

Google · image-text-to-text · 26B (4B active)

Google's 26B MoE model with only 4B active parameters, offering dense-model quality at a fraction of compute cost. Now at 133k downloads, growing steadily.

gemma4 · MoE · efficient-inference · multimodal

133.2K downloads · 355 likes

void-model

Netflix · video-inpainting · unknown

Netflix's video inpainting model for physics-aware object removal, the model behind the VOID paper. 310 likes with 0 downloads suggests gated or upcoming release generating anticipatory engagement.

video-inpainting · object-removal · diffusion · CogVideoX

0 downloads · 310 likes

Holo3-35B-A3B

Hcompany · image-text-to-text · 35B (3B active)

New multimodal agent-focused MoE model with 35B parameters and 3B active. Architecture based on Qwen3.5-MoE suggests agent-specific model design is emerging as a distinct category.

MoE · multimodal · agent · qwen3.5-moe

1.3K downloads · 215 likes

TRIBEv2

Meta/Facebook · unknown · unknown

Facebook's latest research model release. 293 likes and 39k downloads with limited public documentation — Meta continues to release models with minimal fanfare.

meta · research

39.7K downloads · 293 likes

Trending GitHub Repos (11)

Yeachan-Heo/oh-my-codex

High Relevance · GitHub

Extension framework for OpenAI Codex CLI adding hooks, agent teams, HUDs, and more. Sustained explosive growth at 1,789 stars/day (15.7k total) — now the dominant AI coding agent customization platform.

AI-coding · codex · developer-tools · extensions

TypeScript · 15.7K stars · +1.8K today · 1.5K

siddharthvaddem/openscreen

GitHub

Free, open-source screen recording studio with no subscriptions or watermarks. Continued explosive growth at 1,591 stars/day, now 20k total. The developer demo creation category is real.

developer-tools · screen-recording · open-source

TypeScript · 20.1K stars · +1.6K today · 1.4K

onyx-dot-app/onyx

High Relevance · GitHub

Open-source AI chat platform supporting every LLM with advanced features. 1,197 stars/day (24.3k total) confirms self-hosted AI chat infrastructure is becoming a serious category.

AI-chat · self-hosted · LLM-platform · open-source

Python · 24.3K stars · +1.2K today · 3.3K

sherlock-project/sherlock

GitHub

OSINT tool for hunting social media accounts by username. Perennial trending repo at 994 stars/day and 79k total stars.

OSINT · security · social-media

Python · 79.4K stars · +994 today · 9.3K

block/goose

High Relevance · GitHub

Open-source extensible AI agent from Block that goes beyond code suggestions with install, execute, edit, and test capabilities across any LLM. 935 stars/day at 35.7k total represents a serious Codex alternative.

AI-agent · coding · extensible · multi-LLM

Rust · 35.7K stars · +935 today · 3.3K

Blaizzy/mlx-vlm

High Relevance · GitHub

MLX-based Vision Language Model inference and fine-tuning for Apple Silicon. 343 stars/day (3.6k total) shows Apple ecosystem AI tooling continues to grow.

MLX · VLM · Apple-Silicon · fine-tuning

Python · 3.6K stars · +343 today · 397

HKUDS/LightRAG

High Relevance · GitHub

Simple and fast retrieval-augmented generation framework (EMNLP2025). 263 stars/day at 32k total, showing sustained production interest in RAG tooling.

RAG · retrieval · LLM · EMNLP

Python · 32.1K stars · +263 today · 4.6K

telegramdesktop/tdesktop

GitHub

Telegram Desktop messaging app. 249 stars/day at 30.8k total — likely trending due to a major release or policy-related attention.

messaging · desktop-app · open-source

C++ · 30.8K stars · +249 today · 6.5K

microsoft/agent-framework

High Relevance · GitHub

Microsoft's framework for building, orchestrating and deploying AI agents and multi-agent workflows in Python and .NET. 72 stars/day at 8.7k total.

agents · multi-agent · microsoft · orchestration

Python · 8.7K stars · +72 today · 1.4K