Steerable visual representations and LLM pre-decision biases challenge core assumptions; multi-agent evolution frameworks and adversarial 3D textures push agent capabilities and risks; Gemma-4 and Claude-distilled Qwen dominate trending models

NearID: Identity Representation Learning via Near-identity Distractors

High Relevance

Aleksandar Cvejic, Rameen Abdal, Abdelrahman Eldesokey, Bernard Ghanem, Peter Wonka — KAUST Center of Excellence in Generative AI

Introduces a principled framework for evaluating identity-focused tasks using near-identity distractors. Existing vision encoders entangle object identity with background context, which makes identity metrics unreliable. NearID places semantically similar but distinct instances on identical backgrounds, eliminating contextual shortcuts.

Key Findings

• Existing vision encoders conflate identity with background context in identity-focused tasks
• Near-identity distractors eliminate contextual shortcuts and isolate genuine identity representation
• The framework enables more reliable evaluation of personalized generation and image editing

identity-representation · personalization · evaluation · vision-encoders · image-editing

20 upvotes
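
To make the contextual shortcut concrete, the toy sketch below assumes embeddings that concatenate an identity part and a background part. It checks whether an anchor sits closer to the same identity on a new background than to a near-identity distractor on the same background. The `encode` helper, the weights, and the margin test are illustrative assumptions, not the paper's protocol.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def encode(identity, background, id_weight, bg_weight):
    """Hypothetical encoder: weighted concat of identity and background codes."""
    return [id_weight * v for v in identity] + [bg_weight * v for v in background]

def identity_margin(id_weight, bg_weight):
    """Positive margin means identity outweighs background context."""
    ident_a, ident_b = [1.0, 0.0], [0.0, 1.0]
    scene_x, scene_y = [1.0, 0.0], [0.0, 1.0]
    anchor = encode(ident_a, scene_x, id_weight, bg_weight)
    # Same identity on a different background.
    positive = encode(ident_a, scene_y, id_weight, bg_weight)
    # Different identity on the same background (the near-identity distractor).
    distractor = encode(ident_b, scene_x, id_weight, bg_weight)
    return cosine(anchor, positive) - cosine(anchor, distractor)

print(identity_margin(id_weight=1.0, bg_weight=3.0))  # negative: background dominates
print(identity_margin(id_weight=3.0, bg_weight=1.0))  # positive: identity dominates
```

An encoder that leans on background scores a negative margin here, which is the failure mode the distractor setup is meant to expose.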

VOID: Video Object and Interaction Deletion

High Relevance

Saman Motamed, William Harvey, Benjamin Klein, Luc Van Gool, Zhuoning Yuan — Netflix, ETH Zurich

Addresses a fundamental limitation of video object removal: existing methods only inpaint appearance but fail to correct physical interactions (collisions, occlusions). VOID generates training data with synthetic interaction scenarios and produces physically plausible inpainting where removed objects had significant physical effects on others.

Key Findings

• Current video object removal fails when the removed object has physical interactions beyond visual presence
• Synthetic training data with interaction scenarios enables physically-plausible inpainting
• VOID corrects downstream physical effects (collisions, trajectory changes) that current methods ignore

video-editing · object-removal · video-inpainting · physics-aware · diffusion

17 upvotes

Therefore I am. I Think

High Relevance

Esakkivel Esakkiraja, Sai Rajeswar, Denis Akhiyarov, Rajagopal Venkatesaramani — ServiceNow AI

Presents evidence that LLM reasoning models encode decisions before chain-of-thought generation begins. Linear probes successfully decode tool-calling decisions from pre-generation activations with high confidence, suggesting CoT may serve as post-hoc rationalization rather than genuine deliberation. The finding challenges the interpretability-through-CoT paradigm.

Key Findings

• Tool-calling decisions are detectable from pre-generation hidden states before any reasoning tokens are produced
• Early-encoded decisions shape and potentially predetermine chain-of-thought output
• Chain-of-thought may function as post-hoc rationalization rather than causal reasoning

LLM-reasoning · chain-of-thought · interpretability · mechanistic · tool-use

16 upvotes
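
For readers unfamiliar with linear probes, here is a minimal sketch of the general technique: a logistic-regression classifier trained on fixed activation vectors to predict a binary decision. The activations are synthetic stand-ins generated by the hypothetical `make_synthetic_activations` helper. The dimensions, noise level, and training settings are assumptions, and nothing here reproduces the paper's models or data.

```python
import math
import random

def train_linear_probe(features, labels, epochs=200, lr=0.5):
    """Fit logistic regression (a linear probe) with batch gradient descent."""
    dim = len(features[0])
    weights, bias, n = [0.0] * dim, 0.0, len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def probe_accuracy(weights, bias, features, labels):
    """Share of samples where the sign of the linear score matches the label."""
    correct = 0
    for x, y in zip(features, labels):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        correct += int((z > 0) == (y == 1))
    return correct / len(labels)

def make_synthetic_activations(n, dim, rng):
    """Hypothetical hidden states: Gaussian noise shifted along one direction
    according to the binary decision (1 = call the tool, 0 = do not)."""
    direction = [1.0 if i % 2 == 0 else -1.0 for i in range(dim)]
    features, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        sign = 1.0 if y == 1 else -1.0
        features.append([rng.gauss(0.0, 1.0) + 0.8 * sign * d for d in direction])
        labels.append(y)
    return features, labels

rng = random.Random(0)
train_x, train_y = make_synthetic_activations(400, 8, rng)
test_x, test_y = make_synthetic_activations(200, 8, rng)
w, b = train_linear_probe(train_x, train_y)
print(f"held-out probe accuracy: {probe_accuracy(w, b, test_x, test_y):.2f}")
```

High held-out accuracy from a probe this simple only shows that the decision is linearly readable from the activations. It does not by itself show that the model uses that direction causally.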

CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery

High Relevance

Ao Qu, Han Zheng, Zijian Zhou, Yihao Yan, Yihong Tang — Massachusetts Institute of Technology

First framework for autonomous multi-agent evolution on open-ended problems. Replaces rigid hard-coded exploration rules with long-running agents that explore, reflect, and collaborate through shared persistent memory. Agents autonomously evolve their strategies rather than following fixed heuristics, enabling sustained search and knowledge accumulation.

Key Findings

• Existing LLM-based evolution methods rely on fixed heuristics that limit agent autonomy
• CORAL agents autonomously evolve exploration strategies through reflection and shared memory
• The framework enables sustained open-ended discovery beyond fixed benchmark optimization

multi-agent · evolution · open-ended · autonomous-agents · knowledge-accumulation

14 upvotes

ASI-Evolve: AI Accelerates AI

High Relevance

Weixian Xu, Tiantian Mi, Yixiu Liu, Yang Nan, Zhimeng Zhou — SII - GAIR, Shanghai Jiao Tong University

An agentic framework for AI-for-AI research that closes the learn-design-experiment-analyze loop. Augments evolutionary agents with a context-aware design mechanism and an experience registry, tackling the costly, long-horizon, weakly-supervised research loops that drive real AI progress.

Key Findings

• Standard evolutionary agents cannot handle the long-horizon, weakly-supervised nature of AI research
• Context-aware design and experience registry components improve research loop efficiency
• AI-driven AI research can meaningfully accelerate the research cycle

AI-for-AI · agentic-research · evolutionary · automation · meta-learning

10 upvotes

Investigating Autonomous Agent Contributions in the Wild: Activity Patterns and Code Change over Time

High Relevance

Razvan Mihai Popescu, David Gros, Andrei Botocan, Rahul Pandita, Prem Devanbu — TU Delft AISE Lab, UC Davis

Constructs a novel dataset of ~110,000 open-source pull requests to investigate AI-driven contributions and their effects on code quality, team dynamics, and software maintainability. The first large-scale empirical study of autonomous coding agents contributing to real-world open-source projects.

Key Findings

• Autonomous coding agents now actively contribute branches, PRs, and code reviews in real-world projects
• The study identifies distinct activity patterns between human and agent contributors
• The dataset provides large-scale empirical evidence on how AI agents affect code quality and team dynamics

software-engineering · autonomous-agents · empirical-study · code-quality · open-source

10 upvotes

Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

High Relevance

Jiawei Chen, Simin Huang, Jiawei Du, Shuaihang Chen, Yu Tian — East China Normal University

Demonstrates that adversarial 3D textures on physical objects can compromise vision-language-action (VLA) robotic manipulation models. Unlike prior 2D patch attacks, adversarial 3D textures are naturally present in the scene and pose a more physically plausible and damaging threat to deployed robotic systems.

Key Findings

• Adversarial 3D textures are more physically plausible attack surfaces than 2D patches for robotic systems
• VLA models are vulnerable to texture-based attacks that transfer across viewpoints
• The attack surface of deployed VLA robotic systems is larger than previously understood

adversarial-attacks · robotics · VLA-models · 3D-textures · safety

7 upvotes

Video Models Reason Early: Exploiting Plan Commitment for Maze Solving

Kaleb Newman, Tyler Zhu, Olga Russakovsky — Princeton University

Reveals that video diffusion models commit to high-level motion plans within the first few denoising steps during generation. Using 2D maze solving as a controlled testbed, the paper shows 'early plan commitment' — a structural property where the coarse trajectory is decided early, with later steps only refining details.

Key Findings

• Video diffusion models commit to a high-level motion plan within the first few denoising steps
• Early plan commitment is a fundamental structural property, not an artifact of training
• This property can be exploited for more efficient video generation and planning

video-diffusion · planning · reasoning · denoising · emergent-capabilities

6 upvotes

AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation

Zhaohe Liao, Kaixun Jiang, Zhihang Liu, Yujie Wei, Junqiu Yu — Wan-AI

First benchmark using VQA for evaluating logical correctness in AI-generated academic illustrations. Addresses the gap between visual quality and factual/logical accuracy in generated scientific figures, decomposing evaluation into testable visual question-answer pairs rather than relying on unreliable VLM holistic judgments.

Key Findings

• State-of-the-art image generation models produce visually plausible but logically incorrect academic illustrations
• VQA-based evaluation decomposes logical consistency into verifiable sub-questions
• Existing VLM-based evaluation is unreliable for long and complex scientific illustrations

benchmark · academic-illustrations · image-generation · VQA · logical-consistency

6 upvotes
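
The decomposition idea can be sketched as a simple pass rate over question-answer checks. The function name, the exact-match rule, and the example questions below are hypothetical and are not taken from AIBench.

```python
def vqa_consistency_score(expected_answers, model_answers):
    """Fraction of question-answer checks that a generated figure passes."""
    if not expected_answers:
        raise ValueError("need at least one question")
    passed = 0
    for question, expected in expected_answers.items():
        given = model_answers.get(question, "")
        # Case-insensitive exact match; a real benchmark may score differently.
        if given.strip().lower() == expected.strip().lower():
            passed += 1
    return passed / len(expected_answers)

# Hypothetical checks derived from the figure's specification.
expected = {
    "Does the encoder feed the decoder?": "yes",
    "How many stages are drawn?": "3",
    "Is the loss computed after the decoder?": "yes",
    "Is there an arrow from output back to input?": "no",
}
# Hypothetical answers from a VQA model shown the generated image.
answers = {
    "Does the encoder feed the decoder?": "Yes",
    "How many stages are drawn?": "4",
    "Is the loss computed after the decoder?": "yes",
    "Is there an arrow from output back to input?": "no",
}
score = vqa_consistency_score(expected, answers)
print(score)  # 0.75
```

Each failed check points at a specific logical error in the figure, which a single holistic rating cannot do.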

AutoMIA: Improved Baselines for Membership Inference Attack via Agentic Self-Exploration

Ruhao Liu, Weiqi Huang, Qi Li, Xinchao Wang — National University of Singapore, Tsinghua University

Reformulates membership inference as an automated process of self-exploration and strategy evolution using an agentic framework. AutoMIA replaces static handcrafted heuristics with an adaptive agent that discovers attack strategies, achieving better transferability across different large models.

Key Findings

• Static MIA heuristics fail to transfer across different large models
• Agentic self-exploration discovers more effective and transferable attack strategies
• AutoMIA outperforms existing MIA baselines with fully automated strategy discovery

privacy · membership-inference · agentic · security-auditing · ML-safety

5 upvotes
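
As an illustration of what a static handcrafted heuristic can look like, the sketch below implements the classic loss-threshold attack: a sample is guessed to be a training member when the target model's loss on it is low. The loss values and threshold are made-up toy numbers, and this baseline is shown for context only. It is not AutoMIA.

```python
def loss_threshold_attack(losses, threshold):
    """Predict 'member' (True) when the per-sample loss is below the threshold."""
    return [loss < threshold for loss in losses]

def attack_accuracy(predictions, is_member):
    """Share of samples whose membership guess is correct."""
    hits = sum(1 for p, m in zip(predictions, is_member) if p == m)
    return hits / len(is_member)

# Toy per-sample losses: first four are training members, last four are not.
losses = [0.1, 0.2, 0.3, 0.9, 0.8, 1.2, 1.5, 0.25]
is_member = [True] * 4 + [False] * 4

preds = loss_threshold_attack(losses, threshold=0.5)
print(attack_accuracy(preds, is_member))  # 0.75
```

The fixed threshold is the weak point: a value tuned for one model's loss distribution need not suit another's, which is the transfer problem the summary describes.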

Forecasting Supply Chain Disruptions with Foresight Learning

Benjamin Turtel, Paul Wilczewski, Kris Skotheim — Lightning Rod Labs

End-to-end framework that trains LLMs to produce calibrated probabilistic forecasts of supply chain disruptions using realized outcomes as supervision. Addresses the challenge of reasoning about infrequent, high-impact events from noisy unstructured inputs — a setting where general-purpose models struggle.

Key Findings

• LLMs can be trained to produce calibrated probabilistic forecasts for rare supply chain events
• Task-specific adaptation with realized outcome supervision substantially improves forecast quality
• The framework handles noisy, unstructured inputs that general-purpose models cannot process reliably

forecasting · supply-chain · LLM-applications · probabilistic · real-world

4 upvotes
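
One common way to score probabilistic forecasts against realized outcomes is the Brier score, sketched below. The summary does not say which metric the paper uses, so treat the metric choice and the toy numbers as assumptions.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.
    Lower is better; 0.0 is a perfect forecast."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and aligned")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Toy data: forecast probability of a disruption, then whether it happened.
forecasts = [0.9, 0.1, 0.8, 0.3]
outcomes = [1, 0, 1, 0]

model_score = brier_score(forecasts, outcomes)
baseline_score = brier_score([0.5] * len(outcomes), outcomes)  # always say 50%

print(round(model_score, 4))  # 0.0375
print(baseline_score)         # 0.25
```

Comparing against an uninformative constant forecast, as in the last lines, gives a quick sanity check that the model adds information.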

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

Zhang Li, Zhibo Lin, Qiang Liu, Ziyang Zhang, Shuo Zhang — Alibaba Group, Zhejiang University

First benchmark for multilingual digital and photographed document parsing, spanning 3,400 document images across 17 languages and diverse scripts. Exposes the critical gap in document parsing performance on non-English, photographed, and low-resource language documents.

Key Findings

• No systematic benchmark existed for multilingual document parsing across diverse scripts
• Current models show significant performance degradation on non-English and photographed documents
• MDPBench covers 17 languages including low-resource scripts previously untested

benchmark · document-parsing · multilingual · OCR · low-resource

4 upvotes

Trending Models (11)

Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled

Jackrong · text-generation · 27B

Qwen3.5-27B fine-tuned with distilled reasoning data from Claude 4.6 Opus. One of the most downloaded trending models, with nearly 500k downloads, reflecting the community's appetite for frontier reasoning in open weights.

reasoning · distillation · qwen3.5 · unsloth

487.4K downloads · 2.2K likes

Qwen3.5-9B-Uncensored-HauhauCS-Aggressive

HauhauCS · text-generation · 9B

Uncensored Qwen3.5-9B variant with 700k downloads in GGUF format, reflecting strong demand for unrestricted open-weight models at the 9B parameter scale.

uncensored · qwen3.5 · gguf

700.2K downloads · 950 likes

Gemma-4-31B-it

Google · image-text-to-text · 31B

Google's flagship 31B dense instruction-tuned model from the Gemma-4 family with multimodal image-text-to-text capabilities. Part of a simultaneous four-model release spanning the full deployment spectrum.

gemma4 · multimodal · instruction-tuned

76.2K downloads · 688 likes

Cohere Transcribe 03-2026

Cohere Labs · automatic-speech-recognition · unknown

Cohere's automatic speech recognition model, signaling the company's expansion into audio modalities beyond text. Strong engagement with 764 likes and 84k downloads.

ASR · speech · audio · transcription

84.6K downloads · 764 likes

Qianfan-OCR

Baidu · feature-extraction · unknown

Baidu's vision-language OCR model based on the InternVL architecture, with 862 likes indicating strong interest in specialized OCR capabilities from Chinese tech giants.

OCR · vision-language · internvl · document-understanding

27.0K downloads · 862 likes

Voxtral-4B-TTS-2603

Mistral AI · text-to-speech · 4B

Mistral's 4B-parameter text-to-speech model supporting English and French. Represents Mistral's entry into audio generation with 649 likes.

TTS · speech-synthesis · multilingual

4.8K downloads · 649 likes

Bonsai-8B-gguf

Prism ML · text-generation · 8B (1-bit)

1-bit quantized 8B model in GGUF format for llama.cpp, representing the frontier of extreme quantization for edge deployment. 358 likes suggest growing interest in ultra-efficient inference.

1-bit · quantization · edge-deployment · gguf

26.2K downloads · 358 likes
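
As a rough illustration of what 1-bit weights mean, the sketch below keeps only the sign of each weight plus one shared scale per row, then estimates weight storage at 1 bit versus 16 bits per parameter. This is a generic sign-and-scale scheme chosen for illustration. The listing does not describe Bonsai's actual quantization method, and the size estimate ignores scales, embeddings, and file metadata.

```python
def quantize_1bit(row):
    """Keep the sign of each weight and one shared scale (mean absolute value)."""
    scale = sum(abs(w) for w in row) / len(row)
    signs = [1 if w >= 0 else -1 for w in row]
    return signs, scale

def dequantize_1bit(signs, scale):
    """Reconstruct approximate weights from signs and the shared scale."""
    return [s * scale for s in signs]

row = [0.5, -0.25, 0.75, -1.0]
signs, scale = quantize_1bit(row)
print(signs, scale)                   # [1, -1, 1, -1] 0.625
print(dequantize_1bit(signs, scale))  # [0.625, -0.625, 0.625, -0.625]

# Rough weight storage for 8 billion parameters (bits -> bytes -> GB).
params = 8_000_000_000
gb_at_1_bit = params * 1 / 8 / 1e9
gb_at_16_bits = params * 16 / 8 / 1e9
print(gb_at_1_bit, gb_at_16_bits)     # 1.0 16.0
```

The reconstruction loses every magnitude difference within a row, which is why extreme quantization usually depends on training or calibration choices that this sketch leaves out.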

Gemma-4-26B-A4B-it

Google · image-text-to-text · 26B (4B active)

Google's 26B MoE model with only 4B active parameters, offering dense-model quality at a fraction of the compute cost. Part of the Gemma-4 multimodal family.

gemma4 · MoE · efficient-inference · multimodal

24.4K downloads · 295 likes
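
The gap between total and active parameters comes from sparse routing: each token is sent to only a few experts. Below is a minimal top-k routing sketch with toy scalar "experts". The expert functions, router logits, and `k` are hypothetical, and the listing does not specify how Gemma-4 routes tokens.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their gates."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    chosen = order[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract max for stability
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = top_k_route(router_logits, k)
    return sum(gate * experts[i](x) for i, gate in gates.items())

# Four toy experts; only two run per input.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
logits = [0.0, 2.0, -1.0, 2.0]

gates = top_k_route(logits, k=2)
output = moe_forward(3.0, experts, logits, k=2)
print(gates)   # {1: 0.5, 3: 0.5}
print(output)  # 7.5

# Share of parameters active per token, from the figures in the listing.
print(round(4 / 26, 3))  # 0.154
```

In a real model the active count also includes shared components such as attention layers, so it is not simply `k` divided by the number of experts.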

context-1

ChromaDB · text-generation · unknown

A retrieval-native conversational language model from the vector database company ChromaDB, suggesting a convergence between retrieval infrastructure and language modeling.

retrieval-native · conversational · RAG

3.2K downloads · 363 likes

LFM2.5-350M

Liquid AI · text-generation · 350M

A 350M-parameter liquid foundation model from Liquid AI, demonstrating architectural diversity at the small-model end of the spectrum. 212 likes indicate interest in alternative architectures.

liquid-architecture · small-model · efficient

10.2K downloads · 212 likes

void-model

Netflix · video-inpainting · unknown

Netflix's video inpainting model for physics-aware object removal, the model behind the VOID paper. Based on the CogVideoX diffusion architecture.

video-inpainting · object-removal · diffusion · CogVideoX

0 downloads · 207 likes

Trending GitHub Repos (13)

Yeachan-Heo/oh-my-codex

High Relevance · GitHub

Extension framework for OpenAI Codex CLI that adds hooks, agent teams, HUDs, and more. The explosive 3,047 stars-per-day growth reflects massive demand for AI coding agent customization tooling.

AI-coding · codex · developer-tools · extensions

TypeScript · 14.2K · +3.0K today · 1.3K

siddharthvaddem/openscreen

Free, open-source screen recording studio alternative with no subscriptions or watermarks. 2,771 stars today indicates strong developer demand for production-quality demo creation tools.

developer-tools · screen-recording · open-source

TypeScript · 18.2K · +2.8K today · 1.2K

onyx-dot-app/onyx

High Relevance · GitHub

Open-source AI chat platform with advanced features supporting every LLM. 1,852 stars today signals growing interest in self-hosted AI chat infrastructure.

AI-chat · self-hosted · LLM-platform · open-source

Python · 23.3K · +1.9K today · 3.1K

sherlock-project/sherlock

OSINT tool for hunting social media accounts by username across networks. Perennial trending repo with 1,192 stars today and 78k total stars.

OSINT · security · social-media · username-search

Python · 78.6K · +1.2K today · 9.2K

google-research/timesfm

High Relevance · GitHub

Google Research's time series foundation model for forecasting. 916 stars today, likely driven by renewed interest in time series AI and the supply chain forecasting paper.

time-series · forecasting · foundation-model · google-research

Python · 14.2K · +916 today · 1.2K

dmtrKovalenko/fff.nvim

High Relevance · GitHub

Fastest and most accurate file search toolkit specifically optimized for AI agents, Neovim, Rust, C, and Node.js. 750 stars today reflects demand for AI-agent-optimized developer tooling.

neovim · file-search · AI-agents · rust · developer-tools

Rust · 3.3K · +750 today · 131

Blaizzy/mlx-vlm

High Relevance · GitHub

MLX-based Vision Language Model inference and fine-tuning package for Apple Silicon Macs. 499 stars today shows Apple ecosystem AI tooling is thriving.

MLX · VLM · Apple-Silicon · fine-tuning · inference

Python · 3.2K · +499 today · 363

f/prompts.chat

Community prompt sharing platform (formerly Awesome ChatGPT Prompts) with 157k stars. 375 stars today shows sustained interest in prompt engineering resources.

prompts · community · LLM · resources

HTML · 157.2K · +375 today · 20.6K

hsliuping/TradingAgents-CN

Chinese-enhanced multi-agent LLM financial trading framework. 350 stars today and 23k total stars indicate strong Chinese-market demand for AI trading tools.

trading · multi-agent · finance · Chinese

Python · 23.3K · +350 today · 4.9K

yusufkaraaslan/Skill_Seekers

High Relevance · GitHub

Converts documentation websites, GitHub repos, and PDFs into Claude AI skills with automatic conflict detection. Part of the growing ecosystem extending AI coding agents.

claude-code · skills · documentation · AI-tools

Python · 12.3K · +158 today · 1.2K

MervinPraison/PraisonAI