FIPO advances RL reasoning with future-KL credit assignment; Agentic AI frameworks dominate GitHub and HuggingFace; Qwen 3.5 ecosystem explodes across model charts

TAPS: Task Aware Proposal Distributions for Speculative Sampling

High Relevance

Mohamad Zbib, Mohamad Bazzi, Ammar Mohanna, Hasan Abed Al Kader Hammoud, Bernard Ghanem — IVUL-KAUST

Speculative decoding accelerates autoregressive generation by letting a lightweight draft model propose future tokens that a larger target model then verifies in parallel. In practice, however, draft models are usually trained on broad generic corpora, which leaves it unclear how much speculative de

Key Findings

• speculative decoding
• draft model
• autoregressive generation

speculative decoding · draft model · autoregressive generation · acceptance length

119 upvotes
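
To make the draft-and-verify loop described in the TAPS summary concrete, here is a minimal Python sketch of greedy speculative decoding. It is not the TAPS method. The names `draft_next`, `target_next`, `speculative_step`, and `generate` are hypothetical, both "models" are toy deterministic functions over integer tokens, and verification uses exact greedy matching rather than the rejection-sampling rule used with stochastic sampling. A real system would also check all proposed positions in one batched forward pass of the target model instead of a Python loop.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def target_next(ctx: List[Token]) -> Token:
    """Toy 'large' model (assumption): counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[Token]) -> Token:
    """Toy 'small' model (assumption): agrees with the target except after token 3."""
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


def speculative_step(
    prefix: List[Token], draft: NextToken, target: NextToken, k: int
) -> Tuple[List[Token], int]:
    """Propose k draft tokens, then keep the longest prefix the target agrees with.

    Returns the new tokens and the number of draft tokens that were accepted.
    """
    # 1) The draft model proposes k tokens autoregressively.
    ctx = list(prefix)
    proposal: List[Token] = []
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2) The target model checks each proposed position in order.
    ctx = list(prefix)
    new_tokens: List[Token] = []
    for tok in proposal:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: emit the target's token and stop.
            new_tokens.append(expected)
            return new_tokens, len(new_tokens) - 1
        new_tokens.append(tok)
        ctx.append(tok)

    # 3) Every draft token was accepted, so the target adds one bonus token.
    new_tokens.append(target(ctx))
    return new_tokens, k


def generate(
    prefix: List[Token], draft: NextToken, target: NextToken, k: int, max_new: int
) -> Tuple[List[Token], List[int]]:
    """Repeat speculative steps until max_new tokens have been produced."""
    out = list(prefix)
    accepted_per_step: List[int] = []
    while len(out) - len(prefix) < max_new:
        new_tokens, n_accepted = speculative_step(out, draft, target, k)
        out.extend(new_tokens)
        accepted_per_step.append(n_accepted)
    return out[: len(prefix) + max_new], accepted_per_step


tokens, accepted = generate([0], draft_next, target_next, k=4, max_new=12)
print(tokens)    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
print(accepted)  # [3, 4, 4]
```

With greedy verification the output matches what the target model alone would produce. The per-step count of accepted draft tokens is a rough stand-in for the "acceptance length" tag listed above: the better the draft matches the target on a given task, the more tokens each verification step yields.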

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

High Relevance

Meituan LongCat Team, Bin Xiao, Chao Wang, Chengjiang Li, Chi Zhang — meituan-longcat

The prevailing Next-Token Prediction (NTP) paradigm has driven the success of large language models through discrete autoregressive modeling. However, contemporary multimodal systems remain language-centric, often treating non-linguistic modalities as external attachments, leading to fragmented arch

Key Findings

• Next-Token Prediction
• autoregressive modeling
• multimodal systems

Next-Token Prediction · autoregressive modeling · multimodal systems · discrete space

47 upvotes

EpochX: Building the Infrastructure for an Emergent Agent Civilization

High Relevance

Huacan Wang, Chaofa Yuan, Xialie Zhuang, Tu Hu, Shuo Zhang — QuantaAlpha

General-purpose technologies reshape economies less by improving individual tools than by enabling new ways to organize production and coordination. We believe AI agents are approaching a similar inflection point: as foundation models make broad task execution and tool use increasingly accessible, t

Key Findings

• See paper for details

40 upvotes

Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells

Han Zhang, Guo-Hua Yuan, Chaohao Yuan, Tingyang Xu, Tian Bian — Alibaba-DAMO-Academy

Modeling cellular states and predicting their responses to perturbations are central challenges in computational biology and the development of virtual cells. Existing foundation models for single-cell transcriptomics provide powerful static representations, but they do not explicitly model the dist

Key Findings

• masked discrete diffusion model
• single-cell transcriptomics
• cellular state distribution

masked discrete diffusion model · single-cell transcriptomics · cellular state distribution · conditional simulation

24 upvotes

GEMS: Agent-Native Multimodal Generation with Memory and Skills

Zefeng He, Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu

Recent multimodal generation models have achieved remarkable progress on general-purpose generation tasks, yet continue to struggle with complex instructions and specialized downstream tasks. Inspired by the success of advanced agent frameworks such as Claude Code, we propose GEMS (Agent-Native Mult

Key Findings

• multimodal generation models
• agent frameworks
• agent loop

multimodal generation models · agent frameworks · agent loop · agent memory

16 upvotes

On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

Omer Dahary, Benaya Koren, Daniel Garibi, Daniel Cohen-Or — snap-research

Modern Text-to-Image (T2I) diffusion models have achieved remarkable semantic alignment, yet they often suffer from a significant lack of variety, converging on a narrow set of visual solutions for any given prompt. This typicality bias presents a challenge for creative applications that require a w

Key Findings

• diffusion models
• text-to-image
• contextual space

diffusion models · text-to-image · contextual space · repulsion

16 upvotes

ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding

Jovana Kondic, Pengyuan Li, Dhiraj Joshi, Isaac Sanchez, Ben Wiesel — ibm-granite

Understanding charts requires models to jointly reason over geometric visual patterns, structured numerical data, and natural language -- a capability where current vision-language models (VLMs) remain limited. We introduce ChartNet, a high-quality, million-scale multimodal dataset designed to advan

Key Findings

• multimodal dataset
• chart interpretation
• vision-language models

multimodal dataset · chart interpretation · vision-language models · code-guided synthesis

13 upvotes

VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward

Zhaochong An, Orest Kupyn, Théo Uscidda, Andrea Colaco, Karan Ahuja — google

Large-scale video diffusion models achieve impressive visual quality, yet often fail to preserve geometric consistency. Prior approaches improve consistency either by augmenting the generator with additional modules or applying geometry-aware alignment. However, architectural modifications can compr

Key Findings

• video diffusion models
• latent space
• geometry foundation models

video diffusion models · latent space · geometry foundation models · Latent Geometry Model

11 upvotes

HandX: Scaling Bimanual Motion and Interaction Generation

Zimu Zhang, Yucheng Zhang, Xiyan Xu, Ziyin Wang, Sirui Xu — UIUC-CS

Synthesizing human motion has advanced rapidly, yet realistic hand motion and bimanual interaction remain underexplored. Whole-body models often miss the fine-grained cues that drive dexterous behavior, finger articulation, contact timing, and inter-hand coordination, and existing resources lack hig

Key Findings

• diffusion models
• autoregressive models
• motion capture

diffusion models · autoregressive models · motion capture · hand motion synthesis

10 upvotes

Story2Proposal: A Scaffold for Structured Scientific Paper Writing

Zhuoyang Qian, Wei Shi, Xu Lin, Li Ling, Meng Luo — AgentAlphaAGI

Generating scientific manuscripts requires maintaining alignment between narrative reasoning, experimental evidence, and visual artifacts across the document lifecycle. Existing language-model generation pipelines rely on unconstrained text synthesis with validation applied only after generation, of

Key Findings

• multi-agent framework
• visual contract
• structured manuscript generation

multi-agent framework · visual contract · structured manuscript generation · generate evaluate adapt loop

10 upvotes

CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence

Tianle Zeng, Hanxuan Chen, Yanci Wen, Hong Zhang

The convergence of low-altitude economies, embodied intelligence, and air-ground cooperative systems creates growing demand for simulation infrastructure capable of jointly modeling aerial and ground agents within a single physically coherent environment. Existing open-source platforms remain domain

Key Findings

• co-simulation
• physics-accurate
• aerodynamic consistency

co-simulation · physics-accurate · aerodynamic consistency · sensor modalities

7 upvotes

Trending Models (12)

Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled

Jackrong · image-text-to-text · 27B

qwen3_5 · unsloth · qwen · qwen3.5

337.4K downloads · 1.9K likes

cohere-transcribe-03-2026

CohereLabs · automatic-speech-recognition

cohere_asr · automatic-speech-recognition · audio

50.5K downloads · 647 likes

Voxtral-4B-TTS-2603

mistralai · text-to-speech · 4B

vllm · mistral-common · text-to-speech · en · fr

3.7K downloads · 571 likes

Qianfan-OCR

baidu · image-text-to-text

internvl_chat · feature-extraction · vision-language

17.6K downloads · 724 likes

context-1

chromadb · text-generation

gpt_oss · text-generation · conversational

2.4K downloads · 322 likes

Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF

Jackrong · image-text-to-text · 27B

qwen3_5 · unsloth · qwen · qwen3.5

155.5K downloads · 393 likes

Qwen3.5-9B-Uncensored-HauhauCS-Aggressive

HauhauCS · general · 9B

uncensored · qwen3.5 · qwen · en

623.5K downloads · 870 likes

tribev2

facebook · general

license:cc-by-nc-4.0 · region:us

14.3K downloads · 228 likes

Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive

HauhauCS · image-text-to-text · 35B

uncensored · qwen3.5 · moe · vision

592.8K downloads · 1.1K likes

Nemotron-Cascade-2-30B-A3B

nvidia · text-generation · 30B

nemotron_h · text-generation · nvidia

83.8K downloads · 433 likes

daVinci-MagiHuman

GAIR · image-to-video

text-to-video · image-text-to-video · text-to-audio · text-to-audio-video

605 downloads · 276 likes

OmniCoder-9B

Tesslate · text-generation · 9B

qwen3_5 · image-text-to-text · qwen3.5

29.0K downloads · 547 likes

Trending GitHub Repos (10)

microsoft/VibeVoice

High Relevance · GitHub

Open-Source Frontier Voice AI

voice-ai · speech · open-source

Python · 33.4K · +3.9K today · 3.5K

obra/superpowers

High Relevance · GitHub

Agentic skills framework & software development methodology

agents · skills · development

Shell · 128.5K · +2.6K today · 11.0K

luongnv89/claude-howto

Visual guide to Claude Code with examples and templates

claude · coding-agent · guide

Python · 13.5K · +2.4K today · 2.0K

NousResearch/hermes-agent

High Relevance · GitHub

The agent that grows with you

agent · llm · nous-research

Python · 20.6K · +1.9K today · 2.9K

OpenBB-finance/OpenBB

Financial data platform for analysts, quants and AI agents

finance · data-platform · ai-agents

Python · 64.8K · +506 today · 6.4K

google-research/timesfm

High Relevance · GitHub

TimesFM foundation model for time-series forecasting

time-series · foundation-model · forecasting

Python · 11.5K · +495 today · 1.1K

PaddlePaddle/PaddleOCR

OCR toolkit converting PDFs and images into structured data, supports 100+ languages

ocr · document-ai · multilingual

Python · 74.3K · +439 today · 10.2K

microsoft/agent-lightning

High Relevance · GitHub

Trainer framework for AI agents

agent-training · framework · microsoft

Python · 16.3K · +130 today · 1.4K

OpenBMB/ChatDev