Sunday, May 24, 2026
DelTA (189 upvotes) reframes RLVR as token-level discrimination; code knowledge graphs and AI coding agent plugins dominate GitHub; DeepSeek-V4-Pro and Tencent Hy-MT2 lead model trends
Executive Summary
Today's research landscape is anchored by DelTA (189 upvotes), which introduces a discriminator-theoretic view of reinforcement learning from verifiable rewards (RLVR), showing that policy-gradient updates implicitly function as linear discriminators over token-gradient vectors. This insight reframes how credit assignment works in LLM post-training and opens new design spaces for token-level reward shaping. TransitLM (167 upvotes) delivers the largest open transit-planning corpus at 13M+ records, while Perception or Prejudice (160 upvotes) exposes systematic bias in multimodal personality reasoning.
The agent and efficiency stack continues to mature rapidly. Full Attention Strikes Back (83 upvotes) demonstrates that full-attention LLMs are already intrinsically sparse and can be converted to sparse attention with minimal training, while ACC compiles agent trajectories into long-context training data. On the video generation front, WorldKV introduces training-free world memory for consistent video generation, and FlowLong enables long video generation via manifold-constrained Tweedie matching.
GitHub trends reveal an explosive week for AI coding infrastructure: andrej-karpathy-skills (3,507 stars today) provides Claude Code behavioral guidelines, codegraph (2,456 stars today) offers pre-indexed code knowledge graphs, and Understand-Anything (2,299 stars today) turns repositories into interactive knowledge graphs. The model landscape is led by DeepSeek-V4-Pro (4.5M downloads) and Tencent's new Hy-MT2 translation model family spanning 1.8B to 30B parameters.
Researcher Notes
DelTA's discriminator framing has deeper implications than the paper title suggests. By showing that RLVR updates implicitly act as linear discriminators over token-gradient vectors, the paper does more than explain existing behavior: it provides a principled basis for designing token-level credit assignment. Combined with the unsupervised process reward model (PRM) work (23 upvotes) that eliminates annotation bottlenecks for process supervision, we're seeing a convergence toward scalable, fine-grained reward signals that don't require human step-level labels. This is the kind of infrastructure improvement that compounds across the entire post-training pipeline.
The sparse attention thesis is strengthening. Full Attention Strikes Back's finding that trained LLMs are already intrinsically sparse (convertible in ~100 steps) pairs with Gated DeltaNet-2's decoupled erase-write mechanism for linear attention. The practical implication: inference cost reduction may not require architectural redesign from scratch — instead, post-hoc sparsification of existing models could become standard practice, similar to how quantization became routine. Watch for frameworks that combine sparsification with KV-cache compression (WorldKV, KVServe) for compound efficiency gains.
The AI coding agent ecosystem is consolidating around knowledge graphs. The simultaneous trending of codegraph (pre-indexed code KG), Understand-Anything (interactive code KG), and andrej-karpathy-skills (behavioral guidelines for coding agents) signals a maturation pattern: the raw capability of coding agents is no longer the bottleneck — instead, the constraint is how efficiently they understand and navigate codebases. Expect the next competitive frontier to be in context compression and retrieval quality, not raw generation capability.
Tencent's Hy-MT2 family is a quiet strategic move. Releasing translation models at 1.8B, 7B, and 30B-A3B (MoE) simultaneously suggests a serious push to own the multilingual translation stack. Combined with ByteDance's Lance for multimodal generation and DeepSeek-V4-Pro's continued dominance, Chinese AI labs are deploying across every modality and scale tier simultaneously — a breadth strategy that contrasts with the Western focus on frontier reasoning models.
Video generation is bifurcating into consistency vs. length problems. WorldKV tackles the consistency problem (revisiting viewpoints yields coherent content) while FlowLong attacks the length problem (extending generation horizon without drift). Q-ARVD adds the efficiency dimension via quantization. These three papers collectively define the video generation research frontier, and the first system to solve all three simultaneously will likely define the next generation of real-time interactive video.
Themes & Trends
RLVR Token-Level Credit Assignment
Rising · DelTA's discriminator view of RLVR and unsupervised PRMs both attack the credit assignment problem from different angles: one by revealing implicit token-level discrimination, the other by eliminating the annotation bottleneck. Together they signal scalable, fine-grained reward signals becoming practical.
Attention Efficiency and Sparse Conversion
Rising · Full Attention Strikes Back demonstrates intrinsic sparsity in trained LLMs, while Gated DeltaNet-2 advances linear attention through decoupled memory editing. Both approaches reduce inference cost without requiring architecture redesign from scratch.
AI Coding Agent Infrastructure
Rising · GitHub trends show explosive growth in code knowledge graphs (codegraph, Understand-Anything), behavioral configuration (andrej-karpathy-skills), and plugin registries (claude-plugins-official). The competitive frontier is shifting from raw generation to context efficiency and codebase navigation.
Video Generation: Consistency, Length, and Efficiency
Rising · WorldKV tackles viewpoint consistency via world memory, FlowLong extends generation horizon via manifold-constrained Tweedie matching, and Q-ARVD adds quantization for deployment efficiency. The three axes of the video generation frontier are being attacked simultaneously.
Multimodal Grounded Reasoning
Stable · Perception or Prejudice, LatentOmni, and SpaceDG probe whether multimodal models genuinely reason or pattern-match, spanning personality assessment, audio-visual temporal grounding, and spatial robustness under degraded inputs.
Agentic Evaluation and Long-Horizon Benchmarks
Rising · pi-Bench evaluates proactive hidden-intent workflows, Spreadsheet-RL tests multi-step office automation, and CUSP benchmarks scientific forecasting, all moving beyond reactive task completion to sustained intentional agent behavior.
Trending Papers (15)
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards
High Relevance · Kaiyi Zhang, Wei Wu, Yankai Lin — Renmin University of China, Tsinghua University
Introduces a discriminator view of RLVR updates, demonstrating that policy-gradient steps implicitly act as linear discriminators over token-gradient vectors. This theoretical reframing reveals how response-level rewards translate into token-level probability changes and enables more principled credit assignment.
Key Findings
- Policy-gradient RLVR updates are equivalent to linear discriminators over token-gradient vectors
- Provides theoretical grounding for fine-grained token-level credit assignment in RL fine-tuning
- Discriminative framing opens new design space for reward shaping and token selection strategies
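The discriminator view can be illustrated with a generic first-order argument. This is a sketch under stated assumptions, not the paper's derivation. For a REINFORCE-style update that moves the parameters by eta times w, where w is the advantage-weighted sum of sampled token gradients, the log-probability of any token with gradient g changes by approximately eta times the inner product of w and g. That is a linear score over token-gradient vectors. The toy linear softmax policy, the names token_grad and w, and all numbers below are hypothetical.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def token_grad(theta, x, y):
    """Gradient of log pi(y | x) w.r.t. theta for a linear softmax policy.

    theta has shape (vocab, dim) and the logits are theta @ x.
    """
    p = np.exp(log_softmax(theta @ x))
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    return np.outer(onehot - p, x)  # shape (vocab, dim)

rng = np.random.default_rng(0)
vocab, dim = 5, 4
theta = rng.normal(size=(vocab, dim))

# Hypothetical batch: (context, sampled token, response-level advantage).
batch = [(rng.normal(size=dim), int(rng.integers(vocab)), a)
         for a in (1.0, 1.0, -1.0, -1.0)]

# Policy-gradient direction: advantage-weighted sum of token gradients.
w = sum(a * token_grad(theta, x, y) for x, y, a in batch)

eta = 1e-6
theta_new = theta + eta * w

# For a held-out token, the actual log-prob change matches the linear
# score eta * <w, g> up to second-order terms in eta.
x_q, y_q = rng.normal(size=dim), 2
g = token_grad(theta, x_q, y_q)
predicted = eta * float(np.sum(w * g))
actual = float(log_softmax(theta_new @ x_q)[y_q] - log_softmax(theta @ x_q)[y_q])
print(predicted, actual)
```

The sign of the inner product decides whether a token's probability rises or falls after the step, which is the sense in which the update direction behaves like a linear classifier over gradient vectors.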
TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation
High Relevance · Hanyu Guo, Jiedong Yang, Chao Chen, Longfei Xu, Kaikui Liu — Chongqing University, ByteDance
Releases the largest open transit-planning corpus with over 13 million route records from four Chinese cities covering 120,845 stations and 13,666 lines, designed for training LLMs to perform transit route planning without map infrastructure dependencies.
Key Findings
- First large-scale open dataset for map-free transit route planning with 13M+ records
- Covers 4 Chinese cities with 120,845 stations and 13,666 lines at unprecedented scale
- LLMs can learn geographic transit reasoning through continual pre-training on this corpus
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?
High Relevance · Caixin Kang, Tianyu Yan, Sitong Gong, Mingfang Zhang, Liangyang Ouyang — Tsinghua University, Renmin University of China
Formalizes Grounded Personality Reasoning (GPR), a task requiring models to justify personality assessments with behavioral evidence rather than superficial cues. Reveals that current MLLMs systematically pattern-match appearances rather than reasoning from observed behavior.
Key Findings
- Current MLLMs predominantly pattern-match superficial cues rather than reasoning from behavioral evidence
- Introduces GPR as a formal evaluation framework distinguishing perception from prejudice
- Identifies systematic first-impression bias as a failure mode in multimodal personality assessment
pi-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows
High Relevance · Haoran Zhang, Luxin Xu, Zhilin Wang, Runquan Gui, Shunkai Zhang — University of Michigan, Carnegie Mellon University
Evaluates whether personal assistant agents can identify and act on hidden user intents before they are explicitly stated, moving beyond reactive task completion to proactive assistance in sustained, long-horizon workflows.
Key Findings
- Current agents struggle with proactive identification of unstated user needs and constraints
- Long-horizon evaluation reveals compounding failures in sustained multi-step workflows
- Benchmarks proactive assistance as a distinct capability from reactive task completion
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps
High Relevance · Yanke Zhou, Yiduo Li, Hanlin Tang, Maohua Li, Kan Liu — Tsinghua University, Zhipu AI
Demonstrates that full-attention LLMs are already intrinsically sparse and can be transformed into highly sparse models with only ~100 adaptation steps, achieving efficient long-context inference without the cost of sparse-native training.
Key Findings
- Full-attention LLMs exhibit intrinsic sparsity that can be unlocked with minimal adaptation
- Sparse conversion requires only ~100 training steps, dramatically reducing transition cost
- Achieves efficient long-context inference while preserving model quality
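One simple way to probe the kind of intrinsic sparsity described above is to measure how much softmax attention mass each query places on its k highest-scoring keys, and to compare dense attention with a top-k masked version. The sketch below is illustrative only: the random inputs, the scaling factor, and the function names topk_mass and topk_sparse_attention are assumptions, and it does not reproduce the paper's conversion procedure.

```python
import numpy as np

def softmax(scores):
    scores = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)

def topk_mass(attn, k):
    """Attention mass held by each query's k largest weights."""
    top = np.sort(attn, axis=-1)[:, -k:]
    return top.sum(axis=-1)

def topk_sparse_attention(q, keys, vals, k):
    """Attend only to the k highest-scoring keys per query (illustrative)."""
    scores = q @ keys.T / np.sqrt(q.shape[-1])
    # Threshold at each row's k-th largest score and mask everything below it.
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    return softmax(masked) @ vals

rng = np.random.default_rng(0)
n, d = 256, 32
q = rng.normal(size=(n, d)) * 2.0   # hypothetical inputs with mildly peaked scores
keys = rng.normal(size=(n, d))
vals = rng.normal(size=(n, d))

attn = softmax(q @ keys.T / np.sqrt(d))
dense_out = attn @ vals
for k in (8, 32, 128):
    sparse_out = topk_sparse_attention(q, keys, vals, k)
    err = float(np.abs(sparse_out - dense_out).max())
    print(k, float(topk_mass(attn, k).mean()), err)
```

When a small k already captures most of the mass, the masked output stays close to the dense one, which is the intuition behind cheap post-hoc conversion.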
ACC: Compiling Agent Trajectories for Long-Context Training
High Relevance · Qisheng Su, Zhen Fang, Shiting Huang, Yu Zeng, Yiming Zhao — Fudan University, Shanghai AI Laboratory
Proposes compiling massive agent trajectories — tool invocations, observations, and reasoning across many turns — into training data for long-context LLMs, leveraging the natural distribution of evidence across distant context segments.
Key Findings
- Agent trajectories provide naturally scattered long-range dependencies for training
- Compiled trajectories improve long-context reasoning without costly manual curation
- Bridges the gap between agent execution data and LLM training data pipelines
PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects
High Relevance · Ziang Cao, Yinghao Liu, Haitian Li, Runmao Yao, Fangzhou Hong — Stanford University, Shanghai Jiao Tong University
A unified framework for generating simulation-ready 3D assets with physical properties across rigid, deformable, and articulated object categories, addressing the limitation that most 3D generation methods either neglect physics or handle only one asset type.
Key Findings
- First unified framework spanning rigid, deformable, and articulated 3D object generation
- Generated assets include physical properties needed for downstream simulation tasks
- Novel geometry processing pipeline enables efficient cross-category generation
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning
High Relevance · Yifan Dai, Zhenhua Wu, Bohan Zeng, Daili Hua, Jialing Liu — ByteDance, Zhejiang University
Argues that text-based chain-of-thought compresses continuous audio-visual signals too aggressively, weakening temporal grounding. Proposes a unified latent space for joint audio-visual reasoning that preserves fine-grained temporal evidence without discrete token bottlenecks.
Key Findings
- Text-based CoT weakens temporal grounding by compressing continuous audio-visual signals
- Unified latent space outperforms explicit text-based reasoning for multimodal tasks
- Preserves fine-grained temporal evidence that discrete tokenization typically discards
Forecasting Scientific Progress with Artificial Intelligence
High Relevance · Sean Wu, Pan Lu, Yupeng Chen, Jonathan Bragg, Yutaro Yamada — Stanford University, Allen Institute for AI
Introduces CUSP, a temporally grounded benchmark for evaluating AI's ability to forecast scientific progress under controlled knowledge constraints, testing feasibility assessment, methodology prediction, and outcome forecasting across multiple disciplines.
Key Findings
- First rigorous framework for evaluating scientific forecasting under knowledge cutoff constraints
- AI systems show meaningful but uneven forecasting ability across scientific disciplines
- Temporal grounding prevents data contamination in scientific prediction evaluation
WorldKV: Efficient World Memory with World Retrieval and Compression
High Relevance · Jung Yi, Minjae Kim, Paul Hyunbin Cho, Wooseok Jang, Sangdoo Yun — NAVER AI Lab, KAIST
Addresses the consistency-efficiency tradeoff in autoregressive video diffusion by proposing train-free world memory that retrieves and compresses KV-cache from past rollouts, maintaining viewpoint consistency without breaking real-time constraints.
Key Findings
- Full KV-cache preserves consistency but breaks real-time constraints in video generation
- Train-free retrieval and compression of world memory maintains long-term consistency
- Enables persistent world generation where revisiting viewpoints yields coherent content
Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning
Banghao Chi, Yining Xie, Mingyuan Wu, Jingcheng Yang, Jize Jiang — Peking University, Microsoft Research
Applies reinforcement learning to train spreadsheet agents that can handle complex, multi-step spreadsheet operations beyond what specialized prompting over general-purpose LLMs can achieve, targeting realistic Excel and Google Sheets workflows.
Key Findings
- RL-trained agents significantly outperform prompt-only approaches on complex spreadsheet tasks
- Demonstrates practical multi-step automation for real-world office productivity workflows
- Establishes a new benchmark for realistic spreadsheet task evaluation
SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers
Javad Rajabi, Kimia Shaban, Koorosh Roohi, David B. Lindell, Babak Taati — University of Toronto, Vector Institute
Proposes content-aware spectral-energy scaling for RoPE-based attention in diffusion transformers, enabling higher-resolution generation beyond training resolution without the quality degradation of uniform scaling approaches.
Key Findings
- Content-aware frequency scaling outperforms uniform RoPE extrapolation for DiTs
- Spectral energy analysis reveals that different frequency components require distinct treatment
- Training-free approach enables resolution extrapolation without architectural changes
Unsupervised Process Reward Models
High Relevance · Artyom Gadetsky, Maxim Kodryan, Siba Smarak Panigrahi, Hang Guo, Maria Brbic — EPFL, ETH Zurich
Proposes training process reward models without any human supervision — neither step-level annotations nor ground-truth verification — by leveraging unsupervised signals to provide fine-grained, step-level guidance for LLM reasoning.
Key Findings
- PRMs can be trained without any human annotations or ground-truth verification labels
- Removes the primary scaling bottleneck for process-level supervision in reasoning
- Unsupervised PRMs approach the quality of supervised alternatives on reasoning benchmarks
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention
High Relevance · Ali Hatamizadeh, Yejin Choi, Jan Kautz — NVIDIA Research, University of Washington
Improves linear attention by decoupling the erase and write operations in the delta-rule recurrence, replacing the single scalar gate with separate mechanisms for forgetting and updating the compressed memory state.
Key Findings
- Decoupling erase and write prevents interference in compressed memory state updates
- Achieves better quality-efficiency tradeoff than single-gate delta-rule approaches
- Extends Kimi Delta Attention with more expressive channel-wise memory editing
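The difference between a single gate and decoupled erase and write can be sketched on the standard delta-rule memory update, where one scalar beta both removes the value stored under a key and writes the new one. The decoupled variant below uses two separate scalar strengths as a simplified, hypothetical stand-in. The paper's actual parameterization (described above as channel-wise) is not reproduced here, and the names delta_step, decoupled_step, erase, and write are assumptions.

```python
import numpy as np

def delta_step(S, k, v, beta):
    """Single-gate delta rule: one scalar beta controls both erase and write.

    S has shape (d_v, d_k); a key k reads out the stored value as S @ k.
    """
    return S - beta * np.outer(S @ k, k) + beta * np.outer(v, k)

def decoupled_step(S, k, v, erase, write):
    """Hypothetical decoupled variant with separate erase and write strengths."""
    S = S - erase * np.outer(S @ k, k)   # forget what is stored under key k
    return S + write * np.outer(v, k)    # store the new value under key k

d_k, d_v = 4, 3
k = np.zeros(d_k)
k[0] = 1.0                               # unit-norm key
v_old = np.array([1.0, 2.0, 3.0])
v_new = np.array([-1.0, 0.0, 1.0])

# Store v_old under k, then overwrite it with v_new.
S = delta_step(np.zeros((d_v, d_k)), k, v_old, beta=1.0)

single = delta_step(S, k, v_new, beta=0.5)
split = decoupled_step(S, k, v_new, erase=1.0, write=0.5)

print(single @ k)  # blend: 0.5 * v_old + 0.5 * v_new
print(split @ k)   # old value fully erased: 0.5 * v_new
```

With a single gate, a cautious write also leaves half of the stale value in memory. Separating the two strengths lets the update clear the old association completely while still writing conservatively.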
FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching
Jangho Park, Geon Yeong Park, Gihyun Kwon, Jong Chul Ye — KAIST
Proposes a novel inference-time method for extending video diffusion models to long sequences by constraining generation to the learned data manifold, addressing both drift errors from autoregressive approaches and quality degradation from bidirectional extensions.
Key Findings
- Manifold-constrained Tweedie matching prevents drift accumulation in long video generation
- Training-free approach compatible with existing video diffusion architectures
- Avoids the repetitive motion patterns typical of autoregressive video generation
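The "Tweedie" in the method name refers to Tweedie's formula, which turns a score function into a posterior-mean denoiser. For additive Gaussian noise of standard deviation sigma, the expected clean sample given the noisy one is the noisy sample plus sigma squared times the score of the noised distribution. The sketch below only checks that identity on a one-dimensional Gaussian prior, where the posterior mean is known in closed form. The prior parameters and function names are hypothetical, and the paper's manifold constraint and matching procedure are not reproduced.

```python
import numpy as np

def tweedie_denoise(x_t, sigma, score_fn):
    """Posterior-mean estimate E[x0 | x_t] = x_t + sigma^2 * score(x_t)."""
    return x_t + sigma**2 * score_fn(x_t)

mu, s, sigma = 2.0, 1.5, 0.8  # hypothetical prior mean, prior std, noise level

def gaussian_score(x_t):
    # Score of the noised marginal N(mu, s^2 + sigma^2).
    return -(x_t - mu) / (s**2 + sigma**2)

x_t = np.array([-1.0, 0.5, 3.0])
estimate = tweedie_denoise(x_t, sigma, gaussian_score)

# Closed-form Gaussian posterior mean for comparison.
closed_form = mu + (s**2 / (s**2 + sigma**2)) * (x_t - mu)
print(np.allclose(estimate, closed_form))  # True
```

In a diffusion model the analytic score is replaced by the network's learned score or noise prediction, so the same formula yields a clean-sample estimate at each sampling step.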
Trending Models (10)
DeepSeek · text-generation · Unknown
DeepSeek's flagship V4-Pro model with 4.5M+ downloads and 4,191 likes, continuing to dominate as the most downloaded and liked model on Hugging Face with strong conversational and text-generation capabilities.
Circlestone Labs · image-generation · Unknown
Image generation diffusion model with 620K+ downloads and 1,517 likes, compatible with ComfyUI workflows and leading the trending charts for generative image creation.
SulphurAI · text-to-video · Unknown
Text-to-video generation model with 1.28M downloads and 1,302 likes, available in both diffusers and GGUF formats, representing the current state-of-the-art in open text-to-video generation.
OpenBMB · image-text-to-text · Unknown
Multimodal vision-language model with 247K downloads and 914 likes, supporting image-text-to-text tasks with strong performance in visual understanding and reasoning.
ByteDance Research · multimodal-generation · Unknown
Multimodal generation model from ByteDance supporting both image and video generation, rapidly gaining traction with 702 likes. Represents ByteDance's push into unified multimodal generation.
Supertone · text-to-speech · Unknown
Text-to-speech model with 40K downloads and 616 likes, providing high-quality speech synthesis in ONNX format for efficient deployment across platforms.
Tencent · translation · 1.8B
Compact 1.8B-parameter translation model from Tencent's Hunyuan family, part of a new multi-scale translation model lineup spanning 1.8B to 30B parameters for multilingual translation tasks.
Tencent · translation · 30B-A3B (MoE)
Tencent's largest translation model in the Hy-MT2 family using a 30B MoE architecture with 3B active parameters, balancing capacity with efficiency for high-quality multilingual translation.
Unsloth · text-generation · 27B
GGUF-quantized version of Qwen3.6-27B with multi-token prediction, enabling efficient local deployment of a strong 27B-parameter model. 597K downloads indicate strong adoption for local inference.
Cohere · image-text-to-text · Unknown
Cohere's latest Command A+ model in W4A4 quantized format with vision capabilities, gaining 182 likes as a competitive multimodal conversational model with efficient quantization.
Trending GitHub Repos (13)
andrej-karpathy-skills: a CLAUDE.md file derived from Andrej Karpathy's observations on LLM coding pitfalls, providing behavioral guidelines for Claude Code. Explosive growth (3,507 stars today, 149K total) signals massive adoption of structured AI coding agent configuration.
codegraph: a pre-indexed code knowledge graph for Claude Code, Codex, Cursor, OpenCode, and Hermes Agent that reduces token usage and tool calls while running 100% locally. 2,456 stars today with 19.6K total.
Understand-Anything: converts code repositories into interactive knowledge graphs for exploration, search, and Q&A. Works with Claude Code, Codex, Cursor, Copilot, and Gemini CLI. 2,299 stars today.
Official Anthropic-managed directory of high-quality Claude Code plugins. Continued strong growth at 2,193 stars today with 26.5K total stars, establishing the canonical plugin registry.
Comprehensive AI engineering curriculum for learning, building, and shipping AI applications. 1,521 stars today with 13.8K total, indicating strong demand for structured AI engineering education.
Composable and growing agent platform built on the Hermes model family with 164K total stars and 1,331 stars today. One of the most-starred agent frameworks in the ecosystem.
Free Claude Code alternative supporting terminal, VSCode extension, and Discord interfaces with voice support. 565 stars today with 28.6K total stars.
Modern finance application with advanced market analytics, investment research, and economic data tools for data-driven decision-making. 545 stars today with 23.1K total.
Chrome DevTools as an MCP server for AI coding agents, enabling programmatic browser inspection and debugging. 435 stars today with 41.3K total, continuing strong adoption in agentic developer stacks.
Open-source managed agents platform for turning coding agents into real teammates with task assignment, progress tracking, and compound skills. 410 stars today with 31.9K total.
754 structured cybersecurity skills for AI agents mapped to 5 frameworks (MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND, NIST AI RMF) across 26 security domains. 281 stars today.
Memory library for building stateful agents, providing persistent memory infrastructure for long-running agent workflows. 112 stars today with 4.1K total.
Agent Reinforcement Trainer using GRPO for training multi-step agents on real-world tasks. Supports Qwen3.6, GPT-OSS, Llama models. 44 stars today with 9.8K total.
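GRPO, the algorithm named in the last entry, scores each sampled trajectory relative to the other trajectories drawn for the same task, so no learned value function is needed. A minimal sketch of that group-relative advantage follows. It shows the general normalization only and is not this repository's API. The function name and the reward values are hypothetical.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same task."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Four hypothetical rollouts of one task: two succeed, two fail.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(adv)

# If every rollout gets the same reward, the group carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
print(flat)
```

Successful rollouts receive positive advantages and failed ones negative, and a group with identical rewards yields all zeros.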