Agent trajectory compilation (ACC) opens new long-context training paradigm; Gated DeltaNet-2 decouples linear attention memory editing; code knowledge graphs and agentic skills frameworks explode on GitHub

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

High Relevance

Ali Hatamizadeh, Yejin Choi, Jan Kautz — NVIDIA, University of Washington

Addresses a fundamental limitation in delta-rule linear attention models where a single scalar gate controls both erasing and writing to the compressed recurrent state. By decoupling these operations into separate gating mechanisms, the model avoids the interference where one operation scrambles the other's associations.

Key Findings

• Single-gate delta-rule models suffer from erase-write interference in compressed memory
• Decoupled gating mechanisms allow independent control of memory erasure and new value writing
• Improves upon KDA's channel-wise decay approach for managing the fixed-size recurrent state
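
To make the erase-write distinction concrete, here is a minimal sketch of a delta-rule state update with two separate scalars. It is an illustration only, not the paper's formulation: the function names `delta_step` and `read`, the scalar gates `erase` and `write`, and the plain-list matrices are all assumptions made for this example.

```python
def delta_step(S, k, v, erase, write):
    """One delta-rule update of a d_v x d_k state matrix S (list of rows).

    erase scales how much of the value currently stored under key k is removed;
    write scales how much of the new value v is stored under k.
    Passing erase == write gives a single-gate delta rule.
    """
    d_v, d_k = len(S), len(k)
    # Value the state currently returns for key k, i.e. S @ k.
    old = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # S <- S - erase * (S k) k^T + write * v k^T
    return [
        [S[i][j] - erase * old[i] * k[j] + write * v[i] * k[j] for j in range(d_k)]
        for i in range(d_v)
    ]


def read(S, q):
    """Query the state: returns S @ q."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]


# Hypothetical toy data: store v1 under key k, then overwrite with v2.
k = [1.0, 0.0]
S0 = [[0.0, 0.0], [0.0, 0.0]]
S1 = delta_step(S0, k, [3.0, 4.0], erase=1.0, write=1.0)

single = delta_step(S1, k, [5.0, 6.0], erase=0.5, write=0.5)
decoupled = delta_step(S1, k, [5.0, 6.0], erase=1.0, write=0.5)

print(read(single, k))     # [4.0, 5.0]: half of the old value is still mixed in
print(read(decoupled, k))  # [2.5, 3.0]: old value fully erased, new value half written
```

In this toy case the single gate cannot erase fully while writing softly; the two-scalar version can, which is the kind of independence the summary describes.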

linear-attention · efficient-transformers · architecture · memory

1 upvote

WorldKV: Efficient World Memory with World Retrieval and Compression

High Relevance

Jung Yi, Minjae Kim, Paul Hyunbin Cho, Wooseok Jang, Sangdoo Yun — NAVER AI Lab, Korea Advanced Institute of Science and Technology

Proposes a retrieval-and-compression approach to KV-cache management for autoregressive video diffusion models, enabling persistent world generation where revisiting previously seen viewpoints yields consistent content without breaking real-time constraints.

Key Findings

• Full KV-cache attention preserves world consistency but memory and compute grow linearly with rollout length
• Sliding window inference restores throughput but sacrifices long-term consistency
• WorldKV combines retrieval and compression to maintain both consistency and real-time performance

video-generation · world-models · KV-cache · real-time

2 upvotes

Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning

Banghao Chi, Yining Xie, Mingyuan Wu, Jingcheng Yang, Jize Jiang — Zhejiang University, Alibaba Group

Applies reinforcement learning to train LLM agents for realistic spreadsheet automation tasks. Addresses limitations of specialized prompting approaches that struggle with complex multi-step spreadsheet operations beyond simple cell manipulation.

Key Findings

• Specialized prompting over general-purpose LLMs fails on complex spreadsheet operations
• RL training enables agents to learn multi-step spreadsheet manipulation strategies
• Bridges the gap between toy spreadsheet tasks and real-world data-centric workflows

agents · reinforcement-learning · spreadsheets · automation

2 upvotes

Swift Sampling: Selecting Temporal Surprises via Taylor Series

Dahye Kim, Bhuvan Sachdeva, Karan Uppal, Naman Gupta, Vineeth N. Balasubramanian — Indian Institute of Technology Hyderabad, Samsung Research

Introduces a training-free frame selection algorithm inspired by the brain's predictive coding that identifies high-information moments in long-form video by modeling it as a differentiable trajectory in visual latent space and computing velocity-based surprise scores.

Key Findings

• Most frames in long-form video are redundant; critical information resides in temporal surprises
• Taylor series-based velocity computation identifies moments where visual features deviate from predicted evolution
• Training-free approach requires no task-specific fine-tuning for frame selection
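
The idea of scoring frames by deviation from a predicted trajectory can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: it uses a first-order Taylor extrapolation with a finite-difference velocity, and the function names `surprise_scores` and `select_frames` and the toy feature vectors are hypothetical.

```python
import math


def surprise_scores(features):
    """Score each frame by its distance from a first-order Taylor prediction.

    features: list of equal-length float vectors, one per frame.
    The prediction for frame t is x[t-1] + (x[t-1] - x[t-2]), i.e. the previous
    frame extrapolated along its finite-difference velocity.
    The first two frames have no velocity estimate and get a score of 0.0.
    """
    scores = [0.0] * len(features)
    for t in range(2, len(features)):
        prev, prev2, cur = features[t - 1], features[t - 2], features[t]
        predicted = [2 * a - b for a, b in zip(prev, prev2)]
        scores[t] = math.sqrt(sum((c - p) ** 2 for c, p in zip(cur, predicted)))
    return scores


def select_frames(features, k):
    """Return indices of the k highest-surprise frames, in temporal order."""
    scores = surprise_scores(features)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Hypothetical 1-D features: steady drift, then a jump at frame 4.
frames = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0]]
print(surprise_scores(frames))   # [0.0, 0.0, 0.0, 0.0, 6.0, 6.0]
print(select_frames(frames, 1))  # [4]
```

Note that the frame right after the jump also scores high here, because the extrapolated velocity overshoots; a real selector would likely need to handle that.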

video-understanding · frame-selection · predictive-coding · efficiency

2 upvotes

Diversed Model Discovery via Structured Table Discovery

Zhengyuan Dong, Renée J. Miller — Northeastern University

Argues that model search is inherently comparative and proposes leveraging structured artifacts from model cards — performance tables, configuration data, dataset metadata — to produce diverse, differentiated model recommendations beyond what text-based semantic similarity can achieve.

Key Findings

• Text-based model search produces homogeneous results due to semantic similarity clustering
• Structured table artifacts in model cards capture differentiation dimensions text misses
• Comparative model search requires balancing task alignment with measurable differentiation
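
One generic way to balance task alignment against differentiation is a greedy rerank in the style of maximal marginal relevance (MMR). The sketch below uses that swapped-in technique, not the paper's method; the function name `diverse_rerank`, the `trade_off` parameter, and the toy candidates (relevance scores plus table-derived attributes assumed to be normalised to [0, 1]) are hypothetical.

```python
def diverse_rerank(candidates, k, trade_off=0.5):
    """Greedy MMR-style selection of k model names.

    candidates: list of (name, relevance, attributes), where attributes is a
    tuple of numeric values assumed to be normalised to [0, 1].
    trade_off: weight on relevance; (1 - trade_off) weights the distance to
    the closest model already selected.
    """
    def distance(a, b):
        # Mean absolute difference across attribute columns.
        return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(c):
            if not selected:
                return c[1]  # first pick: relevance only
            novelty = min(distance(c[2], s[2]) for s in selected)
            return trade_off * c[1] + (1 - trade_off) * novelty
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [name for name, _, _ in selected]


# Hypothetical candidates: "a" and "b" are near-duplicates, "c" differs.
models = [
    ("a", 0.90, (0.1, 0.1)),
    ("b", 0.85, (0.1, 0.1)),
    ("c", 0.60, (0.9, 0.9)),
]
print(diverse_rerank(models, 2, trade_off=0.5))  # ['a', 'c']
print(diverse_rerank(models, 2, trade_off=1.0))  # ['a', 'b']
```

With `trade_off=1.0` the rerank collapses to pure relevance ordering, which reproduces the homogeneous results the summary attributes to text-only search.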

model-discovery · model-cards · information-retrieval · structured-data

2 upvotes

From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning

High Relevance

Xitai Jiang, Zihan Tang, Wenze Lin, Yang Yue, Shenzhi Wang — Shanghai Jiao Tong University, Tsinghua University

Introduces SCRL, a curriculum RL framework that derives verifiable subproblems from reference reasoning chains and uses progressive difficulty scheduling to solve the credit assignment problem in outcome-based RLVR, where correct final-answer rollouts are too rare for efficient learning on hard problems.

Key Findings

• Outcome-based RLVR is inefficient on hard problems because correct final-answer rollouts are rare
• Decomposing problems into verifiable subproblems enables partial credit assignment from failed attempts
• Curriculum scheduling from easy to hard subproblems improves sample efficiency
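
As a rough illustration of partial credit and easy-to-hard scheduling, consider the sketch below. It is not SCRL itself: the exact-match verifier, the function names `subproblem_reward` and `advance_stage`, and the 0.7 advancement threshold are all assumptions chosen for the example.

```python
def subproblem_reward(answers, references):
    """Fraction of verifiable subproblems answered correctly (0.0 to 1.0).

    Uses exact string match as a stand-in verifier; a rollout that misses the
    final answer can still earn credit for the subproblems it got right.
    """
    if not references:
        return 0.0
    correct = sum(1 for a, r in zip(answers, references) if a == r)
    return correct / len(references)


def advance_stage(stage, recent_rewards, n_stages, threshold=0.7):
    """Move to the next, harder stage once mean recent reward clears threshold."""
    if recent_rewards and sum(recent_rewards) / len(recent_rewards) >= threshold:
        return min(stage + 1, n_stages - 1)
    return stage


# Hypothetical rollout: two of three subproblems verified, final step wrong.
reward = subproblem_reward(["4", "9", "x"], ["4", "9", "16"])
print(reward)

stage = advance_stage(0, [0.8, 0.9], n_stages=3)
print(stage)  # 1
```

An outcome-only reward would score the rollout above as zero; the fractional reward is what gives a failed attempt a usable learning signal.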

reinforcement-learning · reasoning · curriculum-learning · credit-assignment

0 upvotes

Bernini: Latent Semantic Planning for Video Diffusion

Bernini Team, Chenchen Liu, Junyi Chen, Lei Li, Lu Chi — ByteDance

Unifies multimodal large language models and diffusion models through a division of labor: MLLMs perform semantic planning while diffusion models render pixels from high-level semantic guidance and low-level visual features, enabling controllable video generation with strong semantic grounding.

Key Findings

• MLLMs and diffusion models can be unified through semantic planning plus pixel rendering
• Latent semantic representations bridge the gap between language reasoning and visual generation
• The division of labor leverages each architecture family's strengths without compromise

video-generation · diffusion · multimodal · semantic-planning

0 upvotes

TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks

High Relevance

Zhaoyang Chu, Jiarui Hu, Xingyu Jiang, Pengyu Zou, Han Li — Renmin University of China, Ant Group

Introduces a scalable data engine that reverse-engineers evaluation tasks from 80,870 real terminal recordings, producing 1,530 validated tasks spanning 18 categories and 1,280 unique commands, with a curated verified subset of 200 tasks for comprehensive agent evaluation.

Key Findings

• Automated pipeline converts 80K real terminal recordings into 1,530 validated evaluation tasks
• Tasks span 18 real-world categories from short operations to 50+ step workflows
• Coverage of 1,280 unique commands provides breadth unavailable in manually crafted benchmarks

benchmarks · agents · terminal · evaluation

0 upvotes

pi-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows

Haoran Zhang, Luxin Xu, Zhilin Wang, Runquan Gui, Shunkai Zhang — Peking University, Microsoft Research

Evaluates whether personal assistant agents can identify and act on hidden intents — needs, constraints, and preferences that users leave unstated — in sustained long-horizon workflows, addressing a core challenge in proactive assistance that existing benchmarks overlook.

Key Findings

• Existing benchmarks rarely evaluate proactive identification of unstated user needs
• Long-horizon workflows amplify the importance of hidden intent detection
• Proactive assistance requires different capabilities than reactive task completion

agents · personal-assistants · benchmarks · proactive-AI

0 upvotes

Trending Models (10)

DeepSeek V4 Pro

DeepSeek · text-generation · unknown

Latest flagship model from DeepSeek with 4M+ downloads, continuing the V4 architecture's dominance in the open-source LLM ecosystem for conversational and general text generation tasks.

conversational · text-generation · frontier

4.0M downloads · 4.1K likes

DeepSeek V4 Flash

DeepSeek · text-generation · unknown

Efficient variant of DeepSeek V4 optimized for faster inference while maintaining strong conversational and text generation capabilities, achieving 2.4M+ downloads.

conversational · text-generation · efficient

2.4M downloads · 1.2K likes

Anima

Circlestone Labs · image-generation · unknown

Diffusion model with 1,468 likes gaining strong traction in the generative image community, compatible with ComfyUI workflows.

diffusion · image-generation · comfyui

591.8K downloads · 1.5K likes

Sulphur-2-base

SulphurAI · text-to-video · unknown

Text-to-video model with over 1.1M downloads, available in both diffusers and GGUF formats, establishing itself as a leading open-source video generation model.

text-to-video · diffusers · gguf

1.2M downloads · 1.2K likes

MiniCPM-V-4.6

OpenBMB · image-text-to-text · unknown

Multimodal vision-language model with 196K downloads and 876 likes, continuing the MiniCPM-V series' strong performance in image-text understanding at efficient model sizes.

multimodal · vision-language · efficient

196.1K downloads · 876 likes

Lance

ByteDance Research · multimodal · unknown

Any-to-any multimodal model from ByteDance that supports image and video generation. Its 572 likes against only 739 downloads suggest strong interest from early adopters.

multimodal · image-generation · video-generation

739 downloads · 572 likes

Fara-7B

Microsoft · image-text-to-text · 7B

7B parameter multimodal vision-language model from Microsoft built on Qwen2.5-VL architecture, achieving 592 likes and 15K downloads for image-text understanding tasks.

multimodal · vision-language · qwen

15.2K downloads · 592 likes

Supertonic-3

Supertone · text-to-speech · unknown

Text-to-speech model with ONNX format support, achieving 535 likes and 35K downloads for high-quality speech synthesis applications.

tts · speech-synthesis · onnx

35.0K downloads · 535 likes

HiDream-O1-Image

HiDream AI · image-text-to-image · unknown

Vision-language model combining image understanding and image generation capabilities in a single architecture based on Qwen3-VL, with 417 likes and 21K downloads.

multimodal · image-generation · vision-language

21.6K downloads · 417 likes

Qwen3.6-27B-MTP-GGUF

Unsloth · text-generation · 27B

GGUF quantized version of Qwen3.6-27B with Multi-Token Prediction, enabling efficient local deployment with 478K downloads.

gguf · quantized · qwen · efficient-inference

478.5K downloads · 376 likes

Trending GitHub Repos (15)

colbymchenry/codegraph

High Relevance · GitHub

Pre-indexed code knowledge graph for AI coding agents (Claude Code, Codex, Cursor, OpenCode), reducing token usage and tool calls while keeping everything local. Leading today's GitHub trending with 4,294 daily stars.

code-knowledge-graph · ai-coding · developer-tools

TypeScript · 13.7K · +4.3K today · 781

multica-ai/andrej-karpathy-skills

High Relevance · GitHub

A single CLAUDE.md file derived from Andrej Karpathy's observations on LLM coding pitfalls, rapidly adopted as best-practice guidance for Claude Code agents. 143K total stars.

ai-coding · best-practices · claude-code

143.3K · +2.6K today · 14.7K

Imbad0202/academic-research-skills

High Relevance · GitHub

Academic research workflow skills for Claude Code covering the full pipeline from research to writing, review, revision, and finalization. 2,579 daily stars.

academic-research · ai-writing · claude-code

Python · 18.2K · +2.6K today · 1.6K

NousResearch/hermes-agent

High Relevance · GitHub

Nous Research's personal AI agent platform with 161K total stars and 2,056 daily stars, positioning itself as the leading open-source personal agent framework.

agents · personal-ai · open-source

Python · 161.6K · +2.1K today · 26.3K

obra/superpowers

High Relevance · GitHub

Agentic skills framework and software development methodology with 201K total stars, providing structured approaches to AI-assisted development.

agentic-skills · development-methodology · ai-assisted-dev

Shell · 201.6K · +1.6K today · 18.0K

rohitg00/ai-engineering-from-scratch

Comprehensive learning resource for AI engineering covering the full stack from foundations to deployment, gaining 1,333 daily stars.

ai-engineering · education · full-stack

Python · 10.8K · +1.3K today · 2.1K

truelockmc/streambert

Cross-platform Electron desktop app for streaming and downloading media content with zero ads, gaining 1,094 daily stars.

electron · streaming · desktop-app

JavaScript · 4.0K · +1.1K today · 315

msitarzewski/agency-agents

High Relevance · GitHub

Complete AI agency framework with specialized expert agents for different domains — from frontend development to community management — each with defined processes and deliverables. 103K total stars.

ai-agents · specialized-agents · agency

Shell · 103.7K · +1.0K today · 17.1K

trimstray/the-book-of-secret-knowledge

Curated collection of inspiring lists, manuals, cheatsheets, and developer tools with 222K total stars, a perennial trending resource.

developer-resources · cheatsheets · reference

222.5K · +756 today · 13.3K

rmyndharis/OpenWA