Daily Research Feed

training-data-quality, agentic-self-improvement, efficient-training-frameworks, multimodal-scaling, agent-context-management, generative-rendering

Monday, July 27, 2026

DataPrep-Bench sets first unified standard for LLM training data preparation; Skill Self-Play from Qwen co-evolves diverse agent skills via RL; NVIDIA's Molt offers a lean PyTorch-native framework for agentic RL at scale

recursive-self-improving-agents, agent-infrastructure-maturation, extreme-quantization-on-device, embodied-ai-robotics, evolving-user-intent-multi-turn, agentic-skills-frameworks

Sunday, July 26, 2026

AREX's recursively self-improving deep research agents top HuggingFace with 127 upvotes; agent infrastructure matures across NOOA, WorkBuddy Bench, and OpenForgeRL; GitHub trending dominated by agentic skills frameworks and AI-native tooling as GLM-5.2 and extreme-quantized models lead HuggingFace adoption

recursive-self-improving-agents, agentic-benchmarking-infrastructure, embodied-navigation-visual-tracking, teacher-free-self-distillation, ai-native-browsers-web-agents

Saturday, July 25, 2026

AREX debuts recursively self-improving deep research agents with verify-then-refine loops (118 upvotes); agentic benchmarking and training infrastructure surges across WorkBuddy Bench, NOOA, and OpenForgeRL; AI-native browsers and web agents dominate GitHub trending alongside GLM-5.2's massive HuggingFace adoption

long-horizon-agent-benchmarking, visual-pretraining-reinvention, llm-knowledge-memorization-vs-generalization, audio-language-modeling-open-weights, tabular-foundation-models, trading-agent-tools-github-surge

Tuesday, July 14, 2026

Agent benchmarks mature with Long-Horizon-Terminal-Bench; visual pretraining challenged by video-first and pixel-space approaches; knowledge-using gap in LLM finetuning exposed; trading agent tools continue GitHub domination

video-generation-as-vision-pretraining, sovereign-multilingual-moe-models, long-context-test-time-training, agent-safety-destructive-command-guards, community-model-remixing-quantization, claude-code-ecosystem-consolidation

Monday, July 13, 2026

Video generation models reframed as general-purpose vision learners (GenCeptional); sovereign German-English MoE hybrid Mamba-Transformer Soofi S 30B-A3B matches dense 14-27B models; Destructive Command Guard explodes to 444 stars/day as agent safety tooling surges

real-time-interactive-video-generation, benchmark-shortcut-auditing, long-context-efficient-attention, agentic-memory-proactive-agents, ai-for-science-drug-discovery, claude-code-ecosystem-tooling-surge

Sunday, July 12, 2026

Weekend lull on arXiv but HuggingFace's Vidu S1 (117 upvotes) pushes real-time voice-controlled video generation to consumer GPUs, while NAVER's Video-Oasis and RCORE papers both expose shortcut learning in video benchmarks; GitHub trending is dominated by a Claude Code tooling surge led by DesktopCommanderMCP

icml-2026-wrap-up, diffusion-language-models, ai-alignment-governance, parameter-importance-fine-tuning, model-compression, latent-reasoning-control, grok-4-5-release

Saturday, July 11, 2026

ICML 2026 closes in Seoul with diffusion language model ordering and AI alignment-as-censorship winning top paper awards; Grok 4.5 launches as a frontier-adjacent Opus-class model at 60% lower cost; arXiv surfaces Super Weights fine-tuning paradox and Latent Memory Palace for control reasoning

real-time-video-generation, agent-skills-ecosystem, video-understanding-evaluation, long-context-efficiency, ai-for-science, linear-attention-architectures

Friday, July 10, 2026

Vidu S1 achieves real-time interactive video generation at 42 FPS with voice control; agent skills ecosystem consolidation continues with skills, superpowers, and governance repos dominating GitHub trending; linear attention architectures get systematic comparative study

agent-skills-infrastructure, embodied-ai-world-models, rl-post-training-advances, model-ecosystem-diversification, ai-agent-security-governance, llm-efficiency-linearization

Thursday, July 9, 2026

Agent skills ecosystem explodes across GitHub with skill libraries, optimizers, and security scanners; embodied AI advances with LingBot-World 2.0's infinite-horizon interactive worlds and MoE video pretraining; RL post-training research matures with GRPO signal recovery and competitive cross-model training

world-model-formalization, kv-cache-depth-compression, 3d-generation-reconstruction-unification, llm-agent-safety-efficiency, open-weight-frontier-models, agent-skill-ecosystem

Wednesday, July 8, 2026

World models get a formal definition and roadmap; KV cache compression sees two new cross-layer depth-factorization approaches; 3D scene unification via pixel-space diffusion (PixWorld, ECCV 2026) advances the generation-reconstruction frontier; GitHub trending remains dominated by agent-skill and red-teaming tooling

training-inference-consistency-in-rl, embodied-ai-inference-runtimes, diffusion-transformer-quantization, vlm-data-curation, llm-research-ideation, agent-skill-ecosystem-and-red-teaming

Tuesday, July 7, 2026

Training-inference mismatch in LLM RL becomes the day's dominant story at 142 upvotes; embodied AI gets a unified C++ inference runtime and closed-loop VLA correction; diffusion transformer quantization and VLM data curation push efficiency research forward; GitHub trending is saturated with Claude Code skills, agent multiplexers, and AI red-teaming tools

rl-training-inference-mismatch-&-post-training-stability, embodied-ai-systems-engineering-&-closed-loop-control, efficient-generative-model-inference-(quantization-&-data-centric-methods), agent-&-mcp-security---red-teaming, open-weight-frontier-model-race-&-efficient-quantized-variants, claude-code---agent-skills-tooling-ecosystem-growth

Monday, July 6, 2026

LLM RL training stability takes center stage with a 142-upvote paper on training-inference mismatch; embodied AI inference runtimes and VLA closed-loop control mature quickly; diffusion transformer quantization goes data-agnostic; agent red-teaming frameworks emerge to secure the fast-growing MCP/agent ecosystem; GLM-5.2, DeepSeek-V4-Pro and Qwen3.6 lead a crowded frontier open-weight model race; Claude Code skills and agent-tooling repos dominate GitHub trending

agent-memory-as-trainable-skill, fuzzy-function-neural-compilation, agentic-benchmark-proxies, hybrid-attention-efficiency, diffusion-and-visual-generation-efficiency, claude-code-agent-skills-ecosystem

Sunday, July 5, 2026

Program-as-Weights compiles natural-language specs into locally executable neural artifacts, topping HF Papers with 107 upvotes; agent-memory research (AgenticSTS, AutoMem, SkillCoach, DuoMem) converges on treating memory as a trainable, bounded skill rather than an ever-growing context; hybrid-attention and flow-matching work pushes long-context and diffusion efficiency; GitHub trending is dominated by a viral wave of Claude Code / agent-skills repositories

agent-memory-and-evaluation, neural-compilation-paradigms, efficient-attention-architectures, visual-generation-and-world-models, agentic-tooling-ecosystem

Saturday, July 4, 2026

Program-as-Weights surges to 58 upvotes as fuzzy-function programming gains momentum; agent memory and evaluation benchmarks dominate with AgenticSTS, EvoPolicyGym, and SkillCoach; agentic tooling ecosystem explodes on GitHub as US Independence Day thins paper submissions

fuzzy-function-programming, agent-benchmarking, agent-skills-ecosystem, gguf-model-distribution, visual-generative-optimization, specialized-domain-evaluation

Friday, July 3, 2026

Program-as-Weights proposes compiling natural-language specs into local neural artifacts; agency-agents explodes to 125K stars as agent skills ecosystem dominates GitHub trending; Ornith-1.0 family from DeepReinforce AI saturates model charts across 9B/35B/397B tiers

world-foundation-models, speculative-decoding-innovations, agent-memory-and-skill-evolution, on-policy-distillation, agent-infrastructure-platforms, gguf-model-distribution

Thursday, July 2, 2026

SenseTime's Orca world foundation model dominates with 178 upvotes; speculative decoding innovations from BlockPilot and Multi-Block Diffusion accelerate inference; agent infrastructure consolidates on GitHub with agency-agents crossing 123K stars

environment-free-agent-training, agent-infrastructure-platforms, on-policy-distillation, moe-model-ecosystem, ai-powered-investing, privacy-first-infrastructure

Wednesday, July 1, 2026

Agent infrastructure explosion continues with agency-agents and Agent-Reach dominating GitHub; Dockerless from ByteDance eliminates execution environments for coding agent verification; GLM-5.2 and Ornith model families surge across quantization variants

on-policy-distillation-advances, agent-infrastructure-explosion, financial-trading-agents, moe-models-and-quantization, 3d-reconstruction-rendering, rag-optimization

Tuesday, June 30, 2026

AI agent infrastructure dominates GitHub with agency-agents and cognee surging; on-policy distillation advances with AsyncOPD and PHF; financial trading agents proliferate across open-source ecosystem

codebase-knowledge-infrastructure, physics-grounded-robotics, rlhf-for-image-generation, agent-world-modeling, open-model-quantization, ai-assisted-peer-review

Monday, June 29, 2026

Codebase knowledge graphs explode on GitHub; NVIDIA and Stanford advance physics-grounded robotics simulation; RLHF matures for image generation with Qwen-Image-2.0-RL

agent-memory-infrastructure, verification-and-reward-signals, unified-generative-models, agentic-rl-distillation, world-modeling, efficient-inference

Sunday, June 28, 2026

Agent memory and infrastructure dominate GitHub trending; DanceOPD and Qwen-Image-Agent advance unified generative models; verification and reward signal design emerge as critical bottlenecks for agentic RL

verification-bottleneck, unified-generative-models, agentic-rl-and-distillation, agent-infrastructure-tooling, world-modeling, efficient-inference

Saturday, June 27, 2026

Verification becomes the bottleneck for coding agents; DanceOPD unifies conflicting image generation capabilities via field distillation; agent infrastructure explodes with design.md, OpenMontage, and AI-Berkshire

agentic-image-generation, world-modeling, moe-architectures, agent-tooling-infrastructure, design-for-agents

Friday, June 26, 2026

Agentic image generation advances with Qwen-Image-Agent; GLM-5.2 MoE and Krea-2 lead model trending; Agent-Reach and OpenMontage drive massive open-source agentic tooling surge

efficient-reranking, evolvable-embeddings, agentic-benchmarking, long-horizon-planning, agentic-video

Tuesday, June 23, 2026

KaLM reranker decouples query-passage computation for efficient retrieval; EvoEmbedding brings temporal memory to embeddings; agentic benchmarking matures with DailyReport, PlanBench-XL, and PhySciBench; OpenMontage and codebase-memory-mcp lead GitHub trending

diffusion-lm, parallel-generation, agentic-infrastructure, token-compression, visual-grounding

Monday, June 22, 2026

PerceptionDLM achieves parallel region perception via diffusion LMs; GLM-5.2 and DeepSeek-V4-Pro lead model trending; AI agent tooling explosion continues with OpenMontage, headroom, and hermes-agent

world-model-persistent-state, agentic-robot-self-improvement, video-multimodal-control, llm-agent-evaluation, agentic-engineering-tooling

Friday, June 19, 2026

World model persistent state questioned in two simultaneous papers; codebase-memory-mcp surges to 2,322 GitHub stars; GLM-5.2 and MiniMax-M3 top HuggingFace trending

autonomous-agents, agi-to-asi, multimodal-reasoning, lora-optimization, agent-security

Monday, June 15, 2026

Chatbot-to-Digital-Colleague paradigm shift leads HuggingFace papers; NVIDIA LocateAnything-3B tops model trending; SkillSpector AI agent security scanner surges on GitHub

spatial-intelligence-pretraining, efficient-rl-training, inference-acceleration, brain-computer-interfaces, edge-ai-models, ai-agent-tooling

Tuesday, June 2, 2026

VLM vs VGM spatial intelligence showdown reveals complementary strengths; ESPO cuts RL rollout waste by 20% with early-stopping PPO; NVIDIA LocateAnything-3B leads trending models with 807 likes

agent-safety-alignment-momentum, long-horizon-agent-evaluation, edge-ai-model-proliferation, unified-multimodal-models, ai-developer-tooling-explosion, speech-and-video-generation

Monday, June 1, 2026

AgentDoG 1.5 surges to 127 upvotes as agent safety remains the week's top story; NVIDIA LocateAnything-3B debuts as trending model leader with 606 likes; microsoft/markitdown dominates GitHub at 2,798 stars/day

agent-safety-sustained-momentum, open-multimodal-model-consolidation, wifi-spatial-intelligence, embodied-ai-unification, speech-synthesis-maturation, offline-ai-applications

Sunday, May 31, 2026

AgentDoG 1.5 surges to 111 upvotes as agent safety dominates weekend discourse; Qwen3.6-27B hits 5M downloads making it the second most downloaded model; RuView explodes with 655 stars/day for WiFi-based spatial intelligence without cameras

agent-safety-alignment, embodied-foundation-models, unified-retrieval-systems, speech-synthesis-renaissance, efficient-lora-techniques, video-world-models

Saturday, May 30, 2026

AgentDoG 1.5 proposes lightweight safety alignment for open-world AI agents with 81 upvotes; Qwen-VLA unifies manipulation and navigation across robot embodiments; VoxCPM surges at 1,815 stars/day for tokenizer-free multilingual TTS

rl-training-integrity, native-multimodal-generation, agent-skills-ecosystem, cinematic-video-generation, compact-edge-models, ai-output-quality-alignment

Friday, May 29, 2026

LaRA detects data contamination in RL post-training via layer-wise representation analysis; NAVA from Baidu achieves native audio-visual alignment for joint generation; agent skills ecosystem continues explosive growth with Understand-Anything gaining 3,776 stars/day

research-level-mathematical-reasoning, native-vision-language-models, agent-skills-ecosystem, efficient-video-generation, agent-governance-and-security, compact-model-deployment

Thursday, May 28, 2026

ResearchMath-14K introduces largest research-level math dataset with multi-agent curation from Seoul National University; NEO-ov pioneers native one-vision VLMs for multi-image and video understanding; Understand-Anything leads GitHub with 4,465 stars/day as agent skills ecosystem explodes

robust-3d-reconstruction, video-generation-evaluation, diffusion-model-efficiency, spatial-foundation-models, ai-agent-infrastructure, specialized-model-ecosystem

Wednesday, May 27, 2026

Light paper day highlights KAIST's GARD for robust 3D reconstruction under degraded conditions and Tencent's EvalVerse for cinematic video evaluation; NousResearch hermes-agent (169K stars) and ECC agent harness (194K stars) dominate GitHub; DeepSeek-V4-Pro maintains 5M-download lead as Anima (1,556 likes) and Hy-MT2 translation models surge

multi-vector-retrieval-efficiency, ai-research-automation, translation-model-specialization, multimodal-generation-convergence, ai-coding-agent-infrastructure, financial-ai-tooling

Tuesday, May 26, 2026

SMART unlocks latent multi-vector retrieval from frozen single-vector models as a plug-and-play upgrade; AutoResearch AI surveys the full spectrum of AI-powered scientific workflow automation; Tencent Hy-MT2 translation models and ByteDance Lance multimodal generator dominate HuggingFace trending; AI coding agent tooling consolidation accelerates with ECC (192K stars), andrej-karpathy-skills (155K stars), and Understand-Anything (31K stars) leading GitHub

visual-chain-of-thought-reasoning, scaling-laws-information-theory, agent-skill-optimization, 3d-scene-reconstruction, ai-coding-agent-infrastructure, efficient-image-generation

Monday, May 25, 2026

ETCHR decouples image editing from reasoning to unlock fine-grained visual chain-of-thought; Shannon Scaling Law reframes LLM training as noisy-channel transmission; AI coding agent infrastructure dominates GitHub with Understand-Anything (4,000 stars today) and andrej-karpathy-skills (2,551 stars today)

rlvr-token-credit-assignment, attention-sparsification-efficiency, ai-coding-agent-infrastructure, video-generation-consistency, multimodal-grounded-reasoning, agentic-evaluation-benchmarks

Sunday, May 24, 2026

DelTA reframes RLVR as token-level discrimination with 189 upvotes; code knowledge graphs and AI coding agent plugins dominate GitHub; DeepSeek-V4-Pro and Tencent Hy-MT2 lead model trends

rlvr-and-credit-assignment, attention-sparsification, agentic-evaluation-benchmarks, agent-infrastructure-and-governance, multimodal-robustness, kv-cache-and-inference-efficiency

Saturday, May 23, 2026

RLVR token-credit assignment (DelTA) advances fine-grained LLM training signals; full-attention sparsification shows LLMs are intrinsically sparse; agent governance and tooling ecosystems explode on GitHub

agent-training-from-trajectories, efficient-attention-mechanisms, agent-benchmarks-evaluation, agentic-coding-tools, curriculum-reinforcement-learning

Friday, May 22, 2026

Agent trajectory compilation (ACC) opens new long-context training paradigm; Gated DeltaNet-2 decouples linear attention memory editing; code knowledge graphs and agentic skills frameworks explode on GitHub

rl-for-reasoning, agent-infrastructure, video-generation-editing, multimodal-hallucination, autonomous-research

Thursday, May 21, 2026

Audio-visual Clever Hans effect exposes MLLM hallucinations; RL-for-reasoning wave crests with five new methods; agent infrastructure matures as OpenComputer and EnvFactory tackle verifiable environments

ai-video-quality-evaluation, omni-modal-agent-benchmarking, rl-credit-assignment-and-process-rewards, agent-knowledge-graphs-and-memory, agent-skills-ecosystem-consolidation, multilingual-document-understanding

Wednesday, May 20, 2026

Artifact-Bench exposes MLLM blindspots in AI video quality assessment; OmniGUI pioneers omni-modal GUI agent benchmarking; agent skills and code knowledge graphs dominate GitHub with Karpathy-inspired best practices surging

ai-research-automation-integrity, video-generation-infrastructure, diffusion-language-model-hybrids, personal-ai-and-agent-tooling, llm-inference-optimization, agent-native-interfaces

Tuesday, May 19, 2026

AI auto-research integrity crisis mapped end-to-end; ODE-native video alignment via KVPO breaks new ground; open-source personal AI and agent-native CLI tooling dominate GitHub

agentic-coding-assistants, financial-ai-models, llm-context-and-memory-management, ai-infrastructure-and-sre, document-preprocessing-for-llms, claude-ecosystem-explosion

Tuesday, April 14, 2026

Claude-powered agentic coding ecosystem explodes on GitHub with hermes-agent and skills frameworks; financial AI surges with Kronos foundation model; AI SRE tooling emerges as new frontier

sft-vs-rl-generalization, real-world-agent-evaluation, style-and-visual-generation, video-understanding-and-generation, agent-tooling-ecosystem, open-weight-model-competition

Monday, April 13, 2026

SFT generalization vindicated with conditional analysis reaching 294 upvotes; ClawBench tests AI agents on 153 real-world online tasks; MegaStyle scales style datasets to 170K prompts; agent tooling ecosystem explodes on GitHub

sft-generalization-momentum, agent-evaluation-maturation, moe-architecture-diversification, video-generation-control, agentic-infrastructure-buildout, vision-language-ocr-push

Sunday, April 12, 2026

SFT generalization rethink surges to 190 upvotes, reshaping post-training orthodoxy; ClawBench leaps to 122 upvotes for testing agents on real-world tasks; GLM-5.1 MoE and Netflix void-model debut on HuggingFace; hermes-agent dominates GitHub at 6,438 stars/day

agentic-ai-frameworks, claude-code-meta-tooling, domain-specific-foundation-models, tokenizer-free-tts, document-parsing-infrastructure, watermark-adversarial-research

Saturday, April 11, 2026

Agentic AI frameworks dominate GitHub trending with hermes-agent, Archon, and multica surging; financial foundation models and tokenizer-free TTS signal new frontier applications; Claude Code tooling meta-layer emerges as a distinct engineering discipline

agentic-ai-frameworks, llm-developer-tooling, financial-foundation-models, speech-synthesis-multilingual, data-infrastructure-for-ai, personalized-learning-agents

Friday, April 10, 2026

Agentic AI frameworks surge with NousResearch Hermes-Agent and Multica hitting thousands of GitHub stars; financial AI gains traction via Kronos foundation model; Claude Code best-practices meta-repos signal maturing LLM developer tooling ecosystem

agent-evaluation-benchmarks, reasoning-self-refinement, efficient-large-model-training, diffusion-language-models, gemma4-ecosystem, agent-framework-explosion

Thursday, April 9, 2026

GBQA benchmark reveals frontier LLMs catch under half of game bugs autonomously; ThinkTwice unifies reasoning and self-refinement via GRPO; Gemma 4 family dominates HuggingFace trending with six model variants

test-time-adaptation, linear-attention-replacements, agent-environment-infrastructure, hallucination-detection, autonomous-agent-evaluation, agent-tooling-dominance

Wednesday, April 8, 2026

In-Place Test-Time Training enables LLMs to adapt during inference; Polynomial Mixer achieves linear-time attention replacement; Gym-Anything turns any software into an agent environment

video-understanding-benchmarks, empirical-scaling-laws, agent-trajectory-optimization, tool-use-efficiency, gemma-4-ecosystem, virtual-try-on-and-video-synthesis

Tuesday, April 7, 2026

Video-MME-v2 raises the bar for video understanding evaluation; Adam's Law reveals textual frequency scaling in LLMs; Gemma 4 family dominates model releases with MoE and any-to-any variants

on-device-edge-inference, agentic-frameworks, autonomous-multi-agent-evolution, representation-steering-and-reasoning-introspection, open-weight-distillation-scaling, developer-ai-augmentation

Monday, April 6, 2026

On-device AI inference surges with Google LiteRT-LM and AI Edge Gallery; CORAL and Steerable Visual Representations maintain strong momentum; Claude-distilled Qwen and Gemma-4 dominate model charts

autonomous-multi-agent-evolution, representation-steering, llm-pre-decision-encoding, open-weight-distillation-scaling, ai-agent-developer-tooling, supply-chain-ai-forecasting

Sunday, April 5, 2026

CORAL's multi-agent evolution framework surges to 36 upvotes as autonomous AI-for-AI research gains momentum; Steerable Visual Representations hits 40 upvotes; Claude-distilled Qwen and Gemma-4 continue model chart domination

representation-steering-and-control, llm-reasoning-mechanisms, multi-agent-evolution, adversarial-robustness-3d, video-understanding-and-editing, open-weight-distillation

Saturday, April 4, 2026

Steerable visual representations and LLM pre-decision biases challenge core assumptions; multi-agent evolution frameworks and adversarial 3D textures push agent capabilities and risks; Gemma-4 and Claude-distilled Qwen dominate trending models

agent-safety-and-security, benchmark-and-evaluation, efficient-depth-scaling, llm-reasoning-robustness, multimodal-visual-reasoning, self-improvement-and-distillation

Friday, April 3, 2026

Agent safety and benchmark proliferation dominate the day; LLM reasoning robustness under context pressure emerges as a critical concern; distillation and efficient scaling techniques show surprising gains

medical-ai-datasets, terminal-agents, pretraining-science, cot-monitorability, multimodal-generation, edge-models

Thursday, April 2, 2026

Medical AI gets its ImageNet moment with a survey of 1,000+ datasets; terminal-only agents challenge complex enterprise frameworks; pretraining science matures with daVinci-LLM scaling laws

rl-reasoning, agentic-ai, multimodal-unification, speculative-decoding, qwen-ecosystem

Wednesday, April 1, 2026

FIPO advances RL reasoning with future-KL credit assignment; agentic AI frameworks dominate GitHub and HuggingFace; Qwen 3.5 ecosystem explodes across model charts

trillion-scale-models, agent-skill-learning, coding-agents, medical-ai, diffusion-transformers, video-generation, github-trending

Tuesday, March 31, 2026

Trillion-parameter scientific foundation model arrives; agent skill distillation from trajectories gains traction; coding agents get specialized models and organicity benchmarks

transformer-architecture, autonomous-agents, ai-safety, video-generation, reasoning-distillation, self-improvement

Monday, March 30, 2026

Attention Residuals rethink Transformers; LLM agents autonomously discover GPU kernels and RL algorithms; AI safety alarms sound as models fail even without adversarial prompts