Video-MME-v2 raises the bar for video understanding evaluation; Adam's Law reveals textual frequency scaling in LLMs; Gemma 4 family dominates model releases with MoE and any-to-any variants

Adam's Law: Textual Frequency Law on LLMs

High Relevance

Hongyuan Adam Lu, et al. — University of Waterloo

Uncovers a power-law relationship between textual token frequency in training data and LLM behavioral patterns, demonstrating that model biases and failure modes are predictable from data statistics alone.

Key Findings

• LLM behavior follows a power-law relationship with token frequency in training corpora
• Frequency-dependent biases can be predicted from training data statistics without running the model
• The law holds across multiple model families and scales, suggesting a universal phenomenon

scaling-laws · LLM-behavior · training-data · frequency-analysis · empirical-laws

48 upvotes
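
For readers who want to check a power-law claim like this on their own measurements, a minimal sketch follows. It assumes a generic form y = c * x^alpha and fits it by least squares in log-log space. The digest does not state the paper's exact functional form or which behavioral metric it uses, so the function name, the variable names, and the synthetic numbers below are illustrative assumptions only.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = c * x**alpha by ordinary least squares on (log x, log y).

    Generic technique, not the paper's procedure. All inputs must be positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    alpha = sxy / sxx                        # slope in log-log space
    c = math.exp(mean_y - alpha * mean_x)    # intercept mapped back to linear space
    return c, alpha

# Synthetic (made-up) data generated from y = 2 * x**-0.5, used only to
# confirm that the fit recovers the known parameters.
freqs = [10, 100, 1000, 10000]
scores = [2.0 * f ** -0.5 for f in freqs]
c, alpha = fit_power_law(freqs, scores)
```

On real data the points will not lie exactly on a line in log-log space, so it is worth inspecting the residuals before reading much into the fitted exponent.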

Learning to Retrieve from Agent Trajectories

High Relevance

Yuqi Zhou, et al. — University of Michigan

Proposes a learned retrieval method for extracting useful information from agent execution histories, enabling more efficient reuse of past experience for decision-making in LLM agent systems.

Key Findings

• Learned retrieval over agent trajectories significantly outperforms heuristic selection methods
• Past execution traces contain reusable knowledge that transfers across similar tasks
• The approach reduces redundant exploration and improves task completion rates in multi-step agent tasks

agent-trajectories · retrieval · LLM-agents · experience-reuse · efficiency

18 upvotes

Beyond Accuracy: Inefficiency Patterns in Tool-Integrated Reasoning

High Relevance

Qisheng Su, et al. — Tsinghua University

Systematically catalogs inefficiency patterns in how LLMs use external tools for reasoning, identifying wasteful tool calls, unproductive loops, and suboptimal tool selection sequences.

Key Findings

• LLMs exhibit systematic inefficiency patterns including redundant tool calls and unproductive retry loops
• Tool selection order significantly impacts reasoning efficiency even when final accuracy is similar
• Efficiency metrics reveal capability gaps invisible to accuracy-only evaluation

tool-use · LLM-reasoning · efficiency · failure-analysis · agent-evaluation

18 upvotes
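
One of the patterns named above, redundant tool calls, is easy to measure on a logged trace. The sketch below is a hypothetical illustration and not the paper's metric. It assumes each call is a dict with `tool` and `args` keys (an invented trace format) and counts a call as redundant when it exactly repeats an earlier tool and argument pair.

```python
import json
from collections import Counter

def redundant_call_rate(trace):
    """Fraction of calls that exactly repeat an earlier (tool, args) pair.

    `trace` is a list of {"tool": str, "args": dict}; this format is an
    assumption made for the example.
    """
    if not trace:
        return 0.0
    # Serialize args with sorted keys so equal dicts compare equal.
    keys = [(call["tool"], json.dumps(call["args"], sort_keys=True)) for call in trace]
    counts = Counter(keys)
    repeats = sum(n - 1 for n in counts.values())  # every occurrence after the first
    return repeats / len(trace)

# Made-up trace: the same search is issued three times.
trace = [
    {"tool": "search", "args": {"q": "capital of France"}},
    {"tool": "search", "args": {"q": "capital of France"}},
    {"tool": "calculator", "args": {"expr": "2+2"}},
    {"tool": "search", "args": {"q": "capital of France"}},
]
rate = redundant_call_rate(trace)
```

Exact matching is deliberately strict. Near-duplicate calls, such as the same query with different casing, would need a normalization step that is not shown here.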

Vanast: Virtual Try-On with Human Image Animation

Hyunsoo Cha, et al. — KAIST

Combines virtual try-on with human image animation to produce realistic clothing visualization on moving subjects, bridging the gap between static garment transfer and dynamic video generation.

Key Findings

• Joint modeling of garment transfer and body animation produces more coherent try-on videos than sequential approaches
• Temporal consistency in generated animations significantly improves perceived realism
• The approach generalizes across diverse body types and clothing categories

virtual-try-on · video-generation · human-animation · generative-AI · e-commerce

10 upvotes

ONE-SHOT: Compositional Human-Environment Video Synthesis

Fengyuan Yang, et al. — University of California, San Diego

Enables compositional video synthesis combining human subjects with diverse environments in a single-shot framework, addressing the challenge of generating coherent human-scene interactions.

Key Findings

• Single-shot composition produces realistic human-environment interactions without multi-stage pipelines
• Environment-aware human motion generation improves physical plausibility of synthesized videos
• The approach handles diverse environments including indoor, outdoor, and complex scene layouts

video-synthesis · compositional-generation · human-scene-interaction · diffusion-models · generative-AI

8 upvotes

Synthetic Sandbox for Training MLE Agents

Yuhang Zhou, et al. — National University of Singapore

Constructs synthetic machine learning engineering environments for training and evaluating autonomous ML agents, providing controlled sandboxes that test end-to-end ML workflow capabilities.

Key Findings

• Synthetic ML engineering tasks provide a controllable evaluation environment for MLE agents
• Agent performance varies dramatically across ML workflow stages from data preprocessing to deployment
• Sandbox environments enable safe iteration on agent capabilities without real infrastructure costs

MLE-agents · synthetic-environments · agent-training · machine-learning-engineering · sandbox

5 upvotes

Mimic Intent, Not Just Trajectories

Renming Huang, et al. — Chinese University of Hong Kong

Argues that imitation learning for agents should focus on replicating the underlying intent behind demonstrations rather than surface-level trajectory matching, leading to more robust and generalizable policies.

Key Findings

• Intent-level imitation produces policies that generalize better to novel situations than trajectory-level cloning
• Disentangling intent from execution details reduces compounding errors in sequential decision-making
• The approach is complementary to existing behavioral cloning methods and can be integrated as an auxiliary objective

imitation-learning · intent-modeling · LLM-agents · behavioral-cloning · generalization

5 upvotes

ACES: Leave-One-Out AUC Consistency for Code Generation

Hui Sun, et al. — Microsoft Research

Introduces a novel code generation evaluation metric based on leave-one-out AUC consistency, providing a more robust signal for model selection than pass@k metrics alone.

Key Findings

• Leave-one-out AUC consistency captures code generation reliability that pass@k misses
• The metric is more stable across random seeds and problem subsets than existing evaluation approaches
• ACES enables better model ranking decisions for deployment in code generation pipelines

code-generation · evaluation-metrics · AUC · reliability · model-selection

4 upvotes
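
For context on the baseline that ACES is compared against, the sketch below computes pass@k with the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them pass. This is the baseline metric only. The digest does not describe how ACES itself is computed, so no attempt is made to reproduce it here.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for a single problem.

    n: samples generated, c: samples that pass the tests, k: budget (1 <= k <= n).
    Returns the probability that at least one of k samples drawn without
    replacement from the n is correct.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples fail, which gives 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)

# Made-up counts: 10 samples, 3 correct.
p1 = pass_at_k(10, 3, 1)
p5 = pass_at_k(10, 3, 5)
```

A benchmark score is the mean of this value over all problems, which is why it can vary with the particular problem subset and sampling seed.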

The Geometric Alignment Tax

Prashant C. Raju — Independent Researcher

Formalizes the cost of aligning LLMs in geometric terms, showing that alignment procedures distort the model's representation geometry in ways that reduce downstream capabilities on non-aligned tasks.

Key Findings

• Alignment procedures create measurable geometric distortions in model representation spaces
• The distortion magnitude correlates with capability degradation on tasks outside the alignment distribution
• The geometric framework provides a principled way to quantify the alignment tax across model families

alignment · representation-geometry · alignment-tax · LLM-capabilities · theoretical

3 upvotes

Paper Espresso: From Paper Overload to Research Insight

Mingzhe Du, et al. — National University of Singapore

An automated research paper summarization and insight extraction tool addressing information overload in fast-moving AI/ML research, with multi-level summarization that preserves technical details.

Key Findings

• Automated pipeline reduces time-to-insight for literature review by an order of magnitude
• Multi-level summarization preserves key technical details that simple abstractive summaries lose
• Open-source tool designed for integration into existing research workflows

research-tools · summarization · literature-review · productivity · open-source

2 upvotes

BidirLM: From Text to Omnimodal Bidirectional Encoders

Nicolas Boizard, et al. — Meta AI

Extends bidirectional language modeling to omnimodal inputs, converting text-only bidirectional encoders into models that process text, images, and audio within a unified bidirectional framework.

Key Findings

• Bidirectional encoding over multiple modalities improves cross-modal retrieval compared to causal architectures
• Text-pretrained bidirectional encoders can be efficiently adapted to process visual and audio inputs
• The approach maintains the embedding quality advantages of bidirectional models while adding multimodal capability

multimodal · bidirectional-encoding · omnimodal · embeddings · representation-learning

2 upvotes

Trending Models (11)

Gemma 4 31B-IT

Google · image-text-to-text · 31B

Google's flagship Gemma 4 instruction-tuned model at 31B parameters, supporting image-text-to-text tasks. The largest dense variant in the Gemma 4 family, trending as the community benchmarks it against GPT-4o and Claude.

multimodal · instruction-tuned · gemma-4

884.0K downloads · 1.4K likes

Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled

Jackrong · text-generation · 27B

Community-created reasoning model distilling Claude Opus 4.6 reasoning capabilities into a Qwen3.5-27B base, achieving strong performance on reasoning benchmarks through knowledge distillation from a frontier model.

reasoning · distillation · community

552.0K downloads · 2.5K likes

Gemma-4-31B-JANG_4M-CRACK

DealignAI · text-generation · 31B

Abliterated (uncensored) variant of Gemma 4 31B with safety guardrails removed, trending rapidly as the community explores the model's full unfiltered capabilities.

abliterated · uncensored · gemma-4

29.0K downloads · 705 likes

Void Model

Netflix · video-inpainting · undisclosed

Netflix's video inpainting and object removal model, designed for seamless removal of unwanted objects from video footage. Notable as Netflix's first open model release.

video-inpainting · object-removal · production

0 downloads · 574 likes

Gemma 4 26B-A4B-IT

Google · text-generation · 26B (4B active)

Mixture-of-experts Gemma 4 variant with 26B total parameters but only 4B active per token, bringing MoE efficiency to consumer-accessible hardware.

MoE · efficient-inference · gemma-4

660.0K downloads · 508 likes

Gemma 4 E4B-IT

Google · any-to-any · 4B

Any-to-any modality Gemma 4 model at 4B parameters, capable of processing and generating across text, image, and audio modalities in a single unified architecture.

any-to-any · multimodal · gemma-4

474.0K downloads · 476 likes

Bonsai-8B-GGUF

Prism ML · text-generation · 8B (1-bit)

1-bit quantized 8B parameter language model pushing the limits of extreme quantization, demonstrating that binary weight models can achieve surprisingly coherent text generation.

1-bit · quantization · efficient-inference

53.0K downloads · 506 likes
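
To put the 1-bit figure in perspective, here is a rough, weights-only memory estimate. It assumes every parameter is stored at the stated bit width and ignores quantization scales, any layers kept at higher precision, file metadata, and runtime memory such as the KV cache. Real file sizes will therefore differ, and the helper name is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Back-of-the-envelope only: assumes a uniform bit width for all parameters.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9

one_bit_gb = weight_memory_gb(8e9, 1)    # 8B parameters at 1 bit each
fp16_gb = weight_memory_gb(8e9, 16)      # same parameter count at 16 bits
```

Under these assumptions an 8B model needs about 1 GB for weights at 1 bit versus about 16 GB at 16 bits, which is the gap that makes extreme quantization attractive for local inference.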

GLM-5.1

Zhipu AI · text-generation · undisclosed MoE

Latest generation of the GLM series as a mixture-of-experts text generation model, representing Zhipu AI's continued push to compete with Western frontier labs on open-weight models.

MoE · text-generation · Chinese-AI

389 downloads · 450 likes

Qianfan-OCR

Baidu · image-text-to-text · undisclosed

Baidu's dedicated OCR and vision-language model optimized for document understanding and text extraction, achieving strong results on multilingual document benchmarks.

OCR · document-understanding · vision-language

40.0K downloads · 1.1K likes

OmniVoice

k2-fsa · text-to-speech · undisclosed

Voice cloning and text-to-speech model with high-fidelity voice replication, trending for its ability to clone voices from short audio samples.

TTS · voice-cloning · speech-synthesis

105.0K downloads · 360 likes

Holo3-35B-A3B

Hcompany · multimodal · 35B (3B active)

Multimodal mixture-of-experts model with 35B total parameters and 3B active, designed for efficient multimodal reasoning across text and vision tasks.

MoE · multimodal · efficient-inference

1.8K downloads · 246 likes

Trending GitHub Repos (11)

NousResearch/hermes-agent

High Relevance · GitHub

A modular, extensible agent framework that grows with the user, featuring plugin-based tool integration and persistent memory. Exploding in popularity with 3,009 stars in a single day.

LLM-agents · agent-framework · tool-integration

Python · 32.9K · +3.0K today · 4.2K

abhigyanpatwari/GitNexus

Client-side knowledge graph for codebases that enables semantic search and visualization of code relationships, running entirely in the browser.

knowledge-graph · code-search · developer-tools

TypeScript · 24.7K · +1.2K today · 2.8K

google-ai-edge/gallery

High Relevance · GitHub

Google's showcase of on-device ML and generative AI models running via LiteRT, demonstrating practical deployment of AI on mobile and edge devices.

on-device-AI · edge-inference · mobile-ML

Kotlin · 18.9K · +897 today · 1.8K

tobi/qmd

Minimal CLI search engine for local documentation that indexes and searches markdown and code files with instant results.

search · CLI · documentation

TypeScript · 19.7K · +859 today · 1.2K

NVIDIA/personaplex

High Relevance · GitHub

NVIDIA's framework for generating persona-diverse synthetic data, enabling creation of realistic and varied training datasets for LLM fine-tuning and evaluation.

synthetic-data · persona-generation · data-augmentation

Python · 8.0K · +662 today · 1.2K

google-ai-edge/LiteRT-LM

High Relevance · GitHub

Google's C++ runtime for efficient on-device LLM inference, optimized for mobile and embedded deployment with minimal memory footprint.

on-device-inference · LLM-runtime · edge-deployment

C++ · 2.6K · +528 today · 253

NVIDIA-NeMo/DataDesigner

High Relevance · GitHub

NVIDIA's tool for designing and generating high-quality synthetic training data pipelines, part of the NeMo ecosystem for large-scale model training.

synthetic-data · data-generation · NeMo

Python · 1.5K · +244 today · 132

TheCraigHewitt/seomachine

SEO content generation tool powered by Claude, automating keyword research, content planning, and article generation for search engine optimization.

SEO · content-generation · Claude

Python · 4.0K · +215 today · 668

HKUDS/DeepTutor