Agentic AI frameworks dominate GitHub trending with hermes-agent, Archon, and multica surging; financial foundation models and tokenizer-free TTS signal new frontier applications; Claude Code tooling meta-layer emerges as a distinct engineering discipline

SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

High Relevance

Ziyu Ma, Shidong Yang, Yuxiang Ji, Xucong Wang, Yong Wang — Peking University, Microsoft Research

Proposes an agentic evolver that enables LLM agent skills to evolve collectively after deployment, preventing redundant rediscovery of workflows and failure patterns.

Key Findings

• Agent skills remain static post-deployment, causing waste across users
• Collective evolution via agentic evolver continuously improves shared skill libraries
• Strong gains on multi-step complex task benchmarks

agents · skills · evolution · LLM · collective-learning

263 upvotes

ClawBench: Can AI Agents Complete Everyday Online Tasks?

High Relevance

Yuxuan Zhang, Yubo Wang, Yipeng Zhu, Penghui Du, Junwen Miao — Tsinghua University

Evaluation framework of 153 everyday online tasks revealing that AI agents struggle with routine digital life automation despite excelling at specialized coding and research tasks.

Key Findings

• 153-task benchmark covering booking, shopping, form-filling, and other routine web interactions
• Significant performance gap between specialized tasks and everyday digital automation
• Current frontier models achieve <40% success rate on many everyday tasks

benchmark · agents · web-agents · evaluation · everyday-tasks

244 upvotes

HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents

High Relevance

Tencent Robotics X, HY Vision Team, Xumin Yu, Zuyan Liu, Ziyi Wang — Tencent

Foundation models for real-world embodied agents bridging general VLMs with embodied intelligence requirements for robot manipulation and navigation.

Key Findings

• Bridges gap between general VLMs and embodied agent demands
• MoT architecture optimized for embodied reasoning tasks
• Strong transfer from vision-language pretraining to physical-world interaction

embodied-AI · robotics · VLM · foundation-model · Tencent

156 upvotes

When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models

Zhengyang Sun, Yu Chen, Xin Zhou, Xiaofan Li, Xiwu Chen — Zhejiang University

Training-free NUMINA framework fixes numerical misalignment in text-to-video diffusion by identifying prompt-layout inconsistencies and guiding the denoising process.

Key Findings

• Identify-then-guide approach for correct object count generation
• Training-free: works with existing diffusion models without fine-tuning
• Significant count accuracy improvement without quality degradation

video-generation · diffusion · numerical-alignment · text-to-video

109 upvotes

MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping

Junyao Gao, Sibo Liu, Jiaxing Li, Yanan Sun, Yuanpeng Tu — Tencent

Scalable data curation pipeline for constructing intra-style consistent, inter-style diverse style datasets using text-to-image generative model consistency.

Key Findings

• Automated pipeline produces large-scale style-consistent datasets
• Leverages T2I model style mapping for consistency guarantees
• Enables downstream style transfer and artistic generation improvements

style-transfer · dataset · text-to-image · data-curation

92 upvotes

OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

High Relevance

Wenbo Hu, Xin Chen, Yan Gao-Tian, Yihe Deng, Nanyun Peng — UCLA NLP

Extends GRPO reinforcement learning to open-source multimodal generalist models, enabling multi-domain visual reasoning with improved data diversity strategies.

Key Findings

• Successfully applies GRPO to open-source multimodal models
• Multi-domain coverage overcomes prior domain-limited RL approaches
• Competitive with proprietary systems on visual reasoning benchmarks

multimodal · reasoning · GRPO · reinforcement-learning · visual-reasoning

44 upvotes
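
For readers unfamiliar with GRPO (Group Relative Policy Optimization): as generally described, it scores each sampled response against the other responses drawn for the same prompt instead of relying on a learned value model. The sketch below shows only that group-relative advantage step. The function name, the example rewards, and the use of population standard deviation are illustrative assumptions, not details taken from the OpenVLThinkerV2 paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per response sampled for a single prompt.
    `eps` (an assumed constant) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four sampled answers to one visual question,
# scored 1.0 when the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that beat their group average receive a positive advantage and the rest a negative one, so a group in which every answer earns the same reward contributes no learning signal.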

DMax: Aggressive Parallel Decoding for dLLMs

High Relevance

Zigeng Chen, Gongfan Fang, Xinyin Ma, Ruonan Yu, Xinchao Wang — National University of Singapore

New paradigm for efficient diffusion language models enabling aggressive parallelism via soft-transition decoding that mitigates error accumulation.

Key Findings

• Soft-transition decoding avoids error accumulation in parallel dLLM inference
• Enables 3-5x speedup over sequential decoding without quality loss
• Generalizable approach applicable to various diffusion LLM architectures

diffusion-LLM · parallel-decoding · inference-efficiency · dLLM

43 upvotes

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

High Relevance

Chenyu Zhou, Huacan Chai, Wenteng Chen, Zihan Guo, Rong Shan — Fudan University

Unified review of how LLM agent capabilities are externalized into runtime components — memory, skills, protocols, and harness infrastructure — rather than embedded in weights.

Key Findings

• Agent capability externalization is the dominant architectural trend
• Unified taxonomy spanning memory, skills, protocols, and harness engineering
• Runtime-centric design enables composability and production observability

agents · memory · skills · protocols · survey

41 upvotes

MolmoWeb: Open Visual Web Agent and Open Data for the Open Web

High Relevance

Tanmay Gupta, Piper Wolters, Zixian Ma, Peter Sushko, Rock Yuren Pang — Allen Institute for AI (AI2)

Open-source visual web agent with released training data and recipes, demonstrating that open models can match proprietary systems on web navigation tasks.

Key Findings

• Open-source web agent competitive with proprietary systems
• Full release of training data, weights, and recipes for reproducibility
• Demonstrates viability of open alternatives for autonomous web automation

web-agents · open-source · visual-agent · AI2

35 upvotes

Trending Models (11)

GLM-5.1

Zhipu AI (zai-org) · text-generation · MoE

Zhipu AI's latest MoE text generation model with strong multilingual capabilities in English and Chinese, released as open-weight under MIT license.

MoE · text-generation · multilingual · open-weight

28.8K downloads · 1.1K likes

Gemma 4 31B IT

Google · image-text-to-text · 31B

Google's 31B parameter instruction-tuned Gemma 4 model with image-text-to-text capabilities and strong benchmark performance across reasoning and multimodal tasks.

multimodal · instruction-tuned · Google · Gemma

2.2M downloads · 1.8K likes

VoxCPM2

OpenBMB (Tsinghua University) · text-to-speech · N/A

Tokenizer-free multilingual TTS system supporting 30+ languages with voice cloning and voice design capabilities, challenging the dominant codec-based speech synthesis paradigm.

TTS · multilingual · voice-cloning · tokenizer-free

7.5K downloads · 760 likes

MiniMax-M2.7

MiniMax AI · text-generation · N/A

MiniMax's latest large language model with custom architecture, demonstrating strong text generation capabilities with efficient inference.

text-generation · custom-architecture · MiniMax

873 downloads · 535 likes

VOID Model

Netflix · video-to-video · N/A

Netflix's video inpainting and object removal model built on CogVideoX diffusion architecture, enabling seamless video editing and content removal.

video-inpainting · object-removal · diffusion · Netflix

0 downloads · 775 likes

OmniVoice

k2-fsa · text-to-speech · N/A

Zero-shot multilingual voice cloning and text-to-speech model supporting hundreds of languages with voice design capabilities.

TTS · zero-shot · multilingual · voice-cloning

394.0K downloads · 527 likes

Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled

Jackrong (Community) · text-generation · 27B

Community-distilled 27B model transferring Claude 4.6 Opus reasoning capabilities into the Qwen3.5 architecture using Unsloth, among the most popular reasoning distillations on Hugging Face.

reasoning-distillation · qwen3.5 · claude-opus · unsloth

578.3K downloads · 2.6K likes

Gemma 4 E4B IT

Google · any-to-any · 4B

Google's efficient 4B parameter Gemma 4 model with any-to-any modality capabilities, designed for edge deployment and resource-constrained environments.

multimodal · efficient · edge · Google

1.3M downloads · 612 likes

HY-Embodied-0.5

Tencent · image-text-to-text · 2B

Tencent's embodied foundation model for real-world robotics agents, combining vision-language understanding with MoT architecture for embodied intelligence tasks.

embodied-AI · robotics · VLM · Tencent

582 downloads · 136 likes

Bonsai-8B

Prism ML · text-generation · 8B (1-bit)

1-bit quantized 8B parameter model optimized for on-device inference via llama.cpp, demonstrating that extreme quantization can preserve usable text generation quality.

1-bit · quantization · on-device · llama.cpp

74.4K downloads · 567 likes
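
To make "1-bit" concrete, here is a minimal sketch of one classic binarization scheme from the binary-network literature: each weight is stored as a sign, plus one shared scale equal to the mean absolute value. This is an assumed, generic technique shown for illustration only. The digest does not say how Bonsai-8B is quantized, and the function names and example weights are hypothetical.

```python
def binarize(weights):
    """Approximate a weight group as alpha * sign(w).

    alpha is the mean absolute weight, so only one float per group is kept
    alongside one sign bit per weight.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def dequantize(alpha, signs):
    """Rebuild approximate weights from the shared scale and the sign bits."""
    return [alpha * s for s in signs]


# Hypothetical weight group for illustration.
weights = [0.5, -1.5, 1.0, -1.0]
alpha, signs = binarize(weights)
approx = dequantize(alpha, signs)
```

The reconstruction keeps each weight's direction and the group's average magnitude but discards per-weight magnitudes, which is why quality at this bit width depends heavily on how the model was trained or calibrated.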

Qianfan-OCR

Baidu · image-text-to-text · N/A

Baidu's specialized OCR and document intelligence model built on InternVL architecture, optimized for multilingual document understanding and extraction.

OCR · document-intelligence · multilingual · Baidu

44.8K downloads · 1.1K likes

Trending GitHub Repos (15)

NousResearch/hermes-agent

High Relevance · GitHub

A growing agentic framework from NousResearch designed to build persistent, evolving AI agents. It gained 6.4K stars today, reflecting intense developer interest in production-ready agent infrastructure built on top of Hermes model weights.

agents · llm · agentic-ai · hermes · open-source

Python · 56.8K stars · +6.4K today · 7.5K forks

microsoft/markitdown

High Relevance · GitHub

Microsoft's Python tool for converting files and office documents (Word, Excel, PowerPoint, PDF, HTML) to Markdown format. Essential infrastructure for RAG pipelines and document-grounded LLM applications.

document-parsing · rag · markdown · llm-tools · microsoft

Python · 101.3K stars · +3.1K today · 6.2K forks
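
As a rough illustration of the conversion step, the sketch below wraps the entry point shown in the markitdown README (`MarkItDown().convert(...)`, which returns a result with a `text_content` attribute). The helper name `file_to_markdown` and the guarded import are assumptions added here so the snippet stays importable when the package is not installed. Supported formats and optional extras may differ by version.

```python
try:
    from markitdown import MarkItDown  # installed with: pip install markitdown
except ImportError:
    # Assumption for this sketch: degrade gracefully if the package is absent.
    MarkItDown = None


def file_to_markdown(path: str) -> str:
    """Convert a local document (e.g. .docx, .xlsx, .pdf) to Markdown text."""
    if MarkItDown is None:
        raise RuntimeError("markitdown is not installed")
    converter = MarkItDown()
    result = converter.convert(path)
    return result.text_content
```

In a RAG pipeline the returned Markdown string would typically be chunked and embedded downstream. For example, `file_to_markdown("report.docx")` (a hypothetical file) yields text ready for that step.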

multica-ai/multica

High Relevance · GitHub

An open-source managed agents platform that treats coding agents as persistent teammates, supporting task assignment, progress tracking, and skill compounding. Trending strongly as teams seek to productionize multi-agent workflows.

agents · multi-agent · agentic-platform · coding-agents · open-source

TypeScript · 7.3K stars · +1.9K today · 927 forks

obra/superpowers

High Relevance · GitHub

An agentic skills framework and software development methodology. One of the largest repos trending today by absolute star count, this shell-based framework defines composable 'superpowers' for AI coding workflows.

agentic-ai · coding-agents · methodology · developer-tools

Shell · 146.7K stars · +1.6K today · 12.6K forks

shanraisshan/claude-code-best-practice

High Relevance · GitHub

A curated collection of best practices for working with Claude Code, including prompt patterns, workflow optimizations, and behavioral guidelines. Reflects a grassroots meta-tooling movement around Anthropic's coding agent.

claude-code · prompt-engineering · llm-tools · coding-agents · anthropic

HTML · 36.7K stars · +1.5K today · 3.4K forks

coleam00/Archon

High Relevance · GitHub

Billed as the first open-source harness builder for AI coding, designed to make AI coding agent behavior deterministic and repeatable. Trending strongly as developers seek more reliable behavior from LLM-powered coding tools.

coding-agents · llm-tools · deterministic-ai · developer-tools

TypeScript · 16.2K stars · +1.3K today · 2.6K forks

forrestchang/andrej-karpathy-skills

High Relevance · GitHub

A single CLAUDE.md configuration file derived from Andrej Karpathy's public observations on LLM coding pitfalls, designed to improve Claude Code's default behavior. A lightweight but high-signal behavioral engineering artifact.

claude-code · prompt-engineering · karpathy · llm-behavior · coding-agents

12.5K stars · +1.1K today · 840 forks

OpenBMB/VoxCPM

High Relevance · GitHub

VoxCPM2 is a tokenizer-free TTS system supporting multilingual speech generation, creative voice design, and true-to-life voice cloning. The tokenizer-free architecture avoids discrete audio token bottlenecks and shows strong quality gains.

tts · speech-synthesis · multilingual · voice-cloning · tokenizer-free

Python · 9.5K stars · +953 today · 1.1K forks

HKUDS/DeepTutor

An agent-native personalized learning assistant from the Data Intelligence Lab at the University of Hong Kong (HKUDS), combining RAG, adaptive pedagogy, and agentic workflows for individualized education. Strong traction reflecting demand for AI in EdTech.

agents · education · rag · personalized-learning · edtech

Python · 16.5K stars · +836 today · 2.2K forks

opendataloader-project/opendataloader-pdf

High Relevance · GitHub

An open-source PDF parser designed for AI-ready data extraction, automating PDF accessibility for downstream LLM and RAG applications. Java-based with strong traction as document parsing infrastructure demand grows.

pdf-parsing · rag · data-extraction · llm-tools · open-source

Java · 15.3K stars · +777 today · 1.3K forks

aloshdenny/reverse-SynthID

High Relevance · GitHub

A reverse engineering project targeting Google's SynthID AI watermarking detection system for Gemini outputs. It has significant adversarial and safety research implications, since it challenges the robustness of AI content provenance infrastructure.

watermarking · adversarial-ai · synthid · google-gemini · ai-safety

Python · 2.1K stars · +682 today · 180 forks

shiyu-coder/Kronos

High Relevance · GitHub

Kronos is a foundation model for the language of financial markets, designed to understand and generate financial market data, news, and signals. Represents the verticalization trend of large models for specialized domains.

finance · foundation-model · time-series · financial-markets · domain-specific-llm

Python · 13.4K stars · +607 today · 2.7K forks

D4Vinci/Scrapling

An adaptive web scraping framework that handles anything from single requests to full-scale crawls. Relevant as a data acquisition layer for LLM training pipelines and agentic browsing tools.

web-scraping · data-collection · llm-tools · crawling

Python · 36.1K stars · +511 today · 3.1K forks

K-Dense-AI/scientific-agent-skills

A curated set of ready-to-use agent skills for research, science, engineering, analysis, finance, and writing. Provides a skills library abstraction layer for building domain-expert AI agents.

agents · skills-library · scientific-ai · domain-specific · agentic-ai

Python · 18.1K stars · +158 today · 2.0K forks

agentscope-ai/agentscope