noos fri 22 may · · 01:02
one paragraph note on what claude got wrong
noos.app — Other AI — 47 items · page 2/2

friday · 15 may 1
product hunt — ai 6d

OpenIT


14 may 4
arxiv cs.AI 1w

Self-Distilled Agentic Reinforcement Learning

Reinforcement learning (RL) has emerged as a central paradigm for post-training LLM agents, yet its trajectory-level reward signal provides only coarse supervision for long-horizon interaction. On-Policy Self-Distillation (OPSD) complements RL by introducing dense token-level guidance from a teacher branch augmented with privileged context. However, transferring OPSD to multi-turn agents proves pr

arxiv cs.CL 1w

Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use

Tool use extends large language models beyond parametric knowledge, but reliable execution requires balancing appropriate reasoning depth with strict structural validity. We approach this problem from a case-based perspective to present CAST, a case-driven framework that treats historical execution trajectories as structured cases. Instead of reusing raw exemplar outputs, CAST extracts case-derive

product hunt — ai 1w

Raindrop Workshop


product hunt — ai 1w

Notion Developer Platform


13 may 2
arxiv cs.CL 1w

WARDEN: Endangered Indigenous Language Transcription and Translation with 6 Hours of Training Data

This paper introduces WARDEN, an early language model system capable of transcribing and translating Wardaman, an endangered Australian Indigenous language, into English. The significant challenge we face is the lack of large-scale training data: in fact, we only have 6 hours of annotated audio. Therefore, while it is common practice to train a single model for transcription and translation using l

product hunt — ai 1w

PitchDrop.ai


12 may 4
arxiv cs.AI 1w

Learning, Fast and Slow: Towards LLMs That Adapt Continually

Large language models (LLMs) are trained for downstream tasks by updating their parameters (e.g., via RL). However, updating parameters forces them to absorb task-specific information, which can result in catastrophic forgetting and loss of plasticity. In contrast, in-context learning with fixed LLM parameters can cheaply and rapidly adapt to task-specific requirements (e.g., prompt optimization),

arxiv cs.AI 1w

Solve the Loop: Attractor Models for Language and Reasoning

Looped Transformers offer a promising alternative to purely feed-forward computation by iteratively refining latent representations, improving language modeling and reasoning. Yet recurrent architectures remain unstable to train, costly to optimize and deploy, and constrained to small, fixed recurrence depths. We introduce Attractor Models, in which a backbone module first proposes output embeddin

simon willison 1w

llm 0.32a2

Access large language models from the command-line

arxiv cs.CL 1w

TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection

We introduce TextSeal, a state-of-the-art watermark for large language models. Building on Gumbel-max sampling, TextSeal introduces dual-key generation to restore output diversity, along with entropy-weighted scoring and multi-region localization for improved detection. It supports serving optimizations such as speculative decoding and multi-token prediction, and does not add any inference overhea

11 may 6
r/MachineLearning 1w

I tested reasoning models on the problems where surface-level thinking fails — AIME, proof sketches, and "why does this code have a subtle off-by-one", [D]

I've been running a somewhat unusual benchmark suite. Not the standard automated ones: I've been feeding different reasoning models a collection of ~120 problems that I've personally verified require "deep reasoning" rather than pattern matching. The mix: ~40 AIME-style competition math, ~30 GPQA-level scientific reasoning, ~25 ARC-style abstract reasoning, and ~25 "real world" problems (sub

r/MachineLearning 1w

Interactive Jensen–Shannon Divergence Visualisation [P]

An interactive visualisation of Jensen–Shannon divergence - the symmetric, always-finite cousin of KL. Shape two distributions and watch JSD, its ceiling of one bit, and the per-point contribution respond in real time. https://robotchinwag.com/posts/jensen-shannon-divergence-visualisation/ Feedback welcome.
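
As a rough companion to the description above, here is a minimal sketch of Jensen-Shannon divergence for discrete distributions, measured in bits. It is not taken from the linked post; the function names (`kl_bits`, `jsd_contributions`, `jsd_bits`) are hypothetical and chosen only for illustration. It assumes both inputs are probability vectors of equal length that each sum to 1.

```python
import math


def kl_bits(p, q):
    """KL divergence D(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd_contributions(p, q):
    """Per-point contributions to the JSD (they sum to the total)."""
    out = []
    for pi, qi in zip(p, q):
        mi = (pi + qi) / 2  # mixture distribution at this point
        term = 0.0
        if pi > 0:
            term += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            term += 0.5 * qi * math.log2(qi / mi)
        out.append(term)
    return out


def jsd_bits(p, q):
    """Jensen-Shannon divergence in bits: symmetric and always finite."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)


if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]
    q = [0.1, 0.3, 0.6]
    print("JSD(p, q):", jsd_bits(p, q))
    print("JSD(q, p):", jsd_bits(q, p))
    print("per-point:", jsd_contributions(p, q))
    print("disjoint supports:", jsd_bits([1.0, 0.0], [0.0, 1.0]))
```

Because the mixture is nonzero wherever either input is nonzero, no term ever divides by zero, which is why the value stays finite even when plain KL would not. Two distributions with disjoint supports reach the one-bit ceiling.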

github trending 1w

Lordog/dive-into-llms — "Dive into LLMs" hands-on programming tutorial series

A series of hands-on programming tutorials, "Dive into LLMs" (动手学大模型).

r/coolgithubprojects 1w

Open-sourced our MCP server for GPU workload execution, looking for feedback

r/LocalLLaMA 1w

Markdown browser for LLMs

I built a markdown web renderer for AI agents. Instead of taking expensive screenshots and piping them through vision models, TextWeb renders web pages as markdown that LLMs can reason about natively. Full JavaScript execution, interactive elements annotated. It provides a CLI and an MCP server. You can find it here: https://github.com/woheller69/textweb

r/MachineLearning 1w

Why is human LLM annotation so expensive? [D]

Scale AI and similar services charge a lot for annotation. MTurk is cheap but the quality is horrible for anything requiring real domain understanding. For small teams that need a few thousand labeled examples to calibrate their evals or fine-tune a model, there seems to be no good middle ground. How is everyone handling this? Are you doing it manually or has anyone found something that actually
