Show HN: A MitM proxy to see what your LLM tools are sending
I built this out of curiosity about what Claude Code was actually sending to the API. Turns out, watching your tokens tick up in real-time is oddly satisfying.
Sherlock sits between your LLM tools and the API, showing every request on a live dashboard and auto-saving a copy of every prompt as Markdown and JSON.
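Sherlock's internals aren't shown here, but the capture idea can be sketched as a mitmproxy addon; the target host and output directory below are illustrative assumptions, not Sherlock's actual code:

import json
import time
from pathlib import Path

from mitmproxy import http

OUT_DIR = Path("captured_prompts")  # hypothetical output directory
OUT_DIR.mkdir(exist_ok=True)

class LLMTap:
    # Dump each LLM-bound request body to a timestamped JSON file.
    def request(self, flow: http.HTTPFlow) -> None:
        if "api.anthropic.com" not in flow.request.pretty_host:
            return  # only capture traffic bound for the LLM API
        try:
            body = json.loads(flow.request.get_text() or "{}")
        except json.JSONDecodeError:
            return
        stamp = time.strftime("%Y%m%d-%H%M%S")
        (OUT_DIR / f"request-{stamp}.json").write_text(json.dumps(body, indent=2))

addons = [LLMTap()]

Run it with mitmproxy -s llm_tap.py and point your tool's HTTPS proxy at it.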
Show HN: Shelvy Books
Hey HN! I built a little side project I wanted to share.
Shelvy is a free, visual bookshelf app where you can organize books you're reading, want to read, or have finished. Sign in to save your own collection.
Not monetized, no ads, no tracking beyond basic auth. Just a fun weekend project that grew a bit.
Live: https://shelvybooks.com
Would love any feedback on the UX or feature ideas!
Show HN: Codex.nvim – Codex inside Neovim (no API key required)
Hi HN! I built codex.nvim, an IDE-style Neovim integration for Codex.
Highlights:
- Works with OpenAI Codex plans (no API key required)
- Fully integrated in Neovim (embedded terminal workflow)
- Bottom-right status indicator shows busy/wait state
- Send selections or file tree context to Codex quickly
Repo:
https://github.com/ishiooon/codex.nvim
Why I built this:
I wanted to use Codex comfortably inside Neovim without relying on the API.
Happy to hear feedback and ideas!
Show HN: Pinecone Explorer – Desktop GUI for the Pinecone vector database
https://github.com/stepandel/pinecone-explorer
Show HN: SHDL – A minimal hardware description language built from logic gates
Hi, everyone!
I built SHDL (Simple Hardware Description Language) as an experiment in stripping hardware description down to its absolute fundamentals.
In SHDL, there are no arithmetic operators, no implicit bit widths, and no high-level constructs. You build everything explicitly from logic gates and wires, and then compose larger components hierarchically. The goal is not synthesis or performance, but understanding: what digital systems actually look like when abstractions are removed.
SHDL is accompanied by PySHDL, a Python interface that lets you load circuits, poke inputs, step the simulation, and observe outputs. Under the hood, SHDL compiles circuits to C for fast execution, but the language itself remains intentionally small and transparent.
This is not meant to replace Verilog or VHDL. It’s aimed at:
- learning digital logic from first principles
- experimenting with HDL and language design
- teaching or visualizing how complex hardware emerges from simple gates
I would especially appreciate feedback on:
- the language design choices
- what feels unnecessarily restrictive vs. educationally valuable
- whether this kind of “anti-abstraction” HDL is useful to you
Repo: https://github.com/rafa-rrayes/SHDL
Python package: PySHDL on PyPI
To make this concrete, here are a few small working examples written in SHDL:
1. Full Adder
component FullAdder(A, B, Cin) -> (Sum, Cout) {
x1: XOR; a1: AND;
x2: XOR; a2: AND;
o1: OR;
connect {
A -> x1.A; B -> x1.B;
A -> a1.A; B -> a1.B;
x1.O -> x2.A; Cin -> x2.B;
x1.O -> a2.A; Cin -> a2.B;
a1.O -> o1.A; a2.O -> o1.B;
x2.O -> Sum; o1.O -> Cout;
}
}
2. 16-bit Register
# clk must be high for two cycles to store a value
component Register16(In[16], clk) -> (Out[16]) {
>i[16]{
a1{i}: AND;
a2{i}: AND;
not1{i}: NOT;
nor1{i}: NOR;
nor2{i}: NOR;
}
connect {
>i[16]{
# Capture on clk
In[{i}] -> a1{i}.A;
In[{i}] -> not1{i}.A;
not1{i}.O -> a2{i}.A;
clk -> a1{i}.B;
clk -> a2{i}.B;
a1{i}.O -> nor1{i}.A;
a2{i}.O -> nor2{i}.A;
nor1{i}.O -> nor2{i}.B;
nor2{i}.O -> nor1{i}.B;
nor2{i}.O -> Out[{i}];
}
}
}
3. 16-bit Ripple-Carry Adder
use fullAdder::{FullAdder};
component Adder16(A[16], B[16], Cin) -> (Sum[16], Cout) {
>i[16]{ fa{i}: FullAdder; }
connect {
A[1] -> fa1.A;
B[1] -> fa1.B;
Cin -> fa1.Cin;
fa1.Sum -> Sum[1];
>i[2,16]{
A[{i}] -> fa{i}.A;
B[{i}] -> fa{i}.B;
fa{i-1}.Cout -> fa{i}.Cin;
fa{i}.Sum -> Sum[{i}];
}
fa16.Cout -> Cout;
}
}
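For a sense of the Python side, here is a hypothetical PySHDL session driving the FullAdder above (the import path and method names are assumptions for illustration; check the PySHDL docs for the real API):

from pyshdl import Circuit  # assumed import path

adder = Circuit.load("fullAdder.shdl")  # compiles to C under the hood
adder.set("A", 1)   # drive the inputs
adder.set("B", 1)
adder.set("Cin", 0)
adder.step()        # advance the simulation one step
print(adder.get("Sum"), adder.get("Cout"))  # 1 + 1 + 0 -> Sum=0, Cout=1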
Show HN: Dwm.tmux – a dwm-inspired window manager for tmux
Hey, HN! With so many recent agentic workflows being primarily terminal- and tmux-based, I wanted to share a little project I created about a decade ago.
I've continued to use this as my primary terminal "window manager" and wanted to share in case others might find it useful.
I would love to hear about others' terminal-based workflows and any other tools you use with similar functionality.
Show HN: The HN Arcade
I love seeing all the small games that people build and post to this site.
I don't want to forget any, so I have built a directory/arcade for the games here that I maintain.
Feel free to check it out, add your game if it's missing, and let me know what you think. Thanks!
Show HN: Cursor for Userscripts
I’ve been experimenting with embedding a Claude Code/Cursor-style coding agent directly into the browser.
At a high level, the agent generates and maintains userscripts and CSS that are re-applied on page load. Rather than just editing the DOM via JS in the console, the agent treats the page and its DOM as files.
The models are often trained in RL sandboxes with full access to the filesystem and bash, so they're really good at using them. To make the agent behave well, I've simulated that environment.
The whole state of a page and its scripts is implemented as a virtual filesystem hacked on top of browser local storage. URLs are mapped to directories, and the agent starts inside the page's directory. It has tools to read/edit files, grep around, and a fake bash command used only for running scripts and executing JS code.
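To make the layout concrete, here's a toy model of that URL-to-directory mapping (the real thing is JS on top of browser storage, not Python; names are illustrative):

from urllib.parse import urlparse

class VirtualFS:
    # Toy URL-keyed virtual filesystem: each page gets a directory
    # and the agent works inside it.
    def __init__(self):
        self.files: dict[str, str] = {}  # path -> contents

    def dir_for(self, url: str) -> str:
        u = urlparse(url)
        return f"/{u.netloc}{u.path}".rstrip("/")

    def write(self, url: str, name: str, text: str) -> None:
        self.files[f"{self.dir_for(url)}/{name}"] = text

    def read(self, url: str, name: str) -> str:
        return self.files[f"{self.dir_for(url)}/{name}"]

fs = VirtualFS()
fs.write("https://news.ycombinator.com/item", "style.css", "body { }")
print(fs.read("https://news.ycombinator.com/item", "style.css"))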
I've tested only with Opus 4.5 so far, and it works pretty reliably. The state of the filesystem can be synced to the real filesystem, although because Firefox doesn't support the File System Access API, you need to manually import the fs contents first.
This agent is really useful for extracting things to CSV, but it can also be used for fun.
Demo: https://x.com/ichebykin/status/2015686974439608607
Show HN: I built a small browser engine from scratch in C++
Hi HN! Korean high school senior here, about to start CS in college.
I built a browser engine from scratch in C++ to understand how browsers work. First time using C++, 8 weeks of development, lots of debugging—but it works!
Features:
- HTML parsing with error correction
- CSS cascade and inheritance
- Block/inline layout engine
- Async image loading + caching
- Link navigation + history
Hardest parts:
- String parsing (HTML, CSS)
- Rendering
- Image Caching & Layout Reflowing
What I learned (beyond code):
- Systematic debugging is crucial
- Ship with known bugs rather than chase perfection
- The Power of "Why?"
~3,000 lines of C++17/Qt6. Would love feedback on code architecture and C++ best practices!
GitHub: https://github.com/beginner-jhj/mini_browser
Show HN: Cua-Bench – a benchmark for AI agents in GUI environments
Hey HN, we're excited to share Cua-Bench (https://github.com/trycua/cua), an open-source framework for evaluating and training computer-use agents across different environments.
Computer-use agents show massive performance variance across different UIs—an agent with 90% success on Windows 11 might drop to 9% on Windows XP for the same task. The problem is OS themes, browser versions, and UI variations that existing benchmarks don't capture.
The existing benchmarks (OSWorld, Windows Agent Arena, AndroidWorld) were great but operated in silos—different harnesses, different formats, no standardized way to test the same agent across platforms. More importantly, they were evaluation-only. We needed environments that could generate training data and run RL loops, not just measure performance. Cua-Bench takes a different approach: it's a unified framework that standardizes environments across platforms and supports the full agent development lifecycle—benchmark, train, deploy.
With Cua-Bench, you can:
- Evaluate agents across multiple benchmarks with one CLI (native tasks + OSWorld + Windows Agent Arena adapters)
- Test the same agent on different OS variations (Windows 11/XP/Vista, macOS themes, Linux, Android via QEMU)
- Generate new tasks from natural language prompts
- Create simulated environments for RL training (shell apps like Spotify, Slack with programmatic rewards)
- Run oracle validations to verify environments before agent evaluation
- Monitor agent runs in real-time with traces and screenshots
All of this works on macOS, Linux, Windows, and Android, and is self-hostable.
To get started:
Install cua-bench:
% pip install cua-bench
Run a basic evaluation:
% cb run dataset datasets/cua-bench-basic --agent demo
Open the monitoring dashboard:
% cb run watch <run_id>
For parallelized evaluations across multiple workers:
% cb run dataset datasets/cua-bench-basic --agent your-agent --max-parallel 8
Want to test across different OS variations? Just specify the environment:
% cb run task slack_message --agent your-agent --env windows_xp
% cb run task slack_message --agent your-agent --env macos_sonoma
Generate new tasks from prompts:
% cb task generate "book a flight on kayak.com"
Validate environments with oracle implementations:
% cb run dataset datasets/cua-bench-basic --oracle
The simulated environments are particularly useful for RL training—they're HTML/JS apps that render across 10+ OS themes with programmatic reward verification. No need to spin up actual VMs for training loops.
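As a rough illustration of what a programmatic reward can look like (names are made up, not Cua-Bench's actual API): the shell app exposes its state as JSON, and the verifier returns 1.0 iff the target message exists:

import json

def reward(app_state_json: str, expected_text: str) -> float:
    # 1.0 iff a message containing expected_text exists in #general.
    state = json.loads(app_state_json)
    messages = state.get("channels", {}).get("general", [])
    return 1.0 if any(expected_text in m["text"] for m in messages) else 0.0

state = '{"channels": {"general": [{"text": "standup at 10am"}]}}'
print(reward(state, "standup"))  # 1.0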
We're seeing teams use Cua-Bench for:
- Training computer-use models on mobile and desktop environments
- Generating large-scale training datasets (working with labs on millions of screenshots across OS variations)
- RL fine-tuning with shell app simulators
- Systematic evaluation across OS themes and browser versions
- Building task registries (collaborating with Snorkel AI on task design and data curation, similar to their Terminal-Bench work)
Cua-Bench is 100% open-source under the MIT license. We're actively developing it as part of Cua (https://github.com/trycua/cua), our Computer Use Agent SDK, and we'd love your feedback, bug reports, or feature ideas.
GitHub: https://github.com/trycua/cua
Docs: https://cua.ai/docs/cuabench
Technical Report: https://cuabench.ai
We'll be here to answer any technical questions and look forward to your comments!
Show HN: Config manager for Claude Code (and others) – rules, MCPs, permissions
I use Claude Code across multiple projects with different conventions and some shared repos, as happens in the real world. Managing the config files (.claude/rules/, mcps.json, settings.json) by hand got tedious, so I built a local web UI for it.
This one started out as claude-config but migrated to coder-config as I'm adding others (Gemini, AG, Codex, etc.).
Main features:
- Visual editor for rules, permissions, and MCP servers
- Project registry to switch between codebases
- "Workstreams" to group related repos (frontend + API + shared libs) with shared context
- Auto-load workstreams on cd to included folders
- Also supports Gemini CLI and Codex CLI
Install:
npm install -g coder-config
coder-config ui          # UI at http://localhost:3333
coder-config ui install  # optionally, autostart on macOS
It can also be installed as a PWA and live in your taskbar.
Open source, runs locally, no account needed. Feedback and contributions welcome!
Sorry, I haven't had a chance to test on other OSes (Linux/Windows).
Show HN: Build Web Automations via Demonstration
Hey HN,
We’ve been building browser agents for a while. In production, we kept converging on the same pattern: deterministic scripts for the happy path, agents only for edge cases. So we built Demonstrate Mode.
The idea is simple: You perform your workflow once in a remote browser. Notte records the interactions and generates deterministic automation code.
How it works:
- Record clicks, inputs, navigations in a cloud browser
- Compile them into deterministic code (no LLM at runtime)
- Run and deploy on managed browser infrastructure
Closest analog is Playwright codegen but:
- Infrastructure is handled (remote browsers, proxies, auth state)
- Code runs in a deployable runtime with logs, retries, and optional agent fallback
Agents are great for prototyping and dynamic steps, but for production we usually want versioned code and predictable cost/behavior. Happy to dive into implementation details in the comments.
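For a flavor of what "deterministic code" means here, this is the kind of script a recorded demonstration might compile down to, sketched with Playwright's Python API (illustrative only, not Notte's actual output):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com/login")   # recorded navigation
    page.fill("#email", "user@example.com")  # recorded input
    page.click("button[type=submit]")        # recorded click
    page.wait_for_selector(".dashboard")     # recorded wait
    browser.close()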
Demo: https://www.loom.com/share/f83cb83ecd5e48188dd9741724cde49a
-- Andrea & Lucas, Notte Founders
Show HN: LemonSlice – Upgrade your voice agents to real-time video
Hey HN, we're the co-founders of LemonSlice (try our HN playground here: https://lemonslice.com/hn). We train interactive avatar video models. Our API lets you upload a photo and immediately jump into a FaceTime-style call with that character. Here's a demo: https://www.loom.com/share/941577113141418e80d2834c83a5a0a9
Chatbots are everywhere and voice AI has taken off, but we believe video avatars will be the most common form factor for conversational AI. Most people would rather watch something than read it. The problem is that generating video in real-time is hard, and overcoming the uncanny valley is even harder.
We haven’t broken the uncanny valley yet. Nobody has. But we’re getting close and our photorealistic avatars are currently best-in-class (judge for yourself: https://lemonslice.com/try/taylor). Plus, we're the only avatar model that can do animals and heavily stylized cartoons. Try it: https://lemonslice.com/try/alien. Warning! Talking to this little guy may improve your mood.
Today we're releasing our new model* - Lemon Slice 2, a 20B-parameter diffusion transformer that generates infinite-length video at 20fps on a single GPU - and opening up our API.
How did we get a video diffusion model to run in real-time? There was no single trick, just a lot of them stacked together. The first big change was making our model causal. Standard video diffusion models are bidirectional (they look at frames both before and after the current one), which means you can't stream.
From there it was about fitting everything on one GPU. We switched from full to sliding window attention, which killed our memory bottleneck. We distilled from 40 denoising steps down to just a few - quality degraded less than we feared, especially after using GAN-based distillation (though tuning that adversarial loss to avoid mode collapse was its own adventure).
And the rest was inference work: modifying RoPE from complex to real (this one was cool!), precision tuning, fusing kernels, a special rolling KV cache, lots of other caching, and more. We kept shaving off milliseconds wherever we could and eventually got to real-time.
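For intuition, here is a toy sketch (not LemonSlice's code) of the two mask changes described above: causal masking so frames never attend to the future, plus a sliding window so memory stays bounded over infinite-length video:

import numpy as np

def causal_sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    # mask[i, j] is True where position i may attend to position j.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    causal = j <= i                # no peeking at future frames
    in_window = (i - j) < window   # drop keys older than the window
    return causal & in_window

print(causal_sliding_window_mask(6, 3).astype(int))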
We set up a guest playground for HN so you can create and talk to characters without logging in: https://lemonslice.com/hn. For those who want to build with our API (we have a new LiveKit integration that we’re pumped about!), grab a coupon code in the HN playground for your first Pro month free ($100 value). See the docs: https://lemonslice.com/docs. Pricing is usage-based at $0.12-0.20/min for video generation.
Looking forward to your feedback!
EDIT: Tell us what characters you want to see in the comments and we can make them for you to talk to (e.g. Max Headroom)
*We did a Show HN last year for our V1 model: https://news.ycombinator.com/item?id=43785044. It was technically impressive but so bad compared to what we have today.
Show HN: Lendy – Keep track of books you have lent
Show HN: WordRE, Wordle for Real Estate
Show HN: Extracting React apps from Figma Make's undocumented binary format
Show HN: Sandbox Agent SDK – unified API for automating coding agents
We’ve been working on automating coding agents in sandboxes lately. It’s bewildering how poorly standardized the agents are and how much they vary from one another.
We open-sourced the Sandbox Agent SDK based on tools we built internally to solve 3 problems:
1. Universal agent API: interact with any coding agent using the same API
2. Running agents inside the sandbox: the SDK provides a Rust binary that serves the universal agent API over HTTP, instead of having to futz with undocumented interfaces
3. Universal session schema: persisting sessions is always problematic, since we don’t want the source of truth for the conversation to live inside the container in a schema we don’t control
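As a rough sketch of what talking to a universal agent API over HTTP could look like from a client (endpoint names here are invented for illustration; see the OpenAPI spec for the real ones):

import requests

BASE = "http://localhost:8080"  # assumed address of the in-sandbox binary

# Hypothetical: start a session with a given agent, then send a message.
r = requests.post(f"{BASE}/sessions", json={"agent": "claude-code"})
session_id = r.json()["id"]
requests.post(
    f"{BASE}/sessions/{session_id}/messages",
    json={"role": "user", "content": "fix the failing test"},
)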
The Sandbox Agent SDK has:
- Any coding agent: Universal API to interact with all agents with full feature coverage
- Server or SDK mode: Run as an HTTP server or with the TypeScript SDK
- Universal session schema: Universal schema to store agent transcripts
- Supports your sandbox provider: Daytona, E2B, Vercel Sandboxes, and more
- Lightweight, portable Rust binary: Install anywhere with 1 curl command
- OpenAPI spec: Well documented and easy to integrate
We will be adding much more in the coming weeks – would love to hear any feedback or questions.
Show HN: One Human + One Agent = One Browser From Scratch in 20K LOC
Related: https://simonwillison.net/2026/Jan/27/one-human-one-agent-on...
Show HN: We Built the First EU-Sovereignty Audit for Websites
Show HN: Drum machine VST made with React/C++
Hi HN! We just launched our drum machine VST this month! We will be updating it with many new synthesis models and unique features. Check it out, join our Discord, and show us what you made!
Show HN: Fuzzy Studio – Apply live effects to videos/camera
Back story:
I've been learning computer graphics on the side for several years now and get so much joy from smooshing and stretching images/videos. I hope you can get a little joy as well with Fuzzy Studio!
Try applying effects to your camera! My housemates and I have giggled so much making faces with weird effects!
Nothing gets sent to the server; everything is done in the browser! Amazing what we can do. I've only tested on macOS... apologies if your browser/OS is not supported (yet).
Show HN: I wrapped the Zorks with an LLM
I grew up on the Infocom games, and when Microsoft actually open-sourced Zork 1/2/3 I really wanted to figure out how to use LLMs to let you type whatever you want. I always found the amount of language the games "understood" to be so limiting, even if it was pretty state of the art at the time.
So I figured out how to wrap it with Tambo (and run the game engine in the browser): basically, whatever you type gets "translated" into Zork-speak and passed to the game, and then the LLM takes the game's output and optionally adds flavor. (The little ">_" button at the top exposes the actual game input.)
What was a big surprise to me was multi-turn instructions: you can ask it to "Explore all the rooms in the house until you can't find any more" and it will plug away at the game for 10+ "turns" at a time... like Claude Code for Zork or something.
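The loop itself is simple enough to sketch (the stubs below stand in for the real LLM client and the in-browser Z-machine; this is not the actual Tambo code):

class StubLLM:
    def complete(self, prompt: str) -> str:
        # A real LLM would translate or embellish; the stub just
        # returns a canned command or echoes the payload.
        if prompt.startswith("Translate"):
            return "open mailbox"
        return prompt.split(": ", 1)[1] + " (The hinge creaks.)"

class StubZMachine:
    def run(self, command: str) -> str:
        return f"You {command}. Inside is a leaflet."

def play_turn(llm, zmachine, player_input: str) -> str:
    command = llm.complete("Translate into Zork-speak: " + player_input)
    game_output = zmachine.run(command)
    return llm.complete("Add brief flavor to this game text: " + game_output)

print(play_turn(StubLLM(), StubZMachine(), "go check out that mailbox"))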
Show HN: I'm building an AI-proof writing tool. How would you defeat it?
Show HN: Record and share your coding sessions with CodeMic
You can record and share coding sessions directly inside your editor.
Think Asciinema, but for full coding sessions with audio, video, and images.
While replaying a session, you can pause at any point, explore the code in your own editor, modify it, and even run it. This makes following tutorials and understanding real codebases much more practical than watching a video.
Local first, and open source.
p.s. I’ve been working on this for a little over two years* and would appreciate any feedback.
* Previously: CodeMic: A new way to talk about code - https://news.ycombinator.com/item?id=42485088 - Dec 2024 (58 comments)
Show HN: Frame – Managing projects, tasks, and context for Claude Code
I built Frame to better manage the projects I develop with Claude Code, to bring a standard to my Claude Code projects, to improve project and task planning, and to reduce context and memory loss. In its current state, Frame works entirely locally. You don’t need to enter any API keys or anything like that. You can run Claude Code directly using the terminal inside Frame.
Why am I not using existing IDEs? Simply because, for me, I no longer need them. What I need is an interface centered around the terminal, not a code editor. I initially built something that allowed me to place terminals in a grid layout, but then I decided to take it further. I realized I also needed to manage my projects and preserve context.
I’m still at a very early stage, but even being able to build the initial pieces I had in mind within 5–6 days—using Claude Code itself—feels kind of crazy.
What can you do with Frame?
You can start a brand-new project or turn an existing one into a Frame project. For this, Frame creates a set of Markdown and JSON files with rules I defined. These files exist mainly to manage tasks and preserve context.
You can manually add project-related tasks through the UI. I haven’t had the chance to test very complex or long-running scenarios yet, but from what I’ve seen, Claude Code often asks questions like: “Should I add this as a task to tasks.json?” or “Should we update project_notes.md after this project decision?” I recommend saying yes to these.
I also created a JSON file that keeps track of the project structure, down to function-level details. This part is still very raw. In the future, I plan to experiment with different data structures to help AI understand the project more quickly and effectively.
As mentioned, you can open your terminals in either a grid or tab view. I added options up to a 3×3 grid. Since the project is open source, you can modify it based on your own needs.
I also added a panel where you can view and manage plugins.
For code files or other files, I included a very simple editor. This part is intentionally minimal and quite basic for now.
Based on my own testing, I haven’t encountered any major bugs, but there might be some. I apologize in advance if you run into any issues.
My core goal is to establish a standard for AI-assisted projects and make them easier to manage. I’m very open to your ideas, support, and feedback. You can see more details on GitHub : https://github.com/kaanozhan/Frame
Show HN: Eightile, A Nested Anagram Solver Game
Show HN: Moltbook – A social network for moltbots (clawdbots) to hang out
Hey everyone!
Just made this over the past few days.
Moltbots can sign up and interact via CLI, no direct human interactions.
Just for fun to see what they all talk about :)
Show HN: Code Puppy
Saw it online, been using it for a while.
Show HN: TiniText – Small tools for transcription, summaries and drafts
Hi HN,
This is my first time sharing something like this here. I built TiniText to solve a problem I kept running into myself: I needed simple tools for transcription, summaries, and drafts, but most options felt heavier than necessary for everyday use.
Instead of building a large platform, I focused on small, single-purpose tools with predictable behavior and minimal setup.
The app supports:
- Audio transcription (single and multi-speaker)
- Text summarization with adjustable detail
- Simple draft generation (blogs / meeting notes)
- A few practical text utilities
It’s production-ready but still early, and I’d really appreciate feedback on:
- where the UX feels unnecessary
- what’s missing for real-world use
- whether “small focused tools” is still a compelling direction
Happy to answer questions.
Show HN: Spar – Built a tool to help improve store conversion rates
Last year my co-founder and I were talking about ecommerce store owners hitting small conversion issues that could easily be fixed—but nobody had the time or expertise to actually identify what was broken and validate fixes.
So we built Spar to handle that loop automatically. It analyzes any ecommerce store (Shopify, WooCommerce, BigCommerce, headless, doesn't matter) by crawling it like a customer would. It finds conversion gaps you don't know about, prioritized by impact, and gives you specific A/B test hypotheses for each issue instead of just generic best practices.
It works with any publicly accessible store and gives you results in minutes. It identifies issues across your pages (cart and checkout coverage is coming soon).
The idea generation is tailored per store. Free to sign up. Let me know if you want access to more of the gaps.