Show HN: Signet – Autonomous wildfire tracking from satellite and weather data
I built Signet in Go to see if an autonomous system could handle the wildfire monitoring loop that people currently run by hand - checking satellite feeds, pulling up weather, looking at terrain and fuels, deciding whether a detection is actually a fire worth tracking.
All the data already exists: NASA FIRMS thermal detections, GOES-19 imagery, NWS forecasts, LANDFIRE fuel models, USGS elevation, Census population data, OpenStreetMap. The problem is it arrives from different sources on different cadences in different formats.
Most of the system is deterministic plumbing - ingestion, spatial indexing, deduplication. I use Gemini to orchestrate 23 tools across weather, terrain, imagery, and incident tracking for the part where clean rules break down: deciding which weak detections are worth investigating, what context to pull next, and how to synthesize noisy evidence into a structured assessment.
It also records time-bounded predictions and scores them against later data, so the system is making falsifiable claims instead of narrating after the fact. The current prediction metrics are visible on the site even though the sample is still small.
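One common way to score time-bounded predictions against later data is a Brier score. The sketch below is illustrative only: the `Prediction` fields and the choice of Brier scoring are assumptions, not a description of Signet's actual metrics.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Prediction:
    # Hypothetical record layout, for illustration only.
    claim: str
    probability: float              # stated chance that the claim comes true
    made_at: datetime
    horizon: timedelta              # the claim must resolve within this window
    outcome: Optional[bool] = None  # filled in once later data arrives


def brier_score(predictions, now):
    """Mean squared error between stated probabilities and observed outcomes.

    Only predictions whose window has closed and whose outcome is known are
    scored. Returns None when nothing can be scored yet.
    """
    scored = [
        p for p in predictions
        if p.outcome is not None and p.made_at + p.horizon <= now
    ]
    if not scored:
        return None
    return sum((p.probability - float(p.outcome)) ** 2 for p in scored) / len(scored)
```

Lower is better: 0.0 means every probability matched the outcome exactly, and always answering 0.5 scores 0.25.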
It's already opening incidents from raw satellite detections and matching some to official NIFC reporting. But false positives, detection latency, and incident matching can still be rough.
I'd especially welcome criticism on: where should this be more deterministic instead of LLM-driven? And is this kind of autonomous monitoring actually useful, or just noisier than doing it by hand?
Show HN: What if your synthesizer was powered by APL (or a dumb K clone)?
I built k-synth as an experiment to see if a minimalist, K-inspired array language could make sketching waveforms faster and more intuitive than traditional code. I’ve put together a web-based toolkit so you can try the syntax directly in the browser without having to touch a compiler:
Live Toolkit: https://octetta.github.io/k-synth/
If you visit the page, here is a quick path to an audio payoff:
- Click "patches" and choose dm-bell.ks.
- Click "run"—the notebook area will update. Click the waveform to hear the result.
- Click the "->0" button below the waveform to copy it into slot 0 at the top (slots are also clickable).
- Click "pads" in the entry area to show a performance grid.
- Click "melodic" to play slot 0's sample at different intervals across the grid.
The 'Weird' Stack:
- The Language: A simplified, right-associative array language (e.g., s for sine, p for pi).
- The Web Toolkit: Built using WASM and Web Audio for live-coding samples.
- AI Pair-Programming: I used AI agents to bootstrap the parser and web boilerplate, which let me vet the language design in weeks rather than months.
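For readers new to array languages, right-to-left evaluation means there is no operator precedence: each operator takes everything to its right as its right-hand argument. A minimal Python sketch of that rule follows; the token format and the operator set are assumptions for illustration and are not k-synth syntax.

```python
import operator

# Hypothetical operator table; k-synth's real operators differ.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_right_to_left(tokens):
    """Evaluate [number, op, number, op, number, ...] with no precedence.

    Evaluation starts from the rightmost number and folds leftwards, so
    "2 * 3 + 4" is read as 2 * (3 + 4).
    """
    value = float(tokens[-1])
    # Step leftwards over (left operand, operator) pairs.
    for i in range(len(tokens) - 2, 0, -2):
        left = float(tokens[i - 1])
        value = OPS[tokens[i]](left, value)
    return value


print(eval_right_to_left("2 * 3 + 4".split()))   # 14.0, not 10.0
print(eval_right_to_left("10 - 3 - 2".split()))  # 9.0, i.e. 10 - (3 - 2)
```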
The Goal: This isn't meant to replace a DAW. It’s a compact way to generate samples for larger projects. It’s currently in a "will-it-blend" state. I’m looking for feedback from the array language and DSP communities—specifically on the operator choices and the right-to-left evaluation logic.
Source (MIT): https://github.com/octetta/k-synth
Show HN: Han – A Korean programming language written in Rust
A few weeks ago I saw a post about someone converting an entire C++ codebase to Rust using AI in under two weeks.
That inspired me — if AI can rewrite a whole language stack that fast, I wanted to try building a programming language from scratch with AI assistance.
I've also been noticing growing global interest in Korean language and culture, and I wondered: what would a programming language look like if every keyword was in Hangul (the Korean writing system)?
Han is the result. It's a statically-typed language written in Rust with a full compiler pipeline (lexer → parser → AST → interpreter + LLVM IR codegen).
It supports arrays, structs with impl blocks, closures, pattern matching, try/catch, file I/O, module imports, a REPL, and a basic LSP server.
This is a side project, not a "you should use this instead of Python" pitch. Feedback on language design, compiler architecture, or the Korean keyword choices is very welcome.
https://github.com/xodn348/han
Show HN: RSS tool to remix feeds, build from webpages, and skip podcast reruns
It's been nice seeing some RSS projects pop up lately, so here's mine and the scope creep that led to it.
I wanted to read a couple feeds from their beginnings, something RSS doesn't particularly do and nobody particularly uses it for. But I've done the keep-a-tab-open-for-8-months thing more than once to work through an archive, and I don't fancy doing it again. As I worked through the steps and showed it to friends, we accumulated some other quality-of-life use cases that fit well enough into "just filter some XML" that they glommed on.
So, Sponder can:
- Run basic filtering on RSS feeds, either by keywords or regular expressions.
- Parse any webpage into an RSS feed, including autodetection of title/image/link/etc elements, following page links back through history, and coming back for new items later.
- Control the pace of that full historical feed to serve you an article per week, or 12 per day, whatever pace you like.
- Automatically detect and filter rerun episodes from a podcast feed.
- Be configured either by UI or typing some YAML.
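As a rough illustration of the filtering and pacing ideas above, here is a minimal sketch. The function names and the data layout are hypothetical and are not Sponder's implementation.

```python
import re
from datetime import datetime, timedelta


def released_items(archive, start, interval, now):
    """Drip-feed an archive: one item per interval, oldest first.

    `archive` is a list ordered oldest to newest. The first item is released
    at `start`, the next one `interval` later, and so on.
    """
    if now < start:
        return []
    count = (now - start) // interval + 1
    return archive[:count]


def keep_matching(titles, pattern):
    """Keep only the titles that match a case-insensitive regular expression."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [t for t in titles if rx.search(t)]


archive = ["post 1", "post 2", "post 3", "post 4"]
start = datetime(2024, 1, 1)
# Two full weeks after the start, the first three posts are visible.
print(released_items(archive, start, timedelta(weeks=1), datetime(2024, 1, 15)))
```

Because the released set is computed from the clock on every request, this version needs no stored per-subscriber state.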
It does not:
- Replace your RSS or podcast client, it's middleware that publishes a modified feed for you.
- Replace every one of your feeds, just the ones you wish were different. Though you can import and export OPML files if you wish a lot were different.
- Run content through LLMs, though I'm considering it for rerun detection since metadata similarity only gets so far.
I'd love to hear from you fine folks:
- What bugs you about your feeds
- How configuring a flow goes
Show HN: Dialtone watcher – what is my laptop doing and am I normal
Hi HN, we are Andrew and Dex. We built dialtone watcher, a small Go agent for macOS and Linux with a very specific goal: tell me what my machine is doing all day and help me compare that with others.
What it does so far:
- Watches running processes, CPU and memory use, and active network endpoints.
- Groups traffic into human sized summaries by process, domain, and coarse protocol like HTTPS, DNS, QUIC, and Postgres.
- Stores a local summary and can post bounded rollups to the dialtoneapp.com api so enough installs can turn the fleet view into something real.
We kept circling the same question: why is there no simple tool that answers "what does this machine actually spend its day doing?" Activity Monitor shows one slice. Little Snitch shows another. Fleet tools exist, but usually behind a corporate wall. We wanted something more honest and inspectable. The real motivating question was not just "what is my laptop doing?" but "am I normal?"
Say I have a MacBook Pro with 14 cores and 36 GB of memory and I run Docker all day. Why is Docker chewing so much more CPU and RAM on my machine than on similar developer machines? Why do I have some weird helper process that keeps hanging around? Why is my laptop talking to domains I do not recognize? You cannot answer those questions from one machine alone. You need a baseline from many machines with comparable hardware and comparable work.
https://dialtoneapp.com/demo
Open source MIT License: https://github.com/andrewarrow/dialtone-watcher
Andrew and I kept a history of our conversations in:
https://github.com/andrewarrow/dialtone-watcher/tree/main/pr...
The big idea is crowdsourced threat intelligence. Every installed agent becomes a sensor. Each one reports process to domain connections, DNS activity, connection frequency, bytes transferred, and basic IP context like ASN and country. On one machine that data is mildly interesting. Across thousands of machines it becomes powerful very fast.
Security companies like CrowdStrike and SentinelOne do exactly this. But those products are enterprise-only, expensive, and opaque.
If some unknown helper suddenly starts talking to the same odd domain on 27 machines in an hour, it's a pattern. If a so called PDF viewer is uploading 18 MB to a domain almost nobody has seen before, that starts to look like exfiltration. If a new VSCode release is the only build talking to some random domain, that starts to smell like a supply chain problem. If Slack or Docker suddenly behaves nothing like the baseline for similar developer machines, you can flag that too.
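The first pattern above (many machines contacting the same domain within a short window) can be sketched as a sliding-window count. This is an illustrative sketch only: the event tuple layout and the function name are assumptions, not the dialtone watcher data model.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def domains_seen_on_many_machines(events, window, min_machines):
    """Flag domains contacted by at least `min_machines` distinct machines
    within any span of length `window`.

    `events` is an iterable of (timestamp, machine_id, domain) tuples.
    """
    by_domain = defaultdict(list)
    for ts, machine, domain in events:
        by_domain[domain].append((ts, machine))

    flagged = set()
    for domain, hits in by_domain.items():
        hits.sort()
        start = 0
        for end in range(len(hits)):
            # Shrink the window from the left until it spans at most `window`.
            while hits[end][0] - hits[start][0] > window:
                start += 1
            machines = {m for _, m in hits[start:end + 1]}
            if len(machines) >= min_machines:
                flagged.add(domain)
                break
    return flagged
```

With the numbers from the example above, the call would use `window=timedelta(hours=1)` and `min_machines=27`. A real baseline would also need to ignore domains that are common everywhere, which this sketch does not attempt.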
We think there is room for something more open, inspectable, and useful for normal developers. If you try this, feedback should focus on readability of the summary, correctness of process and domain attribution, whether the upload payload feels proportionate, and what comparisons would actually help you decide "am I normal?" If enough people install it, run it, and send data, the demo becomes real and the real product gets much smarter.
I'll leave you with the following question. Should modern software projects include a prompts directory like this? It takes so little effort to capture the prompts used and they tell a story like git history does.
Show HN: Code Royale – Play and learn poker with Claude Code (skill)
I built a Claude Code skill that turns your terminal into a poker table. You play No-Limit Texas Hold'em against three AI opponents, each running as a separate Claude subagent with its own personality and hidden cards. The main agent acts as dealer, manages the game state, and optionally coaches you.
The coaching side has three modes: no help at all, real-time hints before each decision, or post-hand analysis only.
Fair warning: Claude naturally takes every hand to the extreme. Expect more pocket aces and dramatic river cards than any real table would produce.
╭─────────────────────╮
│ POT: 130 │
│ Q♥ 9♦ 4♠ │
Alex │ │ Jordan
[990] ╰─────────────────────╯ [905]
(SB) (BB)
Fold Bet 50
You <- Sam
[965] [1000]
(BTN) (UTG)
┌────────┐ Fold
│ K♠ Q♠ │
└────────┘
Coach's whisper: You flopped top pair with a king kicker — a very strong hand here. Jordan was the pre-flop raiser and is c-betting into you, which is standard. Calling is solid to keep the pot manageable. You could also raise to ~140 for value and protection, but calling in position and letting Jordan keep betting is a perfectly good line.
[F]old [C]all 50 [R]aise to ___
Something to do in your second terminal while Claude does your work in the first.
Repo: https://github.com/BohdanPetryshyn/code-royale
Show HN: Ichinichi – One note per day, E2E encrypted, local-first
Look, every journaling app out there wants you to organize things into folders and tags and templates. I just wanted to write something down every day.
So I built this. One note per day. That's the whole deal.
- Can't edit yesterday. What's done is done. Keeps you from fussing over old entries instead of writing today's.
- Year view with dots showing which days you actually wrote. It's a streak chart. Works better than it should.
- No signup required. Opens right up, stores everything locally in your browser. Optional cloud sync if you want it
- E2E encrypted with AES-GCM, zero-knowledge, the whole nine yards.
Tech-wise: React, TypeScript, Vite, Zustand, IndexedDB. Supabase for optional sync. Deployed on Cloudflare. PWA-capable.
The name means "one day" in Japanese (いちにち).
The read-only past turned out to be the thing that actually made me stick with it. Can't waste time perfecting yesterday if yesterday won't let you in.
Live at https://ichinichi.app | Source: https://github.com/katspaugh/ichinichi
Show HN: AgentMailr – dedicated email inboxes for AI agents
I kept running into the same problem while building AI agents: every agent that needs email ends up sharing my personal inbox or a single company domain. That breaks attribution, creates deliverability risk, and makes it impossible to test sender identities per agent.
So I built AgentMailr. You call an API to create an inbox, your agent gets a unique email address, and replies route back to that specific agent. Works for both inbound (OTP parsing, reply routing) and outbound (cold email, notifications).
Bring your own domain is supported so emails come from your domain, not ours. REST API and MCP server are live. Node/Python SDKs are in progress.
Happy to answer questions about the architecture or how I'm handling multi-agent routing.
Show HN: GitAgent – An open standard that turns any Git repo into an AI agent
We built GitAgent because we kept seeing the same problem: every agent framework defines agents differently, and switching frameworks means rewriting everything.
GitAgent is a spec that defines an AI agent as files in a git repo.
Three core files — agent.yaml (config), SOUL.md (personality/instructions), and SKILL.md (capabilities) — and you get a portable agent definition that exports to Claude Code, OpenAI Agents SDK, CrewAI, Google ADK, LangChain, and others.
What you get for free by being git-native:
1. Version control for agent behavior (roll back a bad prompt like you'd revert a bad commit)
2. Branching for environment promotion (dev → staging → main)
3. Human-in-the-loop via PRs (agent learns a skill → opens a branch → human reviews before merge)
4. Audit trail via git blame and git diff
5. Agent forking and remixing (fork a public agent, customize it, PR improvements back)
6. CI/CD with GitAgent validate in GitHub Actions
The CLI lets you run any agent repo directly:
npx @open-gitagent/gitagent run -r https://github.com/user/agent -a claude
The compliance layer is optional, but there if you need it — risk tiers, regulatory mappings (FINRA, SEC, SR 11-7), and audit reports via GitAgent audit.
Spec is at https://gitagent.sh, code is on GitHub.
Would love feedback on the schema design and what adapters people would want next.
Show HN: GrobPaint: Somewhere Between MS Paint and Paint.net
GrobPaint is an open-source pixel art software that provides a simple and intuitive interface for creating and editing pixel art. The project aims to be a lightweight, user-friendly alternative to more complex image editing tools, focusing on the core features needed for pixel art creation.
Show HN: Voice-tracked teleprompter using on-device ASR in the browser
I built a teleprompter that scrolls based on your voice instead of a timer.
Paste a script, press record, and it highlights the current word as you speak. If you pause it waits; if you skip lines it finds its place again.
Everything runs entirely in the browser — speech recognition (Moonshine ONNX), VAD, and fuzzy script matching.
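To give a feel for what fuzzy script matching involves, here is a minimal sketch that slides a window over the script and keeps the position most similar to the recognized words. It uses Python's standard `difflib` as a stand-in; the function name, the 0.6 threshold, and the lookahead size are assumptions, and the project's own alignment approach may differ.

```python
import difflib


def locate(script_words, heard_words, cursor, lookahead=30, threshold=0.6):
    """Return the script index just past the best fuzzy match for `heard_words`.

    Only windows starting within `lookahead` words after `cursor` are tried,
    so a skipped line can be found again without scanning the whole script.
    If nothing is similar enough, the cursor stays where it is.
    """
    n = len(heard_words)
    heard = " ".join(heard_words)
    best_pos, best_score = cursor, 0.0
    last_start = min(cursor + lookahead, len(script_words) - n + 1)
    for start in range(cursor, last_start):
        window = " ".join(script_words[start:start + n])
        score = difflib.SequenceMatcher(None, window, heard).ratio()
        if score > best_score:
            best_pos, best_score = start + n, score
    return best_pos if best_score >= threshold else cursor


script = "the quick brown fox jumps over the lazy dog and runs away".split()
# A misrecognized word ("ofer") still lands on the right spot in the script.
print(locate(script, ["jumps", "ofer", "the"], cursor=0))  # 7
```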
Demo: https://larsbaunwall.github.io/promptme-ai
Most of the project was initially built using Perplexity Computer, which made for an interesting agentic coding workflow.
Curious what people think about the script alignment approach.
Show HN: Channel Surfer – Watch YouTube like it’s cable TV
I know, it's a very first-world problem. But in my house, we have a hard time deciding what to watch. Too many options!
So I made this to recreate Cable TV for YouTube. I made it so it runs in the browser. Quickly import your subscriptions in the browser via a bookmarklet. No accounts, no sign-ins. Just quickly import your data locally.
Show HN: Context Gateway – Compress agent context before it hits the LLM
We built an open-source proxy that sits between coding agents (Claude Code, OpenClaw, etc.) and the LLM, compressing tool outputs before they enter the context window.
Demo: https://www.youtube.com/watch?v=-vFZ6MPrwjw#t=9s.
Motivation: Agents are terrible at managing context. A single file read or grep can dump thousands of tokens into the window, most of it noise. This isn't just expensive — it actively degrades quality. Long-context benchmarks consistently show steep accuracy drops as context grows (OpenAI's GPT-5.4 eval goes from 97.2% at 32k to 36.6% at 1M https://openai.com/index/introducing-gpt-5-4/).
Our solution uses small language models (SLMs): we look at model internals and train classifiers to detect which parts of the context carry the most signal. When a tool returns output, we compress it conditioned on the intent of the tool call—so if the agent called grep looking for error handling patterns, the SLM keeps the relevant matches and strips the rest.
If the model later needs something we removed, it calls expand() to fetch the original output. We also do background compaction at 85% window capacity and lazy-load tool descriptions so the model only sees tools relevant to the current step.
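The compress-then-expand flow and the 85% compaction trigger can be sketched as follows. This is an illustration under stated assumptions: the class and function names are hypothetical, and a plain predicate stands in for the trained classifier described above.

```python
class CompressedOutputs:
    """Keeps original tool outputs so a compressed view can be expanded later."""

    def __init__(self):
        self._originals = {}
        self._next_id = 0

    def compress(self, output, keep):
        """Return (ref, compressed_text), keeping lines where keep(line) is true.

        `keep` is a stand-in for an intent-conditioned relevance model.
        """
        ref = f"out-{self._next_id}"
        self._next_id += 1
        self._originals[ref] = output
        kept = [line for line in output.splitlines() if keep(line)]
        return ref, "\n".join(kept)

    def expand(self, ref):
        """Fetch the full original output for a previously compressed result."""
        return self._originals[ref]


def needs_compaction(used_tokens, window_tokens, threshold=0.85):
    """True once the context window is filled to the threshold fraction."""
    return used_tokens >= threshold * window_tokens


store = CompressedOutputs()
grep_output = "main.go:10: x := 1\nmain.go:42: if err != nil {\nmain.go:90: y := 2"
ref, short = store.compress(grep_output, keep=lambda line: "err" in line)
print(short)              # main.go:42: if err != nil {
print(store.expand(ref) == grep_output)  # True
```

Keeping the original keyed by a reference is what makes the compression safe to be aggressive: nothing is lost, it is only moved out of the window.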
The proxy also gives you spending caps, a dashboard for tracking running and past sessions, and Slack pings when an agent is sitting there waiting on you.
Repo is here: https://github.com/Compresr-ai/Context-Gateway. You can try it with:
curl -fsSL https://compresr.ai/api/install | sh
Happy to go deep on any of it: the compression model, how the lazy tool loading works, or anything else about the gateway. Try it out and let us know how you like it!
Show HN: Data-anim – Animate HTML with just data attributes
Hey HN, I built data-anim — an animation library where you never have to write JavaScript yourself.
You just write:
<div data-anim="fadeInUp">Hello</div>
That's it. Scroll-triggered fade-in animation, zero JS to write.
What it does:
- 30+ built-in animations (fade, slide, zoom, bounce, rotate, etc.)
- 4 triggers: scroll (default), load, click, hover
- 3-layer anti-FOUC protection (immediate style injection → noscript fallback → 5s timeout)
- Responsive controls: disable per device or swap animations on mobile
- TypeScript autocomplete for all attributes
- Under 3KB gzipped, zero dependencies
Why I built this:
I noticed that most animation needs on landing pages and marketing sites are simple — fade in on scroll, slide in from left, bounce on hover. But the existing options are either too heavy (Framer Motion ~30KB) or require JS boilerplate.
I also think declarative HTML attributes are the most AI-friendly animation format. When LLMs generate UI, HTML attributes are the output they hallucinate least on — no selector matching, no JS API to misremember, no script execution order to get wrong.
Docs: https://ryo-manba.github.io/data-anim/
Playground: https://ryo-manba.github.io/data-anim/playground/
npm: https://www.npmjs.com/package/data-anim
Happy to answer any questions about the implementation or design decisions.
Show HN: Ink – Deploy full-stack apps from AI agents via MCP or Skills
Hi HN, I built Ink, a full stack deployment platform where the primary users are AI agents, not humans.
We all know AI can write code, but deploying it still requires a human to wire it up: hosting, databases, DNS, and secrets. Ink gives agents those tools directly.
The agent calls "deploy" and the platform auto-detects the framework, builds it, deploys it, and returns a live URL at *.ml.ink. Here's a demo with Claude Code: https://www.youtube.com/watch?v=F6ZM_RrIaC0.
What Ink does that I haven't seen elsewhere:
- One agent skill for compute + databases + DNS + secrets + domains + usage + metrics + logs + scaling. The agent doesn't juggle separate providers — one account, one auth, one set of tools.
- DNS zone delegation. Delegate a zone once (e.g. dev.acme.com) and agents create any subdomain instantly — no manual adding DNS records each time, no propagation wait.
- Multiple agents and humans share one workspace and collaborate on projects. I envision a future where many agents collaborate together. I'm working on a cool demo to share.
- Built-in git hosting. Agents push code and deploy without the human setting up GitHub first. No external account needed. (Of course if you're a developer you can store code on GitHub — that's the recommended pattern.)
You also have what you'd expect:
- UI with service observability designed for humans (logs, metrics, DNS).
- GitHub integration: push triggers auto-redeploy.
- Per-minute billing for CPU, memory, and egress. No per-seat, no per-agent.
- Error responses designed for LLMs. Structured reason codes with suggested next actions, not raw stack traces. When a deploy fails the agent reads the log, fixes it, and redeploys autonomously.
Try it: https://ml.ink. Free $2 trial credits, no credit card. If you want to go further, here's a 20% code: "GOODFORTUNE".
Show HN: Learn Arabic with spaced repetition and comprehensible input
Sharing a friend's first-ever Rails application, dedicated to Arabic learning, from 0 to 1. It pulls language learning methods from Anki, comprehensible input, and more.
Show HN: Axe – A 12MB binary that replaces your AI framework
I built Axe because I got tired of every AI tool trying to be a chatbot.
Most frameworks want a long-lived session with a massive context window doing everything at once. That's expensive, slow, and fragile. Good software is small, focused, and composable... AI agents should be too.
Axe treats LLM agents like Unix programs. Each agent is a TOML config with a focused job, such as code reviewer, log analyzer, or commit message writer. You can run them from the CLI, pipe data in, and get results out. You can chain them together with pipes, or trigger them from cron, git hooks, or CI.
What Axe is:
- 12MB binary, two dependencies. No framework, no Python, no Docker (unless you want it)
- Stdin piping: something like `git diff | axe run reviewer` just works
- Sub-agent delegation: agents call other agents via tool use, depth-limited
- Persistent memory: if you want, agents can remember across runs without you managing state
- MCP support: Axe can connect any MCP server to your agents
- Built-in tools, such as web_search and url_fetch, out of the box
- Multi-provider: bring what you love to use, whether Anthropic, OpenAI, Ollama, or anything in models.dev format
- Path-sandboxed file ops: keeps agents locked to a working directory
Written in Go. No daemon, no GUI.
What would you automate first?
Show HN: What was the world listening to? Music charts, 20 countries (1940–2025)
I built this because I wanted to know what people in Japan were listening to the year I was born. That question spiraled: how does a hit in Rome compare to what was charting in Lagos the same year? How did sonic flavors propagate as streaming made musical influence travel faster than ever? 88mph is a playable map of music history: 230 charts across 20 countries, spanning 8 decades (1940–2025). Every song is playable via YouTube or Spotify. It's open source and I'd love help expanding it — there's a link to contribute charts for new countries and years. The goal is to crowdsource a complete sonic atlas of the world.
Show HN: I built Wool, a lightweight distributed Python runtime
I spent a long time working in the payments industry, specifically on a rather niche reporting/aggregation platform with spiky workloads that were not easily parallelized. To pump as much data through our pipeline as possible, we had to rely on complex locking schemes across half a dozen or so not-so-micro services - keeping a clear mental picture of how the services interacted for a given data source was a major headache. This problem always intrigued me, even after I no longer worked at the company, and led to the development of Wool.
If you've worked with frameworks like Ray or Prefect, you're probably familiar with the promise of going from script to scale in two lines of code (or something along those lines). This is essentially the solution I was looking for: a framework with limited boilerplate that facilitated arbitrary distribution schemes within a single, coherent codebase. What I was hoping for, though, was something a little bit more focused - I wasn't working on ML pipelines and didn't need much else other than the distribution layer. This is where Wool comes in. While its API is very similar to those of Ray and Prefect, where it differentiates itself is in its scope and architecture.
First, Wool is not a task orchestrator. It provides push-based, best-effort, at-most-once execution. There is no built-in coordination state, retry logic, or durable task tracking. Those concerns remain application-defined. The beauty of Wool is that it looks and feels like native async Python, allowing you to use purpose-built libraries for your needs as you would for any other Python app (with some caveats).
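Since retries are left to the application, a caller might wrap a dispatch in a small retry helper. The sketch below uses only the standard `asyncio` module and does not use Wool's API: `make_call` is a hypothetical stand-in for any coroutine that dispatches remote work.

```python
import asyncio


async def with_retries(make_call, attempts=3, delay=0.0):
    """Await make_call() up to `attempts` times, re-raising the last error.

    `make_call` is a zero-argument function returning a fresh coroutine,
    because a coroutine object cannot be awaited twice.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return await make_call()
        except Exception as exc:  # narrow this to transport errors in real code
            last_error = exc
            await asyncio.sleep(delay)
    raise last_error


async def main():
    calls = {"n": 0}

    async def flaky():
        # Simulates a worker that drops the first two attempts.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("worker lost")
        return "done"

    print(await with_retries(flaky), "after", calls["n"], "attempts")


if __name__ == "__main__":
    asyncio.run(main())
```

Note the trade-off: retrying on top of at-most-once dispatch means the work may run more than once from the caller's point of view, so the routine should be idempotent.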
Second, Wool was designed with speed in mind. Because it's not bloated with features, it's actually pretty fast, even in its current nascent state. Wool routines are dispatched directly to a decentralized peer-to-peer network of gRPC workers, who can distribute nested routines amongst themselves in turn. This results in low dispatch latencies and high throughput. I won't make any performance claims until I can assemble some more robust benchmarks, but running local workers on my M4 MacBook Pro (a trivial example, I know), I can easily achieve sub-millisecond dispatch latencies.
Anyway, check it out; any and all feedback is welcome. Regarding docs: the code is the documentation for now, but I promise I'll sort that out soon. I've got plenty of ideas for next steps, but it's always more fun when people actually use what you've built, so I'm open to suggestions for impactful features.
-Conrad
Show HN: Signet.js – A minimalist reactivity engine for the modern web
Signet.js is a JavaScript library that provides a simple and secure way to manage digital signatures and verify the integrity of data. The library supports both RSA and ECDSA signing algorithms and can be used in both browser and Node.js environments.
Show HN: OneCLI – Vault for AI Agents in Rust
We built OneCLI because AI agents are being given raw API keys. And it's going about as well as you'd expect. We figured the answer isn't "don't give agents access," it's "give them access without giving them secrets."
OneCLI is an open-source gateway that sits between your AI agents and the services they call. You store your real credentials once in OneCLI's encrypted vault, and give your agents placeholder keys. When an agent makes an HTTP call through the proxy, OneCLI matches the request by host/path, verifies the agent should have access, swaps the placeholder for the real credential, and forwards the request. The agent never touches the actual secret. It just uses CLI or MCP tools as normal.
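The match, verify, and swap steps described above can be sketched in a few lines. This is a conceptual illustration, not OneCLI's code (the proxy itself is written in Rust): the `Rule` fields, the bearer-token header format, and the function name are all assumptions.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape, for illustration only.
    host: str
    path_prefix: str
    agent: str
    placeholder: str
    secret: str


def rewrite_authorization(rules, agent, url, headers):
    """Swap a placeholder bearer token for the real one if a rule allows it.

    Returns a new headers dict; raises PermissionError when no rule matches
    the agent, host, path, and placeholder together.
    """
    parts = urlsplit(url)
    presented = headers.get("Authorization")
    for rule in rules:
        if (
            rule.agent == agent
            and parts.hostname == rule.host
            and parts.path.startswith(rule.path_prefix)
            and presented == f"Bearer {rule.placeholder}"
        ):
            forwarded = dict(headers)
            forwarded["Authorization"] = f"Bearer {rule.secret}"
            return forwarded
    raise PermissionError(f"{agent} may not call {url}")
```

The agent only ever holds the placeholder, so a leaked prompt or log exposes nothing usable outside the proxy.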
Try it in one line: docker run --pull always -p 10254:10254 -p 10255:10255 -v onecli-data:/app/data ghcr.io/onecli/onecli
The proxy is written in Rust, the dashboard is Next.js, and secrets are AES-256-GCM encrypted at rest. Everything runs in a single Docker container with an embedded Postgres (PGlite), no external dependencies. Works with any agent framework (OpenClaw, NanoClaw, IronClaw, or anything that can set an HTTPS_PROXY).
We started with what felt most urgent: agents shouldn't be holding raw credentials. The next layer is access policies and audit, defining what each agent can call, logging everything, and requiring human approval before sensitive actions go through.
It's Apache-2.0 licensed. We'd love feedback on the approach, and we're especially curious how people are handling agent auth today.
GitHub: https://github.com/onecli/onecli Site: https://onecli.sh
Show HN: KeyID – Free email and phone infrastructure for AI agents (MCP)
KeyID.ai is a platform that provides identity verification and authentication solutions, allowing businesses to securely onboard and manage users. The platform offers various features such as document verification, liveness detection, and biometric identification to help enterprises combat fraud and enhance their security measures.
Show HN: Rudel – Claude Code Session Analytics
We built rudel.ai after realizing we had no visibility into our own Claude Code sessions. We were using it daily but had no idea which sessions were efficient, why some got abandoned, or whether we were actually improving over time.
So we built an analytics layer for it. After connecting our own sessions, we ended up with a dataset of 1,573 real Claude Code sessions, 15M+ tokens, 270K+ interactions.
Some things we found that surprised us:
- Skills were only being used in 4% of our sessions
- 26% of sessions are abandoned, most within the first 60 seconds
- Session success rate varies significantly by task type (documentation scores highest, refactoring lowest)
- Error cascade patterns appear in the first 2 minutes and predict abandonment with reasonable accuracy
- There is no meaningful benchmark for 'good' agentic session performance, we are building one.
The tool is free to use and fully open source, happy to answer questions about the data or how we built it.
Show HN: SupplementDEX – The Evidence-Based Supplement Database
Hi, this is a work in progress, but it already works for determining supplement efficacy across 500 conditions.
Things you can do:
- search for a condition -> find which supplements are effective -> see which studies indicate they are effective -> read individual study summaries
- search for a supplement -> see effectiveness table, dosing, safety, dietary sources, mechanisms of action (+ browse all original sources)
Let me know what you think.
Show HN: s@: decentralized social networking over static sites
Show HN: Understudy – Teach a desktop agent by demonstrating a task once
I built Understudy because a lot of real work still spans native desktop apps, browser tabs, terminals, and chat tools. Most current agents live in only one of those surfaces.
Understudy is a local-first desktop agent runtime that can operate GUI apps, browsers, shell tools, files, and messaging in one session. The part I'm most interested in feedback on is teach-by-demonstration: you do a task once, the agent records screen video + semantic events, extracts the intent rather than coordinates, and turns it into a reusable skill.
Demo video: https://www.youtube.com/watch?v=3d5cRGnlb_0
In the demo I teach it: Google Image search -> download a photo -> remove background in Pixelmator Pro -> export -> send via Telegram. Then I ask it to do the same for Elon Musk. The replay isn't a brittle macro: the published skill stores intent steps, route options, and GUI hints only as a fallback. In this example it can also prefer faster routes when they are available instead of repeating every GUI step.
Current state: macOS only. Layers 1-2 are working today; Layers 3-4 are partial and still early.
npm install -g @understudy-ai/understudy
understudy wizard
GitHub: https://github.com/understudy-ai/understudy
Happy to answer questions about the architecture, teach-by-demonstration, or the limits of the current implementation.
Show HN: Open-source browser for AI agents
Hi HN, I forked chromium and built agent-browser-protocol (ABP) after noticing that most browser-agent failures aren’t really about the model misunderstanding the page. Instead, the problem is that the model is reasoning from a stale state.
ABP is designed to keep the acting agent synchronized with the browser at every step. After each action (click, type, etc), it freezes JavaScript execution and rendering, then captures the resulting state. It also compiles the notable events that occurred during that action loop, such as navigation, file pickers, permission prompts, alerts, and downloads, and sends that along with a screenshot of the frozen page state back to the agent.
The result is that browser interaction starts to feel more like a multimodal chat loop. The agent takes an action, gets back a fresh visual state and a structured summary of what happened, then decides what to do next from there. That fits much better with how LLMs already work.
A few common browser-use failures ABP helps eliminate:
* A modal appears after the last Playwright screenshot and blocks the input the agent was about to use
* Dynamic filters cause the page to reflow between steps
* An autocomplete dropdown opens and covers the element the agent intended to click
* alert() / confirm() interrupts the flow
* Downloads are triggered, but the agent has no reliable way to know when they’ve completed
As proof, ABP with Opus 4.6 as the driver scores 90.5% on the Online Mind2Web benchmark. I think modern LLMs already understand websites; they just need a better tool to interact with them. Happy to answer questions about the architecture, forking Chrome, or anything else in the comments below.
Try it out: `claude mcp add browser -- npx -y agent-browser-protocol --mcp` (Codex/OpenCode instructions in the docs)
Demo video: https://www.loom.com/share/387f6349196f417d8b4b16a5452c3369
Show HN: Structural analysis of the D'Agapeyeff cipher (1939)
I am working on the D'Agapeyeff cipher, an unsolved cryptogram from 1939. Two findings that I haven't seen published before:
1. All 5 anomalous symbol values in the cipher cluster in the last column of a 14x14 grid. This turns out to be driven by a factor-of-2-and-7 positional pattern in the linear text.
2. Simulated annealing with Esperanto quadgrams (23M char Leipzig corpus) on a 2x98 columnar transposition consistently outscores English by 200+ points and recovers the same Esperanto vocabulary across independent runs.
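One arithmetic reading of the first finding, as a sketch: if the 196 symbol values are written row by row into a 14x14 grid, the last column holds exactly the linear positions divisible by 14 = 2 x 7. The row-major fill and 1-based positions are assumptions here, and the actual positional pattern may be subtler than this.

```python
WIDTH = 14  # 14 x 14 = 196 cells, which also equals 2 x 98


def grid_cell(position):
    """Map a 1-based linear position to a 1-based (row, column) pair,
    assuming the text fills the grid row by row."""
    row, col = divmod(position - 1, WIDTH)
    return row + 1, col + 1


last_column = [p for p in range(1, WIDTH * WIDTH + 1) if grid_cell(p)[1] == WIDTH]

# Every last-column position is a multiple of both 2 and 7.
print(last_column)
print(all(p % 2 == 0 and p % 7 == 0 for p in last_column))  # True
```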
The cipher is not solved. But the combination of structural geometry and computational linguistics narrows the search space significantly.
Work in progress, more to come!
Show HN: Hedra – an open-world 3D game I wrote from scratch before LLMs
With the current inflow of LLM-aided software, I thought I would share a cool "hand-coded" project from the previous era (I wrote this in high school, so roughly 8 years ago).
Hedra is an open-world 3D game written from scratch using only OpenGL and C#. It has quite a few cool features like infinite generation, skinned animated mesh rendering, post-processing effects, etc. Originally the physics engine was also written from scratch, but I swapped it for the more reliable bulletphysics.
Show HN: Costly – Open-source SDK that audits your LLM API costs
The article discusses the challenges and potential solutions for the high cost of education, including the impact of student debt, the rise of online learning, and the need for more affordable and accessible educational options.