Show HN: KeelTest – AI-driven VS Code unit test generator with bug discovery
I built this because Cursor, Claude Code and other agentic AI tools kept giving me tests that looked fine but failed when I ran them. Or worse - I'd ask the agent to run them and it would start looping: fix tests, those fail, then it starts "fixing" my code so tests pass, or just deletes assertions so they "pass".
Out of that frustration I built KeelTest, a VS Code extension that generates pytest tests and executes them. I got hooked and decided to push the project forward. When tests fail, it tries to figure out why:
- Generation error: attempts to fix it automatically, then tries again
- Bug in your source code: flags it and explains what's wrong
How it works:
- Static analysis to map dependencies, patterns, services to mock.
- Generate a plan for each function and what edge cases to cover
- Generate those tests
- Execute in "sandbox"
- Self-heal failures or flag source bugs
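The first step in the list above (static analysis to find dependencies that may need mocking) could look roughly like the sketch below. It uses Python's standard `ast` module and is only an illustration, not KeelTest's actual implementation; the function name `imported_modules` is hypothetical.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b as c` contributes "a"
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # `from a.b import c` contributes "a"; relative imports are skipped
            modules.add(node.module.split(".")[0])
    return modules


sample = "import requests\nfrom os import path\nfrom . import helpers\n"
print(sorted(imported_modules(sample)))
```

A test generator could compare this set against the project's own packages to guess which imports are external services worth mocking.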
Python + pytest only for now. It's at alpha stage, so not all codebases work reliably. But in testing on personal projects and a few production apps at work, it's been consistently decent. It works best on simpler applications and sometimes glitches on monorepo setups. Supports Poetry/UV/plain pip setups.
Install from VS Code marketplace: https://marketplace.visualstudio.com/items?itemName=KeelCode...
More detailed writeup on how it works: https://keelcode.dev/blog/introducing-keeltest
Free tier is 7 test files/month (current limit is <=300 source LOC). To make it easier to try without signing up, I'm giving away a few API keys (they share a quota of ~30 test file generations):
KEY-1: tgai_jHOEgOfpMJ_mrtNgSQ6iKKKXFm1RQ7FJOkI0a7LJiWg
KEY-2: tgai_NlSZN-4yRYZ15g5SAbDb0V0DRMfVw-bcEIOuzbycip0
KEY-3: tgai_kiiSIikrBZothZYqQ76V6zNbb2Qv-o6qiZjYZjeaczc
KEY-4: tgai_JBfSV_4w-87bZHpJYX0zLQ8kJfFrzas4dzj0vu31K5E
Would love your honest feedback on where this could go next, and on which setups it failed and how. It has quite verbose debug output at this stage!
Show HN: SMTP Tunnel – A SOCKS5 proxy disguised as email traffic to bypass DPI
A fast SOCKS5 proxy that tunnels your traffic through what looks like normal SMTP email, bypassing Deep Packet Inspection firewalls.
How it works:
- Client runs a local SOCKS5 proxy (127.0.0.1:1080)
- Traffic is sent to server disguised as SMTP (EHLO, STARTTLS, AUTH)
- DPI sees legitimate email session, not a VPN/proxy
Features:
- One-liner install on any Linux VPS
- Multi-user with per-user secrets and IP whitelists
- Auto-generated client packages (just double-click to run)
- Auto-reconnect on connection loss
- Works with any app that supports SOCKS5
Tech: Python/asyncio, TLS 1.2+, HMAC-SHA256 auth
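As an illustration of HMAC-SHA256 authentication with per-user secrets, here is a minimal sketch using Python's standard library. The challenge-response shape and the names `sign` and `verify` are assumptions for the example; the post does not describe the tunnel's actual handshake.

```python
import hashlib
import hmac
import secrets


def sign(secret: bytes, challenge: bytes) -> str:
    """Compute the HMAC-SHA256 of a challenge under a per-user secret."""
    return hmac.new(secret, challenge, hashlib.sha256).hexdigest()


def verify(secret: bytes, challenge: bytes, response: str) -> bool:
    """Check a response using a constant-time comparison."""
    return hmac.compare_digest(sign(secret, challenge), response)


user_secret = secrets.token_bytes(32)   # hypothetical per-user secret
challenge = secrets.token_bytes(16)     # hypothetical server-issued nonce
response = sign(user_secret, challenge)

print(verify(user_secret, challenge, response))
print(verify(b"wrong-secret", challenge, response))
```

Using `hmac.compare_digest` rather than `==` avoids leaking how many leading characters matched through timing differences.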
GitHub: https://github.com/x011/smtp-tunnel-proxy
Show HN: RepoReaper – AST-aware, JIT-loading code audit agent (Python/AsyncIO)
OP here. I built RepoReaper to solve code context fragmentation in RAG.
Unlike standard chat-with-repo tools, it simulates a senior engineer's workflow: it parses Python AST for logic-aware chunking, uses a ReAct loop to JIT-fetch missing file dependencies from GitHub, and employs hybrid search (BM25+Vector). It also generates Mermaid diagrams for architecture visualization. The backend is fully async and persists state via ChromaDB.
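A rough idea of what AST-based, logic-aware chunking can look like, using only Python's standard `ast` module. This is a sketch under the assumption that each top-level function or class becomes one chunk; it is not taken from RepoReaper's code, and `chunk_by_definition` is a hypothetical name.

```python
import ast


def chunk_by_definition(source: str) -> list[tuple[str, str]]:
    """Split source into (name, code) chunks, one per top-level def or class."""
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact text spanned by the node
            chunks.append((node.name, ast.get_source_segment(source, node)))
    return chunks


sample = '''
import os

def load(path):
    return open(path).read()

class Repo:
    def name(self):
        return "demo"
'''

for name, code in chunk_by_definition(sample):
    print(name, len(code.splitlines()))
```

Compared with fixed-size text windows, this keeps each function or class body intact, so a retrieved chunk is never cut off mid-definition.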
Link: https://github.com/tzzp1224/RepoReaper
Show HN: Arabic Calligraphy Generator – 11 styles, free, no signup
I built a small web tool that lets people create Arabic calligraphy without needing design software. Most existing tools are either too complex or very limited, so I wanted something simple and accessible.
Features:
• Write Arabic directly or translate from English
• 11 classic calligraphy styles (Thuluth, Naskh, Kufi, Diwani, etc.)
• Adjust layout, colors, line height, stroke, and rotation
• Export as PNG, JPG, or SVG
• No signup required
I’d appreciate any feedback on performance, UI, or calligraphy accuracy. This is a solo side project and still evolving.
Site: https://arabiccalligraphygenerator.online
Show HN: A simple way to find open source issues to contribute to
Finding open source issues is easy. Deciding which ones are worth your time is not.
I built Contrib.FYI as a simple web app to reduce that decision cost.
Instead of relying on static, curated lists, it uses live GitHub API data and shows issues in chronological order, so discovery stays fresh.
On top of that, it surfaces a few early signals (language, stars, no comments, no linked PRs) to help you avoid opening issues that are already being worked on.
The goal is not to find more issues, but to find better candidates to spend your time on.
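The filtering idea described above can be sketched as follows. The `Issue` fields and the newest-first ordering are assumptions made for illustration; the sketch works on plain records and does not reproduce Contrib.FYI's code or call the GitHub API.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    title: str
    created_at: str   # ISO 8601 UTC timestamp; sorts correctly as a string
    comments: int
    linked_prs: int


def fresh_candidates(issues: list[Issue]) -> list[Issue]:
    """Keep issues nobody has touched yet, newest first."""
    untouched = [i for i in issues if i.comments == 0 and i.linked_prs == 0]
    return sorted(untouched, key=lambda i: i.created_at, reverse=True)


issues = [
    Issue("Fix typo in README", "2026-01-03T10:00:00Z", 0, 0),
    Issue("Crash on empty input", "2026-01-05T08:30:00Z", 4, 1),
    Issue("Add --json flag", "2026-01-04T12:00:00Z", 0, 0),
]

for issue in fresh_candidates(issues):
    print(issue.created_at, issue.title)
```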
Source code is available here: https://github.com/K-dash/contrib-fyi
Feedback is welcome.
Show HN: Deep learning without gradient descent, 500 layers, no skip connections
Show HN: Comet MCP – Give Claude Code a browser that can click
Hey HN,
Claude Code is pretty agentic now. It writes scripts, calls APIs, uses CLIs. But when something requires actually clicking through a website, it stops and asks me to do it.
Problem is, I'm often unfamiliar with these platforms myself. "Go to App Store Connect and generate a P8 key" okay but where? I end up spending 10 minutes navigating menus I've never seen before.
I started delegating these tasks to Perplexity's Comet browser. It handles the clicking, returns what I need. But copy-pasting between Claude and Comet got old fast.
So I built this MCP server to connect them directly. Now when Claude needs to interact with a website that has no API, it can just ask Comet to handle it.
Examples:
- Grab my app ID from RevenueCat dashboard
- Generate a P8 key in App Store Connect
- Navigate admin panels behind login walls
I tried Playwright MCP but having Claude do the clicking itself overwhelms the context window. Comet's agentic browsing just works better in my experience.
Comet doesn't have an API, so this uses CDP to communicate with it directly.
Show HN: Metabase-Impact – Find which Metabase questions break before you deploy
I built a CLI tool that scans your Metabase instance to find which SQL questions reference a column or table you're about to drop/rename.
metabase-impact --metabase-url http://localhost:3000 --api-key "mb_xxx" --drop-column orders.user_id
It outputs affected questions with direct links so you can fix or archive them before deploying.
Built this after breaking dashboards one too many times. Uses sqlglot for SQL parsing (handles aliases and complex queries). Only works on native SQL questions, not MBQL/GUI queries.
Show HN: Put Greenland on the Moon (interactive map for size compare)
I just built a small tool and created some comparisons of country sizes vs. planets. Greenland seems larger than I thought.
The tool lets you drag a country onto another planet to see its size there.
Show HN: VaultSandbox – Test your real MailGun/SES/etc. integration
I've spent the last few months working on something I wish I'd had years ago. I kept running into the same issue: CI green, production mail broken. TLS handshake failures, DKIM alignment mismatches, SPF soft-fails ... the stuff that only surfaces when real mail servers are involved. Most test tools (Mailpit, MailHog) are catch-alls. They confirm "an email was sent" but don't validate the protocol. They also aren't designed for network-exposed environments: no auth, unprotected Web UI, easy to enumerate messages.
VaultSandbox is my attempt at fixing that. It's a self-hosted SMTP gateway (AGPLv3) that validates SPF, DKIM, DMARC, and rDNS on every incoming message. You keep your production email provider (Postmark, SendGrid, SES) in tests and you just change the recipient domain. No mocking, no config changes. There are client SDKs (Node, Python, Go, Java, .NET), plus a Web UI and a CLI for manual testing.
Some technical details:
Deterministic Tests
Instead of polling or sleep loops, the SDKs use Server-Sent Events (SSE) so test assertions trigger the moment the mail hits the gateway.
Minimal infrastructure footprint
Built with NestJS and Angular, with no external database dependency to keep the container footprint small and easier to reason about.
Post-Quantum Encryption
I use ML-KEM-768 for the encryption layer. Incoming mail is encrypted immediately using a client-generated public key and the plaintext is discarded. The server only ever stores encrypted message data and cannot decrypt it. I chose PQ because I wanted to build something I wouldn't have to revisit in five years. If it handles large PQ keys reliably, everything else is easy.
Quick start: https://vaultsandbox.dev/getting-started/quickstart/
Site: https://vaultsandbox.com
I'd love feedback, especially on whether AGPLv3 would be a blocker for something you'd self-host in dev.
Show HN: Make audio loops online
I created a small web app for making simple audio loops online. It's a bit rough around the edges, but it gets you started creating loops in less than 10 seconds.
Show HN: Cited AI – AI answers with citations linking to exact source passages
Hey HN!
I’m Collin, a 20 year old law student from Amsterdam. I built Cited AI, an AI that gives accurate and verifiable answers from your documents/context.
As a law student who uses AI a lot, I know how important it is that answers are actually accurate and verifiable. I remember many instances where I'd ask a chatbot like ChatGPT or Claude about case law or long documents, and it would either hallucinate facts that didn't exist, or fail to give me the exact passages from my source documents so I could verify its answers. Even when specifically asking for exact quotes, finding that passage in the original document is still a hassle. That's why last November I started building Cited.
It should work with all types of content, including complex PDFs (even math-heavy ones) and long documents of up to 75k words. No RAG or chunking is used, so the full document is always in the LLM's context.
I would love to hear your feedback!
Show HN: Mantic.sh – A structural code search engine for AI agents
Author here! Some context: I published this 48 hours ago and it was auto-listed on MCPMarket (the MCP tools directory). Got 700+ organic downloads with zero marketing—developers were actively searching for exactly this solution.
The "Git Accelerator" optimization story:
Initially used a file walker that took 6.6s on Chromium. Profiling showed 90% was filesystem I/O. The fix: git ls-files returns 480k paths in ~200ms. Added smart heuristics for untracked files (only scan dirs <50k files), bringing total to 0.46s.
Why this matters: Agents can't wait 10 seconds for search. Sub-500ms makes it feel instant, changing how they explore codebases.
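The `git ls-files` trick described above can be sketched in a few lines. This is an illustrative Python version, not Mantic's own code; `tracked_files` and `split_nul_paths` are hypothetical names. The `-z` flag makes Git separate paths with NUL bytes, which keeps filenames with spaces or newlines intact.

```python
import subprocess


def split_nul_paths(raw: bytes) -> list[str]:
    """Split NUL-separated output (as produced by `git ls-files -z`) into paths."""
    return [p.decode("utf-8", errors="replace") for p in raw.split(b"\0") if p]


def tracked_files(repo_dir: str) -> list[str]:
    """Ask Git's index for tracked paths instead of walking the filesystem."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "ls-files", "-z"],
        capture_output=True,
        check=True,
    )
    return split_nul_paths(result.stdout)


# Parsing demo on canned output; tracked_files(".") needs a real Git checkout.
print(split_nul_paths(b"README.md\0src/my file.py\0"))
```

Reading the index avoids a `stat` call per directory entry, which is where the post says most of the original time went.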
Installation:
Cursor: npx mantic.sh@latest
VS Code: npx mantic.sh@latest
CLI: npm i -g mantic.sh
Limitations: Mantic is optimized for precise queries ("find stripe webhook") where structure matters. For fuzzy exploratory search, traditional embeddings may still be better. Curious if HN has ideas for hybrid approaches.
Happy to answer questions!
Show HN: 48-digit prime numbers every git commit
The article discusses Git Prime, a web-based tool that helps developers manage their Git repositories more effectively. It provides features like project monitoring, code review, and team collaboration to improve the software development workflow.
Show HN: Stash – Sync Markdown Files with Apple Notes via CLI
Stash is a decentralized storage and sharing platform that allows users to securely store, access, and share their digital files across devices and with others, using blockchain technology to ensure data privacy and ownership.
Show HN: Prism.Tools – Free and privacy-focused developer utilities
Hi HN, I'm Barry and I've built Prism.Tools (https://blgardner.github.io/prism.tools/) – a collection of client-side developer utilities that respect your privacy.
Many of these tools date back to the days when I ran a BBS and started my community's first ISP, serving three local communities with dial-up Internet, web hosting, etc. They have been refined to reflect the changes in tech since then and are designed for novices and pros alike. As I find more tools others may find useful, I will refine them and add them to the collection. Use them, share them, or not. They will be here if you need them...
40+ dev tools (JSON formatters, regex tester, base64 encoder, Git command helper, etc.) that run entirely in your browser. Zero tracking, zero analytics, zero data collection – everything processes locally. Self-contained HTML files with no build process or frameworks.
I realized I had a lot of tools/utilities I've built over the years for my own use. I loathe having to 'sign up' just to use simple utilities that I can create myself. I've refined them and put them in one safe place so I could easily access them if/when needed. I decided to make them available via GitHub Pages for anyone who may find them useful. Prism.Tools is the result.
Each tool is a standalone HTML file with embedded CSS and JavaScript. No frameworks, no npm packages, no build steps – just open the file and it works.
The entire toolset:
- 100% client-side processing – your data never leaves your browser.
- No external dependencies except for specific libraries from cdnjs.cloudflare.com (marked.js for markdown, exifr for image metadata, etc.)
- Consistent dark UI – every tool follows the same design language for familiarity.
- Vanilla JS where possible – only reaching for Public CDN Resources when necessary.
The constraint of "single HTML file" was intentional. It forces simplicity and ensures tools remain maintainable. It also means users can inspect, modify, or self-host any tool trivially.
These tools have helped me with debugging production issues, quick formatting tasks, and learning Git commands (the Git command helper has been particularly helpful).
Just visit https://blgardner.github.io/prism.tools/ and try any tool. No signup, no install.
What tools are missing that you find yourself needing? Any performance issues with specific tools? UI/UX friction points?
All tools follow the same privacy-first philosophy... Your data stays in your browser. No accounts, no tracking, no servers processing your information. The project is also a demonstration that you don't always need React, Vue, or complex build pipelines – sometimes vanilla JavaScript in a single HTML file is exactly the right tool for the job.
- Vanilla JavaScript (ES6+)
- CSS3 with CSS Grid
- Minimal external libraries: marked.js, exifr, highlight.js, sql-formatter (all from CDN)
- No frameworks, no bundlers, no npm
- Hosted on GitHub Pages
Happy to answer questions about the technical implementation, design decisions, or specific tools!
All tools are inspectable – just view source on any page to see exactly how they work!
Show HN: Tailsnitch – A security auditor for Tailscale
Show HN: Foundertrace – chain of YC startups founded by its employees
Inspired by PG’s tweet about a chain of 4 YC startups where the founder worked at a YC startup, I vibe coded and generated these genealogy chains for all ~6k YC startups. And to make these trees easily accessible I packaged them into a hosted webapp.
A few noteworthy YC startups that have had a huge impact on the YC ecosystem:
Airbnb - 83 YC startups spawned
Stripe - 67 YC startups spawned
Dropbox - 50 YC startups spawned
Justin.tv/Twitch - 47 YC startups spawned
More recently founded YC startups that have already spawned many YC startups:
Rappi - 21 YC startups spawned
Brex - 20 YC startups spawned
Scale AI - 19 YC startups spawned
Show HN: GPU Cuckoo Filter – faster queries than Blocked Bloom, with deletion
The article discusses the Cuckoo Filter, a space-efficient data structure that can be used as an alternative to traditional Bloom filters for set membership queries. It provides an overview of the Cuckoo Filter's design, advantages, and use cases.
Show HN: Jax-JS, array library in JavaScript targeting WebGPU
JAX-JS is a machine learning library for the web that provides a powerful and flexible framework for building and training neural networks directly in JavaScript. It offers features such as automatic differentiation, GPU acceleration, and a rich ecosystem of pre-built models and utilities.
Show HN: llmgame.ai – The Wikipedia Game but with LLMs
I used to play the Wikipedia Game in high school and had an idea for applying the same mechanic of clicking from concept to concept to LLMs.
Will post another version that runs with an LLM entirely in the browser soon, but for now, please enjoy as long as my credits last...
Warning: the LLM does not always cooperate
Show HN: A RAM-only, end-to-end encrypted P2P terminal chat in Python
Hi HN,
This is cmd-chat, a Python terminal chat app designed around a few constraints:
- No central servers
- No message or key persistence
- No plaintext credentials ever sent over the network
Authentication uses SRP, and messages are encrypted after key exchange. All data lives in memory only and disappears when the process exits.
This was partly a learning project and partly an experiment in building a “minimum-trust” chat system using standard cryptographic primitives.
Curious to hear thoughts on the threat model, crypto choices, and overall design.
Show HN: DoNotNotify – Log and intelligently block notifications on Android
Why - I got sick of apps abusing notifications on my Android phone. While the OS does give you the ability to switch off notifications based on channels, most apps either don't use it or abuse it intentionally. In my case, I live in a gated society that uses an app called MyGate to allow visitors, and the app intentionally pushes ads through the same channels since you cannot block them.
What - DoNotNotify is an app that logs all incoming notifications and displays them grouped by app. It also captures the action behind the notification, which can be triggered from the app itself. From this log, you can create rules to whitelist/blacklist notifications from apps depending on their notification content. These filters can even be regular expressions, which allows for more complicated use cases. The app ships with some pre-defined rules for popular apps like Facebook, Amazon, Instagram, Netflix, TikTok, Reddit etc.
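To make the rule idea concrete, here is a small sketch of regex-based allow/block rules. It is written in Python for brevity and does not reflect the Android app's real data model; the `Rule` fields, the package name, and the first-match-wins policy are all assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    app: str       # package name the rule applies to
    pattern: str   # regular expression matched against the notification text
    action: str    # "block" or "allow"


def decide(rules: list[Rule], app: str, text: str, default: str = "allow") -> str:
    """Return the action of the first rule matching this app and text."""
    for rule in rules:
        if rule.app == app and re.search(rule.pattern, text, re.IGNORECASE):
            return rule.action
    return default


rules = [Rule("com.example.gate", r"\b(offer|sale|discount)\b", "block")]

print(decide(rules, "com.example.gate", "Big SALE today only"))
print(decide(rules, "com.example.gate", "Visitor waiting at the gate"))
```

This is how a rule can drop ads while still letting visitor alerts through, even when both arrive on the same notification channel.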
Where - The website is at https://donotnotify.com/.
Would also like to call out that the app runs purely on your device, never communicates with anything on the Internet, and only requires notifications access to work. It is completely free, and there is no advertising or hidden gotchas.
Show HN: DDL to Data – Generate realistic test data from SQL schemas
I built DDL to Data after repeatedly pushing back on "just use production data and mask it" requests. Teams needed populated databases for testing, but pulling prod meant security reviews, PII scrubbing, and DevOps tickets. Hand-written seed scripts were the alternative: slow, fragile, and out of sync the moment schemas changed.
Paste your CREATE TABLE statements, get realistic test data back. It parses your schema, preserves foreign key relationships, and generates data that looks real: emails look like emails, timestamps are reasonable, and uniqueness constraints are honored.
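One part of preserving foreign key relationships is generating parent tables before the tables that reference them. A minimal sketch of that ordering step, using the standard library's `graphlib`, is below. The table names are made up and this is not DDL to Data's implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the set of tables it references.
foreign_keys = {
    "users": set(),
    "products": set(),
    "orders": {"users"},
    "order_items": {"orders", "products"},
}


def generation_order(fks: dict[str, set[str]]) -> list[str]:
    """Order tables so every referenced table comes before its dependents."""
    return list(TopologicalSorter(fks).static_order())


order = generation_order(foreign_keys)
print(order)
```

With rows generated in this order, a child row can always pick its foreign key value from parent rows that already exist. A schema with circular references would raise `graphlib.CycleError` and need separate handling.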
No setup, no config. Works with PostgreSQL and MySQL.
https://ddltodata.com
Would love feedback from anyone who deals with test data or staging environments. What's missing?
Show HN: 25 years of house prices in England and Wales
The article provides a dashboard that tracks and visualizes the latest trends in UK house prices, allowing users to explore data on regional variations, price changes over time, and other key housing market indicators.
Show HN: Symbolic Circuit Distillation: prove program to LLM circuit equivalence
Hi HN, I've been exploring various applications of formal methods to ML/interpretability and I've been hoping to get more eyes on the approach.
I have been working on a small interpretability project I call Symbolic Circuit Distillation. The goal is to take a tiny neuron-level circuit (like the ones in OpenAI's "Sparse Circuits" work) and automatically recover a concise Python program that implements the same algorithm, along with a bounded formal proof that the two are equivalent on a finite token domain.
Roughly, the pipeline is:
1. Start from a pruned circuit graph for a specific behavior (e.g. quote closing or bracket depth) extracted from a transformer.
2. Treat the circuit as an executable function and train a tiny ReLU network ("surrogate") that exactly matches the circuit on all inputs in a bounded domain (typically sequences of length 5–10 over a small token alphabet).
3. Search over a constrained DSL of common transformer motifs (counters, toggles, threshold detectors, small state machines) to synthesize candidate Python programs.
4. Use SMT-based bounded equivalence checking to either:
   - Prove that a candidate program and the surrogate agree on all inputs in the domain, or
   - Produce a counterexample input that rules the program out.
If the solver finds a proof, you get a small, human-readable Python function plus a machine-checkable guarantee that it matches the original circuit on that bounded domain.
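To show what "equivalent on a bounded domain" means in practice, here is a toy sketch that replaces the SMT solver with brute-force enumeration over every short token sequence. The `reference` function merely stands in for the circuit or surrogate, and the quote-toggle task, alphabet, and function names are illustrative assumptions rather than the project's code.

```python
from itertools import product

ALPHABET = ('"', "a")   # tiny token alphabet: a quote mark and a filler token


def reference(tokens) -> bool:
    """Stand-in for the circuit: inside a quote iff an odd number of quotes seen."""
    return sum(t == '"' for t in tokens) % 2 == 1


def candidate(tokens) -> bool:
    """Candidate program: a toggle that flips on every quote token."""
    inside = False
    for t in tokens:
        if t == '"':
            inside = not inside
    return inside


def too_eager(tokens) -> bool:
    """A wrong candidate: claims to be inside a quote once any quote appears."""
    return '"' in tokens


def find_counterexample(f, g, alphabet, max_len):
    """Return the first input where f and g disagree, or None if they agree."""
    for n in range(max_len + 1):
        for tokens in product(alphabet, repeat=n):
            if f(tokens) != g(tokens):
                return tokens
    return None


print(find_counterexample(reference, candidate, ALPHABET, 6))
print(find_counterexample(reference, too_eager, ALPHABET, 6))
```

Enumeration grows exponentially with sequence length and alphabet size, which is one reason a solver-based check is attractive for anything beyond toy domains.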
Why I built this
Mechanistic interpretability has gotten pretty good at extracting "small crisp circuits" from large models, but turning those graphs into clean, human-readable algorithms is still very manual. My goal here is to automate that last step: go from "here is a sparse circuit" to "here is a verified algorithm that explains what it does", without hand-holding.
What works today
- Tasks: quote closing and bracket-depth detection from the OpenAI circuit_sparsity repo.
- Exact surrogate fitting on a finite token domain.
- DSL templates for simple counters, toggles, and small state machines.
- SMT-based bounded equivalence between: sparse circuit -> ReLU surrogate -> Python program in the DSL.
Limitations and open questions
- The guarantees are bounded: equivalence is only proven on a finite token domain (short sequences and a small vocabulary).
- Currently focused on very small circuits. Scaling to larger circuits and longer contexts is open engineering and research work.
- The DSL is hand-designed around a few motifs. I am not yet learning the DSL itself or doing anything very clever in the search.
What I would love feedback on
- Are the problem framing and guarantees interesting to people working on mechanistic interpretability or formal methods?
- Suggestions for next benchmarks: which circuits or behaviors would you want to see distilled next?
- Feedback on the DSL design, search strategy, and SMT setup.
Happy to answer questions about implementation details, the SMT encoding, integration with OpenAI's Sparse Circuits repo, or anything else.
Show HN: I built "Google" for searching Shadcn blocks on the web
Shoogle is a new search engine that aims to provide a more personalized and user-friendly experience compared to traditional search engines. It uses advanced machine learning algorithms to understand user intent and deliver more relevant and tailored search results.
Show HN: Finding similarities in New Yorker covers
Originally learned about image hashing and similarity comparison for product image searches. Decided to apply it to magazine covers.
Thrasher covers: https://shoplurker.com/labs/thrasher-covers/
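For readers unfamiliar with image hashing, here is a minimal sketch of one common variant, the average hash, compared by Hamming distance. The post does not say which hash the site uses, so this choice is an assumption, and the tiny 2x2 grids stand in for downscaled grayscale images.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | int(v > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes (lower means more similar)."""
    return bin(a ^ b).count("1")


cover_a = [[10, 200], [220, 30]]
cover_b = [[12, 190], [210, 40]]   # same layout, slightly different brightness
cover_c = [[200, 10], [30, 220]]   # inverted layout

print(hamming(average_hash(cover_a), average_hash(cover_b)))
print(hamming(average_hash(cover_a), average_hash(cover_c)))
```

Real images would first be converted to grayscale and shrunk to a small fixed grid (8x8 is typical) so that every image yields a hash of the same length.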
Show HN: Shellock, a real-time CLI flag explainer for fish shell
Shellock is an open-source project that provides a simple and lightweight web server written in Rust. It aims to be a secure, fast, and scalable alternative to traditional web servers, making it suitable for a wide range of applications and environments.
Show HN: Server-rendered multiplayer games with Lua (no client code)
Hey folks — here’s a small experiment I hacked together over the weekend:
https://cleoselene.com/
In short, it’s a way to build multiplayer games with no client-side game logic. Everything is rendered on the server, and the game itself is written as simple Lua scripts.
I built this to explore a few gamedev ideas I’ve been thinking about while working on Abstra:
- Writing multiplayer games as if they were single-player (no client/server complexity)
- Streaming game primitives instead of pixels, which should be much lighter
- Server-side rendering makes cheating basically impossible
- Game secrets never leave the server
This isn’t meant to be a commercial project — it’s just for fun and experimentation for now.
If you want to try it out, grab a few friends and play here: https://cleoselene.com/astro-maze/