Show HN: I trained a 9M speech model to fix my Mandarin tones
Built this because tones are killing my spoken Mandarin and I can't reliably hear my own mistakes.
It's a 9M-parameter Conformer-CTC model trained on ~300 hours of audio (AISHELL + Primewords), quantized to INT8 (11 MB), and it runs 100% in-browser via ONNX Runtime Web.
Grades per-syllable pronunciation + tones with Viterbi forced alignment.
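Under the hood, the forced alignment is just Viterbi decoding over the CTC lattice. Here's a minimal numpy sketch of the textbook algorithm (illustrative only, not the shipped code; the toy label ids are made up):

import numpy as np

def ctc_forced_align(log_probs, labels, blank=0):
    """Viterbi forced alignment over per-frame CTC log-probs of shape (T, V)."""
    # Extended label sequence with blanks interleaved: ^ l1 ^ l2 ^ ... ^
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    T, S = log_probs.shape[0], len(ext)

    dp = np.full((T, S), -np.inf)       # best path score ending at (t, s)
    back = np.zeros((T, S), dtype=int)  # backpointers
    dp[0, 0] = log_probs[0, ext[0]]
    dp[0, 1] = log_probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            cands = [s, s - 1]
            # May skip a blank only if the neighboring labels differ
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            prev = max((p for p in cands if p >= 0), key=lambda p: dp[t - 1, p])
            dp[t, s] = dp[t - 1, prev] + log_probs[t, ext[s]]
            back[t, s] = prev

    # Backtrack from the better of the two valid end states
    s = S - 1 if dp[T - 1, S - 1] >= dp[T - 1, S - 2] else S - 2
    path = [s]
    for t in range(T - 1, 0, -1):
        s = back[t, s]
        path.append(s)
    path.reverse()
    return [ext[p] for p in path]  # per-frame token ids -> syllable time spans

# Toy demo: 6 frames, vocab {0: blank, 1: first syllable, 2: second syllable}
rng = np.random.default_rng(0)
frames = np.log(rng.dirichlet(np.ones(3), size=6))
print(ctc_forced_align(frames, [1, 2]))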
Try it here: https://simedw.com/projects/ear/
Show HN: Phage Explorer
I got really interested in biology and genetics a few months ago, just for fun.
This was largely inspired by the work of Sydney Brenner, which became the basis of my brennerbot.org project.
In particular, I became very fascinated by phages, which are viruses that attack bacteria. They're the closest thing to the "fundamental particles" of biology: the minimal units of genetic code that do something useful that allows them to reproduce and spread.
They also have some incredible properties, like DNA that somehow encodes an icosahedral structure.
I always wondered how the DNA of these things translated into geometry in the physical world. That mapping between the "digital" realm of ACGT (read in codons of three bases that map onto the 20 amino acids) and the world of 3D, analog shapes, still seems magical and mysterious to me.
I wanted to dig deeper into the subject, but not by reading a boring textbook. I wanted to get a sense for these phages in a tangible way. What are the different major types of phages? How do they compare to each other in terms of the length and structure of their genetic code? The physical structure they assume?
I decided to make a program to explore all this stuff in an interactive way.
And so I'm very pleased to present you with my open-source Phage Explorer:
phage-explorer.org
I probably went a bit overboard, because what I ended up with took a sickening number of tokens to generate and resulted in ~150k lines of TypeScript and Rust/Wasm.
It implements 23 analysis algorithms, over 40 visualizations, and has the complete genetic data and 3D structure of 24 different classes of phage.
It actually took a lot of engineering to make this work well in a browser; it's a surprising amount of data (this becomes obvious when you look at some of the 3D structure models).
It works fairly well on mobile, but if you want to get the full experience, I highly recommend opening it on a desktop browser in high resolution.
As far as I know, it's the most complete informational / educational software about phages available anywhere. Now, I am the first to admit that I'm NOT an expert, or even that knowledgeable, about, well, ANY of this stuff.
So if you’re a biology expert, please take a look and let me know what you think of what I've made! And if I've gotten anything wrong, please let me know in the GitHub Issues and I'll fix it:
https://github.com/Dicklesworthstone/phage_explorer
Show HN: SF Microclimates
https://microclimates.solofounders.com/
Show HN: Amla Sandbox – WASM bash shell sandbox for AI agents
WASM sandbox for running LLM-generated code safely.
Agents get a bash-like shell and can only call tools you provide, with constraints you define. No Docker, no subprocess, no SaaS — just pip install amla-sandbox
Show HN: Pinecone Explorer – Desktop GUI for the Pinecone vector database
https://github.com/stepandel/pinecone-explorer
Show HN: Blink – Native macOS code snippet manager. Local, offline, <1s search
Show HN: Kolibri, a DIY music club in Sweden
We’re Maria and Jonatan, a married couple running a small music night in Norrköping, Sweden, called Kolibri.
It’s not a software project. We run it through our own small Swedish company, pay artists, and do the operations ourselves. We do one night a month (usually the last Friday) in a restaurant venue called Mitropa. A typical night is about 50–70 paying guests. The first years it was DJs only, but last year we started doing live bands as well.
We made a simple site with schedule plus photos/video so you can see what it looks like: https://kolibrinkpg.com/
On the site:
* photos and short videos (size/atmosphere)
* the kind of acts we book (post-punk, darkwave, synth, adjacent electronic)
* enough context to copy parts of the format if you’re building something similar locally
* for the tech-curious: we built our own ticketing system (first used in February) and a media ingestion pipeline for Instagram and external photographers
How it started was accidental. I was doing remote music sessions with a friend in London (Ableton projects back and forth on FaceTime), ran out of beer, and walked into the nearest place. I got talking to Nahir, who runs Mitropa, and floated the idea of running a DIY music night there. He was up for it.

What made it take off was doing things in person. People will show up alone if they trust the room. Maria ended up doing a lot of that work: greeting newcomers, noticing who looks uncertain, and setting a tone where people treat each other decently.
Maria didn’t come from a DJ background. Klubbvärdinnan started as a joke name at Kolibri and then became her DJ moniker. She got good quickly, and after a first gig outside our own night she started getting booked elsewhere too.
Marketing-wise, what worked best was very analogue: walking around town, visiting local businesses we genuinely like, buying something, introducing ourselves, and asking if we could leave a flyer.
In the beginning we weren’t sure how to present it on social media. So we filmed headphone walks: one person walking through town listening to a track we picked. It looked good, people wanted to be in them, and afterwards we’d buy them a couple of drinks and actually talk. That turned a social media interaction into a real connection. It was a bit of luck, but it worked.
Questions welcome about what worked, what failed, costs/logistics, and what we’d do differently if we started over.
Show HN: Kling VIDEO 3.0 released: 15-second AI video generation model
Kling just announced VIDEO 3.0 - a significant upgrade from their 2.6 and O1 models.
Key improvements:
*Extended duration:*
• Up to 15 seconds of continuous video (vs previous 5-10 seconds)
• Flexible duration ranging from 3-15 seconds
• Better for complex action sequences and scene development

*Unified multimodal approach:*
• Integrates text-to-video, image-to-video, reference-to-video
• Video modification and transformation in one model
• Native audio generation (synchronized with video)

*Two variants:*
• VIDEO 3.0 (upgraded from 2.6)
• VIDEO 3.0 Omni (upgraded from O1)

*Enhanced capabilities:*
• Improved subject consistency with reference-based generation
• Better prompt adherence and output stability
• More flexibility in storyboarding and shot control

This positions Kling competitively against:
- Runway Gen-4.5 ($95/month)
- Sora 2 (limited access)
- Veo 3.1 (Google)
- Grok Imagine (just topped rankings)
The 15-second duration is particularly interesting - enables more narrative storytelling vs the typical 5-second clips. Combined with native audio, this could change workflows for content creators.
Pricing isn't mentioned in the announcement. Previous Kling models ranged from $10-40/month, significantly cheaper than Runway.
Anyone have access to test this yet? Curious how the quality compares to Runway and Sora at this new duration.
Show HN: Cicada – A scripting language that integrates with C
I wrote a lightweight scripting language that runs together with C. Specifically, it's a C library: you run it through a C function call, and it can call back into your own C functions. It compiles to ~250 kB, with no dependencies beyond the C standard library.
Key language features:
* Uses aliases not pointers, so it's memory-safe
* Arrays are N-dimensional and resizable
* Runs scripts or its own 'shell'
* Error trapping
* Methods, inheritance, etc.
* Customizable syntax
Show HN: Interactive Equation Solver
Hey HN
Here's a demo of an interactive equation solver for sympy/Python:
https://youtu.be/O837G7cj5fE?si=6YjfNFviozCLSYut
The example is a simple kinematics exercise from a college physics text.
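For flavor, here's what that kind of step looks like in plain sympy (my own example numbers, not the ones from the video):

from sympy import Eq, solve, symbols

# Free fall from rest: solve x = v0*t + (1/2)*a*t**2 for the time t.
t = symbols("t", positive=True)
x, v0, a = 100, 0, 9.8  # drop height (m), initial velocity (m/s), gravity (m/s^2)
print(solve(Eq(x, v0 * t + a * t**2 / 2), t))  # ~4.52 s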
I'd be curious to know if there are any other systems like this.
Code is in a branch here:
https://github.com/dharmatech/combine-equations.py/tree/eq-g...
Show HN: I built an AI conversation partner to practice speaking languages
Hi,
I built TalkBits because most language apps focus on vocabulary or exercises, but not actual conversation. The hard part of learning a language is speaking naturally under pressure.
TalkBits lets you have real-time spoken conversations with an AI that acts like a native speaker. You can choose different scenarios (travel, daily life, work, etc.), speak naturally, and the AI responds with natural speech back.
The goal is to make it feel like talking to a real person rather than doing lessons.
Tech-wise, it uses real-time speech input, transcription, LLM responses, and streaming TTS to keep latency low so the conversation feels fluid.
I’m especially interested in feedback about:
– Does it feel natural?
– Where does the conversation break immersion?
– What would make you use this regularly?
Happy to answer technical questions too.
Thanks
Show HN: Mystral Native – Run JavaScript games natively with WebGPU (no browser)
Hi HN, I've been building Mystral Native — a lightweight native runtime that lets you write games in JavaScript/TypeScript using standard Web APIs (WebGPU, Canvas 2D, Web Audio, fetch) and run them as standalone desktop apps. Think "Electron for games" but without Chromium. Or a JS runtime like Node, Deno, or Bun but optimized for WebGPU (and bundling a window / event system using SDL3).
Why: I originally started building a new game engine in WebGPU, and I loved the iteration loop of writing TypeScript and instantly seeing the changes in the browser with hot reloading. After getting something working and shipping a demo, I realized that shipping a whole browser doesn't really work if I also want the same codebase to work on mobile. Sure, I could use a webview, but that's not always a good or consistent experience for users - there are nuances like Safari on iOS supporting WebGPU, but not the same features that Chrome does on desktop. What I really wanted was a WebGPU runtime that is consistent and works on any platform. I was inspired by Deno's --unsafe-webgpu flag, but I realized that Deno probably wouldn't be a good fit long term because it doesn't support iOS or Android and doesn't bundle a window/event system (they have "bring your own window", but that means writing a lot of custom code for events, dealing with windowing, not to mention more specific things like implementing a Web Audio shim). That got me down the path of building a native runtime specifically for games, and that's Mystral Native.
So now with Mystral Native, I can have the same developer experience (write JS, use shaders in WGSL, call requestAnimationFrame) but get a real native binary I can ship to players on any platform without requiring a webview or a browser. No 200MB Chromium runtime, no CEF overhead, just the game code and a ~25MB runtime.
What it does:
- Full WebGPU via Dawn (Chrome's implementation) or wgpu-native (Rust)
- Native window & events via SDL3
- Canvas 2D support (Skia), Web Audio (SDL3), fetch (file/http/https)
- V8 for JS (same engine as Chrome/Node), also supports QuickJS and JSC
- ES modules, TypeScript via SWC
- Compile to single binary (think "pkg"): `mystral compile game.js --include assets -o my-game`
- macOS .app bundles with code signing, Linux/Windows standalone executables
- Embedding API for iOS and Android (JSC/QuickJS + wgpu-native)
It's early alpha — the core rendering path works well & I've tested on Mac, Linux (Ubuntu 24.04), and Windows 11, and some custom builds for iOS & Android to validate that they can work, but there's plenty to improve. Would love to get some feedback and see where it can go!
MIT licensed.
Repo: https://github.com/mystralengine/mystralnative
Docs: https://mystralengine.github.io/mystralnative/
Show HN: Foundry – Turns your repeated workflows into one-click commands
Show HN: Ourguide – OS-wide task guidance system that shows you where to click
Hey! I'm eshaan and I'm building Ourguide, an on-screen task guidance system that can show you where to click step-by-step when you need help.
I started building this because whenever I didn’t know how to do something on my computer, I found myself constantly tabbing between chatbots and the app, pasting screenshots, and asking “what do I do next?” Ourguide solves this with two modes. In Guide mode, the app overlays your screen and highlights the specific element to click next, eliminating the need to leave your current window. There is also Ask mode, a vision-integrated chat that captures your screen context (which you can toggle on and off anytime), so you can ask, "How do I fix this error?" without having to explain what "this" is.
It’s an Electron app that works OS-wide, is vision-based, and isn't restricted to the browser.
Figuring out how to show the user where to click was the hardest part of the process. I originally trained a computer vision model with 2300 screenshots to identify and segment all UI elements on a screen and used a VLM to find the correct icon to highlight. While this worked extremely well—better than SOTA grounding models like UI Tars—the latency was just too high. I'll be making that CV+VLM pipeline OSS soon, but for now, I’ve resorted to a simpler implementation that achieves <1s latency.
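To make the shape of that pipeline concrete, here's a hypothetical sketch (all names invented; this is not Ourguide's actual code) of the two-stage grounding idea:

from dataclasses import dataclass
from typing import List, Protocol

@dataclass
class Box:
    x: int
    y: int
    w: int
    h: int

class ElementDetector(Protocol):
    # Stage 1: the CV model segments every UI element in the screenshot.
    def segment(self, screenshot: bytes) -> List[Box]: ...

class ElementPicker(Protocol):
    # Stage 2: the VLM picks which element matches the current instruction.
    def choose(self, instruction: str, screenshot: bytes, boxes: List[Box]) -> int: ...

def locate_click_target(screenshot: bytes, instruction: str,
                        detector: ElementDetector, picker: ElementPicker) -> Box:
    """Return the bounding box the overlay should highlight next."""
    boxes = detector.segment(screenshot)
    return boxes[picker.choose(instruction, screenshot, boxes)]

The split matters: segmentation gives pixel-accurate boxes, while the VLM only has to make a discrete choice, which is the part VLMs are good at.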
You may ask: if I can show you where to click, why can't I just click too? While trying to build computer-use agents during my job in Palo Alto, I hit the core limitation of today’s computer-use models where benchmarks hover in the mid-50% range (OSWorld). VLMs often know what to do but not what it looks like; without reliable visual grounding, agents misclick and stall. So, I built computer use—without the "use." It provides the visual grounding of an agent but keeps the human in the loop for the actual execution to prevent misclicks.
I personally use it for the AWS Console's "treasure hunt" UI, like creating a public S3 bucket with specific CORS rules. It’s also been surprisingly helpful for non-technical tasks, like navigating obscure settings in Gradescope or Spotify. Ourguide really works for any task when you’re stuck or don't know what to do.
You can download and test Ourguide here: https://ourguide.ai/downloads
The project is still very early, and I’d love your feedback on where it fails, where you think it worked well, and which specific niches you think Ourguide would be most helpful for.
Show HN: ShapedQL – A SQL engine for multi-stage ranking and RAG
Hi HN,
I’m Tullie, founder of Shaped. Previously, I was a researcher at Meta AI, worked on ranking for Instagram Reels, and was a contributor to PyTorch Lightning.
We built ShapedQL because we noticed that while retrieval (finding 1,000 items) has been commoditized by vector DBs, ranking (finding the best 10 items) is still an infrastructure problem.
To build a decent "for you" feed or a RAG system with long-term memory, you usually have to put together a vector DB (Pinecone/Milvus), a feature store (Redis), an inference service, and thousands of lines of Python to handle business logic and reranking.
We built an engine that consolidates this into a single SQL dialect. It compiles declarative queries into high-performance, multi-stage ranking pipelines.
HOW IT WORKS:
Instead of just SELECT, ShapedQL operates in four stages native to recommendation systems:

RETRIEVE: Fetch candidates via Hybrid Search (Keywords + Vectors) or Collaborative Filtering.
FILTER: Apply hard constraints (e.g., "inventory > 0").
SCORE: Rank results using real-time models (e.g., p(click) or p(relevance)).
REORDER: Apply diversity logic so your Agent/User doesn’t see 10 nearly identical results.
THE SYNTAX: Here is what a RAG query looks like. This replaces about 500 lines of standard Python/LangChain code:
SELECT item_id, description, price
FROM
  -- Retrieval: Hybrid search across multiple indexes
  search_flights("$param.user_prompt", "$param.context"),
  search_hotels("$param.user_prompt", "$param.context")
WHERE
  -- Filtering: Hard business constraints
  price <= "$param.budget" AND is_available("$param.dates")
ORDER BY
  -- Scoring: Real-time reranking (personalization + relevance)
  0.5 * preference_score(user, item) +
  0.3 * relevance_score(item, "$param.user_prompt")
LIMIT 20

If you don’t like SQL, you can also use our Python and TypeScript SDKs. I’d love to know what you think of the syntax and the abstraction layer!
Show HN: OpenVideo – A self-hostable, open-source video editor in the browser
Open-source, browser-based video editor inspired by CapCut. Timeline editing, runs on modern web APIs, and can be self-hosted. Looking for feedback from devs and video folks.
Show HN: I'm building an AI-proof writing tool. How would you defeat it?
Show HN: Hosted OpenClaw with Secure Isolation
Moltcloud's hosted OpenClaw service lets users run the OpenClaw agent without complex setup or infrastructure, providing easy access to a powerful AI tool for applications like natural language processing and generation.
Show HN: LemonSlice – Upgrade your voice agents to real-time video
Hey HN, we're the co-founders of LemonSlice (try our HN playground here: https://lemonslice.com/hn). We train interactive avatar video models. Our API lets you upload a photo and immediately jump into a FaceTime-style call with that character. Here's a demo: https://www.loom.com/share/941577113141418e80d2834c83a5a0a9
Chatbots are everywhere and voice AI has taken off, but we believe video avatars will be the most common form factor for conversational AI. Most people would rather watch something than read it. The problem is that generating video in real-time is hard, and overcoming the uncanny valley is even harder.
We haven’t broken the uncanny valley yet. Nobody has. But we’re getting close and our photorealistic avatars are currently best-in-class (judge for yourself: https://lemonslice.com/try/taylor). Plus, we're the only avatar model that can do animals and heavily stylized cartoons. Try it: https://lemonslice.com/try/alien. Warning! Talking to this little guy may improve your mood.
Today we're releasing our new model* - Lemon Slice 2, a 20B-parameter diffusion transformer that generates infinite-length video at 20fps on a single GPU - and opening up our API.
How did we get a video diffusion model to run in real-time? There was no single trick, just a lot of them stacked together. The first big change was making our model causal. Standard video diffusion models are bidirectional (they look at frames both before and after the current one), which means you can't stream.
From there it was about fitting everything on one GPU. We switched from full to sliding window attention, which killed our memory bottleneck. We distilled from 40 denoising steps down to just a few - quality degraded less than we feared, especially after using GAN-based distillation (though tuning that adversarial loss to avoid mode collapse was its own adventure).
And the rest was inference work: modifying RoPE from complex to real (this one was cool!), precision tuning, fusing kernels, a special rolling KV cache, lots of other caching, and more. We kept shaving off milliseconds wherever we could and eventually got to real-time.
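For intuition, here's the generic shape of the sliding-window trick (a standard construction, not our actual kernels): each frame attends only causally and within a fixed lookback, which is what lets the KV cache roll instead of growing with the video.

import numpy as np

def causal_sliding_window_mask(num_frames: int, window: int) -> np.ndarray:
    """True where query frame i may attend to key frame j."""
    i = np.arange(num_frames)[:, None]
    j = np.arange(num_frames)[None, :]
    # Causal (j <= i) plus bounded lookback (i - j < window):
    # memory per frame is O(window) instead of O(num_frames).
    return (j <= i) & (i - j < window)

print(causal_sliding_window_mask(6, 3).astype(int))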
We set up a guest playground for HN so you can create and talk to characters without logging in: https://lemonslice.com/hn. For those who want to build with our API (we have a new LiveKit integration that we’re pumped about!), grab a coupon code in the HN playground for your first Pro month free ($100 value). See the docs: https://lemonslice.com/docs. Pricing is usage-based at $0.12-0.20/min for video generation.
Looking forward to your feedback!
EDIT: Tell us what characters you want to see in the comments and we can make them for you to talk to (e.g. Max Headroom)
*We did a Show HN last year for our V1 model: https://news.ycombinator.com/item?id=43785044. It was technically impressive but so bad compared to what we have today.
Show HN: The HN Arcade
I love seeing all the small games that people build and post to this site.
I don't want to forget any, so I have built a directory/arcade for the games here that I maintain.
Feel free to check it out, add your game if it's missing, and let me know what you think. Thanks!
Show HN: SHDL – A minimal hardware description language built from logic gates
Hi, everyone!
I built SHDL (Simple Hardware Description Language) as an experiment in stripping hardware description down to its absolute fundamentals.
In SHDL, there are no arithmetic operators, no implicit bit widths, and no high-level constructs. You build everything explicitly from logic gates and wires, and then compose larger components hierarchically. The goal is not synthesis or performance, but understanding: what digital systems actually look like when abstractions are removed.
SHDL is accompanied by PySHDL, a Python interface that lets you load circuits, poke inputs, step the simulation, and observe outputs. Under the hood, SHDL compiles circuits to C for fast execution, but the language itself remains intentionally small and transparent.
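A usage sketch of that flow (hypothetical names throughout; check the PySHDL docs for the real API):

# Hypothetical PySHDL session; function and method names are illustrative only.
from pyshdl import load_circuit  # assumed entry point

reg = load_circuit("register16.shdl")  # compiled to C behind the scenes
reg.poke("In", 0xBEEF)                 # drive the 16-bit input bus
reg.poke("clk", 1)
reg.step(2)                            # clk held high for two cycles to latch
reg.poke("clk", 0)
reg.step()
print(hex(reg.peek("Out")))            # expect 0xbeef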
This is not meant to replace Verilog or VHDL. It’s aimed at:
- learning digital logic from first principles
- experimenting with HDL and language design
- teaching or visualizing how complex hardware emerges from simple gates.
I would especially appreciate feedback on:
- the language design choices
- what feels unnecessarily restrictive vs. educationally valuable
- whether this kind of “anti-abstraction” HDL is useful to you.
Repo: https://github.com/rafa-rrayes/SHDL
Python package: PySHDL on PyPI
To make this concrete, here are a few small working examples written in SHDL:
1. Full Adder
component FullAdder(A, B, Cin) -> (Sum, Cout) {
x1: XOR; a1: AND;
x2: XOR; a2: AND;
o1: OR;
connect {
A -> x1.A; B -> x1.B;
A -> a1.A; B -> a1.B;
x1.O -> x2.A; Cin -> x2.B;
x1.O -> a2.A; Cin -> a2.B;
a1.O -> o1.A; a2.O -> o1.B;
x2.O -> Sum; o1.O -> Cout;
}
}

2. 16-bit Register
# clk must be high for two cycles to store a value
component Register16(In[16], clk) -> (Out[16]) {
>i[16]{
a1{i}: AND;
a2{i}: AND;
not1{i}: NOT;
nor1{i}: NOR;
nor2{i}: NOR;
}
connect {
>i[16]{
# Capture on clk
In[{i}] -> a1{i}.A;
In[{i}] -> not1{i}.A;
not1{i}.O -> a2{i}.A;
clk -> a1{i}.B;
clk -> a2{i}.B;
a1{i}.O -> nor1{i}.A;
a2{i}.O -> nor2{i}.A;
nor1{i}.O -> nor2{i}.B;
nor2{i}.O -> nor1{i}.B;
nor2{i}.O -> Out[{i}];
}
}
}

3. 16-bit Ripple-Carry Adder
use fullAdder::{FullAdder};
component Adder16(A[16], B[16], Cin) -> (Sum[16], Cout) {
>i[16]{ fa{i}: FullAdder; }
connect {
A[1] -> fa1.A;
B[1] -> fa1.B;
Cin -> fa1.Cin;
fa1.Sum -> Sum[1];
>i[2,16]{
A[{i}] -> fa{i}.A;
B[{i}] -> fa{i}.B;
fa{i-1}.Cout -> fa{i}.Cin;
fa{i}.Sum -> Sum[{i}];
}
fa16.Cout -> Cout;
}
}
Show HN: Build Web Automations via Demonstration
Hey HN,
We’ve been building browser agents for a while. In production, we kept converging on the same pattern: deterministic scripts for the happy path, agents only for edge cases. So we built Demonstrate Mode.
The idea is simple: You perform your workflow once in a remote browser. Notte records the interactions and generates deterministic automation code.
How it works:
- Record clicks, inputs, navigations in a cloud browser
- Compile them into deterministic code (no LLM at runtime)
- Run and deploy on managed browser infrastructure
Closest analog is Playwright codegen but:
- Infrastructure is handled (remote browsers, proxies, auth state)
- Code runs in a deployable runtime with logs, retries, and optional agent fallback
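For flavor, deterministic output in this general style looks like a plain Playwright script (this is the codegen analogy with made-up selectors, not Notte's actual generated code or runtime):

from playwright.sync_api import sync_playwright

# Codegen-style deterministic replay: fixed selectors, no LLM calls at runtime.
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com/login")  # hypothetical target site
    page.fill("input[name=email]", "user@example.com")
    page.fill("input[name=password]", "hunter2")
    page.click("button:has-text('Sign in')")
    page.wait_for_url("**/dashboard")
    page.click("text=Export CSV")
    browser.close()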
Agents are great for prototyping and dynamic steps, but for production we usually want versioned code and predictable cost/behavior. Happy to dive into implementation details in the comments.
Demo: https://www.loom.com/share/f83cb83ecd5e48188dd9741724cde49a
-- Andrea & Lucas, Notte Founders
Show HN: Git primitives for autonomous coding agents
Git Surgeon is a command-line tool that helps diagnose and fix common Git repository issues by analyzing the repository's state and providing suggested solutions.
Show HN: Daily Cat
Seeing HTTP Cats on the home page reminded me to share a small project I made a couple of months ago. It displays a different cat photo from Unsplash every day and will send you notifications if you opt in.
Show HN: A MitM proxy to see what your LLM tools are sending
I built this out of curiosity about what Claude Code was actually sending to the API. Turns out, watching your tokens tick up in real-time is oddly satisfying.
Sherlock sits between your LLM tools and the API, showing you every request on a live dashboard, with auto-saved copies of every prompt as Markdown and JSON.
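If you want a feel for the mechanism, here's a minimal stand-in (not Sherlock itself): a local reverse proxy that logs each request body and forwards it upstream. You'd point your tool's base URL at it; this toy version doesn't handle streaming responses.

import http.server
import json
import requests

UPSTREAM = "https://api.anthropic.com"  # assumption: your tool lets you override its base URL

class LoggingProxy(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        try:
            print(json.dumps(json.loads(body), indent=2)[:2000])  # peek at the prompt
        except ValueError:
            print(body[:2000])
        resp = requests.post(
            UPSTREAM + self.path, data=body,
            headers={k: v for k, v in self.headers.items()
                     if k.lower() not in ("host", "content-length", "accept-encoding")})
        self.send_response(resp.status_code)
        self.send_header("Content-Type", resp.headers.get("Content-Type", "application/json"))
        self.send_header("Content-Length", str(len(resp.content)))
        self.end_headers()
        self.wfile.write(resp.content)

if __name__ == "__main__":
    http.server.HTTPServer(("127.0.0.1", 8888), LoggingProxy).serve_forever()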
Show HN: Using World Models for Consistent AI Filmmaking
The article discusses the growing demand for world models, which are computer-generated environments used in filmmaking, and how they are becoming increasingly sophisticated and realistic. It explores the technical and creative challenges involved in creating these virtual worlds, which are essential for modern visual effects and storytelling in the film industry.
Show HN: One Human + One Agent = One Browser From Scratch in 20K LOC
Related: https://simonwillison.net/2026/Jan/27/one-human-one-agent-on...
Show HN: We Built the First EU-Sovereignty Audit for Websites
The article discusses an audit of the European Union's policies and institutions, highlighting the need for greater transparency, accountability, and efficiency in the EU's governance. It emphasizes the importance of addressing concerns about the EU's democratic legitimacy and decision-making processes.
Show HN: I built a small browser engine from scratch in C++
Hi HN! Korean high school senior here, about to start CS in college.
I built a browser engine from scratch in C++ to understand how browsers work. First time using C++, 8 weeks of development, lots of debugging—but it works!
Features:
- HTML parsing with error correction
- CSS cascade and inheritance
- Block/inline layout engine
- Async image loading + caching
- Link navigation + history
Hardest parts:
- String parsing (HTML, CSS)
- Rendering
- Image Caching & Layout Reflowing
What I learned (beyond code):
- Systematic debugging is crucial
- Ship with known bugs rather than chase perfection
- The Power of "Why?"
~3,000 lines of C++17/Qt6. Would love feedback on code architecture and C++ best practices!
GitHub: https://github.com/beginner-jhj/mini_browser
Show HN: Shelvy Books
Hey HN! I built a little side project I wanted to share.
Shelvy is a free, visual bookshelf app where you can organize books you're reading, want to read, or have finished. Sign in to save your own collection.
Not monetized, no ads, no tracking beyond basic auth. Just a fun weekend project that grew a bit.
Live: https://shelvybooks.com
Would love any feedback on the UX or feature ideas!