
waszabi about 8 hours ago

Show HN: Empty Enter Expander – Type less in the terminal with this tool

When you have a lot of aliases, it can be difficult to remember what the one you need is called, especially if you do not use it very often. You can also have files stored in a bin folder and look there to find the name. Another trick is to prepend your commands with a comma: then you type the comma and hit the Tab key to see only your own commands. There is an article about it somewhere on the Internet.

I needed something lightweight to always show me the available commands. Something to run with a few keystrokes. Something that stores commands in files and folder structures.

The idea was born while I was using Debian Linux with dwm (the dynamic window manager). The first version was implemented in Bash and could do three things: start an application, expand text from a template, and run a predefined automation on the selected application.

It was launched by a keyboard shortcut and opened the list of commands in a new terminal window. The commands were stored in nested folders and it was able to switch between the three modes (launcher, expander, automator). It also required only a few keystrokes to do the desired action.

For instance, I was in the terminal and hit Ctrl+P. It opened a new terminal and listed applications to launch. I hit Space to switch to the expander mode, then g to enter the Git folder and s for status. The result was that git status was inserted into the terminal I had been in before. This expander could be used in any application; it could insert an email template into the browser.

Then I migrated to macOS and really missed that tool. So I quickly wrote a zsh version that contains only the expander mode and supports only the terminal. It is activated by hitting Enter on an empty command line, and it then inserts the desired command right into the prompt. For example, when you hit Enter, g, and s, you get the git status command in the prompt and can then execute it with Enter. Of course, those commands and keys are defined by you. There are various lengthy commands that I use on a daily basis like this, and it saves a lot of typing.
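To make the storage model concrete, here is a minimal Python sketch of the folder-walk idea (purely illustrative; the real tool is a zsh script, and the file layout shown is an assumed example): commands live as files in nested folders, and each keystroke selects the entry whose name starts with that letter.

    from pathlib import Path

    # Assumed layout: commands/git/status contains the text "git status".
    # resolve("commands", "gs") walks g -> git, then s -> status.
    def resolve(root, keys):
        node = Path(root)
        for key in keys:
            matches = sorted(p for p in node.iterdir() if p.name.startswith(key))
            if not matches:
                return None          # no command bound to this key sequence
            node = matches[0]
        return node.read_text().strip() if node.is_file() else None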

The tool is called Empty Enter Expander. It is implemented for zsh as of now. Please check it out at https://github.com/waszabi/empty-enter-expander and let me know what you like or dislike about it.

32 13
Show HN: Formalizing Principia Mathematica using Lean
ndrwnaguib about 16 hours ago

Show HN: Formalizing Principia Mathematica using Lean

This project aims to formalize the first volume of Prof. Bertrand Russell’s Principia Mathematica using the Lean theorem prover. Throughout the formalization, I tried to rigorously follow Prof. Russell’s proofs, with few or no statements added from my side, and only where necessary for the formalization rather than for the logical argument. Should you notice any inaccuracy (even if it does not necessarily falsify the proof), please let me know, as I would like to proceed with the same spirit of rigour. Before starting this project, I had already found Prof. Elkind’s formalization of the Principia using Rocq (formerly Coq), which is a much more mature work than this one. However, I still thought it would be fun to do it using Lean 4.

https://ndrwnaguib.com/principia/

https://github.com/ndrwnaguib/principia

github.com
140 29
Show HN: Magnitude – open-source, AI-native test framework for web apps
anerli about 18 hours ago

Show HN: Magnitude – open-source, AI-native test framework for web apps

Hey HN, Anders and Tom here - we’ve been building an end-to-end testing framework powered by visual LLM agents to replace traditional web testing.

We know there's a lot of noise about different browser agents. If you've tried any of them, you know they're slow, expensive, and inconsistent. That's why we built an agent specifically for running test cases and optimized it just for that:

- Pure vision instead of the error-prone "set-of-marks" system (the colorful boxes you see in browser-use, for example)

- Use a tiny VLM (Moondream) instead of OpenAI/Anthropic computer use for dramatically faster and cheaper execution

- Use two agents: one for planning and adapting test cases and one for executing them quickly and consistently.

The idea is the planner builds up a general plan which the executor runs. We can save this plan and re-run it with only the executor for quick, cheap, and consistent runs. When something goes wrong, it can kick back out to the planner agent and re-adjust the test.
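Here is a sketch of that control flow in Python (illustrative only; planner, executor, and the step/plan types are stand-ins for Magnitude's actual agents):

    class StepFailed(Exception):
        pass

    def run_test(test_case, planner, executor, saved_plan=None):
        plan = saved_plan or planner.build_plan(test_case)
        for step in plan:
            try:
                executor.run(step)        # fast, cheap, consistent path
            except StepFailed:
                # kick back out to the planner and re-adjust the test
                new_plan = planner.replan(test_case, failed_step=step)
                return run_test(test_case, planner, executor, saved_plan=new_plan)
        return plan                        # save this for future executor-only runs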

It’s completely open source. Would love to have more people try it out and tell us how we can make it great.

Repo: https://github.com/magnitudedev/magnitude

github.com
154 38
darajava 1 day ago

Show HN: I used OpenAI's new image API for a personalized coloring book service

I've had an idea for a long time to generate a cute coloring book based on family photos, send it to a printing service, and then deliver it to people.

Last month, when OpenAI's Sora was released for public use, I (foolishly) thought I'd manually drag-and-drop each order’s photos into Sora's UI and copy the resulting images back into my system. This took way too much time (about an hour for each of the few books I made and tested with family and friends). It clearly wasn't possible to release this version because I’d be losing a huge amount of time on every order. So instead, I decided I'd finish off the project as best I could, put it "on ice," and wait for the API release.

The API is now released (quicker than I thought it'd be, too!) and I integrated it last night. I'd love your feedback on any and all aspects.

The market is mostly family-based, but from my testing of the physical book I've found that both adults and kids enjoy coloring them in (it's surprisingly cathartic and creative). If you would like to order one you can get 10% off by tapping the total price line item five times.

clevercoloringbook.com
229 122
jpiech about 16 hours ago

Show HN: GS-Calc – A modern spreadsheet with Python integration

Process large (e.g. 4GB+) data sets in a spreadsheet.

Load gigabyte-scale, 32-million-row files in seconds and use them without any crashes, using up to about 500GB of RAM.

Load/edit in-place/split/merge/clean CSV/text files with up to 32 million rows and 1 million columns.

Use your Python functions as UDF formulas that can return images and entire CSV files to GS-Calc.

Use a set of statistical pivot data functions.

Solver functions with virtually no limit on the number of variables.

Create and display all popular chart types with millions of data points instantly.

Suggestions for improvements are welcome (and often implemented quite quickly).

citadel5.com
85 14
lcolucci 1 day ago

Show HN: Lemon Slice Live – Have a video call with a transformer model

Hey HN, this is Lina, Andrew, and Sidney from Lemon Slice. We’ve trained a custom diffusion transformer (DiT) model that achieves video streaming at 25fps and wrapped it into a demo that allows anyone to turn a photo into a real-time, talking avatar. Here’s an example conversation from co-founder Andrew: https://www.youtube.com/watch?v=CeYp5xQMFZY. Try it for yourself at: https://lemonslice.com/live.

(Btw, we used to be called Infinity AI and did a Show HN under that name last year: https://news.ycombinator.com/item?id=41467704.)

Unlike existing avatar video chat platforms like HeyGen, Tolan, or Apple Memoji filters, we do not require training custom models, rigging a character ahead of time, or having a human drive the avatar. Our tech allows users to create and immediately video-call a custom character by uploading a single image. The character image can be any style - from photorealistic to cartoons, paintings, and more.

To achieve this demo, we had to do the following (among other things! but these were the hardest):

1. Training a fast DiT model. To make our video generation fast, we had to both design a model that made the right trade-offs between speed and quality, and use standard distillation approaches. We first trained a custom video diffusion transformer (DiT) from scratch that achieves excellent lip and facial expression sync to audio. To further optimize the model for speed, we applied teacher-student distillation. The distilled model achieves 25fps video generation at 256-px resolution. Purpose-built transformer ASICs will eventually allow us to stream our video model at 4k resolution.

2. Solving the infinite video problem. Most video DiT models (Sora, Runway, Kling) generate 5-second chunks. They can iteratively extend a video by another 5 seconds by feeding the end of the first chunk into the start of the second in an autoregressive manner (a baseline sketch of this extension scheme follows this list). Unfortunately, the models experience quality degradation after multiple extensions due to the accumulation of generation errors. We developed a temporal consistency preservation technique that maintains visual coherence across long sequences. Our technique significantly reduces artifact accumulation and allows us to generate indefinitely-long videos.

3. A complex streaming architecture with minimal latency. Enabling an end-to-end avatar zoom call requires several building blocks, including voice transcription, LLM inference, and text-to-speech generation in addition to video generation. We use Deepgram as our AI voice partner, Modal as the end-to-end compute platform, and Daily.co and Pipecat to help build a parallel processing pipeline that orchestrates everything via continuously streaming chunks. Our system achieves end-to-end latency of 3-6 seconds from user input to avatar response. Our target is <2 second latency.
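For reference, the baseline autoregressive extension scheme from point 2 looks roughly like this in Python (a generic sketch, not Lemon Slice's consistency technique; generate_chunk stands in for any chunked video model, and a video is just a list of frames):

    def generate_long_video(generate_chunk, prompt, n_chunks, overlap=8):
        video = generate_chunk(prompt=prompt, context_frames=None)
        for _ in range(n_chunks - 1):
            context = video[-overlap:]     # tail frames of what exists so far
            # conditioning each chunk on the last one is what lets errors accumulate
            video += generate_chunk(prompt=prompt, context_frames=context)
        return video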

More technical details here: https://lemonslice.com/live/technical-report.

Current limitations that we want to solve include: (1) enabling whole-body and background motions (we’re training a next-gen model for this), (2) reducing delays and improving resolution (purpose-built ASICs will help), (3) training a model on dyadic conversations so that avatars learn to listen naturally, and (4) allowing the character to “see you” and respond to what they see to create a more natural and engaging conversation.

We believe that generative video will usher in a new media type centered around interactivity: TV shows, movies, ads, and online courses will stop and talk to us. Our entertainment will be a mixture of passive and active experiences depending on what we’re in the mood for. Well, prediction is hard, especially about the future, but that’s how we see it anyway!

We’d love for you to try out the demo and let us know what you think! Post your characters and/or conversation recordings below.

192 81
Show HN: Colanode, open-source and local-first Slack and Notion alternative
hakanshehu 2 days ago

Show HN: Colanode, open-source and local-first Slack and Notion alternative

Hey HN,

I'm Hakan, the founder of Colanode (https://github.com/colanode/colanode), an open-source, local-first collaboration app combining the best of Slack-style chats and Notion-style note-taking, fully self-hostable for complete data control. Here's a quick demo: https://www.youtube.com/watch?v=wp1hoSCEArg

As a heavy Notion user, I often found it tough to get my teams fully onboard since people naturally gravitate toward chat for quick interactions. Maintaining context between chat apps like Slack and documentation apps like Notion became increasingly frustrating. Switching contexts, losing track of information, and managing data across multiple tools created unnecessary friction.

This frustration led me to build Colanode, a single platform integrating structured notes and knowledge management with real-time chat. After building the first version, early feedback highlighted a critical issue: teams/organizations want full control over sensitive data, especially conversations. That's why I decided to open-source Colanode under an Apache 2.0 license, making it fully self-hostable so you can retain complete ownership and privacy over your data.

Colanode is built with simplicity and extensibility in mind, using only open-source tools and avoiding any vendor or cloud lock-in. It features a local-first architecture offering complete offline support.

From a technical perspective, Colanode consists of a Node.js server API and an Electron desktop client, with mobile apps coming soon. Everything in Colanode is represented as a node (e.g., message, file, folder, chat, channel, database, record), each with specific attributes and permissions. All reads and writes performed by the desktop client happen locally within a SQLite database, and changes sync seamlessly via a synchronization engine built on top of SQLite, Postgres, and Yjs—a CRDT library for conflict resolution. The server then propagates these changes to other collaborators.

You can self-host the server in any environment using Docker, Postgres, Redis, and any S3-compatible storage, and connect using the official desktop client, which supports simultaneous connections to multiple servers and accounts. This local-first approach also prepares us for future integrations with fully local LLMs, further enhancing privacy and performance.
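The local-first write path can be pictured with a small Python/SQLite sketch (the schema here is invented for illustration; Colanode's real tables and sync protocol differ): every edit lands in the local database together with a change-log row that the synchronization engine later pushes to the server.

    import sqlite3

    conn = sqlite3.connect("workspace.db")
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS nodes   (id TEXT PRIMARY KEY, kind TEXT, attrs TEXT);
    CREATE TABLE IF NOT EXISTS changes (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                                        node_id TEXT, op TEXT, synced INTEGER DEFAULT 0);
    """)

    def write_node(node_id, kind, attrs):
        with conn:  # one transaction covers the write and its change record
            conn.execute("INSERT OR REPLACE INTO nodes VALUES (?, ?, ?)",
                         (node_id, kind, attrs))
            conn.execute("INSERT INTO changes (node_id, op) VALUES (?, 'upsert')",
                         (node_id,))

    def unsynced_changes():
        return conn.execute(
            "SELECT seq, node_id, op FROM changes WHERE synced = 0").fetchall()

    write_node("msg-1", "message", '{"text": "hello"}')
    print(unsynced_changes())   # rows waiting for the sync engine to push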

I'd love your feedback and suggestions on Colanode. What features would you like to see? What would you change?

Thanks, looking forward to your thoughts!

github.com
136 45
Show HN: BugStalker - a modern Rust debugger
godzie about 21 hours ago

Show HN: BugStalker - a modern Rust debugger

BugStalker is a modern, open-source debugger for Rust, written in Rust. It supports the usual debugging workflow (breakpoints, stepping, and inspecting variables) with first-class understanding of Rust data types.

github.com
110 15
Show HN: Faasta – A self-hosted Serverless platform for WASM-wasi-HTTP in Rust
alexboehm 1 day ago

Show HN: Faasta – A self-hosted Serverless platform for WASM-wasi-HTTP in Rust

I've just released an early version of my project. I've been working on it for a few months now and would love some feedback.

https://github.com/fourlexboehm/faasta

I was surprised there isn't yet an open-source, standards-compliant way to host wasi-http functions that takes advantage of WASM in a multi-tenanted application.

If you're not familiar with WASI: compared to something like AWS Lambda, this approach is much more efficient, as a single process can serve thousands of function invocations concurrently and asynchronously instead of requiring an entire VM per function.

This is still early days for the project, but feel free to install the CLI utility with cargo install cargo-faasta.

Feel free to test deploying functions on my hosted instance at https://website.faasta.xyz.

The service is free to use and currently supports deployments via GitHub OAuth, with a limit of 10 functions per GitHub account.

github.com
86 30
slig 3 days ago

Show HN: Logiquiz – Daily Self-Referential Puzzles

Hey HN,

About twenty years ago, while I was in college, I first stumbled upon James Propp's Self-Referential Aptitude Test [1] and absolutely loved it. Ever since then, I've had this idea of turning that concept into a daily game, and I've finally built it: https://www.logiquiz.com/

The game interface checks each question against the answers you've given, so it doesn't spoil anything by giving away the answers.
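The self-referential flavor is easy to show with a tiny Python brute-forcer (a toy in the spirit of the SRAT, not one of Logiquiz's actual puzzles): each question is a predicate over the whole answer key, so a key is valid only when every predicate holds at once.

    from itertools import product

    # Q1: "Exactly one answer is A."      (answering A asserts the claim)
    # Q2: "The answer to Q1 is A."
    # Q3: "Q2 and Q3 have different answers."
    def consistent(a):
        q1 = (a[0] == 'A') == (a.count('A') == 1)
        q2 = (a[1] == 'A') == (a[0] == 'A')
        q3 = (a[2] == 'A') == (a[1] != a[2])
        return q1 and q2 and q3

    for key in product('AB', repeat=3):
        if consistent(key):
            print(''.join(key))   # prints the single consistent key: BBB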

There are five different tests each day, from very easy to very hard.

I'd love to hear what you think!

[1]: https://faculty.uml.edu/jpropp/srat-Q.txt

logiquiz.com
34 17
piotmni 2 days ago

Show HN: I built Lovable for text bots and mini apps

Hi HN,

Over the last few weeks, I've been working on a system that converts prompts into chatbots and mini apps on platforms everyone uses on a daily basis.

The first integrated platform is Telegram:

Telegram is a powerful platform with many integrations and features like bots, apps, games and even payments. So I thought it would be nice to make it easier to create these apps. I created a bot http://t.me/PlutonicDevBot.

The workflow is pretty simple: create or choose an existing bot and send a text/voice message describing what to build, just like prompting anywhere else. For more instructions, use the /help and /howto commands.

Planning to create the same solution for Slack and Discord.

Thanks for taking a look. I would love to hear feedback.

https://plutonic.dev

x.com/PlutonicDev

[1] https://core.telegram.org/bots

[2] https://core.telegram.org/bots/webapps

plutonic.dev
41 16
Show HN: Infat – Declarative application association manager for macOS
philocalyst 1 day ago

Show HN: Infat – Declarative application association manager for macOS

Bello! Made this to help with the tumultuous process of moving to a new Mac when you have a number of custom utilities set up for editing particular files. This is designed to make that as easy as possible, and to add some magic on top of that, like setting mailto handlers or anything else of that breed. Use XDG_CONFIG_HOME to keep it organized.

Credit to https://github.com/moretension/duti for the original inspiration for the project.

Happy to answer and help with whatever.

github.com
88 40
Show HN: VacayBuddy – PTO Management Inside Slack (Open Source)
letnaturebe about 9 hours ago

Show HN: VacayBuddy – PTO Management Inside Slack (Open Source)

I built VacayBuddy, an open-source Slack bot that makes managing Paid Time Off (PTO) simple and seamless — all from within Slack.

Features:

• Request and track PTO directly from the Slack app home

• Get notified and approve PTO via Slack modals

• Admin dashboard to manage team PTO balances and history

• Excel integration for migrating existing PTO data

• Works out of the box with SQLite for local setup

We use it internally to manage our team’s PTO without emails or spreadsheets. Would love your feedback or contributions!

github.com
4 0
Show HN: Zev – Remember (or discover) terminal commands
dtnewman 2 days ago

Show HN: Zev – Remember (or discover) terminal commands

Zev is an open-source CLI tool that helps you remember (or discover) terminal commands: describe what you want to do in natural language, and it suggests the matching command.

github.com
80 35
Show HN: I made my own TRMNL e-ink device
stavros 2 days ago

Show HN: I made my own TRMNL e-ink device

The article describes the process of building a custom TRMNL-compatible device: an e-ink display that periodically fetches and shows dashboard screens. It covers the hardware components, software configuration, and the overall design and functionality of the device.

stavros.io
76 23
Xiione about 22 hours ago

Show HN: An interactive demo of QR codes' error correction

Hi HN! This is a hobby project of mine that recently landed me my first interview and helped me get my first internship offers.

Draw on a QR code, and the health bars will accurately display how close the QR code is to being unscannable. How few errors does it take to break a QR code? How many errors can a QR code handle? Counters at the bottom track your record minimum and maximum damage. (Can you figure out how to break a QR code with 0.0% damage to the actual data region?)

Also, click on the magnifying glass button to toggle between "draw mode" and "inspect mode". I encourage you to use your phone's camera to scan the code as you draw and undo/redo to verify that the code really does break when the app says it does.

I wrote the underlying decoder in C++, and it's compiled to WebAssembly for the website.
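The health-bar math comes down to Reed-Solomon capacity, which a few lines of Python can illustrate (version 1 numbers from the QR spec; note that small versions reserve a few EC codewords for misdecode protection, so real capacity is slightly lower than the floor(ec/2) rule of thumb):

    # QR version 1 has 26 codewords total; the data/EC split varies by level.
    ec_codewords = {"L": 7, "M": 10, "Q": 13, "H": 17}
    for level, ec in ec_codewords.items():
        print(f"1-{level}: {ec} EC codewords -> up to {ec // 2} "
              f"corrupted codewords correctable")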

I hope you find it interesting.

qris.cool
14 2
Show HN: I built an AI that turns GitHub codebases into easy tutorials
zh2408 7 days ago

Show HN: I built an AI that turns GitHub codebases into easy tutorials

https://the-pocket.github.io/Tutorial-Codebase-Knowledge/

github.com
892 170
mratmeyer about 15 hours ago

Show HN: RSS Lookup – Find RSS Feeds for Any URL (Free, Open Source)

Hi everyone,

I built a tool a few years ago called RSS Lookup (https://www.rsslookup.com/) and recently made a few UI and feed-detection updates to it. It's free: paste any website URL and it automatically searches for feeds via meta tags, fallback paths, and hardcoded feeds for some popular sites.
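The meta-tag step is standard RSS autodiscovery, which looks something like this in Python (a simplified sketch, not RSS Lookup's actual code; real pages need proper HTML parsing and relative-URL resolution):

    import re
    import urllib.request

    def find_feeds(url):
        html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
        feeds = []
        for tag in re.findall(r"<link[^>]+>", html, re.I):
            # feeds advertise themselves via <link rel="alternate" type=...>
            if re.search(r"application/(rss|atom)\+xml", tag, re.I):
                href = re.search(r'href="([^"]+)"', tag)
                if href:
                    feeds.append(href.group(1))  # may be relative to the page
        return feeds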

It's also open source (https://github.com/mratmeyer/rsslookup), has no ads, and doesn't track any URLs (it uses Cloudflare Turnstile for abuse prevention and Plausible for analytics).

Max

rsslookup.com
6 3
tomsaju about 22 hours ago

Show HN: SnipFast – Extract Highlighted Text from Physical Books

Hi HN,

I built SnipFast - a tool that helps you extract important text from an image of a page. If you’ve ever highlighted sections in a physical book and wanted to save or organize them digitally, SnipFast can help.

This came from my own frustration. I like to note down ideas when reading, but doing that from physical books was time-consuming. I looked for a tool to solve this, didn’t find one, so I built it.

SnipFast lets you:

- Automatically detect and extract highlighted text from a page photo (works with most highlighter pens and languages)

- Click on sentences in the image to manually pick exactly what you want to copy

It’s aimed at readers, students, researchers - basically anyone who annotates physical books and wants to keep those notes digitally.

Under the hood, it uses a custom ML model trained on highlight detection. The app runs on a Kotlin backend with a Postgres database. You can try it for free. Signup is required, but it’s minimal. I offer some credits upfront so people can test it out. After that, there’s a small payment required. The goal is mainly to prevent abuse and to validate whether this is a tool people find valuable enough to pay for.

The UI still needs work, and I’m mainly looking for feedback at this stage. I’d love to hear: does this solve a real problem for you? Was anything confusing? What would make it more useful?

link: https://snipfa.st

Thanks, Tom

snipfa.st
9 0
ozdemirdev about 12 hours ago

Show HN: Photo.codes – Free, privacy-first photo editor for the web

Hi! I built a free, privacy-first photo editor that runs entirely in your browser. No downloads, no accounts needed. It supports non-destructive editing and lets you create, save, and share custom presets easily. Covers most essential tools like exposure, contrast, color grading, and more. Would love to hear your feedback!

photo.codes
2 2
Show HN: Rowboat – Open-source IDE for multi-agent systems
segmenta 4 days ago

Show HN: Rowboat – Open-source IDE for multi-agent systems

Hi HN! We’re Arjun, Ramnique, and Akhilesh, and we are building Rowboat (https://www.rowboatlabs.com/), an AI-assisted IDE for building and managing multi-agent systems. You start with a single agent, then scale up to teams of agents that work together, use MCP tools, and improve over time - all through a chat-based copilot.

Our repo is https://github.com/rowboatlabs/rowboat, docs are at https://docs.rowboatlabs.com/, and there’s a demo video here: https://youtu.be/YRTCw9UHRbU

It’s becoming clear that real-world agentic systems work best when multiple agents collaborate, rather than having one agent attempt to do everything. This isn’t too surprising - it’s a bit like how good code consists of multiple functions that each do one thing, rather than cramming everything into one function.

For example, a travel assistant works best when different agents handle specialized tasks: one agent finds the best flights, another optimizes hotel selections, and a third organizes the itinerary. This modular approach makes the system easier to manage, debug, and improve over time.

OpenAI’s Agents SDK provides a neat Python library to support this, but building reliable agentic systems requires constant iterations and tweaking - e.g. updating agent instructions (which can quickly get as complex as actual code), connecting tools, and testing the system and incorporating feedback. Rowboat is an AI IDE to do all this. Rowboat is to AI agents what Cursor is to code.

We’ve taken a code-like approach to agent instructions (prompts). There are special keywords to directly reference other agents, tools or prompts - which are highlighted in the UI. The copilot is the best way to create and edit these instructions - each change comes with a code-style diff.

You can give agents access to tools by integrating any MCP server or connecting your own functions through a webhook. You can instruct the agents on when to use specific tools via ‘@mentions’ in the agent instruction. To enable quick testing, we added a way to mock tool responses using LLM calls.

Rowboat playground lets you test and debug the assistants as you build them. You can see agent transfers, tool invocations and tool responses in real-time. The copilot has the context of the chat, and can improve the agent instructions based on feedback. For example, you could say ‘The agent shouldn’t have done x here. Fix this’ and the copilot can go and make this fix.

You can integrate agentic systems built in Rowboat into your application via the HTTP API or the Python SDK (‘pip install rowboat’). For example, you can build user-facing chatbots, enterprise workflows and employee assistants using Rowboat.
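For a feel of the integration surface, a call might look like this (purely illustrative; the endpoint path, auth header, and payload shape here are assumptions rather than Rowboat's documented API, so check https://docs.rowboatlabs.com/ for the real thing):

    import requests

    resp = requests.post(
        "http://localhost:3000/api/chat",   # assumed path on a self-hosted instance
        headers={"Authorization": "Bearer <token>"},
        json={"messages": [{"role": "user", "content": "Plan a trip to Oslo"}]},
    )
    print(resp.json())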

We’ve been working with LLMs since GPT-1 launched in 2018. Most recently, we built Coinbase’s support chatbot after our last AI startup was acquired by them.

Rowboat is Apache 2.0 licensed, giving you full freedom to self-host, modify, or extend it however you like.

We’re excited to share Rowboat with everyone here. We’d love to hear your thoughts!

github.com
158 51
lorenzopalaia about 20 hours ago

Show HN: StackHound - Stop guessing repo's tech stack, analyze it in seconds

If you’ve ever explored GitHub repos and felt frustrated by how little you can tell about a project’s real tech stack — you’re not alone.

That’s exactly why I built StackHound: https://www.producthunt.com/posts/stackhound

It goes beyond the GitHub API to scan dependency files and uncover the actual tools, frameworks, and languages a repo uses — whether it's built with React, Next.js, Tailwind, Flask, or Spring Boot.

Just drop in a GitHub username and repo to analyze it instantly. You can also use our /api/analyze endpoint to plug it into your own tools.
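For example, a request might look like this in Python (the query parameters are assumptions for illustration; only the /api/analyze path comes from the post above):

    import requests

    resp = requests.get(
        "https://stackhound.vercel.app/api/analyze",
        params={"user": "lorenzopalaia", "repo": "stackhound"},  # assumed params
    )
    print(resp.json())  # detected languages, frameworks, and tools for the repo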

Try the live demo: https://stackhound.vercel.app/

Check out the code (open source!): https://github.com/lorenzopalaia/stackhound

Would love to hear your thoughts — what features would make StackHound even more useful for you?

stackhound.vercel.app
6 2
Show HN: My from-scratch OS kernel that runs DOOM
UnmappedStack 2 days ago

Show HN: My from-scratch OS kernel that runs DOOM

Hi there! I've been on-and-off working on TacOS for a few months. It follows some UNIX-derived concepts (exec/fork, a Unix-style VFS, etc.) and is now able to run a port of Doom with a fairly small number of modifications, using my from-scratch libc. The performance is actually decent compared to what I expected. Very interested to hear your thoughts. Thank you!

github.com
313 81
Show HN: Index – New Open Source browser agent
skull8888888 3 days ago

Show HN: Index – New Open Source browser agent

Hey HN, Robert from Laminar (lmnr.ai) here.

We built Index, a new SOTA open-source browser agent.

It reached 92% on WebVoyager with Claude 3.7 (extended thinking). o1 was used as a judge, and we also manually double-checked the judge.

At the core is the same old idea: run a simple JS script in the browser to identify interactable elements -> draw bounding boxes around them on a screenshot of the browser window -> feed it to the LLM.

What made Index so good:

1. We essentially created browser agent observability. We patched Playwright to record the entire browser session while the agent operates, simultaneously tracing all agent steps and LLM calls. Then we synchronized everything in the UI, creating an unparalleled debugging experience. This allowed us to pinpoint exactly where the agent fails by seeing what it "sees" in session replay alongside execution traces.

2. Our detection script is simple but extremely good. It's carefully crafted via trial and error. We also employed CV and OCR.

3. The agent is very simple, literally just a while loop. All the power comes from a carefully crafted prompt and a ton of eval runs.
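That loop shape, sketched in Python (the four callables are stand-ins for the real Playwright, detection-script, and LLM plumbing, which is where the actual work lives):

    def run_agent(task, screenshot, annotate, ask_llm, execute, max_steps=50):
        for _ in range(max_steps):
            img = screenshot()                  # current browser viewport
            boxes = annotate(img)               # JS + CV/OCR: interactable elements
            action = ask_llm(task, img, boxes)  # pick the next action
            if action["kind"] == "done":
                return action["result"]
            execute(action)                     # click / type / scroll / navigate
        raise TimeoutError("agent did not finish within max_steps")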

Index is a simple Python package. It also comes with a beautiful CLI.

pip install lmnr-index

playwright install chromium

index run

We've recently added o4-mini, Gemini 2.5 Pro and Flash. Pro is extremely good and fast. Give it a try via CLI.

You can also use index via serverless API. (https://docs.lmnr.ai/index-agent/api/getting-started)

Or via chat UI - https://lmnr.ai/chat.

To learn more about browser agent observability and evals check out open-source repo (https://github.com/lmnr-ai/lmnr) and our docs (https://docs.lmnr.ai/tracing/browser-agent-observability).

github.com
96 43
Show HN: Morphik – Open-source RAG that understands PDF images, runs locally
Adityav369 4 days ago

Show HN: Morphik – Open-source RAG that understands PDF images, runs locally

Hey HN, we’re Adi and Arnav. A few months ago, we hit a wall trying to get LLMs to answer questions over research papers and instruction manuals. Everything worked fine, until the answer lived inside an image or diagram embedded in the PDF. Even GPT‑4o flubbed it (we recently tried O3 with the same, and surprisingly it flubbed it too). Naive RAG pipelines just pulled in some text chunks and ignored the rest.

We took an invention disclosure PDF (https://drive.google.com/file/d/1ySzQgbNZkC5dPLtE3pnnVL2rW_9...) containing an IRR‑vs‑frequency graph and asked GPT “From the graph, at what frequency is the IRR maximized?”. We originally tried this on gpt-4o, but while writing this used the new natively multimodal model o4‑mini‑high. After a 30‑second thinking pause, it asked for clarifications, then churned out buggy code, pulled data from the wrong page, and still couldn’t answer the question. We wrote up the full story with screenshots here: https://docs.morphik.ai/blogs/gpt-vs-morphik-multimodal.

We got frustrated enough to try fixing it ourselves.

We built Morphik to do multimodal retrieval over documents like PDFs, where images and diagrams matter as much as the text.

To do this, we use ColPali-style embeddings, which treat each document page as an image and generate multi-vector representations. These embeddings capture layout, typography, and visual context, allowing retrieval to get a whole table or schematic, not just nearby tokens. Combined with vector search, this lets us retrieve the exact pages with the relevant diagrams and pass them as images to the LLM to get relevant answers. It's able to answer the question with an 8B Llama 3.1 vision model running locally!
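The core scoring rule behind multi-vector ("late interaction") retrieval is easy to state in numpy (a generic sketch of the MaxSim rule used by ColBERT/ColPali-style systems, not Morphik's production code; real deployments add ANN indexes and batching):

    import numpy as np

    def maxsim(query_vecs, page_vecs):
        # query_vecs: (q, d), page_vecs: (p, d), both L2-normalized
        sims = query_vecs @ page_vecs.T    # (q, p) cosine similarities
        return sims.max(axis=1).sum()      # best-matching patch per query token

    def rank_pages(query_vecs, pages):
        scores = [maxsim(query_vecs, p) for p in pages]
        return sorted(range(len(pages)), key=scores.__getitem__, reverse=True)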

Early pharma testers hit our system with queries like "Which EGFR inhibitors at 50 mg showed ≥ 30% tumor reduction?" We correctly returned the right tables and plots, but still hit a bottleneck: we weren't able to join the dots across multiple reports. So we built a knowledge graph: we tag entities in both text and images, normalize synonyms (Erlotinib → EGFR inhibitor), infer relations (e.g. administered_at, yields_reduction), and stitch everything into a graph. Now a single query could traverse that graph across documents and surface a coherent, cross-document answer along with the correct pages as images.

To illustrate that, and just for fun, we built a graph of 100 of Paul Graham's essays here: https://pggraph.streamlit.app/ You can search for various nodes (e.g. startup, Sam Altman, Paul Graham) and see the corresponding connections. In our system, we create graphs and store the relevant text chunks along with the entities, so on querying, we can extract the relevant entity, do a search on the graph, and pull in the text chunks of all connected nodes, improving cross-document queries.
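That lookup amounts to a one-hop graph expansion, roughly like this in Python (an illustrative sketch with invented data; Morphik's graph store, relation types, and entity normalization are richer than a dict):

    def chunks_for_query(entity, edges, chunks_by_entity):
        # edges: entity -> set of related entities (synonyms already normalized)
        related = {entity} | edges.get(entity, set())
        out = []
        for e in related:
            out.extend(chunks_by_entity.get(e, []))
        return out

    edges = {"EGFR inhibitor": {"Erlotinib", "Gefitinib"}}
    chunks_by_entity = {
        "EGFR inhibitor": ["...class overview..."],
        "Erlotinib": ["...50 mg dose, tumor reduction table..."],
    }
    print(chunks_for_query("EGFR inhibitor", edges, chunks_by_entity))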

For longer or multi-turn queries, we added persistent KV caching, which stores intermediate key-value states from transformer attention layers. Instead of recomputing attention from scratch every time, we reuse prior layers, speeding up repeated queries and letting us handle much longer context windows.
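KV caching itself is the standard transformer trick, shown here with numpy for a single attention head (a generic sketch of the mechanism the paragraph refers to, not Morphik's implementation):

    import numpy as np

    def attend(q, K, V):
        # one query vector against all cached keys/values
        scores = K @ q / np.sqrt(q.shape[0])
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ V

    d = 64
    K_cache = np.empty((0, d))
    V_cache = np.empty((0, d))
    for step in range(5):
        k_new, v_new, q = np.random.randn(3, d)  # stand-ins for the projections
        # append this step's key/value instead of recomputing the whole prefix
        K_cache = np.vstack([K_cache, k_new])
        V_cache = np.vstack([V_cache, v_new])
        out = attend(q, K_cache, V_cache)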

We’re open‑source under the MIT Expat license: https://github.com/morphik-org/morphik-core

Would love to hear your RAG horror stories, what worked, what didn’t and any feedback on Morphik. We’re here for it.

github.com
193 43
Show HN: AI Overview Hider for Google
insin about 13 hours ago

Show HN: AI Overview Hider for Google

I'm not going to lie to you, HN, this is just a stylesheet distributed as a browser extension, because that's the easiest way for me to both use it and keep it updated on all my browsers and devices.

But if you want to hide AI Overviews on desktop and mobile and have someone keeping an eye out for new variants (which is the perfect word for Google finding new ways to infect their search results with them) and hiding them in a timely manner, I'm your man.

Google seem to be continually adding new variants - in the last 12 hours I've pushed releases to hide AI Overviews now appearing as sections in the middle of search results, and as "People also ask" questions.

If you'd rather just grab the CSS, you can get it from the GitHub repo linked to from the website.

soitis.dev
3 0
anonymousd3vil 1 day ago

Show HN: I Added Translation to My RSS Reader Project

This post describes adding translation to the author's RSS reader project, letting users translate feed content from one language to another directly in the reader.

rahuldshetty.github.io
28 7
somebee 3 days ago

Show HN: Node.js video tutorials where you can edit and run the code

Hey HN,

I'm Sindre, CTO of Scrimba (YC S20). We originally launched Scrimba to make video learning more interactive for aspiring frontend developers. So instead of passively watching videos, you can jump in and experiment with the code directly inside the video player. Since launch, almost two million people have used Scrimba to grow their skills.

However, one limitation is that we've only supported frontend code, as our interactive videos run in the browser, whereas most of our learners want to go fullstack—building APIs, handling auth, working with databases, and so forth.

To fix this, we spent the last 6 months integrating StackBlitz WebContainers into Scrimba. This enables a full Node.js environment—including a terminal, shell, npm access, and a virtual file system—directly inside our video player. Everything runs in the browser.

Here is a 2-minute recorded demo: https://scrimba.com/s08dpq3nom

If you want to see more, feel free to enroll in any of the seven fullstack courses we've launched so far, on subjects like Node, Next, Express, SQL, Vite, and more. We've opened them up for Hacker News today so that you don't even need to create an account to watch the content:

https://scrimba.com/fullstack

Other notable highlights about our "IDE videos":

- Based on events (code edits, cursor moves, etc.) instead of pixels (sketched after this list)

- Roughly 100x smaller than traditional videos

- Recording is simple: just talk while you code

- Can be embedded in blogs, docs, or courses, like MDN does here: https://developer.mozilla.org/en-US/curriculum/core/css-fund...

- Entirely built in Imba, a language I created myself: https://news.ycombinator.com/item?id=28207662
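To picture the event-based format from the first bullet, here is a toy recording and replayer in Python (the event shapes are invented for illustration; Scrimba's real format is not public): a session is a list of timestamped edits the player re-applies, which is why it is orders of magnitude smaller than encoded pixels.

    recording = [
        {"t": 0.00, "kind": "cursor", "pos": 0},
        {"t": 1.25, "kind": "insert", "pos": 0, "text": "console.log('hi')"},
        {"t": 3.10, "kind": "delete", "pos": 12, "len": 4},
    ]

    def replay(events, doc=""):
        for e in events:
            if e["kind"] == "insert":
                doc = doc[:e["pos"]] + e["text"] + doc[e["pos"]:]
            elif e["kind"] == "delete":
                doc = doc[:e["pos"]] + doc[e["pos"] + e["len"]:]
        return doc

    print(replay(recording))   # -> "console.log()"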

We think this format could be useful for open-source maintainers and API-focused teams looking to create interactive docs or walkthroughs. Our videos are already embedded by MDN, LangChain, and Coursera.

If you maintain a library or SDK and want an interactive video about it, let us know—happy to record one for free that you can use however you like.

Would love to answer any questions or hear people's feedback!

255 84
Show HN: VSCode-remote-glibc-patch – Patch legacy Linux to use VSCode Remote
hsfzxjy about 21 hours ago

Show HN: VSCode-remote-glibc-patch – Patch legacy Linux to use VSCode Remote

This project provides pre-built artifacts to patch glibc on legacy Linux systems, enabling compatibility with the latest VSCode Remote extension.

github.com
3 1
Show HN: OpenWrt Configurator – Simple config management for OpenWrt devices
jasrusable about 21 hours ago

Show HN: OpenWrt Configurator – Simple config management for OpenWrt devices

I built a small CLI tool that makes it easy to provision configuration to multiple OpenWrt devices.

My goal is to later build a web UI around this and make an open-source UniFi Controller alternative for OpenWrt.

github.com
4 0