LLMs work best when the user defines their acceptance criteria first
dnw about 9 hours ago

The article discusses the limitations of large language models (LLMs) in writing correct code, highlighting their tendency to produce code with bugs or syntax errors. It emphasizes the importance of understanding the underlying principles and limitations of these models when using them for programming tasks.

blog.katanaquant.com
214 173
Summary
UUID package coming to Go standard library
soypat about 8 hours ago

The linked GitHub page concerns a UUID package being added to the Go standard library.

github.com
175 97
Summary
AI Error May Have Contributed to Girl's School Bombing in Iran
apolloartemis about 5 hours ago

A report that an AI error may have contributed to the bombing of a girls' school in Iran, raising concern over the safety and reliability of AI systems.

thisweekinworcester.com
38 14
Summary
Show HN: I open-sourced my Steam game, 100% written in Lua, engine is also open
delduca about 11 hours ago

Homebrew engine https://github.com/willtobyte/carimbo

github.com
18 10
Summary
Trump has privately shown serious interest in U.S. ground troops in Iran
johnbarron about 5 hours ago

The article reports that Trump has privately shown serious interest in deploying U.S. ground troops in Iran, according to current and former U.S. officials.

nbcnews.com
11 3
Summary
Wild crows in Sweden help clean up cigarette butts
jhncls about 12 hours ago

A study in Sweden has found that wild crows are capable of collecting and disposing of cigarette butts, demonstrating their potential to aid in environmental cleanup efforts. The crows were trained to deposit cigarette butts into a dispenser in exchange for a food reward, highlighting their ability to be used as natural cleanup crews.

samodobrevijesti.com
10 4
Summary
Grammarly is using our identities without permission
LordAtlas about 6 hours ago

The article reports claims that Grammarly is using people's identities without their permission.

theverge.com
7 1
Summary
Ships in Gulf declare themselves Chinese to dodge attack
KnuthIsGod about 5 hours ago

The article reports that ships in the Gulf are declaring themselves Chinese in an attempt to avoid being attacked.

ft.com
7 0
Summary
Show HN: Llama 3.2 3B and Keiro Research achieves 85% on SimpleQA
mannybruv about 2 hours ago

ran this over the weekend. stack was Llama 3.2 3B running locally + Keiro Research API for retrieval.

85.0% on 4,326 questions. where that lands:

ROMA (357B): 93.9%
OpenDeepSearch (671B): 88.3%
Sonar Pro: 85.8%
Llama 3.2 3B + Keiro: 85.0%

the systems ahead of us are running models 100-200x larger. that's why they're ahead. not better retrieval, not better prompting — just way more parameters.

the interesting part is how small the gap is despite that. 3 points behind a 671B model. 0.8 behind Sonar Pro. at some point you have to ask what you're actually buying with all that compute for this class of task.
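
The gaps quoted above can be rechecked from the posted scores. A quick sketch, assuming the nominal parameter counts in the model names (3B, 357B, 671B); Sonar Pro is left out of the size comparison because the post gives no size for it. The variable and function names are made up for illustration.

```python
# Scores (percent) and nominal sizes (billions of parameters) as quoted in the post.
scores = {
    "ROMA": 93.9,
    "OpenDeepSearch": 88.3,
    "Sonar Pro": 85.8,
    "Llama 3.2 3B + Keiro": 85.0,
}
sizes_b = {"ROMA": 357, "OpenDeepSearch": 671, "Llama 3.2 3B + Keiro": 3}
OURS = "Llama 3.2 3B + Keiro"


def gap(name: str) -> float:
    """Accuracy gap in percentage points between `name` and the 3B setup."""
    return round(scores[name] - scores[OURS], 1)


def size_ratio(name: str) -> int:
    """How many times larger `name` is than the 3B reader, on nominal sizes."""
    return round(sizes_b[name] / sizes_b[OURS])


for name in ("ROMA", "OpenDeepSearch", "Sonar Pro"):
    ratio = f"{size_ratio(name)}x larger" if name in sizes_b else "size not given"
    print(f"{name}: {gap(name)} points ahead, {ratio}")
```

On these nominal sizes the ratios come out to roughly 119x and 224x, so the "100-200x" in the post is a rounded figure.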

want to know how low the reader model can go before it starts mattering; in this setup it clearly wasn't the limiting factor. also want to know whether smaller models with web access will perform as well as (if not better than) larger models for a lot of non-coding tasks.

Full benchmark script + results --> https://github.com/h-a-r-s-h-s-r-a-h/benchmark

Keiro research -- https://www.keirolabs.cloud/docs/api-reference/research

keirolabs.cloud
6 1
Summary
Show HN: Context-compact – Summarize agent context instead of truncating it
EmptyDrum about 8 hours ago

agents. Not a custom truncation strategy, not a sliding window, not dropping old messages and hoping for the best.

The failure mode is well-understood: your context window fills up, you truncate from the top, and the agent loses the thread. It forgets the task it was working on, the file path it just wrote to, the UUID it needs to reference. The conversation breaks.

The problem is everyone keeps solving it by throwing away information instead. Truncation is fast to implement and quietly wrong. The agent appears to work until it doesn't, and debugging context loss in a long-running session is painful.

context-compact summarizes old messages via your LLM of choice and replaces them with a compact summary. Fires automatically at 85% context utilization. Preserves UUIDs, file paths, and URLs verbatim so identifiers survive compaction. Handles histories longer than the summarization model's own context window by chunking sequentially with a running summary carried forward. Works with Anthropic, OpenAI, or any SDK. Zero dependencies.
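
The behaviour described above (compact at 85% utilization, carry a running summary across chunks, keep identifiers verbatim) could be sketched roughly as follows. This is not context-compact's actual API: every name here (`compact`, `summarize`, `count_tokens`, `IDENTIFIER_RE`) is hypothetical, the token count is a crude whitespace split, and `summarize` stands in for whatever LLM call you would plug in.

```python
import re
from typing import Callable, List

# Hypothetical pattern for identifiers that should survive compaction verbatim.
IDENTIFIER_RE = re.compile(
    r"https?://\S+"                                            # URLs
    r"|[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}"   # UUIDs
    r"|(?:\.{0,2}/)?(?:[\w.-]+/)+[\w.-]+"                      # paths with a slash
)


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def extract_identifiers(texts: List[str]) -> List[str]:
    """Collect URLs, UUIDs and file paths in first-seen order, without duplicates."""
    found = []
    for text in texts:
        for match in IDENTIFIER_RE.findall(text):
            found.append(match.rstrip(".,;:)"))  # drop trailing punctuation
    return list(dict.fromkeys(found))


def compact(
    messages: List[str],
    limit: int,
    summarize: Callable[[str, List[str]], str],
    threshold: float = 0.85,
    keep_recent: int = 2,
    chunk_size: int = 4,
) -> List[str]:
    """Replace old messages with one summary once usage reaches threshold * limit."""
    used = sum(count_tokens(m) for m in messages)
    if used < threshold * limit or len(messages) <= keep_recent:
        return list(messages)  # under the threshold: leave history untouched

    split = len(messages) - keep_recent
    old, recent = messages[:split], messages[split:]

    # Summarize chunk by chunk, feeding the running summary back in each time,
    # so no single call has to see the whole history.
    summary = ""
    for start in range(0, len(old), chunk_size):
        summary = summarize(summary, old[start:start + chunk_size])

    # Re-attach any identifier the summarizer dropped or paraphrased.
    missing = [i for i in extract_identifiers(old) if i not in summary]
    if missing:
        summary += "\nIdentifiers: " + " ".join(missing)
    return [summary] + list(recent)
```

Appending missing identifiers after the fact is one simple way to guarantee they survive; the real library may well do this differently.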

github.com
6 2
Summary
Show HN: I built a tool to manage work and personal Git repos
tomquirk about 4 hours ago

I finally got fed up with making work + personal Git repos work on my machine.

Ever clone a work repository and accidentally make commits using your personal email? Or forget how to make your work repos use your work SSH key? Yeah... same.

So I built the tool I've always wanted. GitPersona - a CLI tool for managing many git profiles on a single machine.

Shoutout to my boy Codex for helping me finally ship this.

github.com
6 1
Summary
AI and the Illegal War
interpol_p about 7 hours ago

This article explores the potential role of AI in illegal wars, highlighting concerns about its use in surveillance, targeting, and propaganda. It raises ethical questions about the implications of AI-powered weapons and the need for international regulation to prevent misuse.

buttondown.com
5 0
Summary
Show HN: key-carousel - Key rotation for LLM agents
EmptyDrum about 12 hours ago

I think in-process key management is the right abstraction for multi-key LLM setups. Not LiteLLM, not a Redis queue, not a custom load balancer.

The failure modes are well-understood: a key gets rate-limited, you wait, you try the next one. Billing errors need a longer cooldown than rate limits. This is not a distributed systems problem; it's a state machine that fits in a library.

The problem is everyone keeps solving it with infrastructure instead. Spin up LiteLLM, now you have a Python service to maintain. Reach for Redis, now you have a database for a problem that doesn't need one.

key-carousel manages a pool of API key profiles with exponential-backoff cooldowns: 1min → 5min → 25min → 1hr for rate limits, 5hr → 24hr for billing. Falls back to OpenAI or Gemini when Anthropic keys are exhausted. Optional file persistence. Zero dependencies.
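
To make the "state machine that fits in a library" idea concrete, here is a minimal sketch of a key pool with the cooldown schedule quoted above. It is not key-carousel's actual API: `KeyPool`, `KeyProfile`, and the method names are hypothetical, one failure counter is shared by both error kinds for brevity, and the clock is injectable only so the behaviour is easy to test.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional

# Cooldown schedules in seconds, taken from the post.
RATE_LIMIT_STEPS = [60, 300, 1500, 3600]   # 1min, 5min, 25min, 1hr
BILLING_STEPS = [5 * 3600, 24 * 3600]      # 5hr, 24hr


@dataclass
class KeyProfile:
    key: str
    failures: int = 0                      # consecutive failures so far
    available_at: float = float("-inf")    # time at which the key may be used again


class KeyPool:
    def __init__(self, keys: List[str], clock: Callable[[], float] = time.monotonic):
        self.profiles = [KeyProfile(k) for k in keys]
        self.clock = clock

    def acquire(self) -> Optional[KeyProfile]:
        """Return the first key that is not cooling down, or None if all are."""
        now = self.clock()
        for profile in self.profiles:
            if profile.available_at <= now:
                return profile
        return None  # caller can fall back to another provider here

    def report_failure(self, profile: KeyProfile, kind: str) -> None:
        """Put a key on cooldown; repeated failures climb the schedule."""
        steps = BILLING_STEPS if kind == "billing" else RATE_LIMIT_STEPS
        delay = steps[min(profile.failures, len(steps) - 1)]
        profile.failures += 1
        profile.available_at = self.clock() + delay

    def report_success(self, profile: KeyProfile) -> None:
        """A successful call resets the backoff for that key."""
        profile.failures = 0
```

A caller would `acquire()` a key, make the request, and call `report_failure(profile, "rate_limit")` or `report_failure(profile, "billing")` depending on the error; a `None` from `acquire()` is the point where a fallback provider would take over.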

github.com
5 1
Summary
Show HN: Agent Office – Slack for (OpenClaw Like) AI Agents
arbayi about 11 hours ago

Agent Office is presented as a Slack-like workspace for OpenClaw-style AI agents.

github.com
5 1
Summary
Show HN: Sheila, an AI agent that replaced our accounting flow
knewter about 11 hours ago

Sheila is presented as an AI agent that replaced the posters' accounting flow.

soapbox.pub
5 2
Summary
Armed robots take to the battlefield in Ukraine war
dabinat about 6 hours ago

The article reports on armed robots being deployed on the battlefield in the war in Ukraine.

bbc.com
4 0
Summary
ClaudeSmalltalk: An MCP implementation to interact with Smalltalk images
mpweiher about 4 hours ago

ClaudeSmalltalk is an MCP implementation for interacting with Smalltalk images.

github.com
4 0
Summary
Show HN: WebBridge turns any website into MCP tools by recording browser traffic
nonstopnonsense about 11 hours ago

I am a 40+-year-old-slightly-techie-middle-aged-man who occasionally writes code to make life easier. I was a developer once - a very long time ago. I am an Engineer by degree and my instinct is to go for a solution. I work in tech - but on the "_request_ developers what to build" side, not the "actually build it" side. With AI, I am now able to build more.

So I built WebBridge (yes - not so fancy name there). (Well - Claude built it. I directed. Like every PM does.)

What it actually does:

1. You install a Chrome extension
2. You browse to a site you're logged into - your library, your lab results portal, whatever
3. You click Record, do the thing you want to automate, click Stop
4. Claude reads the captured API traffic and generates a permanent MCP server
5. That server works with any MCP client - Claude (Cowork/Code), Cursor, VS Code, Windsurf, Cline, you name it
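
To give a feel for what step 4 involves, here is a rough sketch of turning recorded requests into tool descriptions. It is not WebBridge's code and does not use any MCP SDK: the input shape (dicts with `method` and `url`), the output fields, and the function names are all assumptions made for illustration. In WebBridge itself this step is done by Claude reading the captured traffic.

```python
import re
from typing import Dict, List
from urllib.parse import parse_qsl, urlsplit


def request_to_tool(method: str, url: str) -> dict:
    """Describe one recorded request as a tool: a name, an endpoint, and parameters."""
    parts = urlsplit(url)
    # Build a readable tool name from the HTTP method and the URL path.
    slug = re.sub(r"[^a-z0-9]+", "_", parts.path.lower()).strip("_") or "root"
    # Query-string keys become the tool's parameters (order kept, no duplicates).
    names = [name for name, _ in parse_qsl(parts.query, keep_blank_values=True)]
    return {
        "name": f"{method.lower()}_{slug}",
        "method": method.upper(),
        "endpoint": f"{parts.scheme}://{parts.netloc}{parts.path}",
        "parameters": list(dict.fromkeys(names)),
    }


def traffic_to_tools(captured: List[Dict[str, str]]) -> List[dict]:
    """Merge requests that hit the same endpoint into one tool per endpoint."""
    tools: Dict[str, dict] = {}
    for entry in captured:
        tool = request_to_tool(entry["method"], entry["url"])
        existing = tools.setdefault(tool["name"], tool)
        for param in tool["parameters"]:
            if param not in existing["parameters"]:
                existing["parameters"].append(param)
    return list(tools.values())
```

For example, two recorded searches against the same hypothetical `/api/search` endpoint with different query strings would collapse into a single `get_api_search` tool whose parameters are the union of the keys seen.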

The whole thing takes about 10 minutes. No code written by you.

This is for non-tech folks who live inside AI providers, who just want to use it and move on. Legal analysts, market researchers, market watchers, marketing and competitive intelligence, and anyone who wants to use a specific website for a specific purpose, repeatedly.

The README has some use cases showcased: "Public Library Search" and "Legal Compliance Auditing."

There may not be an exact equivalent anywhere to what I purpose-built. I'd welcome being proven wrong on that.

Feedback is welcome - that's why I'm posting.

github.com
4 2
Summary
Show HN: Stopping OpenClaw from breaking your mails
HalfEmptyDrum about 4 hours ago

This project puts a lightweight layer between your Gmail account and OpenClaw. When OpenClaw tries to send an e-mail, it can only create a draft. In your e-mail program you can then either send the draft yourself, or adjust it and hand it back to OpenClaw. This makes OpenClaw much more usable.

github.com
4 0
Summary
The Dutch Revolt Was Europe's First Bourgeois Revolution
PaulHoule about 10 hours ago

The article explores the Dutch Revolt against Spanish rule in the 16th century, examining how it was a bourgeois revolution driven by the rising Dutch merchant class seeking political and economic autonomy, as well as religious and cultural transformation during the Protestant Reformation.

jacobin.com
4 0
Summary