Show HN: Musoq – Query Anything with SQL Syntax (Git, C#, CSV, CAN DBC)
Puchaczov Wednesday, December 18, 2024
Hey,
For those of you who don't know Musoq: it's a small tool that lets you run SQL-like queries without any database.
It can query all sorts of things: niche ones like CAN DBC files, weird ones like C# code, interesting ones like Git repositories, and regular stuff like CSV, TSV and various others.
I experiment quite a bit, so I'm hybridizing the engine with LLMs and doing other weird things that are more or less practical :-)
I also wanted to share some recent developments in this little project, as I hope they might be interesting to some of you.
New Experimental Plugins:
- Git Plugin (Beta): I've been working on Git repository querying - managed to test it on the EF Core repo (16k commits) and it seems to work okay
- Roslyn Plugin (Beta): Added basic C# code analysis capabilities
For the very first time: I've extended CROSS APPLY to use computed results as arguments! Now the operator can use values from the current row as inputs. Here's an example:
SELECT
f.DirectoryName,
f.FileName
FROM #os.directories('/some/path', false) d
CROSS APPLY #os.files(d.FullName, true) f
WHERE d.Name IN ('Folder1', 'Folder2')
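For readers less familiar with APPLY, here is a rough sketch in Python (not Musoq code) of what "using values from the current row as inputs" means: the right-hand source is evaluated again for every left-hand row. The in-memory `DIRECTORIES` and `FILES` data and the `files_in` helper are hypothetical stand-ins for `#os.directories` and `#os.files`, and the sketch only models the semantics, not how the engine actually does it.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator

Row = dict


def cross_apply(
    left: Iterable[Row], right_for: Callable[[Row], Iterable[Row]]
) -> Iterator[tuple[Row, Row]]:
    """Pair each left row with every row the right-hand source yields for it."""
    for left_row in left:
        # The right-hand source is re-evaluated per left row, using that row's values.
        for right_row in right_for(left_row):
            yield left_row, right_row


# Hypothetical data standing in for what the OS data source would return.
DIRECTORIES = [
    {"Name": "Folder1", "FullName": "/some/path/Folder1"},
    {"Name": "Folder2", "FullName": "/some/path/Folder2"},
    {"Name": "Other", "FullName": "/some/path/Other"},
]
FILES = {
    "/some/path/Folder1": ["a.txt", "b.txt"],
    "/some/path/Folder2": [],
    "/some/path/Other": ["c.txt"],
}


def files_in(directory: Row) -> list[Row]:
    """Hypothetical stand-in for #os.files(d.FullName, ...)."""
    full_name = directory["FullName"]
    return [{"DirectoryName": full_name, "FileName": name} for name in FILES[full_name]]


# Same shape as the query above: apply, then filter on the left-hand row.
rows = [
    (f["DirectoryName"], f["FileName"])
    for d, f in cross_apply(DIRECTORIES, files_in)
    if d["Name"] in ("Folder1", "Folder2")
]
print(rows)
```

Note that Folder2 contributes no rows in this sketch because it has no files; with CROSS APPLY a left-hand row without matches simply disappears from the result.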
After another pack of fixes I'm finally able to query multiple git repositories AT ONCE!
with ProjectsToAnalyze as (
select
dir2.FullName as FullName
from #os.directories('D:\repos', false) dir1
cross apply #os.directories(dir1.FullName, false) dir2
where
dir2.Name = '.git'
)
select
c.Message,
c.Author,
c.CommittedWhen
from ProjectsToAnalyze p cross apply #git.repository(p.FullName) r
cross apply r.Commits c
where c.AuthorEmail = 'my-email@email.ok'
order by c.CommittedWhen desc
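To make the `ProjectsToAnalyze` part easier to follow, here is a small Python sketch (again, not Musoq code) of the same idea: look at the direct children of a root folder, then at the direct children of each of those, and keep the ones named `.git`. It assumes the `false` argument in the query means "not recursive", which the post doesn't spell out. The `find_git_dirs` helper and the temporary folder layout are made up for the illustration.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_git_dirs(root: Path) -> list[Path]:
    """Return <root>/<project>/.git directories, looking exactly two levels down."""
    found = []
    for project in sorted(root.iterdir()):  # plays the role of dir1
        if not project.is_dir():
            continue
        for child in sorted(project.iterdir()):  # plays the role of dir2
            if child.is_dir() and child.name == ".git":
                found.append(child)
    return found


# Throwaway directory layout, created only for this demonstration.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "repo_a" / ".git").mkdir(parents=True)
    (root / "repo_b" / ".git").mkdir(parents=True)
    (root / "not_a_repo" / "src").mkdir(parents=True)
    git_dirs = [p.relative_to(root).as_posix() for p in find_git_dirs(root)]

print(git_dirs)
```

Each path found this way corresponds to one row of `ProjectsToAnalyze`, which the query then hands to `#git.repository` one row at a time.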
Under the Hood:
- Added a Buckets feature for memory management (currently just testing it with the Roslyn plugin)
- Moved to .NET 8
- Added CROSS/OUTER APPLY operators
- Made some improvements to error messages and runtime behavior
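On the CROSS/OUTER APPLY item: the post doesn't describe how the two differ in Musoq, so the sketch below assumes the usual SQL meaning, where OUTER APPLY keeps a left-hand row even when the right-hand side returns nothing for it. It is plain Python with made-up data (`repos`, `commits_by_repo`), only meant to show the difference in row counts.

```python
def cross_apply(left, right_for):
    """Yield (left, right) pairs; left rows with no right rows are dropped."""
    for left_row in left:
        for right_row in right_for(left_row):
            yield left_row, right_row


def outer_apply(left, right_for):
    """Like cross_apply, but keep unmatched left rows, padded with None."""
    for left_row in left:
        matched = False
        for right_row in right_for(left_row):
            matched = True
            yield left_row, right_row
        if not matched:
            yield left_row, None


# Hypothetical data: one repository with commits, one without.
commits_by_repo = {"repo_a": ["init", "fix"], "repo_b": []}
repos = ["repo_a", "repo_b"]


def commits_of(repo):
    return commits_by_repo[repo]


cross_rows = list(cross_apply(repos, commits_of))
outer_rows = list(outer_apply(repos, commits_of))
print(cross_rows)
print(outer_rows)
```

Under that assumption, the empty repository shows up only in the OUTER APPLY result, paired with `None`.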
New piping features:
I've been experimenting with piping capabilities:
- Image Analysis with LLMs:
./Musoq.exe image encode "image.jpg" | ./Musoq.exe run query "select s.Shop, s.ProductName, s.Price from ..."
- Text Data Extraction:
Get-Content "ticket.txt" | ./Musoq.exe run query "select t.TicketNumber, t.CustomerName ... from #stdin.text('Ollama', 'llama3.1') t"
- Data Source Combination:
{ docker image ls; ./Musoq.exe separator; docker container ls } | ./Musoq.exe run query "..."
I'm working on comprehensive documentation: <https://puchaczov.github.io/Musoq/>
I especially encourage you to look at the sections "Practical Examples and Applications" and "Data Sources", where you can see all the tables the tool currently provides.
Other Changes:
- Made some improvements to OS and Archive data sources (OS can now query metadata like EXIF)
- Added a few fields to CAN DBC plugin
- Command outputs can now be used as inputs for queries
I'm hoping to:
- Improve stability and add more tests
- Flesh out the documentation
- Work on package distribution (Scoop, Ubuntu packages)
- Share some examples of source code querying with Roslyn
Ideas for later:
- More robust WHERE analysis and optimizations
- DISTINCT operator implementation
- PROTOBUF schema support
- Performance improvements
- Query parallelization
- Recursive CTEs
- Subqueries
I'd really appreciate any thoughts or feedback!
The documentation section where I wrote a short analysis of EF Core using the Git plugin: <https://puchaczov.github.io/Musoq/practical-examples-and-app...>