Show HN: Globstar – Open-source static analysis toolkit

51 points by sanketsaurav 5 hours ago

Hey HN! We’re Jai and Sanket, co-founders of DeepSource (YC W20). We're open-sourcing Globstar (https://github.com/DeepSourceCorp/globstar), a static analysis toolkit that lets you easily write and run custom code quality and security checkers in YAML [1] or Go [2].

After 5+ years of building AST-based static analyzers that process millions of lines of code daily at DeepSource, we kept hearing a common request from customers: "How do we write custom checks specific to our codebase?" AppSec and DevOps teams have accumulated a lot of anti-patterns and security rules they want to enforce across their orgs, and being able to do that without being a static analysis expert came up as an important need.

We initially built an internal framework using tree-sitter [3] for our proprietary infrastructure-as-code analyzers, which enabled us to rapidly create new checkers. We realized that making the framework open-source could solve this problem for everyone.

Our key insight was that writing checkers isn't the hard part anymore. Modern AI assistants like ChatGPT and Claude are excellent at generating tree-sitter queries with very high accuracy. We realized that tree-sitter's gnarly S-expression syntax isn’t a problem anymore (since the AI will be doing all the generation anyway), and we can instead focus on building a fast, flexible, and reliable checker runtime around it.

So instead of creating yet another DSL, we use tree-sitter's native query syntax. Yes, the expressions look more complex than simplified DSLs, but they give you direct access to your code's actual AST structure – which means your rules work exactly as you'd expect them to. When you need to debug a rule, you're working with the actual structure of your code, not an abstraction that might hide important details.

We've also designed Globstar to have a gradual learning curve: the YAML interface works well for simple checkers, and the Go interface can handle complex scenarios when you need features like cross-file analysis, scope resolution, data flow analysis, and context awareness. The Go API gives you direct access to tree-sitter bindings, so you can write arbitrarily complex checkers on day one.

Key features:

- Written in Go with native tree-sitter bindings, distributed as a single binary

- MIT-licensed

- Write all your checkers in a “.globstar” folder in your repo, in YAML or Go, and just run “globstar check” without any build steps

- Multi-language support through tree-sitter (20+ languages today)

We have a long way to go and a very exciting roadmap for Globstar, and we’d love to hear your feedback!

[1] https://globstar.dev/guides/writing-yaml-checker

[2] https://globstar.dev/guides/writing-go-checker

[3] https://tree-sitter.github.io/tree-sitter/

markrian 2 hours ago

Interesting! Do you have a page which compares globstar against other similar tools, like Semgrep, ast-grep, Comby, etc?

For instance, something like https://ast-grep.github.io/advanced/tool-comparison.html#com....

  • sanketsaurav an hour ago

    Not at the moment, but we'll put something up soon.

    We're focused on keeping Globstar lightweight, so a hosted runtime is not on the roadmap (although we'll add support for running Globstar checkers natively on our commercial product, DeepSource). You should be able to write any checkers in Globstar that you can write in the other tools you've listed.

    Our goal is to make it very easy to write these checkers — so we'd be optimizing the runtime and our Go API for that.

xxpor 4 hours ago

Another rule engine checker that doesn't support the language that needs this type of thing the most: C

In this case, it's inexplicable to me since tree-sitter supports C fine.

etyp 2 hours ago

I really love that static analyzers are pushing in this direction! I loved writing Clippy lints, and I think applying that "it's just code" approach to custom checks is a powerful idea. I worked on a static analysis product and the rules for that were horrible; I don't blame the customers for not really wanting to write them.

Is there a general way to apply/remove/act on taint in Go checkers? I may not be digging deeply enough but it seems like the example just uses some `unsafeVars` map that is made with a magic `isUserInputSource` method. It's hard for me to immediately tell what the capabilities there are, I bet I'm missing a bit.

  • sanketsaurav an hour ago

    Thanks! We still have a long way to go and a pretty extensive roadmap.

    > Is there a general way to apply/remove/act on taint in Go checkers? I may not be digging deeply enough but it seems like the example just uses some `unsafeVars` map that is made with a magic `isUserInputSource` method. It's hard for me to immediately tell what the capabilities there are, I bet I'm missing a bit.

    Assuming you're looking at the guide [1], the `isUserInputSource` is just a partial example and not a magic method (we probably should have used a better example there).

    The AST for each node along with the context are exposed in the `analysis.Pass` object [2]. We don't have an example for taint analysis, but here's an example [3] of state tracking that can be used to achieve this. This is a little tedious at the moment and you'll have to do the heavy lifting in the Go code, but improving this is on our roadmap. We want to expose a lot more helpers to make things like taint analysis easier.
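
    To make the state-tracking idea concrete, here is a minimal sketch of the "map of unsafe variables" approach. It is not Globstar code: it uses Go's standard `go/parser` and `go/ast` packages on Go source instead of tree-sitter and `analysis.Pass`, and the source and sink names (`readInput`, `runQuery`) are hypothetical, picked only for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// Hypothetical source and sink names, chosen only for this illustration.
var sources = map[string]bool{"readInput": true}
var sinks = map[string]bool{"runQuery": true}

// example is the code under analysis; it is parsed, never compiled.
const example = `package demo

func handler() {
	name := readInput()
	q := "SELECT * FROM users WHERE name = '" + name + "'"
	runQuery(q)
	safe := "SELECT 1"
	runQuery(safe)
	q = "SELECT 2"
	runQuery(q)
}
`

// calleeName returns the function name for plain calls of the form f(...).
func calleeName(call *ast.CallExpr) string {
	if id, ok := call.Fun.(*ast.Ident); ok {
		return id.Name
	}
	return ""
}

// isTainted reports whether expr is a source call, a tainted variable,
// or a binary/parenthesized expression containing one of those.
func isTainted(expr ast.Expr, tainted map[string]bool) bool {
	switch e := expr.(type) {
	case *ast.Ident:
		return tainted[e.Name]
	case *ast.CallExpr:
		return sources[calleeName(e)]
	case *ast.BinaryExpr:
		return isTainted(e.X, tainted) || isTainted(e.Y, tainted)
	case *ast.ParenExpr:
		return isTainted(e.X, tainted)
	}
	return false
}

// findTaintedSinks walks the AST in source order, tracks tainted variable
// names in a map, and returns the line of every sink call that receives a
// tainted argument. It ignores branches, scopes, and function boundaries.
func findTaintedSinks(src string) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	tainted := map[string]bool{}
	var lines []int
	ast.Inspect(file, func(n ast.Node) bool {
		switch node := n.(type) {
		case *ast.AssignStmt:
			if len(node.Lhs) != len(node.Rhs) {
				return true // skip multi-value calls in this sketch
			}
			// Evaluate every right-hand side before updating the map.
			vals := make([]bool, len(node.Rhs))
			for i, rhs := range node.Rhs {
				vals[i] = isTainted(rhs, tainted)
			}
			for i, lhs := range node.Lhs {
				if id, ok := lhs.(*ast.Ident); ok {
					tainted[id.Name] = vals[i]
				}
			}
		case *ast.CallExpr:
			if sinks[calleeName(node)] {
				for _, arg := range node.Args {
					if isTainted(arg, tainted) {
						lines = append(lines, fset.Position(node.Pos()).Line)
						break
					}
				}
			}
		}
		return true
	})
	return lines, nil
}

func main() {
	lines, err := findTaintedSinks(example)
	if err != nil {
		panic(err)
	}
	fmt.Println("tainted sink calls on lines:", lines)
}
```

    On the sample above, only the first `runQuery(q)` call is reported, because `q` is overwritten with a constant before the last one. A real checker would also need scopes, branches, and sanitizers, which is the tedious part mentioned above.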

    Here's another idea [4] we're exploring to make the YAML interface more powerful: adding support for utilities (like entropy calculation) that you can call and perform a comparison.
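
    As a rough illustration of what such a utility could compute, here is a sketch of Shannon entropy over the bytes of a string, followed by a threshold comparison. The function names, the minimum length of 16, and the 3.5-bit threshold are all assumptions made for this example; none of them come from Globstar or from the linked issue.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the Shannon entropy of s in bits per byte.
// Hypothetical helper for illustration; it is not part of Globstar.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	entropy := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		entropy -= p * math.Log2(p)
	}
	return entropy
}

// looksLikeSecret is the comparison step: it flags strings that are long
// enough and whose entropy exceeds the threshold. The length cutoff of 16
// is an arbitrary illustrative value.
func looksLikeSecret(s string, threshold float64) bool {
	return len(s) >= 16 && shannonEntropy(s) > threshold
}

func main() {
	for _, s := range []string{"aaaaaaaaaaaaaaaaaaaa", "q8Zr3LkP0xVt7NwYb2Hs"} {
		fmt.Printf("%q entropy=%.2f secret=%v\n", s, shannonEntropy(s), looksLikeSecret(s, 3.5))
	}
}
```

    A checker could run a comparison like this on every string literal a query captures, flagging random-looking values as possible hardcoded secrets.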

    [1] https://globstar.dev/guides/writing-go-checker#_1-complex-pa...

    [2] https://globstar.dev/reference/checker-go#analysis-function

    [3] https://globstar.dev/reference/checker-go#state-tracking

    [4] https://github.com/DeepSourceCorp/globstar/issues/27

  • injuly 2 hours ago

    Flow analysis, especially propagation, is a hard problem to solve in the general case. IMO, the one tool that had the best, if language-specific, approach was Pyre – Facebook's type checker and static analyzer for Python.

pdimitar 4 hours ago

Wow this looks great. I will be giving it a go VerySoon™!

Looking forward to writing some enhanced linters.

codepathfinder 3 hours ago

Nothing comes close to CodeQL!

If anyone is interested, please check out codepathfinder.dev, a truly open-source CodeQL alternative.

Feedback is appreciated!

  • injuly an hour ago

    Admirable effort :)

    But in its current state I don't think it actually replaces any of CodeQL's use cases. The most straightforward way to do what CodeQL does today would be to implement a flow analysis IR (say CFG+CallGraph) on top of tree-sitter.
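
    A minimal sketch of what such an IR might look like, assuming the per-function CFGs have already been lowered from tree-sitter syntax trees (here they are hand-built). The types and function names are hypothetical; this is nowhere near what CodeQL does, it only shows the CFG+CallGraph layering and one reachability query.

```go
package main

import "fmt"

// Block is a basic block: the calls it contains and its successor blocks.
type Block struct {
	Calls []string // names of functions called inside this block
	Succs []int    // indexes of successor blocks within the same function
}

// Func is one function's control-flow graph; Blocks[0] is the entry block.
type Func struct {
	Name   string
	Blocks []Block
}

// reachableCalls walks the CFG from the entry block and collects the
// callees found in blocks that are actually reachable.
func reachableCalls(f *Func) []string {
	if len(f.Blocks) == 0 {
		return nil
	}
	seen := make([]bool, len(f.Blocks))
	seen[0] = true
	work := []int{0}
	var calls []string
	for len(work) > 0 {
		id := work[len(work)-1]
		work = work[:len(work)-1]
		calls = append(calls, f.Blocks[id].Calls...)
		for _, s := range f.Blocks[id].Succs {
			if !seen[s] {
				seen[s] = true
				work = append(work, s)
			}
		}
	}
	return calls
}

// buildCallGraph derives caller -> callees edges from each function's CFG.
func buildCallGraph(funcs []*Func) map[string][]string {
	cg := make(map[string][]string, len(funcs))
	for _, f := range funcs {
		cg[f.Name] = reachableCalls(f)
	}
	return cg
}

// reaches reports whether `to` is transitively callable from `from`,
// using a breadth-first search over the call graph.
func reaches(cg map[string][]string, from, to string) bool {
	seen := map[string]bool{from: true}
	queue := []string{from}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, callee := range cg[cur] {
			if callee == to {
				return true
			}
			if !seen[callee] {
				seen[callee] = true
				queue = append(queue, callee)
			}
		}
	}
	return false
}

// sampleProgram returns a hand-built IR standing in for parser output.
func sampleProgram() []*Func {
	return []*Func{
		{Name: "handler", Blocks: []Block{
			{Calls: []string{"parse"}, Succs: []int{1}},
			{Calls: []string{"render"}},
			{Calls: []string{"debugShell"}}, // dead block: no predecessor
		}},
		{Name: "parse", Blocks: []Block{{Calls: []string{"decode"}}}},
		{Name: "decode", Blocks: []Block{{Calls: []string{"exec"}}}},
		{Name: "render", Blocks: []Block{{}}},
	}
}

func main() {
	cg := buildCallGraph(sampleProgram())
	fmt.Println("handler reaches exec:", reaches(cg, "handler", "exec"))
	fmt.Println("handler reaches debugShell:", reaches(cg, "handler", "debugShell"))
	fmt.Println("render reaches exec:", reaches(cg, "render", "exec"))
}
```

    Building the call graph from the CFG rather than from raw syntax means calls in dead blocks never become edges, which a purely pattern-based query cannot express.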

    Even the QL grammar itself can be in tree-sitter.

    • codepathfinder an hour ago

      Thanks for the feedback. That's the exact plan 🙌

      The current state of codepathfinder is less than 5% of what CodeQL has implemented. As a security engineer, I personally use it, and I'll keep adding features and closing the gap.

      Feel free to contribute ideas/feedback/bugs. Much appreciated, honestly!