Last night I stumbled on the auto-rust project:
auto-rust is a Rust procedural macro that harnesses the power of Large Language Models (LLMs) to generate code at compile time. Write a function signature, add a doc comment describing the desired functionality, and let auto-rust fill in the implementation details!
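To make the workflow concrete, here is a minimal sketch of the contract such a macro works with. The attribute name shown is a hypothetical stand-in (check the crate's own docs for its real macro name), and the function body is hand-written to illustrate what the LLM might generate from the signature plus doc comment; it is not actual auto-rust output.

```rust
// Hypothetical attribute, standing in for whatever auto-rust actually
// exports -- the point is the shape of the contract:
//   #[llm_generated]
/// Returns the n-th Fibonacci number (fib(0) = 0, fib(1) = 1).
fn fibonacci(n: u32) -> u64 {
    // A generated implementation might look like this iterative version;
    // with the real macro, this body would be filled in at compile time.
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = a + b;
        a = b;
        b = next;
    }
    a
}

fn main() {
    // The doc comment is the only "source" a caller needs to trust.
    assert_eq!(fibonacci(0), 0);
    assert_eq!(fibonacci(1), 1);
    assert_eq!(fibonacci(10), 55);
    println!("ok");
}
```

The doc comment effectively becomes the prompt, and the signature is the type-checked interface the generated body must satisfy.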
I previously wrote about the need to leverage LLMs to raise the level of abstraction:
I consider the current practice of checking in GPT-generated code akin to checking in the assembly code generated by the compiler rather than the higher-abstraction source code that we actually wrote.
This crate strikes at the heart of that issue. Of course, to be production ready it will have to be coupled with a code-generating LLM that is stable, so that the output is predictable. For development, I think it must be able to show the generated code so that developers can reason about it and improve the prompt. But “hands off” for sure: no manual editing of the generated code. Finally, in a world where everything is generated, tests should be generated as well. The problem I see is the same one that arises when the person writing the tests is also the person writing the code: the test author has to think in terms of behavior rather than implementation, and that is often hard. To solve it, we could tune the models differently, or maybe even go to the extreme of building two models: one for writing code and one for testing it. Which reminds me of clean-room design, a reverse-engineering technique where the only information passing between the reverse-engineering team (tests) and the implementation team (code) is the specification.
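The clean-room split can be sketched in ordinary Rust. In this illustration (my own example, not from the crate), the doc comment plays the role of the specification: the test at the bottom is written only from that spec, as a separate "team" would, and every assertion traces back to a sentence in the doc comment rather than to any implementation detail.

```rust
/// Splits a comma-separated string into trimmed, non-empty fields.
fn parse_fields(input: &str) -> Vec<String> {
    // "Implementation team" side: free to choose any approach that
    // satisfies the spec above.
    input
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

fn main() {
    // "Test team" side: each assertion derives from the spec alone.
    assert_eq!(parse_fields("a, b ,c"), vec!["a", "b", "c"]); // trimmed
    assert_eq!(parse_fields("a,,b"), vec!["a", "b"]);         // non-empty only
    assert_eq!(parse_fields(""), Vec::<String>::new());       // empty input
    println!("ok");
}
```

With two differently tuned models, the code-writing model would see only the signature and doc comment, and the test-writing model would see the same thing and nothing else.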
I still think that we will eventually build a from-first-principles programming model, language, and tooling that leverages LLMs to the fullest, rather than using them for code generation on steroids (with a lot of bad side effects). Maybe this is the moment when Literate Programming finally gets the wide adoption it so deserves? I have always had a soft spot for it and have built some small systems and modules with it using CoffeeScript. I was very happy with the development experience, and honestly, ten years after the fact, reading this example, I feel it stands as a testament to the wisdom of Literate Programming. Sure, it has a ratio of comments to code higher than 1.0, but isn’t that exactly what we often find ourselves doing when chatting with GPTs and leveraging LLMs? Imagine the same thing but without the loss of supporting artifacts, allowing us to reason about the behavior not only at the time of building it but forever afterwards.
Last modified on 2024-09-01