So many mentions of CHERI - both in this post and in the linked CACM article. I doubt CHERI will be the future considering how long it's been around and how few actual successes have come out of it.
Also, hilariously, Fil-C is faster than CHERI today (the fastest CHERI silicon available today will be slower than Fil-C running on my laptop, probably by an order of magnitude). And Fil-C is safer.
Which sort of brings up another issue - this could be a case where Google is trying to angle for regulations that support their favorite strategy while excluding competition they don't like.
What is the distinction between this approach and Address Sanitizer https://clang.llvm.org/docs/AddressSanitizer.html ? If I understand correctly, Fil-C is a modified version of LLVM. Is your metadata more lightweight, and does it catch more bugs? Could it become a pass in regular LLVM?
For example, asan will totally let you access out of bounds of an object. Say buf[index] is an out-of-bounds access that ends up inside another object. Asan will allow that. Fil-C won't. That's kind of a key spatial safety protection.
Asan is for finding bugs, at best. Fil-C is for actually making your code memory safe.
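Concretely, something like this (the sizes and the index here are made up for illustration): with a large enough index, the write skips right past asan's redzones and can land inside some other live allocation, so asan sees a write to valid memory and says nothing, while Fil-C checks the access against buf's own bounds and traps.

#include <stdlib.h>

int main(void)
{
    char* buf = malloc(16);
    char* other = malloc(16); /* some other live heap object */
    size_t index = 4096;      /* imagine this is attacker-controlled */
    buf[index] = 'x';         /* if this lands inside another live allocation,
                                 asan sees a valid write; Fil-C bounds-checks
                                 against buf and traps */
    free(other);
    free(buf);
    return 0;
}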
CHERI is a hardware architecture and instruction set to add safety-related capabilities to processors. See https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/
In this context a capability means a way to track and enforce which memory area a pointer can point into. Typically this has to be coupled with a compiler which will initialize the capability for each pointer.
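Very roughly, you can think of a capability as carrying this kind of information alongside the raw address. This is only a conceptual sketch; real CHERI capabilities are compressed 128-bit values with a hidden validity tag, not a plain struct.

#include <stdint.h>

/* Conceptual sketch only: the hardware checks every load/store made
   through a capability against its bounds and permissions. */
struct capability_sketch {
    uint64_t address;     /* where the pointer currently points */
    uint64_t base;        /* start of the memory area it may access */
    uint64_t length;      /* size of that area */
    uint64_t permissions; /* e.g. load/store/execute bits */
};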
> Fil-C is currently 1.5x slower than normal C in good cases, and about 4x slower in the worst cases. I'm actively working on performance optimizations for Fil-C, so that 4x number will go down.
I am pretty sure you cannot go much lower than 1.2 here in the best cases. In contrast, CHERI on good hardware can easily come very close to current performance.
Not really buying your thesis here: Attempts to retrofit safety onto C have been around a lot longer than CHERI; Fil-C is just the latest, and there's no obvious reason why it should be more successful.
The speed comparison with a laptop is just disingenuous. Is a device with CHERI integrated slower than one of the same class without?
> Attempts to retrofit safety onto C have been around a lot longer than CHERI; Fil-C is just the latest, and there's no obvious reason why it should be more successful.
It is true that memory-safe C compilers have existed for decades and have seen minimal adoption.
However, improvements to clang/llvm could yield wider impact and benefit than previous efforts, since they may be supported in a widely used C toolchain.
-fbounds-safety is another change that may see more adoption if it makes it into mainline clang/llvm.
> Not really buying your thesis here: Attempts to retrofit safety onto C have been around a lot longer than CHERI; Fil-C is just the latest, and there's no obvious reason why it should be more successful.
Here's the difference with Fil-C: It's totally memory safe and fanatically compatible with C/C++, no B.S. The other approaches tend to either create a different language (not compatible), or tend to create a superset of C/C++ with a safe subset (not totally memory safe, only compatible in the sense that unsafe code continues to be unsafe), or just don't fully square the circle (for example SoftBound could have gotten to full compatibility and safety, but the implementation just didn't).
> The speed comparison with a laptop is just disingenuous. Is a device with CHERI integrated slower than one of the same class without?
Not disingenuous at all.
The issue is that:
- high volume silicon tends to outperform low volume silicon. Fil-C runs on x86_64 (and could run on ARM64 if I had the resources to regularly test on it). So, Fil-C runs on the high volume stuff that gets all of the best optimizations.
- silicon optimization is best done incrementally on top of an already fast chip, where the software being optimized for already runs on that chip. Then it's a matter of collecting traces on that software and tuning, rather than having to think through how to optimize a fast chip from first principles. CHERI means a new register file and new instructions, so it doesn't lend itself well to incremental optimization.
So, I suspect Fil-C will always be faster than CHERI. This is especially true if you consider that there are lots of possible optimizations to Fil-C that I just haven't had a chance to land yet.
I've been really impressed with what you're doing with Fil-C, but:
> Here's the difference with Fil-C: It's totally memory safe and fanatically compatible with C/C++, no B.S.
Is this true on both counts? If I'm reading your docs right, you're essentially adding hidden capabilities to pointers. This is a great technique that gives you almost perfect machine-level compatibility by default, but it comes with the standard caveats:
1. Your type safety/confusion guards are essentially tied to pointer "color," and colors are finite. In other words, in a sufficiently large program, an attacker can still perform type confusion by finding types with overlapping colors. Not an issue in small codebases, but maybe in browser- or kernel-sized ones.
2. In terms of compatibility, I'm pretty sure this doesn't allow a handful of pretty common pointer-integer roundtrip operations, at least not without having the user/programmer reassign the capability to the pointer that's been created out of "thin air." You could argue correctly that this is a bad thing that programmers shouldn't be doing, but it's well-defined and common enough IME.
(You also cited my blog's writeup of `totally_safe_transmute` as an example of something that Fil-C would prevent, but I'm not sure I agree: the I/O effect in that example means that the program could thwart the runtime checks themselves. Of course, it's fair to say that /proc/self/mem is a stunt anyways.)
> My type safety and confusion guards have nothing to do with colors of any kind. There are no finite colors to run out of.
I'm having trouble seeing where the type confusion protection properties come from, then. I read through your earlier (I think?) design that involved isoheaps and it made sense in that context, but the newer stuff (in `gimso_semantics.md` and `invisicap.txt`) seems to mostly be around bounds checking instead. Apologies if I'm missing something obvious.
> I’m citing unsafe transmute as something that Fil-C doesn’t prevent. It’s not something that memory safety prevents.
I think the phrasing is confusing, because this is what the manifesto says:
> No program accepted by the Fil-C compiler can possibly go on to escape out of the Fil-C type system.
This to me suggests that Fil-C's type system detects I/O effects, but I take it that wasn't the intended suggestion.
Short answer: if you know how CHERI and SoftBound do it, then Fil-C is basically like that.
Long answer: let's assume 64-bit (8 byte pointers) without loss of generality. Each capability knows, for each 8 bytes in its allocation, whether those 8 bytes are a pointer, and if so, what that pointer's capability is.
Example:
char* p = malloc(64);
This will allocate 64 bytes. p's capability will know, for each of the eight 8-byte slots, whether that slot is a pointer and, if so, what its capability is. Since you just allocated the object, none of them have capabilities.
Then if you do:
*(int**)(p + 8) = malloc(sizeof(int));
p's capability will then know that at offset 8 there is a pointer, and that its capability is whatever came out of the malloc.
Hence, each capability is dynamically tracking where the pointers are. So it's not a static type but rather something that can change over time.
There's a bunch of engineering that goes into this being safe under races (it is) and for supporting pointer atomics (they just work).
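If it helps, here's a conceptual sketch of the shape of that metadata for the malloc(64) example. It's just the information described above, not the actual runtime representation.

#include <stdbool.h>
#include <stddef.h>

struct capability;

/* Per-8-byte-slot metadata: is the slot currently a pointer, and if so,
   which capability does that pointer carry? Updated on stores. */
struct slot_meta {
    bool is_pointer;
    struct capability* pointee_cap; /* ignored unless is_pointer */
};

struct capability {
    char* base;                /* start of the allocation */
    size_t size;               /* 64 in the example above */
    struct slot_meta slots[8]; /* one entry per 8-byte slot */
};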
I read the Fil-C overview, and I was confused by one thing: how does Fil-C handle integer-to-pointer conversions? Rust has the new strict provenance API that is somewhat explicitly designed to avoid a need to materialize a pointer capability from just an integer, but C and C++ have no such thing. So if the code does:
int deref(uintptr_t p)
{
    return *(int*)p;
}
Does this fail unconditionally? Or is there some trick by which it can succeed if p is valid? And, if the latter is the case, then how is memory safety preserved?
is at least a semi-reliable way to increment p by one. I guess this is a decent way to look like C and to keep a widely-used pattern functional. Rust’s with_addr seems like a more explicit and less magical way to accomplish the same thing. If Fil-C really takes off, would you want to add something like with_addr? Is allowing the pair of conversions on the same line of code something that can be fully specified and can be guaranteed to compile correctly such that it never accidentally produces a pointer with no capability?
The pair of conversions is guaranteed to always produce a pointer with a capability. That's how I implemented it, and it's low-tech enough that it could be specified.
How far can the pair of conversions be pushed? Will this work:
(int*)(f((uintptr_t)p))
Does it matter if f is inline?
Could someone implement Rust’s with_addr as:
(int*)((uintptr_t)p, addr)
FWIW, I kind of like zptrtable, and I think Fil-C sounds awesome. And I’m impressed that you were able to port large code bases with as few changes as it seems to have taken.
Your first example will hilariously work if `f` is inline and simple enough and optimizations are turned on. I'm not sure I like that, so I might change it. I'd like to only guarantee that you get a capability in cases where that guarantee holds regardless of optimization (and I could achieve that with some more compiler hacking).
Not sure about the semantics of with_addr. But note that you can do this in Fil-C:
char* p = ...;
uintptr_t i = ...;
p -= (uintptr_t)p; // now p is NULL but still has its original capability
p += i; // now p points to whatever i's address was, but has p's original capability
I have a helper like that called `zmkptr` (it's just an inline function that does exactly the above).
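Roughly, it looks like this sketch (the real helper's exact signature may differ):

#include <stdint.h>

static inline char* zmkptr_sketch(char* cap_source, uintptr_t addr)
{
    char* p = cap_source;
    p -= (uintptr_t)p; /* address becomes 0, the capability is preserved */
    p += addr;         /* now points at addr, carrying cap_source's capability */
    return p;
}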
And this needs to either result in a compiler error or generate some kind of code.
Rust’s with_addr wins points for being explicit and unambiguous. It obviously loses points for not being C. And Rust benefits here from all of this being in the standard library and from some of the widely-available tooling (miri) getting mad if code doesn’t use it. I can imagine a future Fil-Rust project doing essentially the same thing as Fil-C except starting with Rust code. It might be interesting to see how the GC part would interact with the rest of the language.
My compiler analysis says that if you have two possible pointers that a capability might come from, like in your first example, then you get no capability at all. I think that's a better semantics than picking some capability at random.
If you want to be explicit about where the capability comes from, use `zmkptr`.
> Here's the difference with Fil-C: It's totally memory safe and fanatically compatible with C/C++, no B.S. The other approaches tend to either create a different language (not compatible), or tend to create a superset of C/C++ with a safe subset (not totally memory safe, only compatible in the sense that unsafe code continues to be unsafe), or just don't fully square the circle (for example SoftBound could have gotten to full compatibility and safety, but the implementation just didn't).
That's exactly the kind of thing that the boosters of all those previous efforts said. But somehow it never quite worked out.
> That's exactly the kind of thing that the boosters of all those previous efforts said
I don't think this is true.
- D, Swift, Rust, Zig: different languages, and while they do have FFI, using it means you're only as safe as your C code
- CHERI: requires hardware support to be practical
- Checked C, CCured, ?SAFECode IIRC?: too expensive
- AddrSan@runtime, ARM MTE, SoftBound: mitigations with too many holes
I don't know of many (to be honest, can't think of any) other serious attempts at making a system that tries to cover all three of
Awful performance. Usually 2x worse than C and 4x worse in the worst case. Given the comment by Fil-C's creator minimizing the performance issue [0], I wouldn't get my hopes up.
I’ll summarize: language implementations get faster over time. Young ones tend to be slow. Fil-C is a young implementation that still has lots of unoptimized things. Also, Fil-C being 2x slower than C means it’s already faster than many safe languages. And, for a lot of C use cases perf doesn’t matter as much as the hype suggests.
The fact that young implementations are slow is something that's worth understanding even if you don't care about Fil-C. It suggests, for example, that if someone invents a new language and their initial implementation is slow, then you can't use that fact to assume that it'll be slow forever. I think that's generally a useful lesson.
I care about performance a lot and Fil-C has gotten about 100x faster since the first prototype. It’ll keep getting faster.
Here's one: even just switching from gcc or msvc to clang, in projects that really want to, takes years.
Here's another one: the Fil-C compiler is really young, so it almost certainly still has bugs. Those compilers that folks actually use in anger tend to get qualified on ~billions of lines of code before anyone other than the compiler devs touches them. The Fil-C compiler is too young to have that level of qualification.
So "immediately everywhere" isn't going to happen. At best it'll be "over a period of time and incrementally".
> That's exactly the kind of thing that the boosters of all those previous efforts said. But somehow it never quite worked out.
No, they really didn't. Let's review some of the big cases.
- SafeC: not based on a mainstream C compiler, so can't handle gcc/clang extensions (Fil-C can). Had no story for threads or shared memory (Fil-C does). Hence, not really compatible.
- CCured: incompatible (cannot compile C code with it without making changes, or running their tool that tries to automate the changes - but even then, common C idioms like unions don't quite work). Didn't use a major C compiler.
- SoftBound: not totally memory safe (no attempt to provide safety for linking or function calls). But at least it's highly compatible.
I can list more examples. Fil-C is the first to get both compatibility and safety right.
> Fil-C is the first to get both compatibility and safety right.
Has any impartial third party reached that conclusion? Because honestly the way I remember it everyone says this kind of thing when it's their own project, a lot of the people behind these previous efforts were just as confident as you are.
Not in any official capacity but it’s been looked at by other C compiler experts, other programming language experts, GC experts, and security experts. Folks who have looked at it deeply agree with those claims. And I hope they would have told me if they didn’t believe anything about my claims!
Also, it always had a material performance impact. People write C++, and to a lesser extent C, because they really, really care about performance. If they didn’t care about performance there are easier languages to use.
Talking about performance impact is missing the bigger picture of how languages become performant. "Really really care about performance" describes some C/C++ programmers, but definitely not all of them. Finally, Fil-C is already faster than a lot of other safe languages (definitely faster than TypeScript, yet lots of stuff ships in TypeScript).
Language implementations get faster over time, and young ones tend to be slow. The Fil-C implementation is young. So were all of the previous attempts at memory-safe C - usually an implementation that had at most a few person-years of investment (because it was done in an academic setting). Young implementations tend to be slow because the optimization investment hasn't happened in anger. So, "past academic attempts were slow" is not a great reason to avoid investigating memory-safe C.
Performance focus is not the reason why all of the world's C/C++ code gets written. Maybe that's even a minority reason. Lots of stuff uses C/C++ because of reasons like:
- It started out in C/C++ so it continues to be in C/C++. So many huge projects are in this boat.
- You're critically relying on a library whose only bindings are in C/C++, or the C/C++ bindings are the most mature, or the most easy to use.
- You're doing low-level systems stuff, and having pointers that you can pass to syscalls is a core part of your logic.
- You want to play nice with the dynamic linking situation on the OS you're targeting. (C/C++ get dynamic linking right in a way other languages don't.)
I'd guess less than half of the C/C++ code that's being written today is being written because the programmer was thinking "oh man, this'll be too slow in any other language".
Finally, Fil-C is already faster than a lot of memory safe languages. It's just not as fast as Yolo-C, but I don't think you can safely bet that this will be true forever.
For CHERI to be fully safe, it basically needs a GC. They just call it something else. They need it to clean up capabilities to things that were freed, which is the same thing that Fil-C uses GC for.
How about incentives to write safe code even in C? They do not exist.
You are not rewarded for:
1) Formal proofs or careful programming. No one cares if a piece of software works quietly for years.
2) Preventing others from ruining a working piece of software. To the contrary, you will be called a gatekeeper and worse things.
You are rewarded for:
1) Wild ideas, quickly and badly implemented with the proper amount of marketing.
2) Churn, "social" coding, and LGTM.
3) Ironically, if you are a security researcher, finding exploits can help, too. As above, preventing exploits in the first place is regarded as a waste of time.
All of the above is true at Google. But of course they have a technical solution to a social problem. Which might catch one category of bugs at best.
Being completely serious, people will use whatever works. If what works is written in C, people will use it. The average person seriously doesn't care what language a thing is written in. The average person cares that the software in question works. Despite being written in C, most software today works reasonably well. Is it perfect? No. Will the rusty equivalent be perfect on day 1? No.
I can't help but think that those lazy mathematicians might benefit from a congressional order to clean up that twin prime problem too.
If memory safety was "just the right regulations" easy, it would have already been solved. Every competent developer loves getting things right.
I can imagine a lot more "compliance" than success may be the result of any "progress" with that approach.
The basic problem is challenging, but what makes it hard-hard is the addition of a mountain of incidental complexity. Memory safety as a retrofit on languages, tools and code bases is a much bigger challenge than starting with something simple and memory safe, and then working back up to something with all the bells and whistles that mature tool ecosystems provide for squeezing out that last bit of efficiency. Programs get judged 100% on efficiency (how fast can you get this working? how fast does it run? how much is our energy/hardware/cloud bill?), and only 99% or so on safety.
If the world decided it could get by on a big drop in software/computer performance for a few years while we restarted with safer/simpler tools, change would be quick. But the economics would favor every defector so much that ... that approach is completely unrealistic.
It is going to get solved. The payoff is too high, and the pain is too great, for it not to. But not based on a concept of a plan or regulation.
> If memory safety was "just the right regulations" easy, it would have already been solved.
Memory safety is already a solved problem in regulated industries. It's not a hard problem as such. People just don't want to solve it and don't have any incentive to: companies aren't penalised for writing buggy software, and individual engineers are if anything rewarded for it.
> Every competent developer loves getting things right.
Unfortunately a lot of developers care more about being able to claim mastery of something hard. No-one gets cred for just writing the thing in Java and not worrying about memory issues, even though that's been a better technical choice for decades for the overwhelming majority of cases.
> Memory safety is already a solved problem in regulated industries. It's not a hard problem as such.
It's not hard, no, but it is expensive, because those regulations have a battery of tests run by a third party that you will pay money to each time you want to recertify.
I've worked in two regulated industries; the recertification is the expensive part, not the memory errors.
The problem is one of practical coding efficiency (and quality). You are right that there are no intractable memory problems even in the unsafest, least helpful languages.
Regulated industries have overwhelmingly boring and expensive software compared to others. They do things like banning recursion and dynamic arrays lol. Memory safety in every aspect possible just isn't worth it for most applications. And the degree of memory safety that is worth it is a lot less than Rust developers seem to think, and the degree of memory safety granted by Rust is less than they think as well.
Memory safety isn't worth it as long as leaking all your users' data (and granting attackers control over their systems) doesn't cost much. As attacks get more sophisticated and software gets more important, the costs of memory unsafety go up.
What you've said is true but I still think the problem is overblown, and solutions at the hardware level are disregarded in favor of dubious and more costly software rewrite solutions. If something like CHERI was common then it would automatically find most security-related memory usage bugs, and thus lead to existing software getting fixed for all hardware.
IME you can't reliably extract the intent from the C code, much less the binary, so you can't really fix these bugs without a human rewriting the source. The likes of CHERI might make exploitation harder, but it seems to me that ROP-style workarounds will always be possible, because fundamentally if the program is doing things that look like what it was meant to do then the hardware can never distinguish whether it's actually doing what it was meant to do or not. Even if you were able to come up with a system that ensured that standards-compliant C programs did not have memory bugs (which is already unlikely), that would still require a software rewrite approach in practice because all nontrivial C programs/libraries have latent undefined behaviour.
> IME you can't reliably extract the intent from the C code, much less the binary, so you can't really fix these bugs without a human rewriting the source.
I am pretty sure that the parent is talking about hardware memory safety which doesn't require any "human rewriting the source".
The same thing can be said about a Rust vector OOB panic or any other bug in any safe language. Bugs happen which is why programmers are employed in the first place!
> The same thing can be said about a Rust vector OOB panic or any other bug in any safe language. Bugs happen which is why programmers are employed in the first place!
Sure, the point is you're going to need the programmer either way, so "hardware security lets us detect the problem without rewriting the code" isn't really a compelling advantage for that approach.
If a program halts, that is a narrow security issue that will not leak data. Humans need to fix bugs, but that is nothing new. A memory bug with such features would be hardly more significant than any other bug, and people would get better at fixing them over time because they would be easier to detect.
> If a program halts, that is a narrow security issue that will not leak data.
Maybe. Depends what the fallback for the business that was using it is when that program doesn't run.
> Humans need to fix bugs, but that is nothing new. A memory bug with such features would be hardly more significant than any other bug
Perhaps. But it seems to me that the changes that you'd need to make to fix such a bug are much the same changes that you'd need to make to port the code to Rust or what have you, since ultimately in either case you have to prove that the memory access is correct. Indeed I'd argue that an approach that lets you find these bugs at compile time rather than run time has a distinct advantage.
>Perhaps. But it seems to me that the changes that you'd need to make to fix such a bug are much the same changes that you'd need to make to port the code to Rust or what have you, since ultimately in either case you have to prove that the memory access is correct.
No, you wouldn't need to prove that the memory access is correct if you relied on hardware features. Or I should say, that proof will be mostly done by compiler and library writers who implement the low level stuff like array allocations. The net lines of code changed would definitely be less than a complete rewrite, and would not require rediscovery of specifications that normally has to happen in the course of a rewrite.
>Indeed I'd argue that an approach that lets you find these bugs at compile time rather than run time has a distinct advantage.
It is an advantage but it's not free. Every compilation takes longer in a more restrictive language. The benefits would rapidly diminish with the number of instances of the program that run tests, which is incidentally one metric that correlates positively with how significant bugs actually are. You could think of it as free unit tests, almost. The extra hardware does have a cost but that cost is WAAAY lower than the cost of a wholesale rewrite.
> No, you wouldn't need to prove that the memory access is correct if you relied on hardware features. Or I should say, that proof will be mostly done by compiler and library writers who implement the low level stuff like array allocations. The net lines of code changed would definitely be less than a complete rewrite, and would not require rediscovery of specifications that normally has to happen in the course of a rewrite.
I don't see how the hardware features make this part any easier than a Rust-style borrow checker or avoid requiring the same rediscovery of specifications. Checking at runtime has some advantages (it means that if there are codepaths that are never actually run, you can skip getting those correct - although it's sometimes hard to tell the difference between a codepath that's never run and a codepath that's rarely run), but for every memory access that does happen, your compiler/runtime/hardware is answering the same question either way - "why is this memory access legitimate?" - and that's going to require the same amount of logic (and potentially involve arbitrarily complex aspects of the rest of the code) to answer in either setting.
That's possible but unlikely. I would be OK with requiring software bugs like that to be fixed, unless it can be explained away as impossible for some reason. We could almost certainly move toward requiring this kind of stuff to be fixed much more easily than we could do the commonly proposed "rewrite it in another language bro" path.
There's no such thing as hardware memory safety, with absolutely no change to the semantics of the machine as seen by the compiled C program. There are going to be false positives.
There may be some cases where code would need to be adjusted or annotated to use CHERI well, but that has to be easier than translating to or interfacing with another language.
Did you forget a /s? It seems that if you can't convince a majority of programmers that your new language is good enough to learn, maybe it actually isn't as good as its proponents claim. It is likely the case that rewriting everything in a new language for marginally less bugs is a worse outcome than just dealing with the bugs.
Mind you, the government has tried this before with Ada. Not to knock Ada but let's just say that government would ruin everything and stifle the industry. Certainly, any new regulations about anything as broad as how memory is allowed to be managed is going to strangle the software industry.
If this has to be forced, it probably isn't necessary or very beneficial. How much will it cost to conform to these "standards" versus not? Who stands to gain by making non-conformant software illegal? I think it is clearly far too expensive to rewrite all software and retrain all programmers to conform to arbitrary standards. Hardware solutions to improve memory safety already exist and may ultimately be the best way to achieve the goal.
It seems to me that Rust programmers, unhappy with the pace of adoption of Rust, seek to make other languages illegal because they do things different from Rust.
That doesn't address existing codebases. Neither the Linux kernel nor the Chromium project is going to replace all its memory-unsafe code, so there are design challenges that need to be solved that are more complicated than "these memory-safe languages are available for your problem domain".
With due respect, the blog you have linked looks like the average Rust marketing material. It does absolutely nothing to address my concerns. I did a `Ctrl-F` and found zero hits of any of the following terms:
* CFI
* isoheaps or type-stable allocators
* Shadow stacks
(There is just a single hit of "C++"...)
Ignoring the appeal to authority, I have a hard time believing that incrementally rewriting my C++ code in Rust or just writing new code in Rust ("vulnerabilities exponentially decay" and all that) is going to give me more actual security than the mitigations stated above. Most, if not all, high-profile exploits stem from out-of-bounds accesses and type confusions, which these mitigations prevent at very low cost.
I am not interested in adhering to some arbitrary purity standard (like "memory safety" in this case). Almost always, purity ideologies are both irrational and harmful. What I am actually interested in is preventing real problems like remote code execution and Heartbleed-esque leakage of private data, and for this, mitigations like CFI, shadow stacks and bounds checking are enough.
> They prevent but do not entirely mitigate.
Ignoring the semantic difference between "prevent" and "mitigate", if at the end of the day, the security provided by the two different approaches are quite similar, I don't get the problem.
If you have an example of a successful widespread exploit that would have happened even with these mitigations, please share.
They’re not enough. For example the field I work in (mobile exploits) continues to bypass CFI (PAC) via clever data-only attacks or abusing TOCTOU issues.
What do you even base these claims on? Do you know what C# and Java threads have that Rust doesn't? Data races. And don't get me started on the biggest paradigm failure that is OOP.
Projects I've seen at work. Projects posted on Hacker News. Data races aren't usually an issue for backend services, and modern Java/C# is multi-paradigm.
> Data races aren't usually an issue for backend services
I beg to differ unless all your logic is in the database with strong isolation guarantees.
Speaking of C# for backends that are using EF actively, I bet there are bugs in pretty much all of them caused by incorrect applications of optimistic concurrency.
There are domains where C# (and F#) productivity stems from similar reasons why writing a game in something that isn't Rust might be more productive without even sacrificing performance (or, at least, not to a drastic extent).
I can give you an example:
var number = 0;
var delay = Task.Delay(1000);
for (var i = 0; i < 10; i++)
{
    Task.Run(() =>
    {
        while (!delay.IsCompleted)
        {
            Interlocked.Increment(ref number);
        }
    });
}
await delay;
How would you write this idiomatically in Rust without using unsafe?
To avoid misunderstanding, I think Rust is a great language and even if you are a C# developer who does not plan to actively use it, learning Rust is of great benefit still because it forces you to tackle the concepts that implicitly underpin C#/F# in an explicit way.
There's a few things here that make this hard in Rust:
First, the main controller may panic and die, leaving all those tasks still running; while they run, they still access the two local variables, `number` and `delay`, which are now out of scope. My best understanding is that this doesn't result in undefined behavior in C#, but it's going to be some sort of crash with unpredictable results.
I think the expectation is that tasks use all cores, so the tasks also have to be essentially Send + 'static, which kinda complicates everything in Rust. Some sort of scoped spawning would help, but that doesn't seem to be part of core Tokio.
In C#, the number variable is a simple integer, and while updating it is done safely, there's nothing that forces the programmer to use Interlocked.Read or anything like that. So the value is going to be potentially stale. In Rust, it has to be declared atomic at the start.
Despite the `await delay`, there's nothing that awaits the tasks to finish; that counter is going to continue incrementing for a while even after `await delay`, and if its value is fetched multiple times in the main task, it's going to give different results.
In C#, the increment is done in Acquire-Release mode. Given nothing waits for tasks to complete, perhaps I'd be happy with Relaxed increments and reads.
So in conclusion: I agree, but I think you're arguing against Async Rust, rather than Rust. If so, that's fair. It's pretty much universally agreed that Async Rust is difficult and not very ergonomic right now.
On the other hand, I'm happy Rust forced me to go through the issues, and now I understand the potential pitfalls and performance implications a C#-like solution would have.
Does this lead to the decision fatigue you mention in another sub-thread? It seems like it would, so I'll give you that.
For posterity, here's the Rust version I arrived at:
let number = Arc::new(AtomicUsize::new(0));
let finished = Arc::new(AtomicBool::new(false));
let finished_clone = Arc::clone(&finished);
let delay = task::spawn(async move {
    sleep(Duration::from_secs(1)).await;
    finished_clone.store(true, Ordering::Release);
});
for _ in 0..10 {
    let number_clone = Arc::clone(&number);
    let finished_clone = Arc::clone(&finished);
    task::spawn(async move {
        while !finished_clone.load(Ordering::Acquire) {
            number_clone.fetch_add(1, Ordering::SeqCst);
            task::yield_now().await;
        }
    });
}
delay.await.unwrap();
use std::{
    sync::{
        Arc,
        atomic::{AtomicBool, AtomicUsize, Ordering},
    },
    time::Duration,
};

fn main() {
    let num = Arc::new(AtomicUsize::new(0));
    let finished = Arc::new(AtomicBool::new(false));
    for _ in 0..10 {
        std::thread::spawn({
            let num = num.clone();
            let finished = finished.clone();
            move || {
                while !finished.load(Ordering::SeqCst) {
                    num.fetch_add(1, Ordering::SeqCst);
                }
            }
        });
    }
    std::thread::sleep(Duration::from_millis(1000));
    finished.store(true, Ordering::SeqCst);
}
What if we want to avoid explicitly spawning threads and blocking the current one every time we do this? Task.Run does not create a new thread besides those that are already in the threadpool (which can auto-scale, sure, but you get the idea, assuming the use of Tokio here).
I was implying that yes, while it is doable, it comes at 5x the cognitive cost because of the micromanagement it requires. This is a somewhat doctored example, but the "decision fatigue" that comes with writing Rust is very real. You write C# code, like in the example above, quickly, without having to ponder how to approach it, and move on to other parts of the application, while in Rust there's a good chance you will be forced to deal with it in a much stricter way. It's less of an issue in regular code, but the moment you touch async - something that .NET's task and state machine abstractions solve on your behalf - you will be forced to deal with it by hand. This is, obviously, a tradeoff. There is no way for .NET to use async to implement bare-metal cooperative multi-tasking, while that is a very real and highly impressive ability of Rust. But you don't always need that, and C# offers an ability to compete with Rust and C++ in performance on critical paths, when you need to sit down and optimize, that is unmatched by other languages of a "similar" class (e.g. Java, Go). At the end of the day, both languages have domains they are strong at. C# suffers from design decisions that it cannot walk back and a subpar developer culture (and poor program architecture preferences); Rust suffers from being abrasive in some scenarios and overly ceremonious in others. But other than that, both provide excellent sets of tradeoffs. In 2025, we're spoiled for choice when it comes to performant memory-safe programming languages.
To be honest this sounds like something someone inexperienced would do in any language.
If you're not comfortable in a language, then sure you ponder and pontificate and wonder about what the right approach is, but if you're experienced and familiar then you just do it plain and simple.
What you're describing is not at all a language issue, it's an issue of familiarity and competency.
It's literally not 5x the cost, it would take me 3 minutes to whip up a tokio example. I've done both. I like C# too, I totally understand why you like it so much. This is not a C# vs Rust argument for me. All I'm saying is that Rust is a productive language.
Rust is manual by design because people need to micro-manage resources. If you are experienced in it, it still takes a very little time to code your scenario.
Obviously if you don't like the manual-ness of Rust, just use something else. For what you described I'd reach for Elixir or Golang.
I was disagreeing with you that it's not easy or too difficult. Rust just takes a bit of effort and ramping up to get good at. I hated it as well to begin with.
But again -- niches. Rust serves its niche extremely well. For other niches there are other languages and runtimes.
- I recommend reading the comment history of @neonsunset. He has shared quite a few insights, snippets and benchmarks to make the case that if you do not need the absolute bare metal control C or Rust provides, you are better off with either .net or Java.
- Whereas in .net you have the best native interop imaginable for a high level language with a vast SDK. I understood that Java has improved on JNI, but I am not sure how well that compares.
- Programming languages are like a religion, highly inflammable, so I can imagine you would not be swayed by some rando on the internet. I would already be happy if you choose Go over Python, as with the former you win some type safety (but still have a weak type system) and have a good package manager and deployment story.
- Go was designed for Google, to prevent their college grads from implementing bad abstractions. But good abstractions are valuable. A weak type system isn't a great idea (opinion, but reasonable opinion). Back then .net was not really open source (I believe) and not as slim and fast as it is now, and even then, I think Google wants to have control about their own language for their own internal needs.
- Therefore, if you are not Google, Go should likely not be your top pick. Limited area of application, regrettable decisions, tailored for Google.
> if you do not need the absolute bare metal control C or Rust provides, you are better off with either .net or Java.
That, like your next point, is a relatively fair statement but it's prone to filter bubble bias as I am sure you are aware. I for example have extracted much more value out of Golang... and I had 9 years with Java, back in the EJB years.
Java and .NET are both VM-based and have noticeable startup time. As such, they are best suited for servers and not for tooling. Golang and Rust (and Zig, and D, V and many other compiled languages) are much better in those areas.
> Programming languages are like a religion, highly inflammable, so I can imagine you would not be swayed by some rando on the internet
Those years are long past me. I form my own opinions and I have enough experience to be able to make fairly accurate assessments with minimum information.
> I would already be happy if you choose Go over Python, as with the former you win some type safety (but still have a weak type system) and have a good package manager and deployment story.
I do exactly that. In fact I am a prominent Python hater. Its fans have gone to admirable heights in their striving to fill the gaps, but I wonder whether they will one day realize this is unnecessary and just go where those gaps don't exist. Maybe never?
And yeah, I use Golang for my own purposes. I'm even thinking of authoring my own bespoke sync-and-backup solution built on top of Syncthing, Git and SSH+rsync, and packaging it in Golang. Shell scripts become unpredictable beyond a certain scale.
> Go was designed for Google, to prevent their college grads from implementing bad abstractions. But good abstractions are valuable. A weak type system isn't a great idea (opinion, but reasonable opinion). Back then .net was not really open source (I believe) and not as slim and fast as it is now, and even then, I think Google wants to have control about their own language for their own internal needs.
That and your next point I fully agree with. That being said, Golang is good enough and I believe many of us here preach "don't let perfect be the enemy of good". And Golang symbolizes exactly that IMO; it's good enough but when your requirements start shooting up then it becomes mediocre. I have stumbled upon Golang's limitations, fairly quickly sadly, that's why I am confining it to certain kinds of projects only (and personal tinkering).
> I recommend reading the comment history of @neonsunset.
I don't mind doing that (per se) but I find appeals to authority and credentialism a bit irritating, I admit.
Plus, his reply was in my eyes a fairly low-effort snark.
(I would note that EJB is something from the past. Like .net has also really grown.)
> that's why I am confining it to certain kinds of projects only (and personal tinkering).
You have a fair view of Go, I think. I could see that it makes sense to use it as a replacement for bash scripts, especially if you know the language well.
Personally I am wanting to dive into using F# for my shell scripting needs. The language leans well into those kind of applications with the pipe operator.
If you ever have the appetite, you should take a look at it, as it can be run in interpreted/REPL mode too, which is a nice bonus for quick one-off scripts.
> Java and .NET are both VM-based and have noticeable startup time. As such, they are best suited for servers and not for tooling. Golang and Rust (and Zig, and D, V and many other compiled languages) are much better in those areas.
For JIT-based deployments, it is measured in 100-500ms depending on the size of the application, sometimes below that. .NET has first-party support for a NativeAOT deployment mode for a variety of workloads: web servers, CLI tools, GUI applications and more.
Go is a VM-based language, where the VM provides facilities such as virtual threading with goroutines (which is a higher level of abstraction than .NET's execution model), GC, reflection, and special handling for FFI. Identical to what .NET does. I don't think the cost and performance profile of BEAM needs additional commentary :)
Go also has weaker GC and compiler implementations and, on optimized code, cannot reach the performance grade of C++ and Rust, something C# can do.
> Those years are long past me. I form my own opinions and I have enough experience to be able to make fairly accurate assessments with minimum information.
The comments under your profile seem to suggest the opposite. Perhaps "minimum information" is impeding fair judgement?
> I don't mind doing that (per se) but I find appeals to authority and credentialism a bit irritating, I admit.
Is there a comment you have in mind which you think is engaging in credentialism?
> Is there a comment you have in mind which you think is engaging in credentialism?
The other guy who told me to inspect your profile. Not you.
> The comments under your profile seem to suggest the opposite.
Sigh. You seem to act in bad faith which immediately loses my interest.
You'll have to take your high and mighty attitude to somebody else. Seems like even HN is not immune from... certain tropes, shall we call them, generously.
Even when it's used by mediocre developers, which is probably more than 90% of us, myself very much included? All I've been seeing is Rust being used by very enthusiastic and/or talented developers, who will be productive in any language.
If your baseline is a language that is missing some features that were in standard ML, sure. If you were already using OCaml or F#, Rust doesn't make you any more productive. If you were already using Haskell or Scala, Rust's lack of HKT will actively slow you down.
We need the BEAM VM's guarantees, not yesterday but like 20 years ago, everywhere. The language itself does not matter. But we need that runtime's guarantees!
They have but it's mostly a labor of love and it's very difficult to fit a static type system into a dynamically typed language.
We already have some false positives. Happily the team is very motivated and is grinding away at them, for which we the community are forever grateful.
I worked on a provenance system which would be so completely the wrong solution to this problem that I only bring it up because the 100,000 foot view is still relevant.
I think we are eventually going to end up with some sort of tagged memory with what this is for (such as credentials) and rules about who is allowed to touch it and where it's allowed to go. Instead of writing yet another tool that won't let fields called "password" or "token" or "key" be printed into the logs, but misses "pk", it's going to be no printing any memory block in this arena, period.
I also think we aren't doing enough basic things with backend systems like keeping just a user ID in the main database schema and putting all of the PII in a completely different cluster of machines, that has 1/10 to 1/100th of the sudoer entries on it of any other service in the company. I know these systems are out there, my complaint is we should be talking about them all the time, to push the Recency Effect and/or Primacy Effect hard.
Perl has had a limited data tagging system for decades now, called "taint checking".
If enabled (through a command line switch), all data coming in from the outside (sockets, STDIN etc.) are "tainted", and if you e.g. concatenate a non-tainted and a tainted string, it becomes tainted. Certain sensitive operations, like system() calls or open(), raise an error when used with tainted data.
If you match tainted data with a regex, the match groups are automatically untainted.
It's not perfect, but it demonstrates that such data tagging is possible, and quite feasible if integrated early enough in the language.
Rails has a “safe” attribute that only works for html output, and doesn’t work right for urls (a bug that somehow became my responsibility to fix many times). It’s a limited version of the same thing and I believe Elixir has the same design, and I’ve already seen a reproduction of the Rails flaw in Elixir.
But they are Boolean values, and they need to be an enumeration or, more likely, a bitfield. Even just for the web I've already identified four in this thread: HTML unsafe, URL unsafe, PII unsafe, credentials unsafe. I hesitate to add SQL unsafe because the only solution to SQL injection is NO STRING CONCATENATION. But so many SQL libraries use concatenation even for prepared statements that maybe it should be. Only allow string constants for SQL queries.
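Something like this is the shape I mean (the names are made up; any language with a decent type system could attach it to its string type):

/* A sketch of "unsafe for what?" as a bitfield rather than one boolean. */
enum taint_flags {
    TAINT_HTML_UNSAFE = 1 << 0, /* not yet HTML-escaped */
    TAINT_URL_UNSAFE  = 1 << 1, /* not yet URL-encoded */
    TAINT_PII         = 1 << 2, /* must not be printed into logs */
    TAINT_CREDENTIAL  = 1 << 3  /* must never leave the process */
};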
While I agree with you as a matter of an ideal, the step from one database to two is infinitely larger than from two to more. Given budget, time and engineering constraints, sticking everything in one database is by far the sanest solution for the vast majority of code out there.
I am 100% in favor of industry standards to enforce safety. It should go way past just memory safety, though. Engineering standards should include practices and minimum requirements to prevent safety issues as a whole.
Happy to see the mention of Kotlin's memory safety features here; goes a bit beyond Java with its null safety, encouragement of immutability and smart casting.
I was actually a little surprised to see that in there, I wouldn't really consider those features to be "memory safety" as I traditionally see it over Java.
They don't really lead to exploitable conditions, except maybe DoS if you have poor error handling and your application dies with null pointer exceptions.
> This is easily corrected in C by adding a small extension
Unfortunately, easily corrected it is not. Yes, probably >95% of arrays have a nearby, easily-accessible length parameter that indicates their maximum legal length (excluding the security disaster of null-terminated strings). But the problem is there's no consistent way that people do this. Sometimes people put the pointer first and the size second, sometimes it's the other way around. Sometimes the size is a size_t, sometimes an unsigned, sometimes an int. Or sometimes it's not a pointer-and-size, but a pointer and one-past-the-end pointer pair. Sometimes multiple arrays share the same size parameter.
So instead of an easy solution getting you 90% for effectively free, you get like 30% with the easy solution, and have to make it more complicated to handle the existing diversity to push it back up to that 90%.
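To illustrate with hypothetical signatures: all of these mean "a buffer and its length", and an annotation scheme has to recognize every one of them separately.

#include <stddef.h>

void f1(char* buf, size_t len);        /* pointer first, size_t length */
void f2(int len, char* buf);           /* length first, and it's an int */
void f3(char* begin, char* end);       /* pointer plus one-past-the-end pointer */
void f4(char* a, char* b, size_t len); /* one length shared by two arrays */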
Yeah, but looking back at the attempts to extend C a bit for safety, the only one that seems to have got market traction is MISRA and even then it's pretty limited. D, rust, zig etc all seem to have much more buy in. There must be some reason why a new language works better here -I mean, you're basing your business off D, not a C extension, right?
That's... kind of my point. This mechanism has seen more adoption in a new language than in the existing language. I'm sure it would work technically - there must be some other reason why it's easier to get a new language adopted.
How many security holes are caused by not sanitizing inputs, as opposed to memory safety? It feels like not sanitizing inputs is what enables memory safety exploits, in addition to many other classes of security hole, yet nobody seems to talk about it.
- Buffer overflow: somebody didn't sanitize the input (length of buffer).
- Stack smashing: somebody didn't sanitize the input (length of input).
- Format string vulnerability: somebody didn't sanitize the format string data.
- Integer conversion vulnerability: somebody didn't sanitize the integer input.
- SQL injection: somebody didn't sanitize the input before querying a database.
- Cross-site scripting: somebody didn't sanitize input and it got submitted and executed on behalf of the user.
- Remote file inclusion / Directory traversal: somebody didn't sanitize an input variable leading to a file path.
...and on, and on, and on. If people were as obsessed with input sanitization as they are with memory, I'll bet you a much larger percentage of attacks would be stopped. Too bad input sanitization isn't sexy.
SQL Injection and XSS are actually great examples of vuln classes where the winning strategy is safe APIs rather than diligent sanitization. "Just sanitize all your user inputs" is hard to do correctly because it is difficult to create automatic rules that detect every single possible violation.
Prepared statements and safe HTML construction APIs, plus some linters that scream at you when you use the unsafe APIs, work like magic.
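For example, with a prepared statement (a sketch against SQLite's C API; the table and column names are made up), the user-supplied value is bound as data and can never change the shape of the query:

#include <sqlite3.h>

/* The name is passed as a bound parameter, never concatenated into the SQL,
   so a value like "x' OR '1'='1" is just an odd-looking name, not new SQL. */
int count_users_named(sqlite3* db, const char* name)
{
    sqlite3_stmt* stmt = NULL;
    if (sqlite3_prepare_v2(db, "SELECT COUNT(*) FROM users WHERE name = ?;",
                           -1, &stmt, NULL) != SQLITE_OK)
        return -1;
    sqlite3_bind_text(stmt, 1, name, -1, SQLITE_TRANSIENT);
    int count = (sqlite3_step(stmt) == SQLITE_ROW) ? sqlite3_column_int(stmt, 0) : -1;
    sqlite3_finalize(stmt);
    return count;
}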
You're correct. It's about distinction between code and data.
You should simply discern between HTML elements (code) and HTML text nodes (data).
Same with prepared statements: Clear distinction between SQL code vs SQL data.
You just need to ensure that your data is never interpreted as code.
> You just need to ensure that your data is never interpreted as code.
That's sanitization. Many different languages implement this. The old-school method is "tainting" data so it can't be used as part of execution without an explicit function call to "untaint" it. Same is used for "secret" data in various programs where you don't want it leaked.
> - SQL injection: somebody didn't sanitize the input before querying a database.
> - Cross-site scripting: somebody didn't sanitize input and it got submitted and executed on behalf of the user.
To be technical about it, this is generally a failure of escaping rather than sanitizing.
You're supposed to be able to put anything into a database field and not have it affect the query. You ought to be able to paste JavaScript into a field and have it be displayed as JavaScript, not executed. The inputs remain as they are -- no sanitization -- they just have to be escaped properly.
That being said, I'm 100% on board about the importance of sanitation/validation. To the extent that I think it ought to be part of the design of languages, just like types. I.e. if a function parameter is only allowed to be three string values, or a string between 0 and 10 bytes, or a string limited to lowercase ASCII, these constraints should be expressible.
Sanitizing inputs is important in addition to memory safety.
Sanitizing inputs won't protect you against all bugs. For example, you may store a string in a 256-byte buffer, so you check that the string is no longer than 256 characters, but you forget the zero-terminator, and you have a buffer overflow. Or maybe you properly limited the string to 255 characters, but along the way, you added support for multibyte characters, and you get another buffer overflow.
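A minimal sketch of that first bug (the buffer size and the check are made up to match the description):

#include <string.h>

void store(const char* input)
{
    char buf[256];
    /* The "sanitized" length check lets through strings of exactly 256 chars... */
    if (strlen(input) <= 256) {
        /* ...but strcpy also writes the terminating '\0', so a 256-char
           input writes 257 bytes, one past the end of buf. */
        strcpy(buf, input);
    }
}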
Bounds checking would have caught that.
Injection can happen at any point. You may sanitize user input to avoid SQL injection, but at some later point it may get out of the database and in a format string, but it was sanitized for SQL, not for format strings, leading to a potential injection.
"Sanitising" doesn't work. It's an exploit mitigation strategy, not a sound way of actually preventing bugs. And it doesn't prevent many of the vulnerabilities you list, because many of the things that cause issues don't come from "input" at all (e.g. a lot of buffer overflows can be triggered with "legitimate" input that isn't and couldn't be caught by input sanitisation).
> It's an exploit mitigation strategy, not a sound way of actually preventing bugs.
Actually it is a simple and effective way to prevent general bugs.
If you have an input field called birthday, you can inject it directly into your database. Doing that could cause an SQL injection exploit, so people use prepared statements.
But even if you use prepared statements, you'll still end up with a database column with all kinds of birthday formatting (M-D-Y, Y-M-D, slashes, spaces, colons, etc etc). These different, non-standard formats will eventually cause a bug.
Input sanitization forces you to standardize on one birthday format, and inject it into your database in one format (let's say "YYYY-MM-DD", no other characters allowed). Then all the code expects - and gets - one format. This reduces bugs from unexpected formats.
It has the added side-effect of also eliminating the SQL injection bug, regardless of prepared statement.
> Input sanitization forces you to standardize on one birthday format, and inject it into your database in one format (let's say "YYYY-MM-DD", no other characters allowed). Then all the code expects - and gets - one format.
That's not sanitisation as the word is normally used. That's parsing and canonicalisation. That's a good path to actual security - it leads you to the "make invalid states unrepresentable" style and using decent type systems.
A buffer overflow is just exceeding the size of a buffer, and stack smashing is a modification of either the stack itself or the stack pointer to result in different operations on the stack. These methods can coincide, and they can also exist independently of each other.
Yep. Hence my comment. Smashing the stack is an afterward condition of some kind of memory error. If you’re going to do that, why not mention heap overflow and others? It just seems unusual.
- modern languages prevent you from having to think about it at all. You shouldn’t have to sanitize, it should work
- there are a load of issues that have nothing to do with sanitization but everything to do with memory. Race conditions, type confusion, UAF to name a couple. If we focus on sanitization, we still need memory safety to fix those
Also sanitization is non-trivial for complex inputs
> Looking forward, we're also seeing exciting and promising developments in hardware. Technologies like ARM's Memory Tagging Extension (MTE) and the Capability Hardware Enhanced RISC Instructions (CHERI) architecture offer a complementary defense, particularly for existing code.
>>> IIRC, with CPython the NX bit doesn't work when any imported C extension has nested functions / trampolines
>> How should CPython support the mseal() syscall? [which was merged in Linux kernel 6.10]
> We are collaborating with industry and academic partners to develop potential standards, and our joint authorship of the recent CACM call-to-action marks an important first step in this process. In addition, as outlined in our Secure by Design whitepaper and in our memory safety strategy, we are deeply committed to building security into the foundation of our products and services.
> That's why we're also investing in techniques to improve the safety of our existing C++ codebase by design, such as deploying hardened libc++.
A graded memory safety standard is one aspect of security.
> Tailor memory safety requirements based on need: The framework should establish different levels of safety assurance, akin to SLSA levels, recognizing that different applications have different security needs and cost constraints. Similarly, we likely need distinct guidance for developing new systems and improving existing codebases. For instance, we probably do not need every single piece of code to be formally proven. This allows for tailored security, ensuring appropriate levels of memory safety for various contexts.
> Enable objective assessment: The framework should define clear criteria and potentially metrics for assessing memory safety and compliance with a given level of assurance. The goal would be to objectively compare the memory safety assurance of different software components or systems, much like we assess energy efficiency today. This will move us beyond subjective claims and towards objective and comparable security properties across products.
Does ECC do anything for memory safety? This is about physical errors, while the article is talking about software bugs. Those two are almost orthogonal.
Intel supports ECC, so does AMD, so why would they lobby against it? Intel uses it for market segmentation, but I don't think it is a big deal.
It is just that those who build consumer-grade hardware don't want to spend 12% more on RAM for slightly less performance. Among them are essentially all ARM devices, including smartphones and Apple silicon Macs.
The article in question is published on Google's blog. Has Google resolved memory safety issues in its C++ code base? Did G port their code base to Rust or some other memsafe language? What's preventing them from doing that by themselves?
What's preventing Microsoft, or Apple, or the coagulate Linux kernel team, or any other kernel team, from adopting memsafe technology or practice by themselves for themselves?
The last thing we need is evidently incompetent organizations that can't take care of their own products making standards, or useless academics making standards to try to force other people to follow rules because they think they know better than everyone else.
If the team that designed and implemented KeyKos, or that designed Erlang, were pushing for standardized definitions or levels of memory safety, it would be less ridiculous.
At the same time, consciousness of security issues and memory safety has been growing quickly, and memory safety in programming languages has literally exploded in importance. It's treated in every new PL I've seen.
Putting pressure on big companies to fix their awful products is fine. No pressure needs to be applied to the rest of the industry, because it's already outpacing all of the producing entities that are calling for standards.
If Google has failed so far to resolve mem safety issues in their decades old giant code base, then I'd rather hear standardization ideas from someone who succeeded. If G succeeded at resolving those issues, then that's a concrete positive example for the rest of industry to consider following. They ought to lead by example.
It seems like decades-old giant code bases are precisely the ones hardest to migrate to memory safety. That's where coercion and enforcement are needed most. You and I don't need to be told to start a new project in not-C++, do we? Nearly every trained programmer has been brainwashed (in a good way) with formal methods, type systems, bounds checking, and security concerns. Now those same people who champion this stuff say it isn't enough, and therefore we need to do more of the same but with coercion. That's a failure to understand the problem.
> If Google has failed so far to resolve mem safety issues in their decades old giant code base, then I'd rather hear standardization ideas from someone who succeeded. If G succeeded at resolving those issues, then that's a concrete positive example for the rest of industry to consider following. They ought to lead by example.
Google saw "the percentage of memory safety vulnerabilities in Android dropped from 76% to 24% over 6 years as development shifted to memory safe languages" - which I'd say is a positive example.
It's not that they've already fully succeeded (I don't think anyone has on codebases of this size), but neither is it that they tried and failed - it's an ongoing effort.
> You and I don't need to be told to start a new project in not-C++ do we?
Don't need to be told because we all already avoid C++, or don't need to be told because it doesn't really matter if we do use C++?
I'd disagree with both. There are still many new projects (or new components of larger systems) being written in C++, and it's new code that tends to have the most vulnerabilities.
So many mentions of CHERI - both in this post and in the linked CACM article. I doubt CHERI will be the future considering how long it's been around for and how few actually successes have come out of it.
Also, hilariously, Fil-C is faster than CHERI today (the fastest CHERI silicon available today will be slower than Fil-C running on my laptop, probably by an order of magnitude). And Fil-C is safer.
Which sort of brings up another issue - this could be a case where Google is trying to angle for regulations that support their favorite strategy while excluding competition they don't like.
What is the distinction between this approach and Address Sanitizer https://clang.llvm.org/docs/AddressSanitizer.html ? If I understand correctly, Fil-C is a modified version of LLVM. Is your metadata more lightweight, catches more bugs? Could it become a pass in regular LLVM?
Fil-C is memory safe. Asan isn't.
For example, asan will totally let you access out of bounds of an object. Say buf[index] is an out-of-bounds access that ends up inside of another object. Asan will allow that. Fil-C won't. That's kind of a key spacial safety protection.
Asan is for finding bugs, at best. Fil-C is for actually making your code memory safe.
Sorry, I’m not a C specialist. What are Fil-C and CHERI? eg. Safe subsets of C, static analysis tools, C toolsets to ensure memory safety?
CHERI is a hardware architecture and instruction set to add safety-related capabilities to processors. See https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/ In this context a capability means a way to track and enforce which memory area a pointer can point into. Typically this has to be coupled with a compiler which will initialize the capability for each pointer.
Fil-C seems to be a C variant that adds capabilities and garbage collection. See https://github.com/pizlonator/llvm-project-deluge/blob/delug...
It isn't CHERI; however, Solaris SPARC ADI has more than proven its usefulness. It isn't more widely deployed due to the reasons we all know.
> Fil-C is currently 1.5x slower than normal C in good cases, and about 4x slower in the worst cases. I'm actively working on performance optimizations for Fil-C, so that 4x number will go down.
I am pretty sure you cannot go much lower than 1.2 here on the best cases. In contrast, CHERI on good hardware will easily be as close to current performance as possible.
Do you say that CHERI will be "as close to current performance as possible" with the same energy consumption?
Cheri hardware is going to be more than 1.2x slower than the fastest non-Cheri hardware.
Not really buying your thesis here: Attempts to retrofit safely to C have been around a lot longer than Cheri; fil-c is just the latest and there's no obvious reason why it should be more successful.
The speed comparison with a laptop is just disingenuous. Is a device with Cheri integrated slower than one of the same class without?
> Attempts to retrofit safely to C have been around a lot longer than Cheri; fil-c is just the latest and there's no obvious reason why it should be more successful.
It is true that memory-safe C compilers have existed for decades and have seen minimal adoption.
However, improvements to clang/llvm could yield wider impact and benefit than previous efforts, since they may be supported in a widely used C toolchain.
-fbounds-safety is another change that may see more adoption if it makes it into mainline clang/llvm
https://clang.llvm.org/docs/BoundsSafetyAdoptionGuide.html
> Not really buying your thesis here: Attempts to retrofit safely to C have been around a lot longer than Cheri; fil-c is just the latest and there's no obvious reason why it should be more successful.
Here's the difference with Fil-C: It's totally memory safe and fanatically compatible with C/C++, no B.S. The other approaches tend to either create a different language (not compatible), or tend to create a superset of C/C++ with a safe subset (not totally memory safe, only compatible in the sense that unsafe code continues to be unsafe), or just don't fully square the circle (for example SoftBound could have gotten to full compatibility and safety, but the implementation just didn't).
> The speed comparison with a laptop is just disingenuous. Is a device with Cheri integrated slower than one of the same class without?
Not disingenuous at all.
The issue is that:
- high volume silicon tends to outperform low volume silicon. Fil-C runs on x86_64 (and could run on ARM64 if I had the resources to regularly test on it). So, Fil-C runs on the high volume stuff that gets all of the best optimizations.
- silicon optimization is best done incrementally on top of an already fast chip, where the software being optimized for already runs on that chip. Then it's a matter of collecting traces on that software and tuning, rather than having to think through how to optimize a fast chip from first principles. CHERI means a new register file and new instructions so it doesn't lend itself well to the incremental optimization.
So, I suspect Fil-C will always be faster than CHERI. This is especially true if you consider that there are lots of possible optimizations to Fil-C that I just haven't had a chance to land yet.
I've been really impressed with what you're doing with Fil-C, but:
> Here's the difference with Fil-C: It's totally memory safe and fanatically compatible with C/C++, no B.S.
Is this true on both counts? If I'm reading your docs right, you're essentially adding hidden capabilities to pointers. This is a great technique that gives you almost perfect machine-level compatibility by default, but it comes with the standard caveats:
1. Your type safety/confusion guards are essentially tied to pointer "color," and colors are finite. In other words, in a sufficiently large program, an attacker can still perform type confusion by finding types with overlapping colors. Not an issue in small codebases, but maybe in browser- or kernel-sized ones.
2. In terms of compatibility, I'm pretty sure this doesn't allow a handful of pretty common pointer-integer roundtrip operations, at least not without having the user/programmer reassign the capability to the pointer that's been created out of "thin air." You could argue correctly that this is a bad thing that programmers shouldn't be doing, but it's well-defined and common enough IME.
(You also cited my blog's writeup of `totally_safe_transmute` as an example of something that Fil-C would prevent, but I'm not sure I agree: the I/O effect in that example means that the program could thwart the runtime checks themselves. Of course, it's fair to say that /proc/self/mem is a stunt anyways.)
My type safety and confusion guards have nothing to do with colors of any kind. There are no finite colors to run out of.
Fil-C totally allows pointer to integer round tripping in many cases, if the compiler can see it’s safe.
I’m citing unsafe transmute as something that Fil-C doesn’t prevent. It’s not something that memory safety prevents.
> My type safety and confusion guards have nothing to do with colors of any kind. There are no finite colors to run out of.
I'm having trouble seeing where the type confusion protection properties come from, then. I read through your earlier (I think?) design that involved isoheaps and it made sense in that context, but the newer stuff (in `gimso_semantics.md` and `invisicap.txt`) seems to mostly be around bounds checking instead. Apologies if I'm missing something obvious.
> I’m citing unsafe transmute as something that Fil-C doesn’t prevent. It’s not something that memory safety prevents.
I think the phrasing is confusing, because this is what the manifesto says:
> No program accepted by the Fil-C compiler can possibly go on to escape out of the Fil-C type system.
This to me suggests that Fil-C's type system detects I/O effects, but I take it that wasn't the intended suggestion.
Here’s a write up that goes into more detail: https://github.com/pizlonator/llvm-project-deluge/blob/delug...
I read that, but I'm not seeing where the type is encoded in the capability.
Short answer: if you know how CHERI and SoftBound do it, then Fil-C is basically like that.
Long answer: let's assume 64-bit (8 byte pointers) without loss of generality. Each capability knows, for each 8 bytes in its allocation, whether those 8 bytes are a pointer, and if so, what that pointer's capability is.
Example:
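(a minimal sketch of the kind of thing meant; the variable name and the void** view of the allocation are assumptions)

    void **p = malloc(64);   /* 64 bytes = eight 8-byte slots on a 64-bit target */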
This will allocate 64 bytes. p's capability will know, for each of the 8 8-byte slots, if that slot is a pointer and, if so, what its capability is. Since you just allocated the object, none of them have capabilities. Then if you do:
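(continuing the sketch, and again assuming the same p)

    p[1] = malloc(16);   /* slot 1, i.e. byte offset 8, now holds a pointer */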
Then p's capability will know that at offset 8, there is a pointer, and it will know that the capability is whatever came out of the malloc. Hence, each capability is dynamically tracking where the pointers are. So it's not a static type but rather something that can change over time.
There's a bunch of engineering that goes into this being safe under races (it is) and for supporting pointer atomics (they just work).
BTW I wrote another doc to try to explain what's happening. Hope this helps.
https://github.com/pizlonator/llvm-project-deluge/blob/delug...
Thank you, I found this very helpful!
I read the Fil-C overview, and I was confused by one thing: how does Fil-C handle integer-to-pointer conversions? Rust has the new strict provenance API that is somewhat explicitly designed to avoid a need to materialize a pointer capability from just an integer, but C and C++ have no such thing. So if the code does:
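(a hypothetical sketch of the pattern in question; the deref helper and the int types are assumptions)

    #include <stdint.h>

    int deref(uintptr_t bits) {
        int *q = (int *)bits;        /* integer -> pointer: which capability? */
        return *q;
    }

    int use(int *p) {
        return deref((uintptr_t)p);  /* p's bits cross over as a plain integer */
    }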
Does this fail unconditionally? Or is there some trick by which it can succeed if p is valid? And, if the latter is the case, then how is memory safety preserved?

edit: I found zptrtable and this construct:
https://github.com/pizlonator/pizlonated-quickjs/commit/258a...
The latter seems to indicate that:
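(presumably a one-line cast-and-add along these lines; the exact expression and the type of p are assumptions)

    p = (char *)((uintptr_t)p + 1);   /* out to an integer and back on the same line */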
is at least a semi-reliable way to increment p by one. I guess this is a decent way to look like C and to keep a widely-used pattern functional. Rust's with_addr seems like a more explicit and less magical way to accomplish the same thing. If Fil-C really takes off, would you want to add something like with_addr? Is allowing the pair of conversions on the same line of code something that can be fully specified and guaranteed to compile correctly such that it never accidentally produces a pointer with no capability?

Your deref function will fail, yeah.
The pair of conversions is guaranteed to always produce a pointer with a capability. That's how I implemented it and it's low-tech enough that it could be specified.
How far can the pair of conversions be pushed? Will this work:
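(say, a sketch along these lines, where the integer half of the round trip goes through a function f; the exact shape is an assumption)

    #include <stdint.h>

    static uintptr_t f(char *p) {
        return (uintptr_t)p;         /* pointer -> integer, inside a callee */
    }

    char *bump(char *p) {
        return (char *)(f(p) + 1);   /* integer -> pointer, back in the caller */
    }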
Does it matter if f is inline?

Could someone implement Rust's with_addr as:
FWIW, I kind of like zptrtable, and I think Fil-C sounds awesome. And I'm impressed that you were able to port large code bases with as few changes as it seems to have taken.

Your first example will hilariously work if `f` is inline and simple enough and optimizations are turned on. I'm not sure I like that, so I might change it. I'd like to only guarantee that you get a capability in cases where that guarantee holds regardless of optimization (and I could achieve that with some more compiler hacking).
Not sure about the semantics of with_addr. But note that you can do this in Fil-C:
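(roughly this kind of thing, as a sketch; the helper name and exact expression are assumptions)

    #include <stdint.h>

    /* pure pointer arithmetic: keep p's capability, land at address addr */
    static inline void *ptr_at(void *p, uintptr_t addr) {
        return (char *)p + (addr - (uintptr_t)p);
    }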
I have a helper like that called `zmkptr` (it's just an inline function that does exactly the above).

with_addr is basically that, but with a name and some documentation:
https://doc.rust-lang.org/std/primitive.pointer.html#method....
As I understand it, Rust added this in part for experiments with CHERI but mostly for miri.
Interestingly, the implementation of with_addr is very similar to your code.
How do you handle cases where there are multiple possible sources of the capability? For example:
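(a hypothetical sketch of such a case)

    #include <stdint.h>

    int *pick(int *p, int *q, int use_q) {
        /* two possible pointers the capability could come from, on one line */
        return (int *)(use_q ? (uintptr_t)q : (uintptr_t)p);
    }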
I'm not sure I would allow this into any code I maintain, but still. There's also the classic xor-list, and someone has probably done it like that. And this needs to either result in a compiler error or generate some kind of code.

Rust's with_addr wins points for being explicit and unambiguous. It obviously loses points for not being C. And Rust benefits here from all of this being in the standard library and from some of the widely-available tooling (miri) getting mad if code doesn't use it. I can imagine a future Fil-Rust project doing essentially the same thing as Fil-C, except starting with Rust code. It might be interesting to see how the GC part would interact with the rest of the language.
My compiler analysis says that if you have two possible pointers that a capability might come from, like in your first example, then you get no capability at all. I think that's a better semantics than picking some capability at random.
If you want to be explicit about where the capability comes from, use `zmkptr`.
> Here's the difference with Fil-C: It's totally memory safe and fanatically compatible with C/C++, no B.S. The other approaches tend to either create a different language (not compatible), or tend to create a superset of C/C++ with a safe subset (not totally memory safe, only compatible in the sense that unsafe code continues to be unsafe), or just don't fully square the circle (for example SoftBound could have gotten to full compatibility and safety, but the implementation just didn't).
That's exactly the kind of thing that the boosters of all those previous efforts said. But somehow it never quite worked out.
> That's exactly the kind of thing that the boosters of all those previous efforts said
I don't think this is true.
- D, Swift, Rust, Zig: different languages, and while they do have FFI, using it means you're only as safe as your C code
- CHERI: requires hardware support to be practical
- Checked C, CCured, SAFECode (IIRC?): too expensive
- AddrSan@runtime, ARM MTE, SoftBound: mitigations with too many holes
I don't know of many (to be honest, can't think of any) other serious attempts at making a system that tries to cover all three of
A) lets you write normal C
B) covers all the gaps
C) doesn't kill performance
Well I've got 2/3 so far!
Maybe 3/3 depending on your workload and definition of "killing performance". It's less than 2x slower for some stuff.
The good news is Fil-C is getting faster all the time, and there are still so many obvious optimizations that I haven't gotten around to.
Solaris SPARC ADI.
https://docs.oracle.com/en/operating-systems/solaris/oracle-...
I don't get it, what's the catch? Why isn't everyone using Fil-C immediately everywhere?
Awful performance. Usually 2x worse than C and 4x worse in the worst case. Given the comment by Fil-C's creator minimizing the performance issue [0], I wouldn't get my hopes up.
[0] https://news.ycombinator.com/item?id=43190938
I guess you missed the point of that post.
I’ll summarize: language implementations get faster over time. Young ones tend to be slow. Fil-C is a young implementation that still has lots of unoptimized things. Also, Fil-C being 2x slower than C means it’s already faster than many safe languages. And, for a lot of C use cases perf doesn’t matter as much as the hype suggests.
The fact that young implementations are slow is something that’s worth understanding even if you don’t care about fil-C. It suggests, for example, that if someone invents a new language and their initial implementation is slow, then you can’t use that fact to assume that it’ll be slow forever. I think that’s generally a useful lesson.
I care about performance a lot and Fil-C has gotten about 100x faster since the first prototype. It’ll keep getting faster.
Lots of reasons.
Here's one: even just switching from gcc or msvc to clang, in projects that really want to, takes years.
Here's another one: the Fil-C compiler is really young, so it almost certainly still has bugs. Those compilers that folks actually use in anger tend to get qualified on ~billions of lines of code before anyone other than the compiler devs touches them. The Fil-C compiler is too young to have that level of qualification.
So "immediately everywhere" isn't going to happen. At best it'll be "over a period of time and incrementally".
> That's exactly the kind of thing that the boosters of all those previous efforts said. But somehow it never quite worked out.
No, they really didn't. Let's review some of the big cases.
- SafeC: not based on a mainstream C compiler, so can't handle gcc/clang extensions (Fil-C can). Had no story for threads or shared memory (Fil-C does). Hence, not really compatible.
- CCured: incompatible (cannot compile C code with it without making changes, or running their tool that tries to automate the changes - but even then, common C idioms like unions don't quite work). Didn't use a major C compiler, either.
- SoftBound: not totally memory safe (no attempt to provide safety for linking or function calls). But at least it's highly compatible.
I can list more examples. Fil-C is the first to get both compatibility and safety right.
> Fil-C is the first to get both compatibility and safety right.
Has any impartial third party reached that conclusion? Because honestly the way I remember it everyone says this kind of thing when it's their own project, a lot of the people behind these previous efforts were just as confident as you are.
Not in any official capacity but it’s been looked at by other C compiler experts, other programming language experts, GC experts, and security experts. Folks who have looked at it deeply agree with those claims. And I hope they would have told me if they didn’t believe anything about my claims!
Also, it always had a material performance impact. People write C++, and to a lesser extent C, because they really, really care about performance. If they didn’t care about performance there are easier languages to use.
Talking about performance impact is missing the bigger picture of how languages become performant. "Really really care about performance" describes some C/C++ programmers, but definitely not all of them. Finally, Fil-C is already faster than a lot of other safe languages (definitely faster than TypeScript, yet lots of stuff ships in TypeScript).
Language implementations get faster over time and young ones tend to be slow. The Fil-C implementation is young. So were all of the previous attempts at memory-safe C - usually an implementation with at most a few person-years of investment (because it was done in an academic setting). Young implementations tend to be slow because the optimization investment hasn't happened in anger. So, "past academic attempts were slow" is not a great reason to avoid investigating memory safe C.
Performance focus is not the reason why all of the world's C/C++ code gets written. Maybe that's even a minority reason. Lots of stuff uses C/C++ because of reasons like:
- It started out in C/C++ so it continues to be in C/C++. So many huge projects are in this boat.
- You're critically relying on a library whose only bindings are in C/C++, or the C/C++ bindings are the most mature, or the most easy to use.
- You're doing low-level systems stuff, and having pointers that you can pass to syscalls is a core part of your logic.
- You want to play nice with the dynamic linking situation on the OS you're targeting. (C/C++ get dynamic linking right in a way other languages don't.)
I'd guess less than half of the C/C++ code that's being written today is being written because the programmer was thinking "oh man, this'll be too slow in any other language".
Finally, Fil-C is already faster than a lot of memory safe languages. It's just not as fast as Yolo-C, but I don't think you can safely bet that this will be true forever.
> Fil-C is faster than CHERI
Except for those GC pauses...
No GC pauses. Fil-C uses a concurrent GC.
For CHERI to be fully safe, it basically needs a GC. They just call it something else. They need it to clean up capabilities to things that were freed, which is the same thing that Fil-C uses GC for.
How about incentives to write safe code even in C? They do not exist.
You are not rewarded for:
1) Formal proofs or careful programming. No one cares if a piece of software works quietly for years.
2) Preventing others from ruining a working piece of software. To the contrary, you will be called a gatekeeper and worse things.
You are rewarded for:
1) Wild ideas, quickly and badly implemented with the proper amount of marketing.
2) Churn, "social" coding, and LGTM.
3) Ironically, if you are a security researcher, finding exploits can help, too. As above, preventing exploits in the first place is regarded as a waste of time.
All of the above is true at Google. But of course they have a technical solution to a social problem. Which might catch one category of bugs at best.
Being completely serious, people will use whatever works. If what works is written in C, people will use it. The average person seriously doesn't care what language a thing is written in. The average person cares that the software in question works. Despite being written in C, most software today works reasonably well. Is it perfect? No. Will the rusty equivalent be perfect on day 1? No.
Yeah, we know the market won't solve this. That's why people are talking about government standards.
I can't help but think that those lazy mathematicians might benefit from a congressional order to clean up that twin prime problem too.
If memory safety was "just the right regulations" easy, it would have already been solved. Every competent developer loves getting things right.
I can imagine a lot more "compliance" than success may be the result of any "progress" with that approach.
The basic problem is challenging, but what makes it hard-hard is the addition of a mountain of incidental complexity. Memory safety as a retrofit on languages, tools and code bases is a much bigger challenge than starting with something simple and memory safe, and then working back up to something with all the bells and whistles that mature tool ecosystems provide for squeezing out that last bit of efficiency. Programs get judged 100% on efficiency (how fast can you get this working? how fast does it run? how much is our energy/hardware/cloud bill?), and only 99% or so on safety.
If the world decided it could get by on a big drop in software/computer performance for a few years while we restarted with safer/simpler tools, change would be quick. But the economics would favor every defector so much that ... that approach is completely unrealistic.
It is going to get solved. The payoff is too high, and the pain is too great, for it not to. But not based on a concept of a plan or regulation.
> If memory safety was "just the right regulations" easy, it would have already been solved.
Memory safety is already a solved problem in regulated industries. It's not a hard problem as such. People just don't want to solve it and don't have any incentive to: companies aren't penalised for writing buggy software, and individual engineers are if anything rewarded for it.
> Every competent developer loves getting things right.
Unfortunately a lot of developers care more about being able to claim mastery of something hard. No-one gets cred for just writing the thing in Java and not worrying about memory issues, even though that's been a better technical choice for decades for the overwhelming majority of cases.
> Memory safety is already a solved problem in regulated industries. It's not a hard problem as such.
It's not hard, no, but it is expensive, because those regulations have a battery of tests run by a third party that you will pay money to each time you want to recertify.
I've worked in two regulated industries; the recertification is the expensive part, not the memory errors.
> Memory safety is already a solved problem
Most famously in Rust. Even there it takes work.
The problem is a practical one of coding efficiency (and quality). You are right that there are no intractable memory problems even in the unsafest, least helpful languages.
Regulated industries have overwhelmingly boring and expensive software compared to others. They do things like banning recursion and dynamic arrays lol. Memory safety in every aspect possible just isn't worth it for most applications. And the degree of memory safety that is worth it is a lot less than Rust developers seem to think, and the degree of memory safety granted by Rust is less than they think as well.
Memory safety isn't worth it as long as leaking all your users' data (and granting attackers control over their systems) doesn't cost much. As attacks get more sophisticated and software gets more important, the costs of memory unsafety go up.
What you've said is true but I still think the problem is overblown, and solutions at the hardware level are disregarded in favor of dubious and more costly software rewrite solutions. If something like CHERI was common then it would automatically find most security-related memory usage bugs, and thus lead to existing software getting fixed for all hardware.
IME you can't reliably extract the intent from the C code, much less the binary, so you can't really fix these bugs without a human rewriting the source. The likes of CHERI might make exploitation harder, but it seems to me that ROP-style workarounds will always be possible, because fundamentally if the program is doing things that look like what it was meant to do then the hardware can never distinguish whether it's actually doing what it was meant to do or not. Even if you were able to come up with a system that ensured that standards-compliant C programs did not have memory bugs (which is already unlikely), that would still require a software rewrite approach in practice because all nontrivial C programs/libraries have latent undefined behaviour.
> IME you can't reliably extract the intent from the C code, much less the binary, so you can't really fix these bugs without a human rewriting the source.
I am pretty sure that the parent is talking about hardware memory safety which doesn't require any "human rewriting the source".
> I am pretty sure that the parent is talking about hardware memory safety which doesn't require any "human rewriting the source".
It does though. The hardware might catch an error (or an "error") and halt the program, but you still need a human to fix it.
> but you still need a human to fix it.
The same thing can be said about a Rust vector OOB panic or any other bug in any safe language. Bugs happen which is why programmers are employed in the first place!
> The same thing can be said about a Rust vector OOB panic or any other bug in any safe language. Bugs happen which is why programmers are employed in the first place!
Sure, the point is you're going to need the programmer either way, so "hardware security lets us detect the problem without rewriting the code" isn't really a compelling advantage for that approach.
If a program halts, that is a narrow security issue that will not leak data. Humans need to fix bugs, but that is nothing new. A memory bug with such features would be hardly more significant than any other bug, and people would get better at fixing them over time because they would be easier to detect.
> If a program halts, that is a narrow security issue that will not leak data.
Maybe. Depends what the fallback for the business that was using it is when that program doesn't run.
> Humans need to fix bugs, but that is nothing new. A memory bug with such features would be hardly more significant than any other bug
Perhaps. But it seems to me that the changes that you'd need to make to fix such a bug are much the same changes that you'd need to make to port the code to Rust or what have you, since ultimately in either case you have to prove that the memory access is correct. Indeed I'd argue that an approach that lets you find these bugs at compile time rather than run time has a distinct advantage.
>Perhaps. But it seems to me that the changes that you'd need to make to fix such a bug are much the same changes that you'd need to make to port the code to Rust or what have you, since ultimately in either case you have to prove that the memory access is correct.
No, you wouldn't need to prove that the memory access is correct if you relied on hardware features. Or I should say, that proof will be mostly done by compiler and library writers who implement the low level stuff like array allocations. The net lines of code changed would definitely be less than a complete rewrite, and would not require rediscovery of specifications that normally has to happen in the course of a rewrite.
>Indeed I'd argue that an approach that lets you find these bugs at compile time rather than run time has a distinct advantage.
It is an advantage but it's not free. Every compilation takes longer in a more restrictive language. The benefits would rapidly diminish with the number of instances of the program that run tests, which is incidentally one metric that correlates positively with how significant bugs actually are. You could think of it as free unit tests, almost. The extra hardware does have a cost but that cost is WAAAY lower than the cost of a wholesale rewrite.
> No, you wouldn't need to prove that the memory access is correct if you relied on hardware features. Or I should say, that proof will be mostly done by compiler and library writers who implement the low level stuff like array allocations. The net lines of code changed would definitely be less than a complete rewrite, and would not require rediscovery of specifications that normally has to happen in the course of a rewrite.
I don't see how the hardware features make this part any easier than a Rust-style borrow checker or avoid requiring the same rediscovery of specifications. Checking at runtime has some advantages (it means that if there are codepaths that are never actually run, you can skip getting those correct - although it's sometimes hard to tell the difference between a codepath that's never run and a codepath that's rarely run), but for every memory access that does happen, your compiler/runtime/hardware is answering the same question either way - "why is this memory access legitimate?" - and that's going to require the same amount of logic (and potentially involve arbitrarily complex aspects of the rest of the code) to answer in either setting.
The human might say, sorry my C program is not compatible with your hardware memory safety device. I won't/can't fix that.
That's possible but unlikely. I would be OK with requiring software bugs like that to be fixed, unless it can be explained away as impossible for some reason. We could almost certainly move toward requiring this kind of stuff to be fixed much more easily than we could do the commonly proposed "rewrite it in another language bro" path.
There's no such thing as hardware memory safety, with absolutely no change to the semantics of the machine as seen by the compiled C program. There are going to be false positives.
> There are going to be false positives
Of course, but compare it with rewriting it to a completely different language.
There may be some cases where code would need to be adjusted or annotated to use CHERI well, but that has to be easier than translating to or interfacing with another language.
How many modern apps are running inside a browser, one way or another? The world’s already taken that big drop on performance.
When you can’t convince people it’s better, you need to force them to do it.
Did you forget a /s? It seems that if you can't convince a majority of programmers that your new language is good enough to learn, maybe it actually isn't as good as its proponents claim. It is likely the case that rewriting everything in a new language for marginally less bugs is a worse outcome than just dealing with the bugs.
I agree. I don’t think we need a government computer language force. Terry is a prophet.
Mind you, the government has tried this before with Ada. Not to knock Ada but let's just say that government would ruin everything and stifle the industry. Certainly, any new regulations about anything as broad as how memory is allowed to be managed is going to strangle the software industry.
If this has to be forced, it probably isn't necessary or very beneficial. How much will it cost to conform to these "standards" versus not? Who stands to gain by making non-conformant software illegal? I think it is clearly far too expensive to rewrite all software and retrain all programmers to conform to arbitrary standards. Hardware solutions to improve memory safety already exist and may ultimately be the best way to achieve the goal.
It seems to me that Rust programmers, unhappy with the pace of adoption of Rust, seek to make other languages illegal because they do things different from Rust.
Use Rust for kernel/system programming, use Lisp/Go/Java/C# for backend, use Typescript+wasm for frontend. We have everything already.
That doesn't address existing codebases. Neither the Linux kernel nor the Chromium project is going to replace all its memory-unsafe code, so there are design challenges that need to be solved that are more complicated than "these memory-safe languages are available for your problem domain".
What is your opinion on deploying C++ codebases with mitigations like CFI and bounds checking?
Let us say I have a large C++ codebase which I am unwilling to rewrite in Rust. But I:
* Enable STL bounds checking using appropriate flags (like `-D_GLIBCXX_ASSERTIONS`).
* Enable mitigations like CFI and shadow stacks.
How much less safe is "C++ w/ mitigations" than Rust? How much of the "70% CVE" statistic is relevant to my codebase?
We recorded an episode (there's a transcript) about this exact issue:
https://securitycryptographywhatever.com/2024/10/15/a-little...
With due respect, the blog you have linked looks like the average Rust marketing material. It does absolutely nothing to address my concerns. I did a `Ctrl-F` and found zero hits of any of the following terms:
* CFI
* isoheaps or type-stable allocators
* Shadow stacks
(There is just a single hit of "C++"...)
Ignoring the appeal to authority, I have a hard time believing that incrementally rewriting my C++ code in Rust or just writing new code in Rust ("vulnerabilities exponentially decay" and all that) is going to give me more actual security than the mitigations stated above. Most, if not all, high-profile exploits stem from out-of-bounds accesses and type confusions, which these mitigations prevent at very low cost.
Thanks for replying, though.
If what you're interested in is an "everything must be Rust" vs. "everything must be C++" knock-down drag-out, I'm not interested.
They prevent but do not entirely mitigate.
I am not interested in adhering to some arbitrary purity standard (like "memory safety" in this case). Almost always, purity ideologies are both irrational and harmful. What I am actually interested is to prevent real problems like remote code execution and Heartbleed-esque leakage of private data and for this, mitigations like CFI, shadow stacks and bounds checking are enough.
> They prevent but do not entirely mitigate.
Ignoring the semantic difference between "prevent" and "mitigate", if at the end of the day, the security provided by the two different approaches are quite similar, I don't get the problem.
If you have an example of a successful widespread exploit that would have happened even with these mitigations, please share.
They’re not enough. For example the field I work in (mobile exploits) continues to bypass CFI (PAC) via clever data-only attacks or abusing TOCTOU issues.
Nah, ima use rust for all of that because I’m too lazy to manage multiple tech stacks.
This, but I'm getting tired of people using Rust for things that really should be in C# or Java.
"really should be" ?
"in C# or Java" ?
What do you even base these claims on? Do you know what C# and Java threads have that Rust doesn't? Data races. And don't get me started on the biggest paradigm failure that is OOP.
Projects I've seen at work. Projects posted on Hacker News. Data races aren't usually an issue for backend services, and modern Java/C# is multi-paradigm.
> Data races aren't usually an issue for backend services
I beg to differ unless all your logic is in the database with strong isolation guarantees.
Speaking of C# for backends that are using EF actively, I bet there are bugs in pretty much all of them caused by incorrect applications of optimistic concurrency.
Both have garbage collection, though, which leads to higher developer productivity compared to Rust's affine types.
> higher developer productivity
Where have you been the past 5 years? Rust developers are insanely productive.
Can we put this myth to rest already? Rust being an "unproductive language" is thoroughly dis-proven.
There are domains where C# (and F#) productivity stems from reasons similar to why writing a game in something that isn't Rust might be more productive, without even sacrificing performance (or at least not to a drastic extent).
I can give you an example:
How would you write this idiomatically in Rust without using unsafe?

To avoid misunderstanding, I think Rust is a great language, and even if you are a C# developer who does not plan to actively use it, learning Rust is still of great benefit because it forces you to tackle the concepts that implicitly underpin C#/F# in an explicit way.
There's a few things here that make this hard in Rust:
First, the main controller may panic and die, leaving all those tasks still running; while they run, they still access the two local variables, `number` and `delay`, which are now out of scope. My best understanding is that this doesn't result in undefined behavior in C#, but it's going to be some sort of crash with unpredictable results.
I think the expectation is that tasks use all cores, so the tasks also have to be essentially Send + 'static, which kinda complicates everything in Rust. Some sort of scoped spawning would help, but that doesn't seem to be part of core Tokio.
In C#, the number variable is a simple integer, and while updating it is done safely, there's nothing that forces the programmer to use Interlocked.Read or anything like that. So the value is going to be potentially stale. In Rust, it has to be declared atomic at the start.
Despite the `await delay`, there's nothing that waits for the tasks to finish; that counter is going to continue incrementing for a while even after `await delay`, and if its value is fetched multiple times in the main task, it's going to give different results.
In C#, the increment is done in Acquire-Release mode. Given nothing waits for tasks to complete, perhaps I'd be happy with Relaxed increments and reads.
So in conclusion: I agree, but I think you're arguing against Async Rust, rather than Rust. If so, that's fair. It's pretty much universally agreed that Async Rust is difficult and not very ergonomic right now.
On the other hand, I'm happy Rust forced me to go through the issues, and now I understand the potential pitfalls and performance implications a C#-like solution would have.
Does this lead to the decision fatigue you mention in another sub-thread? It seems like it would, so I'll give you that.
For posterity, here's the Rust version I arrived at:
https://play.rust-lang.org/?version=stable&mode=debug&editio...

I am not sure what you are trying to represent with this example, but here is the exact same thing without any unsafe:
    use rayon::prelude::*;
    use std::time::{Instant, Duration};
    use std::sync::atomic::{AtomicUsize, Ordering};

    fn main() {
    }

You can make it even simpler if you would use Mutex instead of atomics. Atomics are more performant though.
> How would you write this idiomatically in Rust without using unsafe?
Channels and selects. It's trivial.
Please post a snippet.
    use std::{
        sync::{
            Arc,
            atomic::{AtomicBool, AtomicUsize, Ordering},
        },
        time::Duration,
    };

    fn main() {
        let num = Arc::new(AtomicUsize::new(0));
        let finished = Arc::new(AtomicBool::new(false));
What if we want to avoid explicitly spawning threads and blocking the current one every time we do this? Task.Run does not create a new thread besides those that are already in the threadpool (which can auto-scale, sure, but you get the idea, assuming the use of Tokio here).
What you're asking for is thread parking. Use tokio for that, it's still trivial.
I was implying that yes, while it is doable, it comes at a 5x cognitive cost because of the micromanagement it requires. This is a somewhat doctored example, but the "decision fatigue" that comes with writing Rust is very real. You write the C# code, like in the example above, quickly, without having to ponder how you should approach it, and move on to other parts of the application, while in Rust there's a good chance you will be forced to deal with it in a much stricter way.

It's less of an issue in regular code, but the moment you touch async - something that .NET's task and state machine abstractions solve on your behalf - you will be forced to deal with it by hand. This is, obviously, a tradeoff. There is no way for .NET to use async to implement bare-metal cooperative multi-tasking, while that is a very real and highly impressive ability of Rust. But you don't always need that, and C# offers an ability to compete with Rust and C++ in performance in critical paths, when you need to sit down and optimize, that is unmatched by other languages of "similar" class (e.g. Java, Go).

At the end of the day, both languages have domains they are strong at. C# suffers from design decisions that it cannot walk back and a subpar developer culture (and poor program architecture preferences); Rust suffers from being abrasive in some scenarios and overly ceremonious in others. But other than that, both provide excellent sets of tradeoffs. In 2025, we're spoiled for choice when it comes to performant memory-safe programming languages.
To be honest this sounds like something someone inexperienced would do in any language.
If you're not comfortable in a language, then sure you ponder and pontificate and wonder about what the right approach is, but if you're experienced and familiar then you just do it plain and simple.
What you're describing is not at all a language issue, it's an issue of familiarity and competency.
It's literally not 5x the cost, it would take me 3 minutes to whip up a tokio example. I've done both. I like C# too, I totally understand why you like it so much. This is not a C# vs Rust argument for me. All I'm saying is that Rust is a productive language.
Rust is manual by design because people need to micro-manage resources. If you are experienced in it, it still takes a very little time to code your scenario.
Obviously if you don't like the manual-ness of Rust, just use something else. For what you described I'd reach for Elixir or Golang.
I was disagreeing with you that it's not easy or too difficult. Rust just takes a bit of effort and ramping up to get good at. I hated it as well to begin with.
But again -- niches. Rust serves its niche extremely well. For other niches there are other languages and runtimes.
> Elixir or Golang
That would be an incredible downgrade.
Downgrade compared to what?
One-liners work best in comedy, dude.
- I recommend reading the comment history of @neonsunset. He has shared quite a few insights, snippets and benchmarks to make the case that if you do not need the absolute bare-metal control C or Rust provides, you are better off with either .net or Java.
- Whereas in .net you have the best native interop imaginable for a high level language with a vast SDK. I understood that Java has improved on JNI, but I am not sure how well that compares.
- Programming languages are like a religion, highly inflammable, so I can imagine you would not be swayed by some rando on the internet. I would already be happy if you choose Go over Python, as with the former you win some type safety (but still have a weak type system) and have a good package manager and deployment story.
- Go was designed for Google, to prevent their college grads from implementing bad abstractions. But good abstractions are valuable. A weak type system isn't a great idea (opinion, but reasonable opinion). Back then .net was not really open source (I believe) and not as slim and fast as it is now, and even then, I think Google wants to have control about their own language for their own internal needs.
- Therefore, if you are not Google, Go should likely not be your top pick. Limited area of application, regrettable decisions, tailored for Google.
---
(Sorry for a reply all over the place.)
---
> if you do not need the absolute bare metal control C or Rust provides, you are better of with either .net or Java.
That, like your next point, is a relatively fair statement but it's prone to filter bubble bias as I am sure you are aware. I for example have extracted much more value out of Golang... and I had 9 years with Java, back in the EJB years.
Java and .NET are both VM-based and have noticeable startup time. As such, they are best suited for servers and not for tooling. Golang and Rust (and Zig, and D, V and many other compiled languages) are much better in those areas.
> Programming languages are like a religion, highly inflammable, so I can imagine you would not be swayed by some rando on the internet
Those years are long past me. I form my own opinions and I have enough experience to be able to make fairly accurate assessments with minimum information.
> I would already be happy if you choose Go over Python, as with the former you win some type safety (but still have a weak type system) and have a good package manager and deployment story.
I do exactly that. In fact I am a prominent Python hater. Its fans have gone to admirable heights in their striving to fill the gaps, but I wonder whether they will one day realize this is unnecessary and just go where those gaps don't exist. Maybe never?
And yeah, I use Golang for my own purposes. I'm even thinking of authoring my own bespoke sync-and-backup solution built on top of Syncthing, Git and SSH+rsync, and packaging it in Golang. Shell scripts become unpredictable past a certain scale.
> Go was designed for Google, to prevent their college grads from implementing bad abstractions. But good abstractions are valuable. A weak type system isn't a great idea (opinion, but reasonable opinion). Back then .net was not really open source (I believe) and not as slim and fast as it is now, and even then, I think Google wants to have control about their own language for their own internal needs.
That and your next point I fully agree with. That being said, Golang is good enough and I believe many of us here preach "don't let perfect be the enemy of good". And Golang symbolizes exactly that IMO; it's good enough but when your requirements start shooting up then it becomes mediocre. I have stumbled upon Golang's limitations, fairly quickly sadly, that's why I am confining it to certain kinds of projects only (and personal tinkering).
> I recommend reading the comment history of @neonsunset.
I don't mind doing that (per se) but I find appeals to authority and credentialism a bit irritating, I admit.
Plus, his reply was in my eyes a fairly low-effort snark.
(I would note that EJB is something from the past. Like .net has also really grown.)
> that's why I am confining it to certain kinds of projects only (and personal tinkering).
You have a fair view of Go, I think. I could see that it makes sense to use it as a replacement for bash scripts, especially if you know the language well. Personally I am wanting to dive into using F# for my shell scripting needs. The language leans well into those kind of applications with the pipe operator.
If you ever have the appetite, you should take a look at it as it can be run in interpreted/REPL mode too, which is a nice bonus for quick one-off scripts.
https://blog.lucca.io/2022/05/19/fsharp-script
> Java and .NET are both VM-based and have noticeable startup time. As such, they are best suited for servers and not for tooling. Golang and Rust (and Zig, and D, V and many other compiled languages) are much better in those areas.
For JIT-based deployments, startup time is measured in 100-500ms depending on the size of the application, sometimes less. .NET has first-party support for a NativeAOT deployment mode for a variety of workloads: web servers, CLI tools, GUI applications and more.
Go is a VM-based language, where the VM provides facilities such as virtual threading with goroutines (which is a higher level of abstraction than .NET's execution model), GC, reflection, and special handling for FFI. Identical to what .NET does. I don't think the cost and performance profile of BEAM needs additional commentary :)
Go also has weaker GC and compiler implementations and, on optimized code, cannot reach the performance grade of C++ and Rust, something C# can do.
> Those years are long past me. I form my own opinions and I have enough experience to be able to make fairly accurate assessments with minimum information.
The comments under your profile seem to suggest the opposite. Perhaps "minimum information" is impeding fair judgement?
> I don't mind doing that (per se) but I find appeals to authority and credentialism a bit irritating, I admit.
Is there a comment you have in mind which you think is engaging in credentialism?
> Is there a comment you have in mind which you think is engaging in credentialism?
The other guy who told me to inspect your profile. Not you.
> The comments under your profile seem to suggest the opposite.
Sigh. You seem to act in bad faith which immediately loses my interest.
You'll have to take your high and mighty attitude to somebody else. Seems like even HN is not immune from... certain tropes, shall we call them, generously.
Disengaging.
Even when it's used by mediocre developers, which is probably more than 90% of us, myself very much included? All I've been seeing is Rust being used by very enthusiastic and/or talented developers, who will be productive in any language.
> Rust developers are insanely productive.
If your baseline is a language that is missing some features that were in standard ML, sure. If you were already using OCaml or F#, Rust doesn't make you any more productive. If you were already using Haskell or Scala, Rust's lack of HKT will actively slow you down.
Rust is the silver bullet we all been waiting for?
No, that's the other end of the stick. It's on par with other languages.
Well, put any language against Rust and Rustaceans would argue Rust is better than those languages so ... Silver bullet no?
If the word “Rustaceans” is actually in common use then rust loses by default.
"Use lisp for backend" lol
It's easier and saner than you think.
No massive churn, quite performant, and the code I wrote 20 years ago runs without modification.
Can other blub languages claim this?
It worked for Yahoo in its early days, SISCOG, ITA Software, ...
Even for a site called HN, if you happen to know it.
I do this for my personal hobby projects, but that's as much to deter use by technology enthusiasts as anything.
Works well; if you've got problems with it (beyond pure ignorance), pick something else; same difference to me. lol.
Nope, Elixir for backend.
We need the BEAM VM's guarantees, not yesterday but like 20 years ago, everywhere. The language itself does not matter. But we need that runtime's guarantees!
What is it specifically about the BEAM VM that positions it above, say, Go on K8S?
Not having to learn K8s
The ability to manage tens of thousands of stateful connections without 95th-percentile request latency jumping to 5 seconds.
Just to start with.
I get the appeal, but without strong typing it's a no-go in my book. Get me an Elixir with proper types and we can talk.
I believe they have already started this effort. https://elixir-lang.org/blog/2023/06/22/type-system-updates-...
They have but it's mostly a labor of love and it's very difficult to fit a static type system into a dynamically typed language.
We already have some false positives. Happily the team is very motivated and is grinding away at them, for which we the community are forever grateful.
Oh I agree. The Erlang/Elixir ecosystem is in danger of Rust inventing a BEAM-like runtime and making it irrelevant.
Elixir is Lisp with sprinkles on top
Lisp for people who hate parentheses.
No, it's LISP for people who understand that multicore CPUs have been around for a long time now.
Modern LISP dialect authors still believe threads are a super clever idea which is just... /facepalm.
I worked on a provenance system which would be so completely the wrong solution to this problem that I only bring it up because the 100,000 foot view is still relevant.
I think we are eventually going to end up with some sort of tagged memory with what this is for (such as credentials) and rules about who is allowed to touch it and where it's allowed to go. Instead of writing yet another tool that won't let fields called "password" or "token" or "key" be printed into the logs, but misses "pk", it's going to be no printing any memory block in this arena, period.
I also think we aren't doing enough basic things with backend systems, like keeping just a user ID in the main database schema and putting all of the PII in a completely different cluster of machines that has 1/10th to 1/100th as many sudoer entries on it as any other service in the company. I know these systems are out there; my complaint is that we should be talking about them all the time, to push the Recency Effect and/or Primacy Effect hard.
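As a toy illustration in C of the "no printing any memory block in this arena" rule (every name here is invented, and a real system would wire this into the allocator and the logging library rather than a hand-rolled check):

    /* Toy sketch: secrets come out of one mmap'd region, and the logging
     * helper refuses to format any pointer that falls inside it.
     * All names are invented for illustration. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    static unsigned char *secret_base;
    static size_t secret_used, secret_cap = 1 << 20;

    static void *secret_alloc(size_t n) {
        if (!secret_base)
            secret_base = mmap(NULL, secret_cap, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (secret_base == MAP_FAILED || secret_used + n > secret_cap)
            return NULL;
        void *p = secret_base + secret_used;
        secret_used += n;
        return p;
    }

    static int in_secret_arena(const void *p) {
        return secret_base && (const unsigned char *)p >= secret_base
                           && (const unsigned char *)p < secret_base + secret_cap;
    }

    static void safe_log(const char *s) {
        if (in_secret_arena(s)) {
            fputs("[redacted: secret arena]\n", stderr);
            return;
        }
        fprintf(stderr, "%s\n", s);
    }

    int main(void) {
        char *token = secret_alloc(32);
        if (!token) return 1;
        strcpy(token, "super-secret-api-key");
        safe_log("starting up");   /* printed */
        safe_log(token);           /* refused, no matter what the field is called */
    }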
Perl has had a limited data tagging system for decades now, called "taint checking".
If enabled (through a command-line switch), all data coming in from the outside (sockets, STDIN, etc.) is "tainted", and if you e.g. concatenate a non-tainted and a tainted string, the result becomes tainted. Certain sensitive operations, like system() calls or open(), raise an error when used with tainted data.
If you match tainted data with a regex, the match groups are automatically untainted.
It's not perfect, but it demonstrates that such data tagging is possible, and quite feasible if integrated early enough in the language.
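Perl's tainting lives inside the interpreter, but the shape of the idea translates to an ordinary library convention. A minimal sketch in C, with invented names (tstr, checked_system, ...), just to show the propagate-then-explicitly-untaint flow:

    /* Minimal sketch of taint tracking as a library convention in C.
     * Perl does this inside the interpreter; the names below are
     * invented purely for illustration. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        const char *s;
        int tainted;              /* 1 = came from the outside world */
    } tstr;

    static tstr from_input(const char *s)   { return (tstr){ s, 1 }; }
    static tstr from_literal(const char *s) { return (tstr){ s, 0 }; }

    /* Untaint only after an explicit validation step, analogous to Perl
     * untainting via a regex capture group. */
    static tstr untaint_if_alnum(tstr t) {
        for (const char *p = t.s; *p; p++)
            if (!((*p >= 'a' && *p <= 'z') || (*p >= '0' && *p <= '9')))
                return t;                     /* still tainted */
        t.tainted = 0;
        return t;
    }

    /* A sensitive operation refuses tainted data, like system() under -T. */
    static int checked_system(tstr cmd) {
        if (cmd.tainted) {
            fprintf(stderr, "refusing to run tainted command\n");
            return -1;
        }
        return system(cmd.s);
    }

    int main(void) {
        tstr user = from_input("ls; rm -rf /");  /* pretend this came off a socket */
        checked_system(user);                    /* rejected: tainted */
        checked_system(from_literal("ls"));      /* fine: program-supplied constant */
        checked_system(untaint_if_alnum(user));  /* still rejected: validation failed */
    }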
Rails has a "safe" attribute that only works for HTML output, and doesn't work right for URLs (a bug that somehow became my responsibility to fix many times). It's a limited version of the same thing; I believe Elixir has the same design, and I've already seen a reproduction of the Rails flaw in Elixir.
But they are Boolean values, and they need to be an enumeration or, more likely, a bitfield. Even just for the web I've already identified four in this thread: HTML unsafe, URL unsafe, PII unsafe, credentials unsafe. I hesitate to add SQL unsafe, because the only solution to SQL injection is NO STRING CONCATENATION. But so many SQL libraries use concatenation even for prepared statements that maybe it should be. Only allow string constants for SQL queries.
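For what it's worth, a rough sketch in C of what a bitfield of unsafe contexts could look like (the flag and function names are invented for illustration):

    /* Sketch of a bitfield of unsafe contexts instead of a single Boolean
     * "safe" flag. Flag and function names are invented. */
    #include <stdio.h>

    enum {
        UNSAFE_HTML = 1 << 0,
        UNSAFE_URL  = 1 << 1,
        UNSAFE_PII  = 1 << 2,
        UNSAFE_CRED = 1 << 3,
    };

    typedef struct {
        const char *s;
        unsigned unsafe;   /* which sinks this value must not reach yet */
    } tagged_str;

    static int emit_html(tagged_str t) {
        if (t.unsafe & UNSAFE_HTML) {
            fprintf(stderr, "refusing: value not HTML-escaped yet\n");
            return -1;
        }
        printf("%s", t.s);
        return 0;
    }

    static int write_log(tagged_str t) {
        if (t.unsafe & (UNSAFE_PII | UNSAFE_CRED)) {
            fprintf(stderr, "refusing: PII/credentials in a log line\n");
            return -1;
        }
        fprintf(stderr, "log: %s\n", t.s);
        return 0;
    }

    int main(void) {
        tagged_str form_field = { "<script>alert(1)</script>",
                                  UNSAFE_HTML | UNSAFE_URL };
        tagged_str password   = { "hunter2", UNSAFE_CRED };
        emit_html(form_field);  /* rejected until an escaping step clears the bit */
        write_log(password);    /* rejected outright */
    }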
While I agree with you as a matter of an ideal, the step from one database to two is infinitely larger than from two to more. Given budget, time and engineering constraints, sticking everything in one database is by far the sanest solution for the vast majority of code out there.
I think microservices kind of break that wall with a herd of stampeding elephants being chased by even angrier bees.
If you're still using one database in 2025, even if you wouldn't touch microservices with a ten foot pole, then you've got some problems.
OLAP, KV, cache hierarchies: you aren't running a singular database, except maybe for a standalone app.
Solaris SPARC ADI has had tagged memory for quite some time now.
Summary: We know, we know, but don't make us rewrite everything in Rust.
This post seems a lot more informative to me: "It Is Time to Standardize Principles and Practices for Software Memory Safety" (https://cacm.acm.org/opinion/it-is-time-to-standardize-princ...)
I am 100% in favor of industry standards to enforce safety. It should go way past just memory safety, though. Engineering standards should include practices and minimum requirements to prevent safety issues as a whole.
The key to progress in a lot of cases is to do it incrementally. If you make something too hard to chew, people won't bite.
Programmers will invent new languages and demand new hardware architectures rather than ~~go to therapy~~ use a garbage collector.
Related:
https://news.ycombinator.com/item?id=42962020 - It is time to standardize principles and practices for software memory safety (2025-02-06, 100 comments)
Happy to see the mention of Kotlin's memory safety features here; goes a bit beyond Java with its null safety, encouragement of immutability and smart casting.
I was actually a little surprised to see that in there, I wouldn't really consider those features to be "memory safety" as I traditionally see it over Java.
They don't really lead to exploitable conditions, except maybe DoS if you have poor error handling and your application dies with null pointer exceptions.
Don't forget it being very strictly typed, too.
The most common memory safety bug in released software is array overflow. This is easily corrected in C by adding a small extension:
https://www.digitalmars.com/articles/C-biggest-mistake.html
> This is easily corrected in C by adding a small extension
Unfortunately, easily corrected it is not. Yes, probably >95% of arrays have a nearby, easily accessible length parameter that indicates their maximum legal length (excluding the security disaster of null-terminated strings). But the problem is that there's no consistent way that people do this. Sometimes people put the pointer first and the size second, sometimes it's the other way around. Sometimes the size is a size_t, sometimes an unsigned, sometimes an int. Or sometimes it's not a pointer-and-size but a pointer and one-past-the-end pointer pair. Sometimes multiple arrays share the same size parameter.
So instead of an easy solution getting you 90% for effectively free, you get like 30% with the easy solution, and have to make it more complicated to handle the existing diversity to push it back up to that 90%.
This extension was added in D at its start, and 25 years of experience with it shows that it is possibly D's best loved feature. It's a huge win.
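For readers who haven't used D: the feature is essentially a pointer that carries its length, with every index checked. A rough emulation in today's C looks like this (the int_slice type and AT() macro are invented here; in D the bounds check is part of the language, not a macro):

    /* Rough emulation of a D-style slice (pointer and length carried
     * together) in today's C. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        int    *ptr;
        size_t  len;
    } int_slice;

    #define SLICE_FROM_ARRAY(a) ((int_slice){ (a), sizeof(a) / sizeof((a)[0]) })

    static int *slice_at(int_slice s, size_t i, const char *file, int line) {
        if (i >= s.len) {
            fprintf(stderr, "%s:%d: index %zu out of bounds (len %zu)\n",
                    file, line, i, s.len);
            abort();
        }
        return &s.ptr[i];
    }
    #define AT(s, i) (*slice_at((s), (i), __FILE__, __LINE__))

    int main(void) {
        int data[4] = { 1, 2, 3, 4 };
        int_slice s = SLICE_FROM_ARRAY(data);

        for (size_t i = 0; i < s.len; i++)   /* the length travels with the pointer */
            printf("%d ", AT(s, i));
        printf("\n");

        AT(s, 7) = 42;   /* caught at run time instead of silently corrupting memory */
    }

The point of having the language do it, as D does, is that nobody has to remember to pass the pair or write the check by hand in each codebase's own convention.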
> The most common memory safety bug in released software is array overflow.
Do you have a source for this? I thought it was use after free.
CWE Top 25: https://cwe.mitre.org/top25/archive/2024/2024_cwe_top25.html
Out-of-bounds write and read are more prevalent than UAF. There are multiple types of bugs that can produce an OOB read or write, though.
Not offhand, but every list of the common security bugs shows it's the top, by a wide margin.
That (or, more broadly, memory lifecycle bugs) would be my guess too.
Which WG14 keeps refusing to do, regardless of how often this is pointed out.
I know, which was the motivation for D.
They could fix typeof to make it something useful. Like the ability to take the type of something and pass it around.
And add slice and buffer typedefs to the standard library. Especially since they added counted_by to the language.
Add a way to define a slice that points to a C string.
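Roughly, something like the sketch below. The counted_by attribute does exist in sufficiently recent Clang/GCC; the slice typedefs are only an illustration of what such a standard-library addition might look like, not anything that exists today:

    #include <stddef.h>
    #include <string.h>

    /* counted_by ties a flexible array member to its length field so that
     * -fsanitize=bounds and __builtin_dynamic_object_size can check accesses
     * (requires a compiler recent enough to know the attribute). */
    struct packet {
        size_t len;
        unsigned char data[] __attribute__((counted_by(len)));
    };

    /* Hypothetical standard slice/buffer typedefs: these do NOT exist today,
     * they are just what the suggestion above might look like... */
    typedef struct { char *ptr; size_t len; } str_slice;        /* mutable */
    typedef struct { const char *ptr; size_t len; } str_view;   /* read-only */

    /* ...plus a way to define a slice that points at an existing C string. */
    static inline str_view view_from_cstr(const char *s) {
        return (str_view){ s, strlen(s) };
    }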
Yeah, but looking back at the attempts to extend C a bit for safety, the only one that seems to have gotten market traction is MISRA, and even then it's pretty limited. D, Rust, Zig etc. all seem to have much more buy-in. There must be some reason why a new language works better here. I mean, you're basing your business off D, not a C extension, right?
This idea is just adapting the D version of it.
That's... kind of my point. This mechanism has seen more adoption in a new language than in the existing language. I'm sure it would work technically; there must be some other reason why it's easier to get a new language adopted.
> This is easily corrected in C by adding a small extension
It's so easy that thousands of developers trying for 40+ years haven't been able to do it yet.
They don't see the value in it. You have to use it a while to see how much time it saves you not chasing down memory corruption bugs.
While I agree with the idea, I clearly see that instead of proposing solutions, a bag of programming languages is suggested.
No mention of Carbon, I see.
Carbon naysayers keep forgetting to read the part where it says it is an experimental language.
It's not as if an "experimental" label would keep Google from deploying something. We all know about their lack of software testing.
It only got an initial backend like a couple of months ago.
The site clearly mentions folks to use something else, if they want to write safe code today.
:(
How many security holes are caused by not sanitizing inputs, as opposed to memory safety? It feels like not sanitizing inputs is what enables memory safety exploits, in addition to many other classes of security hole, yet nobody seems to talk about it.
- Buffer overflow: somebody didn't sanitize the input (length of buffer).
- Stack smashing: somebody didn't sanitize the input (length of input).
- Format string vulnerability: somebody didn't sanitize the format string data.
- Integer conversion vulnerability: somebody didn't sanitize the integer input.
- SQL injection: somebody didn't sanitize the input before querying a database.
- Cross-site scripting: somebody didn't sanitize input and it got submitted and executed on behalf of the user.
- Remote file inclusion / Directory traversal: somebody didn't sanitize an input variable leading to a file path.
...and on, and on, and on. If people were as obsessed with input sanitization as they are with memory safety, I'll bet a much larger percentage of attacks would be stopped. Too bad input sanitization isn't sexy.
SQL Injection and XSS are actually great examples of vuln classes where the winning strategy is safe APIs rather than diligent sanitization. "Just sanitize all your user inputs" is hard to do correctly because it is difficult to create automatic rules that detect every single possible violation.
Prepared statements and safe HTML construction APIs plus some linters that scream at you when you use the unsafe APIs works like magic.
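As a concrete example of the safe-API approach, here is a minimal sketch in C using SQLite's C interface (error handling trimmed, and it assumes an already-open database handle). The query shape is fixed at prepare time and user input only ever flows through a bind call, so it cannot alter the SQL:

    #include <sqlite3.h>

    int find_user(sqlite3 *db, const char *name_from_form) {
        sqlite3_stmt *stmt;
        const char *sql = "SELECT id FROM users WHERE name = ?1";

        if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) != SQLITE_OK)
            return -1;

        /* "Robert'); DROP TABLE users;--" is just a string value here. */
        sqlite3_bind_text(stmt, 1, name_from_form, -1, SQLITE_TRANSIENT);

        int id = -1;
        if (sqlite3_step(stmt) == SQLITE_ROW)
            id = sqlite3_column_int(stmt, 0);

        sqlite3_finalize(stmt);
        return id;
    }

Pair that with a linter that flags any call site still building SQL by string concatenation, and you get the "works like magic" effect described above.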
You're correct. It's about distinction between code and data.
You simply have to distinguish between HTML elements (code) and HTML text nodes (data). Same with prepared statements: a clear distinction between SQL code and SQL data.
You just need to ensure that your data is never interpreted as code.
> You just need to ensure that your data is never interpreted as code.
That's sanitization. Many different languages implement this. The old-school method is "tainting" data so it can't be used as part of execution without an explicit function call to "untaint" it. Same is used for "secret" data in various programs where you don't want it leaked.
> - SQL injection: somebody didn't sanitize the input before querying a database.
> - Cross-site scripting: somebody didn't sanitize input and it got submitted and executed on behalf of the user.
To be technical about it, this is generally a failure of escaping rather than sanitizing.
You're supposed to be able to put anything into a database field and not have it affect the query. You ought to be able to paste JavaScript into a field and have it be displayed as JavaScript, not executed. The inputs remain as they are -- no sanitization -- they just have to be escaped properly.
That being said, I'm 100% on board with the importance of sanitization/validation. To the extent that I think it ought to be part of the design of languages, just like types. I.e., if a function parameter is only allowed to be one of three string values, or a string between 0 and 10 bytes, or a string limited to lowercase ASCII, these constraints should be expressible.
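To make the escape-at-output point above concrete, a toy C sketch that leaves the stored value untouched and only transforms it at the HTML output boundary (it covers just the usual five characters; real code should lean on a vetted library):

    #include <stdio.h>

    static void html_escape(FILE *out, const char *s) {
        for (; *s; s++) {
            switch (*s) {
            case '&':  fputs("&amp;",  out); break;
            case '<':  fputs("&lt;",   out); break;
            case '>':  fputs("&gt;",   out); break;
            case '"':  fputs("&quot;", out); break;
            case '\'': fputs("&#39;",  out); break;
            default:   fputc(*s, out);       break;
            }
        }
    }

    int main(void) {
        /* Stored as-is in the database; escaped only when rendered. */
        const char *comment = "<script>alert('hi')</script>";
        html_escape(stdout, comment);
        putchar('\n');
    }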
Sanitizing inputs in important in addition to memory safety.
Sanitizing inputs won't protect you against all bugs. For example, you may store a string in a 256-byte buffer, so you check that the string is no longer than 256 characters, but you forget the zero-terminator, and you have a buffer overflow. Or maybe you properly limited the string to 255 characters, but along the way, you added support for multibyte characters, and you get another buffer overflow.
Bounds checking would have caught that.
Injection can happen at any point. You may sanitize user input to avoid SQL injection, but at some later point it may get out of the database and in a format string, but it was sanitized for SQL, not for format strings, leading to a potential injection.
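The 256-byte-buffer scenario above, made concrete in C (copy_name() is a made-up function for illustration):

    #include <string.h>

    void copy_name(const char *input) {
        char buf[256];
        size_t len = strlen(input);

        if (len <= sizeof buf) {      /* "no longer than 256 characters"... */
            memcpy(buf, input, len);
            buf[len] = '\0';          /* ...but when len == 256 this writes
                                         buf[256], one byte past the end */
        }
        /* The correct check is len < sizeof buf, leaving room for the
         * terminator. A bounds-checked string/slice type turns the broken
         * version into a caught error instead of silent corruption. */
    }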
"Sanitising" doesn't work. It's an exploit mitigation strategy, not a sound way of actually preventing bugs. And it doesn't prevent many of the vulnerabilities you list, because many of the things that cause issues don't come from "input" at all (e.g. a lot of buffer overflows can be triggered with "legitimate" input that isn't and couldn't be caught by input sanitisation).
> It's an exploit mitigation strategy, not a sound way of actually preventing bugs.
Actually it is a simple and effective way to prevent general bugs.
If you have an input field called birthday, you can inject it directly into your database. Doing that could cause an SQL injection exploit, so people use prepared statements.
But even if you use prepared statements, you'll still end up with a database column with all kinds of birthday formatting (M-D-Y, Y-M-D, slashes, spaces, colons, etc etc). These different, non-standard formats will eventually cause a bug.
Input sanitization forces you to standardize on one birthday format, and inject it into your database in one format (let's say "YYYY-MM-DD", no other characters allowed). Then all the code expects - and gets - one format. This reduces bugs from unexpected formats.
It has the added side-effect of also eliminating the SQL injection bug, regardless of prepared statement.
> Input sanitization forces you to standardize on one birthday format, and inject it into your database in one format (let's say "YYYY-MM-DD", no other characters allowed). Then all the code expects - and gets - one format.
That's not sanitisation as the word is normally used. That's parsing and canonicalisation. That's a good path to actual security - it leads you to the "make invalid states unrepresentable" style and using decent type systems.
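A minimal sketch in C of that parse-and-canonicalise step (using POSIX strptime(); canonical_birthday() is a made-up name): reject anything that does not match the expected format, then re-emit the value in the one canonical form.

    #define _XOPEN_SOURCE 700
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    /* Returns 0 on success and writes "YYYY-MM-DD" into out (>= 11 bytes). */
    int canonical_birthday(const char *input, char *out, size_t outlen) {
        struct tm tm;
        memset(&tm, 0, sizeof tm);

        const char *rest = strptime(input, "%Y-%m-%d", &tm);
        if (rest == NULL || *rest != '\0')
            return -1;                /* doesn't parse: reject, don't "clean" */

        if (strftime(out, outlen, "%Y-%m-%d", &tm) == 0)
            return -1;
        return 0;
    }

    int main(void) {
        char canon[11];
        if (canonical_birthday("1984-02-29", canon, sizeof canon) == 0)
            printf("stored as %s\n", canon);
        else
            puts("rejected");
    }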
Your distinction between stack smashing and buffer overflow is puzzling
A buffer overflow is just exceeding the size of a buffer, and stack smashing is a modification of either the stack itself or the stack pointer to result in different operations on the stack. These methods can coincide, and they can also exist independently of each other
Yep. Hence my comment. Smashing the stack is an after-the-fact condition of some kind of memory error. If you're going to do that, why not mention heap overflow and others? It just seems unusual.
Two reasons
- modern languages prevent you from having to think about it at all. You shouldn’t have to sanitize, it should work
- there are a load of issues that have nothing to do with sanitization but everything to do with memory: race conditions, type confusion, and UAF, to name a few. If we focus on sanitization, we still need memory safety to fix those
Also sanitization is non-trivial for complex inputs
> Looking forward, we're also seeing exciting and promising developments in hardware. Technologies like ARM's Memory Tagging Extension (MTE) and the Capability Hardware Enhanced RISC Instructions (CHERI) architecture offer a complementary defense, particularly for existing code.
IIRC there's some way that a Python C extension can accidentally disable the NX bit for the whole process.. https://news.ycombinator.com/item?id=40474510#40486181 :
>>> IIRC, with CPython the NX bit doesn't work when any imported C extension has nested functions / trampolines
>> How should CPython support the mseal() syscall? [which was merged in Linux kernel 6.10]
> We are collaborating with industry and academic partners to develop potential standards, and our joint authorship of the recent CACM call-to-action marks an important first step in this process. In addition, as outlined in our Secure by Design whitepaper and in our memory safety strategy, we are deeply committed to building security into the foundation of our products and services.
> That's why we're also investing in techniques to improve the safety of our existing C++ codebase by design, such as deploying hardened libc++.
Secureblue; https://github.com/secureblue/Trivalent has hardened_malloc.
Memory safety notes and Wikipedia concept URIs: https://news.ycombinator.com/item?id=33563857
...
A graded memory safety standard is one aspect of security.
> Tailor memory safety requirements based on need: The framework should establish different levels of safety assurance, akin to SLSA levels, recognizing that different applications have different security needs and cost constraints. Similarly, we likely need distinct guidance for developing new systems and improving existing codebases. For instance, we probably do not need every single piece of code to be formally proven. This allows for tailored security, ensuring appropriate levels of memory safety for various contexts.
> Enable objective assessment: The framework should define clear criteria and potentially metrics for assessing memory safety and compliance with a given level of assurance. The goal would be to objectively compare the memory safety assurance of different software components or systems, much like we assess energy efficiency today. This will move us beyond subjective claims and towards objective and comparable security properties across products.
Humans too!
Why not mandate ecc ram?
Does ECC do anything for memory safety? It addresses physical errors, while the article is talking about software bugs. Those two are almost orthogonal.
Lobbying from Intel probably.
Intel supports ECC, so does AMD, so why would they lobby against it? Intel uses it for market segmentation, but I don't think it is a big deal.
It is just that those who build consumer-grade hardware don't want to spend 12% more on RAM for slightly less performance. That covers essentially all ARM devices, including smartphones and Apple silicon Macs.
Don't look at the specs for LPDDR6...
The article in question is published on Google's blog. Has Google resolved memory safety issues in its C++ code base? Did G port their code base to Rust or some other memsafe language? What's preventing them from doing that by themselves?
What's preventing Microsoft, or Apple, or the coagulate Linux kernel team, or any other kernel team, from adopting memsafe technology or practice by themselves for themselves?
The last thing we need is evidently incompetent organizations that can't take care of their own products making standards, or useless academics making standards to try to force other people to follow rules because they think they know better than everyone else.
If the team that designed and implemented KeyKos, or that designed Erlang, were pushing for standardized definitions or levels of memory safety, it would be less ridiculous.
At the same time, consciousness of security issues and memory safety has been growing quickly, and memory safety in programming languages has literally exploded in importance. It's treated in every new PL I've seen.
Putting pressure on big companies to fix their awful products is fine. No pressure needs to be applied to the rest of the industry, because it's already outpacing all of the producing entities that are calling for standards.
The idea that Google is "evidently incompetent" for failing to resolve memory safety issues in their decades-old, giant codebase is dumb.
If Google has failed so far to resolve mem safety issues in their decades old giant code base, then I'd rather hear standardization ideas from someone who succeeded. If G succeeded at resolving those issues, then that's a concrete positive example for the rest of industry to consider following. They ought to lead by example.
It seems like decades-old giant code bases are precisely the ones hardest to migrate to memory safety. That's where coercion and enforcement are needed most. You and I don't need to be told to start a new project in not-C++, do we? Nearly every trained programmer has been brainwashed (in a good way) with formal methods, type systems, bounds checking, and security concerns. Now those same people who champion this stuff say it isn't enough, and therefore we need to do more of the same but with coercion. That's a failure to understand the problem.
> If Google has failed so far to resolve mem safety issues in their decades old giant code base, then I'd rather hear standardization ideas from someone who succeeded. If G succeeded at resolving those issues, then that's a concrete positive example for the rest of industry to consider following. They ought to lead by example.
Google saw "the percentage of memory safety vulnerabilities in Android dropped from 76% to 24% over 6 years as development shifted to memory safe languages" - which I'd say is a positive example.
It's not that they've already fully succeeded (I don't think anyone has on codebases of this size), but neither is it that they tried and failed - it's an ongoing effort.
> You and I don't need to be told to start a new project in not-C++ do we?
Don't need to be told because we all already avoid C++, or don't need to be told because it doesn't really matter if we do use C++?
I'd disagree with both. There are still many new projects (or new components of larger systems) being written in C++, and it's new code that tends to have the most vulnerabilities.