The need for memory safety standards

security.googleblog.com

120 points by todsacerdoti 3 days ago

pizlonator 3 days ago

So many mentions of CHERI - both in this post and in the linked CACM article. I doubt CHERI will be the future considering how long it's been around for and how few actually successes have come out of it.

Also, hilariously, Fil-C is faster than CHERI today (the fastest CHERI silicon available today will be slower than Fil-C running on my laptop, probably by an order of magnitude). And Fil-C is safer.

Which sort of brings up another issue - this could be a case where Google is trying to angle for regulations that support their favorite strategy while excluding competition they don't like.

philzook 2 days ago

What is the distinction between this approach and Address Sanitizer https://clang.llvm.org/docs/AddressSanitizer.html ? If I understand correctly, Fil-C is a modified version of LLVM. Is your metadata more lightweight, catches more bugs? Could it become a pass in regular LLVM?
- pizlonator 2 days ago
  
  Fil-C is memory safe. Asan isn't.
  For example, asan will totally let you access out of bounds of an object. Say buf[index] is an out-of-bounds access that ends up inside of another object. Asan will allow that. Fil-C won't. That's kind of a key spacial safety protection.
  Asan is for finding bugs, at best. Fil-C is for actually making your code memory safe.
thephyber 3 days ago

Sorry, I’m not a C specialist. What are Fil-C and CHERI? eg. Safe subsets of C, static analysis tools, C toolsets to ensure memory safety?
- warkdarrior 3 days ago
  
  CHERI is a hardware architecture and instruction set to add safety-related capabilities to processors. See https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/ In this context a capability means a way to track and enforce which memory area a pointer can point into. Typically this has to be coupled with a compiler which will initialize the capability for each pointer.
  Fil-C seems to be a C variant that adds capabilities and garbage collection. See https://github.com/pizlonator/llvm-project-deluge/blob/delug...
pjmlp 2 days ago

It isn't CHERI, however Solaris SPARC ADI has more than proven its usefulness, it isn't more widely deployed due to the reasons we all know.
whatagreatboy 2 days ago

> Fil-C is currently 1.5x slower than normal C in good cases, and about 4x slower in the worst cases. I'm actively working on performance optimizations for Fil-C, so that 4x number will go down.
I am pretty sure you cannot go much lower than 1.2 here on the best cases. In contrast, CHERI on good hardware will easily be as close to current performance as possible.
- ngrilly 2 days ago
  
  Do you say that CHERI will be "as close to current performance as possible" with the same energy consumption?
- pizlonator 2 days ago
  
  Cheri hardware is going to be more than 1.2x slower than the fastest non Cheri hardware.
ajb 3 days ago

Not really buying your thesis here: Attempts to retrofit safely to C have been around a lot longer than Cheri; fil-c is just the latest and there's no obvious reason why it should be more successful.
The speed comparison with a laptop is just disingenuous. Is a device with Cheri integrated slower than one of the same class without?
- musicale a day ago
  
  > Attempts to retrofit safely to C have been around a lot longer than Cheri; fil-c is just the latest and there's no obvious reason why it should be more successful.
  It is true that memory-safe C compilers have existed for decades and have seen minimal adoption.
  However, improvements to clang/llvm could yield wider impact and benefit than previous efforts, since they may be supported in a widely used C toolchain.
  -fbounds-safety is another change that may see more adoption if it makes it into mainline clang/llvm
  https://clang.llvm.org/docs/BoundsSafetyAdoptionGuide.html
- pizlonator 2 days ago
  
  > Not really buying your thesis here: Attempts to retrofit safely to C have been around a lot longer than Cheri; fil-c is just the latest and there's no obvious reason why it should be more successful.
  Here's the difference with Fil-C: It's totally memory safe and fanatically compatible with C/C++, no B.S. The other approaches tend to either create a different language (not compatible), or tend to create a superset of C/C++ with a safe subset (not totally memory safe, only compatible in the sense that unsafe code continues to be unsafe), or just don't fully square the circle (for example SoftBound could have gotten to full compatibility and safety, but the implementation just didn't).
  > The speed comparison with a laptop is just disingenuous. Is a device with Cheri integrated slower than one of the same class without?
  Not disingenuous at all.
  The issue is that:
  - high volume silicon tends to outperform low volume silicon. Fil-C runs on x86_64 (and could run on ARM64 if I had the resources to regularly test on it). So, Fil-C runs on the high volume stuff that gets all of the best optimizations.
  - silicon optimization is best done incrementally on top of an already fast chip, where the software being optimized for already runs on that chip. Then it's a matter of collecting traces on that software and tuning, rather than having to think through how to optimize a fast chip from first principles. CHERI means a new register file and new instructions so it doesn't lend itself well to the incremental optimization.
  So, I suspect Fil-C will always be faster than CHERI. This is especially true if you consider that there are lots of possible optimizations to Fil-C that I just haven't had a chance to land yet.
  - woodruffw 2 days ago
    
    I've been really impressed with what you're doing with Fil-C, but:
    > Here's the difference with Fil-C: It's totally memory safe and fanatically compatible with C/C++, no B.S.
    Is this true on both counts? If I'm reading your docs right, you're essentially adding hidden capabilities to pointers. This is a great technique that gives you almost perfect machine-level compatibility by default, but it comes with the standard caveats:
    1. Your type safety/confusion guards are essentially tied to pointer "color," and colors are finite. In other words, in a sufficiently large program, an attacker can still perform type confusion by finding types with overlapping colors. Not an issue in small codebases, but maybe in browser- or kernel-sized ones.
    2. In terms of compatibility, I'm pretty sure this doesn't allow a handful of pretty common pointer-integer roundtrip operations, at least not without having the user/programmer reassign the capability to the pointer that's been created out of "thin air." You could argue correctly that this is a bad thing that programmers shouldn't be doing, but it's well-defined and common enough IME.
    (You also cited my blog's writeup of `totally_safe_transmute` as an example of something that Fil-C would prevent, but I'm not sure I agree: the I/O effect in that example means that the program could thwart the runtime checks themselves. Of course, it's fair to say that /proc/self/mem is a stunt anyways.)
    
    pizlonator 2 days ago
    
    My type safety and confusion guards have nothing to do with colors of any kind. There are no finite colors to run out of.
    Fil-C totally allows pointer to integer round tripping in many cases, if the compiler can see it’s safe.
    I’m citing unsafe transmute as something that Fil-C doesn’t prevent. It’s not something that memory safety prevents.
    
    woodruffw 2 days ago
    
    > My type safety and confusion guards have nothing to do with colors of any kind. There are no finite colors to run out of.
    I'm having trouble seeing where the type confusion protection properties come from, then. I read through your earlier (I think?) design that involved isoheaps and it made sense in that context, but the newer stuff (in `gimso_semantics.md` and `invisicap.txt`) seems to mostly be around bounds checking instead. Apologies if I'm missing something obvious.
    > I’m citing unsafe transmute as something that Fil-C doesn’t prevent. It’s not something that memory safety prevents.
    I think the phrasing is confusing, because this is what the manifesto says:
    > No program accepted by the Fil-C compiler can possibly go on to escape out of the Fil-C type system.
    This to me suggests that Fil-C's type system detects I/O effects, but I take it that wasn't the intended suggestion.
    
    pizlonator 2 days ago
    
    Here’s a write up that goes into more detail: https://github.com/pizlonator/llvm-project-deluge/blob/delug...
    
    woodruffw 2 days ago
    
    I read that, but I'm not seeing where the type is encoded in the capability.
    
    pizlonator 2 days ago
    
    Short answer: if you know how CHERI and SoftBound do it, then Fil-C is basically like that.
    Long answer: let's assume 64-bit (8 byte pointers) without loss of generality. Each capability knows, for each 8 bytes in its allocation, whether those 8 bytes are a pointer, and if so, what that pointer's capability is.
    Example:
    char* p = malloc(64);
    This will allocate 64 bytes. p's capability will know, for each of the 8 8-byte slots, if that slot is a pointer and if so, what it's capability is. Since you just allocated the object, none of them have capabilities.
    Then if you do:
    *(int**)(p + 8) = malloc(sizeof(int));
    Then p's capability will know that at offset 8, there is a pointer, and it will know that the capability is whatever came out of the malloc.
    Hence, each capability is dynamically tracking where the pointers are. So it's not a static type but rather something that can change over time.
    There's a bunch of engineering that goes into this being safe under races (it is) and for supporting pointer atomics (they just work).
    
    pizlonator a day ago
    
    BTW I wrote another doc to try to explain what's happening. Hope this helps.
    https://github.com/pizlonator/llvm-project-deluge/blob/delug...
    
    woodruffw a day ago
    
    Thank you, I found this very helpful!
  - amluto 2 days ago
    
    I read the Fil-C overview, and I was confused by one thing: how does Fil-C handle integer-to-pointer conversions? Rust has the new strict provenance API that is somewhat explicitly designed to avoid a need to materialize a pointer capability from just an integer, but C and C++ have no such thing. So if the code does:
    int deref(uintptr_t p) { return *(int*)p; }
    Does this fail unconditionally? Or is there some trick by which it can succeed if p is valid? And, if the latter is the case, then how is memory safety preserved?
    edit: I found zptrtable and this construct:
    https://github.com/pizlonator/pizlonated-quickjs/commit/258a...
    The latter seems to indicate that:
    (int*)((uintptr_t)p + 1)
    is at least a semi-reliable way to increment p by one. I guess this is a decent way to look like C and to keep a widely-used pattern functional. Rust’s with_addr seems like a more explicit and less magical way to accomplish the same thing. If Fil-C really takes off, would you want to add something like with_addr? Is allowing the pair of conversions on the same line of code something that can be fully specified and can be guaranteed to compile correctly such that it never accidentally produces a pointer with no capability?
    
    pizlonator 2 days ago
    
    Your deref function will fail, yeah.
    The pair of conversions is guaranreed to always produce a pointer with a capability. That’s how I implemented it and it’s low-tech enough that it could be specified.
    
    amluto 2 days ago
    
    How far can the pair of conversions be pushed? Will this work:
    (int*)(f((uintptr_t)p))
    Does it matter if f is inline?
    Could someone implement Rust’s with_addr as:
    (int*)((uintptr_t)p, addr)
    FWIW, I kind of like zptrtable, and I think Fil-C sounds awesome. And I’m impressed that you were able to port large code bases with as few changes as it seems to have taken.
    
    pizlonator 2 days ago
    
    Your first example will hilariously work if `f` is inline and simple enough and optimizations are turned on. I'm not sure I like that, so I might change that. I'd like to only guarantee the you get a capability in cases where that guarantee holds regardless of optimization (and I could achieve that with some more compiler hacking).
    Not sure about the semantics of with_addr. But note that you can do this in Fil-C:
    char* p = ...; uintptr_t i = ...; p -= (uintptr_t)p; // now p is NULL but still has its original capability p += i; // now p points to whatever i's address was, but has p's original capability
    I have a helper like that called `zmkptr` (it's just an inline function that does exactly the above).
    
    amluto 2 days ago
    
    with_addr is basically that, but with a name and some documentation:
    https://doc.rust-lang.org/std/primitive.pointer.html#method....
    As I understand it, Rust added this in part for experiments with CHERI but mostly for miri.
    Interestingly, the implementation of with_addr is very similar to your code.
    How do you handle cases where there are multiple possible sources of the capability? For example:
    int *a = …, *b = …; (int*)(((uintptr_t)a + (uintptr_t)b) << 2)
    I’m not sure I would allow this into any code I maintain, but still. There’s also the classic xor-list, and someone has probably done it like:
    struct node { struct node *p; }; struct node *next(struct node *here, struct node *prev) { return (struct node *)((uintptr_t)here ^ (uintptr_t)prev); }
    And this needs to either result in a compiler error or generate some kind of code.
    Rust’s with_addr wins points for being explicit and unambiguous. It obviously loses points for not being C. And Rust benefits here from all of this being in the standard library and from some of the widely-available tooling (miri) getting mad if code doesn’t use it. I can imagine a future Fil-Rust project doing essentially the same thing as Fil-C except starting with Rust code. It might be interesting to see how the GC part would interact with the rest of the language.
    
    pizlonator 2 days ago
    
    My compiler analysis says that if you have two possible pointers that a capability might come from, like in your first example, then you get no capability at all. I think that's a better semantics than picking some capability at random.
    If you want to be explicit about where the capability comes from, use `zmkptr`.
  - lmm 2 days ago
    
    > Here's the difference with Fil-C: It's totally memory safe and fanatically compatible with C/C++, no B.S. The other approaches tend to either create a different language (not compatible), or tend to create a superset of C/C++ with a safe subset (not totally memory safe, only compatible in the sense that unsafe code continues to be unsafe), or just don't fully square the circle (for example SoftBound could have gotten to full compatibility and safety, but the implementation just didn't).
    That's exactly the kind of thing that the boosters of all those previous efforts said. But somehow it never quite worked out.
    
    achierius 2 days ago
    
    > That's exactly the kind of thing that the boosters of all those previous efforts said
    I don't think this is true. - D, Swift, Rust, Zig: different languages, and while they do have FFI, using it means you're only as safe as your C code - CHERI: requires hardware support to be practical - Checked C, CCured, ?SAFECode IIRC?: too expensive - AddrSan@runtime, ARM MTE, SoftBound: mitigations with too many holes
    I don't know of many (to be honest, can't think of any) other serious attempts at making a system that tries to cover all three of
    A) lets you write normal C
    B) covers all the gaps
    C) doesn't kill performance
    
    pizlonator 2 days ago
    
    Well I've got 2/3 so far!
    Maybe 3/3 depending on your workload and definition of "killing performance". It's less than 2x slower for some stuff.
    The good news is Fil-C is getting faster all the time, and there are still so many obvious optimizations that I haven't gotten around to.
    
    pjmlp 2 days ago
    
    Solaris SPARC ADI.
    https://docs.oracle.com/en/operating-systems/solaris/oracle-...
    
    Nathanba 2 days ago
    
    I don't get it, what's the catch? Why isn't everyone using Fil-C immediately everywhere?
    
    throwaway4655 a day ago
    
    Awful performance. Usually 2x worse than C and 4x worse in the worst case. Given the comment by Fil-C's creator minimizing the performance issue [0], I wouldn't get my hopes up.
    [0] https://news.ycombinator.com/item?id=43190938
    
    pizlonator a day ago
    
    I guess you missed the point of that post.
    I’ll summarize: language implementations get faster over time. Young ones tend to be slow. Fil-C is a young implementation that still has lots of unoptimized things. Also, Fil-C being 2x slower than C means it’s already faster than many safe languages. And, for a lot of C use cases perf doesn’t matter as much as the hype suggests.
    The fact that young implementations are slow is something that’s worth understanding even if you don’t care about fil-C. It suggests, for example, that if someone invents a new language and their initial implementation is slow, then you can’t use that fact to assume that it’ll be slow forever. I think that’s generally a useful lesson.
    I care about performance a lot and Fil-C has gotten about 100x faster since the first prototype. It’ll keep getting faster.
    
    pizlonator 2 days ago
    
    Lots of reasons.
    Here's one: even just switching from gcc or msvc to clang, in projects that really want to, takes years.
    Here's another one: the Fil-C compiler is really young, so it almost certainly still has bugs. Those compilers that folks actually use in anger tend to get qualified on ~billions of lines of code before anyone other than the compiler devs touches them. The Fil-C compiler is too young to have that level of qualification.
    So "immediately everywhere" isn't going to happen. At best it'll be "over a period of time and incrementally".
    
    pizlonator 2 days ago
    
    > That's exactly the kind of thing that the boosters of all those previous efforts said. But somehow it never quite worked out.
    No, they really didn't. Let's review some of the big cases.
    - SafeC: not based on a mainstream C compiler, so can't handle gcc/clang extensions (Fil-C can). Had no story for threads or shared memory (Fil-C does). Hence, not really compatible.
    - CCured: incompatible (cannot compile C code with it without making changes, or running their tool that tries to automate the changes - but even then, common C idioms like unions don't quite work). Didn't use a major C compiler,
    - SoftBound: not totally memory safe (no attempt to provide safety for linking or function calls). But at least it's highly compatible.
    I can list more examples. Fil-C is the first to get both compatibility and safety right.
    
    lmm 2 days ago
    
    > Fil-C is the first to get both compatibility and safety right.
    Has any impartial third party reached that conclusion? Because honestly the way I remember it everyone says this kind of thing when it's their own project, a lot of the people behind these previous efforts were just as confident as you are.
    
    pizlonator 2 days ago
    
    Not in any official capacity but it’s been looked at by other C compiler experts, other programming language experts, GC experts, and security experts. Folks who have looked at it deeply agree with those claims. And I hope they would have told me if they didn’t believe anything about my claims!
    
    jandrewrogers 2 days ago
    
    Also, it always had a material performance impact. People write C++, and to a lesser extent C, because they really, really care about performance. If they didn’t care about performance there are easier languages to use.
    
    pizlonator 2 days ago
    
    Talking about performance impact is missing the bigger picture of how languages become performant. "Really really care about performance" describes some C/C++ programmers, but definitely not all of them. Finally, Fil-C is already faster than a lot of other safe languages (definitely faster than TypeScript, yet lots of stuff ships in TypeScript).
    Language implementations get faster over time and young ones tend to be slow. The Fil-C implementation is young. So were all of the previous attempts at memory-safe C - usually an implementation that had years of at most a few person years of investment (because it was done in an academic setting). Young implementations tend to be slow because the optimization investment hasn't happened in anger. So, "past academic attempts were slow" is not a great reason to avoid investigating memory safe C.
    Performance focus is not the reason why all of the world's C/C++ code gets written. Maybe that's even a minority reason. Lots of stuff uses C/C++ because of reasons like:
    - It started out in C/C++ so it continues to be in C/C++. So many huge projects are in this boat.
    - You're critically relying on a library whose only bindings are in C/C++, or the C/C++ bindings are the most mature, or the most easy to use.
    - You're doing low-level systems stuff, and having pointers that you can pass to syscalls is a core part of your logic.
    - You want to play nice with the dynamic linking situation on the OS you're targeting. (C/C++ get dynamic linking right in a way other languages don't.)
    I'd guess less than half of the C/C++ code that's being written today is being written because the programmer was thinking "oh man, this'll be too slow in any other language".
    Finally, Fil-C is already faster than a lot of memory safe languages. It's just not as fast as Yolo-C, but I don't think you can safely bet that this will be true forever.
musicale a day ago

> Fil-C is faster than CHERI
Except for those GC pauses...
- pizlonator 19 hours ago
  
  No GC pauses. Fil-C uses a concurrent GC.
  For CHERI to be fully safe, it basically needs a GC. They just call it something else. They need it to clean up capabilities to things that were freed, which is the same thing that Fil-C uses GC for.

thrwy18237111 2 days ago

How about incentives to write safe code even in C? They do not exist.

You are not rewarded for:

1) Formal proofs or careful programming. No one cares if a piece of software works quietly for years.

2) Preventing others from ruining a working piece of software. To the contrary, you will be called a gatekeeper and worse things.

You are rewarded for:

1) Wild ideas, quickly and badly implemented with the proper amount of marketing.

2) Churn, "social" coding, and LGTM.

3) Ironically, if you are a security researcher, finding exploits can help, too. As above, preventing exploits in the first place is regarded as a waste of time.

All of the above is true at Google. But of course they have a technical solution to a social problem. Which might catch one category of bugs at best.

BirAdam 3 days ago

Being completely serious, people will use whatever works. If what works is written in C, people will use it. The average person seriously doesn't care what language a thing is written in. The average person cares that the software in question works. Despite being written in C, most software today works reasonably well. Is it perfect? No. Will the rusty equivalent be perfect on day 1? No.

wmf 2 days ago

Yeah, we know the market won't solve this. That's why people are talking about government standards.
- Nevermark 2 days ago
  
  I can't help but think that those lazy mathematicians might benefit from a congressional order to clean up that twin prime problem too.
  If memory safety was "just the right regulations" easy, it would have already been solved. Every competent developer loves getting things right.
  I can imagine a lot more "compliance" than success may be the result of any "progress" with that approach.
  The basic problem is challenging, but makes it hard-hard is the addition of a mountain of incidental complexity. Memory safety as a retrofit on languages, tools and code bases is a much bigger challenge than starting with something simple and memory safe, and then working back up to something with all the bells and whistles that mature tool ecosystems provide for squeaking out that last bit of efficiency. Programs get judged 100% on efficiency (how fast can you get this working? how fast does it run? how much is our energy/hardware/cloud bill?), and only 99% or so on safety.
  If the world decided it could get by on a big drop in software/computer performance for a few years while we restarted with safer/simpler tools, change would be quick. But the economics would favor every defector so much that ... that approach is completely unrealistic.
  It is going to get solved. The payoff is too high, and the pain is too great, for it not to. But not based on a concept of a plan or regulation.
  - lmm 2 days ago
    
    > If memory safety was "just the right regulations" easy, it would have already been solved.
    Memory safety is already a solved problem in regulated industries. It's not a hard problem as such. People just don't want to solve it and don't have any incentive to: companies aren't penalised for writing buggy software, and individual engineers are if anything rewarded for it.
    > Every competent developer loves getting things right.
    Unfortunately a lot of developers care more about being able to claim mastery of something hard. No-one gets cred for just writing the thing in Java and not worrying about memory issues, even though that's been a better technical choice for decades for the overwhelming majority of cases.
    
    lelanthran 2 days ago
    
    > Memory safety is already a solved problem in regulated industries. It's not a hard problem as such.
    It's not hard, no, but it is expensive, because those regulations have a battery of tests run by a thirdy party that you will pay money to each time you want to recertify.
    I've worked in two regulated industries; the recertification is the expensive part, not the memory errors.
    
    Nevermark a day ago
    
    > Memory safety is already a solved problem
    Most famously in Rust. Even there it takes work.
    The problem is a practical coding work efficiency (and quality) one. You are right that there are no intractable memory problems even in the unsafest least helpful languages.
    
    wakawaka28 2 days ago
    
    Regulated industries have overwhelmingly boring and expensive software compared to others. They do things like banning recursion and dynamic arrays lol. Memory safety in every aspect possible just isn't worth it for most applications. And the degree of memory safety that is worth it is a lot less than Rust developers seem to think, and the degree of memory safety granted by Rust is less than they think as well.
    
    lmm 2 days ago
    
    Memory safety isn't worth it as long as leaking all your users' data (and granting attackers control over their systems) doesn't cost much. As attacks get more sophisticated and software gets more important, the costs of memory unsafety go up.
    
    wakawaka28 2 days ago
    
    What you've said is true but I still think the problem is overblown, and solutions at the hardware level are disregarded in favor of dubious and more costly software rewrite solutions. If something like CHERI was common then it would automatically find most security-related memory usage bugs, and thus lead to existing software getting fixed for all hardware.
    
    lmm 2 days ago
    
    IME you can't reliably extract the intent from the C code, much less the binary, so you can't really fix these bugs without a human rewriting the source. The likes of CHERI might make exploitation harder, but it seems to me that ROP-style workarounds will always be possible, because fundamentally if the program is doing things that look like what it was meant to do then the hardware can never distinguish whether it's actually doing what it was meant to do or not. Even if you were able to come up with a system that ensured that standards-compliant C programs did not have memory bugs (which is already unlikely), that would still require a software rewrite approach in practice because all nontrivial C programs/libraries have latent undefined behaviour.
    
    gw2 2 days ago
    
    > IME you can't reliably extract the intent from the C code, much less the binary, so you can't really fix these bugs without a human rewriting the source.
    I am pretty sure that the parent is talking about hardware memory safety which doesn't require any "human rewriting the source".
    
    lmm 2 days ago
    
    > I am pretty sure that the parent is talking about hardware memory safety which doesn't require any "human rewriting the source".
    It does though. The hardware might catch an error (or an "error") and halt the program, but you still need a human to fix it.
    
    gw2 2 days ago
    
    > but you still need a human to fix it.
    The same thing can be said about a Rust vector OOB panic or any other bug in any safe language. Bugs happen which is why programmers are employed in the first place!
    
    lmm 2 days ago
    
    > The same thing can be said about a Rust vector OOB panic or any other bug in any safe language. Bugs happen which is why programmers are employed in the first place!
    Sure, the point is you're going to need the programmer either way, so "hardware security lets us detect the problem without rewriting the code" isn't really a compelling advantage for that approach.
    
    wakawaka28 2 days ago
    
    If a program halts, that is a narrow security issue that will not leak data. Humans need to fix bugs, but that is nothing new. A memory bug with such features would be hardly more significant than any other bug, and people would get better at fixing them over time because they would be easier to detect.
    
    lmm 2 days ago
    
    > If a program halts, that is a narrow security issue that will not leak data.
    Maybe. Depends what the fallback for the business that was using it is when that program doesn't run.
    > Humans need to fix bugs, but that is nothing new. A memory bug with such features would be hardly more significant than any other bug
    Perhaps. But it seems to me that the changes that you'd need to make to fix such a bug are much the same changes that you'd need to make to port the code to Rust or what have you, since ultimately in either case you have to prove that the memory access is correct. Indeed I'd argue that an approach that lets you find these bugs at compile time rather than run time has a distinct advantage.
    
    wakawaka28 a day ago
    
    >Perhaps. But it seems to me that the changes that you'd need to make to fix such a bug are much the same changes that you'd need to make to port the code to Rust or what have you, since ultimately in either case you have to prove that the memory access is correct.
    No, you wouldn't need to prove that the memory access is correct if you relied on hardware features. Or I should say, that proof will be mostly done by compiler and library writers who implement the low level stuff like array allocations. The net lines of code changed would definitely be less than a complete rewrite, and would not require rediscovery of specifications that normally has to happen in the course of a rewrite.
    >Indeed I'd argue that an approach that lets you find these bugs at compile time rather than run time has a distinct advantage.
    It is an advantage but it's not free. Every compilation takes longer in a more restrictive language. The benefits would rapidly diminish with the number of instances of the program that run tests, which is incidentally one metric that correlates positively with how significant bugs actually are. You could think of it as free unit tests, almost. The extra hardware does have a cost but that cost is WAAAY lower than the cost of a wholesale rewrite.
    
    lmm a day ago
    
    > No, you wouldn't need to prove that the memory access is correct if you relied on hardware features. Or I should say, that proof will be mostly done by compiler and library writers who implement the low level stuff like array allocations. The net lines of code changed would definitely be less than a complete rewrite, and would not require rediscovery of specifications that normally has to happen in the course of a rewrite.
    I don't see how the hardware features make this part any easier than a Rust-style borrow checker or avoid requiring the same rediscovery of specifications. Checking at runtime has some advantages (it means that if there are codepaths that are never actually run, you can skip getting those correct - although it's sometimes hard to tell the difference between a codepath that's never run and a codepath that's rarely run), but for every memory access that does happen, your compiler/runtime/hardware is answering the same question either way - "why is this memory access legitimate?" - and that's going to require the same amount of logic (and potentially involve arbitrarily complex aspects of the rest of the code) to answer in either setting.
    
    kazinator 2 days ago
    
    The human might say, sorry my C program is not compatible with your hardware memory safety device. I won't/can't fix that.
    
    wakawaka28 2 days ago
    
    That's possible but unlikely. I would be OK with requiring software bugs like that to be fixed, unless it can be explained away as impossible for some reason. We could almost certainly move toward requiring this kind of stuff to be fixed much more easily than we could do the commonly proposed "rewrite it in another language bro" path.
    
    kazinator 2 days ago
    
    There's no such thing as hardware memory safety, with absolutely no change to the semantics of the machine as seen by the compiled C program. There are going to be false positives.
    
    gw2 2 days ago
    
    > There are going to be false positives
    Of course, but compare it with rewriting it to a completely different language.
    
    wakawaka28 2 days ago
    
    There may be some cases where code would need to be adjusted or annotated to use CHERI well, but that has to be easier than translating to or interfacing with another language.
  - egypturnash 2 days ago
    
    How many modern apps are running inside a browser, one way or another? The world’s already taken that big drop on performance.
- grandempire 2 days ago
  
  When you can’t convince people it’s better, you need to force them to do it.
  - wakawaka28 2 days ago
    
    Did you forget a /s? It seems that if you can't convince a majority of programmers that your new language is good enough to learn, maybe it actually isn't as good as its proponents claim. It is likely the case that rewriting everything in a new language for marginally less bugs is a worse outcome than just dealing with the bugs.
    
    grandempire 2 days ago
    
    I agree. I don’t think we need a government computer language force. Terry is a prophet.
    
    wakawaka28 2 days ago
    
    Mind you, the government has tried this before with Ada. Not to knock Ada but let's just say that government would ruin everything and stifle the industry. Certainly, any new regulations about anything as broad as how memory is allowed to be managed is going to strangle the software industry.
- wakawaka28 2 days ago
  
  If this has to be forced, it probably isn't necessary or very beneficial. How much will it cost to conform to these "standards" versus not? Who stands to gain by making non-conformant software illegal? I think it is clearly far too expensive to rewrite all software and retrain all programmers to conform to arbitrary standards. Hardware solutions to improve memory safety already exist and may ultimately be the best way to achieve the goal.
  It seems to me that Rust programmers, unhappy with the pace of adoption of Rust, seek to make other languages illegal because they do things different from Rust.

anonzzzies 3 days ago

Use Rust for kernel/system programming, use Lisp/Go/Java/C# for backend, use Typescript+wasm for frontend. We have everything already.

tptacek 2 days ago

That doesn't address existing codebases. Neither the Linux kernel nor the Chromium project is going to replace all its memory-unsafe code, so there are design challenges that need to be solved that are more complicated than "these memory-safe languages are available for your problem domain".
- gw2 2 days ago
  
  What is your opinion on deploying C++ codebases with mitigations like CFI and bounds checking?
  Let us say I have a large C++ codebase which I am unwilling to rewrite in Rust. But I:
  * Enable STL bounds checking using appropriate flags (like `-DGLIBCXX_ASSERTIONS`).
  * Enable mitigations like CFI and shadow stacks.
  How much less safe is "C++ w/ mitigations" than Rust? How much of the "70% CVE" statistic is relevant to my codebase?
  - tptacek 2 days ago
    
    We recorded an episode (there's a transcript) about this exact issue:
    https://securitycryptographywhatever.com/2024/10/15/a-little...
    
    gw2 2 days ago
    
    With due respect, the blog you have linked looks like the average Rust marketing material. It does absolutely nothing to address my concerns. I did a `Ctrl-F` and found zero hits of any of the following terms:
    * CFI
    * isoheaps or type-stable allocators
    * Shadow stacks
    (There is just a single hit of "C++"...)
    Ignoring the appeal to authority, I have a hard time believing that incrementally rewriting my C++ code in Rust or just writing new code in Rust ("vulnerabilities exponentially decay" and all that) is going to give me more actual security than the mitigations stated above. Most, if not all, high-profile exploits stem from out-of-bounds accesses and type confusions, which these mitigations prevent at very low cost.
    Thanks for replying, though.
    
    tptacek 2 days ago
    
    If what you're interested in is an "everything must be Rust" vs. "everything must be C++" knock-down drag-out, I'm not interested.
    
    saagarjha 2 days ago
    
    They prevent but do not entirely mitigate.
    
    gw2 2 days ago
    
    I am not interested in adhering to some arbitrary purity standard (like "memory safety" in this case). Almost always, purity ideologies are both irrational and harmful. What I am actually interested is to prevent real problems like remote code execution and Heartbleed-esque leakage of private data and for this, mitigations like CFI, shadow stacks and bounds checking are enough.
    > They prevent but do not entirely mitigate.
    Ignoring the semantic difference between "prevent" and "mitigate", if at the end of the day, the security provided by the two different approaches are quite similar, I don't get the problem.
    If you have an example of a successful widespread exploit that would have happened even with these mitigations, please share.
    
    saagarjha 2 days ago
    
    They’re not enough. For example the field I work in (mobile exploits) continues to bypass CFI (PAC) via clever data-only attacks or abusing TOCTOU issues.
J_Shelby_J 2 days ago

Nah, ima use rust for all of that because I’m too lazy to manage multiple tech stacks.
bigtimesink 3 days ago

This, but I'm getting tired of people using Rust for things that really should be in C# or Java.
- laerus 3 days ago
  
  "really should be" ?
  "in C# or Java" ?
  where do you even base these claims? Do you know what C# and Java threads have that Rust doesn't? data races. And don't get me stated on the biggest paradigm failure that is OOP.
  - bigtimesink 3 days ago
    
    Projects I've seen at work. Projects posted on Hacker News. Data races aren't usually an issue for backend services, and modern Java/C# is multi-paradigm.
    
    lostmsu 2 days ago
    
    > Data races aren't usually an issue for backend services
    I beg to differ unless all your logic is in the database with strong isolation guarantees.
    Speaking of C# for backends that are using EF actively, I bet there are bugs in pretty much all of them caused by incorrect applications of optimistic concurrency.
    
    pdimitar 2 days ago
    
    [flagged]
  - legulere 3 days ago
    
    Both have a garbage collection though that lead to higher developer productivity than compared to rust's affine types.
    
    brink 3 days ago
    
    > higher developer productivity
    Where have you been the past 5 years? Rust developers are insanely productive.
    Can we put this myth to rest already? Rust being an "unproductive language" is thoroughly dis-proven.
    
    neonsunset 2 days ago
    
    There are domains where C# (and F#) productivity stems from similar reasons why writing a game in something that isn't Rust might be more productive without even sacrificing performance (or, at least, not to the drastic extent).
    I can give you an example:
    var number = 0; var delay = Task.Delay(1000); for (var i = 0; i < 10; i++) { Task.Run(() => { while (!delay.IsCompleted) { Interlocked.Increment(ref number); } }); } await delay;
    How would you write this idiomatically in Rust without using unsafe?
    To avoid misunderstanding, I think Rust is a great language and even if you are a C# developer who does not plan to actively use it, learning Rust is of great benefit still because it forces you to tackle the concepts that implicitly underpin C#/F# in an explicit way.
    
    whytevuhuni 2 days ago
    
    There's a few things here that make this hard in Rust:
    First, the main controller may panic and die, leaving all those tasks still running; while they run, they still access the two local variables, `number` and `delay`, which are now out of scope. My best understanding is that this doesn't result in undefined behavior in C#, but it's going to be some sort of crash with unpredictable results.
    I think the expectation is that tasks use all cores, so the tasks also have to be essentially Send + 'static, which kinda complicates everything in Rust. Some sort of scoped spawning would help, but that doesn't seem to be part of core Tokio.
    In C#, the number variable is a simple integer, and while updating it is done safely, there's nothing that forces the programmer to use Interlocked.Read or anything like that. So the value is going to be potentially stale. In Rust, it has to be declared atomic at the start.
    Despite the `await delay`, there's nothing that awaits the tasks to finish; that counter is going to continue incrementing for a while even after `await delay`, and if its value is fetched multiple times in the main task, it's going to give different results.
    In C#, the increment is done in Acquire-Release mode. Given nothing waits for tasks to complete, perhaps I'd be happy with Relaxed increments and reads.
    So in conclusion: I agree, but I think you're arguing against Async Rust, rather than Rust. If so, that's fair. It's pretty much universally agreed that Async Rust is difficult and not very ergonomic right now.
    On the other hand, I'm happy Rust forced me to go through the issues, and now I understand the potential pitfalls and performance implications a C#-like solution would have.
    Does this lead to the decision fatigue you mention in another sub-thread? It seems like it would, so I'll give you that.
    
    whytevuhuni 2 days ago
    
    For posterity, here's the Rust version I arrived at:
    let number = Arc::new(AtomicUsize::new(0)); let finished = Arc::new(AtomicBool::new(false)); let finished_clone = Arc::clone(&finished); let delay = task::spawn(async move { sleep(Duration::from_secs(1)).await; finished_clone.store(true, Ordering::Release); }); for _ in 0..10 { let number_clone = Arc::clone(&number); let finished_clone = Arc::clone(&finished); task::spawn(async move { while !finished_clone.load(Ordering::Acquire) { number_clone.fetch_add(1, Ordering::SeqCst); task::yield_now().await; } }); } delay.await.unwrap();
    https://play.rust-lang.org/?version=stable&mode=debug&editio...
    
    anarki8 19 hours ago
    
    I am not sure what are you trying to represent with this example, but here is the exact same thing without any unsafe:
    use rayon::prelude::*;
    use std::time::{Instant, Duration};
    use std::sync::atomic::{AtomicUsize, Ordering};
    fn main() {
    let finish_at = Instant::now() + Duration::from_secs(1); let number = AtomicUsize::new(0); (0..10).into_par_iter().for_each(|_| { while finish_at > Instant::now() { number.fetch_add(1, Ordering::Relaxed); } });
    }
    You can make it even simpler if you would use Mutex instead of atomics. Atomics are more performant though.
    
    pdimitar 2 days ago
    
    > How would you write this idiomatically in Rust without using unsafe?
    Channels and selects. It's trivial.
    
    neonsunset 2 days ago
    
    Please post a snippet.
    
    brink 2 days ago
    
    use std::{ sync::{ Arc, atomic::{AtomicBool, AtomicUsize, Ordering}, }, time::Duration, };
    fn main() { let num = Arc::new(AtomicUsize::new(0)); let finished = Arc::new(AtomicBool::new(false));
    for _ in 0..10 { std::thread::spawn({ let num = num.clone(); let finished = finished.clone(); move || { while !finished.load(Ordering::SeqCst) { num.fetch_add(1, Ordering::SeqCst); } } }); } std::thread::sleep(Duration::from_millis(1000)); finished.store(true, Ordering::SeqCst); }
    
    neonsunset 2 days ago
    
    What if we want to avoid explicitly spawning threads and blocking the current one every time we do this? Task.Run does not create a new thread besides those that are already in the threadpool (which can auto-scale, sure, but you get the idea, assuming the use of Tokio here).
    
    brink 2 days ago
    
    What you're asking for is thread parking. Use tokio for that, it's still trivial.
    
    neonsunset 2 days ago
    
    I was implying that yes, while it is doable, it comes at 5x cognitive cost because of micromanagement it requires. This is somewhat doctored example but the "decision fatigue" that comes with writing Rust is very real. You write C# code, like in the example above, quickly without having to ponder on how you should approach it and move on to other parts of the application while in Rust there's a good chance you will be forced to deal with it in a much stricter way. It's less so of an issue in regular code but the moment you touch async - something that .NET's task and state machine abstractions solve on your behalf you will be forced to deal with by hand. This is, obviously, a tradeoff. There is no way for .NET to use async to implement bare metal cooperative multi-tasking, while it is very real and highly impressive ability of Rust. But you don't always need that, and C# offers an ability to compete with Rust and C++ in performance in critical paths when you need to sit down and optimize it unmatched by other languages of "similar" class (e.g. Java, Go). At the end of the day, both languages have domains they are strong at. C# suffers from design decisions that it cannot walk back and subpar developer culture (and poor program architecture preferences), Rust suffers from being abrasive in some scenarios and overly ceremonious in others. But other than that both provide excellent sets of tradeoffs. In 2025, we're spoiled with choice when it comes to performant memory-safe programming languages.
    
    Maxatar 2 days ago
    
    To be honest this sounds like something someone inexperienced would do in any language.
    If you're not comfortable in a language, then sure you ponder and pontificate and wonder about what the right approach is, but if you're experienced and familiar then you just do it plain and simple.
    What you're describing is not at all a language issue, it's an issue of familiarity and competency.
    
    brink 2 days ago
    
    It's literally not 5x the cost, it would take me 3 minutes to whip up a tokio example. I've done both. I like C# too, I totally understand why you like it so much. This is not a C# vs Rust argument for me. All I'm saying is that Rust is a productive language.
    
    pdimitar 2 days ago
    
    Rust is manual by design because people need to micro-manage resources. If you are experienced in it, it still takes a very little time to code your scenario.
    Obviously if you don't like the manual-ness of Rust, just use something else. For what you described I'd reach for Elixir or Golang.
    I was disagreeing with you that it's not easy or too difficult. Rust just takes a bit of effort and ramping up to get good at. I hated it as well to begin with.
    But again -- niches. Rust serves its niche extremely well. For other niches there are other languages and runtimes.
    
    neonsunset a day ago
    
    > Elixir or Golang
    That would be an incredible downgrade.
    
    pdimitar a day ago
    
    Downgrade compared to what?
    One-liners work best in comedy, dude.
    
    exceptione 20 hours ago
    
    - I recommend reading the comment history of @neonsunset. He has shared quite some insights, snippets and benchmarks to make the case that if you do not need the absolute bare metal control C or Rust provides, you are better of with either .net or Java.
    - Whereas in .net you have the best native interop imaginable for a high level language with a vast SDK. I understood that Java has improved on JNI, but I am not sure how well that compares.
    - Programming languages are like a religion, highly inflammable, so I can imagine you would not be swayed by some rando on the internet. I would already be happy if you choose Go over Python, as with the former you win some type safety (but still have a weak type system) and have a good package manager and deployment story.
    - Go was designed for Google, to prevent their college grads from implementing bad abstractions. But good abstractions are valuable. A weak type system isn't a great idea (opinion, but reasonable opinion). Back then .net was not really open source (I believe) and not as slim and fast as it is now, and even then, I think Google wants to have control about their own language for their own internal needs.
    - Therefore, if you are not Google, Go should likely not be your top pick. Limited area of application, regrettable decisions, tailored for Google.
    
    pdimitar 17 hours ago
    
    ---
    (Sorry for a reply all over the place.)
    ---
    > if you do not need the absolute bare metal control C or Rust provides, you are better of with either .net or Java.
    That, like your next point, is a relatively fair statement but it's prone to filter bubble bias as I am sure you are aware. I for example have extracted much more value out of Golang... and I had 9 years with Java, back in the EJB years.
    Java and .NET are both VM-based and have noticeable startup time. As such, they are best suited for servers and not for tooling. Golang and Rust (and Zig, and D, V and many other compiled languages) are much better in those areas.
    > Programming languages are like a religion, highly inflammable, so I can imagine you would not be swayed by some rando on the internet
    Those years are long past me. I form my own opinions and I have enough experience to be able to make fairly accurate assessments with minimum information.
    > I would already be happy if you choose Go over Python, as with the former you win some type safety (but still have a weak type system) and have a good package manager and deployment story.
    I do exactly that. In fact I am a prominent Python hater. Its fans have gone to admirable heights in their striving to fill the gaps but I wonder will they one day realize this is unnecessary and just go where those gaps don't exist. Maybe never?
    And yeah I use Golang for my own purposes. Even thinking of authoring my own bespoke sync and backup solution stepping on Syncthing, GIT and SSH+rsync and package it in Golang. Shell scripts become unpredictable from one scale and on.
    > Go was designed for Google, to prevent their college grads from implementing bad abstractions. But good abstractions are valuable. A weak type system isn't a great idea (opinion, but reasonable opinion). Back then .net was not really open source (I believe) and not as slim and fast as it is now, and even then, I think Google wants to have control about their own language for their own internal needs.
    That and your next point I fully agree with. That being said, Golang is good enough and I believe many of us here preach "don't let perfect be the enemy of good". And Golang symbolizes exactly that IMO; it's good enough but when your requirements start shooting up then it becomes mediocre. I have stumbled upon Golang's limitations, fairly quickly sadly, that's why I am confining it to certain kinds of projects only (and personal tinkering).
    > I recommend reading the comment history of @neonsunset.
    I don't mind doing that (per se) but I find appeals to authority and credentialism a bit irritating, I admit.
    Plus, his reply was in my eyes a fairly low-effort snark.
    
    exceptione 15 hours ago
    
    (I would note that EJB is something from the past. Like .net has also really grown.)
    > that's why I am confining it to certain kinds of projects only (and personal tinkering).
    You have a fair view of Go, I think. I could see that it makes sense to use it as a replacement for bash scripts, especially if you know the language well. Personally I am wanting to dive into using F# for my shell scripting needs. The language leans well into those kind of applications with the pipe operator.
    If you ever have the appetite, you should take a look at it as it can be run in interpreted/REPL mode too, which is a nice bonus for quick one-of scripts.
    https://blog.lucca.io/2022/05/19/fsharp-script
    
    neonsunset 16 hours ago
    
    > Java and .NET are both VM-based and have noticeable startup time. As such, they are best suited for servers and not for tooling. Golang and Rust (and Zig, and D, V and many other compiled languages) are much better in those areas.
    For JIT-based deployments, it is measured in 100-500ms depending on the size of application, sometimes below. .NET has first-party support for NativeAOT deployment mode for a variety workloads: web servers, CLI tools, GUI applications and more.
    Go is a VM-based language, where VM provides facilities such as virtual threading with goroutines (which is higher level of abstraction than .NET's execution model), GC, reflection, special handling for FFI. Identical to what .NET does. I don't think the cost and performance profile of BEAM needs additional commentary :)
    Go also has weaker GC and compiler implementations and, on optimized code, cannot reach the performance grade of C++ and Rust, something C# can do.
    > Those years are long past me. I form my own opinions and I have enough experience to be able to make fairly accurate assessments with minimum information.
    The comments under your profile seem to suggest the opposite. Perhaps "minimum information" is impeding fair judgement?
    > I don't mind doing that (per se) but I find appeals to authority and credentialism a bit irritating, I admit.
    Is there a comment you have in mind which you think is engaging in credentialism?
    
    pdimitar 16 hours ago
    
    > Is there a comment you have in mind which you think is engaging in credentialism?
    The other guy who told me to inspect your profile. Not you.
    > The comments under your profile seem to suggest the opposite.
    Sigh. You seem to act in bad faith which immediately loses my interest.
    You'll have to take your high and mighty attitude to somebody else. Seems like even HN is not immune from... certain tropes, shall we call them, generously.
    Disengaging.
    
    homebrewer 2 days ago
    
    Even when it's used by mediocre developers, which is probably more than 90% of us, myself very much included? All I've been seeing is Rust being used by very enthusiastic and/or talented developers, who will be productive in any language.
    
    lmm 2 days ago
    
    > Rust developers are insanely productive.
    If your baseline is a language that is missing some features that were in standard ML, sure. If you were already using OCaml or F#, Rust doesn't make you any more productive. If you were already using Haskell or Scala, Rust's lack of HKT will actively slow you down.
    
    hello_moto 2 days ago
    
    Rust is the silver bullet we all been waiting for?
    
    ramon156 2 days ago
    
    No, that's the other end of the stick. Its en par with other languages
    
    hello_moto 2 days ago
    
    Well, put any language against Rust and Rustaceans would argue Rust is better than those languages so ... Silver bullet no?
    
    chasd00 2 days ago
    
    If the word “Rustaceans” is actually in common use then rust loses by default.
cantalopes 2 days ago

"Use lisp for backend" lol
- worthless-trash 2 days ago
  
  Its easier and saner than you think.
  No massive churn, quite performant and the code i wrote 20 years ago runs without modification.
  Can other blub languages claim this?
- pjmlp 2 days ago
  
  It worked for Yahoo early days, SISCOG, ITA Software,...
  Even a site called HN if you happen to know it.
- NoGravitas 2 days ago
  
  I do this for my personal hobby projects, but that's as much to deter use by technology enthusiasts as anything.
- anonzzzies 2 days ago
  
  Works well, got problems with it (outside pure ignorance), pick something else; same difference to me. lol.
pdimitar 2 days ago

Nope, Elixir for backend.
We need the BEAM VM's guarantees, not yesterday but like 20 years ago, everywhere. The language itself does not matter. But we need that runtime's guarantees!
- teeray 2 days ago
  
  What is it specifically about the BEAM VM that positions it above, say, Go on K8S?
  - BeFlatXIII 2 days ago
    
    Not having to learn K8s
  - pdimitar 2 days ago
    
    Ability to manage tens of thousands of stateful connections without the 95th percentile requests jumping to 5 seconds.
    Just to start with.
- Sytten 2 days ago
  
  I get the appeal, but without strong typing it's a no-go in my book. Get me an elixir with proper types and we can talk.
  - Nilithus 2 days ago
    
    I believe they have already started this effort. https://elixir-lang.org/blog/2023/06/22/type-system-updates-...
    
    pdimitar 2 days ago
    
    They have but it's mostly a labor of love and it's very difficult to fit a static type system into a dynamically typed language.
    We already have some false positives. Happily the team is very motivated and is grinding away at them, for which we the community are forever grateful.
  - pdimitar 2 days ago
    
    Oh I agree. The Erlang/Elixir ecosystem is in danger of Rust inventing a BEAM-like runtime and making it irrelevant.
- dudus 2 days ago
  
  Elixir is Lisp with sprinkles on top
  - BeFlatXIII 2 days ago
    
    Lisp for people who hate parentheses.
    
    pdimitar 2 days ago
    
    No, it's LISP for people who understand multicore CPUs exist for a long time now.
    Modern LISP dialect authors still believe threads are a super clever idea which is just... /facepalm.

hinkley 2 days ago

I worked on a provenance system which would be so completely the wrong solution to this problem that I only bring it up because the 100,000 foot view is still relevant.

I think we are eventually going to end up with some sort of tagged memory with what this is for (such as credentials) and rules about who is allowed to touch it and where it's allowed to go. Instead of writing yet another tool that won't let fields called "password" or "token" or "key" be printed into the logs, but misses "pk", it's going to be no printing any memory block in this arena, period.

I also think we aren't doing enough basic things with backend systems like keeping just a user ID in the main database schema and putting all of the PII in a completely different cluster of machines, that has 1/10 to 1/100th of the sudoer entries on it of any other service in the company. I know these systems are out there, my complaint is we should be talking about them all the time, to push the Recency Effect and/or Primacy Effect hard.

perlgeek 2 days ago

Perl has had a limited data tagging system for decades now, called "taint checking".
If enabled (through a command line switch), all data coming in from the outside (sockets, STDIN etc.) are "tainted", and if you e.g. concatenate a non-tainted and a tainted string, it becomes tainted. Certain sensitive operations, like system() calls or open(), raise an error when used with tainted data.
If you match tainted data with a regex, the match groups are automatically untainted.
It's not perfect, but it demonstrates that such data tagging is possible, and quite feasible if integrated early enough in the language.
- hinkley 2 days ago
  
  Rails has a “safe” attribute that only works for html output, and doesn’t work right for urls (a bug that somehow became my responsibility to fix many times). It’s a limited version of the same thing and I believe Elixir has the same design, and I’ve already seen a reproduction of the Rails flaw in Elixir.
  But they are Boolean values and they need to be an enumeration or likely a bitfield. Even just for web I’ve already identified four in this thread. HTML unsafe, url unsafe, PII unsafe, credentials unsafe. I hesitate to add SQL unsafe because the only solution to sql injection is NO STRING CONCATENATION. But so many SQL libraries use concatenation even for prepared statements that maybe it should be. Only allow string constants for sql queries.
Etheryte 2 days ago

While I agree with you as a matter of an ideal, the step from one database to two is infinitely larger than from two to more. Given budget, time and engineering constraints, sticking everything in one database is by far the sanest solution for the vast majority of code out there.
- hinkley 2 days ago
  
  I think microservices kind of break that wall with a herd of stampeding elephants being chased by even angrier bees.
  If you're still using one database in 2025, even if you wouldn't touch microservices with a ten foot pole, then you've got some problems.
  OLAP, KV, cache heirarchies, you aren't running a singular database, except maybe for a standalone app.
pjmlp 2 days ago

Solaris SPARC ADI has had tagged memory for quite some time now.

neilv 3 days ago

Summary: We know, we know, but don't make us rewrite everything in Rust.

0xbadcafebee 3 days ago

This post seems a lot more informative to me: "It Is Time to Standardize Principles and Practices for Software Memory Safety" (https://cacm.acm.org/opinion/it-is-time-to-standardize-princ...)

I am 100% in favor of industry standards to enforce safety. It should go way past just memory safety, though. Engineering standards should include practices and minimum requirements to prevent safety issues as a whole.

dataflow 2 days ago

The key to progress in a lot of cases is to do it incrementally. If you make something too hard to chew, people won't bite.

NoGravitas 2 days ago

Programmers will invent new languages and demand new hardware architectures rather than ~~go to therapy~~ use a garbage collector.

omoikane 2 days ago

https://news.ycombinator.com/item?id=42962020 - It is time to standardize principles and practices for software memory safety (2025-02-06, 100 comments)

binkHN 2 days ago

Happy to see the mention of Kotlin's memory safety features here; goes a bit beyond Java with its null safety, encouragement of immutability and smart casting.

ammar2 2 days ago

I was actually a little surprised to see that in there, I wouldn't really consider those features to be "memory safety" as I traditionally see it over Java.
They don't really lead to exploitable conditions, except maybe DoS if you have poor error handling and your application dies with null pointer exceptions.
flykespice 2 days ago

Dont forget being very strict typed too.

WalterBright 3 days ago

The most common memory safety bug in released software is array overflow. This is easily corrected in C by adding a small extension:

https://www.digitalmars.com/articles/C-biggest-mistake.html

jcranmer 2 days ago

> This is easily corrected in C by adding a small extension
Unfortunately, easily corrected it is not. Yes, probably >95% of arrays have a nearby easily-accessible length parameter that indicates their maximum legal length (excluding the security disaster of null-terminate strings). But the problem is there's no consistent way that people do this. Sometimes people put pointer first and size second, sometimes it's the other way around. Sometimes the size is a size_t, sometimes an unsigned, sometimes an int. Or sometimes it's not a pointer-and-size, but pointer and one-past-the-end pointer pair. Sometimes multiple arrays share the same size parameter.
So instead of an easy solution getting you 90% for effectively free, you get like 30% with the easy solution, and have to make it more complicated to handle the existing diversity to push it back up to that 90%.
- WalterBright 2 days ago
  
  This extension was added in D at its start, and 25 years of experience with it shows that it is possibly D's best loved feature. It's a huge win.
EE84M3i 2 days ago

> The most common memory safety bug in released software is array overflow.
Do you have a source for this? I thought it was use after free.
- dcsommer 2 days ago
  
  CWE Top 25: https://cwe.mitre.org/top25/archive/2024/2024_cwe_top25.html
  Out of bounds write and read are more prevalent then UAF. There are multiple types of bugs that can produce OOB read or write though.
- WalterBright 2 days ago
  
  Not offhand, but every list of the common security bugs shows it's the top, by a wide margin.
- tptacek 2 days ago
  
  That (or, more broadly, memory lifecycle bugs) would be my guess too.
pjmlp 2 days ago

Which WG14 keeps refusing to do, regardless of how often this is pointed out.
- WalterBright 2 days ago
  
  I know, which was the motivation for D.
Gibbon1 3 days ago

They could fix typeof to make it something useful. Like the ability to take the type of something and pass it around.
And add slice and buffer typedefs to the standard library. Especially since they added counted_by to the language.
Add a way to define a slice the points to a c string.
ajb 2 days ago

Yeah, but looking back at the attempts to extend C a bit for safety, the only one that seems to have got market traction is MISRA and even then it's pretty limited. D, rust, zig etc all seem to have much more buy in. There must be some reason why a new language works better here -I mean, you're basing your business off D, not a C extension, right?
- WalterBright 2 days ago
  
  This idea is just adapting the D version of it.
  - ajb 15 hours ago
    
    That's.. kind of my point. This mechanism has seen more adoption in a new language than in the existing language. I'm sure it would work technically - there must be some other reason why it's easier to get a new language adopted .
lmm 2 days ago

> This is easily corrected in C by adding a small extension
It's so easy that thousands of developers trying for 40+ years haven't been able to do it yet.
- WalterBright 2 days ago
  
  They don't see the value in it. You have to use it a while to see how much time it saves you not chasing down memory corruption bugs.

sixthDot 2 days ago

while I agree on the idea I clearly see that instead of proposing solutions, a bag of programming languages are suggested.

booleandilemma 2 days ago

No mention of Carbon, I see.

pjmlp 2 days ago

Carbon naysayers keep forgeting to read the part it is an experimental language.
- ChocolateGod 2 days ago
  
  It's not as if an experimental label would keep Google from deploying something. We all know their lack of software testing.
  - pjmlp 2 days ago
    
    It only got an initial backend like a couple of months ago.
    The site clearly mentions folks to use something else, if they want to write safe code today.

dmitrygr 3 days ago

  > we suggest establishing a common framework

"If you want to kill an idea, start a committee to study it, or a framework to contain and guide it"

0xbadcafebee 3 days ago

How many security holes are caused by not sanitizing inputs, as opposed to memory safety? It feels like not sanitizing inputs is what enables memory safety exploits, in addition to many other classes of security hole, yet nobody seems to talk about it.

- Buffer overflow: somebody didn't sanitize the input (length of buffer).

- Stack smashing: somebody didn't sanitize the input (length of input).

- Format string vulnerability: somebody didn't sanitize the format string data.

- Integer converstion vulnerability: somebody didn't sanitize the integer input.

- SQL injection: somebody didn't sanitize the input before querying a database.

- Cross-site scripting: somebody didn't sanitize input and it got submitted and executed on behalf of the user.

- Remote file inclusion / Directory traversal: somebody didn't sanitize an input variable leading to a file path.

...and on, and on, and on. If people were as obsessed with input sanitization as they are with memory, I'll bet you a much larger percentage of attacks would be stopped. Too bad input sanitization isn't sexy.

UncleMeat 3 days ago

SQL Injection and XSS are actually great examples of vuln classes where the winning strategy is safe APIs rather than diligent sanitization. "Just sanitize all your user inputs" is hard to do correctly because it is difficult to create automatic rules that detect every single possible violation.
Prepared statements and safe HTML construction APIs plus some linters that scream at you when you use the unsafe APIs works like magic.
- bflesch 3 days ago
  
  You're correct. It's about distinction between code and data.
  You should simply discern between HTML elements (code) and HTML text nodes (data). Same with prepared statements: Clear distinction between SQL code vs SQL data.
  You just need to ensure that your data is never interpreted as code.
  - 0xbadcafebee 2 days ago
    
    > You just need to ensure that your data is never interpreted as code.
    That's sanitization. Many different languages implement this. The old-school method is "tainting" data so it can't be used as part of execution without an explicit function call to "untaint" it. Same is used for "secret" data in various programs where you don't want it leaked.
crazygringo 2 days ago

> - SQL injection: somebody didn't sanitize the input before querying a database.
> - Cross-site scripting: somebody didn't sanitize input and it got submitted and executed on behalf of the user.
To be technical about it, this is generally a failure of escaping rather than sanitizing.
You're supposed to be able to put anything into a database field and not have it affect the query. You ought to be able to paste JavaScript into a field and have it be displayed as JavaScript, not executed. The inputs remain as they are -- no sanitization -- they just have to be escaped properly.
That being said, I'm 100% on board about the importance of sanitation/validation. To the extent I think it ought to be part of the design of languages just like types. I.e. if a function parameter is only allowed to be three string values, or a string between 0 and 10 bytes, or a string limited to lowercase ASCII, these should be expressable.
GuB-42 2 days ago

Sanitizing inputs in important in addition to memory safety.
Sanitizing inputs won't protect you against all bugs. For example, you may store a string in a 256-byte buffer, so you check that the string is no longer than 256 characters, but you forget the zero-terminator, and you have a buffer overflow. Or maybe you properly limited the string to 255 characters, but along the way, you added support for multibyte characters, and you get another buffer overflow.
Bound checking would have caught that.
Injection can happen at any point. You may sanitize user input to avoid SQL injection, but at some later point it may get out of the database and in a format string, but it was sanitized for SQL, not for format strings, leading to a potential injection.
lmm 2 days ago

"Sanitising" doesn't work. It's an exploit mitigation strategy, not a sound way of actually preventing bugs. And it doesn't prevent many of the vulnerabilities you list, because many of the things that cause issues don't come from "input" at all (e.g. a lot of buffer overflows can be triggered with "legitimate" input that isn't and couldn't be caught by input sanitisation).
- 0xbadcafebee 2 days ago
  
  > It's an exploit mitigation strategy, not a sound way of actually preventing bugs.
  Actually it is a simple and effective way to prevent general bugs.
  If you have an input field called birthday, you can inject it directly into your database. Doing that could cause an SQL injection exploit, so people use prepared statements.
  But even if you use prepared statements, you'll still end up with a database column with all kinds of birthday formatting (M-D-Y, Y-M-D, slashes, spaces, colons, etc etc). These different, non-standard formats will eventually cause a bug.
  Input sanitization forces you to standardize on one birthday format, and inject it into your database in one format (let's say "YYYY-MM-DD", no other characters allowed). Then all the code expects - and gets - one format. This reduces bugs from unexpected formats.
  It has the added side-effect of also eliminating the SQL injection bug, regardless of prepared statement.
  - lmm a day ago
    
    > Input sanitization forces you to standardize on one birthday format, and inject it into your database in one format (let's say "YYYY-MM-DD", no other characters allowed). Then all the code expects - and gets - one format.
    That's not sanitisation as the word is normally used. That's parsing and canonicalisation. That's a good path to actual security - it leads you to the "make invalid states unrepresentable" style and using decent type systems.
wepple 2 days ago

Your distinction between stack smashing and buffer overflow is puzzling
- 0xbadcafebee 2 days ago
  
  A buffer overflow is just exceeding the size of a buffer, and stack smashing is a modification of either the stack itself or the stack pointer to result in different operations on the stack. These methods can coincide, and they can also exist independently of each other
  - wepple a day ago
    
    Yep. Hence my comment. Smashing the stack is an afterward condition of some kind of memory error. If you’re going to do that, why not mention heap overflow and others? It just seems unusual.
wepple 2 days ago

Two reasons
- modern languages prevent you from having to think about it at all. You shouldn’t have to sanitize, it should work
- there are a load of issues that have nothing to do with sanitization but everything to do with memory. Race conditions, type confusion, UAF to name a couple. If we focus on sanitization, we still need memory safety to fix those
Also sanitization is non-trivial for complex inputs

westurner 2 days ago

> Looking forward, we're also seeing exciting and promising developments in hardware. Technologies like ARM's Memory Tagging Extension (MTE) and the Capability Hardware Enhanced RISC Instructions (CHERI) architecture offer a complementary defense, particularly for existing code.

IIRC there's some way that a Python C extension can accidentally disable the NX bit for the whole process.. https://news.ycombinator.com/item?id=40474510#40486181 :

>>> IIRC, with CPython the NX bit doesn't work when any imported C extension has nested functions / trampolines

>> How should CPython support the mseal() syscall? [which was merged in Linux kernel 6.10]

> We are collaborating with industry and academic partners to develop potential standards, and our joint authorship of the recent CACM call-to-action marks an important first step in this process. In addition, as outlined in our Secure by Design whitepaper and in our memory safety strategy, we are deeply committed to building security into the foundation of our products and services.

> That's why we're also investing in techniques to improve the safety of our existing C++ codebase by design, such as deploying hardened libc++.

Secureblue; https://github.com/secureblue/Trivalent has hardened_malloc.

Memory safety notes and Wikipedia concept URIs: https://news.ycombinator.com/item?id=33563857

...

A graded memory safety standard is one aspect of security.

> Tailor memory safety requirements based on need: The framework should establish different levels of safety assurance, akin to SLSA levels, recognizing that different applications have different security needs and cost constraints. Similarly, we likely need distinct guidance for developing new systems and improving existing codebases. For instance, we probably do not need every single piece of code to be formally proven. This allows for tailored security, ensuring appropriate levels of memory safety for various contexts.

> Enable objective assessment: The framework should define clear criteria and potentially metrics for assessing memory safety and compliance with a given level of assurance. The goal would be to objectively compare the memory safety assurance of different software components or systems, much like we assess energy efficiency today. This will move us beyond subjective claims and towards objective and comparable security properties across products.

hackburg 2 hours ago

[dead]

hackburg 2 days ago

[dead]

antithesis-nl 3 days ago

[flagged]

fooker 3 days ago

Humans too!

iFire 2 days ago

Why not mandate ecc ram?

summerlight 2 days ago

Does ECC do anything with memory safety? This is about physical errors while the article is talking about software bugs. Those two are almost orthogonal.
wmf 2 days ago

Lobbying from Intel probably.
- GuB-42 2 days ago
  
  Intel supports ECC, so does AMD, so why would they lobby against it? Intel uses it for market segmentation, but I don't think it is a big deal.
  It is just that those who build consumer-grade hardware don't want to spend 12% more on RAM for slightly less performance. Among them are essentially all ARM devices, including smartphones and Apple silicon Macs.
  - wmf 2 days ago
    
    Don't look at the specs for LPDDR6...

tailrecursion 2 days ago

The article in question is published on Google's blog. Has Google resolved memory safety issues in its C++ code base? Did G port their code base to Rust or some other memsafe language? What's preventing them from doing that by themselves?

What's preventing Microsoft, or Apple, or the coagulate Linux kernel team, or any other kernel team, from adopting memsafe technology or practice by themselves for themselves?

The last thing we need are what are evidently incompetent organizations that can't take care of their own products making standards, or useless academics making standards to try to force other people to follow rules because they know better than everyone else.

If the team that designed and implemented KeyKos, or that designed Erlang, were pushing for standardized definitions or levels of memory safety, it would be less ridiculous.

At the same time, consciousness of security issues and memory safety has been growing quickly, and memory safety in programming languages has literally exploded in importance. It's treated in every new PL I've seen.

Putting pressure on big companies to fix their awful products is fine. No pressure needs to be applied to the rest of the industry, because it's already outpacing all of the producing entities that are calling for standards.

benced 2 days ago

The idea that Google is "evidently incompetent" for failing to resolve memory safety issues in their decades-old, giant codebase is dumb.
- tailrecursion 2 days ago
  
  If Google has failed so far to resolve mem safety issues in their decades old giant code base, then I'd rather hear standardization ideas from someone who succeeded. If G succeeded at resolving those issues, then that's a concrete positive example for the rest of industry to consider following. They ought to lead by example.
  It seems like decades-old giant code bases are precisely the ones hardest to migrate to memory safety. That's where coercion and enforcement is needed most. You and I don't need to be told to start a new project in not-C++ do we? Nearly every trained programmer has been brainwashed (in a good way) with formal methods, type systems, bounds checking, and security concerns. Now those same people who champion this stuff say it isn't enough, and therefore we need to do more of the same but with coercion. That's a failure to understand the problem.
  - Ukv 2 days ago
    
    > If Google has failed so far to resolve mem safety issues in their decades old giant code base, then I'd rather hear standardization ideas from someone who succeeded. If G succeeded at resolving those issues, then that's a concrete positive example for the rest of industry to consider following. They ought to lead by example.
    Google saw "the percentage of memory safety vulnerabilities in Android dropped from 76% to 24% over 6 years as development shifted to memory safe languages" - which I'd say is a positive example.
    It's not that they've already fully succeeded (I don't think anyone has on codebases of this size), but neither is it that they tried and failed - it's an ongoing effort.
    > You and I don't need to be told to start a new project in not-C++ do we?
    Don't need to be told because we all already avoid C++, or don't need to be told because it doesn't really matter if we do use C++?
    I'd disagree with both. There are still many new projects (or new components of larger systems) being written in C++, and it's new code that tends to have the most vulnerabilities.