Hacker News | makapuf's comments

And people looking at code may well be as numerous as people looking at assembly. (Which I even like to do)

My car is typically used twice a week and (like many others) I mostly ride my bike or walk. I'm not special at all and I certainly use the car, but it has not replaced walking.

I don't think either of us disagrees, though, that the number of miles of non-leisure journeys walked per capita is significantly lower in 2026 than it was in 1926 or 1826?

I always found it pretty remarkable in David Copperfield when Dickens recounts regular walks between London and Canterbury, which he apparently did make in real life.


Hence why one person's behavior is called an anecdote.

AFAIK, you can't explicitly allocate cache the way you allocate RAM, however. It's a bit as if you could only work on files and RAM were used as a cache. Maybe I am mistaken? (Edit: typo)

You can't explicitly allocate cache, but you can lay things out in memory to minimize cache misses.

A fun fact for people who like to go down rabbit holes: there is an x86 technique called cache-as-RAM (CAR) that lets you explicitly allocate a range of memory to be stored directly in cache, avoiding DRAM entirely.

CAR is often used in early boot, before the DRAM is initialized. It works because the x86 cache-disable bit actually only decouples the cache from memory; the CPU will still use the cache if you primed it with valid cache lines before setting the bit.

So the technique is to mark a particular range of memory as write-back cacheable, prime the cache with valid cache lines for the entire region, and then set the bit to decouple the cache from memory. Now every access to this memory region is a cache hit that doesn't write back to DRAM.

The one downside is that when CAR is on, any cache you don't allocate as memory is wasted. You could allocate only half the cache as RAM to a particular memory region, but the disable bit is global, so the other half would just sit idle.
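The sequence above can be sketched in early-boot x86 assembly. This is a hedged, simplified illustration — the MSR numbers are the architectural IA32_MTRR_PHYSBASE0/PHYSMASK0 pair, but the chosen address, region size, and fill loop are invented for the example, and real firmware does additional setup (MTRR default-type enable, segment/stack setup) omitted here:

```
    ; 1. Mark 64 KiB at 0xFEF00000 as write-back (memory type 6) via MTRR 0.
    mov ecx, 0x200          ; IA32_MTRR_PHYSBASE0
    mov eax, 0xFEF00006     ; base | WB type
    xor edx, edx
    wrmsr
    mov ecx, 0x201          ; IA32_MTRR_PHYSMASK0
    mov eax, 0xFFFF0800     ; ~(64 KiB - 1) | valid (bit 11)
    mov edx, 0x0000000F     ; upper mask bits (assuming 36-bit phys. addresses)
    wrmsr

    ; 2. Enable the cache: clear CD (bit 30) and NW (bit 29) in CR0.
    mov eax, cr0
    and eax, 0x9FFFFFFF
    mov cr0, eax

    ; 3. Prime the region: touch every cache line so the cache holds
    ;    valid lines covering the whole range.
    mov edi, 0xFEF00000
    mov ecx, 65536 / 4
    xor eax, eax
    rep stosd

    ; 4. Decouple the cache from memory: set CR0.CD ("no-fill" mode).
    ;    From here on, accesses to the range hit the primed lines and
    ;    never reach DRAM -- the region behaves as scratchpad RAM.
    mov eax, cr0
    or  eax, 0x40000000
    mov cr0, eax
```

This only makes sense before memory init; once DRAM is up, firmware tears CAR down and re-enables normal caching.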


Thanks – I was wondering how code that initializes DRAM actually runs.

Out of curiosity, why has there not been a slight paradigm shift in modern system programming languages to expose more control over the caches?

Same as the failure of Itanium VLIW instructions: you don't actually want to force the decision of what is in the cache back to compile time, when the relevant information is better available at runtime.

Also, additional information on instructions costs instruction bandwidth and I-cache space.


> you don't actually want to force the decision of what is in the cache back to compile time, when the relevant information is better available at runtime

That is very context-dependent. In high-performance code having explicit control over caches can be very beneficial. CUDA and similar give you that ability and it is used extensively.

Now, for general "I wrote some code and want the hardware to run it fast with little effort from my side", I agree that transparent caches are the way.


x86 provides this control with non-temporal load/store instructions.

That solves the pollution problem, but it doesn't pin cache lines. It also doesn't cover the case PPC handles, where you want to establish a line as valid without actually fetching it (e.g. dcbz).

That seems correct, but it also doesn’t account for managed languages with runtimes like JavaScript or Java or .NET, which probably have a lot of interesting runtime info they could use to influence caching behavior. There’s an amount of “who caches the cacher” if you go down this path (who manages cache lines for the V8 native code that is in turn managing cache lines for jitted JavaScript code), but it still seems like there is opportunity there?

That's a strange statement. It's certainly not black and white, but the compiler has explicit lifetime information, while the cache infrastructure is using heuristics. I worked on a project which supported region tags in the cache for compiler-directed allocation, and it showed some decent gains (in simulation).

I guess this is one place where it seems possible to allow for compiler annotations without disabling the default heuristics so you could maybe get the best of both.


There are cache control instructions already. The reason it goes no further than prefetch/invalidate hints is probably that exposing a fuller API at the chip level to control the cache would overcomplicate designs and wouldn't be a backwards-compatible/stable API. Treating the cache as RAM would also require a controller, which then also needs to receive instructions, or the CPU would suddenly have to manage the cache itself.

I can understand why they just decide to bake the cache algorithms into hardware, validate it, and be done with it. I'd love it if a hardware engineer or more well-read fellow could chime in.


Another reason for doing cache algorithms in hardware is that cache access (especially for level 1 caches) has to be low latency to be useful.

Because programmers are in general worse at managing them than the basic LRU algorithm.

And because the abstraction is simple and easy enough to understand that when you do need close control, it's easy to achieve by just writing to the abstraction. Careful control of data layout and nontemporal instructions are almost always all you need.


There has! Intel has Cache Allocation Technology, and I was very peripherally involved in reviewing research projects at Boston University into this. One that I remember was allowing the operating system to divide up cache and memory bandwidth for better prioritization.

https://www.intel.com/content/www/us/en/developer/articles/t...


This is not applicable to most programming scenarios since the cache gets trashed unpredictably during context switches (including the user-level task switches involved in cooperative async patterns). It's not a true scratchpad storage, and turning it into one would slow down context switches a lot since the scratchpad would be processor state. Maybe this can be revisited once even low-end computers have so many hardware cores/threads that context switches become so rare that the overhead is not a big deal. But we are very far from anything of the sort.

I would say this is the main benefit of CUDA programming on GPU: you get to control local memory. Maybe Nvidia will bring it to the CPU now that they make CPUs.

You can in CUDA. You can have shared memory which is basically L1 cache you have full control over. It's called shared memory because all threads within a block (which reside on a common SM) have fast access to it. The downside: you now have less regular L1 cache.

I understand it as "the branch we're purchasing/hiring from", not the inner part of the company.

Variable costs increase? (Floor-space rental, energy, salaries, licenses, other services...)

My proposal would be to define a set of intents for 0-15 with sensible defaults and let terminal themes assign any color they would like to those. 0 would be background, 7 foreground, 1 highlight, 3 titles, 4 frames, and from there work on backgrounds as well.

We should define a set of base colors for terminal apps that are used for themes, so that we have a common set of colors for all term apps: text, background, borders, highlight, muted. Then let the terminal set its theme.

Unfortunately, this is where the free market stops being a good optimizer and manual settings (laws) need to apply, by requiring repairability, which is difficult (but not completely impossible) to quantify.


You're right of course, but it also depends on how long you want to spend on it. If Python gives you radix sort directly, and the C implementation you can have in the same time is bubble sort because you spent so much time setting up the project and finding the right libs, it kind of makes sense.


Python doesn't come with Radix sort, and Julia doesn't come with

     [[deps.SuperDataStructures]]
     git-tree-sha1 = "7222b821efcee6dcdc9e652455da09c665d8afc1"
     repo-rev = "main"
     repo-subdir = "SuperDataStructures.jl"


I really think we should converge on semantic codes. For example: background is 0, standard text is 7, then positive/negative, highlight, colored 1, 2, 3... with sensible defaults, and let the user have a common 8- or 16-color palette in the terminal for all textmode apps. Imagine having some kind of unified color theme in the terminal.

