Hacker News | elehack's comments

Yes. In little-endian, the difference between a short and a long at a specific address is how many bytes you read from that address. In big-endian (assuming 8-byte longs and 2-byte shorts), to cast a long to a short you have to jump forward 6 bytes to get to the 2 least-significant bytes.
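A quick Python illustration, using `struct` to stand in for the memory layout a C cast would see (the example value and the 8-byte/2-byte sizes are assumptions for the demo):

```python
import struct

# Pack the 64-bit value 0x0102030405060708 in both byte orders.
value = 0x0102030405060708
le = struct.pack("<Q", value)  # little-endian bytes: 08 07 06 05 04 03 02 01
be = struct.pack(">Q", value)  # big-endian bytes:    01 02 03 04 05 06 07 08

# Little-endian: the 2 least-significant bytes sit at offset 0,
# so "casting" long -> short is just reading fewer bytes.
short_le = struct.unpack("<H", le[0:2])[0]

# Big-endian: the least-significant bytes are at the END of the
# 8-byte value, so you must skip forward 6 bytes first.
short_be = struct.unpack(">H", be[6:8])[0]

print(hex(short_le), hex(short_be))  # 0x708 0x708
```

Both reads recover 0x0708, but only the little-endian one uses the original address unchanged.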


Wow, I've been living life assuming that little endian was just the VHS of byte orders with no redeeming qualities whatsoever until today. This actually makes sense, thank you!


Mamba is fast, and Pixi is also fast + sands a lot of the rough edges off the Conda experience (with project/environment binding and native lock files).

Not perfect, but pretty good when uv isn't enough for a project or deployment scenario.


And readability. data.table is very capable, but the incantations to use it are far less obvious (both for reading and writing) than dplyr.

But you can have the best of both worlds with https://dtplyr.tidyverse.org/, using data.table's performance improvements with dplyr syntax.


The article is about a mechanism for the OS to validate focus requests. The application with the link requests a focus token, and passes it to the browser along with the open-link request, and the browser can then request focus.

It isn't perfect, because there's no way to know that the browser isn't using the token to request focus for something else, but maintaining and validating chain of custody for focus across applications is exactly the problem it looks like they are working on solving.


That was exactly the example given in the article, but somehow this isn't what I expected would happen if I click a link in say, my email client or chat program.

I imagined it more like: User clicks link in email program. Email program tells OS: "Here, open https://..." -- OS checks URL scheme registry and selects Firefox, OS brings Firefox to the front and throws the URL at it and says "Open this."

I guess my naïve way falls down if the OS accepts URLs from apps that aren't in the foreground, so a random background process could activate any app it wants to steal focus.


Yep. With the solution discussed, as I understand it, the e-mail program just needs to be modified to request a focus token and send it along with the URL request to the browser or the OS browser dispatch service to keep the expected behavior.

This could be abstracted by libraries (e.g. a method in Qt to open a URL in the system browser automatically gets the token) so each application doesn't need to be updated separately, or possibly even OS services.
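A toy simulation of what that token handshake could look like; every name here is hypothetical and the real mechanism lives in the compositor/OS, but it illustrates the chain of custody:

```python
import secrets

class FocusService:
    """Toy stand-in for an OS service that validates focus requests."""

    def __init__(self):
        self._tokens = set()

    def request_focus_token(self) -> str:
        # In a real system this would only be issued to the app that
        # currently has focus (not checked in this toy version).
        token = secrets.token_hex(16)
        self._tokens.add(token)
        return token

    def activate(self, app: str, token: str) -> str:
        # Only a valid, unspent token lets an app take focus.
        if token in self._tokens:
            self._tokens.discard(token)  # single-use
            return f"{app} focused"
        return "focus request denied"

svc = FocusService()
# Email client gets a token and hands it to the browser with the URL.
token = svc.request_focus_token()
print(svc.activate("firefox", token))    # firefox focused
print(svc.activate("malware", "bogus"))  # focus request denied
```

Making tokens single-use is one way to limit the "browser reuses the token for something else" problem mentioned above, though it doesn't eliminate it.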


Describing this as a limit on "CS programs" is a common, but erroneous, understanding of the proposal limit.

This specific solicitation — CISE Core Programs — has a 2-proposal-per-year limit. However, that only applies to this solicitation, and only counts proposals submitted to this solicitation. CISE Core Programs is an important CS funding mechanism, but there are quite a few other funding vehicles within CISE (Robust Intelligence, RETTL, SATC, and many more, including CAREER). Each has its own limits, which generally don't count toward or against the Core Programs limit.


I like this — JSX is a little annoying to work with outside the major implementations.

If this existed, I might not have found the need to make my little Hyperstatic library (https://jsr.io/@mdekstrand/hyperstatic).


> JSX is a little annoying to work with outside the major implementations

There are dozens of us, dozens!


UUID v5 is quite useful if you want to deterministically convert external identifiers into UUIDs — define a namespace UUID for each potential identifier source (to keep them separate), then use that to derive a v5 UUID from the external identifier. It's very useful for idempotent data imports.
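A small sketch of that pattern with Python's standard `uuid` module (the namespace names and identifiers here are made up for illustration):

```python
import uuid

# One namespace UUID per external identifier source keeps sources separate.
# These example namespaces are derived from hypothetical domain names.
NS_CRM = uuid.uuid5(uuid.NAMESPACE_DNS, "crm.example.com")
NS_ERP = uuid.uuid5(uuid.NAMESPACE_DNS, "erp.example.com")

def to_uuid(namespace: uuid.UUID, external_id: str) -> uuid.UUID:
    """Deterministically map an external identifier into UUID space."""
    return uuid.uuid5(namespace, str(external_id))

# Deterministic: re-importing the same record yields the same UUID...
assert to_uuid(NS_CRM, "cust-1042") == to_uuid(NS_CRM, "cust-1042")
# ...but the same ID from a different source maps to a different UUID.
assert to_uuid(NS_CRM, "cust-1042") != to_uuid(NS_ERP, "cust-1042")
```

Because the mapping is a pure function of namespace + identifier, repeating an import produces the same keys, which is what makes it idempotent.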


Both UUIDv3 and UUIDv5 are prohibited for some use cases in some countries (including the US), which is something to be aware of. Unfortunately, no one has created an updated standard UUID version that uses a hash function that is not broken. While useful, hash-based UUIDs are not always an option.


Could you provide an example of such a prohibition? I've never heard of that before.

I doubt that the quality of the hash function is the real issue. The problem with MD5 and SHA1 is that it's easy (for MD5) and technically possible (for SHA1) to generate collisions. That makes them broken for enforcing message integrity. But a UUID is not an integrity check. Both MD5 and SHA1 are still very good as non-cryptographic hash functions. While a hash-based UUID provides obfuscation, it isn't really a security mechanism.

Even the existence of UUIDv5 feels like a knee-jerk reaction from when MD5 was "bad" but SHA1 was still "good". No hash function will protect you against de-obfuscation of low-entropy inputs. I can feed your social security number through SHA3-512 but it's not going to make it any less guessable than if I fed it through MD5.
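The low-entropy point can be demonstrated directly: with a small input space, even SHA-256 offers no protection, because the attacker can just enumerate preimages. (A 4-digit PIN stands in for the SSN example; the specific PIN is arbitrary.)

```python
import hashlib

# A strong hash does not protect a low-entropy input: a 4-digit PIN
# has only 10,000 possibilities, so the preimage is found instantly.
leaked = hashlib.sha256(b"7241").hexdigest()

recovered = next(
    pin for pin in (f"{i:04d}" for i in range(10_000))
    if hashlib.sha256(pin.encode()).hexdigest() == leaked
)
print(recovered)  # 7241
```

The same loop works unchanged against MD5, SHA-1, or SHA3-512; the hash function's strength is irrelevant when the input space is this small.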

Moreover, a UUID only has 122 bits of usable space. Even if we defined a new SHA2- or SHA3-based UUID version, it's still going to have to truncate the hash output to less than half of its full size. This significantly alters the security properties of the hash function, though I'm not sure if much cryptanalysis has been done on the shorter forms to see if they're more practically breakable yet.

There is one area where the collision resistance of the hash function could be a concern, though. If all of the inputs to the hash are under the control of a potential attacker, then maliciously constructed data could produce the same UUID. I still wouldn't think this would be a major issue, since most databases will fail to insert a duplicate key, but it might allow for various denial of service attacks. This still feels like quite a niche risk, though, and very circumstance-dependent.


Systems where a sophisticated attacker may engineer collisions are precisely why UUIDv3/5 are prohibited. SHA1 is deemed broken by some government authorities and not to be used in any critical systems, including as UUID (this is where I’ve seen it expressly prohibited). The entire point of UUIDs in many systems is that collisions should be impossible, system integrity is predicated on it. Many systems exist in a presumptively adversarial environment.

Similarly, UUIDv4 is also prohibited in many contexts because weak entropy sources have been a recurring problem in real systems. It isn’t a theoretical issue, it has actually happened repeatedly. Decentralized generation of UUIDv4 is not trusted because humans struggle to implement it correctly, causing collisions where none are expected.

There are also contexts where probabilistic collision resistance is disallowed because collision probabilities, while low, are high enough to be theoretically plausible. Most people aren’t working on systems this large yet.

Ironically, there are many reasonable ways to construct secure 128-bit identity values, but the standards don’t define one. Some flavor of deterministic generation + encryption is not uncommon, but such schemes are also non-standard.

That said, many companies unavoidably have a mix of standard and non-standard UUIDs internally. To mitigate collisions, they have to transform those UUIDs into something else UUID-like, at which point it is pretty much guaranteed to be non-standard. Not ideal but that is the world we live in.


Ok, that makes sense. As far as I can tell, even truncated to "just" 122 bits, there's still no known way to generate a SHA-256 collision, so the MD5/SHA-1 versions are comparatively vulnerable versus a hypothetical SHA-256 UUID version. However, it's starting to feel like UUIDs may not be long enough in general to meet the need for secure, distributed ID generation.
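For the curious, here is a non-standard sketch of what such a SHA-256-based name UUID could look like: hash namespace + name, truncate the digest to 128 bits, then set version/variant bits the way the UUID spec does. This is purely an illustration (using the "version 8 / custom" slot), not a standardized construction:

```python
import hashlib
import uuid

def sha256_uuid(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Hypothetical SHA-256 analogue of uuid5 (non-standard sketch)."""
    digest = hashlib.sha256(namespace.bytes + name.encode()).digest()
    b = bytearray(digest[:16])          # truncate to 128 bits
    b[6] = (b[6] & 0x0F) | 0x80         # version nibble -> 8 ("custom")
    b[8] = (b[8] & 0x3F) | 0x80         # RFC 4122 variant bits (10xx)
    return uuid.UUID(bytes=bytes(b))

u = sha256_uuid(uuid.NAMESPACE_DNS, "example.com")
assert u.version == 8
assert u.variant == uuid.RFC_4122
# Deterministic, like uuid5:
assert u == sha256_uuid(uuid.NAMESPACE_DNS, "example.com")
```

Note the truncation: only 122 bits of the 256-bit digest survive, which is exactly the altered-security-properties caveat raised above.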


Disclaimer: I am a CS professor.

I don't think AI advancements will cause a problem for the value of the degree (or rather, if they do, then it wasn't a very good MS degree). The value of formal university CS education done well, at both BS and MS levels, is learning skills in a context that integrates those skills into a knowledge framework that transcends any particular technology and hopefully outlasts several trend changes. The specific ML algorithms you would learn in an ML-focused MS will likely be out-of-date soon; the training on problem formulation, data preparation, fundamental limits of learning, and the theory of how ML works will not only outlast many technology shifts, but give you a good framework for navigating those shifts and integrating new advances into your knowledge.

There are likely many programs that would not provide this kind of foundation. But in understanding in general the value of an MS, this is how I would advise a student to think about it. (and on MS vs BS, BS usually provides some opportunity for specialization but is very much a generalist degree; an MS provides more opportunity for specialization and credentialing on that specialization.)


*asks a drug dealer* How do you feel legalization will impact your business? /sarcasm

Disclaimer: I dropped out, but I do wish I had finished, just because it's sad to now be 36 and I hate leaving things undone.

In all seriousness, I think higher ed has issues to resolve regardless of whatever AI does to it. The ongoing imbalance between what a degree costs and the value one can extract from it has mostly hit students outside CS and other engineering degrees, but with a slower economy we may end up sucked into the issue other fields have long suffered from. Speak to anyone in the environmental field: hard to believe this is /the issue/ of our time, yet we value it so poorly.


>The value of formal university CS education done well, at both BS and MS levels, is learning skills in a context that integrates those skills into a knowledge framework that transcends any particular technology and hopefully outlasts several trend changes.

While I don't disagree with your main point re the value of a CS degree, this is the same argument verbatim given by every English, History, and Underwater basket weaving professor.


They’ve also got a point. The skills may not be technologically valuable, but they can teach critical thinking and give broader context for life. Philosophy majors tend to do better than average salary-wise as well.

That said, I also believe many fields have gone bonkers. The whole "everybody needs a degree" push also creates incentives for degree factories.


Outside of ML/AI what would you say are areas of CS in which a lot of active research is being conducted?


Programming language theory and formal verification have been relatively hot during the last 10-15 years and show no signs of slowdown. Still, a relatively niche area.

Also the intersection of CS, probability and statistics is a very interesting area to work on. Less trendy than deep ML, but really practical. See e.g. Stan, Pyro, Andrew Gelman's books, etc.


Thanks for the insight. My Software Quality prof gifted me a copy of one of Gelman's texts but I haven't had time to take it in; I should change that...

It's weird to me that formal verification isn't more widely used; I would think it would be common at least in safety critical systems development.


There's a lot to critique in publishing and associated costs, but this tweet is unfortunately factually wrong.

From the linked article, ACM's publication costs are $10.9M, not $33.7M.

One of the ACM's major publication initiatives over the last 3-5 years has been an overhaul of their publication templates and publication workflow, to ensure greater consistency in publication formatting, improve accessibility, and archive publications in more future-proof formats. There are also the ongoing costs of creating and indexing metadata (ACM tracks more metadata than arXiv, including resolved citations), preservation (ACM buys failsafe perpetual access services from Portico, arXiv has mirrors at other university libraries).

Should it cost $10.9M? I am not sure. Does it cost a lot more than what arXiv does? Yes.

For a costing exercise: the service ACM buys from Portico is archival and republication. If ACM goes insolvent, Portico flips on their archive and the content remains available. How would you price this service, knowing that when it is actually needed, it's because your customer can no longer pay bills, and you now need to take up their hosting (and all related costs) for approximately forever with no further revenue? I think a network of university libraries would be a more cost-effective way to provide this service, but it's the kind of thing that people working on publication and archival professionally think about, and that factors into the cost of professional archival-level publication.

(I cannot speak to IEEE.)


> their publication templates and publication workflow, to ensure greater consistency in publication formatting, improve accessibility, and archive publications in more future-proof formats

Publication workflow, formatting and accessibility? For every paper I’ve done I just send the ACM a final PDF produced myself from a LaTeX template that hasn’t changed in years. What’s the workflow for taking an already final PDF from authors and uploading it to a file server?


That workflow has changed in the last few years.

- Brand new templates (introduced about 5 years ago, the LaTeX template has had multiple updates per year since then)

- Workflow that makes use of the source (or possibly codes the source embeds in the PDF, but you have to provide LaTeX source to ACM these days)

- Papers now render in both PDF and HTML (and the HTML looks quite good), this started showing up within the last 1-2 years

- Papers are archived in an XML-based format (JATS, I do not know the details) to facilitate rendering to PDF, HTML, ePub, and other formats not yet devised


That doesn't seem too impressive. It's essentially a workflow that a few universities could band together and replicate via an open source project relatively easily IMHO.

As an example, Pandoc can already handle 90% of this type of workflow by itself (converting LaTeX to various XML formats). An open source project shared among a few universities, or developed by a single body like the ACM and used among dozens of publications and fields, could cover the rest. Even two or three full-time people working on this would cost much less than $1M per year.


That sounds pretty counterproductive. So now authors, in addition to keeping up on their research, need to keep up on the updates to the ACM's LaTeX stylesheet? And there's every chance that the version that is formatted well with the ACM stylesheet when you initially submit will have formatting bugs six months later because the template got updated? And now you have a whole new toolchain to debug when the HTML version of your paper misaligns your tables? And maybe the HTML version that looks fine today will get mangled in 2028 after you retire and they update the CSS, as has happened with most of the New York Times articles?

It sounds like the ACM has a really different set of priorities than libraries and researchers do, one that values increasing headcount over guaranteeing permanence.


I'm not sure how it works at ACM, but often, it's people retyping the contents of your article into a JATS-XML template and adding additional metadata (authors, date of publication, perhaps who funded it, etc.), which is then used to generate several outputs (e.g. PDF, HTML, but also citation lists, etc.).


>The Journal Article Tag Suite (JATS) is an XML format used to describe scientific literature published online. It is a technical standard developed by the National Information Standards Organization (NISO) and approved by the American National Standards Institute with the code Z39.96-2012.

https://en.wikipedia.org/wiki/Journal_Article_Tag_Suite

>LaTeXML is a free, public domain software, which converts LaTeX documents to XML, HTML, EPUB, JATS and TEI.

https://en.wikipedia.org/wiki/LaTeXML

The wonderful thing about standards is that there are so many of them. And each one has variations.


> people retyping the contents of your article

Wow. Well I can imagine that’s expensive.


Thank you for the correction.

IEEE's $193m is where we should focus our attention, when it comes to this expense line.


I agree. I have no idea what IEEE is doing that costs that much. And while I don't take as hard a line against them as I do against Elsevier, I have never published with them and don't currently have any plans to change that.


I'm not sure how many articles are published a year in ACM [1], but the answer seems to be a few 10,000s. That's a per-article publishing cost of a few hundred dollars, which is not unrealistic to me.

[1] The ACM Digital Library claims 2.8 million published over 84 years, or about 33,000/year if divided equally over the years (which is laughably false). Some number of that quantity may include citations for keynotes or posters, which aren't really research papers, but I don't have a good handle on that rate.
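The back-of-the-envelope arithmetic behind "a few hundred dollars," using the rough figures above (both are approximations from this thread):

```python
# Rough per-article publishing cost under the figures discussed above.
publication_costs = 10_900_000   # ACM's reported publication costs, USD
articles_per_year = 33_000       # rough DL average (2.8M over 84 years)

per_article = publication_costs / articles_per_year
print(round(per_article))  # 330
```

So roughly $330 per archived article, with the caveat that the per-year article count is a crude average.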


The 2019 annual report gives some details: 34,000 full-text articles were published in the DL. This will exclude non-archival content like keynotes, posters, etc., if conference organisers provide correct metadata.


Backblaze can back up arbitrarily large local drives, but does not allow you to set network drives as backup sources (for precisely this reason). It's fine with the local drive being shared - our desktop's big storage drive is exposed over the network - but it can detect and refuse mounts from other machines. I don't know what it does with an iSCSI drive, haven't tried.

I think it's harder to detect network mounts in a way that wouldn't have a bunch of false positives on Linux.
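One Linux heuristic is to parse the mount table and flag known network filesystem types, though the list of types is open-ended, which is exactly the false-positive/false-negative problem. A rough sketch (the fstype set is my guess, not authoritative, and an iSCSI volume would slip through because it looks like a local block device):

```python
# Heuristic network-mount detection from /proc/mounts-style data.
# The set of "network" fstypes here is illustrative, not exhaustive.
NETWORK_FSTYPES = {"nfs", "nfs4", "cifs", "smbfs", "sshfs", "fuse.sshfs", "9p"}

def network_mounts(mounts_text: str) -> list[str]:
    """Return mount points whose filesystem type looks network-backed."""
    found = []
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[2] in NETWORK_FSTYPES:
            found.append(parts[1])  # field 2 is the mount point
    return found

sample = """\
/dev/sda1 / ext4 rw 0 0
fileserver:/export /mnt/data nfs4 rw 0 0
//nas/share /mnt/nas cifs rw 0 0
"""
print(network_mounts(sample))  # ['/mnt/data', '/mnt/nas']
```

In real use you would feed it the contents of `/proc/mounts`; the hard part is keeping the fstype list complete as new network filesystems appear.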

