Hacker News | thraxil's comments

No. Right now I'm upset that Google has removed (or at least is in the process of removing) the Gemini 2.0 flash model. We use it for some pretty basic functionality because it's cheap and fast and honestly good enough for what we use it for in that part of our app. We're being forced to "upgrade" to models that are at least 2.5 times as expensive, are slower and, while I'm sure they're better for complex tasks, don't do measurably better than 2.0 flash for what we need. Yay. We've stuck with the GCP/Gemini ecosystem up until now, but this is kind of forcing us to consider other LLM providers.

This is one of the reasons I'm hearing that more and more people are using open/locally hosted models: so we don't have to waste time redoing everything when a company inevitably decides to pull the rug out from under us and change or remove something integral to our flow, which we've seen countless times over the years and which seems to be getting more and more common.

Products entirely disappearing or significantly changing will only get more common in the LLM arena as companies shut down, bubbles deflate, brand priorities drastically shift, etc.

I think we're at, or at least close to, a time to really put some thought into which pieces of our flow could be done entirely with an open/local model, and to be honest with ourselves about which pieces truly need SOTA or closed models that may entirely disappear or change. In the long run, putting a little bit of thought into this now will save a lot of headache later.
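A cheap way to hedge against that is to keep every model call behind a thin interface from day one, so a provider swap is a config change rather than a rewrite. A minimal sketch (all names hypothetical; the backends are stubs standing in for real clients):

```python
from typing import Protocol

class Completion(Protocol):
    """Minimal interface every backend must satisfy."""
    def complete(self, prompt: str) -> str: ...

class LocalBackend:
    """Stand-in for a locally hosted open model (e.g. served by vLLM);
    it just echoes so the sketch stays runnable."""
    def complete(self, prompt: str) -> str:
        return f"[local] {prompt}"

class HostedBackend:
    """Stand-in for a closed provider's API client."""
    def complete(self, prompt: str) -> str:
        return f"[hosted] {prompt}"

# One table decides which piece of the flow runs where; moving a task
# off a closed model is a one-line change here, not a rewrite.
BACKENDS: dict[str, Completion] = {
    "summarize": LocalBackend(),   # simple task: local is good enough
    "plan": HostedBackend(),       # hard task: keep on the SOTA model
}

def run(task: str, prompt: str) -> str:
    return BACKENDS[task].complete(prompt)
```

The point is less the specific structure than that the call sites never name a vendor.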


Yeah. Back when Gemma2 came out we benchmarked it and were looking at open models. For our use case though, while the tasks are pretty simple, we do need a pretty large context window and Gemini had a big lead there over the open models for quite a while. I'll probably be evaluating the current batch of open models in the near future though.

What’s interesting about this is that for previous technologies you could define a standard and demonstrate compliance with interfaces and behavior.

But with LLMs, how do you know switching from one to another won’t change some behavior your system was implicitly relying on?
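One partial answer is a golden-set regression suite: record a batch of representative prompts plus checks that encode the behavior you're implicitly relying on, and replay them against any candidate model before switching. A rough sketch, with `call_model` as a stub for whatever client you actually use:

```python
# `call_model` is a placeholder; in practice it would hit the provider's API.
def call_model(model: str, prompt: str) -> str:
    canned = {
        "capital of France?": "Paris.",
        "Extract the year: born 1984": "1984",
    }
    return canned.get(prompt, "")

# Each entry pairs a prompt with a predicate encoding the behavior we depend on.
GOLDEN = [
    ("capital of France?", lambda out: "Paris" in out),
    ("Extract the year: born 1984", lambda out: out.strip() == "1984"),
]

def pass_rate(model: str) -> float:
    hits = sum(1 for prompt, ok in GOLDEN if ok(call_model(model, prompt)))
    return hits / len(GOLDEN)
```

It won't catch everything (the implicit dependencies are the problem), but a pass rate below 1.0 is a cheap early warning before a migration.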


In case you don't know, Gemini 2.5 flash is hosted on DeepInfra. They also have 1.5 flash but not 2.0 flash.

I have no affiliation with DeepInfra; I use them because they host good open-source models.


Thanks. Yeah, for now we're moving to 3.1 flash lite as that's the new cheapest at $.25/1M and is also still "good enough". 2.5 flash is more expensive at $.30/1M (looks like Deep Infra charges the same as GCP/VertexAI for it). I might check them out for Gemma though. We benchmarked Gemma2 when that came out and it wasn't remotely usable for us largely because the context window was way too small. It looks like 3 or 4 might be worth evaluating though.

Xiaomi's mimo-v2-flash is great if you care about speed and performance - it's 1/10 the price of Gemini 3.1 Flash Lite and faster (on OpenRouter).

GCP does serve other non-Google models, but I'm not sure what they have other than Anthropic's. I don't think Haiku is a great model though.


Last time I went through SOC 2 we talked to our auditor about this. His view was that there are, and basically always have been, auditors/companies that will sign off on anything without verifying it if you're paying them. The rest of the industry knows who they are, though. If you're taking things seriously and hire an auditor who does too, that's one of the things they look at when reviewing the reports from the services/subprocessors you use. I.e., you can get a SOC 2 that doesn't mean anything, but then any of your customers who know/care will flag it and it won't be worth anything.


From the article, OP dealt with this.

> But what do you do when the enterprise you are selling to asks you to show that pen-test report (which you never did despite paying for it, because Delve told you a pentest-tools.com vulnerability scan sufficed)? When they ask for your most recent risk assessment, do you just screenshot Delve’s pre-fabricated assessment and pray nobody will pay attention?

> It was that point where the realization sank in. We knew we messed up. We were unable to answer most questions honestly without jeopardizing the deals we were trying to land. We scrambled to get things done the proper way outside of Delve, in an effort to pretend to know what we were doing, but it ended up simply being too much work to get done quickly enough to save things.


Shameless self-promotion, but my own post on Ratchets from a few years back: https://thraxil.org/users/anders/posts/2022/11/26/Ratchet/ Similar basic idea, slightly different take.


Leonard Susskind's "The Theoretical Minimum" series is a great start. His corresponding Stanford lectures are on YouTube as well and are a nice supplement.


Having worked heavily in Python for the last 20 years, I can say it absolutely was a big deal. `pip install` has been a significant percentage of the deploy time on pretty much every app I've ever deployed, and I've spent countless hours setting up various caching techniques trying to speed it up.


Yep. We have tables that use UUIDv4 that have 60M+ rows and don't have any performance problems with them. Would some queries be faster using something else? Probably, but again, for us it's not close to being a bottleneck. If it becomes a problem at 600M or 6B rows, we'll deal with it then. We'll probably switch to UUIDv7 at some point, but it's not a priority and we'll do some tests on our data first. Does my experience mean you should use UUIDv4? No. Understand your own system and evaluate how the tradeoffs apply to you.


I have tables with billions of rows that use UUIDv4 primary keys and I haven't encountered any issues either. I do use UUIDv7 for write-heavy tables, but even then, I got a way bigger performance boost from batching inserts than from switching from UUIDv4 to UUIDv7. The issue is way overblown.
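For anyone curious what the UUIDv7 switch buys: the first 48 bits are a millisecond timestamp, so keys generated later sort later, which keeps b-tree inserts clustered instead of scattered across the index. A minimal generator following the RFC 9562 layout (a sketch for illustration; use a vetted library in production):

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """UUIDv7 per RFC 9562: 48-bit ms timestamp, 4-bit version,
    12 random bits, 2-bit variant, then 62 more random bits."""
    ts_ms = time.time_ns() // 1_000_000
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF           # 12 bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 bits
    value = (ts_ms & ((1 << 48) - 1)) << 80  # timestamp, bits 127..80
    value |= 0x7 << 76                       # version 7, bits 79..76
    value |= rand_a << 64                    # rand_a, bits 75..64
    value |= 0b10 << 62                      # RFC variant, bits 63..62
    value |= rand_b                          # rand_b, bits 61..0
    return uuid.UUID(int=value)
```

Two values generated a couple of milliseconds apart will compare in generation order, which is the whole locality win.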


Nice feedback. Out of curiosity, have you done any tuning to Postgres that greatly improved performance?


Nope. Out of the box GCP Cloud SQL instance.


I switched from EE to CS (well, "Computer Engineering" technically) in the late 90s. Not specifically due to Smith charts, but that's relatable. For me it was just realizing that I was procrastinating on doing my EE problem sets, which just started to seem like endless grinding of differential equations, by playing around with whatever we were doing in the couple CS classes I had. I wouldn't say I've made "a large fortune" in software, but it's kept me gainfully employed for a few decades so I think it worked out.


Obviously nothing solid to back this up, but I kind of feel like I was seeing emojis all over github READMEs on JS projects for quite a while before AI picked it up. I feel like it may have been something that bled over from Twitch streaming communities.


Agree, this stuff was trending up very fast before AI.

Could be my own changing perspective, but what I think is interesting is how the signal it sends keeps changing. At first, emoji-heavy was actually kind of positive: maybe the project doesn't need a webpage, but you took some time and interest in your README.md. Then it was negative: having emojis became a strong indicator that the whole README was going to be very low information density, more emotive than referential[1] (which is fine for bloggery but not for technical writing).

Now there's no signal, but you also can't say it's exactly neutral. Emojis in docs will alienate some readers, maybe due to association with commercial stuff and marketing where it's pretty normalized. But skipping emojis alienates other readers, who might be smart and serious, but nevertheless are the type that would prefer WATCHME.youtube instead of README.md. There's probably something about all this that's related to "costly signaling"[2].

[1] https://en.wikipedia.org/wiki/Jakobson%27s_functions_of_lang... [2] https://en.wikipedia.org/wiki/Costly_signaling_theory_in_evo...


There’s a pattern to emoji use in docs, especially when combined with one or more other common LLM-generated documentation patterns, that makes it plainly obvious that you’re about to read slop.

Even when I create the first draft of a project’s README with an LLM, part of the final pass is removing those slop-associated patterns to clarify to the reader that they’re not reading unfiltered LLM output.


Yeah and this explains why you see it in LLMs in the first place. They had to learn it from somewhere.


The name of HuggingFace is a reminder that it was a thing long before the current crop of LLMs.


Erlang/Elixir supervision trees also rely on process linking, which is implemented in BEAM and doesn't have a real equivalent in most other language runtimes (modulo some attempts at copying it like Akka, Proto.Actor, etc, but it's fairly uncommon).
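For contrast, here's roughly what you end up hand-rolling in a language without links: a toy one-for-one "supervisor" that restarts a worker thread when it dies. It only loosely approximates the BEAM behavior, since the failure signal is the thread finishing after an exception rather than a link/exit message (all names here are made up for the sketch):

```python
import threading

def supervise(worker, max_restarts: int, log: list) -> None:
    """Run `worker` in a thread; if it crashes, restart it, up to
    `max_restarts` extra attempts. A crude stand-in for one_for_one."""
    for attempt in range(1, max_restarts + 2):
        crashed = []
        def run():
            try:
                worker()
            except Exception as exc:
                crashed.append(exc)
                log.append(f"attempt {attempt} crashed: {exc}")
        t = threading.Thread(target=run)
        t.start()
        t.join()  # a real supervisor would monitor, not block
        if not crashed:
            return  # worker exited normally; nothing to restart

events: list = []
def flaky():
    raise RuntimeError("boom")  # simulated crash

supervise(flaky, max_restarts=2, log=events)
```

Even this toy version shows what's missing: no propagation between linked processes, no restart strategies, no supervision of supervisors.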


Yeah, I switched from XMonad (which I used for over a decade) to Sway a few years back. Spent some time trying to duplicate the XMonad behaviour but eventually just realized that spending a few hours getting used to the Sway approach and slightly changing my workflow was a lot easier.

