Hacker News | robconery's comments

The Big O of an algorithm does not change based on data size. Even if your set had precisely 1 record in it - the code you write, if it loops over every item in the set (even if it's only one item) - is still O(n).


Yes I know that - was this reply meant for another comment?


OP here - O(n log n) is not O(n^2). If n is 1000 then log n is about 10, so n log n is 1000 * 10 = 10,000. That's far less than 1000 * 1000 = 1,000,000.


You're wrong. n lg n grows asymptotically slower than n^2, and thus is O(n^2). n is also O(n^2), as is 1, or even sin(n). All of this follows directly from the definition [0].

Also the fact that you're plugging numbers in for n indicates to me that you don't actually understand big O notation, which is surprising since the original article is decent.

[0] https://en.wikipedia.org/wiki/Big_O_notation#Formal_definiti...
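The formal definition ("f is O(g) if some constant c and threshold n0 make f(n) <= c * g(n) for all n >= n0") can be sanity-checked numerically. A minimal sketch - a finite spot check, not a proof, and the helper name is made up for illustration:

```python
import math

def is_big_o_witness(f, g, c, n0, n_max=10_000):
    """Check f(n) <= c * g(n) for every n in [n0, n_max).
    A finite sanity check of the formal definition, not a proof."""
    return all(f(n) <= c * g(n) for n in range(n0, n_max))

# n log n IS O(n^2): the witness c=1, n0=2 works, since log2(n) <= n there.
print(is_big_o_witness(lambda n: n * math.log2(n), lambda n: n * n, c=1, n0=2))

# ...but n^2 is NOT O(n log n): even a generous c=100 fails once n > 100*log2(n).
print(is_big_o_witness(lambda n: n * n, lambda n: n * math.log2(n), c=100, n0=2))
```

Both claims in the comment above fall out of this asymmetry: the bound only has to hold eventually, for some constant.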


OP here. Agree that thinking about code is better than using a rule of thumb, but we need to start somewhere don't we? I tried to make it clear in the post that looping over n items within an n loop is n * n.


OP here - Big O notation is simply shorthand math. When you're discussing things in this way, time complexity and performance are the same thing. When you care about resource usage (memory etc) that's space complexity, which is different. Either way, they're good things to understand.

>So no I don't think you should use an O(log n) operation in place of an O(n) operation just because of Big O, what matters is which one is faster.

Mathematically the log n is always faster :). Realistically... well that would be a tough one to prove, even with caching, but I say go for it.


> time complexity and performance are the same thing

By definition, they are not.

> Mathematically the log n is always faster :)

No, it's not. Time complexity only gives you an asymptotic bound on the number of 'operations', it tells you nothing about what the actual run time will be.


I believe we're talking past each other. Big O has nothing to do with "actual run time". It doesn't care about the number of inputs you have - just that you have them.

Mathematically, if n=1000 then log n is 10. 10 operations vs. 1000 is, theoretically and time complexity wise, faster.

Our disconnect is "actual" vs. "theoretical" and I want to stress again that Big O is purely theoretical. It's a technical adjective.
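The "actual vs. theoretical" disconnect is easy to illustrate with constant factors, which big O deliberately throws away. A toy sketch with entirely made-up per-step costs (the numbers and names below are assumptions, not measurements):

```python
import math

# Hypothetical per-step costs: imagine the O(log n) routine pays a large
# fixed cost per step (say, a disk seek), the O(n) routine a tiny one.
COST_LOG_STEP = 1000   # assumed cost per step of the O(log n) algorithm
COST_LINEAR_STEP = 1   # assumed cost per step of the O(n) algorithm

def cost_log(n):
    return COST_LOG_STEP * math.log2(n)

def cost_linear(n):
    return COST_LINEAR_STEP * n

# For modest inputs the "slower" O(n) algorithm wins on total cost...
print(cost_linear(1000) < cost_log(1000))    # 1000 vs ~9966
# ...while for large enough n the O(log n) algorithm eventually takes over.
print(cost_linear(10**6) < cost_log(10**6))  # 1_000_000 vs ~19931
```

This is exactly why "mathematically the log n is always faster" only holds asymptotically: the notation hides the constants that dominate real runs at small n.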


OP here. Big O is indeed "worst case scenario" always, the size of the data set doesn't matter. An O(n) operation doesn't care if the data is sorted - even if it's the first item as you suggest. When you discuss Big O it's always worst case.


If Big O is always worst case, you're going to have a hard time saying anything interesting or useful about quicksort, for instance, or hash tables.

You're going to be limited to "quicksort is O(n^2)" and "hash table lookup is O(n)", both of which are quite misleading.

There are good reasons that we usually look at the worst case, but big-O does not fundamentally or necessarily mean that.
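The quicksort point can be made concrete by counting comparisons. A hedged sketch using a deliberately naive first-element pivot (the classic way to trigger the worst case on sorted input):

```python
import random

def quicksort_comparisons(xs):
    """Naive quicksort (first element as pivot); returns the comparison count."""
    count = 0
    def sort(a):
        nonlocal count
        if len(a) <= 1:
            return a
        pivot, rest = a[0], a[1:]
        count += len(rest)  # one comparison against the pivot per element
        left = [x for x in rest if x < pivot]
        right = [x for x in rest if x >= pivot]
        return sort(left) + [pivot] + sort(right)
    sort(xs)
    return count

n = 400
random.seed(0)
shuffled = random.sample(range(n), n)
already_sorted = list(range(n))

# Average case is ~ n log n comparisons; the worst case (sorted input with a
# naive pivot) degenerates to exactly n*(n-1)/2 = 79,800.
print(quicksort_comparisons(shuffled))
print(quicksort_comparisons(already_sorted))
```

Same algorithm, wildly different behavior depending on which case you analyze - which is why "quicksort is O(n^2)" alone is so misleading.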


Well, that's not what your article is saying - it says (well, not in these words, but it's at least what I got from it) that it's meant to be something practical for understanding algorithmic complexity and how that relates to code performance. And for that, the size of the data and the details of the algorithm very much do matter.

Throughout these comments, people seem to be using two 'concepts' of big O and, because of that, talking past each other: the academic 'provable upper boundary' concept, and the applied 'what algorithm or data structure should I choose for my concrete problem, and how does complexity help me decide' concept. That last one is what is colloquially known as 'big O analysis', whereas technically that term is reserved for something else indeed.

I'll readily admit that when I first made my GP comment, I didn't clearly make that distinction in my mind either, which is probably the real underlying issue I was trying to point out.


This is wrong. Big-O can be used to talk about any function you like, whether best case of an algorithm, worst case, average case for uniform inputs, average case over some different distribution of inputs, how many branch mispredicts a Skylake core hits while running the code on backwards sorted input, the median price of a Big Mac as a function of a country's GDP per capita, or anything else.


Big O is not "worst case scenario". It describes an upper bound on functions. You can absolutely compute the Big O of the average case runtime of an algorithm.


Derp. OP here - yep typo and corrected thank you!


OP here - as a matter of fact I try to do just this, starting with the database. I'm mostly a data person so I try to think through, as deeply as I can, what I should expect in every table - there has to be a sensible default and if I can't find one then I rethink my design. You'll probably disagree with me and grunt out another single sentence missive, which is fine, but I think it's worth taking some extra time and using Null as a bit of a warning. It's a crutch! A way to stop thinking and say "whatever I don't know what this value is supposed to be so... it's null. Let's go shopping!"


Frankly, the idea that there must be sensible default worries me.

Take a database of people. There is literally no sensible default for name, age, gender, height, weight, social security...

If you’re amazon, your products have no sensible default for manufacturer, shipping weight/size, delivery address...

In fact, for just about any real-world data, there simply is no sensible default for anything at all. Most “sensible” defaults will eventually bite you in the arse. The only sane way to keep nulls from your DB is to refuse inserting incomplete data in the first place, and propagate the error to the user. Heavens save your team if you’re dealing with batch data and insist on not allowing nulls in the DB, though.

You can sweep this mess under a rug and pretend you have no nulls by turning things into relations that are allowed to be empty — “there are no delivery_address rows for this user” — but that’s a null in sheep’s clothing. Either your application knows how to deal with the query coming up empty, or it doesn’t.


What do you use in a database when you have a field where you literally do not know what the value should be?


If you have a PEOPLE table and some birthdates are unknown, then remove the "birthdate" column and make another table called PEOPLE_BIRTHDATES with a "birthdate" column and a foreign key pointing to PEOPLE. Now your queries can have lots of left joins. The results will still have nulls, however.
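A quick sqlite3 sketch of exactly this schema (the table and column names follow the suggestion above, with illustrative data) shows the null reappearing in the left join results:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE people_birthdates (
        person_id INTEGER PRIMARY KEY REFERENCES people(id),
        birthdate TEXT NOT NULL
    );
    INSERT INTO people VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO people_birthdates VALUES (1, '1815-12-10');
""")

rows = conn.execute("""
    SELECT p.name, b.birthdate
    FROM people p
    LEFT JOIN people_birthdates b ON b.person_id = p.id
    ORDER BY p.id
""").fetchall()

# Neither table contains a NULL, yet the join result does.
print(rows)  # [('Ada', '1815-12-10'), ('Bob', None)]
```

The NOT NULL constraints hold on both tables, yet the moment you recombine them the missing birthdate comes back as NULL (Python's None).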


Which is the reason why you shouldn't write outer joins.


So if we don't know the customer's birthdate we can't serve her? I can imagine a problem with that...


Sigh.

Where have I said any such thing ?


If there's no row for the customer in the joined table, the customer won't show up in an inner join.


Great. Now if you can explain to me where you got the idea that a join (inner or otherwise) is the only possible way to query two tables then we might get somewhere. Because you can also just do two queries. And no, that does not necessarily mean "two roundtrips to the DBMS" (which I know perfectly well is undesirable). There are techniques for avoiding that. Perhaps not in SQL, but that's a reason you should be pressing the vendors to improve SQL. Not for you to agree to the status quo of sticking with the vendors' old bypasses-and-hacks cheating bag.


Haha OK then use a UNION... oh wait we're gonna have NULLs with that too. One suspects you'll also have some vague objection to this point, but if the only way to address that is to wait on somebody to invent an "improved" SQL, one won't worry about it too much.


That "improved SQL" was already defined in the previous century, and has been implemented as well. Your ignorance drips off of every word you write.


E.F. Codd literally designed null into the relational model. I don't think calling someone ignorant is very helpful.


But the demeaning ridicule that gets thrown at me is ?

(BTW I doubt very much that "Codd designed null into the RM". Even his 12 rules mention only "a systemic way to deal with missing information", not "null".)


> What do you use in a database when you have a field where you literally do not know what the value should be?

You don't.

If a value may not be present for an entity, it's not an attribute of the entity in question, it's an attribute of another entity that has a (0..1):1 relationship to the entity in question.

Normalization eliminates NULL.


That's great. Now I do a query. Maybe I use a join. If a row has the "0" case of that (0..1):1 relationship, what do I get?

Or maybe I don't do a join. Maybe I do a separate query. If the query comes back with zero rows, then I... what?


What do I get ? You get what you ask for.

Then I ... what ? Then you do what needs to be done as specified by the business in the case the queried piece of information is unknown.


> What do I get ? You get what you ask for.

In the join case, don't I get a NULL in the row that comes back if there isn't an entry in the other table? Or do I just not get a row?

> Then you do what needs to be done as specified by the business in the case the queried piece of information is unknown.

Sure, but how do I represent that condition in my software? With a different class/structure? With a flag that indicates that the other field isn't valid? Or with a null?

From where I sit, normalization doesn't make the problem go away at all.
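The three representations mentioned above can be sketched side by side. These class names and fields are hypothetical, purely to make the choice concrete:

```python
from dataclasses import dataclass
from typing import Optional

# Option 1: a sentinel value guarded by a validity flag.
@dataclass
class PersonWithFlag:
    name: str
    birthdate: str = ""            # sentinel, meaningless on its own...
    birthdate_known: bool = False  # ...only valid when this flag is True

# Option 2: an optional field -- the "null" shows up in the model anyway.
@dataclass
class PersonWithOptional:
    name: str
    birthdate: Optional[str] = None

bob = PersonWithOptional(name="Bob")
print(bob.birthdate is None)  # True: the zero-rows query still has to land somewhere
```

Whichever representation you pick, the application code still has to branch on "known vs. unknown" - normalization relocated the question rather than answering it.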


OP here - I've checked out (and lived with) ternary logic. Just because you appreciate null doesn't mean it should be kept in programming languages and existing programs.

That there is a logical truism, isn't this fun?

Also: false means not true, zero is a number and null doesn't exist, by definition. We can model true/false/zero easily as they exist. Null is made up, so every language gets to think about what it means in an abstract made up way. Thus the pain, thus the post.


The last 100 years have been dealing with the fact that true and false are not enough to describe the world of computation. You also need undecidable, or null, for statements that can be proved to never terminate.

It is very much still an open question if a statement is absolutely undecidable and if we need to add something like null in all logic [0].

As for your arguments on why we don't need null, they sound exactly like the arguments against zero from the middle ages [1].

>Just as the rag doll wanted to be an eagle, the donkey a lion and the monkey a queen, the zero put on airs and pretended to be a digit.

[0] http://logic.harvard.edu/koellner/QAU_reprint.pdf

[1] Menninger, Karl (1969), Number Words and Number Symbols. Cambridge, Mass.: The M.I.T. Press.


OP here - Yes, that's the operation. The question wasn't meant to be a literal one; it was about consistency, which illustrates the larger point: different languages deal with null differently because it's not logical, and therefore confusing and a pain in the ass :)


Languages are definitely not consistent in what they do if you try to coerce null to various types, nor are they consistent (often even internally) about whether they perform coercions implicitly. The Ruby example seemed to deal mostly with the latter.

I think implicit coercion is bad, but that's a separate issue from null.


>it's not true to say they don't exist in other languages

Sounds good, waiting for an example to support this...

>It's also wrong to suggest that Null has no place in "logic". Boolean logic is one type of logic, but it's not the only type.

What other types did you have in mind? Given that we work as programmers in a world defined by true/false 1/0 logic, I think you might want to reconsider this blanket dismissal.


You seem intelligent, so I'm going to drop you into the deep end. In category theory, a topos [0][1] (plural topoi) is a structure-place where logic can be done. Boolean logic corresponds to a particular topos. Actually, there are at least two topoi which do Boolean logic; one of them has the law of excluded middle, and another has the axiom of choice. [2]

And there's an infinite number of other topoi to choose from! Topoi can be custom-made to categorical specifications. We can insist that there are three truth values, and then we can use a topos construction to determine what the resulting logical connectives look like. [3]

Finally, there are logical systems which are too weak to have topoi. These are the fragments, things like regular logic [4] or Presburger arithmetic.

To address your second argument, why do we work in a world with Boolean logic? Well, classical computers are Boolean. Why? Because we invented classical computing in a time where Boolean logic was the dominant logic, and it fits together well with information and signal theory, and most importantly because we discovered a not-quite-magical method for cooking rocks in a specific way which creates a highly-compact semiconductor-powered transistor-laden computer.

Computers could be non-Boolean. If you think that the brain is a creative computer, then the brain's model of computation is undeniably physical and non-classical. It's possible, just different.

Oh, and even if Boolean logic is the way of the world, does that really mean that all propositions are true or false? Gödel, Turing, Quine, etc. would have a word with you!

[0] https://en.wikipedia.org/wiki/Topos#Elementary_topoi_(topoi_...

[1] https://ncatlab.org/nlab/show/topos

[2] https://ncatlab.org/nlab/show/two-valued+logic

[3] https://toposblog.org/2018/01/15/some-simple-examples/

[4] https://arxiv.org/abs/1706.00526


There are a lot of three-valued logics that lots of programming languages implicitly follow by accident sometimes. For example:

  true || 3 == 3 || panic()
will, through short-circuiting, evaluate to “true” even though the final element doesn’t evaluate to either true or false.

(As you may observe, they also follow a weird logic where AND and OR are not commutative...)
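The same behavior is easy to demonstrate in Python, where `or` short-circuits the same way (the `panic` helper here is hypothetical, standing in for any side-effecting expression):

```python
def panic():
    raise RuntimeError("boom")

# Short-circuiting: the right-hand side is never evaluated at all,
# so the expression is True even though panic() has no truth value.
print(True or panic())  # True

# Swap the operands and the side effect fires first -- `or` is not
# commutative in the presence of side effects.
try:
    panic() or True
except RuntimeError as e:
    print(e)  # boom
```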

