Big O Notation – Explained as easily as possible (thatcomputerscientist.com)
489 points by optimalsolver on Jan 16, 2021 | 160 comments



If you are the kind of person who wants to read an article titled "explained as easily as possible", I think you should just avoid saying the phrase "big oh" and instead talk about algorithm runtime more informally, like "quicksort has a worst case quadratic but average case n log n runtime".

Otherwise the risk is that you will shoot yourself in the foot, maybe during an interview or some other situation, since Big O is just one member of a family of notations with very specific mathematical definitions, other common ones being little-o and big-Theta.

Analysis of algorithms is more difficult than it appears. If you implement an algorithm in a high level language like Python you may get much worse runtime than you thought because some inner loop does arithmetic with bignum-style performance instead of hardware integer performance, for example. In such a case you could talk of big-omega (your analysis is bounded below instead of bounded above, asymptotically).
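To see the bignum effect concretely, here is a minimal sketch (my own example, not from the article) that times Python integer addition as the operands grow:

    # Times 1000 additions at each operand size; "addition" stops being
    # a constant-time step once the integers no longer fit in a machine word.
    import timeit

    for bits in (64, 64_000, 640_000, 6_400_000):
        a = (1 << bits) - 1
        b = (1 << bits) - 3
        t = timeit.timeit(lambda: a + b, number=1000)
        print(f"{bits:>9}-bit operands: {t:.5f}s per 1000 additions")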


I know the rest of the notations in the family but, to be honest, even in most algorithmics textbooks they tend to use Big O like 90% of the time, even in contexts where Big Theta would be more precise (e.g. "mergesort is O(n log n) in all cases"). Let alone in more informal contexts.

I don't especially like it, but it's OK because it's not a lie. And I do think for people who just want the gist of the concept, like readers of this piece, it's enough and it's not worth being fussy about it.

What irks me is people who use the equal sign as in T(n)=O(n log n), though. Why would a function equal a class? Set membership notation does the job just fine.


> What irks me is people who use the equal sign as in T(n)=O(n log n), though. Why would a function equal a class? Set membership notation does the job just fine.

It's strange but it's fully standard. I would guess it develops from the use in analysis, where o(n) is more common. You derive your formula, it has a term in it that you don't want, you observe that "complicated term's numerator = o(n)", you take your limit, and the term vanishes away. Use of the = sign makes more sense there, because conceptually you're claiming something about the value of the term. (Specifically, that in the limit, it's equal to zero.)


> even in contexts where Big Theta would be more precise (e.g. "mergesort is O(n log n) in all cases"

Just to be careful here: the difference between big/little oh/theta/omega is orthogonal to best/worst/average case.

A pedant could say that merge sort makes O(n^3) comparisons in both the best and worst case, ω(1) in both the best and worst case, etc. Colloquially, the former means "as fast as", and the latter means "slower than".


I've seen this a lot, where people are convinced big Oh is specifically meant for worst-case performance.

And there may be some logic behind it, because if some function is in Θ(n^3) in the worst case, then it is true that it is in O(n^3) in all cases, so maybe that is why they couple big Oh with worst case growth.


Again, I think from a shallow perspective it makes sense.

If an algorithm always runs in O(n^3), then it's guaranteed that it runs in O(n^3) in the worst case. And if an algorithm runs in O(n^3) in the worst case, then it's guaranteed to always run in O(n^3) (but not in Θ(n^3), of course). So if you only care about worst-case performance, it's reasonable to only use the big O.

Of course, what your parent comment says is also true - you could say that mergesort is O(2^n), in the worst case or in any other case, and be correct because it's an upper bound. But people using Big Oh informally don't say that because you typically want to show how good your algorithm is, so you use the tightest upper bound possible (i.e. the big theta of the worst case).


I didn't really read that article as being for a developer. I read it and thought, hey, this one would be good to share with a few project managers & business unit managers. Any time people who are not programmers (especially ones we have to work with) start to better understand what we're really doing, it is a good thing. Articles like this are superb for helping them understand that developers do have a disciplined and rigorous way of solving problems.

I do agree with you that these articles do leave a lot of detail and precision out. They tend to give the reader a superficial understanding of the subject... but a superficial understanding may be enough to help.


Agreed.

The use of "Big O Notation" itself as a way of referring to algorithmic complexity seems like a misnomer, considering that the topic is about analysis rather than the notation used to express the results of such analysis.

Unfortunately academic textbooks have terrible "UX", so students end up dealing with confusing presentation of topics, hence we're stuck with labels such as "Big O Notation".


I hear you.

Whether I like it or not, by now big o notation has fallen into the category of "folklore" that working engineers use and abuse informally without being very precise about it.

It's like the "proof by engineer's induction": if some statement P(n) is true for P(0), P(1) and P(2), then P(n) is true for all n \in Z. :-)

Similarly if an engineer states that algorithm has a runtime of O(f(n)) that should probably be read as "as n grows very large (whatever that means) the runtime approximates (whatever that means) some bound (below, above, whatever) f(n). yolo.".

But people should at least be _aware_ that they are being imprecise about it.

If I read a blog post or StackOverflow post or whatever and I see big-theta notation I know that the person is probably precise with her definition. If I see big-o then it may be correct, or accidentally correct (happens often due to the nature of the definition) or mistaken.


There can be appropriate levels of imprecision. Arguably all communication necessarily requires that. This is only a problem if it leads to an unnoticed miscommunication. For example, I suspect most of the time when an engineer refers to an algorithm as being in O(n^2) they intend to preclude the possibility that the algorithm is in O(n).


Usually "average" or "worst case" O(n^2) means it's really big O, while "best case" or "always" O(n log n) means big Theta.


No, that's the opposite of what parent wrote. If an engineer mentions "this makes our functions run in O(n^2)" they mean either average or worst case and actually mean theta. The interesting fact they want to express is not the upper bound but the lower bound.


I mean you say "best case O(n)" if the best case is Theta(n), even though the algorithm may be quadratic on average.

You don't say the best case is O(n^2) if the best case is Theta(n log n) even though technically that would be correct.


Your comment reminded me of this SO answer: https://stackoverflow.com/a/185576/1502563

/* This is O(scary), but seems quick enough in practice. */


The problem with that is that informal use of big-O-like notation is a lot more intuitive than the fancy language in your explanation.

Most people who can program can grasp the informal meaning of O(n^2) pretty easily. They may not connect the word quadratic to that same concept.


Why wouldn't you be able to just say "worst case n^2?"


O(n^2) doesn’t necessarily mean worst case; it could mean average case.


Dropping a constant


>Analysis of algorithms is more difficult than it appears. If you implement an algorithm in a high level language like Python you may get much worse runtime than you thought because some inner loop does arithmetic with bignum-style performance instead of hardware integer performance, for example. In such case you could talk of big-omega (your analysis is bounded-below instead of bounded-above, asymptotically).

Of course, that's why Big O says one thing while the compiler, CPU and caches may say another.


I think the main purpose of an “explained as easily as possible” article is to help the reader understand when SOMEONE ELSE uses the term.

Sure, I can choose to use informal language, but I can’t stop someone else from using big O notation when speaking to me or writing something I want to read.


>The risk is otherwise you will shoot yourself in the foot, maybe during an interview or other situation

If the point is to identify the speed (or RAM consumption) of an algorithm, then why not check for that itself instead of the vocabulary in an interview? Why be pedantic when you can instead measure how well they would do as a developer? In an interview you can ask followup questions to see how precise their ability to explain their thought process is.

If someone is so pedantic that they would consider the interviewee to have shot themselves in the foot because they said "The big O is n squared." without any followup questions from the interviewer, that doesn't sound healthy to me. I would worry this kind of culture would extend past the interview and it wouldn't be an enjoyable place to work.

Can you imagine working in a place where people regularly argue over terminology instead of just making sure everyone is on the same page?

(Full warning: I'm not a dev, so I'm coming in from the view of another industry.)


> If the point is to identify the speed (or RAM consumption) of an algorithm, then why not check for that itself instead of the vocabulary

To some extent because the "that itself" is a deep field in its own right with its own specialized vocabulary.


Exactly, the link above starts well, but fails in some regards. If you use big O, you should give the mathematical definition and at least briefly explain it. The idea of an upper bound isn't very hard, and it is actually important to understand that an algorithm running in O(n) is also running in O(n log(n)) and also in O(n^2). It is at least necessary to understand the other Greek letters, which really does come in handy for a deeper understanding of algorithms.

The list of "common" O's is also kinda bad. Particularly, and I see this all the time, I think it is a mistake to go straight from O(n^2) to O(c^n), as this is the step that leaves polynomial time complexity, and it glosses over the fact that each level of exponent constitutes a different time complexity. Here the mathematical notation of O(n^2), O(n^3), ..., O(n^l) is indispensable. Nesting loops is probably one of the most commonly relevant applications of O-notation, so this actually has an influence on real-world implementations.
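A minimal sketch of the nesting point (my own illustration):

    # Two nested loops over the same n-element input do ~n*n units of work,
    # three nested loops do ~n*n*n, and so on: O(n^2), O(n^3), ..., O(n^l).
    def count_pairs(xs):
        count = 0
        for a in xs:            # n iterations
            for b in xs:        # n iterations each -> n*n total
                count += 1
        return count            # returns n^2

    def count_triples(xs):
        count = 0
        for a in xs:
            for b in xs:
                for c in xs:    # n*n*n total iterations
                    count += 1
        return count            # returns n^3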


Last summer I was interviewing at a FAANG company for a supposed senior level position and I pointed this out.

Instead I was treated as if I fundamentally had no understanding of algorithms whatsoever. It was enormously frustrating, especially when I demonstrated the real-world runtime of two implementations to the interviewer.

If they need someone to actually make things work well on real physical hardware, they need to know how the claims map to the physical reality and the fundamental limitations of chalkboard optimization. The real world actually matters.

If he knew this maybe he wouldn't be overbudget, overdeadline and trying to mad hire people like some parody of the mythical man month...

8 months later it still rubs me the wrong way - that is, a bad faith read on new information as obviously objectively wrong and the speaker (me) as misinformed even after it's been demonstrated as accurate. Assuming everyone is stupid is a great way to hire, just fantastic.

I'd bet thousands the project is either still off the rails or they've overhauled the org chart. The product hasn't been publicly announced yet btw.

It's really all for the best. This way I only wasted one day instead of say 6 additional months just spinning wheels against a stonewall.


Big O is not about actual hardware, nor should it be.

We can argue about it being useful or not, but that doesn't change this fact.


Of course it isn't. Of course it's a mathematical abstraction.

This project required high performance, high throughput, distributed computing with exabytes of data.

The person applying for the job should illustrate they can deal with that and knows when to ask what kind of question.

If they think chalkboard algo analysis is the end of the game, that they can just pack up and go home, not looking at the actual hardware specifications and capabilities, the real world implementations and costs, and just blindly trust the mathematical abstraction without any type of evidence, analysis, testing, or considerations of a system as complex as the physical hardware they are using, then good luck.

For example, if there is a "slower" implementation that's embarrassingly parallelizable and trivial to distribute, those are actually important factors.

If they have a "slower" implementation that also allows for a quicker mark and sweep or cache invalidation, those are also actually important.

If input data can be tightly characterized, that's actually important, it changes the real world expected results.

Bursty and continuous traffic are different problems so average throughput is insufficient for characterization.

This guy disputed all that. Basically the midterm I took 20 years ago as an undergrad when I was 18 at university, that's it. That's all of HPC.

By the end I was just giving him the college freshman level answers and he was genuinely surprised as if he thought I didn't know it.

Again, last I heard, the project is still on the rocks.


> Instead I was treated as if I fundamentally had no understanding of algorithms whatsoever.

This and similar experiences, basically discovering the interviewer, potential boss, or worse, actual boss, is not a colleague but actually a shallow copy of what one would expect from someone in their role, is pretty common, especially (from what I've seen) in orgs with a clear divide between a "manager" class that is mostly composed of people with less experience than those they are managing, and those doing the work.

It's best, as you say, to just write off the time wasted on the discussion, move on, and be happy you don't have to work with them.


I agree with what you say and I'm not a Python fanatic, but these high level language stigmas catch my attention.

If you try to implement your own, let's say, intersection of sets you will probably get at most the same performance as Python using a reasonable amount of time.

I guess my point is that seeing the comment of Python in a thread like this can also be confusing. Bad (or maybe I should say "not appropriate for your use") implementations can exist in any language.


I appreciate that the author was specific about “if you count addition as 1 operation.” Without saying this, it’s not obvious that the notation is such a simplified abstraction over operations and their costs, with all the limitations that come with that simplification.


> If you implement an algorithm in a high level language like Python you may get much worse runtime than you thought because some inner loop does arithmetic with bignum-style performance instead of hardware integer performance

If a language/library can change the dominant asymptotic term of an algorithm like that, instead of just the constant factor, that is a problem. Does that really happen with python or is this exaggeration? I’m inclined to accept another reason to dislike these super high level interpreted langs but I’ve never seen something that e.g. appears written to be logarithmic to become quadratic because of library implementation details.


The language doesn't matter, nor even whether it's high or low level.

You can write a short function in any language that looks O(1) if you only look at the surface or high level of it. Even assembly. Meanwhile in the middle of that routine is a single call to another function or macro which may be O(1) or O(n!).

Python was just an example.


> Even assembly.

Although high-level languages like Python, or to a lesser but less excusable degree C++, are more susceptible to this, because more features can secretly be greater-than-constant-time function calls.


For Python code, I usually just ask critics if they’ve tested it (because I have). Frequently using a built-in will be faster than a custom loop even if at the surface level you’re going over the data more than once.


Anecdotally (it was a contrived sorting benchmark so the exact numbers don't really matter), Python started off about 3 times slower than D and grew at what would be considered the same O(), but the coefficient was enormous. To the point where D was taking 3s for n-million arrays, and Python closer to one minute.

Node was actually very impressive, roughly as fast as D's reference compiler (cutting edge optimisations maybe 2 decades ago) in release mode.


> If you implement an algorithm in a high level language like Python you may get much worse runtime than you thought because some inner loop does arithmetic with bignum-style performance instead of hardware integer performance

How does it change big O? Typically, you implement "bignum" on top of "hardware integer" regardless of the language. Or do you mean that some common integer operations are implemented with suboptimal big O in CPython?


For some large n, integers in the algorithm may be so large that operations on them cease to be constant time.


that is obvious. But how does it change big O? (why any manual implementation would have a better big O compared to existing arithmetics implementation in CPython?)


I think he meant that naively implementing an algorithm may not be bounded by the O notation he/she originally wanted due to code calling other functions “hidden” from the programmer.


It is still a useful conceptual framework. Maybe this sparks someone’s interest. Agree it is much harder than it appears :)


Can you recommend any good algorithms books?


I'm assuming you want something rigorous, based on the comment you replied to.

Many people will recommend CLRS [1] but I prefer it as a reference, rather than a learning resource. I feel it's very dry and academic.

Instead I'd recommend Tim Roughgarden's series of books Algorithms Illuminated [2] for learning about analysis and algorithms. He also has courses on Coursera and edX to cover the material. It's thorough and rigorous, and covers algorithms across different paradigms - like divide and conquer and graph theory.

Sedgewick and Wayne's Algorithms [3] has a companion website with lots of additional material (heavily Java based) - and courses on Coursera too. Again I think it's more approachable than CLRS while still being detailed and covering the theory.

[1] https://mitpress.mit.edu/books/introduction-algorithms-third...

[2] http://timroughgarden.org/books.html

[3] https://algs4.cs.princeton.edu/home/


Thanks!


> A bad programmer solves their problems inefficiently and a really bad programmer doesn't even know why their solution is inefficient

To any beginners reading this: Solving problems inefficiently does not make you a bad programmer. Most of the time, an "inefficient" solution will be good enough, and optimising for performance comes at a cost.

So sit back, relax, and enjoy the journey.


Should also add that an "inefficient" solution applied to a mostly fixed input size is still going to be fast enough. If you have to write a double for loop but the outer loop is iterating over a 1000 element array and the inner one over a 26 element array (e.g. the alphabet), it's still a fast and probably good enough solution.


Operating on a fixed size datatype is trivially O(1), too (because O(c) = O(1) when c is a constant).


Inefficiency and algorithmic complexity is imo not necessarily the same all the time.

For a beginner, inefficiencies like allocating in loops and the like (which can often be optimized) are not that big of a problem, since they only change a constant factor. On the other hand, O(n^2) and it’s supersets can be problematic when applied blindly. I don’t remember the exact situation but I recall a GUI app that listed some options in a drop-down menu. But on each click, they managed to call a function with O(n^2) complexity and you don’t need many elements to get a big number that way, so the drop-down visibly froze the UI (I guess it was an older framework with no separate thread/just bad code that worked on the main thread).

Of course, relax and enjoy programming, but I think reading up on algorithms can be fun and useful for the long term!


I wanted to add something to this from recent personal experience.

I've worked with a few engineers recently who were keen - occasionally insistent - on coming up with an O(n log n) solution, or better, at design stages for a specific project. We were working on different parts of the platform (and in different dev environments) but essentially implementing the same thing.

For the implementation I tried to talk them into going with a simpler implementation that was O(n²) for the initial release, but they were adamant not to.

When it came to writing automated tests for the feature, I became aware of some edge cases that hadn't been considered during the design stages. We had another design meeting, updated the requirements, yadda yadda.

A day or so later I put the changes in for review and had them merged reasonably quickly. I later found out that the other group had to significantly rewrite their algorithm and write new tests from scratch, ultimately leading them to miss out on launching the feature at the same time as ours had been released.

The moral of the story? A good programmer knows _what_ the best algorithm is, but a good engineer knows _when_ a given algorithm is called for. Premature optimisation is, after all, the root of all evil.

I've since updated my version to coincide with their more performant version and rewritten a lot of my tests.


That's a popular sentiment, to always coddle new players in a field, but knowing the performance of your algos is part of the job. Performance comes into play in many scenarios. Your users may not "care", but you might be wasting a lot of their time.


Sometimes it is, a lot of the time it's not, or at least not so much so that you're a bad programmer if you do solve a problem but it's not as efficient as it could be. If there's one thing I've realized over a number of jobs where I tried to do things right, it's that most of the time all the work you do doesn't matter, won't last, and your bosses only care that they can list some feature on the product page. They define good programmer as someone who gets their tickets in on time, and if it matters, they can budget for you to improve it with another one.

It's also totally fine to waste some of your user's time if you first create value for them that they didn't have before. That's the nature of iteration and MVPs


And great programmers would knowingly solve a problem inefficiently, because it's easier to write and ship it to the customer, and thus prove that the problem being solved is valuable.


Until, two years later, the program needs one minute to start, every operation needs ten seconds to execute, and nobody knows why the program needs gigs of RAM to stay idle or what to do about it.


Good caveat; beginners should not bother about any of this stuff. Focus on good modularity, clear code structure and language idioms, and in general aim for readability.


Every one of these "Big-O explainers" says pretty much the same thing: count (or bound) the number of steps, then take the most significant term, and drop the constant associated with it. None of them explain why you take the most significant term or drop constant factors.

I get why that is. You need the mathematical definition to demonstrate why that is, and most "Big-O explainers" don't want to assume any significant amount of mathematical background. But, that definition isn't that hard. It's simply:

f(x) is O(g(x)) iff there exists a positive number M and an x_0 such that for all x > x_0, |f(x)| <= Mg(x).

And, if you're in an analysis of algorithms context, it's even easier, because you typically don't have to worry about this absolute value business.

Well, that M is essentially the reason you get to drop constant multiples of f(x). And, you drop the least significant terms of f(x) because g(x) dominates them, i.e. lim_{x -> \infty} g(x)/f(x) = 0. (No need to prove this, because this is what makes the "less significant" terms less significant.)

I would also like to add that the equals sign in f(x) = O(g(x)) is one of the most useful abuses of notation that I know of, but it can be misleading. It doesn't behave at all like a real equality because it's not symmetric, but it is transitive and reflexive. It actually acts more like set membership than equality.
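As a concrete illustration of that definition (f, g, M and x_0 are my own picks):

    # f(x) = 5x^2 + 3x + 7 is O(x^2): take M = 6 and x_0 = 4,
    # then |f(x)| <= M*g(x) for every x > x_0.
    f = lambda x: 5*x*x + 3*x + 7
    g = lambda x: x*x

    M, x0 = 6, 4
    assert all(abs(f(x)) <= M * g(x) for x in range(x0 + 1, 100_000))
    # The constant 5 and the lower-order terms 3x + 7 are absorbed into M.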


You say it isn't hard, but I have 2 graduate degrees and didn't understand your explanation at all.

"Hard" is relative to prerequisite knowledge, which can vary significantly.


The short version is that the fastest-growing term dominates all the others and for large xs the smaller terms round down to zero. Since big-O notation is about how the complexity grows for large inputs, you can assume the input is arbitrarily large, and you'll notice that the complexity is completely determined by the fastest-growing term.

You drop the constant because it doesn't alter how the complexity grows as the input increases.


Of course "hard" is relative. Because I was writing a HN post and not a "Big-O explainer," I didn't provide you with any of that prerequisite knowledge. But, the amount of prerequisite knowledge one needs to understand this is very, very little, and would easily fit in a digestible web page, provided you have some basic fluency with functions of the real numbers. And, I think that's a reasonable level of prerequisite to assume for anyone who wants to be throwing around terms like "Big-O."

If it's not too intrusive, may I ask what your graduate degrees are?


> But, the amount of prerequisite knowledge one needs to understand this is very, very little, and would easily fit in a digestible web page

I'm really not sure on this one. This is easy if you have an idea of: 1. how you can graph things (runtime vs input size) 2. do the same but stretching the function to infinity 3. compare this to some term (which isn't as tangible compared to most algorithms imo),

And this is only for the intuition part. For people that got into programming by doing some UI stuff, I can definitely can see why _a subset of people_ struggle with this.


Sure, a subset may struggle. That's beside the point. I'm most interested in giving people who, as I mentioned, have a basic fluency with functions of the real numbers the answer to why you drop constant multiples and everything but the most significant term. If I wanted to make sure everybody understood (which isn't even theoretically possible), I'd have to include the better part of a semester-long course in my explanation, and that defeats the purpose of having everything in a digestible format.


The important pre knowledge is that every polynomial is dominated by its largest exponential term (for x to infty).


It's a little more than that. You need to account for the fact that n log n dominates n, but not n^2, as well. That's not hard, but you should spell it out.

This discussion is making me think that a good way to write a "Big-O explainer" would be sort of like a progressive web app, i.e. "here's the explanation with the highest level of mathematical sophistication. Click here to get some of the prerequisite knowledge." Then, the user just keeps clicking "break it down more" until either they get it or they reach the most detailed explanation.


Largest exponential? That should be monomial, right?

Anyway you can show that on the spot if you think about what the derivative of that monomial looks like


One easy way to show that more generally is that if f and g are differentiable, f(x) dominates g(x) iff f'(x) dominates g'(x), by L'Hopital's rule. Details left as an exercise for the reader.


|g(x)|<M|f(x)| does not imply |g’(x)|<=C|f’(x)|.


Sure it does, for all functions f and g that we actually care about in the CS context for big-O. Hint: what if f and g are smooth?


One function could be below another and have an arbitrarily large derivative. Even if they are both smooth: f(x)=sin(e^x) and g(x)=1.


Unless I'm mistaken, |g(x)|<M|f(x)| does not hold for those, since sin(e^x) has an infinite number of zeroes.


Ah, yes, right. Smoothness alone doesn't do it. Nonetheless, I still maintain that the property holds for all functions that we actually care about when doing analysis of algorithms. In particular, it certainly holds for sums and products of n!, e^n, n log n, n, log n, and 1, which covers probably 99.9% of everything I've ever seen inside an O().


probably you meant monotonicity


A function f(n) is O(g(n)) if the graph of f ends up underneath the graph of g for big enough n. (If we want to be more correct, I would have to add that there must exist a positive number c such that c·g is above the graph of f from some point on.)

So f(n):=3n+28 will be O(n^2), because choosing c as 3, for every n greater than or equal to 4, 3n^2 will be greater than f(n).

It would help if I could draw some graphs, but hopefully it helps.
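In lieu of a graph, a tiny check of that crossover:

    # f(n) = 3n + 28 vs c*g(n) = 3n^2: the bound takes over at n = 4.
    for n in range(1, 8):
        f, bound = 3*n + 28, 3*n*n
        print(n, f, bound, f <= bound)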


Had such students as well. If I asked anything, they said "oh, I learned that as an undergrad" (implying that it is too long ago to remember). I am sad about such a waste.


Not sure what this is supposed to mean. Apparently this requires calc knowledge, which I was never required to take!


Admittedly I am studying theoretical physics so I am supposed to be able to, but it made sense to me?


Excellent!

The level I was shooting for with my brief explanation was that someone who understood limits at a calc 1 level should be able to get it with a little thinking. I do wonder, though: did you know those things before you read my comment?


I did. For some scale of what I get up to I'm currently banging my head against various differential geometry textbooks.

Ultimately you're limited in not having graphs, which always limits the intuitiveness of any calculus-explainers.


Cool. Sounds like you're in grad school right now, yeah? I know differential geometry is not exactly the same as differential topology, but I remember taking differential topology in grad school. I was always amused that we never actually evaluated any integrals that didn't come out to a very simple number (generally 0).

If I were writing this as a web page, though, I would definitely include a few graphs to explain the calculus concepts. The amount of calculus you need here is pretty intuitive once you draw a couple of pictures.


I'm actually in my second year of university but I started learning physics "properly" at 14 so I had a head start.


This bit:

>> f(x) is O(g(x)) iff there exists a positive number M and an x_0 such that for all x > x_0, |f(x)| <= Mg(x).

is fairly similar to the so-called epsilon-delta definition of limits of functions. This way of reasoning is quite common. I know I bumped into it quite a lot: I learnt it in high school, even though I really understood it a couple of years later (I did a MSc in Physics). So the explanation above makes sense even if I never saw this exact formulation. Now I appreciate that not everyone is a Physics graduate, but I’d expect this to be understandable for people with degrees in applied Maths, Physics, or some related engineering discipline.


I'd say that the = is one of the most annoying abuses of notation that I know of, if not the most. Apart from not symmetric, which is a problem in its own right, for small o and small omega it's not even reflexive, which does not prevent the = users from using it also for those. Which amounts to using an equals sign to highlight how two functions differ.

And what do people gain with that? A set membership symbol works just fine, since big O, small o, the omegas and their ilk define sets of functions; it takes the same space and isn't confusing.


The notation is from math; the point of it is that you can manipulate it like a normal expression, but it's "anonymous"; o(1) means some function that goes to zero, but we don't bother with a name, just recording the asymptotic behavior.

For example, f'(x) = lim (f(x+h)-f(x))/h can be rewritten with an error term f'(x) = (f(x+h)-f(x))/h + o(1), and then you can manipulate it more freely, say like f(x+h) = f(x) + hf'(x) + o(h). There's no need to drag out a bunch of useless names, each qualified by a set membership, to do this. I mean e(h) in o(1), e2(h) := h e(h) in o(h), etc.

The failure of "reflexivity", eg O(f) = O(f), is because the anonymity hides whether the two O(f)s are referring to the same function (ie. exactly what a name would tell us).


I understood your definition (all in all I had calculus 101) but to me it only describes why that is, it doesn't explain it.

Most Big-O explainers don't assume a mathematical background because to non-mathematicians parsing your definition feels like being told you're in a hot air balloon. They see the what, but they don't understand the why.


> And, you drop the least significant terms of f(x) because g(x) dominates them, i.e. lim_{x -> \infty} g(x)/f(x) = 0.

If g(x) dominates all the terms in f(x), then wouldn't lim_{x -> \infty} g(x)/f(x) go to infinity?


In most practical cases f=O(g) is the same as saying f/g is bounded.


It acts exactly like set membership, because that’s what it is.


> f(x) is O(g(x)) iff there exists a positive number M and an x_0 such that for all x > x_0, |f(x)| <= Mg(x).

Correct.

Corollary: x is O(x^2), for example.


Note that f(n) = O(g(n)) denotes an asymptotic upper bound. So it would be correct (if confusing) to say that 2n+3 = O(n^2). This is important for use in algorithmic complexity because we may not always be able to determine the running time even up to a constant, but can more easily infer an upper bound.

Using Omega instead denotes an asymptotic lower bound. To denote both, you use Theta: f(n) = Theta(g(n)) means that for two constants 0 < c1 < c2 and large enough n, we have c1 < f(n)/g(n) < c2.

Finally, little o denotes vanishing behaviour. f(n) = o(g(n)) when f(n)/g(n) goes to 0 in the limit.
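For reference, the standard definitions behind all four, written out (constant names are arbitrary):

    f(n) = O(g(n))      \iff \exists\, c > 0,\ n_0 : f(n) \le c \cdot g(n) \ \text{for all } n > n_0
    f(n) = \Omega(g(n)) \iff \exists\, c > 0,\ n_0 : f(n) \ge c \cdot g(n) \ \text{for all } n > n_0
    f(n) = \Theta(g(n)) \iff f(n) = O(g(n)) \ \text{and}\ f(n) = \Omega(g(n))
    f(n) = o(g(n))      \iff \lim_{n \to \infty} f(n)/g(n) = 0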


A nitpick because this is an accepted notation, but as others mentioned in the thread: when someone writes 2n+3 = O(n) they mean 2n+3 \in O(n) (\in is the "element of" symbol in LaTeX), since O(f) is the set of all functions that have "f as an upper bound".


The thing that didn't click for me was precisely the "try to count the operations" thing that the author mentions. In fact it's the wrong road to go down, it isn't the point of big-o, and yet that's how you're invited to think about it when the issue is presented in college. It's only natural to think "oh let's look at all the operations and add them up".

I think of it pretty simply, but not in that formal big-theta/big-omega kind of way, because you want to just quickly have an idea of whether you'll write a really slow piece of code in general, not some best or worst case.

The question is simply what growth model dominates the increase of time/space for the algo for each of the inputs? Imagine the algo is already processing millions of input A, and you now increase A by a factor of 10, 100, etc.

This melts away all the setup costs, nothing machine specific like cache size matters, anything that isn't the dominating factor is swamped, and all you're left thinking about is probably how some loop expands. You also don't need to think about what coefficient that dominating term has, which you would if you tried to write an equation that took all the operations into account.


It was the contrary for me. I think it helps to understand that we actually put a value on each line of code that gets executed, but instead of microbenchmarking, we do it in an abstract way: say print gets constant c1, addition gets c2, and the like. For loops multiply the sum of the instructions inside them by the number of times they get executed. And basically that’s it. You sum the whole thing and get something like (c1+c2)n+c3 for a for loop over an n element list or something, with two instructions inside and one other outside the loop. Since these were arbitrary constants, c1 and c2 can be replaced by another one, so you’ve got cn+c3, and since (I’m not gonna be mathematically rigorous here) the cn term becomes much larger than the others as n grows, we are only interested in it, hence it was an O(n) algorithm.
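A minimal sketch of that accounting on a made-up loop (the constants are arbitrary, as above):

    def total(xs):        # n = len(xs)
        s = 0             # c1, runs once
        for x in xs:      # loop body runs n times
            s += x        #   c2 per iteration
            print(s)      #   c3 per iteration
        return s          # c4, runs once
    # Total: c1 + n*(c2 + c3) + c4 = c*n + c'  ->  O(n)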

The eye-opening thing about it was that for simple algorithms, I only need high-school math to analyze them for different measurements. Like, memory allocation is costly for this sort of application and I want to measure that, just count each malloc instead! (But do note that it is quite hard/impossible to rigorously analyze programs for modern CPUs with cache misses and the like)


I learned to code as a kid and only met mathematicians who consider themselves programmers as an adult.

Some opinion, maybe unpopular:

Big O notation can be quite informally understood by normal people. It is academic people that make it and keep it challenging because it is how they understand the world. This is why interviews have stayed materially gruesome despite loud voices wishing it weren't so. It's the language of the people that rule this industry and you either learn it or leave. That said, we can change it too, if we want.


>Big O notation can be quite informally understood by normal people.

Right; the concept is easily understood. It is the rigor of deriving and proving that is made difficult by mathematicians, and it need not be that way.

As an example, here is a neat communication from Faraday to Maxwell on receiving one of Maxwell's papers:

“Maxwell sent this paper to Faraday, who replied: "I was at first almost frightened when I saw so much mathematical force made to bear upon the subject, and then wondered to see that the subject stood it so well." Faraday to Maxwell, March 25, 1857. Campbell, Life, p. 200.

In a later letter, Faraday elaborated:

I hang on to your words because they are to me weighty.... There is one thing I would be glad to ask you. When a mathematician engaged in investigating physical actions and results has arrived at his conclusions, may they not be expressed in common language as fully, clearly, and definitely as in mathematical formulae? If so, would it not be a great boon to such as I to express them so? translating them out of their hieroglyphics ... I have always found that you could convey to me a perfectly clear idea of your conclusions ... neither above nor below the truth, and so clear in character that I can think and work from them. [Faraday to Maxwell, November 13, 1857. Life, p. 206]”


I have met a lot of programmers that don't know about the concept nor do they proactively think to apply it.

I do understand the disconnect between knowing snobby language and doing good work. Certainly you can be an amazing programmer and apply these ideas possibly without ever even being trained on them or knowing the jargon.

In industry at least, a lot of work is communication, so you have to know what things are commonly called to explain your thoughts to other people. And along those lines, mathematicians are the ones studying this concept in the abstract, so it's useful to use their lingo because then you know where to find all the abstract knowledge on the subject.

Finally I'll say of all the obscure terminology for things intuitively applied, Big O has to be one of the most common, followed by the Gang of Four's design patterns.


> In industry at least, a lot of work is communication, so you have to know what things are commonly called to explain your thoughts to other people. And along those lines, mathematicians are the ones studying this concept in the abstract, so it's useful to use their lingo because then you know where to find all the abstract knowledge on the subject.

I think this is what I'm getting at. Mathematicians can adjust their language to communicate with a wider audience, especially on things as so commonly understood as Big O. It's a two way street, because you need to know and understand mathematical principles to be a good programmer, but if this is your only mode of understanding you are equally useless. There needs to be hiring gates for both.


What I always find a bit missing in such articles is what the n is. The author writes "If you consider "addition" to be 1 operation" and this seems kinda intuitive and is correct if we talk about normal machine integers.

But adding two arbitrary integers might be somewhat linear in bit-width. And there we have it: with a fixed bit-width, this becomes a constant term.

So you might want to talk not only about the number of input terms as n, but also about the width of your terms (and Python integers are arbitrary precision, btw).

So yeah, this is an upper bound on how many "steps" you do for every input, but often enough it's not really clear what a step is, especially if you have several "steps" that relate to different forms of data retrieval and organization (which often are culprits for bad performance once you're done optimizing the number of loops). Sometimes you can hide behind guarantees that your hashset has constant lookup. But did you factor in whether the hash function is actually constant (or even fast, for that matter)?
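A small sketch of that last point (tuples are used as keys here because, unlike strings, they don't cache their hash):

    # A dict lookup is "O(1)" in the number of entries, but computing the
    # hash of the key itself scales with the key's size.
    import timeit

    for length in (10, 1_000, 100_000):
        key = tuple(range(length))
        t = timeit.timeit(lambda: hash(key), number=1_000)
        print(f"key length {length:>7}: {t:.4f}s for 1000 hash() calls")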


> often enough it's not really clear what a step is

The steps are basic CPU operations such as load/store and basic arithmetic operations performed on fixed width type. I.e. things a CPU can do.

> And there we have it: with a fixed bit-width, this becomes a constant term.

It makes sense because that is how the CPU works: arithmetic operations on fixed width types are performed in constant time in the CPU ALU. Adding anything below 2^32 requires the same number of cycles.

It is only confusing when you confuse basic CPU operations with basic python instructions (an arbitrary precision addition is certainly not a simple operation from the cpu perspective)


> The steps are basic CPU operations such as load/store

Only if you consider the result of a comparison as a "store" into the register. But again, comparing two objects might not be constant. The compiler I develop for my PhD (written in Haskell), for example, uses somewhat nested sets, and quite a few operations are dominated by the equality check.

> It is only confusing when you confuse basic CPU operations with basic python instructions

But the author did exactly that, when they considered addition to be constant, which is a good approximation for small integers, but wrong in python. And really, you do have to consider those intricacies, that's why I chose this example, as sets and maps in python are basically hashmaps, you can assume constant lookup, for example, if your hash function works in constant time. And again, this begs the question what the n is. Usually we would set n as the number of items in a collection for its retrieval operation. But you actually also have to consider the item being retrieved.


> But the author did exactly that, when they considered addition to be constant.

Ok, I agree the article is imprecise. It should say fixed width integer addition instead of simply addition since the latter can either refer to a simple ADD cpu instruction or to the much more complex '+' operator in python.

However, once you agree on this, the concept of step is clear enough: it is an ADD instruction (or LOAD+ADD+STORE if you wish, still constant time for fixed width types). Obviously, it does not mean that every occurrence of '+' in python can be computed in constant time.


This is exactly my point. In every somewhat lisp-like language, or even just considering operator overloading by any means (interfaces, for example), the concept of a step becomes unclear.

Say instead of the for-loop the author would use something like 'for idx in range(len(ls)):' and then access items with [idx]. I think it's obvious that in order to know the runtime complexity, you would need to know what kind of access [] provides (linked list in linear time, array in constant time, tree list or skip list in log time). That's why I said it's easy to hide behind implementation details. And if you do count them, with all intricacies, it gets quite complex.

We could now look at what the turing machine implementing that algorithm would do, as no "shortcuts" are allowed there. And the computational complexity is strongly bound to that kind of computation (Specifically it is unknown whether the number of derivation steps in lambda calculus translates to number of steps in a turing machine).


I learned a couple of things during a brief teaching stint. First, no matter how much math your students learned in their high school and college courses, you should expect to re-teach the concepts that are needed for your lesson. It needn't be extensive, but your students will thank you for it.

Second, don't introduce more than one hard concept at once. Asymptotes can be reviewed with pure math functions such as polynomials, but that's as far as they got in high school math. Then there are the other "interesting" orders such as log(n) that can be introduced, and you can show graphically why they're useful.

Now you're ready to discuss the order of algorithms.

I'm not a computer scientist, but that's how I learned it, and while I don't remember all of the algorithms today, I still understand the derivations of their orders when I see them.


If you really want to make it easy to understand, make it graphical.

That is: benchmark the code, varying the input size, and plot the results. Almost anyone should be able to understand.

This might also reveal effects that are not taken into account by Big O notation, as not all algorithms that have the same complexity have the same performance. But I see it as a plus.
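A minimal sketch of the benchmark-and-plot approach (assuming matplotlib is available; sorted() just stands in for whatever code is under test):

    import random
    import time
    import matplotlib.pyplot as plt

    def algo(xs):                          # stand-in for the code under test
        return sorted(xs)

    sizes = [10_000 * 2**k for k in range(8)]
    times = []
    for n in sizes:
        data = [random.random() for _ in range(n)]
        start = time.perf_counter()
        algo(data)
        times.append(time.perf_counter() - start)

    plt.loglog(sizes, times, marker="o")   # slope on a log-log plot ~ growth exponent
    plt.xlabel("input size n")
    plt.ylabel("runtime (seconds)")
    plt.show()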


The coefficients are what can bite you.

Also amortized analysis, e.g. appending to a std::vector is O(n) in the worst case but O(1) almost all the time (the O(n) spikes come at something like every n = golden_ratio^k * initial size for integer k)
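You can watch the same amortized pattern in Python's list, which over-allocates much like std::vector (the growth factor differs; this is only an illustration):

    import sys

    xs, last = [], 0
    for i in range(64):
        xs.append(i)                       # O(1) almost always...
        size = sys.getsizeof(xs)
        if size != last:                   # ...except when capacity jumps (an O(n) copy)
            print(f"len={len(xs):>2}  allocated={size} bytes")
            last = size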


It isn't uncommon for algorithms with "worse" algorithmic complexity to perform better than the faster alternative because the constant overhead of setting up the "better" algorithm eats all the performance gains. Sequential search is faster than binary search for short lists. You need to test to see where the crossover happens on your platform.
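A rough sketch of how you might measure that crossover (both searches in plain Python so the comparison is like-for-like; the exact crossover point will vary by platform):

    import timeit

    def linear_search(xs, target):
        for i, x in enumerate(xs):
            if x >= target:
                return i
        return len(xs)

    def binary_search(xs, target):
        lo, hi = 0, len(xs)
        while lo < hi:
            mid = (lo + hi) // 2
            if xs[mid] < target:
                lo = mid + 1
            else:
                hi = mid
        return lo

    for n in (4, 16, 64, 256, 1024):
        xs = list(range(n))
        t_lin = timeit.timeit(lambda: linear_search(xs, n - 1), number=20_000)
        t_bin = timeit.timeit(lambda: binary_search(xs, n - 1), number=20_000)
        print(f"n={n:>5}  linear={t_lin:.4f}s  binary={t_bin:.4f}s")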


Not necessarily set-up time; either constants can reverse the roles, or the fact that CPUs are highly complex and the model most often used in analyzing algorithms is much simpler and doesn’t map too well to things like caches, branch prediction and the like.


Just read a book on algorithms. They are timeless, unlike APIs, so it's worth the effort to study them.


It's great to have a handle on big O but funny I've seen people index to it too much.

An algorithm can look "really bad" from Big-O point of view and still be really good if: it's applied to a small enough input, or it's implemented very efficiently.


Most people jump to extreme hyperbole instead of the reality of what they're grappling with. Big-O is just that, algorithmatised. Many software systems must be designed to anticipate orders of magnitude growth in scale.

However, I'm not sure about how most software systems grow -- not sure anyone really knows. We all have narratives about it, but it's a small, small subset of projects I know about that have an order of magnitude different number of users than when I first learned about them (other than moving to effectively 0).


> it's applied to a small enough input

The whole point of big-O notation is to analyze algorithms as they're applied to input of size `n` larger than some `n_0`. Of course it's not useful for small inputs - it's explicitly about large inputs.

> it's implemented very efficiently

On large inputs, it's very hard to see how, say, a quadratic algorithm could compete with a linear one, regardless of how "efficiently" it's implemented. Assuming you're talking about something like cache friendliness or good register allocation?


Yeah we are talking about the same thing. I agree that what you are saying is deeply connected to value of Big O. I am just pointing out that people sometimes forget to consider these things.


Yes, it's good to remember that complexity analysis is only a theoretical topic that motivates certain approaches. When performance matters, you measure your program/system and use that as the ultimate guide for your work. Cache locality and branch prediction are two examples of practical issues that the theoretical approach will often ignore.


That in _no_ way explained big-O notation. It explained the basics of counting things to do with algorithms. Which is kind of fine, really. But why bother saying it's going to explain big-O?


Disclaimer: I am very familiar with big O notation and find it intuitive.

However, I think the notation \lesssim or <<, which is common in upper level math courses but not often discussed in undergraduate texts, is much easier to explain, and should be introduced in lower level courses. You write

f(n) \lesssim g(n)

if there is an absolute constant C such that

f(n) \le C g(n)

the meaning is completely clear and by rearranging an expression it can replace big O notation. For example instead of writing

f(n) = g(n) + O(h(n))

you write

|f(n) - g(n)| \lesssim h(n)

https://math.stackexchange.com/questions/1793395/who-introdu...


If I understand correctly, f(n) << g(n) is equivalent to f(n) = o(g(n)), not f(n) = O(g(n)).


It depends on the author/field. In analysis sometimes f(n) <<g(n) means f(n) = o(g(n)), but other times (especially in discrete math) f(n) << g(n) means f(n) = O(g(n)).

Due to this ambiguity, I think the notation

f(n) ~< g(n) or in latex f(n) \lesssim g(n) is more clear


I think a minimal reproduction of each common complexity would be a way more helpful addition.

Someone with little math background is going to have no idea what log even means, much less what causes it.


As someone who learns best from experience, my intuition for algorithmic complexity didn't really come from reading about it or solving problems in CS classes (though that was useful). It came from running tests on my own inefficient code and waiting for it to finish, wondering why my stupid computer was frozen. And then digging in and realizing my O(n^3) or whatever algorithm was not going to be sufficiently fast for production cases.


I noticed some comments here discussing the right prerequisite knowledge to understand big-O and friends.

I propose limits. I think it would be a lot easier for someone to understand how to think about big-O if they already understood limits.

Lim [x -> inf] O(f(x))/O(g(x))

If you know limits, you know you how and why you can ignore all but the highest power term, how to compare and simplify other kinds of terms, etc. Though maybe that's too much to ask of someone new.


> Lim [x -> inf] O(f(x))/O(g(x))

I understand both limits and I've taught asymptotic analysis in the past. O(f(x)) and O(g(x)) are sets. How does one divide two sets? This explanation is ill formed.


I don't think limits is a good way to understand it because of the reasons here: https://math.stackexchange.com/a/3222489/124772


I'll admit I'm a little surprised to see a topic like this get much attention. Have programming courses & curriculums changed so much in the past ~20 years that this isn't simply part of every introduction to the subject?

Yes, the topic of code performance as a whole is more complex than just Big O, but as its own concept it was, in my time (get off my lawn!), pretty effectively covered everywhere, and certainly the moment "algorithms" were discussed in any learning material.

Maybe it's just that it goes back to the common topic here on HN that there's a lot more inefficient code nowadays because faster processors & more memory help to paper over the cracks. But if something like Big O isn't taught as one of the most primitive bits of knowledge in programming then I can't be completely surprised by that trend either.


I think this might be for the self taught crowd of developers who never formally took comp sci, yet who also hold important positions in software development that pay as well as, if not better than, those of the guys who did take comp sci.

You'd be astounded how big this self taught cohort could be and how much power they wield.

They do their job pretty well, and yet, the basics of computer science is something that they never learned.

I met a senior guy the other day who had never heard of a freaking truth table for fuck's sake.


I was dropped into the tech lead position at my job last year; I started in game design at art college, so having to run and gun has been fascinating. I'd be lying if I said I wasn't scared reading about much of the stuff in this comment section that really should be bread and butter.

I'm talking to my boss to see if there's some kind of training program I can pick up on the side to help me gather the basics that I've missed out on, although we're so overloaded that finding the time and money is challenging. I'm lucky it's mostly CRUD, but I can't help but worry every architecture decision I'm making is going to cost us massively down the road.


A few people have made incorrect mathematical statements in this discussion, so having a math teacher might be useful when learning this stuff. My unsolicited advice: take your time and learn some mathematics you might enjoy. Trust simple mathematical definitions over long-winded "explained as easily as possible" essays.

Mathematicians congratulate each other for simple, elegant definitions (sometimes developed over decades) which make deriving results easy. If you don't understand a definition which requires only a few words, learn some of the background instead of doing 10000 Google searches for the "easiest" explanation.

Here's an example. In physics, a vector is something with a "magnitude" and "direction" and we associate feelings and intuition with this. In mathematics, a vector (in 3-dimensional space) is simply "an ordered triple of real numbers". Many people might find this definition unsatisfying, but it is simple, precise, and lots of USEFUL mathematics is created from it.


Many, many developers did not go to school and have no 'CS' type learning


Big-O notation was the only thing taught to me in a college (tech) class that I use. Everything else I use, I had already learned; you can pick up everything CS departments teach, just by reading. Engineering wasn't like that.

It was taught so well, by Paul Cull, that it seemed obvious and hardly in need of a name.

But I wish I had a nickel for every time some hotshot thinks that, because they got the right big-O performance, they are done. Where performance matters, big-O is table stakes. There is typically an order of magnitude or two to be gained from that point.


Remotely related: the Theory of Computation course I took (by Michael Sipser) gave me a more fundamental understanding of algorithmic complexity than I had through solving algorithmic puzzles in high-level languages: https://math.mit.edu/~sipser/18404/


>> If you consider "addition" to be 1 operation then ...

What if we don't? Addition on computers is an O(1) operation only because we use fixed bit-widths for numbers, e.g., a 32-bit signed integer. How would we reformulate or express algorithmic complexity if we were to talk about unbounded integer values or infinite-precision mathematics?


The contrast between Gauss-Jordan[0] and the Bareiss algorithm[1] is a good example of explicitly handling the length of the numbers in bits as part of the runtime.

Gauss-Jordan is O(n^3) if you consider addition and multiplication to be constant-time, but if it operates on arbitrary-precision integers, it can get exponentially slow[2].

The Bareiss algorithm avoids this worst-case behavior by choosing the pivots in a slightly more involved manner, which avoids building up large intermediate values.

[0]: https://en.wikipedia.org/wiki/Gaussian_elimination#Computati... [1]: https://en.wikipedia.org/wiki/Bareiss_algorithm [2]: https://staff.itee.uq.edu.au/havas/fh97.pdf, Section 3.1's "Construction A" is a specific matrix that exhibits the worst-case behavior.


It generally changes the analysis very little but makes it more annoying, which is why arithmetic is usually taken to be O(1). Formally speaking you'd have to specify a computational model to be able to talk about complexity (there is no such thing as 'complexity', only 'complexity relative to a model'; e.g. you can change the big-O complexity on Turing machines by using more tapes).

The computational model of a book like CLRS is "arithmetic is O(1), but we're not going to abuse that bug".


You'd need to know the exact details of your arithmetic library's implementation, but the cost would depend on the integers in question, which makes the analysis a little harder: adding two numbers becomes some function O(f(n, m)), where n and m are the two numbers or some property of them (length, number of digits, etc.).
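
As a rough sketch of how that cost shows up, assuming a schoolbook base-10 digit representation rather than any particular library's internals:

    def add_digits(a, b):
        """Add two numbers stored as little-endian digit lists.
        The loop runs max(len(a), len(b)) times, so the cost is
        O(max(n, m)) in the number of digits, not O(1)."""
        result, carry = [], 0
        for i in range(max(len(a), len(b))):
            s = carry
            if i < len(a):
                s += a[i]
            if i < len(b):
                s += b[i]
            result.append(s % 10)
            carry = s // 10
        if carry:
            result.append(carry)
        return result

    # 123 + 989 == 1112, digits stored least-significant first
    print(add_digits([3, 2, 1], [9, 8, 9]))  # [2, 1, 1, 1]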


Then just substitute whatever other operation makes you happy? Or an imaginary one, if you mean to say there is no such thing in real hardware.


Not a bad intro, and I appreciate the intention of keeping it short and simple, but personally I think Grokking Algorithms has the best beginner explanations of Big O and basic algorithms / data structures. If I had to share resources with a learner, I'd suggest Grokking as something to go deeper on after reading this article.


Big O is literally easier than the analytical alternative.

I didn't "understand" Big-O until I read Donald Knuth's "Art of Computer Programming". I forget exactly where Knuth does this... but Knuth derives the _exact_ average runtime of some algorithm (where "average" is taken over all possible permutations of the input).

I forget why or where this derivation was, but it was something like 3n*log(n) + 5n + 25 (I made up the numbers).

Solving for the specific and exact runtime of a function is very, very, very difficult. Instead of doing that, Comp. Sci has decided that the easier Big-O is "good enough" for most purposes.

That's it. Big-O is ALWAYS easier to calculate than the exact runtime. Yeah, it's still hard in some cases, and there are situations like Radix sort (O(n)) vs Quicksort (O(n*log(n))), or Karatsuba multiplication vs asymptotically faster multiplication algorithms (roughly O(n^1.58) for Karatsuba vs O(n^1.47) for Toom-3), where the "slower Big-O" is better in practice.

But such situations are rare, and are easily figured out through profiling.

---------

So lets do some real-talk. Most programmers don't wait on their code anymore. Computers are so fast, that all this algorithmic complexity is a red herring compared to other issues. By and large, inefficient programming languages are being used in grossly inefficient ways and no one cares.

For most practical purposes, profiling with a stopwatch is your go-to methodology for analyzing algorithms. If that's not good enough, use a dedicated profiler, which can statistically count specific events like cache hits or branch mispredictions, as well as how many times any particular line of code ran.

That's the information you want and need. Take the runtimes, plot them on a log-log plot, fit a growth exponent to the curve and BAM, you've got O(n^whatever).
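
Something like this, as a sketch (assumes numpy is available and that the timings are dominated by the polynomial term, so the slope on a log-log plot approximates the exponent; the helper names are mine):

    import time
    import numpy as np

    def measure(f, n):
        start = time.perf_counter()
        f(n)
        return time.perf_counter() - start

    def quadratic_ish(n):
        # deliberately O(n^2): touch every pair of indices
        total = 0
        for i in range(n):
            for j in range(n):
                total += i ^ j
        return total

    sizes = np.array([200, 400, 800, 1600, 3200])
    times = np.array([measure(quadratic_ish, n) for n in sizes])

    # Fit log(t) = k*log(n) + c; the slope k is the empirical growth exponent.
    slope, _ = np.polyfit(np.log(sizes), np.log(times), 1)
    print(f"empirically O(n^{slope:.2f})")  # should land near 2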

Where Big-O comes in are the situations where you're waiting for your code to finish. You run your code, and you wait 10 minutes, 1 hour, 10 hours, 2 days... will your code finish? Do you have an infinite loop? Or is it actually making forward progress? If so, how long do you estimate it will take? (And indeed: Hollywood movies are known to take multiple days to render a single frame, so these situations come up every now and then in practice.)

Under such a situation, you can't really run a profiler or use a stopwatch. You probably can run a big-O analysis by pen-and-paper over your code however, and then get an estimate on the size of your data. You'll want to make sure that you're O(n) or O(n*log(n)). If you're getting O(n^2) or O(n^3) from an algorithm that's taking days to run... you might be in trouble.


I liked it, but giving an example of each of the different complexities it talks about, along with a suitable use case, would have made it even better. On that note, I'll take up the task of doing that in the near future.



I think calling it big O was a mistake. Saying worst case upper bound isn't too many words and conveys the correct meaning to people with incorrect concepts about what big O means.


Calling it big O is indeed a mistake (especially in handwriting, where o and O are difficult to distinguish).

> Saying worst case upper bound isn't too many words and conveys the correct meaning to people with incorrect concepts about what big O means

Tricky thing is, that's not what big O says. It's a statement about the asymptotic growth of a function which you can apply to a worst case.

You'd be better off saying "the worst case grows at most as fast as". The distinction is important because with big O you are ignoring two crucial things: the non-asymptotic behaviour and the coefficient of the asymptotic term.


I know, I was just talking about what people usually seem to mean by big O. It's also what big O is applied to most of the time.


Much better explained than in my computer science lectures! Nice simple examples.

It's such an important topic to grasp too if you're going to end up working with software!


Also a nice quick ref here https://www.bigocheatsheet.com/


Count the number of nested loops. If one of the loops does splitting (like binary search), it's O(log n) as opposed to O(n).

That’s literally all there is to it.
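
For what it's worth, that heuristic does cover the bread-and-butter cases. A toy sketch (the function names are mine):

    import bisect

    def has_pair_quadratic(nums, target):
        """Two nested loops over n items -> O(n^2) by the counting heuristic."""
        for i in range(len(nums)):             # up to n iterations
            for j in range(i + 1, len(nums)):  # up to n iterations each
                if nums[i] + nums[j] == target:
                    return True
        return False

    def has_pair_sorted(nums, target):
        """Sort once, then one loop with a halving search inside
        -> O(n log n) + n * O(log n) = O(n log n)."""
        nums = sorted(nums)
        for i, x in enumerate(nums):           # n iterations
            j = bisect.bisect_left(nums, target - x, i + 1)  # O(log n) halving
            if j < len(nums) and nums[j] == target - x:
                return True
        return False

    print(has_pair_quadratic([4, 7, 1, 9], 10), has_pair_sorted([4, 7, 1, 9], 10))
    # True True  (the pair is 1 + 9)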


That's not even close to all there is to it. Complexity analysis is a relatively young field with lots of deceptively simple unsolved problems.

For a simple example of how it's a lot more than that, follow Tarjan's proof of the disjoint set's amortized time complexity[0]. It's not at all obvious, even though the disjoint set is a very simple and practical data structure.

[0]: http://www.e-maxx.ru/bookz/files/dsu/Efficiency%20of%20a%20G...
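
For reference, the data structure itself really is tiny. Here's a minimal sketch of the standard textbook version (union by rank with path halving, not Tarjan's exact formulation); the surprise is that this handful of lines has inverse-Ackermann amortized cost per operation, and proving that is genuinely hard:

    class DisjointSet:
        def __init__(self, n):
            self.parent = list(range(n))
            self.rank = [0] * n

        def find(self, x):
            # Path halving: point nodes at their grandparents as we walk up
            # (a form of path compression that flattens the tree over time).
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]
                x = self.parent[x]
            return x

        def union(self, a, b):
            ra, rb = self.find(a), self.find(b)
            if ra == rb:
                return
            # Union by rank: hang the shallower tree under the deeper one.
            if self.rank[ra] < self.rank[rb]:
                ra, rb = rb, ra
            self.parent[rb] = ra
            if self.rank[ra] == self.rank[rb]:
                self.rank[ra] += 1

    ds = DisjointSet(5)
    ds.union(0, 1)
    ds.union(3, 4)
    print(ds.find(1) == ds.find(0), ds.find(0) == ds.find(3))  # True False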


So what would be a deceptively simple unsolved problem?


Wikipedia has a list of big-name unsolved problems in complexity theory. Most of these have very simple problem statements.

https://en.wikipedia.org/wiki/List_of_unsolved_problems_in_c...


P == NP on analog quantum computers. A light prism performs a diagonalization which can be used to do factorization in O(1).


Factorization of arbitrarily large numbers? I doubt it.


Yes.


On constant-size hardware?

And doesn't getting the data in and out already take O(N)?


Are you talking about arbitrary sized or infinite sized?


We’re saying the time, space, or some other metric of the solution scales with the size of the input number in a way that is not constant. The number could be any input (arbitrary).


That covers most of the algorithms you'll see in a coding interview test, but there's a ton more complexities outside of search spaces, sorts and nested O(n) loops.

Even some graph operations will leave familiar territory.


The Strassen algorithm uses 7 multiplications (instead of the usual 8) for a 2x2 matrix. https://en.wikipedia.org/wiki/Strassen_algorithm

A matrix can always be split into blocks of sub-matrices, and the blocks themselves form a matrix. Applying the Strassen algorithm recursively therefore gives O(n^log2(7)) == O(n^2.8ish).
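
Spelling out the recurrence behind that (the additions that stitch the blocks back together cost O(n^2) per level):

    T(n) = 7*T(n/2) + O(n^2)
    log2(7) ≈ 2.807 > 2, so the 7 recursive multiplications dominate:
    T(n) = Θ(n^log2(7)) ≈ Θ(n^2.807)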


The article covers none of those.


What if there are recursive function calls?


Then you get to have fun solving a recurrence relation. :-)


If you don't like "fun", WolframAlpha can solve those for you. :-)

https://www.wolframalpha.com/input/?i=f%281%29+%3D+1%2C+f%28...
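
If you'd rather stay in code, sympy's rsolve can do the same thing locally. A sketch, using the standard Fibonacci recurrence with F(0) = 0, F(1) = 1:

    from sympy import Function, rsolve, symbols

    n = symbols('n', integer=True)
    f = Function('f')

    # f(n+2) = f(n+1) + f(n) with f(0) = 0, f(1) = 1
    closed_form = rsolve(f(n + 2) - f(n + 1) - f(n), f(n), {f(0): 0, f(1): 1})
    print(closed_form)  # a Binet-style formula involving (1/2 + sqrt(5)/2)**n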


Sure, but if you're not on the spot in an interview, the master theorem is simple enough to apply: https://en.wikipedia.org/wiki/Master_theorem_(analysis_of_al...
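
For instance, applying it to mergesort's recurrence:

    T(n) = 2*T(n/2) + Θ(n)
    a = 2, b = 2, so n^(log_b a) = n, which matches the Θ(n) merge cost
    => case 2 of the master theorem: T(n) = Θ(n*log(n))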


Thank you, but the link provided only says that these are the Fibonacci numbers and nothing more. Yes, you can click on the F(n) (though how would you even know it's a link?), but still.


branchesᵈᵉᵖᵗʰ


One thought I’ve never had: “I wish I hadn’t spent so much time thinking about Big O notation, power laws, and orders of magnitude”


Beautiful.

