This looks interesting. Does anyone know how it compares to other learning theory books, like Foundations of Machine Learning [1], in terms of depth and approachability?
I honestly don't understand why people write these books anymore. Let me explain: there used to be a lot of these kinds of survey books that start with linear regression and end at... something classical. I can rattle off a lot of titles (Pattern Recognition and Machine Learning, Elements of Statistical Learning, Intro to Statistical Learning, blah blah blah). They all covered the same material at various levels of sophistication (some of them covered meta-theory like PAC learning or shattering dimension or empirical risk minimization or whatever). Some of them took the statistical approach and some of them took the optimization approach. Again: blah blah blah. The synthesis/summary is/was that there is no grand unified theory of machine learning, and everyone could see that.
And then "deep learning" arrived and it became even more obvious that the only thing that matters is data and time spent crunching numbers (more of both and you get better results no matter the model).
Again, I just want to be crystal clear, because I'm sure someone will pop in and claim "oh I still use SVM to pick my family's shopping list": no professional ML engineer/team/org today that ships an ML product "at scale" gives a fuck about SVMs or graphical models or bayes nets or kernel methods. No one. So who cares about all this sophistry? What value is there in learning concentration inequalities? Training goes brrr no matter what if you have enough data. And if you don't, if you're really building a model to predict your family's shopping list, I encourage you to reflect on whether it would be simpler to just ask your family what they want for dinner instead.
My 2 cents: teach people/students useful things instead of this stuff. They'll be happier and you'll feel more fulfilled (even though you didn't get to flex your big math brain).
Maybe the book is just not for you. That doesn't mean it's not for anyone.
I understand that deep learning is all in vogue now. But when I was in graduate school, a professor asked me why I was using neural nets in a project, since they were not as good as SVMs. We used to study Vapnik and VC dimensions, SVMs, etc., and neural nets were totally out of fashion.
Imagine what would have happened if everybody had used and researched only the methods that worked best at the time. And deep learning could benefit from a theory that explains why, when, and how it works so well. Maybe someone working on this material could extend it in that direction.
Also, I don't think you're right to assume that all models out there are deep learning models. Yes, they are very good for many cases (especially those with less structured data, like images or NLP). But in some cases gradient boosting or even GLMs are better suited for the task (because of the structure and size of the data, or because of computing restrictions).
And in the end, people can just want to learn it because they find it interesting.
It’s a bit sad to do only things that are “useful”. That’s my 2 cents.
Some of these "ML" methods have applications outside of what you'd think of as ML. My background is in control theory, which relies on guarantees you just can't get from neural nets. Skimming through the outline, there are tons of methods here which are used in controls and estimation — certainly they're still useful.
> no professional ML engineer/team/org today that ships an ML product "at scale" gives a fuck about SVMs or graphical models or bayes nets or kernel methods.
There's a reason why "AI is just statistics" became a meme: a lot of places do use textbook machine learning techniques and dress it up as AI. Yes, deep learning will win with enough data, but few companies have that luxury.
> This book is not for you... you might want to look out for one in the "for dummies" series.
My guy, I learned this material from a healthy mix of ESL, Casella & Berger, and Billingsley. I could still, to this day, probably do every proof in this book without reviewing the material. And yet, despite all that training, I still argue this book is not useful for absolutely anything except assigning homework problems and setting exams.
"Why yet another book on learning theory? ...the main reason is that I felt that the current trend in the mathematical analysis of machine learning was leading to overly complicated arguments and results that are often not relevant to practitioners. Therefore, my aim was to propose the simplest formulations that can be derived from first principles, trying to remain rigorous without overwhelming readers with more powerful results that require too much mathematical sophistication."
From my own reading and experience with the mathematical analysis of this "training goes brrr" regime, I thought the material in Chapter 12, Overparameterized Models, was interesting and coherent, with 12.2.4, Linear Regression with Gaussian Projections, being an especially elegant explanation. It would be interesting to hear whether you have read/skimmed/perused this section and found it wanting, etc.
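For anyone curious what that section is getting at, here is a minimal numerical sketch of the phenomenon (my own toy construction with made-up parameters, not the book's derivation): fit minimum-norm least squares on k Gaussian random projections of the inputs and sweep k past the interpolation threshold k = n. The test error typically falls, spikes near k = n, and then falls again as k grows toward d, which is the double-descent shape that chapter analyzes.

    # Minimal sketch (toy setup, not the book's 12.2.4 derivation): min-norm least
    # squares on k Gaussian random projections of the inputs, swept past the
    # interpolation threshold k = n to expose the double-descent curve.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n_train, n_test, noise = 400, 100, 2000, 0.5

    w_star = rng.normal(size=d) / np.sqrt(d)          # well-spread linear target
    X_tr = rng.normal(size=(n_train, d))
    X_te = rng.normal(size=(n_test, d))
    y_tr = X_tr @ w_star + noise * rng.normal(size=n_train)
    y_te = X_te @ w_star + noise * rng.normal(size=n_test)

    for k in [10, 50, 90, 100, 110, 150, 250, 400]:
        S = rng.normal(size=(d, k)) / np.sqrt(d)      # Gaussian projection to k features
        # lstsq returns the minimum-norm solution once k exceeds n_train,
        # i.e. the interpolating estimator studied in the overparameterized regime
        theta, *_ = np.linalg.lstsq(X_tr @ S, y_tr, rcond=None)
        mse = np.mean((X_te @ S @ theta - y_te) ** 2)
        print(f"k = {k:4d}   test MSE = {mse:7.3f}")

The reason for using lstsq rather than an explicit inverse is that it gives the minimum-norm interpolant when k > n_train, which is exactly the kind of estimator whose risk comes back down on the overparameterized side.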
The pursuit of knowledge is not a linear path. The reason you benefit from deep learning now is because a few people in the past believed neural networks had a future despite not working as well as other techniques such as SVMs.
Discovering knowledge and using the knowledge that works best are very different.
Your argument reminds me of this lecture from Feynman. Quoting him: "...and every theoretical physicist that is any good knows 6 or 7 different theoretical representations for exactly the same physics and knows that they are equivalent... but he keeps them in his head hoping that they'll give him different ideas for guessing."
This book doesn't seem massively different from several other existing textbooks. There are also several good textbooks on deep learning specifically (I'd recommend the new Bishop).
This textbook is hardly irrelevant for people who only care about deep learning though. It covers regularisation, optimisation, overparameterised models, double descent and err, neural networks. Sounds pretty relevant to me?
If you think the rest of the book is irrelevant then skip it.
You sound a bit nutty when you confidently state nobody uses any of the other methods in this book. How could you possibly know that?
You seem to know a lot about this area. I do not, but I've heard that deep learning models are a black box that's hard to explain? If you work in a "mission critical" field you'd have to explain all the math behind the model. Let's say in healthcare, finance, aviation, etc.
Also, the "big math brain"'s you're talking about probably read all the books your shutting down. I'd say their big math brains are the reason we have LLMs today.
I vouch for this approach ... not just in ML/DL but as a general way of learning things in life.
Resources like this are useful in an academic setting. Let's not forget that we forget 50% or more of what we learn within about 20 minutes unless we consistently remind ourselves of it.
Unless others prove me wrong using personal anecdotes.
This feels boring because students can't connect it to the bigger picture. We still need to learn the fundamentals, but they must be connected to an actual product, or else people will forget it after leaving university.
I see a big danger in that line of thinking. It was (in part) what relegated NNs, which you rate so highly, to the basement for a long time, as many people thought "let's learn expert systems, that is what ships, nobody cares about NNs".
I would like to learn it all, especially the things that led to the current state of the art.
[1] https://cs.nyu.edu/~mohri/mlbook/