Hacker News | ariel-faigon's comments

If you quote "Veryfi" in the search box, Google does not treat it as a misspelling, and the top results seem to work as expected :)


Good eye. It may be random noise, since most daily deltas in the data-set are so small. See the comments about over-fitting and noise elsewhere in the thread. It may also have been a veggie-style (without the bun) burger. It was over 2 years ago, so I don't remember. I plan to write a separate machine-learning focused doc on this exploratory experiment when I get more time.


This is pretty hilarious. Thanks for sharing. It may be showing how much noise there is in a random data-set with too few data-points to draw statistically significant conclusions. It might also mean that you're on the right track and now that you have the tool, you can keep going until you find real enlightenment.

I really like your comment because you're one of the very few who actually tried the idea. That's the #1 reason I put this on GitHub: to help others conduct their own experiments on whatever they care about. If you ignore all my weight-loss journey story and data and just use some of these ideas to improve your own life, it was all worth it for me.


Wow. This is a beautiful and data-rich chart. Thanks for sharing.

It would be great if you could put this on GitHub as well, and explain all the details in README.md.


"He" is me :)

Sorry, I've never read that book so rest assured, I'm not promoting it. Please note that the link is not to the book; it is just a Google search for the term "the truth about statins". Perhaps some sponsored ad for the book was added by Google? If so, my sincere apologies for the unintended consequences.

Here's my personal experience with statins. I may be wrong here, but I'm following my compass and am open to being proven otherwise:

Doctor: "Your 'bad cholesterol' is borderline, I want you to take these statins". Side observation: when the 'Lipitor' patent expired and it became a cheap generic drug, the suggestion turned into 'Crestor', which I learned has a somewhat longer effective half-life and a much higher price. Me, researching the subject while adopting a different diet: "Hmm, statins would have taken me _maybe_ 3% lower, and here I am 20% lower after a year of a simple, self-studied diet change. Maybe there are better ways to lower the so-called 'bad cholesterol'?" Further study: there's always a new statin the moment a patent expires. Check out Compactin -> Simvastatin -> Fluvastatin -> Cerivastatin -> Atorvastatin -> Rosuvastatin on Wikipedia (these are chemical names, not brand names; the last two are sold under the brands "Lipitor" and "Crestor").

So I don't know. I'm 100% sure all my doctors are well-meaning and caring, and I have nothing against them. But my confidence in these health suggestions, in all the research that is funded by big pharma, and in the new great statin of the era while America keeps getting more obese and less healthy is, how can I put it, a bit shaken.

No zealotry at all. Just prove me wrong, and I'll change my view.


Thanks so much for all the excellent comments. There was definitely an over-fit with 4 passes.

No more. I've updated the Makefile to run only one pass, changed the options so it runs with older versions of vw, fixed misspellings of 'gioza', removed 'mayo' (which found itself on the wrong side because it appeared only twice and always alongside the bun), and regenerated the chart.

All the main conclusions remain intact.

In the end, I urge everyone to use their own data; that was the main purpose of sharing this code. My data-set is small, awfully noisy and insufficient. There are no p-values and no rigorous statistics, so please don't read too much into the minute details. It is the discovery journey into the top factors that is the important part, in my view. The ML was just one aid in this discovery process. The proof for me was my actual, and sustainable, weight loss that came after (very slowly) realizing the top factors that eventually worked for me. Thanks again.


I don't think it matters whether you run 4 passes or 1 pass, it's still going to overfit. You can run an online linear regression in a single pass too, but that doesn't magick away the uncertainties. The results are still going to be garbage, and any effects you get are due to your health-consciousness and not any specific dietary choices you make (how could it be, when the data is so weak and noisy that each item can easily flip signs?).


Thanks so much. Your comments are really helpful.

I realized early on that the data is hopelessly noisy, due to the small daily changes and the scale's resolution. So rather than trying to build a perfect model to gauge the variable importance of each and every kind of food, I focused on the few days when the weight change was more significant, hoping I could detect some signal in those, and extrapolate and further explore from that. That's why I sorted the data-set by abs(delta), and that's what consistently pointed me towards sleep/fasting as the #1 factor. I do agree that the full list/model is garbage in the sense that probably 80% or so of it is woefully inaccurate/flipped, noisy, overfitted etc. The main point was to lead me in the right direction by looking at the big picture and what stood out.
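For anyone who wants to try the abs(delta) sort on their own data, here is a minimal sketch of the idea. The `Day` record, its field names, and the values are all made up for illustration; they are not taken from the repository.

```python
from typing import NamedTuple


class Day(NamedTuple):
    """One daily record (hypothetical layout, not the repository's format)."""
    label: str
    delta: float      # weight change vs. the previous day, in pounds
    factors: tuple    # whatever was logged for that day


def rank_by_abs_delta(days):
    """Return days ordered from largest to smallest absolute weight change."""
    return sorted(days, key=lambda d: abs(d.delta), reverse=True)


# Toy values, invented purely to show the ordering.
days = [
    Day("day-1", 0.2, ("bread",)),
    Day("day-2", -1.4, ("sleep",)),
    Day("day-3", 0.0, ("salad",)),
    Day("day-4", 1.0, ("pizza",)),
]

ranked = rank_by_abs_delta(days)
for d in ranked:
    print(d.label, d.delta, d.factors)
```

The days with the biggest swings in either direction come first, so those are the ones to inspect by hand before trusting any model weights.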

And what stood out were 2 things: 1) sleep (fasting duration), and 2) fat vs carbs. I think everything else should be ignored. I think we're in total agreement on this point.

Does this sound more sensible to you?


Is it possible weight loss made you sleepy?


Thanks for the comment. You're absolutely right. I stand corrected and updated the README accordingly.


Thanks mbrundle. I'm the person behind that git repository and honestly am in a bit of a shock that this is making Hacker News.

As I say in the README.md: please ignore the noise. The scales I used had 0.2 pound resolution, and my data-set was too small (and as one snarky commenter noticed, some words were misspelled). What is important is the big picture. There are actually numerous contradictions and irregularities in the data. In particular, any food item that appears only once or twice in the data-set, and randomly coincides with other features that bias it the wrong way, contributes to the error of the model.
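One simple way to spot such rarely seen items before reading anything into their weights is to count how many days each one appears on. This is only a sketch under assumed names: the list-of-lists layout, the `rare_features` helper, and the threshold of 3 are illustrative choices, not what the repository does.

```python
from collections import Counter


def rare_features(days, min_count=3):
    """Return features that appear on fewer than `min_count` days.

    `days` is a list of per-day feature lists (hypothetical layout).
    Each feature is counted at most once per day.
    """
    counts = Counter(f for feats in days for f in set(feats))
    return sorted(f for f, n in counts.items() if n < min_count)


# Toy log, invented for illustration only.
days = [
    ["bun", "mayo", "burger"],
    ["bun", "mayo", "salad"],
    ["bun", "sleep"],
    ["sleep", "salad"],
    ["sleep", "salad"],
]

rare = rare_features(days)
print(rare)
```

Anything this returns has too few observations to separate from whatever else was logged on the same days.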

So as I say in the README, I would ignore anything that's not near the top or bottom, and even those should be taken with a healthy dose of (noise/modeling) skepticism.

Anyway, the code is free for everyone to use, so people are encouraged to run the experiment on themselves using more accurate methods and contributing more data. It only requires R+ggplot2 and Vowpal Wabbit. Cheers.

