Hacker News

I may regret posting this...

But this is what you probably want to do:

user@host:/path/to/EssentialMixRip09.04.10$ wget -rl 0 -np -A .mp3 http://thenine.ca/essential/
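Roughly, `-r` follows links recursively, `-l 0` lifts wget's default depth limit of 5 (0 is treated as infinite), `-np` (no-parent) keeps it from climbing above `/essential/`, and `-A .mp3` keeps only files ending in `.mp3`. Wget still fetches the HTML index pages to find links, then deletes them because they don't match `-A`. Below is a simplified Python model of which URLs end up on disk. It is not wget's actual logic: `keeps` is a hypothetical helper, and it ignores `--span-hosts`, case-insensitive matching and query strings.

```python
from urllib.parse import urlsplit


def keeps(url: str, start: str, suffix: str = ".mp3") -> bool:
    """Simplified model of `wget -r -np -A <suffix> <start>` (hypothetical helper).

    Returns True if the file at `url` would be kept after the crawl.
    """
    u, s = urlsplit(url), urlsplit(start)
    # Recursion stays on the starting host unless --span-hosts is given.
    if u.hostname != s.hostname:
        return False
    # -np: never ascend above the starting directory.
    if not u.path.startswith(s.path):
        return False
    # -A: keep only names ending in the accepted suffix.
    return u.path.endswith(suffix)


start = "http://thenine.ca/essential/"
print(keeps("http://thenine.ca/essential/1998/mix.mp3", start))  # True
```

The file name `mix.mp3` is made up for illustration.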



As a heads-up for anyone who doesn't realize how much music this is, that wget will slurp more than 100 GB (or would, if it weren't for server load).


Using Coral CDN:

  wget -rl 0 -np -A .mp3 http://thenine.ca.nyud.net/essential/
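Going through Coral just means appending `.nyud.net` to the hostname, as in the command above. A minimal Python sketch of that rewrite; `coralize` is a hypothetical helper, and it assumes the URL has no explicit port or credentials.

```python
from urllib.parse import urlsplit, urlunsplit


def coralize(url: str) -> str:
    """Rewrite a URL to go through Coral CDN by appending .nyud.net to the host.

    Hypothetical helper: only handles plain host names (no port, no user:password).
    """
    parts = urlsplit(url)
    if not parts.hostname or parts.port is not None or parts.username is not None:
        raise ValueError("sketch only handles absolute URLs with a plain host name")
    return urlunsplit(parts._replace(netloc=parts.hostname + ".nyud.net"))


print(coralize("http://thenine.ca/essential/"))  # http://thenine.ca.nyud.net/essential/
```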


No good, check out the robots.txt.


Since Coral considers itself a distributed caching proxy rather than a crawler, it doesn't respect robots.txt. It does respect the relevant cache-control headers, like "no-cache", but this site doesn't appear to set them.

However, the main problem here is that Coral doesn't cache any files over 50 MB, so almost none of these are cached. The few under 50 MB do seem to be, though, e.g.: http://thenine.ca.nyud.net/essential/1998/1998.01.01%20-%20E...
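The two rules described above, honoring `no-cache` and skipping files over 50 MB, can be modeled as one check on the response headers. A minimal Python sketch, not Coral's actual code: `coral_would_cache` is a hypothetical name, treating `no-store` as uncacheable is an assumption (only `no-cache` is mentioned above), 50 MB is taken as 50 × 2^20 bytes, and a response with no `Content-Length` is assumed cacheable.

```python
MAX_CACHEABLE_BYTES = 50 * 1024 * 1024  # "50 MB" from the comment; binary MB is an assumption
UNCACHEABLE_DIRECTIVES = {"no-cache", "no-store"}  # only no-cache is confirmed in the thread


def coral_would_cache(headers: dict[str, str]) -> bool:
    """Hypothetical model: would a Coral-style proxy keep this response?"""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {k.lower(): v for k, v in headers.items()}
    # Collect directive names, dropping any "=value" part (e.g. max-age=3600).
    directives = {
        d.strip().split("=", 1)[0].lower()
        for d in h.get("cache-control", "").split(",")
        if d.strip()
    }
    if directives & UNCACHEABLE_DIRECTIVES:
        return False
    length = h.get("content-length")
    # Assumption: without a Content-Length the size rule can't apply.
    return length is None or int(length) <= MAX_CACHEABLE_BYTES
```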


Yup, thenine.ca is blocking the Coral cache for .mp3 files :(


-e robots=off
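`-e` runs a command as if it were in `.wgetrc`, and `robots = off` stops wget from honoring robots.txt during recursive retrieval. To see the check a robots.txt-respecting crawler makes, Python's standard `urllib.robotparser` can evaluate a file. The robots.txt contents and the `allowed` helper below are hypothetical, since the thread doesn't quote thenine.ca's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real thenine.ca file isn't quoted in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /essential/
"""


def allowed(url: str, user_agent: str = "Wget") -> bool:
    """Return True if a crawler honoring ROBOTS_TXT may fetch `url`."""
    rp = RobotFileParser()
    rp.parse(ROBOTS_TXT.splitlines())
    return rp.can_fetch(user_agent, url)


print(allowed("http://thenine.ca/essential/1998/mix.mp3"))  # False
```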

However, aren't they copyrighted?!


Or install extension.fm :)



