I disagree wrt Mitchell; it's too dry and out of date. In its place I'd recommend the Norvig book, which covers everything Mitchell does (IIRC) except 'Probably Approximately Correct' (PAC) learning. It's also more up to date (it covers Support Vector Machines) and much more readable. Also, I've only flipped through it at a bookstore, but Christopher Bishop's book is one I'd look at (perhaps check the Amazon comments).
I've seen the Norvig book recommended by others on this site for ML, but I don't understand why: the Norvig book is an AI book, not an ML book. It only has a few chapters on learning, and IIRC much of that is on reinforcement learning. Obviously the Norvig book is very well-written and is good background for learning about ML, but I don't think it is a sufficient ML book as such.
As far as ML goes, I found the "Programming Collective Intelligence" book to be very readable and practical, but very light on the theoretical foundations (which is intentional, of course). I've got a copy of the Witten ML book ("Data Mining: Practical Machine Learning Tools and Techniques"), but to be honest I haven't gotten much from it yet: it doesn't seem to discuss SVMs in any detail, nor random forests or neural networks. But I haven't really dug into it yet.
This is kind of moot; truthfully I'd now go for (or at least look first at, since they're both getting praise) Chris Bishop's or Ethem Alpaydin's new books. But between Norvig and Mitchell: Norvig has 138 pages on learning vs Mitchell's ~390, but Mitchell's is from a different era. Norvig is easier to read (larger pages, more diagrams, better writing, and you have a better sense of where you are) and has fresher material. To each his own. But like I say, I'd probably go with one of the even newer ones. For a while Mitchell was all there was, then Norvig came along, and now there are a few to choose from.
If you can read and understand Mitchell's book, you will have a very good foundation for understanding modern ML techniques. The poster was looking for references to introduce him to the field.
I'll second this recommendation - I bought the printed copy, and I'm constantly going to it for reference. The fact that it's available for free is just an added bonus.
This is a great book, but it's definitely not introductory. I think Segaran's "Programming Collective Intelligence", mentioned above, is the best first bet, primarily because it's fun. And the code is good.
Check out the lecture notes for CS229, Stanford's class on Machine Learning. The class has no textbook, so the notes are fairly comprehensive. (Though the math notation can get a bit intense...)
Also, the notes assume a reasonable knowledge of probability theory and linear algebra. If you're unsure, you might do well to review those topics before approaching machine learning (or at least keep good references handy). (Edit: the link above has Section Notes that review probability theory, linear algebra, and convex optimization. You might find those useful.)
And it's not an easy course. Don't be discouraged if the problem sets seem impossible. (They very nearly are.)
I stopped reading on page 3, because it seems to me that the "physical grounding hypothesis" is complete bullshit. Where is the "physical grounding" aspect if I want to build a search engine for the web? It's all just information.
I'm not saying that getting physical cannot yield interesting results, but to state that it is the only possible way to program intelligence seems just wrong.
Start by reading and completely understanding a few of the simpler machine learning algorithms (backpropagation neural networks, for example), including how and why they work, and continue from there.
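For anyone who wants something concrete to poke at, here is a minimal sketch of backpropagation on a tiny network learning XOR, in plain Python with no libraries. Everything in it is my own assumption for illustration and not taken from any of the books mentioned here: the function names, the 2-3-1 layer sizes, the sigmoid activation, the squared-error loss, the learning rate, and the step count. Treat it as a toy for reading alongside a proper explanation, not a reference implementation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_params(n_in=2, n_hidden=3, seed=0):
    # Hypothetical initialisation: small uniform random weights, zero biases.
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return w1, b1, w2, 0.0

def forward(params, x):
    # One hidden layer of sigmoid units feeding a single sigmoid output.
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    out = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, out

def loss(params, data):
    # Mean of 0.5 * (prediction - target)^2 over the data set.
    return sum(0.5 * (forward(params, x)[1] - y) ** 2 for x, y in data) / len(data)

def gradients(params, data):
    # Backpropagation: push the output error back through the chain rule.
    w1, b1, w2, b2 = params
    g_w1 = [[0.0] * len(row) for row in w1]
    g_b1 = [0.0] * len(b1)
    g_w2 = [0.0] * len(w2)
    g_b2 = 0.0
    n = len(data)
    for x, y in data:
        hidden, out = forward(params, x)
        # Error signal at the output unit: dL/dz_out.
        d_out = (out - y) * out * (1.0 - out)
        g_b2 += d_out / n
        for j, h in enumerate(hidden):
            g_w2[j] += d_out * h / n
            # Error signal at hidden unit j: dL/dz_hidden_j.
            d_hid = d_out * w2[j] * h * (1.0 - h)
            g_b1[j] += d_hid / n
            for i, xi in enumerate(x):
                g_w1[j][i] += d_hid * xi / n
    return g_w1, g_b1, g_w2, g_b2

def train(params, data, lr=0.5, steps=5000):
    # Plain full-batch gradient descent; returns new parameters.
    w1, b1, w2, b2 = params
    for _ in range(steps):
        g_w1, g_b1, g_w2, g_b2 = gradients((w1, b1, w2, b2), data)
        w1 = [[w - lr * g for w, g in zip(row, g_row)]
              for row, g_row in zip(w1, g_w1)]
        b1 = [b - lr * g for b, g in zip(b1, g_b1)]
        w2 = [w - lr * g for w, g in zip(w2, g_w2)]
        b2 = b2 - lr * g_b2
    return w1, b1, w2, b2

XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
       ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

if __name__ == "__main__":
    start = init_params()
    trained = train(start, XOR)
    print("loss before:", loss(start, XOR))
    print("loss after: ", loss(trained, XOR))
    for x, y in XOR:
        print(x, "target", y, "prediction", round(forward(trained, x)[1], 3))
```

A good exercise for the "why" part: nudge one weight by a tiny amount, recompute the loss, and check that the change matches what gradients() reports. Depending on the seed, a network this small can also get stuck and not fully solve XOR, which is itself worth understanding.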
Data Mining: Practical Machine Learning Tools and Techniques (Second Edition) http://www.cs.waikato.ac.nz/~ml/weka/book.html
Which goes nicely with the Weka open source ML toolkit http://www.cs.waikato.ac.nz/ml/weka/
(although it is a good read without the toolkit)
If you want a bit more math, I really like the recent (Oct 2007) book:
Pattern Recognition and Machine Learning by Christopher M. Bishop http://www.amazon.com/Pattern-Recognition-Learning-Informati...
It is nicely self-contained, going through all the stats you'll need.