Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The idea/hope is that a video model would answer the car wash problem correctly. There are exactly the kinds of issues you have to solve to avoid teleporting objects around in a video, so whenever we manage more than a couple seconds of coherent video we will have something that understands the real world much better than text-based models. Then we "just" have to somehow make a combined model that has this kind of understanding and can write text and make tool calls

Yes, this is kind of like Tesla promising full self driving in 2016

 help



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: