There isn't any deduplication, although that will hopefully be less of an issue at this point since there's a limited number of repositories in the index.
There must be something else going on, or something wrong, because you indexed one of my small repos (~100 stars, ~20 forks, ~20 MB) and not the bigger ones (~500 stars, ~100-150 forks, ~150 MB).
Thanks! It's built on top of Solr. It fetches the repos from GitHub, and it should pick up any updates to repos within a few days. It's running on a couple of servers with 20 cores each, which is not really enough for the traffic it's getting right now.