Do we know how the LLMs available in OpenLLM, and other open-source LLMs, compare to the different versions of the GPT models? I know there’s a leaderboard on Hugging Face: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb... but it doesn’t contain GPT models.
I don’t see how any LLM would help me with a high-quality proxy, which is what I actually need for web scraping. I’m using https://scrapingfish.com/ for this.
I initially built my own system for web scraping but kept getting blocked, even when using good-quality residential proxies. I had to constantly investigate why I was getting blocked and update my tools. Sometimes the effort was significant, such as when I had to switch to a different framework that gave me a better success rate.
Then I switched to a web scraping API (I'm using https://scrapingfish.com as they have convenient pricing for my use case, but there are other alternatives). Now I only have to maintain the parsing logic in my scrapers. It also reduced my scraping costs, since I no longer pay for proxies, which at my scale are more expensive than a web scraping API.
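To make the "only maintain parsing logic" point concrete, here is a minimal sketch of how that split can look. The endpoint and the `api_key`/`url` parameter names are hypothetical placeholders, not any particular provider's API, so check your provider's docs for the real ones. Fetching is a single HTTP call to the API, and the parser is the only part tied to the target site.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint and parameter names (assumption, not a real provider).
API_ENDPOINT = "https://api.scraper.example/v1/"


def build_request_url(target_url, api_key, endpoint=API_ENDPOINT):
    """Build the API request URL; the target URL is passed as a query parameter."""
    return endpoint + "?" + urlencode({"api_key": api_key, "url": target_url})


def fetch_html(target_url, api_key, timeout=60):
    """Fetch a page through the scraping API, which handles proxies and blocking."""
    with urlopen(build_request_url(target_url, api_key), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


class TitleParser(HTMLParser):
    """Site-specific parsing logic: here it just collects the <title> text."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_title(html):
    """Return the page title from raw HTML, stripped of surrounding whitespace."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()
```

With this layout, a scraper is roughly `parse_title(fetch_html(url, api_key))`, and only the parser needs to change when the target site changes.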
This looks really cool!
There was a tutorial posted on HN about building a mobile proxy pool with an RPI, which had obvious limitations: https://scrapingfish.com/blog/byo-mobile-proxy-for-web-scrap...
It seems this could be a way to scale the capabilities of a single RPI.
For web scraping, I recommend using a web scraping API, e.g. https://scrapingfish.com. It handles the problems with getting blocked for you and can make data extraction easier as well.
For the app, I've recently started using Remix (https://remix.run) and so far it has been a good choice for me. Mantine has a good Remix integration for the front end: https://mantine.dev/guides/remix/. I think it's a good full-stack choice if you just want to quickly build an app for your project/product.