My experience trying to scrape Google Maps with no code
A few months back I was working on a project to help founders who sell to SMBs get better-quality leads (current solutions like ZoomInfo and Apollo don't serve the SMB market well). Of course, I wanted to do this as quickly as possible with as little code as possible.
We found that people were manually combing through Google Maps to find SMBs: they would search for a business type, e.g. "restaurants", then call or email each result by hand.
We decided to gather the Google Maps data automatically and surface it to our customers as a download. The catch was the sheer volume of data needed to pull it off: covering every SMB across the United States is a huge undertaking.
Initially, I tried no-code AI web-scraping tools, and they worked horribly. For some reason, I couldn't even get them to scroll down the page. Digging into the open-source code of one, I discovered it was passing the entire web page into GPT to extract data, which just burned through my OpenAI bill.
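To give a sense of why that approach is so expensive, here is a minimal Python sketch (assuming `requests` and `beautifulsoup4` are installed; the URL is a placeholder) contrasting shipping raw HTML to a model with trimming scripts and styles first:

```python
# Sketch of the cost problem: raw page HTML vs. trimmed text.
# The URL below is illustrative, not a real Google Maps endpoint.
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/some-listing", timeout=30).text

# Naive approach: ship the whole page to the LLM. A results page can
# easily be hundreds of KB of markup, most of it irrelevant to extraction.
naive_prompt = html

# Cheaper approach: strip scripts, styles, and markup before prompting.
soup = BeautifulSoup(html, "html.parser")
for tag in soup(["script", "style", "noscript", "svg"]):
    tag.decompose()
trimmed_prompt = soup.get_text(separator=" ", strip=True)

print(f"naive: {len(naive_prompt)} chars, trimmed: {len(trimmed_prompt)} chars")
```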
I then tried a semi-code approach, using something like Apify or the Google Places API to pull the businesses. This worked better, but price was still an issue at the scale we wanted.
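For reference, here is roughly what the Places API route looks like; this is a hedged sketch rather than production code, and it assumes a valid API key in a `GOOGLE_PLACES_KEY` environment variable. Each Text Search request returns at most 20 results and is billed per request, which is what makes the price bite at scale:

```python
import os
import time
import requests

SEARCH_URL = "https://maps.googleapis.com/maps/api/place/textsearch/json"

def search_places(query: str, api_key: str) -> list[dict]:
    results, params = [], {"query": query, "key": api_key}
    while True:
        data = requests.get(SEARCH_URL, params=params, timeout=30).json()
        results.extend(data.get("results", []))
        token = data.get("next_page_token")
        if not token:
            return results  # Text Search caps out at roughly 60 results per query
        time.sleep(2)  # the next_page_token takes a moment to become valid
        params = {"pagetoken": token, "key": api_key}

places = search_places("restaurants in Austin, TX", os.environ["GOOGLE_PLACES_KEY"])
print(len(places))
```

The ~60-result ceiling per query also means covering a whole country requires a huge number of distinct searches, multiplying the per-request cost.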
Eventually, we ended up writing our own scraper. The real pain came afterward, when we had to parallelize it and run the processes concurrently on a server (the job was so large that running it serially would have taken far too long). Battling infrastructure cost us weeks.
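The pattern we were fighting to run reliably looks something like this fan-out sketch (illustrative only; `scrape_query`, the categories, and the cities are hypothetical placeholders, and the real work would go through a headless browser or API client):

```python
# Split the job into many small search tasks and run them across a
# pool of worker processes.
from concurrent.futures import ProcessPoolExecutor, as_completed
from itertools import product

CATEGORIES = ["restaurants", "plumbers", "dentists"]
CITIES = ["Austin, TX", "Boise, ID", "Tampa, FL"]  # in reality, thousands

def scrape_query(query: str) -> list[dict]:
    # Placeholder: drive a headless browser or API client here.
    return [{"query": query, "name": "..."}]

def main() -> None:
    tasks = [f"{c} in {city}" for c, city in product(CATEGORIES, CITIES)]
    rows = []
    with ProcessPoolExecutor(max_workers=8) as pool:
        futures = {pool.submit(scrape_query, t): t for t in tasks}
        for fut in as_completed(futures):
            rows.extend(fut.result())
    print(f"collected {len(rows)} rows from {len(tasks)} queries")

if __name__ == "__main__":
    main()
```

The hard part isn't this code; it's keeping hundreds of such workers fed, retried, and within budget on real servers.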
This experience was so painful that we ended up building potarix.com, a no-code scraper. Simply type in the URL you want to scrape and briefly describe the task, and our AI generates a script. We've also made it easy to control the infrastructure: specify how many processes to run, provide login information, and bypass CAPTCHAs.
We also understand that AI is flaky and often doesn't work, so depending on your task we're also building a white-glove onboarding service: we'll work alongside the AI to complete the data-extraction task for you.
The spider-rs project on GitHub handles concurrent AI chunking while trimming unwanted content (HTML elements, CSS, etc.) from the context. You should check it out: https://github.com/spider-rs