Okay, let's admit it: web scraping with Puppeteer and Playwright is the most versatile and flexible approach available today. Unfortunately, it's also the most cumbersome and time-consuming one, and sometimes it feels a little like voodoo magic. This is a post about my long…| Pixeljets
It looks like Cloudflare is using TLS handshake fingerprinting to fight scrapers. Let's see how this can be investigated and mitigated...| Pixeljets
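"TLS handshake fingerprinting" works because the ClientHello a client sends differs between real browsers and typical HTTP libraries. That message carries the TLS version, cipher suites, extensions and supported curves, and it can identify the client regardless of its HTTP headers. The minimal sketch below computes a JA3 fingerprint, one widely documented scheme. It is only an illustration: the post does not say which scheme Cloudflare uses, and the ClientHello values in the usage note are invented.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) are 0x0A0A, 0x1A1A, ..., 0xFAFA; JA3 ignores them."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version: int, ciphers, extensions, curves, point_formats) -> str:
    """Build the JA3 string: SSLVersion,Ciphers,Extensions,EllipticCurves,PointFormats."""

    def join(values) -> str:
        # Values inside one field are dash-separated, in the order the client sent them
        return "-".join(str(v) for v in values if not is_grease(v))

    return ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )


def ja3_hash(version: int, ciphers, extensions, curves, point_formats) -> str:
    """The JA3 fingerprint is the MD5 hex digest of the JA3 string."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()
```

For example, `ja3_string(771, [0x0A0A, 4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])` yields `771,4865-4866-4867,0-23-65281-10-11,29-23-24,0`, with the GREASE cipher dropped. Two clients that send different ClientHello fields end up with different hashes.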
This week, I’m introducing a new project at ScrapeNinja: a recursive web crawler, packaged as an n8n community node. It isn’t just another scraper; it’s an advanced, powerful open-source tool that runs in your local n8n instance and can be used to harvest…| Pixeljets
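The teaser does not describe how the ScrapeNinja node works internally, so what follows is only a generic illustration of what "recursive crawling" usually means. The crawl walks breadth-first from a start URL and follows same-host links up to a depth limit and a page limit. `crawl`, `extract_links` and the injected `fetch` callable are hypothetical names. `fetch` stands in for whatever actually downloads a page and should return HTML or `None`.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(base_url, html):
    parser = LinkExtractor()
    parser.feed(html)
    # Resolve relative links and drop #fragments so /a and /a#top count as one page
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def crawl(start_url, fetch, max_depth=2, max_pages=50):
    """Breadth-first crawl restricted to the start URL's host."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        html = fetch(url)
        visited.append(url)
        if html is None or depth >= max_depth:
            continue  # failed fetch or depth limit reached: do not expand links
        for link in extract_links(url, html):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Injecting `fetch` keeps the traversal logic testable against an in-memory dict of pages, and the `seen` set prevents revisiting pages that link back to each other.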
I am a big fan of n8n and use it for a lot of my projects. I love that it offers a self-hosted version, and that this self-hosted version is not paywalled, as often happens with so-called "open core" products which just use "open source"…| Pixeljets
Introduction: As a seasoned developer with a keen interest in web scraping and data extraction, I've often used Python for its simplicity and power. In this field, understanding and using proxies becomes a necessity, especially when dealing with the complexities of web requests, IP bans, and rate limiting.| Pixeljets
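The following minimal illustration uses only the standard library. The article itself may well use `requests` or another client. Here `urllib.request.ProxyHandler` routes traffic through a proxy, and `itertools.cycle` gives a simple round-robin rotation across several proxies. The proxy URLs and credentials are placeholders, and `make_opener` and `next_opener` are hypothetical helper names.

```python
import itertools
import urllib.request

# Placeholder proxy endpoints (hypothetical); credentials go in the URL
PROXIES = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
]


def make_opener(proxy_url):
    # Send both http:// and https:// requests through the same proxy;
    # ProxyHandler turns user:pass in the URL into a Proxy-Authorization header
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


_rotation = itertools.cycle(PROXIES)


def next_opener():
    # Round-robin: each call returns an opener bound to the next proxy in the list
    return make_opener(next(_rotation))
```

A request would then look like `with next_opener().open("https://example.com", timeout=10) as resp: body = resp.read()`. Spreading requests across proxies this way is a common first answer to per-IP bans and rate limits.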
In this article, I will describe how to set a proxy in Playwright (the Node.js version of Playwright). Playwright is one of the best and most modern browser automation solutions in 2024. It uses the Chrome DevTools Protocol (CDP) to control Chromium-based browsers and supports Chromium, Chrome and Firefox…| Pixeljets
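Playwright's `launch()` accepts a `proxy` option with a required `server` field and optional `bypass`, `username` and `password` fields. The Node.js and Python APIs use the same field names. Proxy providers often hand out a single `scheme://user:pass@host:port` string, so here is a small sketch in Python that splits such a string into that shape. `playwright_proxy` is a hypothetical helper, not part of Playwright.

```python
from urllib.parse import unquote, urlsplit


def playwright_proxy(proxy_url, bypass=None):
    """Convert 'scheme://user:pass@host:port' into Playwright's proxy settings dict."""
    parts = urlsplit(proxy_url)
    if not parts.hostname or not parts.port:
        raise ValueError(f"expected scheme://host:port, got {proxy_url!r}")
    # Playwright takes credentials as separate fields, not embedded in `server`
    settings = {"server": f"{parts.scheme}://{parts.hostname}:{parts.port}"}
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)  # e.g. %40 -> @
    if bypass:
        settings["bypass"] = bypass  # comma-separated hosts that skip the proxy
    return settings
```

In Python the result would be passed as `p.chromium.launch(proxy=playwright_proxy(url))`. In Node.js the same object shape goes into `chromium.launch({ proxy: { server, username, password } })`.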
When diving into automated browser testing and scraping with Playwright, one of the first decisions you'll face is the choice of programming language. Playwright is not a one-language tool: it has official bindings for Node.js, Python, Java and .NET. Let's see how the Node.js and Python versions…| Pixeljets
Blocking unnecessary resources in Playwright is pretty easy, thanks to the built-in route() function.| Pixeljets
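With `route()`, every matching request passes through a handler that either aborts it or lets it continue. The following is a minimal sketch for Playwright's Python API, where the handler reads `route.request.resource_type` and calls `route.abort()` or `route.continue_()`. The choice of which types to block is an assumption to tune per site, and `block_heavy_resources` is a hypothetical name.

```python
# Resource types Playwright reports via request.resource_type that are often
# unnecessary when only the HTML/DOM matters (assumption: adjust per target site)
BLOCKED_TYPES = {"image", "media", "font", "stylesheet"}


def block_heavy_resources(route):
    """Handler for page.route("**/*", block_heavy_resources)."""
    if route.request.resource_type in BLOCKED_TYPES:
        route.abort()  # the request never leaves the browser
    else:
        route.continue_()  # documents, scripts, XHR/fetch proceed normally
```

Register it before navigation, e.g. `page.route("**/*", block_heavy_resources)` followed by `page.goto(url)`. In Node.js the equivalent calls are `route.request().resourceType()`, `route.abort()` and `route.continue()`. Blocking stylesheets can break scrapers that rely on layout or element visibility, so drop it from the set in that case.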
In the ever-evolving world of web scraping, I often run into hurdles that require creative solutions, quick workarounds and hacks, and oh boy, this is especially true when I am working with programmatically driven browsers, which I have been doing a lot lately. Today, I'…| Pixeljets
Once you're familiar with basic web scraping tools like Scrapy and have scraped your first one or two websites, you'll probably get your first ban because your IP address has made too many requests (what "too many" means really depends on the site, for…| Pixeljets
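Before reaching for proxies, one common first step in Scrapy is simply to crawl more politely. The following is a minimal `settings.py` fragment using Scrapy's built-in concurrency, delay and AutoThrottle settings. The numbers are illustrative guesses, not values from the article, since what "too many requests" means depends on the target site.

```python
# settings.py of a Scrapy project: illustrative values, tune per target site

# Fewer parallel requests to any single site
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Base delay (seconds) between requests to the same site
DOWNLOAD_DELAY = 1.5

# Wait a random 0.5x to 1.5x DOWNLOAD_DELAY instead of a fixed interval
RANDOMIZE_DOWNLOAD_DELAY = True

# AutoThrottle adapts the delay to observed server latency;
# it never goes below DOWNLOAD_DELAY
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 2.0
AUTOTHROTTLE_MAX_DELAY = 30.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0  # average parallel requests per site
```

Scrapy's default retry codes already include 429 (Too Many Requests), so throttling mainly reduces how often that response shows up in the first place.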
Mock product pagination page 1 of None category for web scraper testing| Scrapeground