Okay, let's admit it - web scraping via Puppeteer and Playwright is the most versatile and flexible way of web scraping nowadays. Unfortunately, it's also the most cumbersome, time-consuming way of scraping, and sometimes it feels a little bit like voodoo magic. This is a post about my long| Pixeljets
It looks like Cloudflare is using TLS handshake fingerprinting to fight scrapers. Let's see how this can be investigated and mitigated...| Pixeljets
Why AI Products Need Sandboxing: Sandboxing has become a core feature of modern AI-powered development tools. As AI coding assistants and autonomous agents become more sophisticated, they generate and execute code that needs to run safely in isolated environments. In my recent Lovable.dev and Bolt.new blog post I| Pixeljets
A few days ago, I took a deep dive into integrating my ScrapeNinja web scrapers into Zapier, Pipedream.com, and Integromat (Make.com) to better understand the market situation among low-code and no-code automation platforms. I wanted to do a simple job: extract some website data in JSON format from| Pixeljets
ScrapeNinja Scraping API recently got an exciting feature called Extractors. Extractors are pieces of user-supplied JavaScript code which are executed in the ScrapeNinja backend, so ScrapeNinja returns pure JSON with data from any HTML webpage in the world. This feature alone, with the ScrapeNinja web-based IDE to write extractors, can shave off| Pixeljets
Okay, so if you're not into building a big startup with investors and splitting equity, you can try going solo and bootstrapped. Will it work for you? I don't know. It works for me, as far as I can tell - and I'm a| Pixeljets
This week, I’m introducing a new project at ScrapeNinja: a recursive web crawler, packed into an n8n community node. It isn’t just another scraper - it’s an advanced, powerful open-source tool that executes in your local n8n instance and can be used to harvest| Pixeljets
I am a big fan of n8n and I am using it for a lot of my projects. I love that it provides a self-hosted version and that this self-hosted version is not paywalled, as often happens with so-called "open core" products which just use "open source"| Pixeljets
1.5 years ago, I wrote a blog post sharing my thoughts and experience on using Make.com, Zapier, and Pipedream from my perspective (I recommend reading that piece before continuing here). When exploring these awesome platforms, I was mostly interested in how no-code and low-code products can enhance my| Pixeljets
Duolingo and flashcards get boring quickly, so lately I've been learning French with ChatGPT. Observation #1: ChatGPT for Android has had a very good speech-to-text and text-to-speech engine based on Whisper since fall 2023. It understands what you say well and intones its phrases nicely when it speaks. You| Pixeljets
Let's imagine your product didn't die and managed to gain some real traction (🎉 CONGRATULATIONS!). After a few years it stops being a small and nimble project and turns into something much bigger, involving dozens and hundreds of people. Project lifecycle: MVP -> growth ->| Pixeljets
Introduction: As a seasoned developer with a keen interest in web scraping and data extraction, I've often leveraged Python for its simplicity and power. In this realm, understanding and utilizing proxies becomes a necessity, especially to navigate through the complexities of web requests, IP bans, and rate limiting.| Pixeljets
In this article I will describe how to set a proxy in Playwright (Node.js version of Playwright). Playwright is obviously one of the best and most modern solutions to automate browsers in 2024. It uses the CDP protocol to send commands to browsers and supports Chromium, Chrome and Firefox| Pixeljets
When diving into the world of automated browser testing and scraping with Playwright, one of the first decisions you'll encounter is the choice of programming language. Playwright is not a one-language wonder; it caters to a polyglot audience. Let's see how Node.js and Python version| Pixeljets
Blocking unnecessary resources in Playwright is a pretty easy task, thanks to the built-in route() function.| Pixeljets
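As a rough illustration of the idea in the teaser above, here is a minimal sketch using Playwright's Python sync API. The set of blocked resource types, the helper names `should_block` and `handle_route`, and the `https://example.com` URL are assumptions chosen for the example, not something taken from the post itself.

```python
# Minimal sketch: abort requests for heavy resource types via page.route().
# BLOCKED_RESOURCE_TYPES, should_block and handle_route are hypothetical names
# chosen for this example; adjust the set to whatever your scraper can skip.

BLOCKED_RESOURCE_TYPES = {"image", "media", "font"}


def should_block(resource_type: str) -> bool:
    """Return True when a request of this resource type should be aborted."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def handle_route(route) -> None:
    """Route handler: abort blocked resource types, let everything else through."""
    if should_block(route.request.resource_type):
        route.abort()
    else:
        route.continue_()


def main() -> None:
    # Imported here so the helpers above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # "**/*" matches every request made by the page.
        page.route("**/*", handle_route)
        page.goto("https://example.com")  # placeholder URL (assumption)
        print(page.title())
        browser.close()
```

Calling `main()` needs Playwright and a browser installed (`pip install playwright`, then `playwright install chromium`). Keeping the decision in a small pure function makes the blocking rule easy to adjust and to check without launching a browser.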
In the ever-evolving world of web scraping, I often come across hurdles that require creative solutions and some quick code workarounds and hacks - and oh boy! this is especially true when I am working with programmatically driven browsers, which I happen to do a lot lately. Today, I'| Pixeljets
Once you're familiar with basic web scraping tools like Scrapy, and you've scraped your first 1-2 websites, you'll probably get your first ban because your IP address has made too many requests (what "too many" means really depends on the site, for| Pixeljets
I extensively use AI tools for coding - primarily Claude Sonnet 3.5 in VS Code Copilot and the OpenAI ChatGPT macOS app (using the o1 and 4o models) as of December 2024. While these tools, which felt groundbreaking just months ago, have become an integral part of my daily| Pixeljets
I'm a big fan of self-hosting. As an indie hacker who has launched several micro-SaaS products and as a CTO of a small company, I now prefer self-hosting all the tools I might need. With the rise of high-quality self-hosted offerings from talented teams using open source as their primary| Pixeljets