Scrapling Guide: Python Web Scraping That Bypasses Cloudflare (2026)

2026-06-02 Scrapling Python Web Scraping Cloudflare

What is Scrapling?

Scrapling is an adaptive Python web scraping framework created by developer D4Vinci. It climbed GitHub Trending in 2026 and has earned over 57,000 stars, making it one of the most visible of the new generation of scraping tools.

Why Do You Need Scrapling?

Traditional scraping tools (such as Scrapy and BeautifulSoup) face three major pain points:

Website structure changes break code: CSS selectors and XPath expressions stop matching as soon as the page is redesigned
Anti-bot systems block requests: Protections like Cloudflare Turnstile and Akamai answer ordinary requests with a 403
Poor scalability: Scaling from small-scale scraping to large-scale concurrent crawling requires rewriting lots of code

Scrapling's core design philosophy is "One library, zero compromises":

Adaptive Parser: Automatically learns website structure and re-locates elements after page updates
Built-in Anti-blocking: Works out of the box to bypass mainstream protections like Cloudflare Turnstile
Scrapy-like API: Developers familiar with Scrapy can migrate seamlessly
From Single Request to Full Crawling: The same API supports simple fetching and large-scale concurrent crawling

Scrapling vs Traditional Scraping Tools

Feature	BeautifulSoup	Scrapy	Scrapling
Learning Curve	Low	Medium	Medium
Adaptive Parsing	❌	❌	✅
Bypass Cloudflare	❌	Needs Plugin	✅ Built-in
Concurrent Crawling	❌	✅	✅
Dynamic Page Support	❌	Needs Middleware	✅ Built-in Playwright
Pause/Resume	❌	Needs Extension	✅ Built-in
Streaming Output	❌	❌	✅

Installing Scrapling

Requirements

Python 3.10+ (recent Scrapling releases no longer support older interpreters; check the version you install)
pip package manager

Quick Install

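pip install "scrapling[fetchers]"   # plain "pip install scrapling" installs only the parser
scrapling install                   # downloads the browsers the fetchers rely on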

Verify Installation

from scrapling.fetchers import Fetcher

# Test basic functionality
p = Fetcher.get('https://example.com')
print(p.css('title::text').get())  # Outputs page title

Optional Dependencies

If you need dynamic page rendering (JavaScript websites), install Playwright:

pip install playwright
playwright install chromium
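
To confirm which of these packages are present without making a network request, you can ask Python whether each one can be imported. This is a minimal sketch using only the standard library; the helper name `check_packages` is made up for illustration and is not part of Scrapling.

```python
from importlib.util import find_spec


def check_packages(names):
    """Map each top-level package name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in check_packages(["scrapling", "playwright"]).items():
        print(f"{name}: {'installed' if present else 'missing'}")
```

A "missing" result for playwright only matters if you plan to render JavaScript-heavy pages.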

Getting Started: Your First Crawler

Example 1: Simple Page Scraping

Let's start with a simple example: scraping headlines from Hacker News.

from scrapling.fetchers import Fetcher

# Send HTTP request
page = Fetcher.get('https://news.ycombinator.com/')

# Use CSS selector to extract data
stories = page.css('.titleline > a')

for story in stories[:5]:  # Take only the first 5
    title = story.text
    link = story.attrib.get('href', '')
    print(f"Title: {title}")
    print(f"Link: {link}")
    print("-" * 40)

Sample Output:

Title: Show HN: I built a real-time code collaboration tool
Link: https://github.com/example/collab-tool
----------------------------------------
Title: Ask HN: What's your favorite Python library in 2026?
Link: https://news.ycombinator.com/item?id=123456
----------------------------------------

Example 2: Adaptive Parsing (Auto-save)

Adaptive parsing is Scrapling's core feature. On the first scrape, pass auto_save=True so that Scrapling records identifying features of the matched elements. If the site is later redesigned, pass adaptive=True and Scrapling uses the saved features to find the same elements again.

from scrapling.fetchers import Fetcher

page = Fetcher.get('https://quotes.toscrape.com/')

# First scrape: enable auto_save
quotes = page.css('.quote', auto_save=True)

for quote in quotes[:3]:
    text = quote.css('.text::text').get()
    author = quote.css('.author::text').get()
    print(f"{text} — {author}")

If the website structure changes:

# Subsequent scrapes: pass adaptive=True
page = Fetcher.get('https://quotes.toscrape.com/')
quotes = page.css('.quote', adaptive=True)  # Automatically adapts to new structure!

for quote in quotes[:3]:
    text = quote.css('.text::text').get()
    author = quote.css('.author::text').get()
    print(f"{text} — {author}")

💡 How it works: Scrapling records multiple features of an element (tag type, nearby text, attribute patterns, etc.). Even if CSS class names change, it can re-locate the element through other features.
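
The idea of re-locating an element by its other features can be sketched with plain dictionaries. This is an illustration only, not Scrapling's actual algorithm: the feature set, the equal weighting, and the names `fingerprint`, `similarity`, and `relocate` are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def fingerprint(element):
    """Reduce an element (a plain dict here) to the features we compare."""
    return {
        "tag": element["tag"],
        "text": element.get("text", ""),
        "attrs": set(element.get("attrs", {}).items()),
        "parent": element.get("parent", ""),
    }


def similarity(saved, candidate):
    """Score a candidate fingerprint between 0 and 1 against a saved one."""
    tag_score = 1.0 if saved["tag"] == candidate["tag"] else 0.0
    text_score = SequenceMatcher(None, saved["text"], candidate["text"]).ratio()
    union = saved["attrs"] | candidate["attrs"]
    attr_score = len(saved["attrs"] & candidate["attrs"]) / len(union) if union else 1.0
    parent_score = 1.0 if saved["parent"] == candidate["parent"] else 0.0
    return (tag_score + text_score + attr_score + parent_score) / 4


def relocate(saved, candidates):
    """Return the candidate element that best matches the saved fingerprint."""
    return max(candidates, key=lambda el: similarity(saved, fingerprint(el)))


# Fingerprint saved before the redesign, when the class was "quote".
saved = fingerprint({
    "tag": "div", "text": "The world as we have created it",
    "attrs": {"class": "quote", "itemtype": "CreativeWork"}, "parent": "main",
})

# After the redesign the class name changed, but the other features survive.
new_page = [
    {"tag": "nav", "text": "Home Login", "attrs": {"class": "menu"}, "parent": "header"},
    {"tag": "div", "text": "The world as we have created it",
     "attrs": {"class": "q-card", "itemtype": "CreativeWork"}, "parent": "main"},
    {"tag": "div", "text": "Top Ten tags", "attrs": {"class": "sidebar"}, "parent": "main"},
]

best = relocate(saved, new_page)
print(best["attrs"]["class"])  # q-card
```

The renamed element still wins because its tag, text, parent, and one of its two attributes are unchanged, which is the behavior described above.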

Advanced Features

1. Bypassing Cloudflare Turnstile

Many websites use Cloudflare Turnstile or other anti-bot systems. Scrapling's StealthyFetcher can handle these protections automatically.

from scrapling.fetchers import StealthyFetcher

# Enable adaptive mode
StealthyFetcher.adaptive = True

# Automatically bypass Cloudflare
page = StealthyFetcher.fetch(
    'https://example-protected-site.com',
    headless=True,       # Headless browser mode
    network_idle=True    # Wait for network idle
)

# Extract data normally
products = page.css('.product-item')
for product in products:
    name = product.css('.name::text').get()
    price = product.css('.price::text').get()
    print(f"{name}: {price}")

Key Parameter Explanation:

headless=True: Runs the browser without a visible window
network_idle=True: Waits until network activity has stopped before extracting (suitable for SPA apps)
StealthyFetcher.adaptive = True: Enables adaptive parsing; it is set on the class, not passed to fetch()

2. Dynamic Page Rendering (Playwright)

For websites requiring JavaScript rendering, use DynamicFetcher.

from scrapling.fetchers import DynamicFetcher

page = DynamicFetcher.fetch(
    'https://spa-example.com',
    wait_selector='.content-loaded',  # Wait for specific element to appear
    timeout=30000                # Timeout in milliseconds
)

# Extract dynamically loaded content
articles = page.css('article')
for article in articles:
    title = article.css('h2::text').get()
    summary = article.css('.summary::text').get()
    print(f"{title}\n{summary}\n")

3. Asynchronous Concurrent Scraping

Use AsyncFetcher for high-concurrency scraping.

import asyncio
from scrapling.fetchers import AsyncFetcher

async def fetch_multiple_pages():
    urls = [
        'https://example.com/page/1',
        'https://example.com/page/2',
        'https://example.com/page/3',
        'https://example.com/page/4',
        'https://example.com/page/5',
    ]

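    # Cap the number of in-flight requests at 3
    semaphore = asyncio.Semaphore(3)

    async def fetch_one(url):
        async with semaphore:
            try:
                return await AsyncFetcher.get(url)
            except Exception:
                return None  # Reported as a failed request below

    # Send requests concurrently
    pages = await asyncio.gather(*(fetch_one(url) for url in urls))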

    for url, page in zip(urls, pages):
        if page:
            title = page.css('title::text').get()
            print(f"{url}: {title}")
        else:
            print(f"{url}: Request failed")

asyncio.run(fetch_multiple_pages())

Spider Framework: Large-Scale Crawling

Scrapling provides a Scrapy-like Spider framework that supports large-scale concurrent crawling.

Basic Spider

from scrapling.spiders import Spider, Response

class QuoteSpider(Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    async def parse(self, response: Response):
        # Extract quotes from current page
        for quote in response.css('.quote'):
            yield {
                "text": quote.css('.text::text').get(),
                "author": quote.css('.author::text').get(),
                "tags": quote.css('.tag::text').getall()
            }

        # Go to next page
        next_page = response.css('.next a::attr(href)').get()
        if next_page:
            yield self.follow(next_page, callback=self.parse)

# Start the spider
QuoteSpider().start()
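
The "next page" link on quotes.toscrape.com is a relative href such as /page/2/, so it has to be resolved against the current URL before it can be requested. The sketch below shows that step with the standard library; whether Scrapling resolves links in exactly this way is an assumption, and the helper names `resolve` and `same_site` are invented for the example.

```python
from urllib.parse import urljoin, urlparse


def resolve(base_url, href):
    """Turn an href found on base_url into an absolute URL."""
    return urljoin(base_url, href)


def same_site(base_url, url):
    """True when url points at the same host as base_url."""
    return urlparse(base_url).netloc == urlparse(url).netloc


current_url = "https://quotes.toscrape.com/page/1/"

# Root-relative link, as used by the "next" button
print(resolve(current_url, "/page/2/"))
# Link relative to the current path
print(resolve(current_url, "../3/"))
# Absolute links pass through unchanged and can be filtered by host
external = resolve(current_url, "https://example.com/other")
print(external, same_site(current_url, external))
```

Filtering with a check like `same_site` is one way to keep a crawler that follows every link from wandering onto other domains.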

Concurrency Configuration

class MultiPageSpider(Spider):
    name = "multi-page"
    start_urls = [f"https://example.com/page/{i}" for i in range(1, 101)]

    # Configure concurrency
    custom_settings = {
        "concurrency": 5,           # Max concurrency
        "download_delay": 1,        # Download delay in seconds
        "robots_txt_obey": True,    # Obey robots.txt
    }

    async def parse(self, response: Response):
        title = response.css('h1::text').get()
        yield {"url": response.url, "title": title}

MultiPageSpider().start()
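
Before starting a crawl of this size, it helps to estimate how long these settings will take. The sketch below computes a rough lower bound. It assumes the delay is the gap between consecutive request starts and that every request takes the same average time; how Scrapling applies the delay internally may differ, and the 0.5 second latency is a made-up figure.

```python
import math


def min_crawl_seconds(pages, concurrency, download_delay, avg_latency):
    """Rough lower bound on wall-clock time for a crawl.

    Assumes download_delay is the gap between consecutive request starts
    and avg_latency is the mean time a single request takes.
    """
    # Spacing bound: the last request cannot start before (pages - 1) delays.
    spacing_bound = (pages - 1) * download_delay + avg_latency
    # Capacity bound: each slot handles ceil(pages / concurrency) requests in turn.
    capacity_bound = math.ceil(pages / concurrency) * avg_latency
    return max(spacing_bound, capacity_bound)


# Settings from MultiPageSpider, with an assumed 0.5 s average response time
print(min_crawl_seconds(pages=100, concurrency=5, download_delay=1, avg_latency=0.5))  # 99.5
# Without a delay, the concurrency limit becomes the bottleneck
print(min_crawl_seconds(pages=100, concurrency=5, download_delay=0, avg_latency=0.5))  # 10.0
```

Under these assumptions the one-second delay, not the concurrency limit, dominates the total time for 100 pages.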

Pause and Resume

Scrapling supports checkpoint-based persistence. After gracefully exiting with Ctrl+C, it will automatically resume on the next start.

class LongRunningSpider(Spider):
    name = "long-crawl"
    start_urls = ["https://large-site.com/"]

    custom_settings = {
        "checkpoint_dir": "./checkpoints",  # Checkpoint directory
    }

    async def parse(self, response: Response):
        # Extract data...
        yield {"data": "..."}

        # Continue crawling
        for link in response.css('a::attr(href)').getall():
            yield self.follow(link, callback=self.parse)

LongRunningSpider().start()
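
Conceptually, resuming a crawl means persisting the list of pending URLs and the set of URLs already seen. The following is a simplified sketch of that idea using a JSON file. It is not Scrapling's checkpoint format; the `Frontier` class and the file layout are assumptions made for illustration.

```python
import json
import tempfile
from collections import deque
from pathlib import Path


class Frontier:
    """Pending and visited URLs that can be saved to and restored from disk."""

    def __init__(self, path, start_urls=()):
        self.path = Path(path)
        if self.path.exists():
            # A checkpoint exists: restore it and ignore start_urls.
            state = json.loads(self.path.read_text(encoding="utf-8"))
            self.pending = deque(state["pending"])
            self.seen = set(state["seen"])
        else:
            self.pending = deque(start_urls)
            self.seen = set(start_urls)

    def add(self, url):
        """Queue a URL unless it has been seen before."""
        if url not in self.seen:
            self.seen.add(url)
            self.pending.append(url)

    def next(self):
        """Pop the next URL to crawl, or None when the queue is empty."""
        return self.pending.popleft() if self.pending else None

    def save(self):
        state = {"pending": list(self.pending), "seen": sorted(self.seen)}
        self.path.write_text(json.dumps(state), encoding="utf-8")


checkpoint = Path(tempfile.mkdtemp()) / "frontier.json"

# First run: process one URL, discover two more, save, then stop.
first = Frontier(checkpoint, ["https://large-site.com/"])
first.next()
first.add("https://large-site.com/a")
first.add("https://large-site.com/b")
first.save()

# Second run: the frontier is rebuilt from disk and continues where it stopped.
second = Frontier(checkpoint, ["https://large-site.com/"])
print(list(second.pending))
```

Storing the seen set alongside the queue is what prevents the resumed run from re-crawling the start URL.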

Real-World Examples

Case 1: Scraping GitHub Trending

from scrapling.fetchers import Fetcher

def scrape_github_trending():
    page = Fetcher.get('https://github.com/trending')

    repos = page.css('.Box-row')

    trending = []
    for repo in repos[:10]:
        name = repo.css('h2 a::text').get('').strip()
        description = repo.css('p.col-9::text').get('').strip()
        stars = repo.css('[href$=stargazers] span::text').get('').strip()
        language = repo.css('[itemprop=programmingLanguage]::text').get('').strip()

        trending.append({
            "name": name,
            "description": description,
            "stars": stars,
            "language": language
        })

    return trending

if __name__ == "__main__":
    results = scrape_github_trending()
    for repo in results:
        print(f"📦 {repo['name']}")
        print(f"   {repo['description'][:80]}...")
        print(f"   ⭐ {repo['stars']} | 📝 {repo['language']}")
        print()

Case 2: E-commerce Price Monitoring

from scrapling.fetchers import StealthyFetcher
import json
from datetime import datetime

def monitor_prices():
    urls = [
        "https://amazon.com/dp/B08N5WRWNW",
        "https://amazon.com/dp/B0BSHF7WHW",
        "https://amazon.com/dp/B09G9FPHY6",
    ]

    results = []

    for url in urls:
        page = StealthyFetcher.fetch(url, headless=True)

        title = page.css('#productTitle::text').get('').strip()
        price = page.css('.a-price .a-offscreen::text').get('').strip()

        results.append({
            "url": url,
            "title": title,
            "price": price,
            "timestamp": datetime.now().isoformat()
        })

        print(f"✅ {title[:50]}... - {price}")

    # Save to JSON
    with open('price_monitor.json', 'w', encoding='utf-8') as f:
        json.dump(results, f, ensure_ascii=False, indent=2)

    print(f"\n📊 Saved {len(results)} price records to price_monitor.json")

if __name__ == "__main__":
    monitor_prices()
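
The script stores each price as a display string such as "$1,299.99", which cannot be compared numerically between runs. A small helper can convert it first. This sketch assumes "." is the decimal separator and "," groups thousands; the function names are invented for the example.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Extract a Decimal from a price string such as '$1,299.99'.

    Assumes '.' is the decimal separator and ',' groups thousands.
    Returns None when no number is found.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))


def price_dropped(previous, current):
    """True when both prices parsed and the current one is lower."""
    old, new = parse_price(previous), parse_price(current)
    return old is not None and new is not None and new < old


print(parse_price("$1,299.99"))                   # 1299.99
print(price_dropped("$1,299.99", "$1,199.00"))    # True
print(parse_price(""))                            # None
```

Decimal is used instead of float so that currency amounts compare exactly.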

Case 3: Streaming Output (Real-time Processing)

For long-running crawlers, you can use streaming mode to process data in real-time.

from scrapling.spiders import Spider, Response

class StreamingSpider(Spider):
    name = "streaming-demo"
    start_urls = ["https://quotes.toscrape.com/"]

    async def parse(self, response: Response):
        for quote in response.css('.quote'):
            item = {
                "text": quote.css('.text::text').get(),
                "author": quote.css('.author::text').get()
            }
            yield item  # Yield immediately, no need to wait for completion

        next_page = response.css('.next a::attr(href)').get()
        if next_page:
            yield self.follow(next_page, callback=self.parse)

# Stream consumption
async def main():
    spider = StreamingSpider()

    async for item in spider.stream():
        # Process each item in real-time
        print(f"Received: {item['text'][:50]}... by {item['author']}")
        # Can immediately store in database, send to message queue, etc.

import asyncio
asyncio.run(main())
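
The streaming pattern above relies on Python's async generators. The network-free sketch below shows the same consumption pattern with a stand-in generator; `crawl` and its hard-coded pages are placeholders, not part of Scrapling.

```python
import asyncio


async def crawl(pages):
    """Stand-in for a spider: yields each item as soon as its page is parsed."""
    for number, quotes in enumerate(pages, start=1):
        await asyncio.sleep(0)  # Placeholder for the network round trip
        for text in quotes:
            yield {"page": number, "text": text}


async def main():
    pages = [["quote A", "quote B"], ["quote C"]]
    received = []
    async for item in crawl(pages):
        # Items from page 1 arrive before page 2 has been "fetched".
        received.append(item)
        print(f"Received from page {item['page']}: {item['text']}")
    return received


received = asyncio.run(main())
```

Because items arrive one at a time, the consumer can write them to a database or stop early without holding the whole result set in memory.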

Advanced Tips

1. Proxy Rotation

from scrapling.spiders import Spider, Response

class ProxySpider(Spider):
    name = "proxy-spider"
    start_urls = ["https://httpbin.org/ip"]

    custom_settings = {
        "proxy_list": [
            "http://proxy1.example.com:8080",
            "http://proxy2.example.com:8080",
            "http://proxy3.example.com:8080",
        ],
        "proxy_rotation": "per_request",  # Rotate proxy per request
    }

    async def parse(self, response: Response):
        ip = response.json().get('origin')
        print(f"Current IP: {ip}")

2. Custom Export Pipeline

from scrapling.spiders import Spider, Response
import csv

class CSVSpider(Spider):
    name = "csv-export"
    start_urls = ["https://quotes.toscrape.com/"]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.csv_file = open('quotes.csv', 'w', newline='', encoding='utf-8')
        self.writer = csv.writer(self.csv_file)
        self.writer.writerow(['Text', 'Author', 'Tags'])

    async def parse(self, response: Response):
        for quote in response.css('.quote'):
            text = quote.css('.text::text').get()
            author = quote.css('.author::text').get()
            tags = ', '.join(quote.css('.tag::text').getall())

            self.writer.writerow([text, author, tags])

        next_page = response.css('.next a::attr(href)').get()
        if next_page:
            yield self.follow(next_page, callback=self.parse)

    def close(self, reason):
        self.csv_file.close()
        print(f"✅ Data saved to quotes.csv")

3. Development Mode (Cache Responses)

When debugging parsing logic, avoid repeated requests to the server.

from scrapling.spiders import Spider, Response

class DevSpider(Spider):
    name = "dev-mode"
    start_urls = ["https://example.com"]

    custom_settings = {
        "dev_mode": True,              # Enable dev mode
        "cache_dir": "./http_cache",   # Cache directory
    }

    async def parse(self, response: Response):
        # First run: cache response to disk
        # Subsequent runs: read directly from disk, no network request needed
        title = response.css('h1::text').get()
        yield {"title": title}
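
The caching behavior described in the comments above can be sketched as a small disk cache keyed by a hash of the URL. This is an illustration of the idea rather than Scrapling's implementation; `DiskCache` and `fake_download` are invented names.

```python
import hashlib
import tempfile
from pathlib import Path


class DiskCache:
    """Stores response bodies on disk, keyed by a hash of the URL."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, url):
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.html"

    def fetch(self, url, download):
        """Return the cached body, calling download(url) only on a miss."""
        path = self._path(url)
        if path.exists():
            return path.read_text(encoding="utf-8")
        body = download(url)
        path.write_text(body, encoding="utf-8")
        return body


calls = []


def fake_download(url):
    """Stand-in for a real HTTP request; records each call."""
    calls.append(url)
    return "<h1>Example Domain</h1>"


cache = DiskCache(tempfile.mkdtemp())
first = cache.fetch("https://example.com", fake_download)   # Miss: downloads
second = cache.fetch("https://example.com", fake_download)  # Hit: reads disk
print(len(calls))  # 1
```

Only the first call reaches the download function, so parsing logic can be re-run repeatedly without touching the server.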

FAQ

Q1: What's the difference between Scrapling and Scrapy?

A: Scrapling can be seen as a modernized, enhanced version of Scrapy:

Adaptive Parsing: Scrapy's selectors fail after page redesigns, while Scrapling adapts automatically
Built-in Anti-blocking: Scrapy needs extra middleware to bypass Cloudflare, while Scrapling works out of the box
Simpler API: Scrapling's Fetcher API is better suited for small-scale quick scraping

If you're already familiar with Scrapy, migrating to Scrapling has almost no learning cost.

Q2: How to scrape websites that require login?

A: Use StealthyFetcher or DynamicFetcher to simulate login:

from scrapling.fetchers import DynamicFetcher

def login(page):
    # `page` is the underlying Playwright page: fill in the form and submit it
    page.fill('#username', 'your_username')
    page.fill('#password', 'your_password')
    page.click('#login-button')
    page.wait_for_selector('#dashboard')  # Wait for redirect after login
    return page

# page_action runs login() in the browser before the page content is returned
# (parameter name as in recent Scrapling releases; check your installed version)
page = DynamicFetcher.fetch(
    'https://example.com/login',
    headless=True,
    page_action=login
)

# Now you can scrape post-login content
data = page.css('.private-data::text').getall()

Q3: How to limit crawling speed to avoid being blocked?

A: Configure download delay and concurrency limits in the Spider:

from scrapling.spiders import Spider

class PoliteSpider(Spider):
    name = "polite"
    start_urls = ["https://example.com/"]

    custom_settings = {
        "concurrency": 2,           # Lower concurrency
        "download_delay": 2,        # 2-second interval between requests
        "robots_txt_obey": True,    # Obey robots.txt
    }

Q4: What selectors does Scrapling support?

A: Supports both CSS selectors and XPath:

# CSS Selector
page.css('.class-name::text').get()
page.css('#id-name').getall()

# XPath
page.xpath('//div[@class="example"]/text()').get()

Summary

Scrapling is one of the most noteworthy Python scraping frameworks of 2026. It combines adaptive parsing, anti-blocking capabilities, and a Scrapy-like API, so developers can build resilient crawlers with little code.

Core Advantages Recap:

✅ Adaptive Parser: Automatically re-locates elements after page redesigns
✅ Built-in Anti-blocking: Out-of-the-box bypass for Cloudflare Turnstile
✅ From Single Request to Full Crawling: Same API covers all scenarios
✅ Concurrency, Pause/Resume, Streaming Output: Production-grade features included

Resource Links:

GitHub Repository
Official Documentation
Discord Community

