What is Scrapling?

Scrapling is an adaptive Python web scraping framework created by developer D4Vinci. It quickly gained popularity on GitHub Trending in 2026, earning over 57,000 stars and becoming a representative of the new generation of scraping tools.

Why Do You Need Scrapling?

Traditional scraping tools (such as Scrapy and BeautifulSoup) share three major pain points:

  1. Website structure changes break code: CSS selectors and XPath expressions stop matching as soon as the page is redesigned
  2. Anti-bot systems block requests: Protections like Cloudflare Turnstile and Akamai often answer ordinary requests with a 403
  3. Poor scalability: Moving from small-scale scraping to large-scale concurrent crawling requires rewriting lots of code

Scrapling's core design philosophy is "One library, zero compromises":

  • Adaptive Parser: Automatically learns website structure and re-locates elements after page updates
  • Built-in Anti-blocking: Works out of the box to bypass mainstream protections like Cloudflare Turnstile
  • Scrapy-like API: Developers familiar with Scrapy can migrate seamlessly
  • From Single Request to Full Crawling: The same API supports simple fetching and large-scale concurrent crawling

Scrapling vs Traditional Scraping Tools

Feature              | BeautifulSoup | Scrapy           | Scrapling
Learning Curve       | Low           | Medium           | Medium
Adaptive Parsing     |               |                  |
Bypass Cloudflare    |               | Needs Plugin     | ✅ Built-in
Concurrent Crawling  |               |                  |
Dynamic Page Support |               | Needs Middleware | ✅ Built-in Playwright
Pause/Resume         |               | Needs Extension  | ✅ Built-in
Streaming Output     |               |                  |

Installing Scrapling

Requirements

  • Python 3.10+
  • pip package manager

Quick Install

pip install "scrapling[fetchers]"

Verify Installation

from scrapling.fetchers import Fetcher

# Test basic functionality
p = Fetcher.get('https://example.com')
print(p.css('title::text').get())  # Outputs page title

Optional Dependencies

Browser-based fetchers (needed for JavaScript-rendered websites) also require browser binaries. Install the fetchers extra if you have not already, then download the browsers:

pip install "scrapling[fetchers]"
scrapling install

Getting Started: Your First Crawler

Example 1: Simple Page Scraping

Let's start with a simple example—scraping headlines from Hacker News.

from scrapling.fetchers import Fetcher

# Send HTTP request
page = Fetcher.get('https://news.ycombinator.com/')

# Use CSS selector to extract data
stories = page.css('.titleline > a')

for story in stories[:5]:  # Take only the first 5
    title = story.text
    link = story.attrib.get('href', '')
    print(f"Title: {title}")
    print(f"Link: {link}")
    print("-" * 40)

Sample Output:

Title: Show HN: I built a real-time code collaboration tool
Link: https://github.com/example/collab-tool
----------------------------------------
Title: Ask HN: What's your favorite Python library in 2026?
Link: https://news.ycombinator.com/item?id=123456
----------------------------------------
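
Some Hacker News entries (such as Ask HN posts) use relative hrefs like item?id=..., so the raw attribute value is not always a complete URL. As a minimal sketch, the standard library's urljoin can normalize them; the helper name absolute_link and the BASE_URL constant are hypothetical and not part of Scrapling.

```python
from urllib.parse import urljoin

# Assumed base: the URL of the page the links were scraped from
BASE_URL = "https://news.ycombinator.com/"

def absolute_link(href: str, base: str = BASE_URL) -> str:
    """Resolve a possibly relative href against the page URL."""
    return urljoin(base, href)

# A relative link is expanded; an absolute link is returned unchanged
print(absolute_link("item?id=123456"))
print(absolute_link("https://github.com/example/collab-tool"))
```

In the loop above, this would be applied to the href value before printing or storing it.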

Example 2: Adaptive Parsing (Auto-save)

Scrapling's core feature is adaptive parsing. On the first scrape, pass auto_save=True so that Scrapling records the identifying properties of the matched elements. When the website is later redesigned, pass adaptive=True and Scrapling will try to re-locate the same elements from the saved properties.

from scrapling.fetchers import Fetcher

page = Fetcher.get('https://quotes.toscrape.com/')

# First scrape: enable auto_save
quotes = page.css('.quote', auto_save=True)

for quote in quotes[:3]:
    text = quote.css('.text::text').get()
    author = quote.css('.author::text').get()
    print(f"{text} - {author}")

If the website structure changes:

# Subsequent scrapes: pass adaptive=True
page = Fetcher.get('https://quotes.toscrape.com/')
quotes = page.css('.quote', adaptive=True)  # Automatically adapts to new structure!

for quote in quotes[:3]:
    text = quote.css('.text::text').get()
    author = quote.css('.author::text').get()
    print(f"{text} - {author}")

💡 How it works: Scrapling records multiple features of an element (tag type, nearby text, attribute patterns, etc.). Even if CSS class names change, it can re-locate the element through other features.
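
To make the idea of re-locating an element by its other features more concrete, here is a rough, standard-library-only sketch. It is not Scrapling's actual algorithm: the fingerprint fields (tag, text, attrs), the equal weighting, and the function names similarity and relocate are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def similarity(saved: dict, candidate: dict) -> float:
    """Score two element fingerprints between 0 and 1 (equal weights, an assumption)."""
    tag_score = 1.0 if saved["tag"] == candidate["tag"] else 0.0
    text_score = SequenceMatcher(None, saved["text"], candidate["text"]).ratio()
    saved_attrs = set(saved["attrs"].items())
    cand_attrs = set(candidate["attrs"].items())
    union = saved_attrs | cand_attrs
    # Jaccard overlap of (name, value) attribute pairs
    attr_score = len(saved_attrs & cand_attrs) / len(union) if union else 1.0
    return (tag_score + text_score + attr_score) / 3

def relocate(saved: dict, candidates: list) -> dict:
    """Pick the candidate that looks most like the saved fingerprint."""
    return max(candidates, key=lambda c: similarity(saved, c))

# Fingerprint recorded on the first scrape (hypothetical data)
saved = {
    "tag": "span",
    "text": "The world as we have created it",
    "attrs": {"class": "text", "itemprop": "text"},
}

# Elements found after a redesign: the class name "text" became "quote-body"
candidates = [
    {"tag": "a", "text": "Login", "attrs": {"href": "/login"}},
    {"tag": "span", "text": "The world as we have created it",
     "attrs": {"class": "quote-body", "itemprop": "text"}},
    {"tag": "small", "text": "Albert Einstein", "attrs": {"class": "author"}},
]

best = relocate(saved, candidates)
print(best["attrs"]["class"])
```

Even though the class name no longer matches, the tag, text, and remaining attribute still single out the renamed element.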


Advanced Features

1. Bypassing Cloudflare Turnstile

Many websites use Cloudflare Turnstile or other anti-bot systems. Scrapling's StealthyFetcher can handle these protections automatically.

from scrapling.fetchers import StealthyFetcher

# Enable adaptive mode
StealthyFetcher.adaptive = True

# Automatically bypass Cloudflare
page = StealthyFetcher.fetch(
    'https://example-protected-site.com',
    headless=True,       # Headless browser mode
    network_idle=True    # Wait for network idle
)

# Extract data normally
products = page.css('.product-item')
for product in products:
    name = product.css('.name::text').get()
    price = product.css('.price::text').get()
    print(f"{name}: {price}")

Key Parameter Explanation:

  • headless=True: Runs the browser without a visible window
  • network_idle=True: Waits until network activity has settled before extracting (useful for single-page apps)
  • StealthyFetcher.adaptive = True: Enables adaptive parsing for pages returned by this fetcher

2. Dynamic Page Rendering (Playwright)

For websites requiring JavaScript rendering, use DynamicFetcher.

from scrapling.fetchers import DynamicFetcher

page = DynamicFetcher.fetch(
    'https://spa-example.com',
    wait_selector='.content-loaded',  # Wait for specific element to appear
    timeout=30000                     # Timeout in milliseconds
)

# Extract dynamically loaded content
articles = page.css('article')
for article in articles:
    title = article.css('h2::text').get()
    summary = article.css('.summary::text').get()
    print(f"{title}\n{summary}\n")

3. Asynchronous Concurrent Scraping

Use AsyncFetcher for high-concurrency scraping.

import asyncio
from scrapling.fetchers import AsyncFetcher

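async def fetch_multiple_pages():
    urls = [
        'https://example.com/page/1',
        'https://example.com/page/2',
        'https://example.com/page/3',
        'https://example.com/page/4',
        'https://example.com/page/5',
    ]

    # Allow at most 3 requests in flight at once
    semaphore = asyncio.Semaphore(3)

    async def fetch(url):
        async with semaphore:
            return await AsyncFetcher.get(url)

    # Send requests concurrently; failed requests come back as exception objects
    pages = await asyncio.gather(*(fetch(url) for url in urls), return_exceptions=True)

    for url, page in zip(urls, pages):
        if isinstance(page, Exception):
            print(f"{url}: Request failed")
        else:
            title = page.css('title::text').get()
            print(f"{url}: {title}")

asyncio.run(fetch_multiple_pages())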
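
To see how a concurrency cap behaves without touching the network, the following sketch replaces the real fetcher with a simulated one. The names fake_fetch and fetch_all are hypothetical stand-ins, and the peak counter exists only to make the limit observable.

```python
import asyncio

async def fake_fetch(url: str) -> str:
    """Stand-in for a real request: waits briefly, then returns fake HTML."""
    await asyncio.sleep(0.01)
    return f"<title>{url}</title>"

async def fetch_all(urls, limit=3):
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0   # requests currently running
    peak = 0        # highest number of simultaneous requests observed

    async def worker(url):
        nonlocal in_flight, peak
        async with semaphore:          # blocks while `limit` requests are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_fetch(url)
            finally:
                in_flight -= 1

    # gather preserves input order, so results line up with `urls`
    pages = await asyncio.gather(*(worker(u) for u in urls), return_exceptions=True)
    return pages, peak

urls = [f"https://example.com/page/{i}" for i in range(1, 6)]
pages, peak = asyncio.run(fetch_all(urls))
print(f"fetched {len(pages)} pages, peak concurrency {peak}")
```

The semaphore is what keeps the peak at or below the limit, regardless of how many URLs are scheduled.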

Spider Framework: Large-Scale Crawling

Scrapling provides a Scrapy-like Spider framework that supports large-scale concurrent crawling.

Basic Spider

from scrapling.spiders import Spider, Response

class QuoteSpider(Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    async def parse(self, response: Response):
        # Extract quotes from current page
        for quote in response.css('.quote'):
            yield {
                "text": quote.css('.text::text').get(),
                "author": quote.css('.author::text').get(),
                "tags": quote.css('.tag::text').getall()
            }

        # Go to next page
        next_page = response.css('.next a::attr(href)').get()
        if next_page:
            yield self.follow(next_page, callback=self.parse)

# Start the spider
QuoteSpider().start()

Concurrency Configuration

class MultiPageSpider(Spider):
    name = "multi-page"
    start_urls = [f"https://example.com/page/{i}" for i in range(1, 101)]

    # Configure concurrency
    custom_settings = {
        "concurrency": 5,           # Max concurrency
        "download_delay": 1,        # Download delay in seconds
        "robots_txt_obey": True,    # Obey robots.txt
    }

    async def parse(self, response: Response):
        title = response.css('h1::text').get()
        yield {"url": response.url, "title": title}

MultiPageSpider().start()

Pause and Resume

Scrapling supports checkpoint-based persistence. After gracefully exiting with Ctrl+C, it will automatically resume on the next start.

class LongRunningSpider(Spider):
    name = "long-crawl"
    start_urls = ["https://large-site.com/"]

    custom_settings = {
        "checkpoint_dir": "./checkpoints",  # Checkpoint directory
    }

    async def parse(self, response: Response):
        # Extract data...
        yield {"data": "..."}

        # Continue crawling
        for link in response.css('a::attr(href)').getall():
            yield self.follow(link, callback=self.parse)

LongRunningSpider().start()
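
The general idea behind checkpoint-based resume can be sketched with the standard library alone. This is not how Scrapling stores its state; the Frontier class, the JSON layout, and the file name are assumptions made for illustration.

```python
import json
import tempfile
from collections import deque
from pathlib import Path

class Frontier:
    """URL queue that can be saved to and restored from a JSON checkpoint."""

    def __init__(self, checkpoint: Path, start_urls):
        self.checkpoint = checkpoint
        if checkpoint.exists():
            # Resume: restore pending and already-seen URLs from disk
            state = json.loads(checkpoint.read_text(encoding="utf-8"))
            self.pending = deque(state["pending"])
            self.seen = set(state["seen"])
        else:
            # Fresh start
            self.pending = deque(start_urls)
            self.seen = set(start_urls)

    def add(self, url):
        """Queue a URL unless it has been seen before."""
        if url not in self.seen:
            self.seen.add(url)
            self.pending.append(url)

    def save(self):
        state = {"pending": list(self.pending), "seen": sorted(self.seen)}
        self.checkpoint.write_text(json.dumps(state), encoding="utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "frontier.json"

    first = Frontier(path, ["https://large-site.com/"])
    first.pending.popleft()                  # pretend the start page was crawled
    first.add("https://large-site.com/a")
    first.add("https://large-site.com/b")
    first.save()                             # e.g. triggered on graceful shutdown

    resumed = Frontier(path, ["https://large-site.com/"])
    resumed_pending = list(resumed.pending)
    resumed_seen = len(resumed.seen)

print(resumed_pending)
```

The second Frontier ignores its start URLs because a checkpoint exists, so the crawl continues with the two links that were still pending.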

Real-World Examples

Case 1: GitHub Trending Repositories

from scrapling.fetchers import Fetcher

def scrape_github_trending():
    page = Fetcher.get('https://github.com/trending')

    repos = page.css('.Box-row')

    trending = []
    for repo in repos[:10]:
        name = repo.css('h2 a::text').get('').strip()
        description = repo.css('p.col-9::text').get('').strip()
        stars = repo.css('[href$=stargazers] span::text').get('').strip()
        language = repo.css('[itemprop=programmingLanguage]::text').get('').strip()

        trending.append({
            "name": name,
            "description": description,
            "stars": stars,
            "language": language
        })

    return trending

if __name__ == "__main__":
    results = scrape_github_trending()
    for repo in results:
        print(f"📦 {repo['name']}")
        print(f"   {repo['description'][:80]}...")
        print(f"   ⭐ {repo['stars']} | 📝 {repo['language']}")
        print()

Case 2: E-commerce Price Monitoring

from scrapling.fetchers import StealthyFetcher
import json
from datetime import datetime

def monitor_prices():
    urls = [
        "https://amazon.com/dp/B08N5WRWNW",
        "https://amazon.com/dp/B0BSHF7WHW",
        "https://amazon.com/dp/B09G9FPHY6",
    ]

    results = []

    for url in urls:
        page = StealthyFetcher.fetch(url, headless=True)

        title = page.css('#productTitle::text').get('').strip()
        price = page.css('.a-price .a-offscreen::text').get('').strip()

        results.append({
            "url": url,
            "title": title,
            "price": price,
            "timestamp": datetime.now().isoformat()
        })

        print(f"✅ {title[:50]}... - {price}")

    # Save to JSON
    with open('price_monitor.json', 'w', encoding='utf-8') as f:
        json.dump(results, f, ensure_ascii=False, indent=2)

    print(f"\n📊 Saved {len(results)} price records to price_monitor.json")

if __name__ == "__main__":
    monitor_prices()
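
The monitor above stores prices as raw strings, which makes runs hard to compare. A possible follow-up step, sketched here under the assumption of US-style formatting (comma as thousands separator, dot as decimal point), converts them to Decimal and reports changes. The helper names parse_price and price_changes are hypothetical.

```python
import re
from decimal import Decimal

def parse_price(raw: str):
    """Convert a string such as '$1,299.99' to Decimal, or None if no number is found."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if not match:
        return None
    return Decimal(match.group().replace(",", ""))

def price_changes(previous: dict, current: dict) -> dict:
    """Return {url: (old, new)} for URLs whose parsed price differs between two runs."""
    changes = {}
    for url, new_raw in current.items():
        old = parse_price(previous.get(url, ""))
        new = parse_price(new_raw)
        if old is not None and new is not None and old != new:
            changes[url] = (old, new)
    return changes

# Hypothetical price strings from two consecutive runs
previous = {"https://amazon.com/dp/B08N5WRWNW": "$1,299.99"}
current = {"https://amazon.com/dp/B08N5WRWNW": "$1,199.00"}
print(price_changes(previous, current))
```

Decimal avoids the rounding surprises that float arithmetic can introduce when prices are compared or subtracted.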

Case 3: Streaming Output (Real-time Processing)

For long-running crawlers, you can use streaming mode to process data in real-time.

from scrapling.spiders import Spider, Response

class StreamingSpider(Spider):
    name = "streaming-demo"
    start_urls = ["https://quotes.toscrape.com/"]

    async def parse(self, response: Response):
        for quote in response.css('.quote'):
            item = {
                "text": quote.css('.text::text').get(),
                "author": quote.css('.author::text').get()
            }
            yield item  # Yield immediately, no need to wait for completion

        next_page = response.css('.next a::attr(href)').get()
        if next_page:
            yield self.follow(next_page, callback=self.parse)

# Stream consumption
async def main():
    spider = StreamingSpider()

    async for item in spider.stream():
        # Process each item in real-time
        print(f"Received: {item['text'][:50]}... by {item['author']}")
        # Can immediately store in database, send to message queue, etc.

import asyncio
asyncio.run(main())
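
When items arrive one at a time, it is often cheaper to write them to a database or queue in small batches. The sketch below shows that consumption pattern with a plain async generator standing in for the spider's stream; item_stream and consume_in_batches are hypothetical names, not Scrapling APIs.

```python
import asyncio

async def item_stream():
    """Stand-in for a spider's item stream: yields seven items one at a time."""
    for i in range(1, 8):
        await asyncio.sleep(0)   # give control back to the event loop, as real I/O would
        yield {"text": f"quote {i}", "author": f"author {i}"}

async def consume_in_batches(stream, batch_size=3):
    """Group streamed items into lists of up to `batch_size` items."""
    batches = []
    batch = []
    async for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            batches.append(batch)   # a real consumer might do one bulk insert here
            batch = []
    if batch:                       # flush the final, partially filled batch
        batches.append(batch)
    return batches

batches = asyncio.run(consume_in_batches(item_stream()))
print([len(b) for b in batches])
```

The final flush matters: without it, the last items would be lost whenever the total is not a multiple of the batch size.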

Advanced Tips

1. Proxy Rotation

from scrapling.spiders import Spider, Response

class ProxySpider(Spider):
    name = "proxy-spider"
    start_urls = ["https://httpbin.org/ip"]

    custom_settings = {
        "proxy_list": [
            "http://proxy1.example.com:8080",
            "http://proxy2.example.com:8080",
            "http://proxy3.example.com:8080",
        ],
        "proxy_rotation": "per_request",  # Rotate proxy per request
    }

    async def parse(self, response: Response):
        ip = response.json().get('origin')
        print(f"Current IP: {ip}")
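
Per-request rotation is usually a round-robin walk over the proxy list. As a sketch of that selection logic only (not of Scrapling's internals), the hypothetical RoundRobinProxies class below uses itertools.cycle.

```python
from itertools import cycle

PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]

class RoundRobinProxies:
    """Hand out proxies in order, wrapping around at the end of the list."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._pool = cycle(proxies)

    def pick(self) -> str:
        return next(self._pool)

rotation = RoundRobinProxies(PROXIES)
# Four requests over three proxies: the fourth wraps back to the first proxy
assigned = [rotation.pick() for _ in range(4)]
print(assigned)
```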

2. Custom Export Pipeline

from scrapling.spiders import Spider, Response
import csv

class CSVSpider(Spider):
    name = "csv-export"
    start_urls = ["https://quotes.toscrape.com/"]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.csv_file = open('quotes.csv', 'w', newline='', encoding='utf-8')
        self.writer = csv.writer(self.csv_file)
        self.writer.writerow(['Text', 'Author', 'Tags'])

    async def parse(self, response: Response):
        for quote in response.css('.quote'):
            text = quote.css('.text::text').get()
            author = quote.css('.author::text').get()
            tags = ', '.join(quote.css('.tag::text').getall())

            self.writer.writerow([text, author, tags])

        next_page = response.css('.next a::attr(href)').get()
        if next_page:
            yield self.follow(next_page, callback=self.parse)

    def close(self, reason):
        self.csv_file.close()
        print("✅ Data saved to quotes.csv")

3. Development Mode (Cache Responses)

When debugging parsing logic, avoid repeated requests to the server.

from scrapling.spiders import Spider, Response

class DevSpider(Spider):
    name = "dev-mode"
    start_urls = ["https://example.com"]

    custom_settings = {
        "dev_mode": True,              # Enable dev mode
        "cache_dir": "./http_cache",   # Cache directory
    }

    async def parse(self, response: Response):
        # First run: cache response to disk
        # Subsequent runs: read directly from disk, no network request needed
        title = response.css('h1::text').get()
        yield {"title": title}
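
The caching behavior described in the comments above can be illustrated with a small disk cache keyed by a hash of the URL. This is a sketch of the general technique, not Scrapling's implementation; DiskCache, get_or_fetch, and fake_fetch are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path

class DiskCache:
    """Store response bodies on disk, one file per URL."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, url: str) -> Path:
        # Hash the URL so it is always a safe file name
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.html"

    def get_or_fetch(self, url: str, fetch) -> str:
        path = self._path(url)
        if path.exists():                    # cache hit: no network request
            return path.read_text(encoding="utf-8")
        body = fetch(url)                    # cache miss: fetch and store
        path.write_text(body, encoding="utf-8")
        return body

calls = []

def fake_fetch(url: str) -> str:
    """Stand-in for a real request; records each call."""
    calls.append(url)
    return "<h1>Example Domain</h1>"

with tempfile.TemporaryDirectory() as tmp:
    cache = DiskCache(tmp)
    first = cache.get_or_fetch("https://example.com", fake_fetch)
    second = cache.get_or_fetch("https://example.com", fake_fetch)

print(len(calls))
```

The second lookup is served from disk, so the fetch function is only called once.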

FAQ

Q1: What's the difference between Scrapling and Scrapy?

A: Scrapling can be seen as a modernized, enhanced take on Scrapy:

  • Adaptive Parsing: Scrapy's selectors fail after page redesigns, while Scrapling adapts automatically
  • Built-in Anti-blocking: Scrapy needs extra middleware to bypass Cloudflare, while Scrapling works out of the box
  • Simpler API: Scrapling's Fetcher API is better suited for small-scale quick scraping

If you're already familiar with Scrapy, migrating to Scrapling has almost no learning cost.

Q2: How do I handle pages that require login?

A: Use DynamicFetcher (or StealthyFetcher) and pass a page_action callback. The callback receives the browser page, so you can fill in the login form before the content is returned:

from scrapling.fetchers import DynamicFetcher

def login(page):
    # `page` is the Playwright page object controlled by the fetcher
    page.fill('#username', 'your_username')
    page.fill('#password', 'your_password')
    page.click('#login-button')
    # Wait for the redirect after login
    page.wait_for_selector('#dashboard')
    return page

page = DynamicFetcher.fetch(
    'https://example.com/login',
    headless=True,
    page_action=login  # Runs once the login page has loaded
)

# Now you can scrape post-login content
data = page.css('.private-data::text').getall()

Q3: How to limit crawling speed to avoid being blocked?

A: Configure download delay and concurrency limits in the Spider:

from scrapling.spiders import Spider

class PoliteSpider(Spider):
    custom_settings = {
        "concurrency": 2,           # Lower concurrency
        "download_delay": 2,        # 2-second interval between requests
        "robots_txt_obey": True,    # Obey robots.txt
    }
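
What a download delay does under the hood can be sketched with asyncio: each request waits for its turn so that request starts are spaced at least a fixed interval apart. The Throttle class below is a hypothetical illustration, not part of Scrapling, and uses a short 0.05-second delay so it runs quickly.

```python
import asyncio
import time

class Throttle:
    """Space out request starts by at least `delay` seconds."""

    def __init__(self, delay: float):
        self.delay = delay
        self._lock = asyncio.Lock()
        self._next_slot = 0.0   # earliest monotonic time the next request may start

    async def wait(self):
        async with self._lock:
            now = time.monotonic()
            if now < self._next_slot:
                await asyncio.sleep(self._next_slot - now)
            # Reserve the following slot relative to when this request starts
            self._next_slot = max(now, self._next_slot) + self.delay

async def main():
    throttle = Throttle(delay=0.05)
    starts = []

    async def polite_request(i):
        await throttle.wait()
        starts.append(time.monotonic())   # a real spider would send the request here

    await asyncio.gather(*(polite_request(i) for i in range(3)))
    return starts

starts = asyncio.run(main())
gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
print([round(g, 2) for g in gaps])
```

Even though all three tasks are scheduled at once, the lock and the reserved slot force them to start one delay apart.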

Q4: What selectors does Scrapling support?

A: Scrapling supports both CSS selectors and XPath:

# CSS Selector
page.css('.class-name::text').get()
page.css('#id-name').getall()

# XPath
page.xpath('//div[@class="example"]/text()').get()
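
To experiment with the XPath attribute predicate without installing anything, the standard library's ElementTree supports a limited XPath subset. The sketch below is independent of Scrapling and only works on well-formed markup; the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Small, well-formed sample document (ElementTree cannot parse broken HTML)
html = """<html><body>
<div class="example">first</div>
<div class="other">skip</div>
<div class="example">second</div>
</body></html>"""

root = ET.fromstring(html)

# Same predicate as the XPath example above: div elements whose class is "example"
texts = [div.text for div in root.findall(".//div[@class='example']")]
print(texts)
```

ElementTree has no text() step, so the text is read from each matched element's .text attribute instead.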

Summary

Scrapling is one of the most noteworthy Python scraping frameworks of 2026. It combines adaptive parsing, anti-blocking capabilities, and a Scrapy-like API, so developers can build resilient crawlers with little code.

Core Advantages Recap:

  • ✅ Adaptive Parser: Automatically re-locates elements after page redesigns
  • ✅ Built-in Anti-blocking: Out-of-the-box bypass for Cloudflare Turnstile
  • ✅ From Single Request to Full Crawling: Same API covers all scenarios
  • ✅ Concurrency, Pause/Resume, Streaming Output: Production-grade features included

Resource Links:

  • GitHub Repository
  • Official Documentation
  • Discord Community
