Firecrawl Findings: Web Scraping That Actually Works
Firecrawl promises to make web scraping easy. No dealing with Selenium, no headless browsers, no IP rotation. Just hit their API, get clean markdown back. Sounds perfect.
I spent today testing it on real websites. Here's what actually works and what doesn't.
The Basic Premise
Traditional web scraping is a pain. You write a scraper, it works for a week, then the site changes their HTML and everything breaks. Or they detect your bot and block you. Or they use JavaScript rendering and your scraper sees nothing.
Firecrawl handles all of this. They render JavaScript, bypass bot detection, and return clean structured data. In theory.
Setup (The Easy Part)
```bash
npm install @mendable/firecrawl-js
```

```javascript
const Firecrawl = require('@mendable/firecrawl-js').default;
const app = new Firecrawl({ apiKey: 'your-api-key' });

// Basic scrape
const result = await app.scrape('https://example.com', {
  formats: ['markdown'],
  onlyMainContent: true
});

console.log(result.markdown);
```

That's it. No Chrome driver, no Puppeteer config, no headaches. It just works.
What I Tested
20+ Sites Tested · 22 API Calls · ~35% Success Rate
I tested everything: government sites, event aggregators, museum websites, news sites, community pages. Some worked perfectly. Some failed spectacularly.
The Good: What Works
1. Content-First Sites
Sites that are actually trying to be readable work great. Tourism sites, official directories, content-heavy pages.
Success Example: Official tourism site with 200+ event listings. Result: Clean markdown with all event titles, dates, venues extracted perfectly. No parsing errors, no missing data.
2. Static HTML
If it's mostly HTML with minimal JavaScript, Firecrawl crushes it. You get markdown that actually matches what you see on the page.
```markdown
#### [Event Title](https://example.com/event/123)
Friday 31 January 2026, 7:30pm
Venue Name, City
Event description goes here...
```
3. Consistent Structure
The best part: if a site uses consistent HTML patterns, your parser works forever. No more "they changed a CSS class and everything broke."
The Bad: What Doesn't Work
1. Heavy JavaScript Apps
Failure: Modern event aggregator site (think Eventbrite-style). Result: Either blocks you entirely or returns navigation/header content only. The actual event data never loads.
Firecrawl renders JavaScript, but there's a timeout. If the page does lazy-loading or infinite scroll, you're out of luck.
2. Anti-Bot Protection
Some sites just block you. Period. I tested a major venue website and got HTTP 403 every time. Firecrawl couldn't bypass it.
3. Wrong Content Focus
Sometimes Firecrawl returns markdown, but it's not what you wanted. You ask for event listings, it gives you the navigation menu and footer. The `onlyMainContent: true` flag helps, but isn't perfect.
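One mitigation: post-filter the markdown yourself. A minimal sketch, with noise patterns I made up for event pages (tune them per site; none of this comes from Firecrawl):

```javascript
// Strip lines that look like navigation/footer noise from scraped markdown.
// These patterns are heuristics, not anything Firecrawl provides.
const NOISE = [
  /^\s*(home|about|contact|privacy|terms|login|sign up)\s*$/i,
  /^\s*\[?(menu|skip to content)\]?/i,
  /©|\ball rights reserved\b/i,
];

function stripChrome(markdown) {
  return markdown
    .split('\n')
    .filter(line => !NOISE.some(re => re.test(line)))
    .join('\n');
}
```

Run it on the scrape result before parsing; anything your parser still trips over usually suggests a new pattern to add.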
The Weird: Edge Cases
Location-Based Content
I tested an event aggregator that theoretically showed events for a specific city. It returned events from a completely different city. The URL said "City A" but the content was "City B."
Why? The site probably uses geolocation or cookies to determine what to show. Firecrawl scraped from their servers, which are located elsewhere.
Calendar Views
Calendar/grid layouts are tough. You get back markdown like:
```markdown
| | | |
|---|---|---|
| Event 1 | Event 2 | Event 3 |
| Date 1 | Date 2 | Date 3 |
```
Technically correct, practically useless. You need the relationship between rows, which markdown tables don't preserve well.
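If the layout really is as simple as a two-row events/dates table, you can re-pair the columns yourself. A sketch under that assumption (real calendar markup will vary, so treat this as a starting point):

```javascript
// Re-associate a two-row markdown calendar table (events row, then dates
// row) into event objects, pairing cells by column index.
function parseCalendarTable(markdown) {
  const rows = markdown
    .split('\n')
    .filter(line => line.trim().startsWith('|'))
    // Drop separator rows (|---|---|) and rows with no text content
    .filter(line => !/^\s*\|[\s|:-]*$/.test(line))
    .map(line => line.split('|').slice(1, -1).map(cell => cell.trim()));

  const [events = [], dates = []] = rows;
  return events
    .map((title, i) => ({ title, date: dates[i] || null }))
    .filter(e => e.title);
}
```

Multi-week grids would need the same idea applied per row pair, which gets fragile fast; that's why I'd avoid calendar views where possible.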
Cost Reality Check
Firecrawl isn't expensive, but it's not free. I used 22 credits testing (~$0.50-1). For a production scraper running weekly? That's manageable. Running hourly? Gets pricey.
Good Use Case: Weekly scrape of 10 sites = ~40 credits/month · Cost: ~$1-2/month · Value: Totally worth it

Bad Use Case: Hourly scrape of 50 sites = ~36,000 credits/month · Cost: ~$100-200/month · Value: Build your own scraper
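The arithmetic behind those scenarios, as a sketch. It assumes one credit per basic scrape; the per-credit price is my rough inference from the hourly figure above (plan minimums likely dominate small workloads), so treat it as a placeholder and check Firecrawl's pricing page:

```javascript
// Back-of-envelope monthly credit and cost estimate.
// dollarsPerCredit is an assumed placeholder, not an official rate.
function estimateMonthly(sites, scrapesPerSitePerMonth, dollarsPerCredit = 0.004) {
  const credits = sites * scrapesPerSitePerMonth;
  return { credits, dollars: +(credits * dollarsPerCredit).toFixed(2) };
}

// Weekly, 10 sites:  estimateMonthly(10, 4)       -> 40 credits
// Hourly, 50 sites:  estimateMonthly(50, 24 * 30) -> 36,000 credits
```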
Practical Tips
1. Find Aggregators, Not Individual Sources
Don't scrape 50 venue websites. Find one aggregator that pulls from all of them. Way more reliable.
2. Test First, Scale Later
Burn 5-10 credits testing. If it doesn't work in testing, it won't magically work at scale.
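One way to make "test first" mechanical is a hard credit cap on exploratory runs. A sketch where `scrapeFn` is a hypothetical stand-in for whatever does the real API call, assuming one credit per basic scrape:

```javascript
// Scrape a list of candidate URLs, stopping before the credit budget
// is exceeded. scrapeFn is any async (url) => result function.
async function scrapeWithBudget(urls, scrapeFn, maxCredits = 10) {
  const results = [];
  let used = 0;
  for (const url of urls) {
    if (used >= maxCredits) break;
    used += 1; // assumes one credit per basic scrape
    results.push({ url, result: await scrapeFn(url) });
  }
  return results;
}
```

Point it at your candidate sites once, eyeball the markdown, and only then decide what's worth scraping on a schedule.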
3. Parse Conservatively
Don't expect perfect structure. Write parsers that handle missing data gracefully:
```javascript
function extractEvents(markdown) {
  const events = [];
  const lines = markdown.split('\n');

  for (const line of lines) {
    // Look for patterns, but don't assume they exist
    const match = line.match(/####\s+\[([^\]]+)\]\(([^)]+)\)/);
    if (!match) continue;

    events.push({
      title: match[1],
      url: match[2]
    });
  }

  return events.filter(e => e.title && e.url);
}
```
4. Markdown > HTML
Always request markdown format. It's cleaner and more consistent than trying to parse raw HTML.
5. Save Your Results
Cache the markdown locally. If your parser breaks, you can debug without burning more API credits.
When NOT to Use Firecrawl
- Real-time data: If you need up-to-the-second updates, the scraping delay will kill you
- Massive scale: 10,000+ pages? Build your own infrastructure
- Authenticated content: Need to log in? Firecrawl won't help
- Interactive elements: Clicking buttons, filling forms? Use Playwright instead
When to Use Firecrawl
- Periodic scraping: Daily or weekly updates from public sites
- Content aggregation: Pulling articles, events, listings from multiple sources
- Prototyping: Testing if scraping is even viable before building custom tools
- Low-maintenance: You don't want to babysit your scraper when sites change
The Verdict
Firecrawl is solid for what it does. It won't magically scrape anything, but for content-heavy public sites, it's way better than rolling your own.
Best For: Scraping 5-20 public sites periodically. Content aggregation, directory building, research projects.
Not For: Real-time data, massive scale, sites with heavy bot protection, authenticated content.
The key insight: pick your targets carefully. Find sites that are actually trying to be readable, and Firecrawl will handle the rest. Try to scrape heavily protected or JavaScript-heavy sites? You'll burn credits and get nothing.
Test first. Save often. Parse conservatively. It's not magic, but it works when used right.