Firecrawl Findings: Web Scraping That Actually Works
Firecrawl promises to make web scraping easy. No dealing with Selenium, no headless browsers, no IP rotation. Just hit their API, get clean markdown back. Sounds perfect.
I spent today testing it on real websites. Here's what actually works and what doesn't.
The Basic Premise
Traditional web scraping is a pain. You write a scraper, it works for a week, then the site changes their HTML and everything breaks. Or they detect your bot and block you. Or they use JavaScript rendering and your scraper sees nothing.
Firecrawl handles all of this. They render JavaScript, bypass bot detection, and return clean structured data. In theory.
Setup (The Easy Part)
```bash
npm install @mendable/firecrawl-js
```

```javascript
const Firecrawl = require('@mendable/firecrawl-js').default;
const app = new Firecrawl({ apiKey: 'your-api-key' });

// Basic scrape
const result = await app.scrape('https://example.com', {
  formats: ['markdown'],
  onlyMainContent: true
});

console.log(result.markdown);
```

That's it. No Chrome driver, no Puppeteer config, no headaches. It just works.
What I Tested
20+ Sites Tested · 22 API Calls · ~35% Success Rate
I tested everything: government sites, event aggregators, museum websites, news sites, community pages. Some worked perfectly. Some failed spectacularly.
The Good: What Works
1. Content-First Sites
Sites that are actually trying to be readable work great. Tourism sites, official directories, content-heavy pages.
Success Example: Official tourism site with 200+ event listings. Result: Clean markdown with all event titles, dates, venues extracted perfectly. No parsing errors, no missing data.
2. Static HTML
If it's mostly HTML with minimal JavaScript, Firecrawl crushes it. You get markdown that actually matches what you see on the page.
```markdown
#### [Event Title](https://example.com/event/123)
Friday 31 January 2026, 7:30pm
Venue Name, City
Event description goes here...
```
3. Consistent Structure
The best part: if a site uses consistent HTML patterns, your parser works forever. No more "they changed a CSS class and everything broke."
The Bad: What Doesn't Work
1. Heavy JavaScript Apps
Failure: Modern event aggregator site (think Eventbrite-style). Result: Either blocks you entirely or returns navigation/header content only. The actual event data never loads.
Firecrawl renders JavaScript, but there's a timeout. If the page does lazy-loading or infinite scroll, you're out of luck.
2. Anti-Bot Protection
Some sites just block you. Period. I tested a major venue website and got HTTP 403 every time. Firecrawl couldn't bypass it.
3. Wrong Content Focus
Sometimes Firecrawl returns markdown, but it's not what you wanted. You ask for event listings, it gives you the navigation menu and footer. The `onlyMainContent: true` flag helps, but isn't perfect.
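One mitigation: post-filter the markdown yourself. A minimal sketch, with noise patterns I made up for event pages (tune them per site; none of this comes from Firecrawl):

```javascript
// Strip lines that look like navigation/footer noise from scraped markdown.
// These patterns are heuristics, not anything Firecrawl provides.
const NOISE = [
  /^\s*(home|about|contact|privacy|terms|login|sign up)\s*$/i,
  /^\s*\[?(menu|skip to content)\]?/i,
  /©|\ball rights reserved\b/i,
];

function stripChrome(markdown) {
  return markdown
    .split('\n')
    .filter(line => !NOISE.some(re => re.test(line)))
    .join('\n');
}
```

Run it on the scrape result before parsing; anything your parser still trips over usually suggests a new pattern to add.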
The Weird: Edge Cases
Location-Based Content
I tested an event aggregator that theoretically showed events for a specific city. It returned events from a completely different city. The URL said "City A" but the content was "City B."
Why? The site probably uses geolocation or cookies to determine what to show. Firecrawl scraped from their servers, which are located elsewhere.
Calendar Views
Calendar/grid layouts are tough. You get back markdown like:
```markdown
| | | |
|---|---|---|
| Event 1 | Event 2 | Event 3 |
| Date 1 | Date 2 | Date 3 |
```
Technically correct, practically useless. You need the relationship between rows, which markdown tables don't preserve well.
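If the layout really is as simple as a two-row events/dates table, you can re-pair the columns yourself. A sketch under that assumption (real calendar markup will vary, so treat this as a starting point):

```javascript
// Re-associate a two-row markdown calendar table (events row, then dates
// row) into event objects, pairing cells by column index.
function parseCalendarTable(markdown) {
  const rows = markdown
    .split('\n')
    .filter(line => line.trim().startsWith('|'))
    // Drop separator rows (|---|---|) and rows with no text content
    .filter(line => !/^\s*\|[\s|:-]*$/.test(line))
    .map(line => line.split('|').slice(1, -1).map(cell => cell.trim()));

  const [events = [], dates = []] = rows;
  return events
    .map((title, i) => ({ title, date: dates[i] || null }))
    .filter(e => e.title);
}
```

Multi-week grids would need the same idea applied per row pair, which gets fragile fast; that's why I'd avoid calendar views where possible.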
Cost Reality Check
Firecrawl isn't expensive, but it's not free. I used 22 credits testing (~$0.50-1). For a production scraper running weekly? That's manageable. Running hourly? Gets pricey.
Good Use Case: Weekly scrape of 10 sites = ~40 credits/month · Cost: ~$1-2/month · Value: Totally worth it

Bad Use Case: Hourly scrape of 50 sites = ~36,000 credits/month · Cost: ~$100-200/month · Value: Build your own scraper
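The arithmetic behind those scenarios, as a sketch. It assumes one credit per basic scrape; the per-credit price is my rough inference from the hourly figure above (plan minimums likely dominate small workloads), so treat it as a placeholder and check Firecrawl's pricing page:

```javascript
// Back-of-envelope monthly credit and cost estimate.
// dollarsPerCredit is an assumed placeholder, not an official rate.
function estimateMonthly(sites, scrapesPerSitePerMonth, dollarsPerCredit = 0.004) {
  const credits = sites * scrapesPerSitePerMonth;
  return { credits, dollars: +(credits * dollarsPerCredit).toFixed(2) };
}

// Weekly, 10 sites:  estimateMonthly(10, 4)       -> 40 credits
// Hourly, 50 sites:  estimateMonthly(50, 24 * 30) -> 36,000 credits
```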
Practical Tips
1. Find Aggregators, Not Individual Sources
Don't scrape 50 venue websites. Find one aggregator that pulls from all of them. Way more reliable.
2. Test First, Scale Later
Burn 5-10 credits testing. If it doesn't work in testing, it won't magically work at scale.
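One way to make "test first" mechanical is a hard credit cap on exploratory runs. A sketch where `scrapeFn` is a hypothetical stand-in for whatever does the real API call, assuming one credit per basic scrape:

```javascript
// Scrape a list of candidate URLs, stopping before the credit budget
// is exceeded. scrapeFn is any async (url) => result function.
async function scrapeWithBudget(urls, scrapeFn, maxCredits = 10) {
  const results = [];
  let used = 0;
  for (const url of urls) {
    if (used >= maxCredits) break;
    used += 1; // assumes one credit per basic scrape
    results.push({ url, result: await scrapeFn(url) });
  }
  return results;
}
```

Point it at your candidate sites once, eyeball the markdown, and only then decide what's worth scraping on a schedule.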
3. Parse Conservatively
Don't expect perfect structure. Write parsers that handle missing data gracefully:
```javascript
function extractEvents(markdown) {
  const events = [];
  const lines = markdown.split('\n');

  for (const line of lines) {
    // Look for patterns, but don't assume they exist
    const match = line.match(/####\s+\[([^\]]+)\]\(([^)]+)\)/);
    if (!match) continue;

    events.push({
      title: match[1],
      url: match[2]
    });
  }

  return events.filter(e => e.title && e.url);
}
```
4. Markdown > HTML
Always request markdown format. It's cleaner and more consistent than trying to parse raw HTML.
5. Save Your Results
Cache the markdown locally. If your parser breaks, you can debug without burning more API credits.
When NOT to Use Firecrawl
- Real-time data: If you need up-to-the-second updates, the scraping delay will kill you
- Massive scale: 10,000+ pages? Build your own infrastructure
- Authenticated content: Need to log in? Firecrawl won't help
- Interactive elements: Clicking buttons, filling forms? Use Playwright instead
When to Use Firecrawl
- Periodic scraping: Daily or weekly updates from public sites
- Content aggregation: Pulling articles, events, listings from multiple sources
- Prototyping: Testing if scraping is even viable before building custom tools
- Low-maintenance: You don't want to babysit your scraper when sites change
The Verdict
Firecrawl is solid for what it does. It won't magically scrape anything, but for content-heavy public sites, it's way better than rolling your own.
Best For: Scraping 5-20 public sites periodically. Content aggregation, directory building, research projects.
Not For: Real-time data, massive scale, sites with heavy bot protection, authenticated content.
The key insight: pick your targets carefully. Find sites that are actually trying to be readable, and Firecrawl will handle the rest. Try to scrape heavily protected or JavaScript-heavy sites? You'll burn credits and get nothing.
Test first. Save often. Parse conservatively. It's not magic, but it works when used right.