| name | description |
|---|---|
| deep-scraper | Deep scrape X/Twitter feeds to extract structured post data (author, text, metrics, links, media, cards) via Chrome DevTools Protocol, then classify posts by category (crypto, polymarket, arbitrage, trading) with spam detection and signal scoring. Use when scraping social feeds, analyzing X posts for money-making signals, or extracting structured data from web pages via authenticated browser sessions. |
# Deep Scraper
Three-stage pipeline for X/Twitter feed intelligence:
- Scrape — Extract structured posts from DOM via CDP
- Triage — Identify posts with verifiable claims + actionable links
- Investigate — Agent follows links, pulls real data, verifies claims
## Prerequisites
- Chrome launched with `--remote-debugging-port=9222 --remote-allow-origins=*`
- User's Chrome profile copied to debug dir (script handles this)
- `websocket-client` Python package
## Launch Chrome for scraping
bash scripts/launch-chrome-debug.sh # copies auth cookies, starts on port 9222
## Stage 1: Scrape
python3 -u scripts/scrape-x-feed.py --port 9222 --scroll-pages 5
Output: data/x-feed/<timestamp>/posts.json — structured per-post data.
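A minimal sketch of consuming this output downstream. The exact JSON shape is an assumption (a top-level list, or a `{"posts": [...]}` wrapper, with per-post `metrics.likes`); adjust to whatever `scrape-x-feed.py` actually emits:

```python
import json
from pathlib import Path

def top_posts(run_dir: str, min_likes: int = 100) -> list[dict]:
    """Load a run's posts.json and keep high-engagement posts.

    Assumes each post carries a "metrics" dict with a "likes" count;
    posts missing metrics are treated as zero-engagement.
    """
    posts = json.loads((Path(run_dir) / "posts.json").read_text())
    if isinstance(posts, dict):  # tolerate a {"posts": [...]} wrapper
        posts = posts.get("posts", [])
    return [p for p in posts
            if p.get("metrics", {}).get("likes", 0) >= min_likes]
```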
## Stage 2: Triage
python3 scripts/triage-posts.py data/x-feed/<timestamp>/posts.json
Output: `triage.json` with the investigation queue. Detects:
- Performance claims — "wins 97% of the time", "10x return"
- Copy trading — "copy this trader", "mirror bets"
- Arbitrage — spreads, mispricing, risk-free claims
- Prediction markets — Polymarket/Kalshi links, odds claims
- Price targets — entry points, PT claims
- Airdrops — free tokens, claiming steps
Each post gets an investigation priority score and a task list.
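The detection-plus-scoring idea can be sketched as keyword patterns with weights. The claim-type names mirror the categories above, but the regexes and weights here are illustrative placeholders, not the ones in `triage-posts.py`:

```python
import re

# Illustrative patterns only; the real script's rules will differ.
PATTERNS = {
    "performance_claim": (re.compile(r"\b\d{1,3}%|\b\d+x\b", re.I), 3),
    "copy_trading":      (re.compile(r"\bcopy|\bmirror\b", re.I), 2),
    "arbitrage_opp":     (re.compile(r"\barb|risk.?free|mispric", re.I), 3),
    "prediction_market": (re.compile(r"polymarket|kalshi", re.I), 2),
    "airdrop":           (re.compile(r"\bairdrop|free tokens", re.I), 1),
}

def triage(text: str) -> dict:
    """Match claim patterns in a post and sum a priority score."""
    claims = [name for name, (rx, _) in PATTERNS.items() if rx.search(text)]
    score = sum(w for name, (_, w) in PATTERNS.items() if name in claims)
    return {"claims": claims, "priority": score}
```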
## Stage 3: Investigate (Agent Workflow)
Read triage.json, then for each post in the investigation queue:
- Read the claim — What exactly is being claimed?
- Follow links — Use `web_fetch` or `browser` to visit linked pages
- Extract evidence — Pull actual data (P&L, odds, prices, activity)
- Verify — Does the data support the claim?
- Assess — Is this actionable? Time-sensitive? Still valid?
- Report — Summarize findings with verdict
Investigation methods by claim type:
- performance_claim / copy_trading: Browse to user profile or linked platform. Check actual trade history, recent activity, win rate.
- arbitrage_opp: Check both sides of the spread. Verify it still exists. Calculate fees and slippage.
- prediction_market: Fetch current market page. Check odds, volume, resolution date. Assess if bet has edge.
- price_target: Check current price on CoinGecko/TradingView. Compare to claim.
- airdrop: Verify project legitimacy. Check contract on block explorer. Look for scam signals.
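For the `arbitrage_opp` check, "calculate fees and slippage" reduces to: on prediction markets, buying YES on one venue and NO on another locks in profit only if the combined cost plus frictions stays under $1 per contract pair. A hedged sketch, where the fee rate and slippage haircut are placeholder numbers, not real venue fees:

```python
def arb_edge(yes_price: float, no_price: float,
             fee_rate: float = 0.02, slippage: float = 0.01) -> float:
    """Expected profit per $1 contract pair; <= 0 means no edge survives."""
    cost = yes_price + no_price
    cost += cost * fee_rate + slippage  # rough fee + slippage haircut
    return 1.0 - cost
```

For example, YES at 55c plus NO at 40c costs 95c before frictions, leaving a small positive edge; YES at 60c plus NO at 45c is already over $1 and fails immediately.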
Example agent flow:
Post: "Copying @whaletrader99 bets wins 97% of the time"
→ Triage: performance_claim + copy_trading
→ Tasks:
1. Browse to @whaletrader99 profile on Polymarket/X
2. Check their actual bet history and P&L
3. Verify "97%" claim against real data
4. Check recency — are they still active?
5. Assess: Is copying viable? Fees? Timing lag?
→ Verdict: VERIFIED / EXAGGERATED / STALE / SCAM
## Output Structure
All outputs go to `data/x-feed/<timestamp>/`:
- `posts.json` — raw structured scrape
- `summary.md` — human-readable feed
- `triage.json` — investigation queue
- `analysis.json` — category/spam analysis (optional, from analyze-posts.py)