| name | description |
|---|---|
| deep-scraper | Deep scrape X/Twitter feeds to extract structured post data (author, text, metrics, links, media, cards) via Chrome DevTools Protocol, then classify posts by category (crypto, polymarket, arbitrage, trading) with spam detection and signal scoring. Use when scraping social feeds, analyzing X posts for money-making signals, or extracting structured data from web pages via authenticated browser sessions. |
# Deep Scraper
Three-stage pipeline for X/Twitter feed intelligence:
- Scrape — Extract structured posts from DOM via CDP
- Triage — Identify posts with verifiable claims + actionable links
- Investigate — Agent follows links, pulls real data, verifies claims
## Prerequisites
- Chrome launched with `--remote-debugging-port=9222 --remote-allow-origins=*`
- User's Chrome profile copied to debug dir (script handles this)
- `websocket-client` Python package
## Launch Chrome for scraping
bash scripts/launch-chrome-debug.sh # copies auth cookies, starts on port 9222
## Stage 1: Scrape
python3 -u scripts/scrape-x-feed.py --port 9222 --scroll-pages 5
Output: data/x-feed/<timestamp>/posts.json — structured per-post data.
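A minimal sketch of consuming this output downstream. The exact JSON shape is an assumption (a top-level list, or a `{"posts": [...]}` wrapper, with per-post `metrics.likes`); adjust to whatever `scrape-x-feed.py` actually emits:

```python
import json
from pathlib import Path

def top_posts(run_dir: str, min_likes: int = 100) -> list[dict]:
    """Load a run's posts.json and keep high-engagement posts.

    Assumes each post carries a "metrics" dict with a "likes" count;
    posts missing metrics are treated as zero-engagement.
    """
    posts = json.loads((Path(run_dir) / "posts.json").read_text())
    if isinstance(posts, dict):  # tolerate a {"posts": [...]} wrapper
        posts = posts.get("posts", [])
    return [p for p in posts
            if p.get("metrics", {}).get("likes", 0) >= min_likes]
```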
## Stage 2: Triage
python3 scripts/triage-posts.py data/x-feed/<timestamp>/posts.json
Output: `triage.json` with the investigation queue. Detects:
- Performance claims — "wins 97% of the time", "10x return"
- Copy trading — "copy this trader", "mirror bets"
- Arbitrage — spreads, mispricing, risk-free claims
- Prediction markets — Polymarket/Kalshi links, odds claims
- Price targets — entry points, PT claims
- Airdrops — free tokens, claiming steps
Each post gets an investigation priority score and a task list.
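The detection-plus-scoring idea can be sketched as keyword patterns with weights. The claim-type names mirror the categories above, but the regexes and weights here are illustrative placeholders, not the ones in `triage-posts.py`:

```python
import re

# Illustrative patterns only; the real script's rules will differ.
PATTERNS = {
    "performance_claim": (re.compile(r"\b\d{1,3}%|\b\d+x\b", re.I), 3),
    "copy_trading":      (re.compile(r"\bcopy|\bmirror\b", re.I), 2),
    "arbitrage_opp":     (re.compile(r"\barb|risk.?free|mispric", re.I), 3),
    "prediction_market": (re.compile(r"polymarket|kalshi", re.I), 2),
    "airdrop":           (re.compile(r"\bairdrop|free tokens", re.I), 1),
}

def triage(text: str) -> dict:
    """Match claim patterns in a post and sum a priority score."""
    claims = [name for name, (rx, _) in PATTERNS.items() if rx.search(text)]
    score = sum(w for name, (_, w) in PATTERNS.items() if name in claims)
    return {"claims": claims, "priority": score}
```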
## Stage 3: Investigate (Agent Workflow)
Read triage.json, then for each post in the investigation queue:
- Read the claim — What exactly is being claimed?
- Follow links — Use `web_fetch` or `browser` to visit linked pages
- Extract evidence — Pull actual data (P&L, odds, prices, activity)
- Verify — Does the data support the claim?
- Assess — Is this actionable? Time-sensitive? Still valid?
- Report — Summarize findings with verdict
Investigation methods by claim type:
- performance_claim / copy_trading: Browse to user profile or linked platform. Check actual trade history, recent activity, win rate.
- arbitrage_opp: Check both sides of the spread. Verify it still exists. Calculate fees and slippage.
- prediction_market: Fetch current market page. Check odds, volume, resolution date. Assess if bet has edge.
- price_target: Check current price on CoinGecko/TradingView. Compare to claim.
- airdrop: Verify project legitimacy. Check contract on block explorer. Look for scam signals.
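For the `arbitrage_opp` check, "calculate fees and slippage" reduces to: on prediction markets, buying YES on one venue and NO on another locks in profit only if the combined cost plus frictions stays under $1 per contract pair. A hedged sketch, where the fee rate and slippage haircut are placeholder numbers, not real venue fees:

```python
def arb_edge(yes_price: float, no_price: float,
             fee_rate: float = 0.02, slippage: float = 0.01) -> float:
    """Expected profit per $1 contract pair; <= 0 means no edge survives."""
    cost = yes_price + no_price
    cost += cost * fee_rate + slippage  # rough fee + slippage haircut
    return 1.0 - cost
```

For example, YES at 55c plus NO at 40c costs 95c before frictions, leaving a small positive edge; YES at 60c plus NO at 45c is already over $1 and fails immediately.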
Example agent flow:
Post: "Copying @whaletrader99 bets wins 97% of the time"
→ Triage: performance_claim + copy_trading
→ Tasks:
1. Browse to @whaletrader99 profile on Polymarket/X
2. Check their actual bet history and P&L
3. Verify "97%" claim against real data
4. Check recency — are they still active?
5. Assess: Is copying viable? Fees? Timing lag?
→ Verdict: VERIFIED / EXAGGERATED / STALE / SCAM
## Output Structure
All outputs go to `data/x-feed/<timestamp>/`:
- `posts.json` — raw structured scrape
- `summary.md` — human-readable feed
- `triage.json` — investigation queue
- `analysis.json` — category/spam analysis (optional, from analyze-posts.py)