workspace/skills/deep-scraper/SKILL.md
Case 8638500190 Feed Hunter: deep scraper skill, pipeline, simulator, first investigation
- Built deep-scraper skill (CDP-based X feed extraction)
- Three-stage pipeline: scrape → triage → investigate
- Paper trading simulator with position tracking
- First live investigation: verified kch123 Polymarket profile ($9.3M P&L)
- Opened first paper position: Seahawks Super Bowl @ 68c
- Telegram alerts with inline action buttons
- Portal build in progress (night shift)
2026-02-07 23:58:40 -06:00


---
name: deep-scraper
description: Deep scrape X/Twitter feeds to extract structured post data (author, text, metrics, links, media, cards) via Chrome DevTools Protocol, then classify posts by category (crypto, polymarket, arbitrage, trading) with spam detection and signal scoring. Use when scraping social feeds, analyzing X posts for money-making signals, or extracting structured data from web pages via authenticated browser sessions.
---
# Deep Scraper
Three-stage pipeline for X/Twitter feed intelligence:
1. **Scrape** — Extract structured posts from DOM via CDP
2. **Triage** — Identify posts with verifiable claims + actionable links
3. **Investigate** — Agent follows links, pulls real data, verifies claims
## Prerequisites
- Chrome with `--remote-debugging-port=9222 --remote-allow-origins=*`
- User's Chrome profile copied to debug dir (script handles this)
- `websocket-client` Python package
### Launch Chrome for scraping
```bash
bash scripts/launch-chrome-debug.sh # copies auth cookies, starts on port 9222
```
## Stage 1: Scrape
```bash
python3 -u scripts/scrape-x-feed.py --port 9222 --scroll-pages 5
```
Output: `data/x-feed/<timestamp>/posts.json` — structured per-post data.
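Before triaging, a quick sanity pass over `posts.json` confirms the scrape captured real data. A minimal sketch, assuming each post is a dict carrying the `metrics` sub-object named in the skill description (the `likes` key and the helper name are assumptions, not the script's guaranteed schema):

```python
import json
from pathlib import Path

def top_posts(posts_path: str, n: int = 5) -> list[dict]:
    """Return the n posts with the most likes (assumed metric key)."""
    posts = json.loads(Path(posts_path).read_text())
    return sorted(
        posts,
        key=lambda p: p.get("metrics", {}).get("likes", 0),
        reverse=True,
    )[:n]
```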
## Stage 2: Triage
```bash
python3 scripts/triage-posts.py data/x-feed/<timestamp>/posts.json
```
Output: `triage.json` with the investigation queue. The triage pass detects:
- **Performance claims** — "wins 97% of the time", "10x return"
- **Copy trading** — "copy this trader", "mirror bets"
- **Arbitrage** — spreads, mispricing, risk-free claims
- **Prediction markets** — Polymarket/Kalshi links, odds claims
- **Price targets** — entry points, PT claims
- **Airdrops** — free tokens, claiming steps
Each post gets an investigation priority score and a task list.
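The detection-plus-scoring idea can be sketched with regex category matching; these patterns and the one-point-per-category score are illustrative stand-ins, not the rules `triage-posts.py` actually ships:

```python
import re

# Illustrative patterns per triage category; the real script's rules may differ.
PATTERNS = {
    "performance_claim": r"\b\d{1,3}%|\b\d+x\b",
    "copy_trading": r"\bcopy\b|\bcopying\b|mirror",
    "arbitrage_opp": r"arbitrage|risk[- ]?free|mispric",
    "prediction_market": r"polymarket|kalshi",
    "price_target": r"\bPT\b|price target|entry",
    "airdrop": r"airdrop|free tokens?|claim",
}

def triage(text: str) -> tuple[list[str], int]:
    """Return matched categories and a naive priority score (one point each)."""
    hits = [cat for cat, pat in PATTERNS.items() if re.search(pat, text, re.I)]
    return hits, len(hits)

print(triage("Copying @whaletrader99 bets wins 97% of the time"))
# (['performance_claim', 'copy_trading'], 2)
```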
## Stage 3: Investigate (Agent Workflow)
Read `triage.json`, then for each post in the investigation queue:
1. **Read the claim** — What exactly is being claimed?
2. **Follow links** — Use `web_fetch` or `browser` to visit linked pages
3. **Extract evidence** — Pull actual data (P&L, odds, prices, activity)
4. **Verify** — Does the data support the claim?
5. **Assess** — Is this actionable? Time-sensitive? Still valid?
6. **Report** — Summarize findings with verdict
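The queue-reading step above can be sketched as a small loader; `queue` and `priority` are assumed field names, since the actual `triage.json` schema is whatever `triage-posts.py` emits:

```python
import json
from pathlib import Path

def investigation_queue(triage_path: str) -> list[dict]:
    """Return queued posts, highest priority first (field names are assumptions)."""
    data = json.loads(Path(triage_path).read_text())
    return sorted(
        data.get("queue", []),
        key=lambda item: item.get("priority", 0),
        reverse=True,
    )
```

Working highest-priority-first means time-sensitive claims (arbitrage, live odds) get verified before they go stale.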
### Investigation methods by claim type:
- **performance_claim / copy_trading**: Browse to user profile or linked platform. Check actual trade history, recent activity, win rate.
- **arbitrage_opp**: Check both sides of the spread. Verify it still exists. Calculate fees and slippage.
- **prediction_market**: Fetch current market page. Check odds, volume, resolution date. Assess if bet has edge.
- **price_target**: Check current price on CoinGecko/TradingView. Compare to claim.
- **airdrop**: Verify project legitimacy. Check contract on block explorer. Look for scam signals.
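For `price_target` claims, the core comparison is mechanical: how far is the current price from the claimed target? A hedged sketch (the 50% plausibility threshold is arbitrary, not part of the skill):

```python
def target_distance(current: float, target: float) -> float:
    """Signed percent move still required to reach the claimed target."""
    return (target - current) / current * 100

def plausibility(current: float, target: float, max_move: float = 50.0) -> str:
    """Naive sanity flag: targets beyond max_move% away are suspect."""
    return "plausible" if abs(target_distance(current, target)) <= max_move else "suspect"

print(round(target_distance(50_000, 60_000), 1))  # 20.0
```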
### Example agent flow:
```
Post: "Copying @whaletrader99 bets wins 97% of the time"
→ Triage: performance_claim + copy_trading
→ Tasks:
1. Browse to @whaletrader99 profile on Polymarket/X
2. Check their actual bet history and P&L
3. Verify "97%" claim against real data
4. Check recency — are they still active?
5. Assess: Is copying viable? Fees? Timing lag?
→ Verdict: VERIFIED / EXAGGERATED / STALE / SCAM
```
## Output Structure
All outputs go to `data/x-feed/<timestamp>/`:
- `posts.json` — raw structured scrape
- `summary.md` — human-readable feed
- `triage.json` — investigation queue
- `analysis.json` — category/spam analysis (optional, from analyze-posts.py)