Building a private research agent with Venice AI

Research agents are useful when you want more than a single search result or a quick model answer. A good research agent can turn a broad topic into search queries, gather sources, extract the important evidence, follow up on gaps, and write a briefing with citations that you can inspect later. In this tutorial we'll build a private research agent using Python and the Venice API. By the end you'll have a CLI that can research a topic, scrape public pages into Markdown, summarize source chunks, run gap-aware follow-up research passes, and generate a cited report with optional local JSONL artifacts. Interested in the full implementation? Take a look at the GitHub repository. Before continuing, you'll need a Venice API key:

export VENICE_API_KEY=<my-key>

What we'll build

The reference implementation is a small Python project with a few well-defined parts:

Part	What it does
CLI	Accepts a research topic, model, providers, depth settings, output path, and artifacts directory
Venice client	Calls chat completions, streaming chat completions, and `POST /augment/scrape`
Search layer	Searches DuckDuckGo by default, with optional arXiv paper discovery
Data model	Tracks source URLs, canonical URLs, chunks, evidence, notes, errors, and reports
Research agent	Plans searches, reads sources, extracts evidence, analyzes gaps, generates follow-up queries, and writes the final report
Artifact writer	Stores auditable JSONL records for queries, research gaps, results, fetches, chunks, source notes, report drafts, errors, and reports

The flow looks like this:

Ask Venice to generate diverse search queries for the topic.
Search the web with one or more providers.
Deduplicate the URLs before reading them.
Use Venice's scrape endpoint to convert each public source page into Markdown.
Split long pages into chunks.
Ask Venice to extract the evidence from each chunk.
Ask Venice to turn the chunk evidence into source notes.
Identify research gaps and source-balance problems before generating follow-up queries.
Ask Venice to synthesize the final report with footnote-style citations.

This is "private" in the practical sense that the agent keeps the orchestration, source notes, artifacts, and final reports on your machine. Venice handles the model calls and the scraping through its API. The default reference implementation still sends search queries to DuckDuckGo or arXiv, so treat provider choice as part of your privacy design.

Setting up the project

The reference project uses Python 3.13 and uv, but the same code also works with a regular virtual environment. Create a new project:

mkdir venice-research-agent
cd venice-research-agent
uv init

Install the dependencies:

uv add httpx beautifulsoup4 python-dotenv

If you prefer pip, create a virtual environment and install the same packages:

python -m venv .venv
source .venv/bin/activate
pip install "httpx>=0.28.0" "beautifulsoup4>=4.13.0" "python-dotenv>=1.0.0"

Create a .env file for local development:

VENICE_API_KEY=your_venice_api_key_here
VENICE_MODEL=openai-gpt-55

We use VENICE_MODEL so you can change the model without editing code. The reference implementation currently defaults to openai-gpt-55, but you can swap in any other chat model available to your Venice account.

Creating the data models

Before writing the agent logic, we'll define the objects that move through the pipeline. These models make the rest of the code easier to reason about because every source carries its provenance: where it came from, which query found it, when it was retrieved, and how it was split into chunks. Create research_agent/models.py:

from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from datetime import UTC, datetime
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

TRACKING_PARAMS = {
    "fbclid",
    "gclid",
    "igshid",
    "mc_cid",
    "mc_eid",
    "msclkid",
    "ref",
    "ref_src",
}


@dataclass(frozen=True)
class SearchResult:
    title: str
    url: str
    snippet: str
    query: str = ""
    rank: int = 0
    provider: str = "duckduckgo"
    canonical_url: str = ""

    def __post_init__(self) -> None:
        if not self.canonical_url:
            object.__setattr__(self, "canonical_url", canonicalize_url(self.url))


@dataclass(frozen=True)
class ScrapeResult:
    url: str
    content: str
    title: str = ""
    final_url: str = ""
    content_type: str = "text/markdown"


@dataclass(frozen=True)
class TextChunk:
    chunk_id: str
    text: str
    start: int
    end: int
    content_hash: str


@dataclass(frozen=True)
class WebPage:
    title: str
    url: str
    text: str
    final_url: str = ""
    canonical_url: str = ""
    content_type: str = ""
    retrieved_at: str = ""
    content_hash: str = ""
    chunks: tuple[TextChunk, ...] = field(default_factory=tuple)

    def __post_init__(self) -> None:
        final_url = self.final_url or self.url
        object.__setattr__(self, "final_url", final_url)
        if not self.canonical_url:
            object.__setattr__(self, "canonical_url", canonicalize_url(final_url))
        if not self.retrieved_at:
            object.__setattr__(self, "retrieved_at", utc_now())
        if not self.content_hash:
            object.__setattr__(self, "content_hash", content_hash(self.text))


@dataclass(frozen=True)
class EvidenceChunk:
    chunk_id: str
    text: str
    summary: str
    quotes: tuple[str, ...] = field(default_factory=tuple)


@dataclass(frozen=True)
class SourceNote:
    source_id: str
    title: str
    url: str
    query: str
    summary: str
    canonical_url: str = ""
    final_url: str = ""
    rank: int = 0
    snippet: str = ""
    provider: str = "duckduckgo"
    retrieved_at: str = ""
    content_type: str = ""
    content_hash: str = ""
    chunks: tuple[EvidenceChunk, ...] = field(default_factory=tuple)

The important fields here are canonical_url, content_hash, and chunks. canonical_url lets the agent avoid reading the same source repeatedly when search results differ only by tracking parameters or fragments. content_hash helps catch duplicate pages even when they live at different URLs. chunks lets us summarize long pages in smaller pieces instead of losing useful evidence to context limits. Add the helper functions below the dataclasses:

def utc_now() -> str:
    return datetime.now(UTC).isoformat()


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def canonicalize_url(raw_url: str) -> str:
    if not raw_url:
        return ""

    parsed = urlparse(raw_url.strip())
    if parsed.scheme not in {"http", "https"} or not parsed.netloc:
        return ""

    scheme = parsed.scheme.lower()
    netloc = parsed.netloc.lower()
    path = parsed.path or "/"
    if path != "/":
        path = path.rstrip("/")

    query_pairs = [
        (key, value)
        for key, value in parse_qsl(parsed.query, keep_blank_values=True)
        if not _is_tracking_param(key)
    ]
    query = urlencode(sorted(query_pairs), doseq=True)
    return urlunparse((scheme, netloc, path, "", query, ""))


def chunk_text(text: str, *, chunk_chars: int = 3000, overlap: int = 250) -> tuple[TextChunk, ...]:
    clean = text.strip()
    if not clean:
        return ()
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be greater than 0")
    if overlap < 0 or overlap >= chunk_chars:
        raise ValueError("overlap must be at least 0 and smaller than chunk_chars")

    chunks: list[TextChunk] = []
    start = 0
    index = 1
    while start < len(clean):
        end = min(len(clean), start + chunk_chars)
        chunk = clean[start:end].strip()
        if chunk:
            chunks.append(
                TextChunk(
                    chunk_id=f"C{index}",
                    text=chunk,
                    start=start,
                    end=end,
                    content_hash=content_hash(chunk),
                )
            )
            index += 1
        if end == len(clean):
            break
        start = end - overlap

    return tuple(chunks)


def _is_tracking_param(key: str) -> bool:
    lowered = key.lower()
    return lowered.startswith("utm_") or lowered in TRACKING_PARAMS

The chunking here is deliberately simple: fixed-size character chunks with overlap. That's enough for a demo research agent because Venice's scrape endpoint returns Markdown, which is usually much cleaner than raw HTML. For production research over long technical documents, you can improve it by splitting on headings, paragraphs, or token counts.
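
As a rough illustration of the heading- and paragraph-based splitting mentioned above, here is a minimal sketch. It is not part of the reference implementation: the function name chunk_by_paragraph and its max_chars parameter are assumptions made for this example, and unlike chunk_text it adds no overlap between chunks.

```python
import re


def chunk_by_paragraph(text: str, *, max_chars: int = 3000) -> list[str]:
    """Pack whole Markdown blocks into chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be greater than 0")

    # Blank lines separate Markdown blocks (headings, paragraphs, lists).
    blocks = [block.strip() for block in re.split(r"\n\s*\n", text) if block.strip()]
    chunks: list[str] = []
    current = ""

    for block in blocks:
        # A heading starts a new chunk so a section title stays with its body.
        starts_section = block.startswith("#")
        candidate = f"{current}\n\n{block}" if current else block
        if current and (starts_section or len(candidate) > max_chars):
            chunks.append(current)
            current = block
        else:
            current = candidate

        # A single oversized block falls back to fixed-size slices.
        while len(current) > max_chars:
            chunks.append(current[:max_chars])
            current = current[max_chars:]

    if current:
        chunks.append(current)
    return chunks
```

Keeping each heading with the text that follows it tends to give the extraction prompt more coherent context than a cut in the middle of a sentence, at the cost of more uneven chunk sizes.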

Building the Venice client

Next, we'll create a small Venice client. You could use the OpenAI Python SDK for chat completions because Venice is OpenAI-compatible, but the reference implementation uses httpx directly so the same client can also call Venice's POST /augment/scrape endpoint. Create research_agent/venice.py:

from __future__ import annotations

import json
import os
import time
from dataclasses import dataclass
from typing import Any

import httpx

from .models import ScrapeResult

DEFAULT_BASE_URL = "https://api.venice.ai/api/v1"
DEFAULT_MODEL = "openai-gpt-55"
RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}


class VeniceError(RuntimeError):
    """Raised when the Venice API returns an unusable response."""


@dataclass(frozen=True)
class VeniceClient:
    api_key: str
    model: str = DEFAULT_MODEL
    base_url: str = DEFAULT_BASE_URL
    timeout: float = 60.0
    max_retries: int = 2
    backoff_seconds: float = 1.0

    @classmethod
    def from_env(cls, model: str | None = None, *, max_retries: int = 2) -> "VeniceClient":
        api_key = os.getenv("VENICE_API_KEY")
        if not api_key:
            raise VeniceError("VENICE_API_KEY is required.")

        return cls(
            api_key=api_key,
            model=model or os.getenv("VENICE_MODEL", DEFAULT_MODEL),
            base_url=os.getenv("VENICE_BASE_URL", DEFAULT_BASE_URL).rstrip("/"),
            max_retries=max_retries,
        )

The from_env() helper keeps secrets out of source code. It also makes local development convenient because python-dotenv can load VENICE_API_KEY and VENICE_MODEL from .env, as long as load_dotenv() runs before from_env() is called. Now add chat completions:

    def chat(
        self,
        messages: list[dict[str, str]],
        *,
        temperature: float = 0.2,
        max_tokens: int = 1600,
    ) -> str:
        payload: dict[str, Any] = {
            "model": self.model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
        }

        data = self._post_json("/chat/completions", payload)
        try:
            return data["choices"][0]["message"]["content"].strip()
        except (KeyError, IndexError, TypeError) as exc:
            raise VeniceError(f"Unexpected Venice API response: {data}") from exc

For the final report we want streaming, because deep reports produce much more text and can take considerably longer to generate. A non-streaming request that runs that long is prone to timeouts. With streaming the output arrives incrementally, so the read timeout applies to the wait for each piece of data rather than to the whole response, which makes the request much more resilient to timeout failures:

    def chat_stream(
        self,
        messages: list[dict[str, str]],
        *,
        temperature: float = 0.2,
        max_tokens: int = 1600,
    ) -> str:
        payload: dict[str, Any] = {
            "model": self.model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
            "stream": True,
        }
        return self._post_chat_stream("/chat/completions", payload).strip()

Then add scraping:

    def scrape(self, url: str) -> ScrapeResult:
        data = self._post_json("/augment/scrape", {"url": url})
        content = _first_string(data, "content", "markdown", "text")
        if not content:
            raise VeniceError(f"Unexpected Venice scrape response: {data}")

        return ScrapeResult(
            url=url,
            final_url=_first_string(data, "final_url", "url", "source_url") or url,
            title=_first_string(data, "title"),
            content=content,
            content_type="text/markdown",
        )

Venice's scrape endpoint accepts a publicly accessible URL and returns the page as Markdown. That means the model doesn't have to parse raw HTML, and your source-extraction prompts can work with cleaner text. The remaining helper handles retries and response parsing:

    def _post_json(self, path: str, payload: dict[str, Any]) -> dict[str, Any]:
        for attempt in range(self.max_retries + 1):
            try:
                response = httpx.post(
                    f"{self.base_url}{path}",
                    headers={
                        "Authorization": f"Bearer {self.api_key}",
                        "Content-Type": "application/json",
                    },
                    json=payload,
                    timeout=self.timeout,
                )
                if response.status_code in RETRYABLE_STATUS_CODES and attempt < self.max_retries:
                    time.sleep(self.backoff_seconds * (2**attempt))
                    continue
                response.raise_for_status()
                data = response.json()
                if not isinstance(data, dict):
                    raise VeniceError(f"Unexpected Venice API response: {data}")
                return data
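            except httpx.HTTPStatusError as exc:
                # Non-retryable statuses (for example 401 or 404) fail immediately instead of backing off.
                raise VeniceError(f"Venice API returned HTTP {exc.response.status_code}") from exc
            except httpx.HTTPError as exc:
                # Transport errors such as timeouts or dropped connections are retried with backoff.
                if attempt < self.max_retries:
                    time.sleep(self.backoff_seconds * (2**attempt))
                    continue
                raise VeniceError(f"Could not reach Venice API: {exc}") from exc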

        raise VeniceError("Could not reach Venice API")


def _first_string(data: dict[str, Any], *keys: str) -> str:
    for key in keys:
        value = data.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()

    for nested_key in ("data", "result", "scrape"):
        nested = data.get(nested_key)
        if isinstance(nested, dict):
            value = _first_string(nested, *keys)
            if value:
                return value

    return ""

The full repo also includes a robust _post_chat_stream() helper that reads server-sent events from streaming chat completions. You can start without streaming and add it once the rest of the research flow works.
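
To give a sense of what reading those server-sent events involves, here is a minimal sketch of the parsing step only. It is not the repo's _post_chat_stream(): the function name collect_stream_text is hypothetical, and the code assumes the OpenAI-style streaming format, where each event is a `data: {...}` line carrying `choices[0].delta.content` and the stream ends with `data: [DONE]`.

```python
import json
from collections.abc import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the content deltas from OpenAI-style server-sent event lines."""
    parts: list[str] = []
    for line in lines:
        line = line.strip()
        # Skip keep-alive blanks, comments, and non-data SSE fields.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        try:
            event = json.loads(data)
        except json.JSONDecodeError:
            continue  # Ignore malformed events instead of failing the whole report.
        choices = event.get("choices") if isinstance(event, dict) else None
        if not isinstance(choices, list) or not choices:
            continue
        first = choices[0]
        if not isinstance(first, dict):
            continue
        delta = first.get("delta") or {}
        content = delta.get("content") if isinstance(delta, dict) else None
        if isinstance(content, str):
            parts.append(content)
    return "".join(parts)
```

Because the function takes any iterable of strings, it can be tested with a plain list. In a client, the lines would plausibly come from the iter_lines() method of a response opened with httpx.stream("POST", ...).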

Adding the search providers

The search layer has two jobs: finding source URLs and fetching those URLs through Venice's scraper. The reference implementation uses DuckDuckGo's HTML endpoint for general web search and arXiv's Atom API for papers. Create research_agent/web.py:

from __future__ import annotations

import re
import xml.etree.ElementTree as ET
from collections.abc import Callable, Iterable
from urllib.parse import parse_qs, unquote, urlparse

import httpx
from bs4 import BeautifulSoup

from .models import ScrapeResult, SearchResult, TextChunk, WebPage, canonicalize_url, chunk_text, content_hash, utc_now

USER_AGENT = "venice-research-agent-demo/0.1 (+https://venice.ai)"


class SearchProvider:
    name = "provider"

    def search(self, web: "WebSearch", query: str, limit: int) -> list[SearchResult]:
        raise NotImplementedError

Now add DuckDuckGo:

class DuckDuckGoProvider(SearchProvider):
    name = "duckduckgo"

    def search(self, web: "WebSearch", query: str, limit: int) -> list[SearchResult]:
        response = web.get("https://duckduckgo.com/html/", params={"q": query})
        soup = BeautifulSoup(response.text, "html.parser")
        results: list[SearchResult] = []
        seen_urls: set[str] = set()

        for node in soup.select(".result"):
            link = node.select_one(".result__a")
            if link is None:
                continue

            url = _normalize_duckduckgo_url(link.get("href", ""))
            canonical_url = canonicalize_url(url)
            if not canonical_url or canonical_url in seen_urls:
                continue

            snippet = node.select_one(".result__snippet")
            results.append(
                SearchResult(
                    title=_clean_text(link.get_text(" ", strip=True)),
                    url=url,
                    snippet=_clean_text(snippet.get_text(" ", strip=True) if snippet else ""),
                    query=query,
                    rank=len(results) + 1,
                    provider=self.name,
                    canonical_url=canonical_url,
                )
            )
            seen_urls.add(canonical_url)

            if len(results) >= limit:
                break

        return results

And arXiv:

class ArxivProvider(SearchProvider):
    name = "arxiv"

    def search(self, web: "WebSearch", query: str, limit: int) -> list[SearchResult]:
        response = web.get(
            "https://export.arxiv.org/api/query",
            params={
                "search_query": f"all:{query}",
                "start": 0,
                "max_results": limit,
                "sortBy": "relevance",
            },
        )
        namespace = {"atom": "http://www.w3.org/2005/Atom"}
        root = ET.fromstring(response.text)
        results: list[SearchResult] = []

        for entry in root.findall("atom:entry", namespace):
            title = _clean_text(_xml_text(entry.find("atom:title", namespace)))
            summary = _clean_text(_xml_text(entry.find("atom:summary", namespace)))
            url = _xml_text(entry.find("atom:id", namespace)).strip()
            canonical_url = canonicalize_url(url)
            if not url or not canonical_url:
                continue

            results.append(
                SearchResult(
                    title=title or url,
                    url=url,
                    snippet=summary,
                    query=query,
                    rank=len(results) + 1,
                    provider=self.name,
                    canonical_url=canonical_url,
                )
            )

            if len(results) >= limit:
                break

        return results

The WebSearch class coordinates the providers and fetches pages:

class WebSearch:
    def __init__(
        self,
        timeout: float = 15.0,
        *,
        providers: Iterable[SearchProvider] | None = None,
        chunk_chars: int = 3000,
        scraper: Callable[[str], ScrapeResult] | None = None,
    ) -> None:
        self._client = httpx.Client(
            timeout=timeout,
            follow_redirects=True,
            headers={"User-Agent": USER_AGENT},
        )
        self.providers = tuple(providers or (DuckDuckGoProvider(),))
        self.chunk_chars = chunk_chars
        self.scraper = scraper

    @classmethod
    def from_provider_names(cls, provider_names: Iterable[str], **kwargs: object) -> "WebSearch":
        providers = [_provider_from_name(name) for name in provider_names]
        return cls(providers=providers, **kwargs)

    def search(self, query: str, limit: int = 5) -> list[SearchResult]:
        results: list[SearchResult] = []
        seen_urls: set[str] = set()

        for provider in self.providers:
            for result in provider.search(self, query, limit):
                if result.canonical_url in seen_urls:
                    continue
                results.append(result)
                seen_urls.add(result.canonical_url)

        return results

    def fetch(self, result: SearchResult) -> WebPage:
        if self.scraper is None:
            raise RuntimeError("WebSearch.fetch requires a Venice scrape function.")

        scraped = self.scraper(result.url)
        text = scraped.content.strip() or result.snippet
        chunks = self._chunk_text(text)
        return WebPage(
            title=scraped.title or result.title,
            url=result.url,
            final_url=scraped.final_url or scraped.url or result.url,
            canonical_url=canonicalize_url(scraped.final_url or result.url),
            text=text,
            content_type=scraped.content_type or "text/markdown",
            retrieved_at=utc_now(),
            content_hash=content_hash(text),
            chunks=chunks,
        )

    def get(self, url: str, *, params: dict[str, object] | None = None) -> httpx.Response:
        response = self._client.get(url, params=params)
        response.raise_for_status()
        return response

    def close(self) -> None:
        self._client.close()

    def __enter__(self) -> "WebSearch":
        return self

    def __exit__(self, *_: object) -> None:
        self.close()

    def _chunk_text(self, text: str) -> tuple[TextChunk, ...]:
        overlap = min(250, max(0, self.chunk_chars // 10))
        return chunk_text(text, chunk_chars=self.chunk_chars, overlap=overlap)

The full reference implementation adds retries, per-host request delays, and friendlier errors. They're worth keeping because research agents spend a lot of time dealing with pages that block automation, redirect unexpectedly, or return transient errors. Add the small provider helpers at the bottom:

def _normalize_duckduckgo_url(raw_url: str) -> str:
    if not raw_url:
        return ""

    parsed = urlparse(raw_url)
    if parsed.netloc.endswith("duckduckgo.com") and parsed.path == "/l/":
        # parse_qs already percent-decodes the value; decoding again would corrupt escaped URLs.
        return parse_qs(parsed.query).get("uddg", [""])[0]

    if parsed.scheme in {"http", "https"}:
        return raw_url

    return ""


def _provider_from_name(name: str) -> SearchProvider:
    normalized = name.strip().lower()
    if normalized in {"duckduckgo", "ddg", "web"}:
        return DuckDuckGoProvider()
    if normalized == "arxiv":
        return ArxivProvider()
    raise ValueError(f"Unknown source provider: {name}")


def _clean_text(value: str) -> str:
    return re.sub(r"\s+", " ", value).strip()


def _xml_text(node: ET.Element | None) -> str:
    return "" if node is None or node.text is None else node.text
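
The per-host request delays mentioned above are not shown in this tutorial. As a rough idea of how they could work, here is a minimal sketch; the HostThrottle class, its min_delay parameter, and the injectable clock and sleep arguments are all assumptions made for this example, not the reference implementation's API.

```python
import time
from collections.abc import Callable
from urllib.parse import urlparse


class HostThrottle:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(
        self,
        min_delay: float = 1.0,
        *,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], object] = time.sleep,
    ) -> None:
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last_request: dict[str, float] = {}

    def wait(self, url: str) -> float:
        """Sleep if the host was hit too recently; return the seconds slept."""
        host = urlparse(url).netloc.lower()
        now = self._clock()
        last = self._last_request.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_delay - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record the moment the request is actually allowed to start.
        self._last_request[host] = now + delay
        return delay
```

A WebSearch.get() wrapper could call throttle.wait(url) before each request. Passing the clock and sleep functions in makes the behavior testable without real waiting.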

Writing local artifacts

For research workflows, auditability matters. If the final report says something surprising, you should be able to inspect which source led to that conclusion. Create research_agent/artifacts.py:

from __future__ import annotations

import json
from dataclasses import asdict, is_dataclass
from pathlib import Path
from typing import Any


class ArtifactWriter:
    def __init__(self, root: Path | None = None) -> None:
        self.root = root
        if self.root is not None:
            self.root.mkdir(parents=True, exist_ok=True)

    @property
    def enabled(self) -> bool:
        return self.root is not None

    def write(self, kind: str, record: object) -> None:
        if self.root is None:
            return

        path = self.root / f"{kind}.jsonl"
        payload = json.dumps(_to_jsonable(record), ensure_ascii=False, sort_keys=True)
        with path.open("a", encoding="utf-8") as file:
            file.write(f"{payload}\n")


def _to_jsonable(value: object) -> Any:
    if is_dataclass(value):
        return _to_jsonable(asdict(value))
    if isinstance(value, Path):
        return str(value)
    if isinstance(value, dict):
        return {str(key): _to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [_to_jsonable(item) for item in value]
    return value

This writes one JSON object per line, which makes the artifacts easy to append to, inspect, and process later with command-line tools.
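
Reading the artifacts back is just as simple. The sketch below is an illustration rather than part of the reference implementation: the helpers read_jsonl and follow_up_queries are hypothetical names, and the code assumes the record shape the agent writes to queries.jsonl (a "stage" field and a "queries" list).

```python
import json
from collections.abc import Iterator
from pathlib import Path


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one decoded record per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as file:
        for line in file:
            if line.strip():
                yield json.loads(line)


def follow_up_queries(artifacts_dir: Path) -> list[str]:
    """Collect every follow-up query recorded in queries.jsonl."""
    queries: list[str] = []
    for record in read_jsonl(artifacts_dir / "queries.jsonl"):
        if record.get("stage") == "follow_up":
            queries.extend(record.get("queries", []))
    return queries
```

Since each line is an independent record, a run that crashes halfway still leaves every earlier record readable.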

Building the research agent

Now that we have the Venice client, search, models, and artifacts, we can build the agent itself. Create research_agent/agent.py:

from __future__ import annotations

import json
from collections.abc import Callable
from textwrap import dedent

from .artifacts import ArtifactWriter
from .models import CollectionError, EvidenceChunk, ResearchReport, SearchResult, SourceNote, WebPage, utc_now
from .venice import VeniceClient, VeniceError
from .web import WebSearch

SYSTEM_PROMPT = """You are a careful research assistant.
Use the supplied source material only when making factual claims.
Flag uncertainty, contradictions, and missing context instead of filling gaps."""

ProgressCallback = Callable[[str], None]

DEFAULT_ITERATIONS = 3
DEFAULT_QUERY_COUNT = 6
DEFAULT_RESULTS_PER_QUERY = 4
DEFAULT_MAX_SOURCES = 40
DEFAULT_MAX_CHUNKS_PER_SOURCE = 6

The system prompt is the basic behavioral guardrail. We don't want the model to produce an impressive-sounding report from memory. We want it to use the source material and flag uncertainty when the evidence is thin. You also need two final dataclasses in models.py if you haven't added them yet:

@dataclass(frozen=True)
class CollectionError:
    stage: str
    message: str
    query: str = ""
    url: str = ""
    source_id: str = ""
    provider: str = ""


@dataclass(frozen=True)
class ResearchReport:
    topic: str
    markdown: str
    sources: list[SourceNote]
    artifacts_dir: str | None = None

Now define the ResearchAgent:

class ResearchAgent:
    def __init__(
        self,
        venice: VeniceClient,
        web: WebSearch | None = None,
        artifacts: ArtifactWriter | None = None,
        progress: ProgressCallback | None = None,
        max_sources: int | None = DEFAULT_MAX_SOURCES,
        max_chunks_per_source: int = DEFAULT_MAX_CHUNKS_PER_SOURCE,
    ) -> None:
        self.venice = venice
        self.web = web or WebSearch(scraper=venice.scrape)
        self.artifacts = artifacts or ArtifactWriter()
        self.progress = progress or (lambda _: None)
        self.max_sources = max_sources
        self.max_chunks_per_source = max_chunks_per_source

The run() method coordinates the research passes:

    def run(
        self,
        topic: str,
        *,
        iterations: int = DEFAULT_ITERATIONS,
        query_count: int = DEFAULT_QUERY_COUNT,
        results_per_query: int = DEFAULT_RESULTS_PER_QUERY,
    ) -> ResearchReport:
        notes: list[SourceNote] = []
        seen_source_keys: set[str] = set()
        seen_content_hashes: set[str] = set()
        queries = self._initial_queries(topic, query_count)

        self.artifacts.write("queries", {"stage": "initial", "topic": topic, "queries": queries})

        for iteration in range(1, iterations + 1):
            self.progress(f"Research pass {iteration}/{iterations}: {', '.join(queries)}")
            self._collect_notes(
                topic,
                queries,
                results_per_query,
                seen_source_keys,
                seen_content_hashes,
                notes,
                iteration,
            )

            if iteration < iterations:
                gaps, queries = self._gap_follow_up_queries(topic, notes, query_count)
                self.artifacts.write(
                    "research_gaps",
                    {
                        "topic": topic,
                        "after_iteration": iteration,
                        "source_balance": _source_cluster_counts(notes),
                        "gaps": gaps,
                        "queries": queries,
                    },
                )
                self.artifacts.write(
                    "queries",
                    {
                        "stage": "follow_up",
                        "topic": topic,
                        "iteration": iteration + 1,
                        "gap_count": len(gaps),
                        "queries": queries,
                    },
                )

        report = self._write_report(topic, notes)
        self.artifacts.write(
            "reports",
            {
                "topic": topic,
                "source_count": len(notes),
                "generated_at": utc_now(),
                "markdown": report,
            },
        )

        return ResearchReport(
            topic=topic,
            markdown=report,
            sources=notes,
            artifacts_dir=str(self.artifacts.root) if self.artifacts.root is not None else None,
        )

The two seen_* sets are what stop the agent from wasting time on duplicate sources. URL dedupe catches repeated links. Content-hash dedupe catches mirrors, syndicated posts, and pages that redirect to the same final content.
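
The two-level check can be sketched in isolation as follows. This is an illustration under stated assumptions, not the agent's actual _collect_notes() logic: is_new_source is a hypothetical helper, and url_key is a simplified stand-in for canonicalize_url that only lowercases the scheme and host, trims a trailing slash, and drops fragments and utm_* parameters.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def url_key(url: str) -> str:
    """Simplified stand-in for canonicalize_url: drop fragments and utm_* params."""
    parsed = urlparse(url.strip())
    pairs = [
        (key, value)
        for key, value in parse_qsl(parsed.query)
        if not key.lower().startswith("utm_")
    ]
    path = parsed.path.rstrip("/") or "/"
    query = urlencode(sorted(pairs))
    return urlunparse((parsed.scheme.lower(), parsed.netloc.lower(), path, "", query, ""))


def is_new_source(url: str, text: str, seen_urls: set[str], seen_hashes: set[str]) -> bool:
    """Return True only the first time both the URL key and the content are seen."""
    key = url_key(url)
    if key in seen_urls:
        return False
    # Remember the URL even if the content turns out to be a duplicate.
    seen_urls.add(key)

    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if digest in seen_hashes:
        return False
    seen_hashes.add(digest)
    return True
```

In a real pipeline the URL check would run before the fetch and the hash check after it, since the content is only known once the page has been scraped.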

Planning initial and follow-up searches

The first model call turns the topic into search queries:

    def _initial_queries(self, topic: str, count: int) -> list[str]:
        prompt = dedent(
            f"""
            Create {count} diverse web search queries for researching this topic:
            {topic}

            Cover background, recent developments, primary sources, criticism, and data.
            Include at least one query likely to find primary sources or datasets.
            Return JSON only in this shape: {{"queries": ["..."]}}
            """
        ).strip()
        return self._query_list(prompt, count, fallback=[topic])
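
The _query_list() helper is not shown in this section. Its parsing side might look roughly like the sketch below, which assumes the model reply contains the {"queries": [...]} object requested in the prompt, possibly wrapped in extra prose. The standalone function parse_query_list is a hypothetical name for illustration. The real helper also sends the prompt to Venice, which is left out here.

```python
import json


def parse_query_list(raw: str, count: int, fallback: list[str]) -> list[str]:
    """Extract up to `count` unique queries from a {"queries": [...]} reply."""
    text = raw.strip()
    # Models sometimes wrap JSON in prose or code fences; keep the outermost object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return fallback[:count]
    try:
        data = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return fallback[:count]

    raw_queries = data.get("queries") if isinstance(data, dict) else None
    if not isinstance(raw_queries, list):
        return fallback[:count]

    queries: list[str] = []
    seen: set[str] = set()
    for item in raw_queries:
        if not isinstance(item, str):
            continue
        query = " ".join(item.split())  # Collapse runs of whitespace.
        key = query.lower()
        if query and key not in seen:
            queries.append(query)
            seen.add(key)
    return queries[:count] or fallback[:count]
```

Falling back to the bare topic means a malformed reply degrades the search plan instead of aborting the run.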

After each research pass except the last, the updated agent runs a deliberate gap-analysis step. It looks at the current notes, counts source clusters by domain, asks Venice which coverage is missing, writes those gaps to the artifacts, and then uses the resulting queries for the next pass.

Start by tracking source balance:

from urllib.parse import urlparse


def _source_cluster_counts(notes: list[SourceNote]) -> list[dict[str, object]]:
    total = len(notes)
    if total == 0:
        return []

    clusters: dict[str, list[str]] = {}
    for note in notes:
        cluster = _source_cluster(note)
        clusters.setdefault(cluster, []).append(note.source_id)

    return [
        {
            "cluster": cluster,
            "source_count": len(source_ids),
            "source_share": round(len(source_ids) / total, 3),
            "source_ids": source_ids,
        }
        for cluster, source_ids in sorted(
            clusters.items(), key=lambda item: (-len(item[1]), item[0])
        )
    ]


def _source_cluster(note: SourceNote) -> str:
    url = note.canonical_url or note.final_url or note.url
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host or "unknown"


def _source_balance_digest(notes: list[SourceNote], limit: int = 8) -> str:
    clusters = _source_cluster_counts(notes)
    if not clusters:
        return "No source clusters yet."

    total = len(notes)
    lines = [
        f"- {cluster['cluster']}: {cluster['source_count']}/{total} sources "
        f"({cluster['source_share']:.0%}); IDs: {', '.join(cluster['source_ids'])}"
        for cluster in clusters[:limit]
    ]
    return "\n".join(lines)

This gives the agent a simple way to notice source-cluster capture. If all the sources come from a single company, a single framework, or a single domain, the follow-up queries should deliberately broaden the source set instead of collecting more of the same. Now use that balance information to create follow-up searches:

    def _follow_up_queries(self, topic: str, notes: list[SourceNote], count: int) -> list[str]:
        digest = _source_digest(notes, max_chars=9000)
        source_balance = _source_balance_digest(notes)
        prompt = dedent(
            f"""
            We are researching: {topic}

            Current notes:
            {digest}

            Source balance:
            {source_balance}

            Create {count} follow-up web search queries that fill gaps, verify important claims,
            find primary evidence, and look for dissenting evidence.
            If one source domain, vendor, framework, product, or perspective is overrepresented,
            deliberately broaden beyond it unless the topic explicitly asks for that focus.
            Return JSON only in this shape: {{"queries": ["..."]}}
            """
        ).strip()
        return self._query_list(prompt, count, fallback=[topic])

The latest reference implementation wraps all of this in _gap_follow_up_queries(), which asks Venice to return both gap records and queries:

    def _gap_follow_up_queries(
        self, topic: str, notes: list[SourceNote], count: int
    ) -> tuple[list[dict[str, str]], list[str]]:
        if not notes:
            return [], [topic]

        digest = _source_digest(notes, max_chars=12000)
        source_balance = _source_balance_digest(notes)
        prompt = dedent(
            f"""
            Identify coverage gaps before the next research pass.

            Research topic:
            {topic}

            Current source notes:
            {digest}

            Source balance:
            {source_balance}

            Find important missing coverage that would improve a deep research report.
            Look specifically for primary sources, technical concepts, dissenting views,
            overrepresented source clusters, and claims that need verification.

            Return JSON only in this shape:
            {{"gaps": [{{"missing": "...", "why_it_matters": "...", "query": "..."}}],
              "queries": ["targeted web search query"]}}
            """
        ).strip()
        response = self.venice.chat(
            [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": prompt},
            ],
            temperature=0.3,
            max_tokens=900,
        )

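        try:
            data = json.loads(response)
            gaps = _clean_gap_records(data.get("gaps"))
            queries = _clean_string_list(data.get("queries"))
        except (json.JSONDecodeError, AttributeError):
            # Malformed or non-object JSON: fall back to the original topic.
            return [], [topic]
        if not queries:
            # No explicit query list: reuse the query attached to each gap.
            queries = [gap["query"] for gap in gaps if gap.get("query")]
        return gaps, queries[:count]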

When --artifacts is enabled, these records are written to research_gaps.jsonl, which gives you an audit trail of why the agent ran a particular second-pass query.

The parser should be forgiving. If the model returns malformed JSON, the agent falls back to the original topic:

    def _query_list(self, prompt: str, count: int, fallback: list[str]) -> list[str]:
        response = self.venice.chat(
            [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": prompt},
            ],
            temperature=0.4,
            max_tokens=500,
        )
        try:
            data = json.loads(response)
            queries = data.get("queries", [])
        except (json.JSONDecodeError, AttributeError):
            queries = []

        clean_queries = [
            query.strip()
            for query in queries
            if isinstance(query, str) and query.strip()
        ]
        return (clean_queries or fallback)[:count]

This pattern is worth using throughout agent code: ask for structured output, parse it, and provide a simple fallback when the output is unusable.
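
Models sometimes wrap otherwise valid JSON in a Markdown code fence or a sentence of preamble, which makes a plain `json.loads()` call fail. The sketch below shows one way to be more forgiving before giving up and using the fallback. `parse_json_object` is a hypothetical helper name, not something taken from the reference implementation.

```python
import json


def parse_json_object(raw: str) -> dict:
    """Best-effort parse of a model reply that should contain one JSON object."""
    text = raw.strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, which skips code fences
        # and any prose the model wrapped around the object.
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            return {}
        try:
            data = json.loads(text[start : end + 1])
        except json.JSONDecodeError:
            return {}
    # A bare list or string is valid JSON but not the shape we asked for.
    return data if isinstance(data, dict) else {}
```

Returning an empty dict on every failure path keeps the calling code simple: `parse_json_object(response).get("queries", [])` never raises, and an empty result triggers the same fallback as before.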

Reading and summarizing sources

Now we collect the source notes. The agent runs each query, fetches each result through Venice scrape, splits the Markdown into chunks, and summarizes the useful evidence.

    def _collect_notes(
        self,
        topic: str,
        queries: list[str],
        results_per_query: int,
        seen_source_keys: set[str],
        seen_content_hashes: set[str],
        notes: list[SourceNote],
        iteration: int,
    ) -> None:
        for query in queries:
            if self.max_sources is not None and len(notes) >= self.max_sources:
                return

            self.progress(f"Searching: {query}")
            try:
                results = self.web.search(query, limit=results_per_query)
            except Exception as exc:
                self._record_error("search", exc, query=query)
                continue

            self.artifacts.write(
                "search_results",
                {"iteration": iteration, "query": query, "results": results},
            )

            for result in results:
                if self.max_sources is not None and len(notes) >= self.max_sources:
                    return

                source_key = result.canonical_url or result.url
                if source_key in seen_source_keys:
                    self.artifacts.write("dedupe", {"reason": "canonical_url", "url": result.url})
                    continue

                seen_source_keys.add(source_key)
                source_id = f"S{len(notes) + 1}"
                note = self._read_source(topic, query, source_id, result, seen_source_keys, seen_content_hashes)
                if note is not None:
                    notes.append(note)

Individual search and fetch failures should not stop the whole run. The public web is messy: some pages block scraping, some return PDFs, some are down, and some redirect to unexpected places. A research agent should keep moving and record what failed.

Here is the source-reading method:

    def _read_source(
        self,
        topic: str,
        query: str,
        source_id: str,
        result: SearchResult,
        seen_source_keys: set[str],
        seen_content_hashes: set[str],
    ) -> SourceNote | None:
        self.progress(f"Reading {source_id}: {result.title}")
        try:
            page = self.web.fetch(result)
        except Exception as exc:
            self._record_error("fetch", exc, query=query, url=result.url, source_id=source_id)
            return None

        if page.content_hash in seen_content_hashes:
            self.artifacts.write(
                "dedupe",
                {"reason": "content_hash", "source_id": source_id, "url": result.url},
            )
            return None
        seen_content_hashes.add(page.content_hash)

        chunks = self._summarize_chunks(topic, query, source_id, page)
        if not chunks:
            self._record_error(
                "summarize_chunk",
                VeniceError("no chunks could be summarized"),
                query=query,
                url=result.url,
                source_id=source_id,
            )
            return None

        summary = self._summarize_source(topic, query, source_id, page, chunks)
        note = SourceNote(
            source_id=source_id,
            title=page.title,
            url=result.url,
            canonical_url=page.canonical_url,
            final_url=page.final_url,
            query=query,
            rank=result.rank,
            snippet=result.snippet,
            provider=result.provider,
            retrieved_at=page.retrieved_at,
            content_type=page.content_type,
            content_hash=page.content_hash,
            chunks=chunks,
            summary=summary,
        )
        self.artifacts.write("source_notes", note)
        return note
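
The content-hash check above catches mirrors and syndicated copies that live under different URLs. How `page.content_hash` is computed is not shown in this section, so the sketch below is only an assumption: a SHA-256 digest over whitespace-normalized, lowercased text. `content_fingerprint` and the sample pages are hypothetical.

```python
import hashlib
import re


def content_fingerprint(text: str) -> str:
    """SHA-256 of the text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


seen: set[str] = set()
pages = {
    "https://example.com/a": "Agents   plan, act,\nand observe.",
    "https://mirror.example.org/a": "agents plan, act, and observe.",
    "https://example.com/b": "A different page entirely.",
}
kept = []
for url, text in pages.items():
    digest = content_fingerprint(text)
    if digest in seen:
        continue  # same content under a different URL
    seen.add(digest)
    kept.append(url)

print(kept)
```

Normalizing before hashing is a design choice: it treats pages that differ only in spacing or letter case as duplicates, at the cost of occasionally merging pages that a stricter hash would keep apart.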

For each source chunk, ask Venice for a short evidence summary and exact quotes:

    def _summarize_chunks(
        self,
        topic: str,
        query: str,
        source_id: str,
        page: WebPage,
    ) -> tuple[EvidenceChunk, ...]:
        evidence: list[EvidenceChunk] = []
        for chunk in page.chunks[: self.max_chunks_per_source]:
            prompt = dedent(
                f"""
                Topic: {topic}
                Search query: {query}
                Source ID: {source_id}
                Chunk ID: {chunk.chunk_id}
                Source title: {page.title}
                Source URL: {page.final_url}

                Source chunk:
                {chunk.text}

                Extract only evidence relevant to the topic.
                Return JSON only in this shape:
                {{"summary": "...", "quotes": ["short exact quote", "..."]}}
                """
            ).strip()

            try:
                response = self.venice.chat(
                    [
                        {"role": "system", "content": SYSTEM_PROMPT},
                        {"role": "user", "content": prompt},
                    ],
                    temperature=0.1,
                    max_tokens=600,
                )
                data = json.loads(response)
                evidence.append(
                    EvidenceChunk(
                        chunk_id=chunk.chunk_id,
                        text=chunk.text,
                        summary=str(data.get("summary", "")).strip(),
                        quotes=tuple(
                            quote.strip()
                            for quote in data.get("quotes", [])
                            if isinstance(quote, str) and quote.strip()
                        ),
                    )
                )
            except Exception as exc:
                self._record_error("summarize_chunk", exc, query=query, url=page.final_url, source_id=source_id)
                continue

        return tuple(evidence)
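
The prompt asks for exact quotes, but nothing in the method above checks that the model complied. As an optional extra that is not part of the code shown here, a sketch like the following could keep only quotes that literally occur in the chunk. `verified_quotes` and `_squash` are hypothetical helper names.

```python
import re


def _squash(text: str) -> str:
    # Collapse whitespace so line wrapping does not cause false mismatches.
    return re.sub(r"\s+", " ", text).strip().lower()


def verified_quotes(quotes: list[str], chunk_text: str) -> list[str]:
    """Return only the quotes that literally occur in the chunk text."""
    haystack = _squash(chunk_text)
    return [quote for quote in quotes if _squash(quote) and _squash(quote) in haystack]
```

The comparison ignores case and whitespace differences only. A paraphrased or stitched-together quote fails the check and is dropped, which is the intended behavior for an audit trail.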

Then collapse the chunk summaries into a source note:

    def _summarize_source(
        self,
        topic: str,
        query: str,
        source_id: str,
        page: WebPage,
        chunks: tuple[EvidenceChunk, ...],
    ) -> str:
        chunk_digest = _chunk_digest(chunks, max_chars=9000)
        prompt = dedent(
            f"""
            Topic: {topic}
            Search query: {query}
            Source ID: {source_id}
            Source title: {page.title}
            Source URL: {page.final_url}

            Chunk evidence:
            {chunk_digest}

            Synthesize a source note using only the chunk evidence. Include:
            - key facts with dates/numbers where present
            - any limitations or bias in the source
            - useful exact wording from quotes if it is short

            Keep the note under 180 words and refer to the source as [{source_id}].
            """
        ).strip()
        return self.venice.chat(
            [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": prompt},
            ],
            temperature=0.1,
            max_tokens=500,
        )

This two-pass summarization is what makes the agent feel more reliable than a simple "summarize these URLs" script. The model first reads the source chunks, then writes a source-level note from those extracted pieces of evidence.

Writing the final report

Once the agent has source notes, it can write the report. Start with a single-pass report writer:

    def _write_report(self, topic: str, notes: list[SourceNote]) -> str:
        if not notes:
            return (
                f"# Research report: {topic}\n\n"
                "No usable web sources were collected. Check your network connection or try a narrower topic."
            )

        prompt = dedent(
            f"""
            Research topic:
            {topic}

            Source notes:
            {_source_digest(notes, max_chars=45000)}

            Write a detailed source-backed Markdown research survey.

            Requirements:
            - Start with a precise H1 title.
            - Open with "## Overview".
            - Use topic-specific sections.
            - Use footnote-style citation markers like [^1] and [^2].
            - Do not cite with internal source IDs like [S1] in the report body.
            - Do not include uncited factual claims.
            - Avoid source-cluster capture from one vendor, domain, framework, or viewpoint.
            - Include uncertainty, contradictions, and missing context where relevant.
            - End with "## References" as a numbered list ordered by first citation.
            """
        ).strip()

        return self.venice.chat_stream(
            [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": prompt},
            ],
            temperature=0.2,
            max_tokens=7000,
        )
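
The prompt above asks the model for footnote markers numbered by first citation. If you would rather enforce that ordering in code than trust the model, one option is a small post-processing step like the sketch below. `renumber_citations` and the assumption that drafts cite sources as `[S<n>]` are illustrative and not taken from the reference implementation.

```python
import re


def renumber_citations(body: str) -> tuple[str, list[str]]:
    """Rewrite [S<n>] markers as [^k], numbering k by first appearance."""
    order: list[str] = []

    def replace(match: re.Match) -> str:
        source_id = match.group(1)
        if source_id not in order:
            order.append(source_id)
        return f"[^{order.index(source_id) + 1}]"

    # re.sub calls replace() left to right, so order reflects first citation.
    return re.sub(r"\[(S\d+)\]", replace, body), order


text, order = renumber_citations("Agents plan [S3]. They act [S1] and replan [S3].")
print(text)
print(order)
```

The returned `order` list maps footnote numbers back to source IDs, so the references section can be generated from the same data instead of being written by the model.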

The reference implementation goes further for deep reports: it asks Venice for an outline, drafts each report section separately, and then asks for a final editor pass that assembles the finished report and converts internal source IDs into footnote-style citations. That staged approach is useful for long-form research output because a single giant prompt often compresses too much.

The updated prompts also push the report toward a broad, source-backed survey rather than a thin decision guide. If the source base is skewed toward one cluster, the editor prompt tells Venice to acknowledge that skew and avoid presenting it as representative of the whole field.

Add the digest helpers:

def _chunk_digest(chunks: tuple[EvidenceChunk, ...], max_chars: int) -> str:
    parts = []
    for chunk in chunks:
        quote_text = "; ".join(chunk.quotes)
        parts.append(
            f"{chunk.chunk_id}: {chunk.summary}"
            + (f"\nQuotes: {quote_text}" if quote_text else "")
        )
    return "\n\n".join(parts)[:max_chars]


def _source_digest(notes: list[SourceNote], max_chars: int) -> str:
    chunks = [
        "\n".join(
            [
                f"[{note.source_id}] {note.title}",
                f"URL: {note.final_url or note.url}",
                f"Canonical URL: {note.canonical_url}",
                f"Found via: {note.query}",
                f"Provider/rank: {note.provider}/{note.rank}",
                f"Retrieved: {note.retrieved_at}",
                f"Content hash: {note.content_hash}",
                f"Note: {note.summary}",
                f"Chunk evidence: {_chunk_digest(note.chunks, max_chars=1000)}",
            ]
        )
        for note in notes
    ]
    return "\n\n".join(chunks)[:max_chars]
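
One caveat with the `[:max_chars]` slices is that they can cut the last record in the middle of a sentence or URL. If that matters for your prompts, a possible alternative is to keep only whole records that fit the budget. `join_within_budget` is a hypothetical helper sketched under that assumption, not the reference behavior.

```python
def join_within_budget(records: list[str], max_chars: int, sep: str = "\n\n") -> str:
    """Join whole records in order, stopping before the budget is exceeded."""
    kept: list[str] = []
    used = 0
    for record in records:
        # The separator only costs characters once there is a previous record.
        cost = len(record) + (len(sep) if kept else 0)
        if used + cost > max_chars:
            break
        kept.append(record)
        used += cost
    return sep.join(kept)
```

The trade-off is that a single oversized record yields an empty digest, whereas slicing always returns something, so you may want to fall back to slicing when nothing fits.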

Finally, add error recording:

    def _record_error(
        self,
        stage: str,
        exc: Exception,
        *,
        query: str = "",
        url: str = "",
        source_id: str = "",
        provider: str = "",
    ) -> None:
        message = str(exc)
        self.progress(f"{stage.replace('_', ' ').title()} failed: {message}")
        self.artifacts.write(
            "errors",
            CollectionError(
                stage=stage,
                message=message,
                query=query,
                url=url,
                source_id=source_id,
                provider=provider,
            ),
        )

At this point, the main research loop is in place.

Adding the CLI

Now we need a command-line entry point. Create main.py:

from __future__ import annotations

import argparse
from pathlib import Path

from dotenv import load_dotenv

from research_agent.agent import (
    DEFAULT_ITERATIONS,
    DEFAULT_MAX_CHUNKS_PER_SOURCE,
    DEFAULT_MAX_SOURCES,
    DEFAULT_QUERY_COUNT,
    DEFAULT_REPORT_STYLE,
    DEFAULT_RESULTS_PER_QUERY,
    ResearchAgent,
)
from research_agent.artifacts import ArtifactWriter
from research_agent.venice import VeniceClient, VeniceError
from research_agent.web import WebSearch


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Run a minimal deep research agent powered by Venice AI.",
    )
    parser.add_argument("topic", nargs="+", help="Research topic, wrapped in quotes for best results.")
    parser.add_argument("--model", help="Venice model name. Defaults to VENICE_MODEL or openai-gpt-55.")
    parser.add_argument("--iterations", type=int, default=DEFAULT_ITERATIONS)
    parser.add_argument("--queries", type=int, default=DEFAULT_QUERY_COUNT)
    parser.add_argument("--results", type=int, default=DEFAULT_RESULTS_PER_QUERY)
    parser.add_argument("--output", "--markdown-output", dest="output", type=Path)
    parser.add_argument("--artifacts", type=Path, help="Optional directory for JSONL research artifacts.")
    parser.add_argument("--providers", default="duckduckgo", help="Comma-separated providers: duckduckgo, arxiv.")
    parser.add_argument("--max-sources", type=int, default=DEFAULT_MAX_SOURCES)
    parser.add_argument("--chunk-chars", type=int, default=3000)
    parser.add_argument("--max-chunks-per-source", type=int, default=DEFAULT_MAX_CHUNKS_PER_SOURCE)
    parser.add_argument(
        "--report-style",
        choices=["brief", "standard", "deep"],
        default=DEFAULT_REPORT_STYLE,
        help=f"Final report depth. Default: {DEFAULT_REPORT_STYLE}.",
    )
    parser.add_argument("--quiet", action="store_true", help="Hide progress messages.")
    return parser.parse_args()

The CLI exposes the knobs you will actually tune while researching:

Option	What it controls
`--iterations`	Number of research passes
`--queries`	Search queries generated per pass
`--results`	Results read per provider for each query
`--providers`	Search providers, such as `duckduckgo` or `duckduckgo,arxiv`
`--max-sources`	Maximum number of usable sources to collect
`--chunk-chars`	Approximate chunk size before source evidence extraction
`--max-chunks-per-source`	Number of chunks summarized per source
`--report-style`	Depth of the final report: `brief`, `standard`, or `deep`
`--artifacts`	Directory for JSONL audit records
`--output`	Path for the final Markdown report

Now wire everything together:

def main() -> int:
    load_dotenv()
    args = parse_args()
    topic = " ".join(args.topic)

    try:
        venice = VeniceClient.from_env(model=args.model)
        progress = None if args.quiet else lambda message: print(f"[agent] {message}")
        provider_names = [name.strip() for name in args.providers.split(",") if name.strip()]

        with WebSearch.from_provider_names(
            provider_names,
            chunk_chars=args.chunk_chars,
            scraper=venice.scrape,
        ) as web:
            agent = ResearchAgent(
                venice=venice,
                web=web,
                artifacts=ArtifactWriter(args.artifacts),
                progress=progress,
                max_sources=args.max_sources,
                max_chunks_per_source=args.max_chunks_per_source,
                report_style=args.report_style,
            )
            report = agent.run(
                topic,
                iterations=args.iterations,
                query_count=args.queries,
                results_per_query=args.results,
            )
    except ValueError as exc:
        print(f"Configuration error: {exc}")
        return 1
    except VeniceError as exc:
        print(f"Venice API error: {exc}")
        return 1

    if args.output:
        args.output.parent.mkdir(parents=True, exist_ok=True)
        args.output.write_text(report.markdown, encoding="utf-8")
        print(f"\nSaved report to {args.output}")
    else:
        print()
        print(report.markdown)

    if report.artifacts_dir:
        print(f"Saved research artifacts to {report.artifacts_dir}")

    return 0


if __name__ == "__main__":
    raise SystemExit(main())

This gives us a working local research CLI.

Running the agent

Run a quick research pass:

uv run python main.py "How are AI agents changing software engineering workflows?"

Write the report to a Markdown file:

uv run python main.py "state of open source LLM inference in 2026" \
  --output reports/inference.md

Use more sources and more providers:

uv run python main.py "agentic coding research" \
  --providers duckduckgo,arxiv \
  --iterations 3 \
  --queries 5 \
  --results 4 \
  --max-sources 12

Choose the final report style:

uv run python main.py "AI agents in software engineering" --report-style deep

Use brief for a concise source-backed briefing, standard for a fuller survey, and deep for the staged outline/section/editor workflow.

Save auditable artifacts:

uv run python main.py "privacy tradeoffs in hosted LLM APIs" \
  --output reports/privacy.md \
  --artifacts runs/privacy

When artifacts are enabled, you will see files like:

runs/privacy/
  queries.jsonl
  research_gaps.jsonl
  search_results.jsonl
  fetches.jsonl
  source_chunks.jsonl
  chunk_summaries.jsonl
  source_notes.jsonl
  dedupe.jsonl
  errors.jsonl
  report_outline.jsonl
  report_sections.jsonl
  report_editor.jsonl
  reports.jsonl

These files are useful when you want to understand how the agent reached a conclusion. For example, source_notes.jsonl shows the summarized source evidence, research_gaps.jsonl shows why follow-up searches were generated, and errors.jsonl shows the pages that failed during search, scraping, or summarization.
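
Because the artifacts are plain JSONL, a few lines of Python are enough to inspect a run. The sketch below tallies errors.jsonl by stage. It assumes each line is a JSON object with a `stage` field, matching the `CollectionError` records written by `_record_error()`. The exact serialized layout is an assumption, and `count_errors_by_stage` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


def count_errors_by_stage(artifacts_dir: Path) -> Counter:
    """Tally records in errors.jsonl by their "stage" field."""
    path = artifacts_dir / "errors.jsonl"
    counts: Counter = Counter()
    if not path.exists():
        return counts  # a run with no errors may not create the file
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip a partially written line
            if isinstance(record, dict):
                counts[str(record.get("stage", "unknown"))] += 1
    return counts
```

For example, `count_errors_by_stage(Path("runs/privacy")).most_common()` lists the noisiest stages first, which helps tell a blocked site apart from a prompt that keeps producing unparseable output.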

Privacy and reliability notes

A research agent touches several systems, so it helps to be precise about what goes where:

Data boundaries of the private research agent

Layer	What data it sees
Local CLI	Topic, configuration, source notes, artifacts, and final reports stay on your machine
Search providers	Search queries are sent to the provider you choose, such as DuckDuckGo or arXiv
Venice scrape	Public source URLs are sent to the Venice scrape endpoint
Venice chat completions	Prompts, source chunks, source notes, and report-generation instructions are sent to Venice
Output files	Markdown reports and JSONL artifacts are written locally

If you want to keep more of the research chain inside Venice, you can adapt the provider layer to call the Venice POST /augment/search endpoint instead of querying DuckDuckGo directly. The reference implementation uses lightweight public providers so the demo stays easy to run and understand.

For reliability, keep these defaults conservative:

Use retries for Venice calls and web requests.
Add a small --request-delay if you are reading many pages from the same host.
Cap --max-sources so broad topics do not run indefinitely.
Save --artifacts for important reports so you can audit the final output.
Treat the report as a briefing, not as ground truth. Follow citations back to the original source when accuracy matters.
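
The retry advice in the first item can be sketched as a small wrapper with exponential backoff. This is a generic pattern under stated assumptions, not the retry logic of the reference client: `with_retries` is a hypothetical helper, and it retries on any exception, whereas real code would usually retry only transient failures such as timeouts or rate limits.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    *,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on any exception with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable: a test can supply a function that records the delays instead of actually waiting.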

Testing the pieces

You do not need live web requests or Venice calls to test most of the system. The reference repo uses fake Venice and fake web classes to test the research loop, dedupe behavior, artifacts, and report prompts. A useful first test is URL canonicalization:

from research_agent.models import canonicalize_url


def test_canonicalize_url_removes_tracking_params():
    url = "https://example.com/post?utm_source=x&b=2&a=1#section"
    assert canonicalize_url(url) == "https://example.com/post?a=1&b=2"

Then test that duplicate content is skipped:

from research_agent.models import SearchResult, WebPage, chunk_text


class FakeWeb:
    def search(self, query: str, limit: int = 5) -> list[SearchResult]:
        return [
            SearchResult(title="First source", url="https://example.com/a", snippet="snippet"),
            SearchResult(title="Mirror", url="https://example.com/b", snippet="snippet"),
        ]

    def fetch(self, result: SearchResult) -> WebPage:
        text = "This page contains relevant evidence. " * 5
        return WebPage(
            title=result.title,
            url=result.url,
            final_url=result.url,
            text=text,
            content_hash="same-content",
            chunks=chunk_text(text, chunk_chars=80, overlap=10),
        )

Fakes make agent tests much faster and less flaky. You can verify orchestration logic without depending on live search results, network conditions, or model output.
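
The fake Venice class from the reference repo is not shown here. A minimal sketch of what such a test double might look like follows, assuming it only needs to mimic the `chat(messages, temperature=..., max_tokens=...)` call shape used throughout this tutorial. `FakeVenice` as written below is hypothetical.

```python
import json


class FakeVenice:
    """Hypothetical test double exposing the chat() shape used in this tutorial."""

    def __init__(self, replies: list[str]) -> None:
        self.replies = list(replies)
        self.calls: list[list[dict[str, str]]] = []

    def chat(
        self,
        messages: list[dict[str, str]],
        temperature: float = 0.0,
        max_tokens: int = 0,
    ) -> str:
        # Record the prompt so tests can assert on what the agent asked.
        self.calls.append(messages)
        # Serve canned replies in order, then an empty JSON object.
        return self.replies.pop(0) if self.replies else "{}"


fake = FakeVenice([json.dumps({"queries": ["agent frameworks", "tool calling"]})])
reply = fake.chat(
    [{"role": "user", "content": "plan searches"}],
    temperature=0.4,
    max_tokens=500,
)
print(json.loads(reply)["queries"])
```

Scripting the replies in order keeps each test deterministic, and the recorded `calls` list lets a test assert that a prompt mentions the source balance or the research topic.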

Benchmarking

Many AI providers now have their own deep research workflows, so the reference repo includes a simple benchmark against the Perplexity Deep Research tool. Both agents were asked to write a report on the architecture of AI agent frameworks, and the generated reports were then committed to the GitHub repo. This is not meant to be a formal benchmark. It is a practical way to inspect report structure, source coverage, citation quality, and whether the agent focuses too heavily on a single source cluster. That is also why the updated implementation tracks research_gaps.jsonl and source balance before follow-up searches.

Extending this example

Once the baseline agent works, here are some practical ways to improve it:

Add a Venice search provider using POST /augment/search.
Store reports and artifacts in a small SQLite database instead of JSONL files.
Add source allowlists or blocklists for trusted research domains.
Add PDF support by combining Venice scrape with document parsing for sources that do not expose clean HTML.
Add an evaluation set of topics and expected source types so you can compare research quality after prompt changes.
Add a review step that asks Venice to find unsupported claims in the final report before saving it.

The biggest upgrade is usually better source selection. Query generation helps, but you can also improve quality by preferring primary sources, standards documents, official docs, papers, changelogs, and dataset pages over low-signal summaries.
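
As a starting point for source selection, the sketch below drops blocked hosts and moves preferred hosts to the front while otherwise keeping the search ranking. `rank_urls` and the host sets are hypothetical. Which domains count as preferred or blocked is something you would decide for your own research area.

```python
from urllib.parse import urlparse


def rank_urls(urls: list[str], preferred: set[str], blocked: set[str]) -> list[str]:
    """Drop blocked hosts, then list preferred hosts first (stable order)."""

    def host(url: str) -> str:
        name = urlparse(url).netloc.lower()
        return name[4:] if name.startswith("www.") else name

    allowed = [url for url in urls if host(url) not in blocked]
    # sorted() is stable, so ties keep the search engine's original ranking.
    return sorted(allowed, key=lambda url: host(url) not in preferred)


urls = [
    "https://spam.example/x",
    "https://blog.example/a",
    "https://www.docs.example/b",
    "https://blog.example/c",
]
ranked = rank_urls(urls, preferred={"docs.example"}, blocked={"spam.example"})
print(ranked)
```

Applying this kind of filter before fetching also saves scrape calls, since blocked results never reach the read step.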

Wrapping up

Thanks for reading! We hope this helped you build a practical private research agent with Python and the Venice API. The useful pattern here is not just "ask a model to research something." It is breaking research into auditable steps: plan the searches, collect sources, extract evidence, write source notes, follow up on gaps, and synthesize with citations. Keeping those steps explicit gives us a research workflow that is easier to inspect, test, and improve over time.
