Grados

Latest version: v0.6.33

Page 1 of 4

0.6.9

Added
- Added `scripts/release.py` to bump plugin manifest versions, create the release commit/tag sequence, and optionally push in one command.

Changed
- Changed package versioning from dual static declarations (`pyproject.toml` + `__init__.py`) to `hatch-vcs` dynamic versioning derived from git tags, so normal releases no longer require a manual Python package version bump.
- Changed `src/grados/__init__.py` to read the installed package version via `importlib.metadata.version()` instead of a hardcoded string.
- Changed `publish.yml` to drop the redundant tag-vs-pyproject version verification step after switching to git-tag-derived package versions.

0.6.8

Changed
- Changed the local indexing defaults from Harrier 0.6B / `max_length=32768` to `microsoft/harrier-oss-v1-270m` with `max_length=4096`; Harrier 0.6B remains available as an explicit opt-in for roomier machines.
- Changed section-aware chunking so overlong single paragraphs are re-split by sentence or clause with small overlap before embedding, preventing giant one-paragraph chunks from exploding memory during `grados reindex`.
- Changed embedding runtime diagnostics so `grados setup`, `grados status`, `grados update-db`, and `grados reindex` now surface `max_length`, batch sizing, and clearer OOM guidance instead of opaque allocator failures.
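
The overlong-paragraph re-splitting described above can be sketched as follows. This is an illustrative approximation only: the function name, the character-based budget, the sentence/clause regex, and the one-sentence overlap are all assumptions, and the real chunker may measure length in tokens instead.

```python
import re


def split_overlong_paragraph(
    text: str, max_chars: int = 1000, overlap_sentences: int = 1
) -> list[str]:
    """Split one paragraph on sentence/clause boundaries with small overlap.

    max_chars is a soft target: a chunk can exceed it when the carried-over
    overlap plus a single new sentence is already longer than the budget.
    """
    # Break after ., !, ? or ; followed by whitespace (hypothetical rule).
    sentences = [s for s in re.split(r"(?<=[.!?;])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    fresh = 0  # sentences in `current` that are not yet part of any chunk
    for sentence in sentences:
        if fresh and len(" ".join(current + [sentence])) > max_chars:
            chunks.append(" ".join(current))
            # Carry the tail of the previous chunk forward as overlap.
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
            fresh = 0
        current.append(sentence)
        fresh += 1
    if fresh:
        chunks.append(" ".join(current))
    return chunks
```

For example, `split_overlong_paragraph("A one. B two. C three. D four.", max_chars=14)` yields three chunks in which each chunk repeats the last sentence of the previous one.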

Tests
- Added regression coverage for overlong single-paragraph chunk splitting, conservative local batching, and OOM diagnostic surfacing in the embedding backend.

0.6.7

Added
- Added Phase A indexing configuration (`config.indexing`) with Harrier 0.6B as the default local embedding model.
- Added a dedicated embedding backend abstraction with explicit query/document separation, Harrier prompt support, and model warmup in `grados setup`.
- Added `grados reindex` plus index-manifest compatibility checks so model/chunking changes fail loudly instead of silently mixing old and new embeddings.
- Added `grados client install|list|doctor|remove` so Claude Code and Codex can be registered from the GRaDOS CLI, including bundled skill installation.
- Added native plugin distribution metadata for Claude Code and Codex, including `.claude-plugin/plugin.json`, `.claude-plugin/marketplace.json`, `.agents/plugins/marketplace.json`, `plugins/grados/.codex-plugin/plugin.json`, and plugin-scoped `plugin.mcp.json` copies for both plugin surfaces.
- Added Stage B research-state persistence in `database/research.sqlite3`, including reusable artifact storage and local failure memory.
- Added 8 Stage B MCP tools: `save_research_artifact`, `query_research_artifacts`, `manage_failure_cases`, `get_citation_graph`, `get_papers_full_context`, `build_evidence_grid`, `compare_papers`, and `audit_draft_support`.
- Added a lightweight local citation graph layer by extracting reference DOIs into canonical paper metadata (`cites_json`) and exposing neighbor/common-reference/reverse-citation queries.
- Added a canonical full-text normalization layer so publisher-native XML/HTML and document-style inputs are converted into a shared Markdown contract before indexing and deep reading.
- Added absolute paragraph-coordinate metadata (`paragraph_start`, `paragraph_count`) to retrieval chunks so search hits can be mapped back to canonical source paragraphs in `papers/*.md`.
- Added a typed `PaperSearchResult` boundary for the high-frequency local retrieval path so internal search results no longer have to propagate as another loose `dict[str, Any]` contract.
- Added `tests/LIVE_CHECKS.md` to separate offline contract fixtures from manually triggered live checks for Elsevier, Springer, browser fetch, and local import validation.
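
To show how the paragraph-coordinate metadata above can map a search hit back to source text, here is a minimal sketch. It assumes that paragraphs in `papers/*.md` are separated by blank lines and that `paragraph_start` is a zero-based index; neither detail is stated in these notes, and the function name is hypothetical.

```python
import re


def paragraphs_for_hit(
    markdown: str, paragraph_start: int, paragraph_count: int
) -> list[str]:
    """Return the canonical paragraphs covered by one retrieval chunk.

    Assumes blank-line-separated paragraphs and zero-based indexing.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", markdown) if p.strip()]
    return paragraphs[paragraph_start : paragraph_start + paragraph_count]


doc = "# Title\n\nFirst paragraph.\n\nSecond paragraph.\n\nThird paragraph."
window = paragraphs_for_hit(doc, paragraph_start=1, paragraph_count=2)
```

Storing absolute coordinates rather than copied text means the evidence returned to the user can always be re-read from the canonical file.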

Changed
- Changed semantic retrieval from chunk-only search to abstract-first docs → chunks two-stage retrieval.
- Changed chunking from fixed 1000-character paragraph packing to section-aware chunking with overlap metadata.
- Changed `grados setup` to always prepare browser and embedding runtime assets directly, instead of splitting them across `--all` / `--with`.
- Changed `grados status` to report embedding runtime details, active model, and reindex requirements.
- Changed the repo-local MCP example from the removed `uvx grados[all]` path to the current `uvx grados`.
- Changed the Codex plugin packaging to follow the official local marketplace layout more closely, with `.agents/plugins/marketplace.json` pointing at the self-contained `plugins/grados/` bundle instead of the repo root.
- Changed the local paper contract from "search and deep read only" to a broader Stage B research surface with explicit artifacts, failure memory, citation graph, CAG context packs, and draft-support auditing.
- Changed the skill and README documentation to reflect the expanded 16-tool MCP surface, the `grados client install ...` workflow, and the merged writing-stage guidance in `skills/grados/SKILL.md`.
- Changed the default parser/install surface so `uv tool install grados` now includes Docling by default; `grados[docling]` remains as a compatibility alias and `PyMuPDF` is now a fallback parser behind `Docling -> MinerU -> PyMuPDF`.
- Changed source-of-truth semantics so `papers/*.md` is now the user-facing canonical full-text store, while `database/chroma` is treated as a rebuildable retrieval index.
- Changed `search_saved_papers` from returning index-resident snippets to an "index recall + canonical reread" flow that resolves final evidence windows from `papers/*.md`.
- Changed Elsevier full-text handling from JSON `originalText` as the primary path to XML-first deterministic parsing, preserving publisher-native sections, authors, keywords, and references before rendering canonical Markdown.
- Changed Springer native full-text handling so publisher XML/HTML now enters the shared normalization pipeline instead of being flattened early into ad hoc plain text.
- Changed canonical paper frontmatter and reindex behavior so `authors/year/journal` survive in `papers/*.md`, allowing `grados reindex` to rebuild retrieval metadata from the source library alone.
- Changed the index manifest to schema version `3` with chunking strategy `section-aware-v2`; existing local indexes must be rebuilt with `grados reindex`.
- Changed save/import/parse receipts so Chroma indexing failures are surfaced as warnings / partial-success instead of being silently swallowed after the canonical Markdown file is written.
- Changed Marker parsing so `config.extract.parsing.marker_timeout` now enforces a real subprocess timeout instead of being a dead config knob; timed-out Marker runs now fall back cleanly to the next parser.
- Changed parser runtime setup and diagnostics so `grados setup` now prewarms Docling models, while Docling/Marker failures are surfaced through standardized warning/debug messages instead of silent fallbacks.
- Changed local saved-paper retrieval so lexical fallback and result snippets now prefer canonical content from `papers/*.md` when available, instead of continuing to lean on Chroma doc copies for the final returned evidence text.
- Changed local saved-paper retrieval so overlapping chunk hits for the same paper are merged into a single canonical paragraph window before evidence is returned, reducing duplicate or fragmented excerpts.
- Changed canonical save ordering so `save_paper_markdown()` writes `papers/*.md` before refreshing Chroma, preventing index-only state when canonical Markdown writes fail.
- Changed canonical paper frontmatter handling to use `python-frontmatter` + `PyYAML` for save/read/list flows, so multiline YAML values and colon-rich metadata round-trip correctly through `papers/*.md`.
- Changed `list_saved_papers()` frontmatter scanning to read until the closing `---` marker (bounded to 4 KB) instead of truncating metadata after 500 characters.
- Changed publisher fetch handling so `metadata_only` outcomes, typed publisher metadata, and asset hints now survive the TDM waterfall into user-visible extraction receipts, instead of collapsing into generic fetch failures.
- Changed OA/Sci-Hub fetch failures and Chroma filter/projection fallbacks to surface warnings, degraded-filter markers, and logged exceptions instead of silently dropping into opaque fallback behavior.
- Changed embedding backend loading to use a process-local cache keyed by backend-significant config, so repeated `grados setup`, `index_paper()`, and `search_papers()` calls in one process reuse the same heavy model runtime instead of reinitializing it.
- Changed local citation-graph analysis so `research_tools.get_citation_graph` now rebuilds local citation relationships from canonical records in `papers/*.md` instead of depending on Chroma doc listings as an internal source.
- Changed the canonical paper-store boundary so `load_paper_record()` and `list_saved_papers()` now return explicit dataclasses, with `server`, `importing`, and `research_tools` migrated to attribute-based access instead of loose dict payloads.
- Changed typed local-search results from transitional dict-compatible wrappers to plain `PaperSearchResult` dataclasses, removing temporary `.get(...)` / item-access compatibility shims after callers were migrated.
- Changed Stage B research helpers so their internal result boundaries are now explicit dataclasses, with MCP-facing handlers serializing them only at the outer boundary instead of propagating nested dict payloads through the service layer.
- Changed browser fetch and local index-stat payloads to typed result objects, reducing remaining high-frequency `dict[str, Any]` contracts in fetch/search orchestration paths.
- Changed the MCP server layout from one monolithic `server.py` file to a thin entrypoint plus domain registration modules: `search_tools`, `library_tools`, `research_tools_api`, and `admin_tools`.
- Changed `fetch`, `parse`, and browser automation orchestration from hard-coded `if/elif` waterfalls to static strategy registries, so new publishers, parsers, and browser flows can be added without inflating the core dispatch loops.
- Changed the TDM stage from publisher-name branching to a provider registry, and changed non-PDF normalization to a format resolver that maps `markdown/text/html/xml` inputs onto explicit normalization strategies.
- Changed local saved-paper retrieval to use an index-first candidate pipeline before canonical hydration, so search and Stage B audit tools only reread candidate `papers/*.md` files instead of reopening the whole library on each query.
- Changed `audit_draft_support` so `misattributed` remains reserved for resolvable author-year citations; numeric citations now stay in a conservative support-only mode until bibliography mapping exists.
- Changed storage internals so `storage/vector.py` now acts as a thinner facade over dedicated `chunking`, `chroma_client`, and `hydration` helpers, and `research_tools` now consumes public chunking APIs instead of importing private vector symbols.
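
The move from `if/elif` waterfalls to static strategy registries, mentioned in the list above, follows a common pattern that can be sketched like this. All names here (`PARSERS`, `register_parser`, `run_waterfall`, and the two toy parsers) are hypothetical and are not taken from the GRaDOS source.

```python
from typing import Callable, Optional

# A parser takes raw bytes and returns Markdown, or None if it cannot handle them.
ParserFn = Callable[[bytes], Optional[str]]

PARSERS: dict[str, ParserFn] = {}


def register_parser(name: str) -> Callable[[ParserFn], ParserFn]:
    """Decorator that adds a parser to the static registry under `name`."""
    def decorator(fn: ParserFn) -> ParserFn:
        PARSERS[name] = fn
        return fn
    return decorator


@register_parser("strict")
def parse_strict(data: bytes) -> Optional[str]:
    # Toy parser: only accepts inputs that start with the PDF magic bytes.
    return "parsed by strict" if data.startswith(b"%PDF") else None


@register_parser("fallback")
def parse_fallback(data: bytes) -> Optional[str]:
    # Toy parser: accepts anything.
    return "parsed by fallback"


def run_waterfall(order: list[str], data: bytes) -> Optional[tuple[str, str]]:
    """Try strategies in the configured order; skip names that are not registered."""
    for name in order:
        parser = PARSERS.get(name)
        if parser is None:
            continue  # unknown strategy names are filtered, not fatal
        result = parser(data)
        if result is not None:
            return name, result
    return None
```

Adding a new strategy then means registering one more function; the dispatch loop itself does not change.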

Removed
- Removed the Claude-only startup hook at `hooks/hooks.json`.
- Removed the separate `grados-writing` skill split; its useful Stage B writing guidance now lives in `skills/grados/SKILL.md`.

Tests
- Added regression coverage for keychain-backed secret resolution, automatic `config.json` secret migration + plaintext clearing, and the `grados auth` CLI flows.
- Added regression coverage for remote-metadata helper upserts/queries, search-time metadata-cache population, extract-time `metadata_only`/`challenge`/`fulltext` status backfills, and canonical `paper_id` / `doc_id` metadata joins.
- Added regression coverage for phase-1 corpus defaults so new canonical saves write `corpus/tier/workset` metadata and older Chroma records without those fields still hydrate as `canonical/stable`.
- Added regression coverage for browser-first fetch strategy defaults, legacy fetch-strategy alias compatibility, preserved browser challenge states, browser success short-circuiting, and user-facing `Via/State` receipt lines.
- Added Stage B smoke coverage for research artifacts, failure memory, citation graphs, full-context retrieval, evidence grids, paper comparison, and draft-support auditing.
- Added smoke coverage for client install flows and plugin manifests.
- Added regression coverage for Docling-first parsing, Elsevier XML deterministic normalization, and canonical paragraph reread after Chroma retrieval.
- Added end-to-end regression coverage for the full "index recall + canonical reread" path, including user-facing `search_saved_papers` output after an indexed paper's canonical Markdown file is updated.
- Added regression coverage for fetch/parser/browser strategy registries so order preservation and unknown-strategy filtering stay stable during future extensions.
- Added regression coverage for canonical-Markdown-first saves so failed `papers/*.md` writes cannot leave Chroma in an index-only state.
- Added regression coverage for YAML frontmatter round-trips, long-header saved-paper listing, and visible Chroma/OA/Sci-Hub fallback warnings.
- Added regression coverage for process-local embedding cache reuse and invalidation, including shared backend reuse across `index_paper()` and `search_papers()`.
- Added offline contract-fixture coverage for Elsevier metadata fallback, Springer waterfall fallback, browser anti-bot HTML masquerading as PDF, and nested local-import warning paths.
- Added regression coverage for metadata-only extraction receipts, typed publisher metadata persistence, candidate-only canonical hydration, and numeric-citation support-only auditing.
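
The process-local embedding cache that the tests above cover can be pictured with a sketch like the following. It assumes the "backend-significant config" reduces to a model name and `max_length`; the real key may include more fields, and every name below is hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class BackendKey:
    """Hashable cache key built from the settings that change the runtime."""
    model: str
    max_length: int


_BACKENDS: dict[BackendKey, Any] = {}


def get_backend(
    model: str, max_length: int, loader: Callable[[str, int], Any]
) -> Any:
    """Return a cached backend, calling `loader` only on the first request."""
    key = BackendKey(model, max_length)
    if key not in _BACKENDS:
        _BACKENDS[key] = loader(model, max_length)
    return _BACKENDS[key]


def clear_backends() -> None:
    """Invalidate the cache, for example after the configuration changes."""
    _BACKENDS.clear()
```

Two calls with the same model and `max_length` share one loaded runtime, while a different `max_length` produces a separate entry.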

0.6.6

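**GRaDOS has completed a full rewrite from TypeScript/Node.js to Python.** As of this release, GRaDOS is a pure-Python MCP server distributed as a standard PyPI package and no longer requires a Node.js runtime. All TypeScript capabilities from 0.6.5 carry over to the Python implementation.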

Added — Runtime & Packaging

- Rewrote the entire codebase (~6K LoC TypeScript → ~3.5K LoC Python) as a `hatchling`-built Python package (`src/grados/`).
- Added `uv tool install "grados[all]"` as the primary installation path; `uvx "grados[all]"` for zero-install MCP client configuration.
- Added 7 optional dependency groups: `semantic`, `zotero`, `ocr`, `marker`, `docling`, `all`, `full`.
- Historical note (2026-04-05): the packaging surface was later simplified after runtime and dependency audits. Current public install paths are `uv tool install grados`, `uvx grados`, and compatibility extras `grados[docling]`, `grados[marker]`, `grados[full]`.
- Added `py.typed` (PEP 561) marker for downstream type-checker support.
- Added `[tool.hatch.build.targets.sdist]` exclude rules to keep source distributions clean.
- Added CI workflow for pre-publish verification and post-publish PyPI smoke tests.

Added — CLI

- Added `grados setup [--all] [--with browser,models]`: interactive setup wizard with runtime asset downloads.
- Added `grados status`: health check displaying versions, dependencies, API keys, and runtime assets.
- Added `grados paths`: file path overview with file counts and mode detection.
- Added `grados update-db`: batch-index `papers/` into ChromaDB.
- Added `grados import-pdfs --from /path [--recursive] [--glob] [--copy-to-library]`: bulk local PDF library import.
- Added `grados migrate-config`: legacy TS installation migration (compatibility command).
- Added `grados version`: version display.

Added — MCP Tools & Resources

- Added `get_saved_paper_structure` tool: deterministic structural navigation (title, section outline, preview, word count, assets summary) for low-token decision-making before deep reads.
- Added `import_local_pdf_library` tool: agent-facing entry point for batch PDF import with DOI inference, content-hash dedup, and progress summary.
- Added `grados://papers/index` resource: list all saved papers with canonical metadata.
- Added `grados://papers/{safe_doi}` resource template: low-token paper overview (not full text).

Added — Canonical Storage (ChromaDB-first)

- Added canonical-first Chroma architecture with two collections:
  - `papers_docs`: one document-level record per paper (full normalized Markdown + structured metadata).
  - `papers_chunks`: retrieval-optimized chunks with DOI and section metadata.
- Added canonical paper schema: `doi`, `safe_doi`, `title`, `authors`, `year`, `journal`, `source`, `fetch_outcome`, `content_markdown`, `section_headings`, `assets_manifest_path`, `content_hash`, `indexed_at`.
- Added `search_saved_papers` metadata prefilter: `doi`, `authors`, `year_from`, `year_to`, `journal`, `source`.
- Added hybrid retrieval: dense embedding search + `where_document` lexical constraints + paper-level aggregation + lightweight heuristic reranking.
- Added in-process ChromaDB with ONNX all-MiniLM-L6-v2 default embedding (no PyTorch required).

Added — Asset Management

- Added manifest-first asset model: `save_asset_manifest` persists figure/table/object metadata to `papers/_assets/{safe_doi}.json`.
- Added Elsevier and Springer asset hint passthrough from publisher APIs to the extraction save pipeline.
- Added asset summary integration in `get_saved_paper_structure` and paper resources.

Added — Configuration

- Added `~/GRaDOS/` as the default non-hidden data root (cross-platform; customizable via `GRADOS_HOME`).
- Added Pydantic v2 configuration model hierarchy: `GRaDOSConfig`, `SearchConfig`, `ExtractConfig`, `FetchStrategyConfig`, `TDMConfig`, `SciHubConfig`, `HeadlessBrowserConfig`, `ParsingConfig`, `QAConfig`, `ZoteroConfig`, `ApiKeysConfig`.
- Added `extract.tdm.order` / `extract.tdm.enabled` for per-publisher TDM configuration.
- Added automatic camelCase-to-snake_case JSON key conversion for backward compatibility with existing `config.json` files.
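
The camelCase-to-snake_case key conversion above could be implemented along these lines. This is a sketch under assumptions: the function names are hypothetical, and the real loader may handle edge cases such as acronyms differently.

```python
import re
from typing import Any

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake(key: str) -> str:
    """Convert one camelCase key to snake_case; snake_case keys pass through."""
    return _CAMEL_BOUNDARY.sub("_", key).lower()


def convert_keys(value: Any) -> Any:
    """Recursively convert dict keys in parsed JSON; other values are unchanged."""
    if isinstance(value, dict):
        return {to_snake(k): convert_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [convert_keys(item) for item in value]
    return value


legacy = {"apiKeys": {"elsevierKey": "x"}, "preferManagedBrowser": True}
converted = convert_keys(legacy)
```

Here `converted` has the keys `api_keys`, `elsevier_key`, and `prefer_managed_browser`. The `elsevierKey` name is invented for the example.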

Added — Tests

- Added 9 test files (30 test functions) covering CLI, server tools, resources, storage, search, browser, parsing, PDF import, and migration.
- Added `[tool.pytest.ini_options]` filterwarnings for upstream ChromaDB/ONNX deprecation warnings.

Changed

- Replaced Node.js + TypeScript runtime with pure Python (≥ 3.11).
- Replaced `puppeteer-core` (already migrated to Patchright in 0.6.1) with Python Patchright for browser automation.
- Replaced `mcp-local-rag` external MCP server dependency with in-process ChromaDB semantic search.
- Changed paper storage from Markdown-file-as-truth to ChromaDB-canonical with optional Markdown mirror.
- Changed `search_saved_papers` from title/DOI-only lexical fallback to metadata-filtered dense retrieval with paper-level aggregation.
- Changed `extract_paper_full_text` return contract to compact receipt (not full text), leaving deep reading to `read_saved_paper`.
- Changed `grados://papers/{safe_doi}` from full-text resource to low-token overview, separating navigation from deep reading.
- Changed fetch waterfall TDM stage from hardcoded publisher order to config-driven `extract.tdm.order` / `extract.tdm.enabled`.
- Updated both READMEs to reflect Python installation, CLI, tool contracts, and citation-aware writing workflow.
- Updated `skills/grados/SKILL.md` and `skills/grados/references/tools.md` for the citation-aware `search → structure → deep read → cite → verify` protocol.
- Updated `.mcp.json` to use `uvx` as the MCP server command.

Fixed

- Fixed 42 mypy strict-mode type errors across 11 source files (return types, BS4 attribute casts, generic parameters, ChromaDB `Any` returns).
- Fixed duplicate `dev` dependency declaration (removed from `[project.optional-dependencies]`, kept in `[dependency-groups]`).
- Fixed non-standard `__dataclass_fields__` access in `resumable.py` with `dataclasses.fields()`.
- Fixed Playwright `ViewportSize` type mismatch with targeted `type: ignore[arg-type]`.
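
For context on the `dataclasses.fields()` fix above: `__dataclass_fields__` is an implementation attribute, while `dataclasses.fields()` is the documented way to introspect a dataclass. A minimal sketch follows; the `SearchState` class is hypothetical and is not the contents of `resumable.py`.

```python
from dataclasses import dataclass, fields


@dataclass
class SearchState:
    # Hypothetical stand-in for a resumable-search state object.
    query: str
    cursor: int = 0


def field_names(obj: object) -> list[str]:
    """List field names via the public API rather than __dataclass_fields__."""
    return [f.name for f in fields(obj)]


names = field_names(SearchState(query="perovskite"))
```

`fields()` accepts either a dataclass type or an instance and raises `TypeError` for anything else, which also makes misuse fail loudly.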

Removed

- Removed the entire TypeScript codebase (`src/index.ts`, `src/resumable-search.ts`, `tsconfig.json`, `package.json`, `package-lock.json`).
- Removed all Node.js test scripts (`tests/*.mjs`).
- Removed the Claude Code plugin distribution path (`.claude-plugin/`, `commands/`); retained MCP + skill structure.
- Removed `SemanticScholar` and `OpenAlex` from default search source configuration (not present in TS original).

Docs

- Added `grados-python-implementation-plan.md` as the authoritative engineering plan and completion ledger.
- Consolidated documentation roles: `grados-python-migration-plan.md`, `status.md`, `docs/claude-code-plugin-guide.md`, `docs/global-install-guide.md` retained as historical references.
- Updated both READMEs with Python installation, `uv`/`uvx` commands, tool contract descriptions, and citation-aware writing workflow.
- Updated skill protocol to `search → structure → deep read → cite → verify`.

0.6.5

Final TypeScript-era feature release. These capabilities were subsequently carried forward into the Python rewrite (0.6.6).

Added
- Added a structured publisher-fetch outcome model covering cases such as `native_full_text`, `metadata_only`, `publisher_challenge`, `publisher_pdf_obtained`, and `publisher_html_instead_of_pdf`.
- Added centralized ScienceDirect candidate extraction, intermediate redirect parsing, PDF validation, Elsevier metadata extraction, and benchmark-log helpers.
- Added debug-gated fetch benchmarking and diagnostics output, including optional benchmark summaries in failure paths.
- Added `status.md` as the project-wide engineering status document.
- Added a managed-browser bootstrap flow:
  - `grados --init` makes a best-effort attempt to prepare a dedicated Playwright-managed Chrome for Testing cache
  - `grados --prepare-browser` can re-run browser bootstrap later without regenerating the config
  - a dedicated persistent GRaDOS browser profile lives under the managed data root
- Added a managed browser data layout designed to stay stable across future packaging work.

Changed
- Refactored Elsevier retrieval so no-view metadata responses are treated as first-class `metadata_only` results instead of hard failures.
- Refactored browser automation toward a reusable visible-window model with a retained control page and automatic closing of spawned PDF tabs after successful capture.
- Simplified browser behavior to use a visible Chromium window only for publisher automation, removing the earlier hidden-first/escalate-later split.
- Hardened the ScienceDirect browser state machine so that:
  - `View PDF` is only clicked on ScienceDirect article landing pages
  - PDF-flow pages such as `/pdfft`, `craft/capi/cfts/init`, and `pdf.sciencedirectassets.com` are observed rather than recursively re-opened
  - actual PDF capture happens only after the flow reaches a real PDF URL/content state
- Updated `grados-config.example.json` with debug controls and browser-session reuse options.
- Updated the browser configuration model so GRaDOS prefers its own managed Chrome/profile first, then falls back to configured or system Chromium browsers.
- Updated setup documentation in both READMEs so browser bootstrap is part of the normal installation flow.
- Updated browser-install defaults to favor a single GRaDOS-managed data root.
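
The ScienceDirect state-machine hardening in the list above depends on telling landing pages apart from PDF-flow pages. The Python sketch below illustrates one way to do that; it is not the TypeScript-era implementation. The PDF-flow markers come from these notes, while the `/science/article/` landing-page pattern, the function name, and the state labels are assumptions.

```python
from urllib.parse import urlparse

# Markers named in the notes above for pages that belong to the PDF flow.
PDF_FLOW_PATH_MARKERS = ("/pdfft", "craft/capi/cfts/init")
PDF_FLOW_HOST = "pdf.sciencedirectassets.com"


def classify_sciencedirect_url(url: str) -> str:
    """Return 'pdf_flow', 'landing', or 'other' for a URL.

    PDF-flow checks run first because a /pdfft URL also contains the
    article path used for landing pages.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == PDF_FLOW_HOST or any(m in parsed.path for m in PDF_FLOW_PATH_MARKERS):
        return "pdf_flow"  # observe only; do not click or re-open
    is_sd_host = host == "sciencedirect.com" or host.endswith(".sciencedirect.com")
    if is_sd_host and "/science/article/" in parsed.path:
        return "landing"  # the only state where clicking `View PDF` is allowed
    return "other"
```

Keeping the click restricted to the `landing` state is what prevents the recursive re-opening described above.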

Fixed
- Fixed duplicate ScienceDirect PDF-tab openings caused by racing the explicit `View PDF` click path against a second candidate-link fallback.
- Fixed earlier Elsevier fallback behavior so metadata signals such as `openaccess`, `pii`, `eid`, and `scidir` are retained when full text is unavailable.
- Preserved compatibility of the AIP browser flow after introducing ScienceDirect-specific browser hardening.
- Fixed managed-browser resolution so `preferManagedBrowser=true` now genuinely prefers the GRaDOS-managed Chrome runtime before any configured executable path.

Removed
- Removed the experimental Privacy Pass integration from browser bootstrap, runtime launch, configuration, and documentation after it proved too inconsistent to justify keeping it in the main product flow.

Docs
- Documented the latest managed-browser findings, including the dedicated GRaDOS browser/profile direction and the removal of the experimental Privacy Pass route.
- Recorded the Python packaging direction inspired by `zotero-mcp`: Python package distribution, optional extras, and setup/bootstrap commands for heavyweight managed assets.
- Clarified the intended managed runtime layout so browser binaries and profiles can move into stable GRaDOS-controlled data directories.
- Consolidated Python-migration documentation roles:
  - `grados-python-implementation-plan.md` is now the authoritative engineering plan / completion ledger
  - `TODO.md` is the concise execution snapshot
  - `grados-python-migration-plan.md`, `status.md`, `docs/claude-code-plugin-guide.md`, and `docs/global-install-guide.md` are retained as historical references

0.6.4

Fixed
- Supported project-scoped Marker installs so GRaDOS can discover and use Marker workers inside the active project layout.
