Readium

Latest version: v0.3.1

0.3.1

Fixing dependencies

0.3.0

🚀 New Features

Web Page to Markdown Conversion
Readium can now extract content directly from web pages and convert it to Markdown. With this update, you can:

- 🔗 Convert web pages to clean, readable Markdown
- 🛠 Process URLs alongside local directories and repositories
- 🎛 Configure content extraction with flexible modes:
  - `clean`: Extract only main content (default)
  - `full`: Preserve most page content
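Because one command now accepts web URLs, Git repositories, and local directories, the input has to be routed to the right processor. The sketch below is a hypothetical illustration of such routing, not Readium's code: the `classify_source` helper and the `GIT_HOSTS` set are assumptions chosen for the example.

```python
from urllib.parse import urlparse

# Hosts treated as Git remotes in this sketch (an assumption, not Readium's actual list).
GIT_HOSTS = {"github.com", "gitlab.com", "bitbucket.org"}


def classify_source(source: str) -> str:
    """Return 'git', 'url', or 'local' for a Readium-style input (hypothetical helper)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # .hostname drops any "token@" credentials and the port, and lowercases the host
        host = parsed.hostname or ""
        if host in GIT_HOSTS or parsed.path.endswith(".git"):
            return "git"
        return "url"
    # SSH-style remotes such as git@github.com:user/repo.git have no parseable scheme
    if parsed.scheme in ("ssh", "git") or source.endswith(".git"):
        return "git"
    return "local"
```

A real implementation would need finer rules (for example, a GitHub file URL would be treated as a repository here), but the ordering shows the idea: check for remote repositories first, then generic web pages, and fall back to the local filesystem.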

Key Enhancements

- **Powerful Web Scraping**: Leveraging Trafilatura for intelligent content extraction
- **Configurable Processing**:
  - Control table, image, and link inclusion
  - Choose between focused and comprehensive extraction modes
- **Seamless Integration**: New functionality works alongside existing Readium features

🛠 CLI and API Updates

Command Line Examples
```bash
# Convert a web page to Markdown
readium https://example.com/docs

# Full content mode
readium https://example.com/docs --url-mode full

# Save to a specific output file
readium https://example.com/docs -o webpage.md
```


Python API
```python
from readium import Readium, ReadConfig

config = ReadConfig(
    url_mode='clean',  # 'clean' or 'full'
    include_tables=True,
    include_images=True,
)

reader = Readium(config)
summary, tree, content = reader.read_docs('https://example.com/docs')
```


🔍 Processing Modes

- **Clean Mode (Default)**:
  - Focuses on main content
  - Removes menus, ads, and navigation elements
  - Ideal for documentation and technical content

- **Full Mode**:
  - Preserves more page structure
  - Includes additional elements
  - Useful for comprehensive content capture
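These notes do not spell out how each mode maps onto Trafilatura's settings. As a hypothetical illustration, the modes could translate into keyword arguments for `trafilatura.extract()`, which accepts `include_tables`, `include_images`, `include_links`, `favor_precision`, and `favor_recall` (and `output_format="markdown"` in recent Trafilatura releases). The `extraction_options` helper and this particular mapping are assumptions, not Readium's implementation.

```python
# Hypothetical mapping from Readium's url_mode values onto keyword arguments
# accepted by trafilatura.extract(); the exact mapping Readium uses is an assumption.
VALID_MODES = ("clean", "full")


def extraction_options(
    url_mode: str = "clean",
    include_tables: bool = True,
    include_images: bool = True,
    include_links: bool = True,
) -> dict:
    """Build extraction keyword arguments for the given mode (hypothetical helper)."""
    if url_mode not in VALID_MODES:
        raise ValueError(f"url_mode must be one of {VALID_MODES}, got {url_mode!r}")
    return {
        "output_format": "markdown",
        "include_tables": include_tables,
        "include_images": include_images,
        "include_links": include_links,
        # clean: favor precision (drop more boilerplate); full: favor recall (keep more)
        "favor_precision": url_mode == "clean",
        "favor_recall": url_mode == "full",
    }
```

Under these assumptions the result could be passed as `trafilatura.extract(html, **extraction_options("full"))`. Favoring precision discards more borderline content and favoring recall keeps it, which mirrors the clean/full distinction above.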

📦 Dependencies

- Added [Trafilatura](https://github.com/adbar/trafilatura) for intelligent web content extraction

🔒 Compatibility

- Python 3.10-3.12
- Minimal impact on existing Readium workflows
- Optional web processing functionality

0.2.0

New Features
🌿 Git Branch Selection
Now you can analyze documentation from specific Git branches using the new `-b/--branch` option.

```bash
# Analyze a specific branch
readium https://github.com/username/repo -b feature-branch

# Analyze a branch of a private repository (token embedded in the URL)
readium https://token@github.com/username/repo -b develop
```


Python API Support
```python
from readium import Readium, ReadConfig

config = ReadConfig()  # default settings
reader = Readium(config)
summary, tree, content = reader.read_docs(
    'https://github.com/username/repo',
    branch='feature-branch',
)
```
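Branch selection ultimately comes down to asking Git for a particular branch when the repository is fetched. The sketch below shows one way a tool could build that command; the `build_clone_command` helper and the choice of a shallow clone are assumptions, not Readium's code. The `--branch` and `--depth` options are standard `git clone` flags, and `--depth` implies `--single-branch`.

```python
from typing import List, Optional


def build_clone_command(repo_url: str, dest: str, branch: Optional[str] = None) -> List[str]:
    """Build a shallow `git clone` command, optionally for one branch (hypothetical helper)."""
    cmd = ["git", "clone", "--depth", "1"]
    if branch:
        # --branch selects the branch (or tag) to check out instead of the remote's default
        cmd += ["--branch", branch]
    cmd += [repo_url, dest]
    return cmd
```

The list could then be run with `subprocess.run(build_clone_command(url, tmpdir, "feature-branch"), check=True)`. Passing a list rather than a shell string avoids quoting problems with branch names and token-bearing URLs.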

0.1.3

Release Notes: Enhanced Dependency Management and Error Handling

🚀 **New Features and Enhancements**

1. **Dependencies and Configuration Updates**
   - **Workflow Improvements**:
     - Updated `.github/workflows/test.yml` to use `pip install ".[dev]"`, streamlining the installation of development dependencies.
     - Retained `pytest` execution with `-p no:warnings` for cleaner test output.
   - **Dependency Management**:
     - Moved and separated dependencies into:
       - `[tool.poetry.dependencies]` for main dependencies.
       - `[tool.poetry.group.dev.dependencies]` for development-specific dependencies.
     - Adjusted dependencies like `black`, `isort`, `mypy`, `pypdf`, and others for better organization.
   - **Configuration Enhancements**:
     - Added `isort` configuration in `pyproject.toml` for consistent import sorting across the project.

2. **Code Enhancements**
   - **Error Handling**:
     - Introduced a `print_error` function in `error_handling.py` for safer error handling, with fallback support for unprintable content.
     - Integrated `print_error` across modules for consistent error handling.
   - **CLI Improvements**:
     - Added detailed help text and examples, and expanded the description of the `output` option in `cli.py`.
     - Improved error handling for unprintable content in CLI output.
   - **Core Enhancements**:
     - Refined type hinting in `core.py` with `overload` and more specific annotations for clearer, safer code.
     - Enhanced debug logging and error handling during file processing.

3. **Testing**
   - **New Unit Tests**:
     - `test_cli.py`: Validated CLI help text, examples, and the functionality of the `output` option.
     - `test_error_handling.py`: Tested the `print_error` function under various scenarios (e.g., normal text, rich markup, fallback support).
   - **Test Updates**:
     - Removed obsolete comments from `test_basic.py` to improve readability.
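The notes describe `print_error` as a safer error printer with a fallback for unprintable content. Readium's version works with rich (the tests mention rich markup); the sketch below is only a stdlib approximation of the fallback idea, and its name and signature are assumptions rather than the project's API. If the output stream cannot encode a character, the message is re-printed with backslash escapes instead of raising `UnicodeEncodeError`.

```python
import sys
from typing import Optional, TextIO


def print_error(message: str, stream: Optional[TextIO] = None) -> None:
    """Print an error, escaping characters the stream cannot encode (stdlib-only sketch)."""
    out = stream if stream is not None else sys.stderr
    try:
        print(message, file=out)
    except UnicodeEncodeError:
        encoding = getattr(out, "encoding", None) or "ascii"
        # Replace unencodable characters with escapes such as \xe9 instead of crashing
        safe = message.encode(encoding, errors="backslashreplace").decode(encoding)
        print(safe, file=out)
```

On an ASCII-only terminal, `print_error("café failed")` would print `caf\xe9 failed` rather than aborting, which matches the "fallback support" scenario the new tests cover.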

📋 **Key Benefits**
- Streamlined **dependency management** for clearer separation between main and development requirements.
- Safer, more robust **error handling** for edge cases such as unprintable content.
- Enhanced **developer experience** with better documentation, consistent configurations, and comprehensive testing.
- **User experience** improvements through enriched CLI help text and more intuitive output options.

0.1.1

0.1.0

Initial release of Readium, a documentation extraction and analysis tool.

What's New

Core Features
- Documentation extraction from local directories and Git repositories
- Support for multiple document formats through MarkItDown integration
- Configurable file processing with size limits and exclusion patterns
- Debug mode for detailed processing information

File Support
- Documentation: `.md`, `.mdx`, `.rst`, `.txt`
- Office documents (via MarkItDown): `.pdf`, `.docx`, `.xlsx`, `.pptx`
- Source code: Multiple programming languages supported
- Configuration: `.yml`, `.toml`, `.json`, etc.
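The size limits, exclusion patterns, and per-format support described above amount to a per-file filter. The sketch below is a hypothetical version of such a filter: the `should_process` helper, the 5 MB default limit, the example exclusion patterns, and the extension sets (an illustrative subset of the formats listed) are all assumptions, not Readium's defaults.

```python
from fnmatch import fnmatch
from pathlib import PurePath
from typing import Iterable

# Illustrative subset of the formats listed above; Readium's real defaults may differ.
DOC_EXTENSIONS = {".md", ".mdx", ".rst", ".txt", ".yml", ".toml", ".json"}
MARKITDOWN_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".pptx"}


def should_process(
    path: str,
    size_bytes: int,
    max_file_size: int = 5 * 1024 * 1024,  # assumed default limit
    exclude_patterns: Iterable[str] = ("*.lock", "node_modules/*"),
    use_markitdown: bool = True,
) -> bool:
    """Decide whether a file passes exclusion, size, and extension checks (hypothetical)."""
    p = PurePath(path)
    # fnmatch's "*" also matches "/", so "node_modules/*" covers nested files
    if any(fnmatch(p.as_posix(), pattern) for pattern in exclude_patterns):
        return False
    if size_bytes > max_file_size:
        return False
    suffix = p.suffix.lower()
    if suffix in DOC_EXTENSIONS:
        return True
    # Office formats are only handled when MarkItDown conversion is enabled
    return use_markitdown and suffix in MARKITDOWN_EXTENSIONS
```

Checking exclusions and size before the extension keeps cheap rejections first; unknown binary formats such as `.png` fall through to `False`, consistent with the known issue that binary files are excluded unless MarkItDown handles them.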

Command Line Interface
- Basic directory/repository processing
- Output file generation
- Configurable options for processing control
- Debug mode support

Python API
- `ReadConfig` class for flexible configuration
- `Readium` class for programmatic access
- Integration with MarkItDown for document conversion

Installation

```bash
pip install readium
```


Known Issues
- Binary files are excluded by default unless processed through MarkItDown
- Git repository processing requires git to be installed
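Since repository processing depends on an installed `git`, a caller can check for it up front and show a clear message instead of failing mid-run. This is a minimal sketch using the standard library's `shutil.which`; the `git_available` name is hypothetical.

```python
import shutil


def git_available() -> bool:
    """Return True if a `git` executable is found on PATH (hypothetical helper)."""
    return shutil.which("git") is not None
```

A CLI could call this before cloning and exit with a hint such as "install git to process repositories" when it returns `False`.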

Dependencies
- Python ≥ 3.10
- Required packages: click, rich, markitdown, black, isort
