Gittxt

Latest version: v1.5.0


1.5.0

> 🚀 **LLM Dataset Extractor from GitHub Repos** | AI & NLP-ready text pipelines

📝 Gittxt: Get text from Git repositories in AI-ready formats.

[![Python Version](https://img.shields.io/badge/python-≥3.8-blue)](pyproject.toml)
[![PyPI version](https://badge.fury.io/py/gittxt.svg)](https://pypi.org/project/gittxt/)
[![Release](https://img.shields.io/github/release/sandy-sp/gittxt.svg)](https://github.com/sandy-sp/gittxt/releases)
[![Tested with Pytest](https://img.shields.io/badge/tested%20with-pytest-9cf.svg)](https://docs.pytest.org/en/stable/)
[![PyPI Downloads](https://img.shields.io/pypi/dm/gittxt)](https://pypi.org/project/gittxt/)
![GitHub repo size](https://img.shields.io/github/repo-size/sandy-sp/gittxt)
![GitHub top language](https://img.shields.io/github/languages/top/sandy-sp/gittxt)
[![Build Status](https://github.com/sandy-sp/gittxt/actions/workflows/release.yml/badge.svg)](https://github.com/sandy-sp/gittxt/actions)
[![Made for LLMs](https://img.shields.io/badge/LLM%20ready-Yes-brightgreen)](https://github.com/sandy-sp/gittxt)
[![Linted with Ruff](https://img.shields.io/badge/linter-ruff-%23007ACC.svg)](https://github.com/charliermarsh/ruff)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

---

✨ What is Gittxt?

**Gittxt** is a developer-focused CLI tool that extracts AI-ready text from **Git repositories**. Whether you're preparing datasets for **AI models**, **NLP pipelines**, or **LLM fine-tuning**, Gittxt automates the tedious task of repository scanning and text conversion.

Built with speed, flexibility, and modularity in mind, Gittxt is ideal for:
- Preparing **training data for LLMs** (e.g., ChatGPT, Claude, Mistral)
- **Documentation extraction** for knowledge bases
- **Code summarization** pipelines
- **Repository analysis** for machine learning workflows

---

🚀 Features

- ✅ **Dynamic File-Type Filtering** (`--file-types=code,docs,images,csv,media,all`)
- ✅ **Automatic Tree Generation** with clean filtering (excludes `.git/`, `__pycache__`, etc.)
- ✅ **Multiple Output Formats**: TXT, JSON, Markdown
- ✅ **Optional ZIP Packaging** for non-text assets
- ✅ **CLI-friendly Progress Bars**
- ✅ **Built-in Summary Reports** (`--summary`)
- ✅ **Interactive & CI-ready Modes** (`--non-interactive`)

---

🏗️ Installation

📦 Using Poetry
```bash
git clone https://github.com/sandy-sp/gittxt.git
cd gittxt
poetry install
poetry run gittxt install
```

🐍 Using pip (stable)
```bash
pip install gittxt
```


---

⚙️ Quickstart Example

```bash
gittxt scan https://github.com/sandy-sp/gittxt.git --output-format txt,json --file-types code,docs --summary
```

👉 This will:
- Scan a GitHub repository
- Extract code & docs files
- Write `.txt` and `.json` outputs
- Show a summary report

---

🖥️ CLI Usage

```bash
gittxt scan [REPOS]... [OPTIONS]

Options:
  --include TEXT          Include patterns (e.g., *.py)
  --exclude TEXT          Exclude patterns (e.g., tests/, node_modules)
  --size-limit INTEGER    Max file size in bytes
  --branch TEXT           Specify branch (for GitHub URLs)
  --file-types TEXT       code, docs, images, csv, media, all
  --output-format TEXT    txt, json, md, or comma-separated list
  --output-dir PATH       Custom output directory
  --summary               Show post-scan summary
  --non-interactive       Skip prompts for CI/CD workflows
  --progress              Enable scan progress bars
  --debug                 Enable debug logs
  --help                  Show this message and exit
```
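
For scripted or CI runs, the options listed above can be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named `build_scan_command` (it is not part of Gittxt). It only builds the argument list from flags documented above and does not invoke Gittxt itself.

```python
from typing import List, Optional, Sequence


def build_scan_command(
    repos: Sequence[str],
    output_formats: Sequence[str] = (),
    file_types: Sequence[str] = (),
    output_dir: Optional[str] = None,
    summary: bool = False,
    non_interactive: bool = False,
) -> List[str]:
    """Build an argument list for `gittxt scan` (hypothetical helper)."""
    if not repos:
        raise ValueError("at least one repository path or URL is required")
    cmd = ["gittxt", "scan", *repos]
    # List-valued options are passed as comma-separated values.
    if output_formats:
        cmd += ["--output-format", ",".join(output_formats)]
    if file_types:
        cmd += ["--file-types", ",".join(file_types)]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    # Boolean switches are appended only when requested.
    if summary:
        cmd.append("--summary")
    if non_interactive:
        cmd.append("--non-interactive")
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where Gittxt is installed; passing a list rather than a shell string avoids quoting problems with repository URLs.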


---

📂 Output Structure

```
<output_dir>/
├── text/
│   └── repo-name.txt
├── json/
│   └── repo-name.json
├── md/
│   └── repo-name.md
└── zips/
    └── repo-name_bundle.zip   # Optional ZIP for assets (images, csv, etc.)
```
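
When downstream tooling needs to locate these files, the layout above can be expressed as a small lookup. This is a sketch only: `expected_outputs` and `FORMAT_LAYOUT` are hypothetical names, and the paths simply mirror the tree shown above rather than anything read from Gittxt's code.

```python
from pathlib import Path
from typing import Dict, Sequence

# Subdirectory and file extension per output format, mirroring the tree above.
FORMAT_LAYOUT = {
    "txt": ("text", ".txt"),
    "json": ("json", ".json"),
    "md": ("md", ".md"),
}


def expected_outputs(
    output_dir: str,
    repo_name: str,
    formats: Sequence[str],
    with_zip: bool = False,
) -> Dict[str, Path]:
    """Map each requested format to the path where its output should appear."""
    root = Path(output_dir)
    paths: Dict[str, Path] = {}
    for fmt in formats:
        subdir, ext = FORMAT_LAYOUT[fmt]
        paths[fmt] = root / subdir / f"{repo_name}{ext}"
    if with_zip:
        # Optional bundle for non-text assets.
        paths["zip"] = root / "zips" / f"{repo_name}_bundle.zip"
    return paths
```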


---

🛠 How It Works

1. 🔗 Clone GitHub/local repo (supports branch/subdir URLs)
2. 🌳 Dynamically generate directory tree (excluding `.git`, `__pycache__`, etc.)
3. 🗂️ Filter files based on type (code, docs, csv, media)
4. 📝 Generate formatted outputs (TXT, JSON, MD)
5. 📦 Package assets (optional ZIP for non-text)
6. 🧹 Clean up temporary files (cache-free design)
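
Steps 2 and 3 above (tree walking with exclusions, then type-based filtering) can be sketched as follows. This is an illustration under stated assumptions, not Gittxt's implementation: the function names are hypothetical, and the extension-to-type mapping in particular is invented for the example.

```python
import os
from collections import Counter
from pathlib import Path
from typing import Dict, Iterator

# Directories skipped during the walk, as named in step 2.
EXCLUDED_DIRS = {".git", "__pycache__"}

# Hypothetical extension-to-type mapping; Gittxt's real classification may differ.
TYPE_BY_EXTENSION = {
    ".py": "code", ".js": "code",
    ".md": "docs", ".rst": "docs", ".txt": "docs",
    ".csv": "csv",
    ".png": "images", ".jpg": "images",
    ".mp4": "media",
}


def iter_files(root: str) -> Iterator[Path]:
    """Yield every file under root, skipping excluded directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = sorted(d for d in dirnames if d not in EXCLUDED_DIRS)
        for name in sorted(filenames):
            yield Path(dirpath) / name


def classify(path: Path) -> str:
    """Return the file type for a path, or 'other' if the extension is unknown."""
    return TYPE_BY_EXTENSION.get(path.suffix.lower(), "other")


def type_breakdown(root: str) -> Dict[str, int]:
    """Count files per type, similar in shape to a summary breakdown."""
    return dict(Counter(classify(p) for p in iter_files(root)))
```

Pruning `dirnames` in place is what keeps the walk from entering `.git` at all, which is cheaper than filtering the resulting paths afterwards.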

---

📊 Example Summary Output

```
📊 Summary Report:
- Total files processed: 45
- Output formats: txt, json
- File type breakdown: {'code': 31, 'docs': 14}
```


---

🔐 Security Policy
Please report security issues to: **sandeep.paidipati@gmail.com**
[View Security Policy](docs/SECURITY.md)

---

🀝 Contributing
We welcome community contributions!
- [Contributing Guidelines](docs/CONTRIBUTING.md)
- [Code of Conduct](docs/CODE_OF_CONDUCT.md)
- [Open an Issue](https://github.com/sandy-sp/gittxt/issues/new/choose)

---

🛣️ Roadmap
- FastAPI-powered web UI
- AI-powered summaries (GPT/OpenAI integration)
- Support YAML/CSV as additional output formats
- Async file scanning (speed boost)

---

📄 License
MIT License © [Sandeep Paidipati](https://github.com/sandy-sp)

---

Gittxt - **“Gittxt: Get text from Git repositories in AI-ready formats.”**

---

1.4.1

First-Time Setup (Interactive)
After installing, run:
```bash
gittxt install
```

This command will prompt you to configure:
- Your default output directory (automatically set based on your OS, e.g., `~/Gittxt/` on Linux/Mac)
- Logging level and file logging preferences

---

📌 How to Use Gittxt

1. Scanning Repositories
Use the `scan` subcommand to extract text and generate outputs.

Scan a Local Repository
```bash
gittxt scan .
```
Extracts all readable text into the default output directories.

Scan a Remote GitHub Repository
```bash
gittxt scan https://github.com/sandy-sp/sandy-sp
```
Automatically clones the repository, scans it, and extracts text.

Scan Multiple Repositories with Advanced Options
```bash
gittxt scan /path/to/repo1 https://github.com/user/repo2 --output-format txt,json --docs-only --auto-filter --summary
```


---

🔧 CLI Options

| Option | Description |
|--------------------------|---------------------------------------------------------------------------|
| `--include` | Include only files matching these patterns. |
| `--exclude` | Exclude files matching these patterns. |
| `--size-limit` | Exclude files larger than the specified size (in bytes). |
| `--branch` | Specify a Git branch (for remote repositories). |
| `--output-dir` | Override the default output directory. |
| `--output-format` | Comma-separated list of output formats (e.g., `txt,json,md`). |
| `--max-lines` | Limit the number of lines per file. |
| `--summary` | Display a summary report after scanning. |
| `--debug` | Enable debug mode for detailed logging. |
| `--docs-only` | Only extract documentation files (e.g., README, docs folder). |
| `--auto-filter` | Automatically skip common unwanted or binary files. |
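
To make the filtering options concrete, here is a minimal sketch of how `--include`, `--exclude`, `--size-limit`, and `--max-lines` could combine. The helper names and the exact matching semantics (glob matching against both the relative path and the file name) are assumptions for illustration; Gittxt's own rules may differ.

```python
from fnmatch import fnmatch
from typing import Optional, Sequence


def is_selected(
    rel_path: str,
    size_bytes: int,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
    size_limit: Optional[int] = None,
) -> bool:
    """Return True if a file passes the include, exclude, and size filters."""
    name = rel_path.rsplit("/", 1)[-1]

    def matches(patterns: Sequence[str]) -> bool:
        # A pattern may match the whole relative path or just the file name.
        return any(fnmatch(rel_path, p) or fnmatch(name, p) for p in patterns)

    if include and not matches(include):
        return False
    if exclude and matches(exclude):
        return False
    # Files strictly larger than the limit are excluded.
    if size_limit is not None and size_bytes > size_limit:
        return False
    return True


def truncate_lines(text: str, max_lines: Optional[int] = None) -> str:
    """Keep at most max_lines lines of text; None means no limit."""
    if max_lines is None:
        return text
    return "\n".join(text.splitlines()[:max_lines])
```

In this sketch exclusion wins over inclusion: a file matching both an include and an exclude pattern is dropped.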

---

📄 Output Formats

- **TXT:** Simple text extraction for AI chat and quick analysis.
- **JSON:** Structured output ideal for LLM training and data preprocessing.
- **Markdown (MD):** Neatly formatted documentation for GitHub or project READMEs.

When specifying multiple formats (e.g., `--output-format txt,json`), Gittxt generates separate files in their respective output directories.

---

🗂 Directory Structure

By default, outputs are stored in your configured output directory, which is organized as follows:

```
<output_dir>/
├── text/    # Plain text outputs (.txt)
├── json/    # JSON outputs (.json)
├── md/      # Markdown outputs (.md)
└── cache/   # Caching for incremental scans
```


---

⚙️ Configuration

Gittxt uses a configuration file (`gittxt-config.json`) to store user preferences. You can update this configuration via the interactive install command:
```bash
gittxt install
```

Or edit the file manually. Key settings include:
- **Output Directory:** Auto-determined based on your OS (e.g., `~/Gittxt/`).
- **Logging Options:** Logging level and file logging preferences.
- **Filtering Options:** Include/exclude patterns, file size limits, etc.
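
Reading such a file from a script might look like the sketch below. The key names (`output_dir`, `logging_level`, `size_limit`) and the `load_config` helper are hypothetical and chosen only to mirror the settings listed above; check your own `gittxt-config.json` for the actual schema.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical keys and defaults for illustration; the real schema may differ.
DEFAULTS: Dict[str, Any] = {
    "output_dir": str(Path.home() / "Gittxt"),
    "logging_level": "INFO",
    "size_limit": None,
}


def load_config(path: str) -> Dict[str, Any]:
    """Return defaults overlaid with any values found in the JSON file."""
    config = dict(DEFAULTS)
    config_file = Path(path)
    if config_file.exists():
        config.update(json.loads(config_file.read_text(encoding="utf-8")))
    return config
```

Overlaying file values on a copy of the defaults means a partially filled config file still yields a complete settings dictionary.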

---

📌 Contribute & Develop

1. **Run Tests:**
   ```bash
   pytest tests/
   ```

2. **Format Code:**
   ```bash
   black src/
   ```

3. **Submit a PR:**
- Fork the repo.
- Create a new branch (e.g., `feature/my-change`).
- Push your changes.
- Submit a PR.

For more details, see the [Contributing Guide](CONTRIBUTING.md).

---

💡 Future Roadmap

Our future plans include enhancements to the user interface and further AI-based features. We're working on a lightweight web-based UI and additional improvements that streamline repository analysis and documentation extraction.

---

📜 License

Gittxt is licensed under the **MIT License**.

---

**Made by [Sandeep Paidipati](https://github.com/sandy-sp)**
🚀 **Gittxt: Get Text of Your Repo for AI, LLMs & Docs!**

---

1.4.0

Release notes are identical to those for 1.4.1.

---

1.3.1

Release notes are identical to those for 1.4.1.

---

1.3.0

Release notes are identical to those for 1.4.1.

---

1.2.1

🚀 Gittxt: Get Text of Your Repo for AI, LLMs & Docs!

**Gittxt** is a **lightweight CLI tool** that extracts text from **Git repositories** and formats it into **AI-friendly outputs** (`.txt`, `.json`, `.md`).
Whether you're using **ChatGPT, Grok, Ollama**, or any other LLM, Gittxt helps process repositories for insights, training, and documentation.

✨ Why Use Gittxt?
✅ **Extract Readable Text from Git Repos**
✅ **Convert Code & Docs into AI-Friendly Formats**
✅ **Generate JSON for LLM Training** (Ideal for AI Preprocessing)
✅ **Create Markdown Files for Documentation**
✅ **Summarize & Analyze GitHub Repositories**

---

📌 Installation (From PyPI)
```bash
pip install gittxt
```
Verify installation:
```bash
gittxt --help
```
Expected Output:
```
Usage: gittxt [OPTIONS] SOURCE

Options:
  --include TEXT
  --exclude TEXT
  --size-limit INTEGER
  --branch TEXT
  --output-dir TEXT
  --output-format [txt|json|md]
  --max-lines INTEGER
  --summary
  --debug
  --help                          Show this message and exit.
```


---

📌 How to Use Gittxt

**1️⃣ Extract Text from a Local Repository**
```bash
gittxt .
```
✅ Extracts all readable text from your repo into **gittxt-outputs/text/**.

---

**2️⃣ Extract from a Remote GitHub Repo**
```bash
gittxt https://github.com/sandy-sp/sandy-sp
```
✅ Automatically clones the repo, scans it, and **extracts text**.

---

**3️⃣ Use AI-Friendly Output Formats**
**🧠 JSON (Best for AI & LLM Training)**
```bash
gittxt . --output-format json --output repo_dump.json
```
**Why JSON?**
- **Perfect format for AI & LLMs** (GPT-4, Grok, LLaMA).
- **Prepares structured data for AI training**.
- **Can be used to fine-tune models with repository insights**.

**📜 TXT (For AI Chat & Analysis)**
```bash
gittxt . --output-format txt --output repo_dump.txt
```
**Why TXT?**
- **Extracts pure text**, making it easy for AI-powered chat analysis.
- **Good for summarization and AI-assisted code review**.

**📝 Markdown (Best for Documentation)**
```bash
gittxt . --output-format md --output repo_dump.md
```
**Why Markdown?**
- **Great for GitHub docs & project READMEs**.
- **LLMs like ChatGPT use Markdown for structured responses**.
- **Retains headings, code snippets, and structure**.

---

**4️⃣ Get a Summary Report**
```bash
gittxt . --summary
```
Example Output:
```
📊 Summary Report:
- Scanned 105 text files
- Total Size: 3.2 MB
- File Types: .py, .md, .txt
- Saved in: gittxt-outputs/text/repo_dump.txt
```
✅ **Helps quickly analyze repositories for AI training**.

---


© 2025 Safety CLI Cybersecurity Inc. All Rights Reserved.