I'm excited to announce the initial release of OpenSceneSense Ollama, a powerful Python package for local video analysis using Ollama's models!
## 🌟 Major Features
### Local Video Analysis
- **Frame Analysis Engine** powered by Ollama's vision models
- **Audio Transcription** using local Whisper models
- **Dynamic Frame Selection** for optimal scene coverage
- **Comprehensive Video Summaries** integrating visual and audio elements
- **Metadata Extraction** for detailed video information
### Privacy & Control
- 🔒 Fully local processing - no cloud dependencies
- 🛠️ Customizable analysis pipelines
- 💪 GPU acceleration support
- 🎯 Fine-tuning capabilities for specific use cases
## ⚙️ Technical Features
### Core Components
- Modular architecture supporting custom components
- Flexible frame selection strategies
- Configurable model selection for different analysis tasks
- Extensible prompt system for customized analysis
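As a sketch of what the prompt system allows, custom prompts might be defined like this. The import path and the `AnalysisPrompts` field names are assumptions for illustration (the class itself is listed in the API section below):

```python
# Sketch only: the import path and field names are assumptions,
# not the confirmed API. Check the package docs for the real signature.
from openscenesense_ollama import AnalysisPrompts

custom_prompts = AnalysisPrompts(
    frame_analysis="Describe the people, objects, and actions visible in this frame.",
    summary="Combine the frame notes and transcript into a concise narrative summary.",
)
```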
### Performance
- Optimized frame processing pipeline
- GPU acceleration support with CUDA 12.1
- Memory-efficient frame selection
- Configurable processing parameters
### Integration
- FFmpeg integration for robust video handling
- PyTorch backend for ML operations
- Whisper integration for audio processing
- Compatible with all Ollama vision models
## 📋 Requirements
### Minimum Requirements
- Python 3.10+
- FFmpeg
- Ollama installed and running
- 8GB RAM
- 4GB storage space
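A quick way to sanity-check these prerequisites before a first run, using only the standard library (the Ollama server listens on port 11434 by default):

```python
import shutil
import sys
import urllib.request

# The package requires Python 3.10 or newer.
assert sys.version_info >= (3, 10), "Python 3.10+ is required"

# FFmpeg must be discoverable on PATH.
assert shutil.which("ffmpeg") is not None, "FFmpeg not found on PATH"

# Ollama serves HTTP on port 11434 by default; the root endpoint
# answers with a simple status message when the server is up.
try:
    with urllib.request.urlopen("http://localhost:11434") as resp:
        print("Ollama responded with HTTP", resp.status)
except OSError as exc:
    raise SystemExit(f"Ollama does not appear to be running: {exc}")
```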
### Recommended Specifications
- NVIDIA GPU with CUDA 12.1+
- 16GB RAM
- SSD storage
- 8-core CPU
## 🛠️ Configuration Options
### Models
- Support for multiple Ollama vision models:
  - llava (default)
  - minicpm-v
  - bakllava
- Configurable summary models (any Ollama text model), for example:
  - llama3.2
  - mistral
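For example, model choices might be passed to the analyzer like this. The keyword names are illustrative assumptions; see the README for the exact arguments:

```python
# Sketch only: the import path and keyword names are assumptions.
from openscenesense_ollama import OllamaVideoAnalyzer

analyzer = OllamaVideoAnalyzer(
    vision_model="llava",      # any Ollama vision model works here
    summary_model="llama3.2",  # any Ollama text model for summaries
)
```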
### Frame Selection
- Adjustable frame sampling rate (default: 4.0 fps)
- Minimum frames: 8 (configurable)
- Maximum frames: 64 (configurable)
- Multiple selection strategies:
  - Dynamic (scene-aware)
  - Uniform
  - Content-aware
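A minimal sketch of how these knobs might be set, mirroring the defaults listed above (the keyword names are assumptions):

```python
# Sketch only: import path and keyword names are assumptions;
# the values mirror the documented defaults.
from openscenesense_ollama import DynamicFrameSelector

selector = DynamicFrameSelector(
    frame_rate=4.0,  # sampling rate before selection, in fps
    min_frames=8,    # lower bound on frames sent for analysis
    max_frames=64,   # upper bound, which also caps memory use
)
```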
### Audio Processing
- Whisper model selection
- GPU acceleration support
- Multiple output formats
- Timestamp alignment
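Transcription might be configured along these lines; "base" and "small" are standard Whisper model sizes, while the keyword names here are assumptions:

```python
# Sketch only: the import path and keyword names are assumptions.
from openscenesense_ollama import WhisperTranscriber

transcriber = WhisperTranscriber(
    model="base",   # trade speed for accuracy with "tiny"/"base"/"small"/"medium"
    device="cuda",  # assumed option; use "cpu" when no GPU is available
)
```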
## 🔧 API Improvements
### New Classes
- `OllamaVideoAnalyzer`: Main analysis pipeline
- `WhisperTranscriber`: Audio processing
- `DynamicFrameSelector`: Smart frame selection
- `AnalysisPrompts`: Customizable prompts
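A minimal end-to-end sketch of how these classes could fit together. Everything below the import (the keyword arguments, the `analyze()` entry point, and the `summary` attribute) is an assumption for illustration, not the confirmed API:

```python
# Sketch only: argument names, analyze(), and result.summary are assumptions.
from openscenesense_ollama import (
    AnalysisPrompts,
    DynamicFrameSelector,
    OllamaVideoAnalyzer,
    WhisperTranscriber,
)

analyzer = OllamaVideoAnalyzer(
    vision_model="llava",
    summary_model="llama3.2",
    frame_selector=DynamicFrameSelector(min_frames=8, max_frames=64),
    transcriber=WhisperTranscriber(model="base"),
    prompts=AnalysisPrompts(),
)

result = analyzer.analyze("path/to/video.mp4")
print(result.summary)
```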
### Enhanced Configuration
- Flexible host configuration
- Custom frame processors
- Configurable logging levels
- Modular component architecture
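Host and logging configuration might look like the following. The `logging` calls are standard library; the `host` keyword is an assumption:

```python
import logging

# Sketch only: the import path and `host` keyword are assumptions.
from openscenesense_ollama import OllamaVideoAnalyzer

# Standard-library logging; raise to DEBUG for verbose pipeline output.
logging.basicConfig(level=logging.INFO)

# Point the analyzer at a non-default Ollama instance.
analyzer = OllamaVideoAnalyzer(host="http://192.168.1.50:11434")
```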
## 📝 Documentation
- Comprehensive README
- Detailed API documentation
- Example scripts and notebooks
- Configuration guides
- Best practices documentation
## 🐛 Known Issues
1. High memory usage with large frame counts
2. Potential GPU memory issues with 4GB cards
3. Limited support for some video codecs
## 🚀 Next Steps
We're already working on:
1. Memory optimization
2. Additional frame selection strategies
3. Enhanced error handling
4. More example notebooks
5. Performance improvements
## 🙏 Acknowledgments
Special thanks to:
- The Ollama team for making local models easy to run
- OpenAI for Whisper
- The open-source community for valuable feedback
## 📦 Installation
```bash
pip install openscenesense-ollama
```
## 🔗 Links
- [Examples](https://github.com/ymrohit/openscenesense-ollama/tree/master/Examples)
- [Issue Tracker](https://github.com/ymrohit/openscenesense-ollama/issues)
## 📄 License
MIT License - see the LICENSE file for details.