I'm excited to announce the initial release of OpenSceneSense Ollama, a powerful Python package for local video analysis using Ollama's models!
## 🌟 Major Features
### Local Video Analysis
- **Frame Analysis Engine** powered by Ollama's vision models
- **Audio Transcription** using local Whisper models
- **Dynamic Frame Selection** for optimal scene coverage
- **Comprehensive Video Summaries** integrating visual and audio elements
- **Metadata Extraction** for detailed video information
### Privacy & Control
- 🔒 Fully local processing - no cloud dependencies
- 🛠️ Customizable analysis pipelines
- 💪 GPU acceleration support
- 🎯 Fine-tuning capabilities for specific use cases
## ⚙️ Technical Features
### Core Components
- Modular architecture supporting custom components
- Flexible frame selection strategies
- Configurable model selection for different analysis tasks
- Extensible prompt system for customized analysis
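As a sketch of what the prompt system allows, custom prompts might be defined like this. The import path and the `AnalysisPrompts` field names are assumptions for illustration (the class itself is listed in the API section below):

```python
# Sketch only: the import path and field names are assumptions,
# not the confirmed API. Check the package docs for the real signature.
from openscenesense_ollama import AnalysisPrompts

custom_prompts = AnalysisPrompts(
    frame_analysis="Describe the people, objects, and actions visible in this frame.",
    summary="Combine the frame notes and transcript into a concise narrative summary.",
)
```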
### Performance
- Optimized frame processing pipeline
- GPU acceleration support with CUDA 12.1
- Memory-efficient frame selection
- Configurable processing parameters
### Integration
- FFmpeg integration for robust video handling
- PyTorch backend for ML operations
- Whisper integration for audio processing
- Compatible with all Ollama vision models
## 📋 Requirements
### Minimum Requirements
- Python 3.10+
- FFmpeg
- Ollama installed and running
- 8GB RAM
- 4GB storage space
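A quick way to sanity-check these prerequisites before a first run, using only the standard library (the Ollama server listens on port 11434 by default):

```python
import shutil
import sys
import urllib.request

# The package requires Python 3.10 or newer.
assert sys.version_info >= (3, 10), "Python 3.10+ is required"

# FFmpeg must be discoverable on PATH.
assert shutil.which("ffmpeg") is not None, "FFmpeg not found on PATH"

# Ollama serves HTTP on port 11434 by default; the root endpoint
# answers with a simple status message when the server is up.
try:
    with urllib.request.urlopen("http://localhost:11434") as resp:
        print("Ollama responded with HTTP", resp.status)
except OSError as exc:
    raise SystemExit(f"Ollama does not appear to be running: {exc}")
```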
### Recommended Specifications
- NVIDIA GPU with CUDA 12.1+
- 16GB RAM
- SSD storage
- 8-core CPU
## 🛠️ Configuration Options
### Models
- Support for multiple Ollama vision models:
  - llava (default)
  - minicpm-v
  - bakllava
- Configurable summary models (any Ollama text model), for example:
  - llama3.2
  - mistral
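For example, model choices might be passed to the analyzer like this. The keyword names are illustrative assumptions; see the README for the exact arguments:

```python
# Sketch only: the import path and keyword names are assumptions.
from openscenesense_ollama import OllamaVideoAnalyzer

analyzer = OllamaVideoAnalyzer(
    vision_model="llava",      # any Ollama vision model works here
    summary_model="llama3.2",  # any Ollama text model for summaries
)
```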
### Frame Selection
- Adjustable frame sampling rate (default: 4.0 fps)
- Minimum frames: 8 (configurable)
- Maximum frames: 64 (configurable)
- Multiple selection strategies:
  - Dynamic (scene-aware)
  - Uniform
  - Content-aware
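A minimal sketch of how these knobs might be set, mirroring the defaults listed above (the keyword names are assumptions):

```python
# Sketch only: import path and keyword names are assumptions;
# the values mirror the documented defaults.
from openscenesense_ollama import DynamicFrameSelector

selector = DynamicFrameSelector(
    frame_rate=4.0,  # sampling rate before selection, in fps
    min_frames=8,    # lower bound on frames sent for analysis
    max_frames=64,   # upper bound, which also caps memory use
)
```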
### Audio Processing
- Whisper model selection
- GPU acceleration support
- Multiple output formats
- Timestamp alignment
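Transcription might be configured along these lines; "base" and "small" are standard Whisper model sizes, while the keyword names here are assumptions:

```python
# Sketch only: the import path and keyword names are assumptions.
from openscenesense_ollama import WhisperTranscriber

transcriber = WhisperTranscriber(
    model="base",   # trade speed for accuracy with "tiny"/"base"/"small"/"medium"
    device="cuda",  # assumed option; use "cpu" when no GPU is available
)
```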
## 🔧 API Improvements
### New Classes
- `OllamaVideoAnalyzer`: Main analysis pipeline
- `WhisperTranscriber`: Audio processing
- `DynamicFrameSelector`: Smart frame selection
- `AnalysisPrompts`: Customizable prompts
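A minimal end-to-end sketch of how these classes could fit together. Everything below the import (the keyword arguments, the `analyze()` entry point, and the `summary` attribute) is an assumption for illustration, not the confirmed API:

```python
# Sketch only: argument names, analyze(), and result.summary are assumptions.
from openscenesense_ollama import (
    AnalysisPrompts,
    DynamicFrameSelector,
    OllamaVideoAnalyzer,
    WhisperTranscriber,
)

analyzer = OllamaVideoAnalyzer(
    vision_model="llava",
    summary_model="llama3.2",
    frame_selector=DynamicFrameSelector(min_frames=8, max_frames=64),
    transcriber=WhisperTranscriber(model="base"),
    prompts=AnalysisPrompts(),
)

result = analyzer.analyze("path/to/video.mp4")
print(result.summary)
```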
### Enhanced Configuration
- Flexible host configuration
- Custom frame processors
- Configurable logging levels
- Modular component architecture
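Host and logging configuration might look like the following. The `logging` calls are standard library; the `host` keyword is an assumption:

```python
import logging

# Sketch only: the import path and `host` keyword are assumptions.
from openscenesense_ollama import OllamaVideoAnalyzer

# Standard-library logging; raise to DEBUG for verbose pipeline output.
logging.basicConfig(level=logging.INFO)

# Point the analyzer at a non-default Ollama instance.
analyzer = OllamaVideoAnalyzer(host="http://192.168.1.50:11434")
```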
## 📝 Documentation
- Comprehensive README
- Detailed API documentation
- Example scripts and notebooks
- Configuration guides
- Best practices documentation
## 🐛 Known Issues
1. High memory usage with large frame counts
2. Potential GPU memory issues with 4GB cards
3. Limited support for some video codecs
## 🚀 Next Steps
We're already working on:
1. Memory optimization
2. Additional frame selection strategies
3. Enhanced error handling
4. More example notebooks
5. Performance improvements
## 🙏 Acknowledgments
Special thanks to:
- The Ollama team for making local models easy to run
- OpenAI for Whisper
- The open-source community for valuable feedback
## 📦 Installation
```bash
pip install openscenesense-ollama
```
## 🔗 Links
- [Examples](https://github.com/ymrohit/openscenesense-ollama/tree/master/Examples)
- [Issue Tracker](https://github.com/ymrohit/openscenesense-ollama/issues)
## 📄 License
MIT License - see the LICENSE file for details.